- See Also
- Links
- “Accelerating Large Language Model Pretraining via LFR Pedagogy: Learn, Focus, and Review”, Prakriya et al 2024
- “Improving Pretraining Data Using Perplexity Correlations”, Thrush et al 2024
- “Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process”, Ye et al 2024
- “Super(ficial)-Alignment: Strong Models May Deceive Weak Models in Weak-To-Strong Generalization”, Yang et al 2024
- “The Scaling Law in Stellar Light Curves”, Pan et al 2024
- “From Explicit CoT to Implicit CoT: Learning to Internalize CoT Step by Step”, Deng et al 2024
- “Grokked Transformers Are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization”, Wang et al 2024
- “Fishing for Magikarp: Automatically Detecting Under-Trained Tokens in Large Language Models”, Land & Bartolo 2024
- “Test-Time Augmentation to Solve ARC”, Cole 2024
- “Σ-GPTs: A New Approach to Autoregressive Models”, Pannatier et al 2024
- “Language Imbalance Can Boost Cross-Lingual Generalization”, Schäfer et al 2024
- “Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws”, Allen-Zhu & Li 2024
- “Do Language Models Plan Ahead for Future Tokens?”, Wu et al 2024
- “Neural Redshift: Random Networks Are Not Random Functions”, Teney et al 2024
- “A Mechanistic Analysis of a Transformer Trained on a Symbolic Multi-Step Reasoning Task”, Brinkmann et al 2024
- “Mission: Impossible Language Models”, Kallini et al 2024
- “A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity”, Lee et al 2024
- “Language Model Alignment With Elastic Reset”, Noukhovitch et al 2023
- “Eliciting Language Model Behaviors Using Reverse Language Models”, Pfau et al 2023
- “Controlled Text Generation via Language Model Arithmetic”, Dekoninck et al 2023
- “Compositional Capabilities of Autoregressive Transformers: A Study on Synthetic, Interpretable Tasks”, Ramesh et al 2023
- “Tokenizer Choice For LLM Training: Negligible or Crucial?”, Ali et al 2023
- “What OpenAI Really Wants”, Levy 2023
- “Linearity of Relation Decoding in Transformer Language Models”, Hernandez et al 2023
- “Accelerating LLM Inference With Staged Speculative Decoding”, Spector & Re 2023
- “Stay on Topic With Classifier-Free Guidance”, Sanchez et al 2023
- “Likelihood-Based Diffusion Language Models”, Gulrajani & Hashimoto 2023
- “Mimetic Initialization of Self-Attention Layers”, Trockman & Kolter 2023
- “How Does GPT-2 Compute Greater-Than?: Interpreting Mathematical Abilities in a Pre-Trained Language Model”, Hanna et al 2023
- “LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions”, Wu et al 2023
- “Tractable Control for Autoregressive Language Generation”, Zhang et al 2023
- “How Does In-Context Learning Help Prompt Tuning?”, Sun et al 2023
- “MarioGPT: Open-Ended Text2Level Generation through Large Language Models”, Sudhakaran et al 2023
- “GPT-3 As Knowledge Worker: A Zero-Shot Evaluation of AI CPA Capabilities”, Bommarito et al 2023
- “Geographic and Geopolitical Biases of Language Models”, Faisal & Anastasopoulos 2022
- “Structured Prompting: Scaling In-Context Learning to 1,000 Examples”, Hao et al 2022
- “Contrastive Decoding: Open-Ended Text Generation As Optimization”, Li et al 2022
- “Contrastive Search Is What You Need For Neural Text Generation”, Su & Collier 2022
- “Perfectly Secure Steganography Using Minimum Entropy Coupling”, Witt et al 2022
- “Fine-Tuning Pre-Trained Transformers into Decaying Fast Weights”, Mao 2022
- “Semantic Reconstruction of Continuous Language from Non-Invasive Brain Recordings”, Tang et al 2022
- “Deep Language Algorithms Predict Semantic Comprehension from Brain Activity”, Caucheteux et al 2022
- “Correspondence between the Layered Structure of Deep Language Models and Temporal Structure of Natural Language Processing in the Human Brain”, Goldstein et al 2022
- “DIRECTOR: Generator-Classifiers For Supervised Language Modeling”, Arora et al 2022
- “Offline RL for Natural Language Generation With Implicit Language Q Learning”, Snell et al 2022
- “FlashAttention: Fast and Memory-Efficient Exact Attention With IO-Awareness”, Dao et al 2022
- “Memorization Without Overfitting: Analyzing the Training Dynamics of Large Language Models”, Tirumala et al 2022
- “AdaVAE: Exploring Adaptive GPT-2s in Variational Autoencoders for Language Modeling”, Tu et al 2022
- “FLOTA: An Embarrassingly Simple Method to Mitigate Und-Es-Ira-Ble Properties of Pretrained Language Model Tokenizers”, Hofmann et al 2022
- “Transformer Feed-Forward Layers Build Predictions by Promoting Concepts in the Vocabulary Space”, Geva et al 2022
- “Time Control: Language Modeling via Stochastic Processes”, Wang et al 2022
- “Quantifying and Alleviating Political Bias in Language Models”, Liu et al 2022c
- “Controllable Natural Language Generation With Contrastive Prefixes”, Qian et al 2022
- “LID: Pre-Trained Language Models for Interactive Decision-Making”, Li et al 2022
- “Typical Decoding for Natural Language Generation”, Meister et al 2022
- “Can Wikipedia Help Offline Reinforcement Learning?”, Reid et al 2022
- “ClipCap: CLIP Prefix for Image Captioning”, Mokady et al 2021
- “Mapping Language Models to Grounded Conceptual Spaces”, Patel & Pavlick 2021
- “Relating Neural Text Degeneration to Exposure Bias”, Chiang & Chen 2021
- “TruthfulQA: Measuring How Models Mimic Human Falsehoods”, Lin et al 2021
- “Scarecrow: A Framework for Scrutinizing Machine Text”, Dou et al 2021
- “LoRA: Low-Rank Adaptation of Large Language Models”, Hu et al 2021
- “Let the Algorithm Speak: How to Use Neural Networks for Automatic Item Generation in Psychological Scale Development”, Götz et al 2021
- “GPT-J-6B: 6B JAX-Based Transformer”, EleutherAI 2021
- “LHOPT: A Generalizable Approach to Learning Optimizers”, Almeida et al 2021
- “A Hierarchy of Linguistic Predictions during Natural Language Comprehension”, Heilbron et al 2021
- “Why Are Tar.xz Files 15× Smaller When Using Python’s Tar Library Compared to MacOS Tar?”, Lindestøkke 2021
- “The Pile: An 800GB Dataset of Diverse Text for Language Modeling”, Gao et al 2021
- “Prefix-Tuning: Optimizing Continuous Prompts for Generation”, Li & Liang 2021
- “Bot-Adversarial Dialogue for Safe Conversational Agents”, Xu et al 2021
- “Extracting Training Data from Large Language Models”, Carlini et al 2020
- “NeuroLogic Decoding: (Un)supervised Neural Text Generation With Predicate Logic Constraints”, Lu et al 2020
- “Interacting With GPT-2 to Generate Controlled and Believable Musical Sequences in ABC Notation”, Geerlings & Meroño-Peñuela 2020
- “GeDi: Generative Discriminator Guided Sequence Generation”, Krause et al 2020
- “Generative Models Are Unsupervised Predictors of Page Quality: A Colossal-Scale Study”, Bahri et al 2020
- “Adding Recurrence to Pretrained Transformers for Improved Efficiency and Context Size”, Yoshida et al 2020
- “The Chess Transformer: Mastering Play Using Generative Language Models”, Noever et al 2020
- “True_poetry: Poetry Generator by GPT-2 With Meter and Rhyme Constraints”, Summers-Stay 2020
- “TREC CAsT 2019: The Conversational Assistance Track Overview”, Dalton et al 2020
- “OpenAI Text Generator GPT-2 Creates Video Game Walkthrough for ‘Most Tedious Game in History’”, Whalen 2020
- “Reducing Non-Normative Text Generation from Language Models”, Peng et al 2020
- “How Novelists Use Generative Language Models: An Exploratory User Study”, Calderwood et al 2020
- “Writing the Next American Hit: Using GPT-2 to Explore the Possibility of Creating Successful AI-Generated Song Lyrics”, Barrio 2020
- “Controlling Text Generation With Plug and Play Language Models”, Liu et al 2019
- “AI Dungeon 2”, Walton 2019
- “Release Strategies and the Social Impacts of Language Models”, Solaiman et al 2019
- “GPT-2: 1.5B Release”, Solaiman et al 2019
- “GPT-2 Folk Music”, Gwern & Presser 2019
- “Fine-Tuning GPT-2 from Human Preferences § Bugs Can Optimize for Bad Behavior”, Ziegler et al 2019
- “Fine-Tuning GPT-2 from Human Preferences”, Ziegler et al 2019
- “Fine-Tuning Language Models from Human Preferences”, Ziegler et al 2019
- “Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism”, Shoeybi et al 2019
- “Lm-Human-Preferences”, Ziegler et al 2019
- “How To Make Custom AI-Generated Text With GPT-2”, Woolf 2019
- “OpenGPT-2: We Replicated GPT-2-1.5b Because You Can Too”, Gokaslan & Cohen 2019
- “Universal Adversarial Triggers for Attacking and Analyzing NLP”, Wallace et al 2019
- “GPT-2: 6-Month Follow-Up”, OpenAI 2019
- “MegatronLM: Training Billion+ Parameter Language Models Using GPU Model Parallelism”, ADLR 2019
- “Addendum: Evaluation of My Model”, Leahy 2019
- “Replicating GPT-2-1.5B”, Leahy 2019
- “Unraveling the JPEG: JPEG Images Are Everywhere in Our Digital Lives, but behind the Veil of Familiarity Lie Algorithms That Remove Details That Are Imperceptible to the Human Eye. This Produces the Highest Visual Quality With the Smallest File Size—But What Does That Look Like? Let’s See What Our Eyes Can’t See!”, Shehata 2019
- “Some Pretty Impressive Machine-Learning Generated Poetry Courtesy of GPT-2”
- “LM Explorer (alpha)”, Intelligence 2019
- “GPT-2 As Step Toward General Intelligence”, Alexander 2019
- “Language Models Are Unsupervised Multitask Learners”, Radford et al 2019
- “Better Language Models and Their Implications”, Radford et al 2019
- “Talk To Transformer”, King 2019
- “Notes on a New Philosophy of Empirical Science”, Burfoot 2011
- “Google-Wide Profiling: A Continuous Profiling Infrastructure for Data Centers”, Ren et al 2010
- “Timm S. Mueller”
- “The Difficulties of Text Generation Using Autoregressive Language Models: A Brief Overview”, Gao 2024
- “Let’s Reproduce GPT-2 (1.6B): One 8×H100 Node, 24 Hours, $672”
- “Alec Radford”
- “TensorFlow Research Cloud (TRC): Accelerate Your Cutting-Edge Machine Learning Research With Free Cloud TPUs”, TRC 2024
- Sort By Magic
- Wikipedia
- Miscellaneous
- Bibliography
See Also
Links
“Accelerating Large Language Model Pretraining via LFR Pedagogy: Learn, Focus, and Review”, Prakriya et al 2024
Accelerating Large Language Model Pretraining via LFR Pedagogy: Learn, Focus, and Review
“Improving Pretraining Data Using Perplexity Correlations”, Thrush et al 2024
“Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process”, Ye et al 2024
Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process
“Super(ficial)-Alignment: Strong Models May Deceive Weak Models in Weak-To-Strong Generalization”, Yang et al 2024
Super(ficial)-alignment: Strong Models May Deceive Weak Models in Weak-to-Strong Generalization
“The Scaling Law in Stellar Light Curves”, Pan et al 2024
“From Explicit CoT to Implicit CoT: Learning to Internalize CoT Step by Step”, Deng et al 2024
From Explicit CoT to Implicit CoT: Learning to Internalize CoT Step by Step
“Grokked Transformers Are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization”, Wang et al 2024
Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization
“Fishing for Magikarp: Automatically Detecting Under-Trained Tokens in Large Language Models”, Land & Bartolo 2024
Fishing for Magikarp: Automatically Detecting Under-trained Tokens in Large Language Models
“Test-Time Augmentation to Solve ARC”, Cole 2024
“Σ-GPTs: A New Approach to Autoregressive Models”, Pannatier et al 2024
“Language Imbalance Can Boost Cross-Lingual Generalization”, Schäfer et al 2024
“Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws”, Allen-Zhu & Li 2024
Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws
“Do Language Models Plan Ahead for Future Tokens?”, Wu et al 2024
“Neural Redshift: Random Networks Are Not Random Functions”, Teney et al 2024
“A Mechanistic Analysis of a Transformer Trained on a Symbolic Multi-Step Reasoning Task”, Brinkmann et al 2024
A Mechanistic Analysis of a Transformer Trained on a Symbolic Multi-Step Reasoning Task
“Mission: Impossible Language Models”, Kallini et al 2024
“A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity”, Lee et al 2024
A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity
“Language Model Alignment With Elastic Reset”, Noukhovitch et al 2023
“Eliciting Language Model Behaviors Using Reverse Language Models”, Pfau et al 2023
Eliciting Language Model Behaviors using Reverse Language Models
“Controlled Text Generation via Language Model Arithmetic”, Dekoninck et al 2023
“Compositional Capabilities of Autoregressive Transformers: A Study on Synthetic, Interpretable Tasks”, Ramesh et al 2023
Compositional Capabilities of Autoregressive Transformers: A Study on Synthetic, Interpretable Tasks
“Tokenizer Choice For LLM Training: Negligible or Crucial?”, Ali et al 2023
“What OpenAI Really Wants”, Levy 2023
“Linearity of Relation Decoding in Transformer Language Models”, Hernandez et al 2023
Linearity of Relation Decoding in Transformer Language Models
“Accelerating LLM Inference With Staged Speculative Decoding”, Spector & Re 2023
“Stay on Topic With Classifier-Free Guidance”, Sanchez et al 2023
“Likelihood-Based Diffusion Language Models”, Gulrajani & Hashimoto 2023
“Mimetic Initialization of Self-Attention Layers”, Trockman & Kolter 2023
“How Does GPT-2 Compute Greater-Than?: Interpreting Mathematical Abilities in a Pre-Trained Language Model”, Hanna et al 2023
“LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions”, Wu et al 2023
LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions
“Tractable Control for Autoregressive Language Generation”, Zhang et al 2023
“How Does In-Context Learning Help Prompt Tuning?”, Sun et al 2023
“MarioGPT: Open-Ended Text2Level Generation through Large Language Models”, Sudhakaran et al 2023
MarioGPT: Open-Ended Text2Level Generation through Large Language Models
“GPT-3 As Knowledge Worker: A Zero-Shot Evaluation of AI CPA Capabilities”, Bommarito et al 2023
GPT-3 as Knowledge Worker: A Zero-Shot Evaluation of AI CPA Capabilities
“Geographic and Geopolitical Biases of Language Models”, Faisal & Anastasopoulos 2022
“Structured Prompting: Scaling In-Context Learning to 1,000 Examples”, Hao et al 2022
Structured Prompting: Scaling In-Context Learning to 1,000 Examples
“Contrastive Decoding: Open-Ended Text Generation As Optimization”, Li et al 2022
Contrastive Decoding: Open-ended Text Generation as Optimization
“Contrastive Search Is What You Need For Neural Text Generation”, Su & Collier 2022
Contrastive Search Is What You Need For Neural Text Generation
“Perfectly Secure Steganography Using Minimum Entropy Coupling”, Witt et al 2022
Perfectly Secure Steganography Using Minimum Entropy Coupling
“Fine-Tuning Pre-Trained Transformers into Decaying Fast Weights”, Mao 2022
Fine-Tuning Pre-trained Transformers into Decaying Fast Weights
“Semantic Reconstruction of Continuous Language from Non-Invasive Brain Recordings”, Tang et al 2022
Semantic reconstruction of continuous language from non-invasive brain recordings
“Deep Language Algorithms Predict Semantic Comprehension from Brain Activity”, Caucheteux et al 2022
Deep language algorithms predict semantic comprehension from brain activity
“Correspondence between the Layered Structure of Deep Language Models and Temporal Structure of Natural Language Processing in the Human Brain”, Goldstein et al 2022
“DIRECTOR: Generator-Classifiers For Supervised Language Modeling”, Arora et al 2022
DIRECTOR: Generator-Classifiers For Supervised Language Modeling
“Offline RL for Natural Language Generation With Implicit Language Q Learning”, Snell et al 2022
Offline RL for Natural Language Generation with Implicit Language Q Learning
“FlashAttention: Fast and Memory-Efficient Exact Attention With IO-Awareness”, Dao et al 2022
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
“Memorization Without Overfitting: Analyzing the Training Dynamics of Large Language Models”, Tirumala et al 2022
Memorization Without Overfitting: Analyzing the Training Dynamics of Large Language Models
“AdaVAE: Exploring Adaptive GPT-2s in Variational Autoencoders for Language Modeling”, Tu et al 2022
AdaVAE: Exploring Adaptive GPT-2s in Variational Autoencoders for Language Modeling
“FLOTA: An Embarrassingly Simple Method to Mitigate Und-Es-Ira-Ble Properties of Pretrained Language Model Tokenizers”, Hofmann et al 2022
“Transformer Feed-Forward Layers Build Predictions by Promoting Concepts in the Vocabulary Space”, Geva et al 2022
Transformer Feed-Forward Layers Build Predictions by Promoting Concepts in the Vocabulary Space
“Time Control: Language Modeling via Stochastic Processes”, Wang et al 2022
“Quantifying and Alleviating Political Bias in Language Models”, Liu et al 2022c
Quantifying and alleviating political bias in language models
“Controllable Natural Language Generation With Contrastive Prefixes”, Qian et al 2022
Controllable Natural Language Generation with Contrastive Prefixes
“LID: Pre-Trained Language Models for Interactive Decision-Making”, Li et al 2022
LID: Pre-Trained Language Models for Interactive Decision-Making
“Typical Decoding for Natural Language Generation”, Meister et al 2022
“Can Wikipedia Help Offline Reinforcement Learning?”, Reid et al 2022
“ClipCap: CLIP Prefix for Image Captioning”, Mokady et al 2021
“Mapping Language Models to Grounded Conceptual Spaces”, Patel & Pavlick 2021
“Relating Neural Text Degeneration to Exposure Bias”, Chiang & Chen 2021
“TruthfulQA: Measuring How Models Mimic Human Falsehoods”, Lin et al 2021
“Scarecrow: A Framework for Scrutinizing Machine Text”, Dou et al 2021
“LoRA: Low-Rank Adaptation of Large Language Models”, Hu et al 2021
“Let the Algorithm Speak: How to Use Neural Networks for Automatic Item Generation in Psychological Scale Development”, Götz et al 2021
“GPT-J-6B: 6B JAX-Based Transformer”, EleutherAI 2021
“LHOPT: A Generalizable Approach to Learning Optimizers”, Almeida et al 2021
“A Hierarchy of Linguistic Predictions during Natural Language Comprehension”, Heilbron et al 2021
A hierarchy of linguistic predictions during natural language comprehension
“Why Are Tar.xz Files 15× Smaller When Using Python’s Tar Library Compared to MacOS Tar?”, Lindestøkke 2021
Why are tar.xz files 15× smaller when using Python’s tar library compared to macOS tar?
“The Pile: An 800GB Dataset of Diverse Text for Language Modeling”, Gao et al 2021
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
“Prefix-Tuning: Optimizing Continuous Prompts for Generation”, Li & Liang 2021
“Bot-Adversarial Dialogue for Safe Conversational Agents”, Xu et al 2021
“Extracting Training Data from Large Language Models”, Carlini et al 2020
“NeuroLogic Decoding: (Un)supervised Neural Text Generation With Predicate Logic Constraints”, Lu et al 2020
NeuroLogic Decoding: (Un)supervised Neural Text Generation with Predicate Logic Constraints
“Interacting With GPT-2 to Generate Controlled and Believable Musical Sequences in ABC Notation”, Geerlings & Meroño-Peñuela 2020
Interacting with GPT-2 to Generate Controlled and Believable Musical Sequences in ABC Notation
“GeDi: Generative Discriminator Guided Sequence Generation”, Krause et al 2020
“Generative Models Are Unsupervised Predictors of Page Quality: A Colossal-Scale Study”, Bahri et al 2020
Generative Models are Unsupervised Predictors of Page Quality: A Colossal-Scale Study
“Adding Recurrence to Pretrained Transformers for Improved Efficiency and Context Size”, Yoshida et al 2020
Adding Recurrence to Pretrained Transformers for Improved Efficiency and Context Size
“The Chess Transformer: Mastering Play Using Generative Language Models”, Noever et al 2020
The Chess Transformer: Mastering Play using Generative Language Models
“True_poetry: Poetry Generator by GPT-2 With Meter and Rhyme Constraints”, Summers-Stay 2020
true_poetry: Poetry generator by GPT-2 with meter and rhyme constraints
“TREC CAsT 2019: The Conversational Assistance Track Overview”, Dalton et al 2020
TREC CAsT 2019: The Conversational Assistance Track Overview
“OpenAI Text Generator GPT-2 Creates Video Game Walkthrough for ‘Most Tedious Game in History’”, Whalen 2020
OpenAI Text Generator GPT-2 Creates Video Game Walkthrough for ‘Most Tedious Game in History’
“Reducing Non-Normative Text Generation from Language Models”, Peng et al 2020
“How Novelists Use Generative Language Models: An Exploratory User Study”, Calderwood et al 2020
How Novelists Use Generative Language Models: An Exploratory User Study
“Writing the Next American Hit: Using GPT-2 to Explore the Possibility of Creating Successful AI-Generated Song Lyrics”, Barrio 2020
“Controlling Text Generation With Plug and Play Language Models”, Liu et al 2019
Controlling Text Generation with Plug and Play Language Models
“AI Dungeon 2”, Walton 2019
“Release Strategies and the Social Impacts of Language Models”, Solaiman et al 2019
Release Strategies and the Social Impacts of Language Models
“GPT-2: 1.5B Release”, Solaiman et al 2019
“GPT-2 Folk Music”, Gwern & Presser 2019
“Fine-Tuning GPT-2 from Human Preferences § Bugs Can Optimize for Bad Behavior”, Ziegler et al 2019
Fine-Tuning GPT-2 from Human Preferences § Bugs can optimize for bad behavior
“Fine-Tuning GPT-2 from Human Preferences”, Ziegler et al 2019
“Fine-Tuning Language Models from Human Preferences”, Ziegler et al 2019
“Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism”, Shoeybi et al 2019
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
“Lm-Human-Preferences”, Ziegler et al 2019
“How To Make Custom AI-Generated Text With GPT-2”, Woolf 2019
“OpenGPT-2: We Replicated GPT-2-1.5b Because You Can Too”, Gokaslan & Cohen 2019
“Universal Adversarial Triggers for Attacking and Analyzing NLP”, Wallace et al 2019
Universal Adversarial Triggers for Attacking and Analyzing NLP
“GPT-2: 6-Month Follow-Up”, OpenAI 2019
“MegatronLM: Training Billion+ Parameter Language Models Using GPU Model Parallelism”, ADLR 2019
MegatronLM: Training Billion+ Parameter Language Models Using GPU Model Parallelism
“Addendum: Evaluation of My Model”, Leahy 2019
“Replicating GPT-2-1.5B”, Leahy 2019
“Unraveling the JPEG: JPEG Images Are Everywhere in Our Digital Lives, but behind the Veil of Familiarity Lie Algorithms That Remove Details That Are Imperceptible to the Human Eye. This Produces the Highest Visual Quality With the Smallest File Size—But What Does That Look Like? Let’s See What Our Eyes Can’t See!”, Shehata 2019
“Some Pretty Impressive Machine-Learning Generated Poetry Courtesy of GPT-2”
Some pretty impressive machine-learning generated poetry courtesy of GPT-2:
“LM Explorer (alpha)”, Intelligence 2019
“GPT-2 As Step Toward General Intelligence”, Alexander 2019
“Language Models Are Unsupervised Multitask Learners”, Radford et al 2019
“Better Language Models and Their Implications”, Radford et al 2019
“Talk To Transformer”, King 2019
“Notes on a New Philosophy of Empirical Science”, Burfoot 2011
“Google-Wide Profiling: A Continuous Profiling Infrastructure for Data Centers”, Ren et al 2010
Google-Wide Profiling: A Continuous Profiling Infrastructure for Data Centers
“Timm S. Mueller”
“The Difficulties of Text Generation Using Autoregressive Language Models: A Brief Overview”, Gao 2024
The Difficulties of Text Generation using Autoregressive Language Models: A Brief Overview:
“Let’s Reproduce GPT-2 (1.6B): One 8×H100 Node, 24 Hours, $672”
Let’s reproduce GPT-2 (1.6B): one 8×H100 node, 24 hours, $672
“Alec Radford”
“TensorFlow Research Cloud (TRC): Accelerate Your Cutting-Edge Machine Learning Research With Free Cloud TPUs”, TRC 2024
Sort By Magic
Annotations sorted by machine learning into inferred 'tags'. This provides an alternative way to browse: instead of by date order, one can browse in topic order. The 'sorted' list has been automatically clustered into multiple sections & auto-labeled for easier browsing.
Beginning with the newest annotation, the sort uses each annotation’s embedding to find its nearest-neighbor annotations, creating a progression of topics (a minimal sketch of this ordering appears after the tag list). For more details, see the link.
language-alignment
optimizers-scaling
fine-tuning
language-comprehension
text-generation
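As an illustration of the ordering described above, here is a minimal sketch, not the site’s actual implementation, of a greedy nearest-neighbor sort over annotation embeddings; the names `embeddings` and `ids_by_date` are assumptions made for the example.

```python
# Minimal sketch (assumed interface, not the site's real code) of the greedy
# nearest-neighbor ordering: start from the newest annotation and repeatedly
# append the most similar annotation not yet used, yielding a topic progression.
# Assumes `embeddings` maps annotation IDs to unit-normalized NumPy vectors and
# `ids_by_date` lists IDs from newest to oldest.
import numpy as np

def magic_sort(embeddings: dict[str, np.ndarray], ids_by_date: list[str]) -> list[str]:
    remaining = set(ids_by_date)
    current = ids_by_date[0]          # the newest annotation seeds the chain
    order = [current]
    remaining.remove(current)
    while remaining:
        # cosine similarity reduces to a dot product on unit-normalized vectors
        nxt = max(remaining, key=lambda i: float(embeddings[current] @ embeddings[i]))
        order.append(nxt)
        remaining.remove(nxt)
        current = nxt
    return order
```

The resulting chain would then be cut into sections and auto-labeled (for example by clustering), giving tag groups like those listed above.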
Wikipedia
Miscellaneous
- /doc/ai/nn/transformer/gpt/2/2020-nadeem-figure1-gpt2samplingqualityvsdiversity.png
- https://colab.research.google.com/drive/1VLG8e7YSEwypxU-noRNhsv5dW4NfTGce
- https://www.aiweirdness.com/d-and-d-character-bios-now-making-19-03-15/
- https://www.lesswrong.com/posts/ukTLGe5CQq9w8FMne/inducing-unprompted-misalignment-in-llms
- https://www.reddit.com/r/mlscaling/comments/1d3a793/andrej_karpathy_gpt2_124m_in_llmc_in_90_minutes/
Bibliography
- https://arxiv.org/abs/2405.14838: “From Explicit CoT to Implicit CoT: Learning to Internalize CoT Step by Step”, Deng et al 2024
- https://arxiv.org/abs/2405.15071: “Grokked Transformers Are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization”, Wang et al 2024
- https://lab42.global/community-interview-jack-cole/: “Test-Time Augmentation to Solve ARC”, Cole 2024
- https://arxiv.org/abs/2312.07551: “Language Model Alignment With Elastic Reset”, Noukhovitch et al 2023
- https://www.wired.com/story/what-openai-really-wants/: “What OpenAI Really Wants”, Levy 2023
- https://arxiv.org/abs/2306.17806#eleutherai: “Stay on Topic With Classifier-Free Guidance”, Sanchez et al 2023
- https://arxiv.org/abs/2305.09828: “Mimetic Initialization of Self-Attention Layers”, Trockman & Kolter 2023
- https://arxiv.org/abs/2302.05981: “MarioGPT: Open-Ended Text2Level Generation through Large Language Models”, Sudhakaran et al 2023
- https://arxiv.org/abs/2301.04408: “GPT-3 As Knowledge Worker: A Zero-Shot Evaluation of AI CPA Capabilities”, Bommarito et al 2023
- https://arxiv.org/abs/2210.15097: “Contrastive Decoding: Open-Ended Text Generation As Optimization”, Li et al 2022
- https://arxiv.org/abs/2210.14140: “Contrastive Search Is What You Need For Neural Text Generation”, Su & Collier 2022
- https://arxiv.org/abs/2210.04243: “Fine-Tuning Pre-Trained Transformers into Decaying Fast Weights”, Mao 2022
- https://www.nature.com/articles/s41598-022-20460-9: “Deep Language Algorithms Predict Semantic Comprehension from Brain Activity”, Caucheteux et al 2022
- https://arxiv.org/abs/2205.14135: “FlashAttention: Fast and Memory-Efficient Exact Attention With IO-Awareness”, Dao et al 2022
- https://aclanthology.org/2022.acl-short.43.pdf: “FLOTA: An Embarrassingly Simple Method to Mitigate Und-Es-Ira-Ble Properties of Pretrained Language Model Tokenizers”, Hofmann et al 2022
- 2022-liu-3.pdf: “Quantifying and Alleviating Political Bias in Language Models”, Liu et al 2022c
- https://arxiv.org/abs/2111.09734: “ClipCap: CLIP Prefix for Image Captioning”, Mokady et al 2021
- https://openreview.net/forum?id=gJcEM8sxHK: “Mapping Language Models to Grounded Conceptual Spaces”, Patel & Pavlick 2021
- https://arxiv.org/abs/2109.07958: “TruthfulQA: Measuring How Models Mimic Human Falsehoods”, Lin et al 2021
- https://arxiv.org/abs/2107.01294#allen: “Scarecrow: A Framework for Scrutinizing Machine Text”, Dou et al 2021
- https://arxiv.org/abs/2106.09685#microsoft: “LoRA: Low-Rank Adaptation of Large Language Models”, Hu et al 2021
- https://osf.io/preprints/psyarxiv/m6s28/: “Let the Algorithm Speak: How to Use Neural Networks for Automatic Item Generation in Psychological Scale Development”, Götz et al 2021
- https://arankomatsuzaki.wordpress.com/2021/06/04/gpt-j/: “GPT-J-6B: 6B JAX-Based Transformer”, EleutherAI 2021
- https://arxiv.org/abs/2106.00958#openai: “LHOPT: A Generalizable Approach to Learning Optimizers”, Almeida et al 2021
- https://arxiv.org/abs/2101.00027#eleutherai: “The Pile: An 800GB Dataset of Diverse Text for Language Modeling”, Gao et al 2021
- https://arxiv.org/abs/2101.00190: “Prefix-Tuning: Optimizing Continuous Prompts for Generation”, Li & Liang 2021
- https://aclanthology.org/2021.naacl-main.235.pdf#facebook: “Bot-Adversarial Dialogue for Safe Conversational Agents”, Xu et al 2021
- https://arxiv.org/abs/2003.13624: “TREC CAsT 2019: The Conversational Assistance Track Overview”, Dalton et al 2020
- https://www.newsweek.com/openai-text-generator-gpt-2-video-game-walkthrough-most-tedious-1488334: “OpenAI Text Generator GPT-2 Creates Video Game Walkthrough for ‘Most Tedious Game in History’”, Whalen 2020
- 2020-calderwood.pdf: “How Novelists Use Generative Language Models: An Exploratory User Study”, Calderwood et al 2020
- https://www.uber.com/blog/pplm/: “Controlling Text Generation With Plug and Play Language Models”, Liu et al 2019
- https://play.aidungeon.com/main/home: “AI Dungeon 2”, Walton 2019
- gpt-2-music: “GPT-2 Folk Music”, Gwern & Presser 2019
- https://openai.com/research/fine-tuning-gpt-2: “Fine-Tuning GPT-2 from Human Preferences”, Ziegler et al 2019
- https://arxiv.org/abs/1909.08053#nvidia: “Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism”, Shoeybi et al 2019
- https://minimaxir.com/2019/09/howto-gpt2/: “How To Make Custom AI-Generated Text With GPT-2”, Woolf 2019
- https://medium.com/@vanya_cohen/opengpt-2-we-replicated-gpt-2-because-you-can-too-45e34e6d36dc: “OpenGPT-2: We Replicated GPT-2-1.5b Because You Can Too”, Gokaslan & Cohen 2019
- https://nv-adlr.github.io/MegatronLM: “MegatronLM: Training Billion+ Parameter Language Models Using GPU Model Parallelism”, ADLR 2019
- https://medium.com/@NPCollapse/replicating-gpt2-1-5b-86454a7f26af: “Replicating GPT-2-1.5B”, Leahy 2019
- https://openai.com/index/better-language-models/: “Better Language Models and Their Implications”, Radford et al 2019
- https://sites.research.google/trc/: “TensorFlow Research Cloud (TRC): Accelerate Your Cutting-Edge Machine Learning Research With Free Cloud TPUs”, TRC 2024