‘GPT’ tag

See Also
Gwern
Links
Miscellaneous
Bibliography

See Also

Gwern

“GPT-3 Semantic Derealization”, Gwern 2024

GPT-3 Semantic Derealization

“Research Ideas”, Gwern 2017

Research Ideas

“You Should Write More Online—It’s Still a Good Time”, Gwern 2024

You should write more online—it’s still a good time

“Machine Learning Scaling”, Gwern 2021

Machine Learning Scaling

Links

“Metadata Conditioning Accelerates Language Model Pre-Training”, Gao et al 2025

Metadata Conditioning Accelerates Language Model Pre-training

“How AI Is Unlocking Ancient Texts: From Deciphering Burnt Roman Scrolls to Reading Crumbling Cuneiform Tablets, Neural Networks Could Give Researchers More Data Than They’ve Had in Centuries”, Marchant 2024

How AI is unlocking ancient texts: From deciphering burnt Roman scrolls to reading crumbling cuneiform tablets, neural networks could give researchers more data than they’ve had in centuries

“Continuous Autoregressive Models With Noise Augmentation Avoid Error Accumulation”, Pasini et al 2024

Continuous Autoregressive Models with Noise Augmentation Avoid Error Accumulation

“Do Large Language Models Perform Latent Multi-Hop Reasoning without Exploiting Shortcuts?”, Yang et al 2024

Do Large Language Models Perform Latent Multi-Hop Reasoning without Exploiting Shortcuts?

“2:4 Sparse Llama: Smaller Models for Efficient GPU Inference”, Kurtić et al 2024

2:4 Sparse Llama: Smaller Models for Efficient GPU Inference

“Medical Adaptation of Large Language and Vision-Language Models: Are We Making Progress?”, Jeong et al 2024

Medical Adaptation of Large Language and Vision-Language Models: Are We Making Progress?

“Model Equality Testing: Which Model Is This API Serving?”, Gao et al 2024

Model Equality Testing: Which Model Is This API Serving?

“Centaur: a Foundation Model of Human Cognition”, Binz et al 2024

Centaur: a foundation model of human cognition

“Do LLMs Estimate Uncertainty Well in Instruction-Following?”, Heo et al 2024

Do LLMs estimate uncertainty well in instruction-following?

“Interpretable Contrastive Monte Carlo Tree Search Reasoning”, Gao et al 2024

Interpretable Contrastive Monte Carlo Tree Search Reasoning

“NGPT: Normalized Transformer With Representation Learning on the Hypersphere”, Loshchilov et al 2024

nGPT: Normalized Transformer with Representation Learning on the Hypersphere

“LLM Applications I Want To See”, Constantin 2024

LLM Applications I Want To See⁠:

View HTML:

/doc/www/www.greaterwrong.com/994c2f94d62a984842ed3fa41412926dccca6241.html

“Ensemble Everything Everywhere: Multi-Scale Aggregation for Adversarial Robustness”, Fort & Lakshminarayanan 2024

Ensemble everything everywhere: Multi-scale aggregation for adversarial robustness

“Token Erasure As a Footprint of Implicit Vocabulary Items in LLMs”, Feucht et al 2024

Token Erasure as a Footprint of Implicit Vocabulary Items in LLMs

“Resolving Discrepancies in Compute-Optimal Scaling of Language Models”, Porian et al 2024

Resolving Discrepancies in Compute-Optimal Scaling of Language Models

“When Parts Are Greater Than Sums: Individual LLM Components Can Outperform Full Models”, Chang et al 2024

When Parts are Greater Than Sums: Individual LLM Components Can Outperform Full Models

“Nemotron-4 340B Technical Report”, Adler et al 2024

Nemotron-4 340B Technical Report

“DataComp-LM: In Search of the next Generation of Training Sets for Language Models”, Li et al 2024

DataComp-LM: In search of the next generation of training sets for language models

“How Do Large Language Models Acquire Factual Knowledge During Pretraining?”, Chang et al 2024

How Do Large Language Models Acquire Factual Knowledge During Pretraining?

“Be like a Goldfish, Don’t Memorize! Mitigating Memorization in Generative LLMs”, Hans et al 2024

Be like a Goldfish, Don’t Memorize! Mitigating Memorization in Generative LLMs

“Discovering Preference Optimization Algorithms With and for Large Language Models”, Lu et al 2024

Discovering Preference Optimization Algorithms with and for Large Language Models

“MCTSr: Accessing GPT-4 Level Mathematical Olympiad Solutions via Monte Carlo Tree Self-Refine With LLaMA-3-8B”, Zhang et al 2024

MCTSr: Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMA-3-8B

“For Chinese Students, the New Tactic Against AI Checks: More AI”, Qitong 2024

For Chinese Students, the New Tactic Against AI Checks: More AI

“MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series”, Zhang et al 2024

MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series

“Superposed Decoding: Multiple Generations from a Single Autoregressive Inference Pass”, Shen et al 2024

Superposed Decoding: Multiple Generations from a Single Autoregressive Inference Pass

“Transformers Can Do Arithmetic With the Right Embeddings”, McLeish et al 2024

Transformers Can Do Arithmetic with the Right Embeddings

“SpaceByte: Towards Deleting Tokenization from Large Language Modeling”, Slagle 2024

SpaceByte: Towards Deleting Tokenization from Large Language Modeling

“Towards Smaller, Faster Decoder-Only Transformers: Architectural Variants and Their Implications”, Suresh & P 2024

Towards smaller, faster decoder-only transformers: Architectural variants and their implications

“Design of Highly Functional Genome Editors by Modeling the Universe of CRISPR-Cas Sequences”, Ruffolo et al 2024

Design of highly functional genome editors by modeling the universe of CRISPR-Cas sequences

“From r to Q^✱: Your Language Model Is Secretly a Q-Function”, Rafailov et al 2024

From r to Q^✱: Your Language Model is Secretly a Q-Function

“CATS: Contextually-Aware Thresholding for Sparsity in Large Language Models”, Lee et al 2024

CATS: Contextually-Aware Thresholding for Sparsity in Large Language Models

“CulturalTeaming: AI-Assisted Interactive Red-Teaming for Challenging LLMs’ (Lack Of) Multicultural Knowledge”, Chiu et al 2024

CulturalTeaming: AI-Assisted Interactive Red-Teaming for Challenging LLMs’ (Lack of) Multicultural Knowledge

“Training LLMs over Neurally Compressed Text”, Lester et al 2024

Training LLMs over Neurally Compressed Text

“Reverse Training to Nurse the Reversal Curse”, Golovneva et al 2024

Reverse Training to Nurse the Reversal Curse

“Evolutionary Optimization of Model Merging Recipes”, Akiba et al 2024

Evolutionary Optimization of Model Merging Recipes

“Yi: Open Foundation Models by 01.AI”, Young et al 2024

Yi: Open Foundation Models by 01.AI

“Actions Speak Louder Than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations (HSTU)”, Zhai et al 2024

Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations (HSTU)

“Fast Adversarial Attacks on Language Models In One GPU Minute”, Sadasivan et al 2024

Fast Adversarial Attacks on Language Models In One GPU Minute

“Assisting in Writing Wikipedia-Like Articles From Scratch With Large Language Models”, Shao et al 2024

Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models

“Autonomous Data Selection With Language Models for Mathematical Texts”, Zhang et al 2024

Autonomous Data Selection with Language Models for Mathematical Texts

“Grandmaster-Level Chess Without Search”, Ruoss et al 2024

Grandmaster-Level Chess Without Search

“Neural Networks Learn Statistics of Increasing Complexity”, Belrose et al 2024

Neural Networks Learn Statistics of Increasing Complexity

“Arrows of Time for Large Language Models”, Papadopoulos et al 2024

Arrows of Time for Large Language Models

“SliceGPT: Compress Large Language Models by Deleting Rows and Columns”, Ashkboos et al 2024

SliceGPT: Compress Large Language Models by Deleting Rows and Columns

“Excuse Me, Sir? Your Language Model Is Leaking (information)”, Zamir 2024

Excuse me, sir? Your language model is leaking (information)

“TinyLlama: An Open-Source Small Language Model”, Zhang et al 2024

TinyLlama: An Open-Source Small Language Model

“LLaMA Pro: Progressive LLaMA With Block Expansion”, Wu et al 2024

LLaMA Pro: Progressive LLaMA with Block Expansion

“Generative AI Is Already Widespread in the Public Sector”, Bright et al 2024

Generative AI is already widespread in the public sector

“Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws”, Sardana & Frankle 2023

Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws

“TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones”, Yuan et al 2023

TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones

“Reasons to Reject? Aligning Language Models With Judgments”, Xu et al 2023

Reasons to Reject? Aligning Language Models with Judgments

“Generative Multimodal Models Are In-Context Learners”, Sun et al 2023

Generative Multimodal Models are In-Context Learners

“Frugal LMs Trained to Invoke Symbolic Solvers Achieve Parameter-Efficient Arithmetic Reasoning”, Dutta et al 2023

Frugal LMs Trained to Invoke Symbolic Solvers Achieve Parameter-Efficient Arithmetic Reasoning

“Object Recognition As Next Token Prediction”, Yue et al 2023

Object Recognition as Next Token Prediction

“MEDITRON-70B: Scaling Medical Pretraining for Large Language Models”, Chen et al 2023

MEDITRON-70B: Scaling Medical Pretraining for Large Language Models

“Localizing Lying in Llama: Understanding Instructed Dishonesty on True-False Questions Through Prompting, Probing, and Patching”, Campbell et al 2023

Localizing Lying in Llama: Understanding Instructed Dishonesty on True-False Questions Through Prompting, Probing, and Patching

“OpenAI Researchers Warned Board of AI Breakthrough ahead of CEO Ouster, Sources Say”, Tong et al 2023

OpenAI researchers warned board of AI breakthrough ahead of CEO ouster, sources say

“Positional Description Matters for Transformers Arithmetic”, Shen et al 2023

Positional Description Matters for Transformers Arithmetic

“Watermarks in the Sand: Impossibility of Strong Watermarking for Generative Models”, Zhang et al 2023

Watermarks in the Sand: Impossibility of Strong Watermarking for Generative Models

“Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game”, Toyer et al 2023

Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game

“Learn Your Tokens: Word-Pooled Tokenization for Language Modeling”, Thawani et al 2023

Learn Your Tokens: Word-Pooled Tokenization for Language Modeling

“Llemma: An Open Language Model For Mathematics”, Azerbayev et al 2023

Llemma: An Open Language Model For Mathematics

“In-Context Pretraining (ICP): Language Modeling Beyond Document Boundaries”, Shi et al 2023

In-Context Pretraining (ICP): Language Modeling Beyond Document Boundaries

“OSD: Online Speculative Decoding”, Liu et al 2023

OSD: Online Speculative Decoding

“Let Models Speak Ciphers: Multiagent Debate through Embeddings”, Pham et al 2023

Let Models Speak Ciphers: Multiagent Debate through Embeddings

“OpenWebMath: An Open Dataset of High-Quality Mathematical Web Text”, Paster et al 2023

OpenWebMath: An Open Dataset of High-Quality Mathematical Web Text

“XVal: A Continuous Number Encoding for Large Language Models”, Golkar et al 2023

xVal: A Continuous Number Encoding for Large Language Models

“MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models”, Yu et al 2023

MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models

“Language Modeling Is Compression”, Delétang et al 2023

Language Modeling Is Compression

“Sparse Autoencoders Find Highly Interpretable Features in Language Models”, Cunningham et al 2023

Sparse Autoencoders Find Highly Interpretable Features in Language Models

“Anchor Points: Benchmarking Models With Much Fewer Examples”, Vivek et al 2023

Anchor Points: Benchmarking Models with Much Fewer Examples

“When Less Is More: Investigating Data Pruning for Pretraining LLMs at Scale”, Marion et al 2023

When Less is More: Investigating Data Pruning for Pretraining LLMs at Scale

“Language Reward Modulation for Pretraining Reinforcement Learning”, Adeniji et al 2023

Language Reward Modulation for Pretraining Reinforcement Learning

“ReST: Reinforced Self-Training (ReST) for Language Modeling”, Gulcehre et al 2023

ReST: Reinforced Self-Training (ReST) for Language Modeling

“Studying Large Language Model Generalization With Influence Functions”, Grosse et al 2023

Studying Large Language Model Generalization with Influence Functions

“Multimodal Neurons in Pretrained Text-Only Transformers”, Schwettmann et al 2023

Multimodal Neurons in Pretrained Text-Only Transformers

“Skill-It! A Data-Driven Skills Framework for Understanding and Training Language Models”, Chen et al 2023

Skill-it! A Data-Driven Skills Framework for Understanding and Training Language Models

“Length Generalization in Arithmetic Transformers”, Jelassi et al 2023

Length Generalization in Arithmetic Transformers

“Are Aligned Neural Networks Adversarially Aligned?”, Carlini et al 2023

Are aligned neural networks adversarially aligned?

“Improving Long-Horizon Imitation Through Instruction Prediction”, Hejna et al 2023

Improving Long-Horizon Imitation Through Instruction Prediction

“Large Language Models Sometimes Generate Purely Negatively-Reinforced Text”, Roger 2023

Large Language Models Sometimes Generate Purely Negatively-Reinforced Text

“SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression”, Dettmers et al 2023

SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression

“Undetectable Watermarks for Language Models”, Christ et al 2023

Undetectable Watermarks for Language Models

“Improving Language Models With Advantage-Based Offline Policy Gradients”, Baheti et al 2023

Improving Language Models with Advantage-based Offline Policy Gradients

“Reasoning With Language Model Is Planning With World Model”, Hao et al 2023

Reasoning with Language Model is Planning with World Model

“Accelerating Transformer Inference for Translation via Parallel Decoding”, Santilli et al 2023

Accelerating Transformer Inference for Translation via Parallel Decoding

“DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining”, Xie et al 2023

DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining

“Memorization for Good: Encryption With Autoregressive Language Models”, Stevens & Su 2023

Memorization for Good: Encryption with Autoregressive Language Models

“MEGABYTE: Predicting Million-Byte Sequences With Multiscale Transformers”, Yu et al 2023

MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers

“Finding Neurons in a Haystack: Case Studies With Sparse Probing”, Gurnee et al 2023

Finding Neurons in a Haystack: Case Studies with Sparse Probing

“Inflection AI, Startup From Ex-DeepMind Leaders, Launches Pi—A Chattier Chatbot”, Konrad 2023

Inflection AI, Startup From Ex-DeepMind Leaders, Launches Pi—A Chattier Chatbot

“Emergent and Predictable Memorization in Large Language Models”, Biderman et al 2023

Emergent and Predictable Memorization in Large Language Models

“A Comparative Study between Full-Parameter and LoRA-Based Fine-Tuning on Chinese Instruction Data for Instruction Following Large Language Model”, Sun et al 2023

A Comparative Study between Full-Parameter and LoRA-based Fine-Tuning on Chinese Instruction Data for Instruction Following Large Language Model

“Shall We Pretrain Autoregressive Language Models With Retrieval? A Comprehensive Study”, Wang et al 2023

Shall We Pretrain Autoregressive Language Models with Retrieval? A Comprehensive Study

“How Large-Language Models Can Revolutionize Military Planning”, Jensen & Tadross 2023

How Large-Language Models Can Revolutionize Military Planning

“Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling”, Biderman et al 2023

Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling

“8 Things to Know about Large Language Models”, Bowman 2023

8 Things to Know about Large Language Models

“BloombergGPT: A Large Language Model for Finance”, Wu et al 2023

BloombergGPT: A Large Language Model for Finance

“The Quantization Model of Neural Scaling”, Michaud et al 2023

The Quantization Model of Neural Scaling

“Int-4 LLaMa Is Not Enough—Int-3 and Beyond: More Compression, Easier to Build Apps on LLMs That Run Locally”, nolano.org 2023

Int-4 LLaMa is not enough—Int-3 and beyond: More compression, easier to build apps on LLMs that run locally

“Consistency Analysis of ChatGPT”, Jang & Lukasiewicz 2023

Consistency Analysis of ChatGPT

“Rewarding Chatbots for Real-World Engagement With Millions of Users”, Irvine et al 2023

Rewarding Chatbots for Real-World Engagement with Millions of Users

“Beyond the Pass Mark: the Accuracy of ChatGPT and Bing in the National Medical Licensure Examination in Japan”, Kataoka 2023

Beyond the Pass Mark: the Accuracy of ChatGPT and Bing in the National Medical Licensure Examination in Japan

“SpikeGPT: Generative Pre-Trained Language Model With Spiking Neural Networks”, Zhu et al 2023

SpikeGPT: Generative Pre-trained Language Model with Spiking Neural Networks

“A Prompt Pattern Catalog to Enhance Prompt Engineering With ChatGPT”, White et al 2023

A Prompt Pattern Catalog to Enhance Prompt Engineering with ChatGPT

“BiLD: Big Little Transformer Decoder”, Kim et al 2023

BiLD: Big Little Transformer Decoder

“Data Selection for Language Models via Importance Resampling”, Xie et al 2023

Data Selection for Language Models via Importance Resampling

“In-Context Retrieval-Augmented Language Models”, Ram et al 2023

In-Context Retrieval-Augmented Language Models

“Crawling the Internal Knowledge-Base of Language Models”, Cohen et al 2023

Crawling the Internal Knowledge-Base of Language Models

“Big Tech Was Moving Cautiously on AI. Then Came ChatGPT. Google, Facebook and Microsoft Helped Build the Scaffolding of AI. Smaller Companies Are Taking It to the Masses, Forcing Big Tech to React”, Tiku et al 2023

Big Tech was moving cautiously on AI. Then came ChatGPT. Google, Facebook and Microsoft helped build the scaffolding of AI. Smaller companies are taking it to the masses, forcing Big Tech to react

“Rock Guitar Tablature Generation via Natural Language Processing”, Casco-Rodriguez 2023

Rock Guitar Tablature Generation via Natural Language Processing

“InPars-Light: Cost-Effective Unsupervised Training of Efficient Rankers”, Boytsov et al 2023

InPars-Light: Cost-Effective Unsupervised Training of Efficient Rankers

“A New Chat Bot Is a ‘Code Red’ for Google’s Search Business: A New Wave of Chat Bots like ChatGPT Use Artificial Intelligence That Could Reinvent or Even Replace the Traditional Internet Search Engine”, Grant & Metz 2022

A New Chat Bot Is a ‘Code Red’ for Google’s Search Business: A new wave of chat bots like ChatGPT use artificial intelligence that could reinvent or even replace the traditional internet search engine

“Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent As Meta-Optimizers”, Dai et al 2022

Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta-Optimizers

“Rethinking the Role of Scale for In-Context Learning: An Interpretability-Based Case Study at 66 Billion Scale”, Bansal et al 2022

Rethinking the Role of Scale for In-Context Learning: An Interpretability-based Case Study at 66 Billion Scale

“Interpreting Neural Networks through the Polytope Lens”, Black et al 2022

Interpreting Neural Networks through the Polytope Lens

“SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models”, Xiao et al 2022

SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models

“InstructPix2Pix: Learning to Follow Image Editing Instructions”, Brooks et al 2022

InstructPix2Pix: Learning to Follow Image Editing Instructions

“Galactica: A Large Language Model for Science”, Taylor et al 2022

Galactica: A Large Language Model for Science

“Large Language Models Struggle to Learn Long-Tail Knowledge”, Kandpal et al 2022

Large Language Models Struggle to Learn Long-Tail Knowledge

“The CRINGE Loss: Learning What Language Not to Model”, Adolphs et al 2022

The CRINGE Loss: Learning what language not to model

“Mysteries of Mode Collapse § Inescapable Wedding Parties”, Janus 2022

Mysteries of mode collapse § Inescapable wedding parties

“GPTQ: Accurate Post-Training Quantization for Generative Pre-Trained Transformers”, Frantar et al 2022

GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers

“What Is My Math Transformer Doing? – 3 Results on Interpretability and Generalization”, Charton 2022

What is my math transformer doing? – 3 results on interpretability and generalization

“When Life Gives You Lemons, Make Cherryade: Converting Feedback from Bad Responses into Good Labels”, Shi et al 2022

When Life Gives You Lemons, Make Cherryade: Converting Feedback from Bad Responses into Good Labels

“Can Language Models Handle Recursively Nested Grammatical Structures? A Case Study on Comparing Models and Humans”, Lampinen 2022

Can language models handle recursively nested grammatical structures? A case study on comparing models and humans

“Evaluating Parameter Efficient Learning for Generation”, Xu et al 2022

Evaluating Parameter Efficient Learning for Generation

“BioGPT: Generative Pre-Trained Transformer for Biomedical Text Generation and Mining”, Luo et al 2022

BioGPT: Generative Pre-trained Transformer for Biomedical Text Generation and Mining

“Arithmetic Sampling: Parallel Diverse Decoding for Large Language Models”, Vilnis et al 2022

Arithmetic Sampling: Parallel Diverse Decoding for Large Language Models

“MTEB: Massive Text Embedding Benchmark”, Muennighoff et al 2022

MTEB: Massive Text Embedding Benchmark

“Foundation Transformers”, Wang et al 2022

Foundation Transformers

“Ask Me Anything (AMA): A Simple Strategy for Prompting Language Models”, Arora et al 2022

Ask Me Anything (AMA): A simple strategy for prompting language models

“Is Reinforcement Learning (Not) for Natural Language Processing: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization”, Ramamurthy et al 2022

Is Reinforcement Learning (Not) for Natural Language Processing: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization

“Sparrow: Improving Alignment of Dialogue Agents via Targeted Human Judgements”, Glaese et al 2022

Sparrow: Improving alignment of dialogue agents via targeted human judgements

“Generate rather than Retrieve (GenRead): Large Language Models Are Strong Context Generators”, Yu et al 2022

Generate rather than Retrieve (GenRead): Large Language Models are Strong Context Generators

“FP8 Formats for Deep Learning”, Micikevicius et al 2022

FP8 Formats for Deep Learning

“Petals: Collaborative Inference and Fine-Tuning of Large Models”, Borzunov et al 2022

Petals: Collaborative Inference and Fine-tuning of Large Models

“`LLM.int8()`: 8-Bit Matrix Multiplication for Transformers at Scale”, Dettmers et al 2022

LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale

“Meaning without Reference in Large Language Models”, Piantadosi & Hill 2022

Meaning without reference in large language models

“Effidit: Your AI Writing Assistant”, Shi et al 2022

Effidit: Your AI Writing Assistant

“Language Models Show Human-Like Content Effects on Reasoning”, Dasgupta et al 2022

Language models show human-like content effects on reasoning

“LM-Nav: Robotic Navigation With Large Pre-Trained Models of Language, Vision, and Action”, Shah et al 2022

LM-Nav: Robotic Navigation with Large Pre-Trained Models of Language, Vision, and Action

“Can Foundation Models Talk Causality?”, Willig et al 2022

Can Foundation Models Talk Causality?

“NOAH: Neural Prompt Search”, Zhang et al 2022

NOAH: Neural Prompt Search

“ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers”, Yao et al 2022

ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers

“Quark: Controllable Text Generation With Reinforced Unlearning”, Lu et al 2022

Quark: Controllable Text Generation with Reinforced Unlearning

“RankGen: Improving Text Generation With Large Ranking Models”, Krishna et al 2022

RankGen: Improving Text Generation with Large Ranking Models

“Opal: Multimodal Image Generation for News Illustration”, Liu et al 2022

Opal: Multimodal Image Generation for News Illustration

“What Language Model to Train If You Have One Million GPU Hours?”, Scao et al 2022

What Language Model to Train if You Have One Million GPU Hours?

“WAVPROMPT: Towards Few-Shot Spoken Language Understanding With Frozen Language Models”, Gao et al 2022

WAVPROMPT: Towards Few-Shot Spoken Language Understanding with Frozen Language Models

“Shared Computational Principles for Language Processing in Humans and Deep Language Models”, Goldstein et al 2022

Shared computational principles for language processing in humans and deep language models

“Vector-Quantized Image Modeling With Improved VQGAN”, Yu et al 2022

Vector-quantized Image Modeling with Improved VQGAN

“Brains and Algorithms Partially Converge in Natural Language Processing”, Caucheteux & King 2022

Brains and algorithms partially converge in natural language processing

“Quantifying Memorization Across Neural Language Models”, Carlini et al 2022

Quantifying Memorization Across Neural Language Models

“A Contrastive Framework for Neural Text Generation”, Su et al 2022

A Contrastive Framework for Neural Text Generation

“AdaPrompt: Adaptive Model Training for Prompt-Based NLP”, Chen et al 2022

AdaPrompt: Adaptive Model Training for Prompt-based NLP

“InPars: Data Augmentation for Information Retrieval Using Large Language Models”, Bonifacio et al 2022

InPars: Data Augmentation for Information Retrieval using Large Language Models

“ROME: Locating and Editing Factual Associations in GPT”, Meng et al 2022

ROME: Locating and Editing Factual Associations in GPT

“Cedille: A Large Autoregressive French Language Model”, Müller & Laurent 2022

Cedille: A large autoregressive French language model

“Data Scaling Laws in NMT: The Effect of Noise and Architecture”, Bansal et al 2022

Data Scaling Laws in NMT: The Effect of Noise and Architecture

“PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts”, Bach et al 2022

PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts

“Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model”, Smith et al 2022

Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model

“Language Models As Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents”, Huang et al 2022

Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents

“WANLI: Worker and AI Collaboration for Natural Language Inference Dataset Creation”, Liu et al 2022

WANLI: Worker and AI Collaboration for Natural Language Inference Dataset Creation

“A Survey of Controllable Text Generation Using Transformer-Based Pre-Trained Language Models”, Zhang et al 2022

A Survey of Controllable Text Generation using Transformer-based Pre-trained Language Models

“The Defeat of the Winograd Schema Challenge”, Kocijan et al 2022

The Defeat of the Winograd Schema Challenge

“Learning To Retrieve Prompts for In-Context Learning”, Rubin et al 2021

Learning To Retrieve Prompts for In-Context Learning

“Learning to Prompt for Continual Learning”, Wang et al 2021

Learning to Prompt for Continual Learning

“Amortized Noisy Channel Neural Machine Translation”, Pang et al 2021

Amortized Noisy Channel Neural Machine Translation

“Few-Shot Instruction Prompts for Pretrained Language Models to Detect Social Biases”, Prabhumoye et al 2021

Few-shot Instruction Prompts for Pretrained Language Models to Detect Social Biases

“PROMPT WAYWARDNESS: The Curious Case of Discretized Interpretation of Continuous Prompts”, Khashabi et al 2021

PROMPT WAYWARDNESS: The Curious Case of Discretized Interpretation of Continuous Prompts

“LMTurk: Few-Shot Learners As Crowdsourcing Workers”, Zhao et al 2021

LMTurk: Few-Shot Learners as Crowdsourcing Workers

“Improving Language Models by Retrieving from Trillions of Tokens”, Borgeaud et al 2021

Improving language models by retrieving from trillions of tokens

“Linear Algebra With Transformers”, Charton 2021

Linear algebra with transformers

“Zero-Shot Image-To-Text Generation for Visual-Semantic Arithmetic”, Tewel et al 2021

Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic

“Long-Range and Hierarchical Language Predictions in Brains and Algorithms”, Caucheteux et al 2021

Long-range and hierarchical language predictions in brains and algorithms

“True Few-Shot Learning With Prompts—A Real-World Perspective”, Schick & Schütze 2021

True Few-Shot Learning with Prompts—A Real-World Perspective

“Few-Shot Named Entity Recognition With Cloze Questions”, Gatta et al 2021

Few-shot Named Entity Recognition with Cloze Questions

“Evaluating Distributional Distortion in Neural Language Modeling”, Anonymous 2021

Evaluating Distributional Distortion in Neural Language Modeling

“On Transferability of Prompt Tuning for Natural Language Understanding”, Su et al 2021

On Transferability of Prompt Tuning for Natural Language Understanding

“CLUES: Few-Shot Learning Evaluation in Natural Language Understanding”, Mukherjee et al 2021

CLUES: Few-Shot Learning Evaluation in Natural Language Understanding

“Recent Advances in Natural Language Processing via Large Pre-Trained Language Models: A Survey”, Min et al 2021

Recent Advances in Natural Language Processing via Large Pre-Trained Language Models: A Survey

“Fast Model Editing at Scale”, Mitchell et al 2021

Fast Model Editing at Scale

“Yuan 1.0: Large-Scale Pre-Trained Language Model in Zero-Shot and Few-Shot Learning”, Wu et al 2021

Yuan 1.0: Large-Scale Pre-trained Language Model in Zero-Shot and Few-Shot Learning

“Towards a Unified View of Parameter-Efficient Transfer Learning”, He et al 2021

Towards a Unified View of Parameter-Efficient Transfer Learning

“A Few More Examples May Be Worth Billions of Parameters”, Kirstain et al 2021

A Few More Examples May Be Worth Billions of Parameters

“Scaling Laws for Neural Machine Translation”, Ghorbani et al 2021

Scaling Laws for Neural Machine Translation

“Can Language Models Encode Perceptual Structure Without Grounding? A Case Study in Color”, Abdou et al 2021

Can Language Models Encode Perceptual Structure Without Grounding? A Case Study in Color

“What Changes Can Large-Scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-Scale Korean Generative Pretrained Transformers”, Kim et al 2021

What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers

“Medically Aware GPT-3 As a Data Generator for Medical Dialogue Summarization”, Chintagunta et al 2021

Medically Aware GPT-3 as a Data Generator for Medical Dialogue Summarization

“General-Purpose Question-Answering With Macaw”, Tafjord & Clark 2021

General-Purpose Question-Answering with Macaw

“An Empirical Exploration in Quality Filtering of Text Data”, Gao 2021

An Empirical Exploration in Quality Filtering of Text Data

“Want To Reduce Labeling Cost? GPT-3 Can Help”, Wang et al 2021

Want To Reduce Labeling Cost? GPT-3 Can Help

“Multimodal Few-Shot Learning With Frozen Language Models”, Tsimpoukelli et al 2021

Multimodal Few-Shot Learning with Frozen Language Models

“Cutting Down on Prompts and Parameters: Simple Few-Shot Learning With Language Models”, IV et al 2021

Cutting Down on Prompts and Parameters: Simple Few-Shot Learning with Language Models

“RASP: Thinking Like Transformers”, Weiss et al 2021

RASP: Thinking Like Transformers

“ByT5: Towards a Token-Free Future With Pre-Trained Byte-To-Byte Models”, Xue et al 2021

ByT5: Towards a token-free future with pre-trained byte-to-byte models

“Anthropic Raises $124 Million to Build More Reliable, General AI Systems”, Anthropic 2021

Anthropic raises $124 million to build more reliable, general AI systems

“Naver Unveils First ‘Hyperscale’ AI Platform”, Jae-eun 2021

Naver unveils first ‘hyperscale’ AI platform

“Scaling Laws for Language Transfer Learning”, Kim 2021

Scaling Laws for Language Transfer Learning

“GPT Understands, Too”, Liu et al 2021

GPT Understands, Too

“How Many Data Points Is a Prompt Worth?”, Scao & Rush 2021

How Many Data Points is a Prompt Worth?

“Pretrained Transformers As Universal Computation Engines”, Lu et al 2021

Pretrained Transformers as Universal Computation Engines

“Language Models Have a Moral Dimension”, Schramowski et al 2021

Language Models have a Moral Dimension

“Learning Chess Blindfolded: Evaluating Language Models on State Tracking”, Toshniwal et al 2021

Learning Chess Blindfolded: Evaluating Language Models on State Tracking

“Investigating the Limitations of the Transformers With Simple Arithmetic Tasks”, Nogueira et al 2021

Investigating the Limitations of the Transformers with Simple Arithmetic Tasks

“Proof Artifact Co-Training for Theorem Proving With Language Models”, Han et al 2021

Proof Artifact Co-training for Theorem Proving with Language Models

“Clinical Outcome Prediction from Admission Notes Using Self-Supervised Knowledge Integration”, Aken et al 2021

Clinical Outcome Prediction from Admission Notes using Self-Supervised Knowledge Integration

“Scaling Laws for Transfer”, Hernandez et al 2021

Scaling Laws for Transfer

“MAUVE: Measuring the Gap Between Neural Text and Human Text Using Divergence Frontiers”, Pillutla et al 2021

MAUVE: Measuring the Gap Between Neural Text and Human Text using Divergence Frontiers

“Apparently ‘What Ho’ Is a Corruption Of…”, Marguerite 2021

Apparently ‘what ho’ is a corruption of…

“Making Pre-Trained Language Models Better Few-Shot Learners”, Gao et al 2020

Making Pre-trained Language Models Better Few-shot Learners

“Thinking Ahead: Prediction in Context As a Keystone of Language in Humans and Machines”, Goldstein et al 2020

Thinking ahead: prediction in context as a keystone of language in humans and machines

“CPM: A Large-Scale Generative Chinese Pre-Trained Language Model”, Zhang et al 2020

CPM: A Large-scale Generative Chinese Pre-trained Language Model

“L2L: Training Large Neural Networks With Constant Memory Using a New Execution Algorithm”, Pudipeddi et al 2020

L2L: Training Large Neural Networks with Constant Memory using a New Execution Algorithm

“Summarize, Outline, and Elaborate: Long-Text Generation via Hierarchical Supervision from Extractive Summaries”, Sun et al 2020

Summarize, Outline, and Elaborate: Long-Text Generation via Hierarchical Supervision from Extractive Summaries

“The Neural Architecture of Language: Integrative Reverse-Engineering Converges on a Model for Predictive Processing”, Schrimpf et al 2020

The neural architecture of language: Integrative reverse-engineering converges on a model for predictive processing

“RoFT: A Tool for Evaluating Human Detection of Machine-Generated Text”, Dugan et al 2020

RoFT: A Tool for Evaluating Human Detection of Machine-Generated Text

“A Systematic Characterization of Sampling Algorithms for Open-Ended Language Generation”, Nadeem et al 2020

A Systematic Characterization of Sampling Algorithms for Open-ended Language Generation

“Generative Language Modeling for Automated Theorem Proving”, Polu & Sutskever 2020

Generative Language Modeling for Automated Theorem Proving

“Learning to Summarize from Human Feedback”, Stiennon et al 2020

Learning to summarize from human feedback

“ETHICS: Aligning AI With Shared Human Values”, Hendrycks et al 2020

ETHICS: Aligning AI With Shared Human Values

“Mirostat: A Neural Text Decoding Algorithm That Directly Controls Perplexity”, Basu et al 2020

Mirostat: A Neural Text Decoding Algorithm that Directly Controls Perplexity

“Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data”, Bender & Koller 2020

Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data

“Transformers Are RNNs: Fast Autoregressive Transformers With Linear Attention”, Katharopoulos et al 2020

Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention

“OpenAI API Beta Homepage”, OpenAI 2020

OpenAI API Beta homepage

“Trading Off Diversity and Quality in Natural Language Generation”, Zhang et al 2020

Trading Off Diversity and Quality in Natural Language Generation

“Scaling Laws from the Data Manifold Dimension”, Sharma & Kaplan 2020

Scaling Laws from the Data Manifold Dimension

“Unigram LM: Byte Pair Encoding Is Suboptimal for Language Model Pretraining”, Bostrom & Durrett 2020

Unigram LM: Byte Pair Encoding is Suboptimal for Language Model Pretraining

“Direct Fit to Nature: An Evolutionary Perspective on Biological and Artificial Neural Networks”, Hasson et al 2020

Direct Fit to Nature: An Evolutionary Perspective on Biological and Artificial Neural Networks

“Pop Music Transformer: Beat-Based Modeling and Generation of Expressive Pop Piano Compositions”, Huang & Yang 2020

Pop Music Transformer: Beat-based Modeling and Generation of Expressive Pop Piano Compositions

“Scaling Laws for Neural Language Models”, Kaplan et al 2020

Scaling Laws for Neural Language Models

“Reformer: The Efficient Transformer”, Kitaev et al 2020

Reformer: The Efficient Transformer

“What Does BERT Dream Of? A Visual Investigation of Nightmares in Sesame Street”, Bäuerle & Wexler 2020

What does BERT dream of? A visual investigation of nightmares in Sesame Street

“Generative Language Modeling for Automated Theorem Proving § Experiments”, Polu & Sutskever 2020 (page 11 org openai)

Generative Language Modeling for Automated Theorem Proving § Experiments

“Plug and Play Language Models: A Simple Approach to Controlled Text Generation”, Dathathri et al 2019

Plug and Play Language Models: A Simple Approach to Controlled Text Generation

“How Can We Know What Language Models Know?”, Jiang et al 2019

How Can We Know What Language Models Know?

“CommonGen: A Constrained Text Generation Challenge for Generative Commonsense Reasoning”, Lin et al 2019

CommonGen: A Constrained Text Generation Challenge for Generative Commonsense Reasoning

“Generalization through Memorization: Nearest Neighbor Language Models”, Khandelwal et al 2019

Generalization through Memorization: Nearest Neighbor Language Models

“DialoGPT: Large-Scale Generative Pre-Training for Conversational Response Generation”, Zhang et al 2019

DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation

“CTRL: A Conditional Transformer Language Model For Controllable Generation”, Keskar et al 2019

CTRL: A Conditional Transformer Language Model For Controllable Generation

“Smaller, Faster, Cheaper, Lighter: Introducing DistilGPT, a Distilled Version of GPT”, Sanh 2019

Smaller, faster, cheaper, lighter: Introducing DistilGPT, a distilled version of GPT

“Language Modeling State-Of-The-Art Leaderboards”, paperswithcode.com 2019

Language Modeling State-of-the-art leaderboards

“Neural Text Generation With Unlikelihood Training”, Welleck et al 2019

Neural Text Generation with Unlikelihood Training

“GROVER: Defending Against Neural Fake News”, Zellers et al 2019

GROVER: Defending Against Neural Fake News

“Generative Modeling With Sparse Transformers: We’ve Developed the Sparse Transformer, a Deep Neural Network Which Sets New Records at Predicting What Comes next in a Sequence—Whether Text, Images, or Sound. It Uses an Algorithmic Improvement of the attention Mechanism to Extract Patterns from Sequences 30× Longer Than Possible Previously”, Child & Gray 2019

Generative Modeling with Sparse Transformers: We’ve developed the Sparse Transformer, a deep neural network which sets new records at predicting what comes next in a sequence—whether text, images, or sound. It uses an algorithmic improvement of the attention mechanism to extract patterns from sequences 30× longer than possible previously

“The Curious Case of Neural Text Degeneration”, Holtzman et al 2019

The Curious Case of Neural Text Degeneration

“Smart Vet: Autocompleting Sentences in Veterinary Medical Records”, Ginn 2019

Smart Vet: Autocompleting Sentences in Veterinary Medical Records

“Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context”, Dai et al 2019

Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

“Music Transformer: Generating Music With Long-Term Structure”, Huang et al 2018

Music Transformer: Generating Music with Long-Term Structure

“Universal Transformers”, Dehghani et al 2018

Universal Transformers

“Adversarial Reprogramming of Neural Networks”, Elsayed et al 2018

Adversarial Reprogramming of Neural Networks

“GPT-1: Improving Language Understanding With Unsupervised Learning”, OpenAI 2018

GPT-1: Improving Language Understanding with Unsupervised Learning

“GPT-1: Improving Language Understanding by Generative Pre-Training”, Radford et al 2018

GPT-1: Improving Language Understanding by Generative Pre-Training

“GPT-1: Improving Language Understanding by Generative Pre-Training § Model Specifications”, Radford et al 2018 (page 5)

GPT-1: Improving Language Understanding by Generative Pre-Training § Model specifications

“Deep Reinforcement Learning from Human Preferences § Appendix A.2: Atari”, Christiano et al 2017 (page 15 org openai)

Deep reinforcement learning from human preferences § Appendix A.2: Atari

“Learning to Generate Reviews and Discovering Sentiment”, Radford et al 2017

Learning to Generate Reviews and Discovering Sentiment

“Design a Role-Playing Game Using 200 Words or Less.”

Design a role-playing game using 200 words or less.⁠:

View HTML:

/doc/www/200wordrpg.github.io/68dc29784a54b4b94a1215b358244b267755a4e1.html

“How Does In-Context Learning Work? A Framework for Understanding the Differences from Traditional Supervised Learning”

How does in-context learning work? A framework for understanding the differences from traditional supervised learning⁠:

View HTML:

/doc/www/ai.stanford.edu/bdf17c80e1ed5dc516811f03acef03415b220143.html

“AI Dungeon: Dragon Model Upgrade—You Can Now Play AI Dungeon With One of the Most Powerful AI Models in the World.”

AI Dungeon: Dragon Model Upgrade—You can now play AI Dungeon with one of the most powerful AI models in the world.⁠:

View HTML:

/doc/www/aidungeon.medium.com/52d2fe5633e74d1f355221dba088b17ff34db79d.html

“Introducing AI Dungeon Translate: AI Dungeon Players Can Now Translate Their Stories into Emojis by Just Clicking a Button. [ 🤔 💯 🤷‍♂️ 🤔 🤔 🤔 💯]”

Introducing AI Dungeon Translate: AI Dungeon players can now translate their stories into emojis by just clicking a button. [ 🤔 💯 🤷‍♂️ 🤔 🤔 🤔 💯]

“OpenAI API Alchemy: Emoji Storytelling 🤖”

OpenAI API Alchemy: Emoji storytelling 🤖⁠:

View External Link:

https://andrewmayne.com/2020/06/24/open-ai-alchemy-emoji-storytelling/

“Llama-3.1-405B Now Runs at 969 Tokens/s on Cerebras Inference”

Llama-3.1-405B now runs at 969 tokens/s on Cerebras Inference

“I Blew $720 on 100 Notebooks from Alibaba and Started a Paper Website Business”

I blew $720 on 100 notebooks from Alibaba and started a Paper Website business⁠:

View HTML:

/doc/www/daily.tinyprojects.dev/b295888386dd5c1f2e2a679fd7b84432811d3917.html

“AlphaStar: Mastering the Real-Time Strategy Game StarCraft II”

AlphaStar: Mastering the Real-Time Strategy Game StarCraft II

“Transformers As Variational Autoencoders”

Transformers as Variational Autoencoders

“BlinkDL/RWKV-LM: RWKV Is an RNN With Transformer-Level LLM Performance. It Can Be Directly Trained like a GPT (parallelizable). So It’s Combining the Best of RNN and Transformer—Great Performance, Fast Inference, Saves VRAM, Fast Training, "Infinite" Ctx_len, and Free Sentence Embedding.”

BlinkDL/RWKV-LM: RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it’s combining the best of RNN and transformer—great performance, fast inference, saves VRAM, fast training, "infinite" ctx_len, and free sentence embedding.

“Aidan Bench Attempts to Measure ‘Big Model Smell’ in LLMs”

Aidan Bench attempts to measure ‘big model smell’ in LLMs⁠:

View HTML:

/doc/www/github.com/7b52f6865d70bf69d20923e878fa8ec94c95f332.html

“Efficient, Reusable RNNs and LSTMs for Torch”

Efficient, reusable RNNs and LSTMs for torch

“Updated Training?”

Updated training?

“Karpathy/minGPT: A Minimal PyTorch Re-Implementation of the OpenAI GPT (Generative Pretrained Transformer) Training”

karpathy/minGPT: A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training

“Minimaxir/textgenrnn: Easily Train Your Own Text-Generating Neural Network of Any Size and Complexity on Any Text Dataset With a Few Lines of Code.”

minimaxir/textgenrnn: Easily train your own text-generating neural network of any size and complexity on any text dataset with a few lines of code.

“Loom: Multiversal Tree Writing Interface for Human-AI Collaboration”, Janus 2025

Loom: Multiversal tree writing interface for human-AI collaboration⁠:

View HTML:

/doc/www/github.com/ab1b1b61962d42831ad82c1ecaab2a7d3aef8423.html

“Zphang/minimal-Opt”

zphang/minimal-opt

“Math: OpenAI API Can Do Some Math out of the Gate, but Most Math It Seems It Has to Learn. Many Times, the Numbers That It Spits out Are Just Random. However, including Different Priming Prompts Can Result in Decent Results.”

Math: OpenAI API can do some math out of the gate, but most math it seems it has to learn. Many times, the numbers that it spits out are just random. However, including different priming prompts can result in decent results.

“Deep Learning for Assisting the Process of Music Composition (part 3)”

Deep learning for assisting the process of music composition (part 3)⁠:

View External Link:

https://highnoongmt.wordpress.com/2015/08/13/deep-learning-for-assisting-the-process-of-music-composition-part-3/

“Google DeepMind’s Grandmaster-Level Chess Without Search”

Google DeepMind’s Grandmaster-Level Chess Without Search

“The Technology Behind BLOOM Training”

The Technology Behind BLOOM Training

“Psych-101 Dataset [For Centaur]”

Psych-101 dataset [for Centaur]

The Gostak

The Gostak

“Imprompter”

“Your Next New Best Friend Might Be a Robot”

Your Next New Best Friend Might Be a Robot⁠:

View External Link:

https://nautil.us/your-next-new-best-friend-might-be-a-robot-235779/

“I Made a Custom Gpt That Incorporates Advertisement/product Placement With Its...”

I made a custom gpt that incorporates advertisement/product placement with its...

“The Annotated Transformer”

The Annotated Transformer

“Homepage of Paul F. Christiano”, Christiano 2025

Homepage of Paul F. Christiano

“Data Exfiltration from Slack AI via Indirect Prompt Injection”, PromptArmor 2025

Data Exfiltration from Slack AI via indirect prompt injection

“Introductory Antimemetics (abandoned First Draft)”, Hughes 2025

Introductory Antimemetics (abandoned first draft)

“Jared Kaplan”

Jared Kaplan

“Meditations on Moloch”

Meditations on Moloch

“Stream Seaandsailor”

Stream seaandsailor

“Humans Who Are Not Concentrating Are Not General Intelligences”

Humans Who Are Not Concentrating Are Not General Intelligences

“Monitor: An AI-Driven Observability Interface”

Monitor: An AI-Driven Observability Interface

“This Is the OpenAI API. It Makes Spookily Good Twitter Bots. 13⁄10 Would Retweet”

This is the OpenAI API. It makes spookily good twitter bots. 13⁄10 would retweet⁠:

View HTML:

/doc/www/www.aiweirdness.com/22435625719a0806d8474097cd0740ce131d6684.html

“AMA Conjecture, A New Alignment Startup”

AMA Conjecture, A New Alignment Startup⁠:

View External Link:

https://www.alignmentforum.org/posts/rtEtTybuCcDWLk7N9/ama-conjecture-a-new-alignment-startup

“WikiCrow”

WikiCrow⁠:

View HTML:

/doc/www/www.futurehouse.org/1754102bd82fb703b3c17ea47232c04c76f7452e.html

“ChatGPT As Muse, Not Oracle”, Litt 2025

ChatGPT as muse, not oracle

“Interpreting GPT: the Logit Lens”

interpreting GPT: the logit lens⁠:

View External Link:

https://www.lesswrong.com/posts/AcKRB8wDpdaN6v6ru/interpreting-gpt-the-logit-lens

“Assessing AlephAlpha’s Multimodal Model”

Assessing AlephAlpha’s Multimodal Model⁠:

View External Link:

https://www.lesswrong.com/posts/EzuBSASuui5qekhLA/assessing-alephalphas-multimodal-model

“Is GPT-3 a Good Rationalist?”

Is GPT-3 a Good Rationalist?⁠:

View External Link:

https://www.lesswrong.com/posts/a3FuA7fGgpTQ7mX3W/is-gpt3-a-good-rationalist-instructgpt3-2-2

“We Are Conjecture, A New Alignment Research Startup”

We Are Conjecture, A New Alignment Research Startup⁠:

View External Link:

https://www.lesswrong.com/posts/jfq2BH5kfQqu2vYv3/we-are-conjecture-a-new-alignment-research-startup

“Investigating Causal Understanding in LLMs”

Investigating causal understanding in LLMs⁠:

View External Link:

https://www.lesswrong.com/posts/yZb5eFvDoaqB337X5/investigating-causal-understanding-in-llms

“A One-Question Turing Test for GPT-3”

A one-question Turing test for GPT-3⁠:

View External Link:

https://www.lesswrong.com/posts/ydeaHqDPJ5REJWvat/a-one-question-turing-test-for-gpt-3

“This Mystical Book Was Co-Authored by a Disturbingly Realistic AI”

This Mystical Book Was Co-Authored by a Disturbingly Realistic AI

“The Guy Behind the Fake AI Halloween Parade Listing Says You’ve Got It All Wrong”

The Guy Behind the Fake AI Halloween Parade Listing Says You’ve Got It All Wrong

“Season 1 Ep. 22 OpenAI's Ilya Sutskever: The Man Who Made AI Work”

Season 1 Ep. 22 OpenAI's Ilya Sutskever: The man who made AI work⁠:

https://www.youtube.com/watch?v=fCoavgGZ64Y&t=2796s

“WELM”

WELM

nickwalton00

I've been testing the largest of @OpenAI's models with AI Dungeon and been constantly impressed at how interesting and dynamic the characters are, like this queen, long thought to be dead, hiding from enemies and not happy about me prying into her personal life.⁠:

/doc/www/localhost/1db9c3cd59a3984ef24043ef1f7c9cf659591f74.html

sama

OpenAI now generates about 100 billion words per day

voooooogel

Llama-3.3-70b correctly guesses the sampling constraint (only allowed to use words that are in the Bible)

Sort By Magic

Annotations sorted by machine learning into inferred 'tags'. This provides an alternative way to browse: instead of by date order, one can browse in topic order. The 'sorted' list has been automatically clustered into multiple sections & auto-labeled for easier browsing.

Beginning with the newest annotation, it uses the embedding of each annotation to attempt to create a list of nearest-neighbor annotations, creating a progression of topics. For more details, see the link.

`creative-generation`

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

`model-evaluation preference-optimization controllable-generation ethical-alignment interpretability`

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

`knowledge-augmentation ethical-ai multimodal-learning biomedical-gpt model-compression adaptive-training`

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

`chatbot-innovation`

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

`parameter-efficient`

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

`prompt-learning`

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

`large-language-models`

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

Wikipedia

Poe (software)⁠:

https://en.wikipedia.org/wiki/Poe_(software)

Miscellaneous

Bibliography

https://arxiv.org/abs/2501.01956: “Metadata Conditioning Accelerates Language Model Pre-Training”, Tianyu Gao, Alexander Wettig, Luxi He, Yihe Dong, Sadhika Malladi, Danqi Chen

link-bibliography
https://arxiv.org/abs/2410.01707: “Interpretable Contrastive Monte Carlo Tree Search Reasoning”, Zitian Gao, Boye Niu, Xuzheng He, Haotian Xu, Hongzhang Liu, Aiwei Liu, Xuming Hu, Lijie Wen

link-bibliography
https://arxiv.org/abs/2408.05446: “Ensemble Everything Everywhere: Multi-Scale Aggregation for Adversarial Robustness”, Stanislav Fort, Balaji Lakshminarayanan

link-bibliography
https://arxiv.org/abs/2406.20086: “Token Erasure As a Footprint of Implicit Vocabulary Items in LLMs”, Sheridan Feucht, David Atkinson, Byron Wallace, David Bau

link-bibliography
https://arxiv.org/abs/2406.19146: “Resolving Discrepancies in Compute-Optimal Scaling of Language Models”, Tomer Porian, Mitchell Wortsman, Jenia Jitsev, Ludwig Schmidt, Yair Carmon

link-bibliography
https://arxiv.org/abs/2406.13131: “When Parts Are Greater Than Sums: Individual LLM Components Can Outperform Full Models”, Ting-Yun Chang, Jesse Thomason, Robin Jia

link-bibliography
https://arxiv.org/abs/2406.11794: “DataComp-LM: In Search of the next Generation of Training Sets for Language Models”, Jeffrey Li, Alex Fang, Georgios Smyrnis, Maor Ivgi, Matt Jordan, Samir Gadre, Hritik Bansal, Etash Guha, Sedrick Keh, Kushal Arora, Saurabh Garg, Rui Xin, Niklas Muennighoff, Reinhard Heckel, Jean Mercat, Mayee Chen, Suchin Gururangan, Mitchell Wortsman, Alon Albalak, Yonatan Bitton, Marianna Nezhurina, Amro Abbas, Cheng-Yu Hsieh, Dhruba Ghosh, Josh Gardner, Maciej Kilian, Hanlin Zhang, Rulin Shao, Sarah Pratt, Sunny Sanyal, Gabriel Ilharco, Giannis Daras, Kalyani Marathe, Aaron Gokaslan, Jieyu Zhang, Khyathi Chandu, Thao Nguyen, Igor Vasiljevic, Sham Kakade, Shuran Song, Sujay Sanghavi, Fartash Faghri, Sewoong Oh, Luke Zettlemoyer, Kyle Lo, Alaaeldin El-Nouby, Hadi Pouransari, Alexander Toshev, Stephanie Wang, Dirk Groeneveld, Luca Soldaini, Pang Wei Koh, Jenia Jitsev, Thomas Kollar, Alexandros G. Dimakis, Yair Carmon, Achal Dave, Ludwig Schmidt, Vaishaal Shankar

link-bibliography
https://arxiv.org/abs/2406.07394: “MCTSr: Accessing GPT-4 Level Mathematical Olympiad Solutions via Monte Carlo Tree Self-Refine With LLaMA-3-8B”, Di Zhang, Xiaoshui Huang, Dongzhan Zhou, Yuqiang Li, Wanli Ouyang

link-bibliography
https://arxiv.org/abs/2405.18400: “Superposed Decoding: Multiple Generations from a Single Autoregressive Inference Pass”, Ethan Shen, Alan Fan, Sarah M. Pratt, Jae Sung Park, Matthew Wallingford, Sham M. Kakade, Ari Holtzman, Ranjay Krishna, Ali Farhadi, Aditya Kusupati

link-bibliography
https://arxiv.org/abs/2404.12358: “From r to Q^✱: Your Language Model Is Secretly a Q-Function”, Rafael Rafailov, Joey Hejna, Ryan Park, Chelsea Finn

link-bibliography
https://arxiv.org/abs/2404.06664: “CulturalTeaming: AI-Assisted Interactive Red-Teaming for Challenging LLMs’ (Lack Of) Multicultural Knowledge”, Yu Ying Chiu, Liwei Jiang, Maria Antoniak, Chan Young Park, Shuyue Stella Li, Mehar Bhatia, Sahithya Ravi, Yulia Tsvetkov, Vered Shwartz, Yejin Choi

link-bibliography
https://arxiv.org/abs/2402.17152#facebook: “Actions Speak Louder Than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations (HSTU)”, Jiaqi Zhai, Lucy Liao, Xing Liu, Yueming Wang, Rui Li, Xuan Cao, Leon Gao, Zhaojie Gong, Fangda Gu, Michael He, Yinghai Lu, Yu Shi

link-bibliography
https://arxiv.org/abs/2402.15570: “Fast Adversarial Attacks on Language Models In One GPU Minute”, Vinu Sankar Sadasivan, Shoumik Saha, Gaurang Sriramanan, Priyatham Kattakinda, Atoosa Chegini, Soheil Feizi

link-bibliography
https://arxiv.org/abs/2402.07625: “Autonomous Data Selection With Language Models for Mathematical Texts”, Yifan Zhang, Yifan Luo, Yang Yuan, Andrew Chi-Chih Yao

link-bibliography
https://arxiv.org/abs/2402.04494#deepmind: “Grandmaster-Level Chess Without Search”, Anian Ruoss, Grégoire Delétang, Sourabh Medapati, Jordi Grau-Moya, Li Kevin Wenliang, Elliot Catt, John Reid, Tim Genewein

link-bibliography
https://arxiv.org/abs/2401.15024#microsoft: “SliceGPT: Compress Large Language Models by Deleting Rows and Columns”, Saleh Ashkboos, Maximilian L. Croci, Marcelo Gennari do Nascimento, Torsten Hoefler, James Hensman

link-bibliography
https://arxiv.org/abs/2401.02385: “TinyLlama: An Open-Source Small Language Model”, Peiyuan Zhang, Guangtao Zeng, Tianduo Wang, Wei Lu

link-bibliography
https://arxiv.org/abs/2312.16862: “TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones”, Zhengqing Yuan, Zhaoxu Li, Lichao Sun

link-bibliography
https://arxiv.org/abs/2311.16079: “MEDITRON-70B: Scaling Medical Pretraining for Large Language Models”, Zeming Chen, Alejandro Hernández Cano, Angelika Romanou, Antoine Bonnet, Kyle Matoba, Francesco Salvi, Matteo Pagliardini, Simin Fan, Andreas Köpf, Amirkeivan Mohtashami, Alexandre Sallinen, Alireza Sakhaeirad, Vinitra Swamy, Igor Krawczuk, Deniz Bayazit, Axel Marmet, Syrielle Montariol, Mary-Anne Hartley, Martin Jaggi, Antoine Bosselut

link-bibliography
https://www.reuters.com/technology/sam-altmans-ouster-openai-was-precipitated-by-letter-board-about-ai-breakthrough-2023-11-22/: “OpenAI Researchers Warned Board of AI Breakthrough ahead of CEO Ouster, Sources Say”, Anna Tong, Jeffrey Dastin, Krystal Hu

link-bibliography
https://arxiv.org/abs/2310.06786: “OpenWebMath: An Open Dataset of High-Quality Mathematical Web Text”, Keiran Paster, Marco Dos Santos, Zhangir Azerbayev, Jimmy Ba

link-bibliography
https://arxiv.org/abs/2309.12284: “MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models”, Longhui Yu, Weisen Jiang, Han Shi, Jincheng Yu, Zhengying Liu, Yu Zhang, James T. Kwok, Zhenguo Li, Adrian Weller, Weiyang Liu

link-bibliography
https://arxiv.org/abs/2309.10668#deepmind: “Language Modeling Is Compression”, Grégoire Delétang, Anian Ruoss, Paul-Ambroise Duquenne, Elliot Catt, Tim Genewein, Christopher Mattern, Jordi Grau-Moya, Li Kevin Wenliang, Matthew Aitchison, Laurent Orseau, Marcus Hutter, Joel Veness

link-bibliography
https://arxiv.org/abs/2306.07567: “Large Language Models Sometimes Generate Purely Negatively-Reinforced Text”, Fabien Roger

link-bibliography
https://arxiv.org/abs/2305.10429#google: “DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining”, Sang Michael Xie, Hieu Pham, Xuanyi Dong, Nan Du, Hanxiao Liu, Yifeng Lu, Percy Liang, Quoc V. Le, Tengyu Ma, Adams Wei Yu

link-bibliography
https://www.forbes.com/sites/alexkonrad/2023/05/02/inflection-ai-ex-deepmind-launches-pi-chatbot/: “Inflection AI, Startup From Ex-DeepMind Leaders, Launches Pi—A Chattier Chatbot”, Alex Konrad

link-bibliography
https://arxiv.org/abs/2304.06762#nvidia: “Shall We Pretrain Autoregressive Language Models With Retrieval? A Comprehensive Study”, Boxin Wang, Wei Ping, Peng Xu, Lawrence McAfee, Zihan Liu, Mohammad Shoeybi, Yi Dong, Oleksii Kuchaiev, Bo Li, Chaowei Xiao, Anima Anandkumar, Bryan Catanzaro

link-bibliography
https://warontherocks.com/2023/04/how-large-language-models-can-revolutionize-military-planning/: “How Large-Language Models Can Revolutionize Military Planning”, Benjamin Jensen, Dan Tadross

link-bibliography
https://arxiv.org/abs/2303.13506: “The Quantization Model of Neural Scaling”, Eric J. Michaud, Ziming Liu, Uzay Girit, Max Tegmark

link-bibliography
https://nolanoorg.substack.com/p/int-4-llama-is-not-enough-int-3-and: “Int-4 LLaMa Is Not Enough—Int-3 and Beyond: More Compression, Easier to Build Apps on LLMs That Run Locally”, nolano.org

link-bibliography
https://osf.io/5uxra/: “Beyond the Pass Mark: the Accuracy of ChatGPT and Bing in the National Medical Licensure Examination in Japan”, Yuki Kataoka

link-bibliography
https://arxiv.org/abs/2302.13939: “SpikeGPT: Generative Pre-Trained Language Model With Spiking Neural Networks”, Rui-Jie Zhu, Qihang Zhao, Jason K. Eshraghian

link-bibliography
https://arxiv.org/abs/2302.03169: “Data Selection for Language Models via Importance Resampling”, Sang Michael Xie, Shibani Santurkar, Tengyu Ma, Percy Liang

link-bibliography
https://www.nytimes.com/2022/12/21/technology/ai-chatgpt-google-search.html: “A New Chat Bot Is a ‘Code Red’ for Google’s Search Business: A New Wave of Chat Bots like ChatGPT Use Artificial Intelligence That Could Reinvent or Even Replace the Traditional Internet Search Engine”, Nico Grant, Cade Metz

link-bibliography
https://arxiv.org/abs/2211.10438: “SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models”, Guangxuan Xiao, Ji Lin, Mickael Seznec, Julien Demouth, Song Han

link-bibliography
https://arxiv.org/abs/2211.09800: “InstructPix2Pix: Learning to Follow Image Editing Instructions”, Tim Brooks, Aleksander Holynski, Alexei A. Efros

link-bibliography
https://arxiv.org/abs/2211.09085#facebook: “Galactica: A Large Language Model for Science”, Ross Taylor, Marcin Kardas, Guillem Cucurull, Thomas Scialom, Anthony Hartshorn, Elvis Saravia, Andrew Poulton, Viktor Kerkez, Robert Stojnic

link-bibliography
https://arxiv.org/abs/2211.08411: “Large Language Models Struggle to Learn Long-Tail Knowledge”, Nikhil Kandpal, Haikang Deng, Adam Roberts, Eric Wallace, Colin Raffel

link-bibliography
https://arxiv.org/abs/2210.17323: “GPTQ: Accurate Post-Training Quantization for Generative Pre-Trained Transformers”, Elias Frantar, Saleh Ashkboos, Torsten Hoefler, Dan Alistarh

link-bibliography
https://arxiv.org/abs/2210.13673#nvidia: “Evaluating Parameter Efficient Learning for Generation”, Peng Xu, Mostofa Patwary, Shrimai Prabhumoye, Virginia Adams, Ryan J. Prenger, Wei Ping, Nayeon Lee, Mohammad Shoeybi, Bryan Catanzaro

link-bibliography
https://arxiv.org/abs/2210.10341#microsoft: “BioGPT: Generative Pre-Trained Transformer for Biomedical Text Generation and Mining”, Renqian Luo, Liai Sun, Yingce Xia, Tao Qin, Sheng Zhang, Hoifung Poon, Tie-Yan Liu

link-bibliography
https://arxiv.org/abs/2210.15458#google: “Arithmetic Sampling: Parallel Diverse Decoding for Large Language Models”, Luke Vilnis, Yury Zemlyanskiy, Patrick Murray, Alexandre Passos, Sumit Sanghai

link-bibliography
https://arxiv.org/abs/2210.06423#microsoft: “Foundation Transformers”, Hongyu Wang, Shuming Ma, Shaohan Huang, Li Dong, Wenhui Wang, Zhiliang Peng, Yu Wu, Payal Bajaj, Saksham Singhal, Alon Benhaim, Barun Patra, Zhun Liu, Vishrav Chaudhary, Xia Song, Furu Wei

link-bibliography
https://arxiv.org/abs/2210.02441: “Ask Me Anything (AMA): A Simple Strategy for Prompting Language Models”, Simran Arora, Avanika Narayan, Mayee F. Chen, Laurel Orr, Neel Guha, Kush Bhatia, Ines Chami, Frederic Sala, Christopher Ré

link-bibliography
https://arxiv.org/abs/2210.01241: “Is Reinforcement Learning (Not) for Natural Language Processing: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization”, Rajkumar Ramamurthy, Prithviraj Ammanabrolu, Kianté Brantley, Jack Hessel, Rafet Sifa, Christian Bauckhage, Hannaneh Hajishirzi, Yejin Choi

link-bibliography
https://arxiv.org/abs/2207.04429: “LM-Nav: Robotic Navigation With Large Pre-Trained Models of Language, Vision, and Action”, Dhruv Shah, Blazej Osinski, Brian Ichter, Sergey Levine

link-bibliography
https://arxiv.org/abs/2206.01861#microsoft: “ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers”, Zhewei Yao, Reza Yazdani Aminabadi, Minjia Zhang, Xiaoxia Wu, Conglong Li, Yuxiong He

link-bibliography
https://www.nature.com/articles/s41593-022-01026-4: “Shared Computational Principles for Language Processing in Humans and Deep Language Models”, Ariel Goldstein, Zaid Zada, Eliav Buchnik, Mariano Schain, Amy Price, Bobbi Aubrey, Samuel A. Nastase, Amir Feder, Dotan Emanuel, Alon Cohen, Aren Jansen, Harshvardhan Gazula, Gina Choe, Aditi Rao, Catherine Kim, Colton Casto, Lora Fanda, Werner Doyle, Daniel Friedman, Patricia Dugan, Lucia Melloni, Roi Reichart, Sasha Devore, Adeen Flinker, Liat Hasenfratz, Omer Levy, Avinatan Hassidim, Michael Brenner, Yossi Matias, Kenneth A. Norman, Orrin Devinsky, Uri Hasson

link-bibliography
https://arxiv.org/abs/2110.04627#google: “Vector-Quantized Image Modeling With Improved VQGAN”, Jiahui Yu, Xin Li, Jing Yu Koh, Han Zhang, Ruoming Pang, James Qin, Alexander Ku, Yuanzhong Xu, Jason Baldridge, Yonghui Wu

link-bibliography
https://www.nature.com/articles/s42003-022-03036-1: “Brains and Algorithms Partially Converge in Natural Language Processing”, Charlotte Caucheteux, Jean-Rémi King

link-bibliography
https://arxiv.org/abs/2201.11990#microsoftnvidia: “Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model”, Shaden Smith, Mostofa Patwary, Brandon Norick, Patrick LeGresley, Samyam Rajbhandari, Jared Casper, Zhun Liu, Shrimai Prabhumoye, George Zerveas, Vijay Korthikanti, Elton Zhang, Rewon Child, Reza Yazdani Aminabadi, Julie Bernauer, Xia Song, Mohammad Shoeybi, Yuxiong He, Michael Houston, Saurabh Tiwary, Bryan Catanzaro

link-bibliography
https://swabhs.com/assets/pdf/wanli.pdf#allen: “WANLI: Worker and AI Collaboration for Natural Language Inference Dataset Creation”, Alisa Liu, Swabha Swayamdipta, Noah A. Smith, Yejin Choi

link-bibliography
https://arxiv.org/abs/2112.04426#deepmind: “Improving Language Models by Retrieving from Trillions of Tokens”, Sebastian Borgeaud, Arthur Mensch, Jordan Hoffmann, Trevor Cai, Eliza Rutherford, Katie Millican, George van den Driessche, Jean-Baptiste Lespiau, Bogdan Damoc, Aidan Clark, Diego de Las Casas, Aurelia Guy, Jacob Menick, Roman Ring, Tom Hennigan, Saffron Huang, Loren Maggiore, Chris Jones, Albin Cassirer, Andy Brock, Michela Paganini, Geoffrey Irving, Oriol Vinyals, Simon Osindero, Karen Simonyan, Jack W. Rae, Erich Elsen, Laurent Sifre

link-bibliography
https://arxiv.org/abs/2111.13440: “True Few-Shot Learning With Prompts—A Real-World Perspective”, Timo Schick, Hinrich Schütze

link-bibliography
https://arxiv.org/abs/2111.02570#microsoft: “CLUES: Few-Shot Learning Evaluation in Natural Language Understanding”, Subhabrata Mukherjee, Xiaodong Liu, Guoqing Zheng, Saghar Hosseini, Hao Cheng, Greg Yang, Christopher Meek, Ahmed Hassan Awadallah, Jianfeng Gao

link-bibliography
https://arxiv.org/abs/2110.11309: “Fast Model Editing at Scale”, Eric Mitchell, Charles Lin, Antoine Bosselut, Chelsea Finn, Christopher D. Manning

link-bibliography
https://arxiv.org/abs/2109.02593#allen: “General-Purpose Question-Answering With Macaw”, Oyvind Tafjord, Peter Clark

link-bibliography
https://arxiv.org/abs/2106.06981: “RASP: Thinking Like Transformers”, Gail Weiss, Yoav Goldberg, Eran Yahav

link-bibliography
https://arxiv.org/abs/2105.13626#google: “ByT5: Towards a Token-Free Future With Pre-Trained Byte-To-Byte Models”, Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel

link-bibliography
https://m.koreaherald.com/view.php?ud=20210525000824#naver: “Naver Unveils First ‘Hyperscale’ AI Platform”, Kang Jae-eun

link-bibliography
https://arxiv.org/abs/2009.03393#openai: “Generative Language Modeling for Automated Theorem Proving”, Stanislas Polu, Ilya Sutskever

link-bibliography
https://aclanthology.org/2020.acl-main.463.pdf: “Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data”, Emily M. Bender, Alexander Koller

link-bibliography
https://arxiv.org/abs/2004.10802: “Scaling Laws from the Data Manifold Dimension”, Utkarsh Sharma, Jared Kaplan

link-bibliography
https://arxiv.org/abs/2001.08361#openai: “Scaling Laws for Neural Language Models”, Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, Dario Amodei

link-bibliography
https://arxiv.org/abs/2001.04451#google: “Reformer: The Efficient Transformer”, Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya

link-bibliography
https://arxiv.org/abs/1909.05858#salesforce: “CTRL: A Conditional Transformer Language Model For Controllable Generation”, Nitish Shirish Keskar, Bryan McCann, Lav R. Varshney, Caiming Xiong, Richard Socher

link-bibliography
https://paperswithcode.com/task/language-modelling: “Language Modeling State-Of-The-Art Leaderboards”, paperswithcode.com

link-bibliography
https://arxiv.org/abs/1901.02860: “Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context”, Zihang Dai, Zhilin Yang, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov

link-bibliography
https://magenta.tensorflow.org/music-transformer: “Music Transformer: Generating Music With Long-Term Structure”, Cheng-Zhi Anna Huang, Ian Simon, Monica Dinculescu

link-bibliography
https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf#page=5: “GPT-1: Improving Language Understanding by Generative Pre-Training § Model Specifications”, Alec Radford, Karthik Narasimhan, Tim Salimans, Ilya Sutskever

link-bibliography
https://paulfchristiano.com/: “Homepage of Paul F. Christiano”, Paul F. Christiano

link-bibliography

[Quote Of The Day]

[Site Of The Day]

[Annotation Of The Day]

[adblock public service announcement]