- See Also
- Links
- “Can LLMs Be Scammed? A Baseline Measurement Study”, Sehwag et al 2024
- “The Rise of AI-Generated Content in Wikipedia”, Brooks et al 2024
- “On Scalable Oversight With Weak LLMs Judging Strong LLMs”, Kenton et al 2024
- “APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets”, Liu et al 2024
- “Connecting the Dots: LLMs Can Infer and Verbalize Latent Structure from Disparate Training Data”, Treutlein et al 2024
- “Designing a Dashboard for Transparency and Control of Conversational AI”, Chen et al 2024
- “Delving into ChatGPT Usage in Academic Writing through Excess Vocabulary”, Kobak et al 2024
- “Do Teachers Spot AI? Evaluating the Detectability of AI-Generated Texts among Student Essays”, Fleckenstein et al 2024
- “LLMs Achieve Adult Human Performance on Higher-Order Theory of Mind Tasks”, Street et al 2024
- “Can Language Models Explain Their Own Classification Behavior?”, Sherburn et al 2024
- “The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions”, Wallace et al 2024
- “FABLES: Evaluating Faithfulness and Content Selection in Book-Length Summarization”, Kim et al 2024
- “Vulnerability Detection With Code Language Models: How Far Are We?”, Ding et al 2024
- “The NSA Warns That US Adversaries Free to Mine Private Data May Have an AI Edge: Gilbert Herrera, Who Leads Research at the National Security Agency, Says Large Language Models Are Incredibly Useful—And a Bit of a Headache—For America’s Intelligence Machine”, Knight 2024
- “Monitoring AI-Modified Content at Scale: A Case Study on the Impact of ChatGPT on AI Conference Peer Reviews”, Liang et al 2024
- “Functional Benchmarks for Robust Evaluation of Reasoning Performance, and the Reasoning Gap”, Srivastava et al 2024
- “Tokenization Counts: the Impact of Tokenization on Arithmetic in Frontier LLMs”, Singh & Strouse 2024
- “ArtPrompt: ASCII Art-Based Jailbreak Attacks against Aligned LLMs”, Jiang et al 2024
- “Using Counterfactual Tasks to Evaluate the Generality of Analogical Reasoning in Large Language Models”, Lewis & Mitchell 2024
- “The Non-Effect of Sampling Temperature on Problem Solving in GPT-3.5/GPT-4”, Renze & Guven 2024
- “I Think, Therefore I Am: Benchmarking Awareness of Large Language Models Using AwareBench”, Li et al 2024
- “Does Using ChatGPT Result in Human Cognitive Augmentation?”, Fulbright & Morrison 2024
- “A Vision Check-Up for Language Models”, Sharma et al 2024
- “Large Language Models Play StarCraft II: Benchmarks and A Chain of Summarization Approach”, Ma et al 2023
- “TinyGSM: Achieving >80% on GSM8k With Small Language Models”, Liu et al 2023
- “Universal Self-Consistency for Large Language Model Generation”, Chen et al 2023
- “PEARL: Personalizing Large Language Model Writing Assistants With Generation-Calibrated Retrievers”, Mysore et al 2023
- “Zero-Shot Goal-Directed Dialogue via RL on Imagined Conversations”, Hong et al 2023
- “InCharacter: Evaluating Personality Fidelity in Role-Playing Agents through Psychological Interviews”, Wang et al 2023
- “Data Contamination Through the Lens of Time”, Roberts et al 2023
- “Can GPT Models Be Financial Analysts? An Evaluation of ChatGPT and GPT-4 on Mock CFA Exams”, Callanan et al 2023
- “Beyond Memorization: Violating Privacy Via Inference With Large Language Models”, Staab et al 2023
- “GeoLLM: Extracting Geospatial Knowledge from Large Language Models”, Manvi et al 2023
- “Can a Computer Outfake a Human [Personality]?”, Phillips & Robie 2023
- “Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models”, Zhou et al 2023
- “Using Large Language Models for Qualitative Analysis Can Introduce Serious Bias”, Ashwin et al 2023
- “MTOB: A Benchmark for Learning to Translate a New Language from One Grammar Book”, Tanzer et al 2023
- “Embers of Autoregression: Understanding Large Language Models Through the Problem They Are Trained to Solve”, McCoy et al 2023
- “The Cambridge Law Corpus: A Corpus for Legal AI Research”, Östling et al 2023
- “Assessing the Nature of Large Language Models: A Caution against Anthropocentrism”, Speed 2023
- “A Boy Saw 17 Doctors over 3 Years for Chronic Pain. ChatGPT Found the Diagnosis”, Holohan 2023
- “Taken out of Context: On Measuring Situational Awareness in LLMs”, Berglund et al 2023
- “Investigating the Existence of ‘Secret Language’ in Language Models”, Wang et al 2023
- “Are Large Language Models a Threat to Digital Public Goods? Evidence from Activity on Stack Overflow”, Rio-Chanona et al 2023
- “Machine-Assisted Social Psychology Hypothesis Generation”, Banker et al 2023
- “Distilling Large Language Models for Biomedical Knowledge Extraction: A Case Study on Adverse Drug Events”, Gu et al 2023
- “Unleashing the Emergent Cognitive Synergy in Large Language Models: A Task-Solving Agent through Multi-Persona Self-Collaboration”, Wang et al 2023
- “Explaining Competitive-Level Programming Solutions Using LLMs”, Li et al 2023
- “Lost in the Middle: How Language Models Use Long Contexts”, Liu et al 2023
- “Hoodwinked: Deception and Cooperation in a Text-Based Game for Language Models”, O’Gara 2023
- “Language Models Are Weak Learners”, Manikandan et al 2023
- “Understanding Social Reasoning in Language Models With Language Models”, Gandhi et al 2023
- “Evaluating Superhuman Models With Consistency Checks”, Fluri et al 2023
- “Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks”, Veselovsky et al 2023
- “Can Large Language Models Democratize Access to Dual-Use Biotechnology?”, Soice et al 2023
- “Iterative Translation Refinement With Large Language Models”, Chen et al 2023
- “Don’t Want Students to Rely on ChatGPT? Have Them Use It: It’s Easy to Forget How Little Students and Educators Understand Generative AI’s Flaws. Once They Actually Try It Out, They’ll See That It Can’t Replace Them”, Howell 2023
- “The Exciting Potential for ChatGPT in Obstetrics and Gynecology”, Grünebaum et al 2023
- “Do GPTs Produce Less Literal Translations?”, Raunak et al 2023
- “The False Promise of Imitating Proprietary LLMs”, Gudibande et al 2023
- “Learning to Generate Novel Scientific Directions With Contextualized Literature-Based Discovery”, Wang et al 2023
- “How Language Model Hallucinations Can Snowball”, Zhang et al 2023
- “LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions”, Wu et al 2023
- “Evaluating Transformer Language Models on Arithmetic Operations Using Number Decomposition”, Muffo et al 2023
- “Generative AI at Work”, Brynjolfsson et al 2023
- “Humans in Humans Out: On GPT Converging Toward Common Sense in Both Success and Failure”, Koralus & Wang-Maścianica 2023
- “Language Models Can Solve Computer Tasks”, Kim et al 2023
- “Performance of ChatGPT on Free-Response, Clinical Reasoning Exams”, Strong et al 2023
- “How Well Do Large Language Models Perform in Arithmetic Tasks?”, Yuan et al 2023
- “Larger Language Models Do In-Context Learning Differently”, Wei et al 2023
- “Is ChatGPT a General-Purpose Natural Language Processing Task Solver?”, Qin et al 2023
- “Predicting Consumer Contracts [With GPT-3]”, Kolt 2023
- “Use GPT-3 Incorrectly: Reduce Costs 40× and Increase Speed by 5×”, Pullen 2023
- “A Judge Just Used ChatGPT to Make a Court Decision: The Case Is the First Time a Court Has Admitted to Using the AI Text Generator’s Answers in a Legal Ruling”, Rose 2023
- “Co-Writing With Opinionated Language Models Affects Users’ Views”, Jakesch et al 2023
- “The Inside Story of ChatGPT: How OpenAI Founder Sam Altman Built the World’s Hottest Technology With Billions from Microsoft”, Kahn 2023
- “How Close Is ChatGPT to Human Experts? Comparison Corpus, Evaluation, and Detection”, Guo et al 2023
- “Can GPT-3 Produce New Ideas? Partially Automating Robin Hanson and Others § If You Never Miss a Plane…”, Sempere 2023
- “How Does ChatGPT Perform on the United States Medical Licensing Examination? The Implications of Large Language Models for Medical Education and Knowledge Assessment”, Gilson et al 2023
- “GPT-3 Takes the Bar Exam”, Bommarito II & Katz 2022
- “Precise Zero-Shot Dense Retrieval without Relevance Labels”, Gao et al 2022
- “Self-Instruct: Aligning Language Models With Self-Generated Instructions”, Wang et al 2022
- “Emergent Analogical Reasoning in Large Language Models”, Webb et al 2022
- “Harvey, Which Uses AI to Answer Legal Questions, Lands Cash from OpenAI”, Wiggers 2022
- “LMentry: A Language Model Benchmark of Elementary Language Tasks”, Efrat et al 2022
- “Self-Ask: Measuring and Narrowing the Compositionality Gap in Language Models (Bamboogle)”, Press et al 2022
- “How Persuasive Is AI-Generated Argumentation? An Analysis of the Quality of an Argumentative Text Produced by the GPT-3 AI Text Generator”, Hinton & Wagemans 2022
- “Out of One, Many: Using Language Models to Simulate Human Samples”, Argyle et al 2022
- “What Does a Platypus Look Like? Generating Customized Prompts for Zero-Shot Image Classification (CuPL)”, Pratt et al 2022
- “Using Large Language Models to Simulate Multiple Humans”, Aher et al 2022
- “Limitations of Language Models in Arithmetic and Symbolic Induction”, Qian et al 2022
- “RealTime QA: What’s the Answer Right Now?”, Kasai et al 2022
- “GODEL: Large-Scale Pre-Training for Goal-Directed Dialog”, Peng et al 2022
- “Can GPT-3 Write an Academic Paper on Itself, With Minimal Human Input?”, GPT-3 et al 2022 (page 2)
- “NaturalProver: Grounded Mathematical Proof Generation With Language Models”, Welleck et al 2022
- “OPT: Open Pre-Trained Transformer Language Models”, Zhang et al 2022
- “InstructGPT: Training Language Models to Follow Instructions With Human Feedback”, Ouyang et al 2022
- “Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?”, Min et al 2022
- “Impact of Pretraining Term Frequencies on Few-Shot Reasoning”, Razeghi et al 2022
- “Contracts in the Age of Smart Readers”, Arbel & Becher 2022
- “Memory-Assisted Prompt Editing to Improve GPT-3 After Deployment”, Madaan et al 2022
- “CommonsenseQA 2.0: Exposing the Limits of AI through Gamification”, Talmor et al 2022
- “Limits of Using Artificial Intelligence and GPT-3 in Patent Prosecution”, Tu et al 2022
- “What Can a Generative Language Model Answer About a Passage?”, Summers-Stay et al 2021
- “Process for Adapting Language Models to Society (PALMS) With Values-Targeted Datasets”, Solaiman & Dennison 2021
- “Scaling Laws for Autoregressive Generative Modeling”, Henighan et al 2020
- “GPT-3: Its Nature, Scope, Limits, and Consequences”, Floridi & Chiriatti 2020
- “MMLU: Measuring Massive Multitask Language Understanding”, Hendrycks et al 2020
- “GPT-3: Language Models Are Few-Shot Learners”, Brown et al 2020
- “Extrapolating to Unnatural Language Processing With GPT-3’s In-Context Learning: The Good, the Bad, and the Mysterious”
- “Fine-Tuning Is Not Sufficient for Capability Elicitation”
- “Connecting the Dots: LLMs Can Infer & Verbalize Latent Structure from Training Data”
- “Reward Hacking Behavior Can Generalize across Tasks”
- “Who Models the Models That Model Models? An Exploration of GPT-3’s In-Context Model Fitting Ability”
- “GPT-3 Catching Fish in Morse Code”
- “A Robot Wrote This Entire Article. Are You Scared Yet, Human? We Asked GPT-3, OpenAI’s Powerful New Language Generator, to Write an Essay for Us from Scratch. The Assignment? To Convince Us Robots Come in Peace | For More about GPT-3 and How This Essay Was Written and Edited, Please Read Our Editor’s Note Below”
- MelMitchell1
- SRajdev
- bucketofkets
- hamandcheese
- sakun135
- spolu
- Sort By Magic
- Miscellaneous
- Bibliography
See Also
Links
“Can LLMs Be Scammed? A Baseline Measurement Study”, Sehwag et al 2024
“The Rise of AI-Generated Content in Wikipedia”, Brooks et al 2024
“On Scalable Oversight With Weak LLMs Judging Strong LLMs”, Kenton et al 2024
“APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets”, Liu et al 2024
“Connecting the Dots: LLMs Can Infer and Verbalize Latent Structure from Disparate Training Data”, Treutlein et al 2024
“Designing a Dashboard for Transparency and Control of Conversational AI”, Chen et al 2024
“Delving into ChatGPT Usage in Academic Writing through Excess Vocabulary”, Kobak et al 2024
“Do Teachers Spot AI? Evaluating the Detectability of AI-Generated Texts among Student Essays”, Fleckenstein et al 2024
“LLMs Achieve Adult Human Performance on Higher-Order Theory of Mind Tasks”, Street et al 2024
“Can Language Models Explain Their Own Classification Behavior?”, Sherburn et al 2024
“The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions”, Wallace et al 2024
“FABLES: Evaluating Faithfulness and Content Selection in Book-Length Summarization”, Kim et al 2024
“Vulnerability Detection With Code Language Models: How Far Are We?”, Ding et al 2024
“The NSA Warns That US Adversaries Free to Mine Private Data May Have an AI Edge: Gilbert Herrera, Who Leads Research at the National Security Agency, Says Large Language Models Are Incredibly Useful—And a Bit of a Headache—For America’s Intelligence Machine”, Knight 2024
“Monitoring AI-Modified Content at Scale: A Case Study on the Impact of ChatGPT on AI Conference Peer Reviews”, Liang et al 2024
“Functional Benchmarks for Robust Evaluation of Reasoning Performance, and the Reasoning Gap”, Srivastava et al 2024
“Tokenization Counts: the Impact of Tokenization on Arithmetic in Frontier LLMs”, Singh & Strouse 2024
“ArtPrompt: ASCII Art-Based Jailbreak Attacks against Aligned LLMs”, Jiang et al 2024
“Using Counterfactual Tasks to Evaluate the Generality of Analogical Reasoning in Large Language Models”, Lewis & Mitchell 2024
“The Non-Effect of Sampling Temperature on Problem Solving in GPT-3.5/GPT-4”, Renze & Guven 2024
“I Think, Therefore I Am: Benchmarking Awareness of Large Language Models Using AwareBench”, Li et al 2024
“Does Using ChatGPT Result in Human Cognitive Augmentation?”, Fulbright & Morrison 2024
“Large Language Models Play StarCraft II: Benchmarks and A Chain of Summarization Approach”, Ma et al 2023
“TinyGSM: Achieving >80% on GSM8k With Small Language Models”, Liu et al 2023
“Universal Self-Consistency for Large Language Model Generation”, Chen et al 2023
“PEARL: Personalizing Large Language Model Writing Assistants With Generation-Calibrated Retrievers”, Mysore et al 2023
“Zero-Shot Goal-Directed Dialogue via RL on Imagined Conversations”, Hong et al 2023
“InCharacter: Evaluating Personality Fidelity in Role-Playing Agents through Psychological Interviews”, Wang et al 2023
“Data Contamination Through the Lens of Time”, Roberts et al 2023
“Can GPT Models Be Financial Analysts? An Evaluation of ChatGPT and GPT-4 on Mock CFA Exams”, Callanan et al 2023
“Beyond Memorization: Violating Privacy Via Inference With Large Language Models”, Staab et al 2023
“GeoLLM: Extracting Geospatial Knowledge from Large Language Models”, Manvi et al 2023
“Can a Computer Outfake a Human [Personality]?”, Phillips & Robie 2023
“Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models”, Zhou et al 2023
“Using Large Language Models for Qualitative Analysis Can Introduce Serious Bias”, Ashwin et al 2023
“MTOB: A Benchmark for Learning to Translate a New Language from One Grammar Book”, Tanzer et al 2023
“Embers of Autoregression: Understanding Large Language Models Through the Problem They Are Trained to Solve”, McCoy et al 2023
“The Cambridge Law Corpus: A Corpus for Legal AI Research”, Östling et al 2023
“Assessing the Nature of Large Language Models: A Caution against Anthropocentrism”, Speed 2023
“A Boy Saw 17 Doctors over 3 Years for Chronic Pain. ChatGPT Found the Diagnosis”, Holohan 2023
“Taken out of Context: On Measuring Situational Awareness in LLMs”, Berglund et al 2023
“Investigating the Existence of ‘Secret Language’ in Language Models”, Wang et al 2023
“Are Large Language Models a Threat to Digital Public Goods? Evidence from Activity on Stack Overflow”, Rio-Chanona et al 2023
“Machine-Assisted Social Psychology Hypothesis Generation”, Banker et al 2023
“Distilling Large Language Models for Biomedical Knowledge Extraction: A Case Study on Adverse Drug Events”, Gu et al 2023
“Unleashing the Emergent Cognitive Synergy in Large Language Models: A Task-Solving Agent through Multi-Persona Self-Collaboration”, Wang et al 2023
“Explaining Competitive-Level Programming Solutions Using LLMs”, Li et al 2023
“Lost in the Middle: How Language Models Use Long Contexts”, Liu et al 2023
“Hoodwinked: Deception and Cooperation in a Text-Based Game for Language Models”, O’Gara 2023
“Language Models Are Weak Learners”, Manikandan et al 2023
“Understanding Social Reasoning in Language Models With Language Models”, Gandhi et al 2023
“Evaluating Superhuman Models With Consistency Checks”, Fluri et al 2023
“Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks”, Veselovsky et al 2023
“Can Large Language Models Democratize Access to Dual-Use Biotechnology?”, Soice et al 2023
“Iterative Translation Refinement With Large Language Models”, Chen et al 2023
“Don’t Want Students to Rely on ChatGPT? Have Them Use It: It’s Easy to Forget How Little Students and Educators Understand Generative AI’s Flaws. Once They Actually Try It Out, They’ll See That It Can’t Replace Them”, Howell 2023
“The Exciting Potential for ChatGPT in Obstetrics and Gynecology”, Grünebaum et al 2023
“Do GPTs Produce Less Literal Translations?”, Raunak et al 2023
“The False Promise of Imitating Proprietary LLMs”, Gudibande et al 2023
“Learning to Generate Novel Scientific Directions With Contextualized Literature-Based Discovery”, Wang et al 2023
“How Language Model Hallucinations Can Snowball”, Zhang et al 2023
“LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions”, Wu et al 2023
“Evaluating Transformer Language Models on Arithmetic Operations Using Number Decomposition”, Muffo et al 2023
“Generative AI at Work”, Brynjolfsson et al 2023
“Humans in Humans Out: On GPT Converging Toward Common Sense in Both Success and Failure”, Koralus & Wang-Maścianica 2023
“Language Models Can Solve Computer Tasks”, Kim et al 2023
“Performance of ChatGPT on Free-Response, Clinical Reasoning Exams”, Strong et al 2023
“How Well Do Large Language Models Perform in Arithmetic Tasks?”, Yuan et al 2023
“Larger Language Models Do In-Context Learning Differently”, Wei et al 2023
“Is ChatGPT a General-Purpose Natural Language Processing Task Solver?”, Qin et al 2023
“Predicting Consumer Contracts [With GPT-3]”, Kolt 2023
“Use GPT-3 Incorrectly: Reduce Costs 40× and Increase Speed by 5×”, Pullen 2023
“A Judge Just Used ChatGPT to Make a Court Decision: The Case Is the First Time a Court Has Admitted to Using the AI Text Generator’s Answers in a Legal Ruling”, Rose 2023
“Co-Writing With Opinionated Language Models Affects Users’ Views”, Jakesch et al 2023
“The Inside Story of ChatGPT: How OpenAI Founder Sam Altman Built the World’s Hottest Technology With Billions from Microsoft”, Kahn 2023
“How Close Is ChatGPT to Human Experts? Comparison Corpus, Evaluation, and Detection”, Guo et al 2023
“Can GPT-3 Produce New Ideas? Partially Automating Robin Hanson and Others § If You Never Miss a Plane…”, Sempere 2023
“How Does ChatGPT Perform on the United States Medical Licensing Examination? The Implications of Large Language Models for Medical Education and Knowledge Assessment”, Gilson et al 2023
“GPT-3 Takes the Bar Exam”, Bommarito II & Katz 2022
“Precise Zero-Shot Dense Retrieval without Relevance Labels”, Gao et al 2022
“Self-Instruct: Aligning Language Models With Self-Generated Instructions”, Wang et al 2022
“Emergent Analogical Reasoning in Large Language Models”, Webb et al 2022
“Harvey, Which Uses AI to Answer Legal Questions, Lands Cash from OpenAI”, Wiggers 2022
“LMentry: A Language Model Benchmark of Elementary Language Tasks”, Efrat et al 2022
“Self-Ask: Measuring and Narrowing the Compositionality Gap in Language Models (Bamboogle)”, Press et al 2022
“How Persuasive Is AI-Generated Argumentation? An Analysis of the Quality of an Argumentative Text Produced by the GPT-3 AI Text Generator”, Hinton & Wagemans 2022
“Out of One, Many: Using Language Models to Simulate Human Samples”, Argyle et al 2022
“What Does a Platypus Look Like? Generating Customized Prompts for Zero-Shot Image Classification (CuPL)”, Pratt et al 2022
“Using Large Language Models to Simulate Multiple Humans”, Aher et al 2022
“Limitations of Language Models in Arithmetic and Symbolic Induction”, Qian et al 2022
“RealTime QA: What’s the Answer Right Now?”, Kasai et al 2022
“GODEL: Large-Scale Pre-Training for Goal-Directed Dialog”, Peng et al 2022
“Can GPT-3 Write an Academic Paper on Itself, With Minimal Human Input?”, GPT-3 et al 2022 (page 2)
“NaturalProver: Grounded Mathematical Proof Generation With Language Models”, Welleck et al 2022
“OPT: Open Pre-Trained Transformer Language Models”, Zhang et al 2022
“InstructGPT: Training Language Models to Follow Instructions With Human Feedback”, Ouyang et al 2022
“Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?”, Min et al 2022
“Impact of Pretraining Term Frequencies on Few-Shot Reasoning”, Razeghi et al 2022
“Contracts in the Age of Smart Readers”, Arbel & Becher 2022
“Memory-Assisted Prompt Editing to Improve GPT-3 After Deployment”, Madaan et al 2022
“CommonsenseQA 2.0: Exposing the Limits of AI through Gamification”, Talmor et al 2022
“Limits of Using Artificial Intelligence and GPT-3 in Patent Prosecution”, Tu et al 2022
“What Can a Generative Language Model Answer About a Passage?”, Summers-Stay et al 2021
“Process for Adapting Language Models to Society (PALMS) With Values-Targeted Datasets”, Solaiman & Dennison 2021
“Scaling Laws for Autoregressive Generative Modeling”, Henighan et al 2020
“GPT-3: Its Nature, Scope, Limits, and Consequences”, Floridi & Chiriatti 2020
“MMLU: Measuring Massive Multitask Language Understanding”, Hendrycks et al 2020
“GPT-3: Language Models Are Few-Shot Learners”, Brown et al 2020
“Extrapolating to Unnatural Language Processing With GPT-3’s In-Context Learning: The Good, the Bad, and the Mysterious”
“Fine-Tuning Is Not Sufficient for Capability Elicitation”
“Connecting the Dots: LLMs Can Infer & Verbalize Latent Structure from Training Data”
“Reward Hacking Behavior Can Generalize across Tasks”
“Who Models the Models That Model Models? An Exploration of GPT-3’s In-Context Model Fitting Ability”
“GPT-3 Catching Fish in Morse Code”
“A Robot Wrote This Entire Article. Are You Scared Yet, Human? We Asked GPT-3, OpenAI’s Powerful New Language Generator, to Write an Essay for Us from Scratch. The Assignment? To Convince Us Robots Come in Peace | For More about GPT-3 and How This Essay Was Written and Edited, Please Read Our Editor’s Note Below”
MelMitchell1
SRajdev
bucketofkets
hamandcheese
sakun135
spolu
Sort By Magic
Annotations sorted by machine learning into inferred 'tags'. This provides an alternative way to browse: instead of by date order, one can browse in topic order. The 'sorted' list has been automatically clustered into multiple sections & auto-labeled for easier browsing.
Beginning with the newest annotation, it uses the embedding of each annotation to attempt to create a list of nearest-neighbor annotations, creating a progression of topics. For more details, see the link.
evaluation
literature-discovery
ai-legal-assessment
ai-education-evaluation
gpt-contract-analysis
medical-ai-application
legal-ai-usage
argumentation-evaluation
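The nearest-neighbor topic progression described above can be sketched in a few lines: starting from the newest annotation, greedily hop to the most similar not-yet-visited annotation by cosine similarity of embeddings. This is a minimal illustration only, with toy 2-D vectors standing in for real annotation embeddings; `nearest_neighbor_order` is a hypothetical helper, not the site's actual sorting code.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def nearest_neighbor_order(embeddings, start=0):
    """Greedy chain: from the starting (newest) annotation, repeatedly
    visit the most similar unvisited annotation, so adjacent items in
    the output share topics."""
    unvisited = set(range(len(embeddings))) - {start}
    order = [start]
    while unvisited:
        cur = embeddings[order[-1]]
        nxt = max(unvisited, key=lambda i: cosine(cur, embeddings[i]))
        order.append(nxt)
        unvisited.remove(nxt)
    return order

# Toy "embeddings": two clusters, (1,0)-ish and (0,1)-ish.
embs = [(1.0, 0.0), (0.0, 1.0), (0.9, 0.1), (0.1, 0.9)]
print(nearest_neighbor_order(embs))  # → [0, 2, 3, 1]
```

A real implementation would use high-dimensional text embeddings and then cut the resulting chain into labeled sections (the 'tags' above), but the greedy-chain step is the core of the progression-of-topics ordering.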
Miscellaneous
- https://automated.beehiiv.com/p/aiimmunity-challenge-lessons-clinical-research-exam
- https://chat.openai.com/share/25124525-0bad-4c13-ae5a-ae4beac60360
- https://davidabell.substack.com/p/playing-around-with-machine-translation
- https://dropbox.tech/machine-learning/prompt-injection-with-control-characters-openai-chatgpt-llm
- https://jamanetwork.com/journals/jamanetworkopen/fullarticle/2812620
- https://jxnl.github.io/instructor/blog/2023/11/05/chain-of-density/
- https://openai.com/blog/function-calling-and-other-api-updates#function-calling
- https://restofworld.org/2023/ai-revolution-outsourced-workers/
- https://www.ft.com/content/9aeb482d-f781-45c0-896f-38fdcc912139
- https://www.getlibretto.com/blog/does-it-matter-which-examples-you-choose-for-few-shot-prompting
- https://www.integrity-research.com/ai-fails-insider-trading-test/
- https://www.lesswrong.com/posts/qbbaF79uJqvmWZELv/real-life-sort-by-controversial
- https://www.nytimes.com/2023/06/08/business/khan-ai-gpt-tutoring-bot.html
- https://www.nytimes.com/2023/12/13/technology/chatbot-cheating-schools-students.html
- https://www.reddit.com/r/ChatGPT/comments/15et6f2/well_i_got_what_i_asked_for/
- https://www.reddit.com/r/OpenAI/comments/xlvygv/artifical_intelligence_allows_me_to_get_straight/
- https://www.supersimple.io/blog/gpt-4-fine-tuning-early-access
- https://www.vice.com/en/article/5d93p3/what-happens-when-you-ask-ai-to-control-your-life
- https://www.wired.com/story/china-chatgpt-opportunists-grifters-hard-at-work/
Bibliography
- https://arxiv.org/abs/2410.13893: “Can LLMs Be Scammed? A Baseline Measurement Study”, Sehwag et al 2024
- https://arxiv.org/abs/2406.18518#salesforce: “APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets”, Liu et al 2024
- https://arxiv.org/abs/2406.07882: “Designing a Dashboard for Transparency and Control of Conversational AI”, Chen et al 2024
- https://www.sciencedirect.com/science/article/pii/S2666920X24000109: “Do Teachers Spot AI? Evaluating the Detectability of AI-Generated Texts among Student Essays”, Fleckenstein et al 2024
- https://arxiv.org/abs/2405.18870#google: “LLMs Achieve Adult Human Performance on Higher-Order Theory of Mind Tasks”, Street et al 2024
- https://arxiv.org/abs/2403.18624: “Vulnerability Detection With Code Language Models: How Far Are We?”, Ding et al 2024
- https://www.wired.com/story/fast-forward-nsa-warns-us-adversaries-private-data-ai-edge/: “The NSA Warns That US Adversaries Free to Mine Private Data May Have an AI Edge”, Knight 2024
- https://arxiv.org/abs/2402.19450: “Functional Benchmarks for Robust Evaluation of Reasoning Performance, and the Reasoning Gap”, Srivastava et al 2024
- https://arxiv.org/abs/2402.14903: “Tokenization Counts: the Impact of Tokenization on Arithmetic in Frontier LLMs”, Singh & Strouse 2024
- https://arxiv.org/abs/2402.11753: “ArtPrompt: ASCII Art-Based Jailbreak Attacks against Aligned LLMs”, Jiang et al 2024
- https://arxiv.org/abs/2310.08678: “Can GPT Models Be Financial Analysts? An Evaluation of ChatGPT and GPT-4 on Mock CFA Exams”, Callanan et al 2023
- https://arxiv.org/abs/2310.06213: “GeoLLM: Extracting Geospatial Knowledge from Large Language Models”, Manvi et al 2023
- 2023-phillips.pdf: “Can a Computer Outfake a Human [Personality]?”, Phillips & Robie 2023
- https://arxiv.org/abs/2310.04406: “Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models”, Zhou et al 2023
- https://arxiv.org/abs/2309.12269: “The Cambridge Law Corpus: A Corpus for Legal AI Research”, Östling et al 2023
- https://arxiv.org/abs/2309.00667: “Taken out of Context: On Measuring Situational Awareness in LLMs”, Berglund et al 2023
- 2024-banker.pdf: “Machine-Assisted Social Psychology Hypothesis Generation”, Banker et al 2023
- https://arxiv.org/abs/2307.06439#microsoft: “Distilling Large Language Models for Biomedical Knowledge Extraction: A Case Study on Adverse Drug Events”, Gu et al 2023
- https://arxiv.org/abs/2307.05300#microsoft: “Unleashing the Emergent Cognitive Synergy in Large Language Models: A Task-Solving Agent through Multi-Persona Self-Collaboration”, Wang et al 2023
- https://arxiv.org/abs/2308.01404: “Hoodwinked: Deception and Cooperation in a Text-Based Game for Language Models”, O’Gara 2023
- https://arxiv.org/abs/2306.15448: “Understanding Social Reasoning in Language Models With Language Models”, Gandhi et al 2023
- https://arxiv.org/abs/2305.15717: “The False Promise of Imitating Proprietary LLMs”, Gudibande et al 2023
- https://arxiv.org/abs/2305.13534: “How Language Model Hallucinations Can Snowball”, Zhang et al 2023
- https://www.medrxiv.org/content/10.1101/2023.03.24.23287731.full: “Performance of ChatGPT on Free-Response, Clinical Reasoning Exams”, Strong et al 2023
- https://arxiv.org/abs/2304.02015#alibaba: “How Well Do Large Language Models Perform in Arithmetic Tasks?”, Yuan et al 2023
- https://arxiv.org/abs/2303.03846#google: “Larger Language Models Do In-Context Learning Differently”, Wei et al 2023
- https://arxiv.org/abs/2302.06476: “Is ChatGPT a General-Purpose Natural Language Processing Task Solver?”, Qin et al 2023
- 2022-kolt.pdf: “Predicting Consumer Contracts [With GPT-3]”, Kolt 2023
- https://www.vice.com/en/article/k7bdmv/judge-used-chatgpt-to-make-court-decision: “A Judge Just Used ChatGPT to Make a Court Decision: The Case Is the First Time a Court Has Admitted to Using the AI Text Generator’s Answers in a Legal Ruling”, Rose 2023
- https://arxiv.org/abs/2302.00560
: “Co-Writing With Opinionated Language Models Affects Users’ Views”, -
https://nunosempere.com/blog/2023/01/11/can-gpt-produce-ideas/#if-you-never-miss-a-plane
: “Can GPT-3 Produce New Ideas? Partially Automating Robin Hanson and Others § If You Never Miss a Plane…”, -
https://mededu.jmir.org/2023/1/e45312/
: “How Does ChatGPT Perform on the United States Medical Licensing Examination? The Implications of Large Language Models for Medical Education and Knowledge Assessment”, -
https://arxiv.org/abs/2212.14402
: “GPT-3 Takes the Bar Exam”, -
https://arxiv.org/abs/2212.10496
: “Precise Zero-Shot Dense Retrieval without Relevance Labels”, -
https://arxiv.org/abs/2212.10560
: “Self-Instruct: Aligning Language Models With Self-Generated Instructions”, -
https://techcrunch.com/2022/11/23/harvey-which-uses-ai-to-answer-legal-questions-lands-cash-from-openai/
: “Harvey, Which Uses AI to Answer Legal Questions, Lands Cash from OpenAI”, -
https://arxiv.org/abs/2210.03350#allen
: “Self-Ask: Measuring and Narrowing the Compositionality Gap in Language Models (Bamboogle)”, -
https://content.iospress.com/articles/argument-and-computation/aac210026
: “How Persuasive Is AI-Generated Argumentation? An Analysis of the Quality of an Argumentative Text Produced by the GPT-3 AI Text Generator”, -
https://arxiv.org/abs/2209.03320
: “What Does a Platypus Look Like? Generating Customized Prompts for Zero-Shot Image Classification (CuPL)”, -
2022-gpt3.pdf#page=2
: “Can GPT-3 Write an Academic Paper on Itself, With Minimal Human Input?”, -
https://arxiv.org/abs/2205.12910#allen
: “NaturalProver: Grounded Mathematical Proof Generation With Language Models”, -
https://arxiv.org/abs/2202.12837#facebook
: “Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?”, -
https://arxiv.org/abs/2201.05320#allen
: “CommonsenseQA 2.0: Exposing the Limits of AI through Gamification”, -
2022-tu.pdf
: “Limits of Using Artificial Intelligence and GPT-3 in Patent Prosecution”, -
https://aclanthology.org/2021.mrqa-1.7.pdf
: “What Can a Generative Language Model Answer About a Passage?”, -
https://arxiv.org/abs/2010.14701#openai
: “Scaling Laws for Autoregressive Generative Modeling”, -
https://arxiv.org/abs/2009.03300
: “MMLU: Measuring Massive Multitask Language Understanding”,