- See Also
- Links
- “OmegaPRM: Improve Mathematical Reasoning in Language Models by Automated Process Supervision”, Luo et al 2024
- “To Believe or Not to Believe Your LLM”, Yadkori et al 2024
- “Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-Modal LLMs in Video Analysis”, Fu et al 2024
- “LLMs Achieve Adult Human Performance on Higher-Order Theory of Mind Tasks”, Street et al 2024
- “VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?”, Liu et al 2024
- “ArtPrompt: ASCII Art-Based Jailbreak Attacks against Aligned LLMs”, Jiang et al 2024
- “Beyond Memorization: Violating Privacy Via Inference With Large Language Models”, Staab et al 2023
- “HyperAttention: Long-Context Attention in Near-Linear Time”, Han et al 2023
- “FreshLLMs: Refreshing Large Language Models With Search Engine Augmentation”, Vu et al 2023
- “How Robust Is Google’s Bard to Adversarial Image Attacks?”, Dong et al 2023
- “Devising and Detecting Phishing: Large Language Models vs. Smaller Human Models”, Heiding et al 2023
- “CausalLM Is Not Optimal for In-Context Learning”, Ding et al 2023
- “Simple Synthetic Data Reduces Sycophancy in Large Language Models”, Wei et al 2023
- “Large Language Models Are Few-Shot Health Learners”, Liu et al 2023
- “SeeGULL: A Stereotype Benchmark With Broad Geo-Cultural Coverage Leveraging Generative Models”, Jha et al 2023
- “Q2d: Turning Questions into Dialogs to Teach Models How to Search”, Bitton et al 2023
- “Larger Language Models Do In-Context Learning Differently”, Wei et al 2023
- “Characterizing Attribution and Fluency Tradeoffs for Retrieval-Augmented Large Language Models”, Aksitov et al 2023
- “Interactive-Chain-Prompting (INTERCPT): Ambiguity Resolution for Crosslingual Conditional Generation With Interaction”, Pilault et al 2023
- “Memory Augmented Large Language Models Are Computationally Universal”, Schuurmans 2023
- “Med-PaLM: Large Language Models Encode Clinical Knowledge”, Singhal et al 2022
- “Character-Aware Models Improve Visual Text Rendering”, Liu et al 2022
- “Efficiently Scaling Transformer Inference”, Pope et al 2022
- “U-PaLM: Transcending Scaling Laws With 0.1% Extra Compute”, Tay et al 2022
- “FLAN: Scaling Instruction-Finetuned Language Models”, Chung et al 2022
- “Large Language Models Can Self-Improve”, Huang et al 2022
- “RARR: Attributed Text Generation via Post-Hoc Research and Revision”, Gao et al 2022
- “Challenging BIG-Bench Tasks (BBH) and Whether Chain-Of-Thought Can Solve Them”, Suzgun et al 2022
- “Language Models Are Multilingual Chain-Of-Thought Reasoners”, Shi et al 2022
- “ReAct: Synergizing Reasoning and Acting in Language Models”, Yao et al 2022
- “AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model”, Soltan et al 2022
- “Inner Monologue: Embodied Reasoning through Planning With Language Models”, Huang et al 2022
- “Solving Quantitative Reasoning Problems With Language Models”, Lewkowycz et al 2022
- “Least-To-Most Prompting Enables Complex Reasoning in Large Language Models”, Zhou et al 2022
- “Unifying Language Learning Paradigms”, Tay et al 2022
- “PaLM: Scaling Language Modeling With Pathways”, Chowdhery et al 2022
- “Do As I Can, Not As I Say (SayCan): Grounding Language in Robotic Affordances”, Ahn et al 2022
- “Pathways Language Model (PaLM): Scaling to 540 Billion Parameters for Breakthrough Performance”, Chowdhery & Narang 2022
- “PaLM § Figure 19: [Explaining a Joke / Inference Chaining] Each ‘Input’ Was Independently Prepended With the Same 2-Shot Exemplar Shown at the Top, and ‘Model Output’ Shows the Greedy Decoding Output of PaLM 540B. The Two Exemplar Jokes Are Known Jokes (explanations Written by Authors), While All Evaluated Jokes Were Written by the Authors. Of Course, These Jokes Do Share Abstract Premises With Existing Jokes (wordplay, Reliability, Humorous Analogies, Reversal-Of-Expectations). The Inference Chaining Examples Were Also Written by the Authors.”
- “AI Will Increase the Quantity—And Quality—Of Phishing Scams”
- Sort By Magic
- Miscellaneous
- Bibliography
See Also
Links
“OmegaPRM: Improve Mathematical Reasoning in Language Models by Automated Process Supervision”, Luo et al 2024
“To Believe or Not to Believe Your LLM”, Yadkori et al 2024
“Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-Modal LLMs in Video Analysis”, Fu et al 2024
“LLMs Achieve Adult Human Performance on Higher-Order Theory of Mind Tasks”, Street et al 2024
“VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?”, Liu et al 2024
“ArtPrompt: ASCII Art-Based Jailbreak Attacks against Aligned LLMs”, Jiang et al 2024
“Beyond Memorization: Violating Privacy Via Inference With Large Language Models”, Staab et al 2023
“HyperAttention: Long-Context Attention in Near-Linear Time”, Han et al 2023
“FreshLLMs: Refreshing Large Language Models With Search Engine Augmentation”, Vu et al 2023
“How Robust Is Google’s Bard to Adversarial Image Attacks?”, Dong et al 2023
“Devising and Detecting Phishing: Large Language Models vs. Smaller Human Models”, Heiding et al 2023
“CausalLM Is Not Optimal for In-Context Learning”, Ding et al 2023
“Simple Synthetic Data Reduces Sycophancy in Large Language Models”, Wei et al 2023
“Large Language Models Are Few-Shot Health Learners”, Liu et al 2023
“SeeGULL: A Stereotype Benchmark With Broad Geo-Cultural Coverage Leveraging Generative Models”, Jha et al 2023
“Q2d: Turning Questions into Dialogs to Teach Models How to Search”, Bitton et al 2023
“Larger Language Models Do In-Context Learning Differently”, Wei et al 2023
“Characterizing Attribution and Fluency Tradeoffs for Retrieval-Augmented Large Language Models”, Aksitov et al 2023
“Interactive-Chain-Prompting (INTERCPT): Ambiguity Resolution for Crosslingual Conditional Generation With Interaction”, Pilault et al 2023
“Memory Augmented Large Language Models Are Computationally Universal”, Schuurmans 2023
“Med-PaLM: Large Language Models Encode Clinical Knowledge”, Singhal et al 2022
“Character-Aware Models Improve Visual Text Rendering”, Liu et al 2022
“Efficiently Scaling Transformer Inference”, Pope et al 2022
“U-PaLM: Transcending Scaling Laws With 0.1% Extra Compute”, Tay et al 2022
“FLAN: Scaling Instruction-Finetuned Language Models”, Chung et al 2022
“Large Language Models Can Self-Improve”, Huang et al 2022
“RARR: Attributed Text Generation via Post-Hoc Research and Revision”, Gao et al 2022
“Challenging BIG-Bench Tasks (BBH) and Whether Chain-Of-Thought Can Solve Them”, Suzgun et al 2022
“Language Models Are Multilingual Chain-Of-Thought Reasoners”, Shi et al 2022
“ReAct: Synergizing Reasoning and Acting in Language Models”, Yao et al 2022
“AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model”, Soltan et al 2022
“Inner Monologue: Embodied Reasoning through Planning With Language Models”, Huang et al 2022
“Solving Quantitative Reasoning Problems With Language Models”, Lewkowycz et al 2022
“Least-To-Most Prompting Enables Complex Reasoning in Large Language Models”, Zhou et al 2022
“Unifying Language Learning Paradigms”, Tay et al 2022
“PaLM: Scaling Language Modeling With Pathways”, Chowdhery et al 2022
“Do As I Can, Not As I Say (SayCan): Grounding Language in Robotic Affordances”, Ahn et al 2022
“Pathways Language Model (PaLM): Scaling to 540 Billion Parameters for Breakthrough Performance”, Chowdhery & Narang 2022
“PaLM § Figure 19: [Explaining a Joke / Inference Chaining] Each ‘Input’ Was Independently Prepended With the Same 2-Shot Exemplar Shown at the Top, and ‘Model Output’ Shows the Greedy Decoding Output of PaLM 540B. The Two Exemplar Jokes Are Known Jokes (explanations Written by Authors), While All Evaluated Jokes Were Written by the Authors. Of Course, These Jokes Do Share Abstract Premises With Existing Jokes (wordplay, Reliability, Humorous Analogies, Reversal-Of-Expectations). The Inference Chaining Examples Were Also Written by the Authors.”
“AI Will Increase the Quantity—And Quality—Of Phishing Scams”
Sort By Magic
Annotations sorted by machine learning into inferred 'tags'. This provides an alternative way to browse: instead of by date order, one can browse in topic order. The 'sorted' list has been automatically clustered into multiple sections & auto-labeled for easier browsing.
Beginning with the newest annotation, it uses the embedding of each annotation to attempt to create a list of nearest-neighbor annotations, creating a progression of topics. For more details, see the link.
generative-models
reasoning-action
in-context-learning
prompt-engineering
robotics
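The nearest-neighbor ordering described above can be sketched as follows. This is a minimal illustration, not the site's actual code: `nearest_neighbor_order` is a hypothetical name, and cosine similarity over precomputed embedding row-vectors is an assumption.

```python
# Hypothetical sketch of greedy nearest-neighbor ordering of annotations:
# start from the newest annotation's embedding and repeatedly hop to the
# most similar unvisited one, producing a smooth progression of topics.
import numpy as np

def nearest_neighbor_order(embeddings: np.ndarray) -> list[int]:
    """Greedy nearest-neighbor chain over row-vector embeddings,
    starting from row 0 (the newest annotation)."""
    # Normalize rows so a dot product equals cosine similarity.
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    order = [0]
    unvisited = set(range(1, len(normed)))
    while unvisited:
        candidates = sorted(unvisited)
        # Similarity of each remaining annotation to the last one placed.
        sims = normed[candidates] @ normed[order[-1]]
        nxt = candidates[int(np.argmax(sims))]
        order.append(nxt)
        unvisited.remove(nxt)
    return order
```

Clustering the resulting chain into labeled sections (the tags above) would be a separate step, e.g. cutting the chain where consecutive similarities drop.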
Miscellaneous
- /doc/ai/nn/transformer/gpt/palm/2022-ahn-figure2-saycanqueryinglanguagemodelforoptions.png
- https://every.to/chain-of-thought/i-spent-a-week-with-gemini-pro-1-5-it-s-fantastic
- https://minerva-demo.github.io/#category=Algebra&index=1
- https://old.reddit.com/r/singularity/comments/1atjz9v/ive_put_a_complex_codebase_into_a_single/
- https://research.google/blog/google-research-2022-beyond-language-vision-and-generative-models/
- https://research.google/blog/minerva-solving-quantitative-reasoning-problems-with-language-models/
- https://simonwillison.net/2024/Apr/17/ai-for-data-journalism/
- https://thezvi.wordpress.com/2023/08/31/ai-27-portents-of-gemini/
- https://thezvi.wordpress.com/2024/02/27/the-gemini-incident-continues/
- https://thezvi.wordpress.com/2024/05/31/the-gemini-1-5-report/
- https://www.dwarkeshpatel.com/p/demis-hassabis#%C2%A7timestamps
- https://www.freepatentsonline.com/y2024/0104353.html#deepmind
- https://www.lesswrong.com/posts/EHbJ69JDs4suovpLw/testing-palm-prompts-on-gpt3
- https://www.lesswrong.com/posts/YzbQeCiwoLBHrvAh4
- https://www.lesswrong.com/posts/mLuQfS7gmfr4nwTdv/google-s-new-540-billion-parameter-language-model
- https://www.reddit.com/r/GPT3/comments/twxtwg/how_gpt3_answers_the_google_pathway_sample/
- https://www.reddit.com/r/singularity/comments/1atjz9v/ive_put_a_complex_codebase_into_a_single/
- https://www.semianalysis.com/p/google-gemini-eats-the-world-gemini
- https://www.theverge.com/2023/3/29/23662621/google-bard-chatgpt-sharegpt-training-denies
Bibliography
- https://arxiv.org/abs/2405.18870#google: “LLMs Achieve Adult Human Performance on Higher-Order Theory of Mind Tasks”
- https://arxiv.org/abs/2404.05955: “VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?”
- https://arxiv.org/abs/2402.11753: “ArtPrompt: ASCII Art-Based Jailbreak Attacks against Aligned LLMs”
- https://arxiv.org/abs/2310.03214#google: “FreshLLMs: Refreshing Large Language Models With Search Engine Augmentation”
- https://arxiv.org/abs/2309.11751: “How Robust Is Google’s Bard to Adversarial Image Attacks?”
- https://arxiv.org/abs/2308.12287: “Devising and Detecting Phishing: Large Language Models vs. Smaller Human Models”
- https://arxiv.org/abs/2308.03958#deepmind: “Simple Synthetic Data Reduces Sycophancy in Large Language Models”
- https://arxiv.org/abs/2305.11840#google: “SeeGULL: A Stereotype Benchmark With Broad Geo-Cultural Coverage Leveraging Generative Models”
- https://arxiv.org/abs/2304.14318#google: “Q2d: Turning Questions into Dialogs to Teach Models How to Search”
- https://arxiv.org/abs/2303.03846#google: “Larger Language Models Do In-Context Learning Differently”
- https://arxiv.org/abs/2212.13138#google: “Med-PaLM: Large Language Models Encode Clinical Knowledge”
- https://arxiv.org/abs/2212.10562#google: “Character-Aware Models Improve Visual Text Rendering”
- https://arxiv.org/abs/2211.05102#google: “Efficiently Scaling Transformer Inference”
- https://arxiv.org/abs/2210.11399#google: “U-PaLM: Transcending Scaling Laws With 0.1% Extra Compute”
- https://arxiv.org/abs/2210.11416#google: “FLAN: Scaling Instruction-Finetuned Language Models”
- https://arxiv.org/abs/2210.11610#google: “Large Language Models Can Self-Improve”
- https://arxiv.org/abs/2210.08726#google: “RARR: Attributed Text Generation via Post-Hoc Research and Revision”
- https://arxiv.org/abs/2210.09261#google: “Challenging BIG-Bench Tasks (BBH) and Whether Chain-Of-Thought Can Solve Them”
- https://arxiv.org/abs/2210.03057#google: “Language Models Are Multilingual Chain-Of-Thought Reasoners”
- https://arxiv.org/abs/2208.01448#amazon: “AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model”
- https://arxiv.org/abs/2207.05608#google: “Inner Monologue: Embodied Reasoning through Planning With Language Models”
- https://arxiv.org/abs/2205.10625#google: “Least-To-Most Prompting Enables Complex Reasoning in Large Language Models”
- https://arxiv.org/abs/2205.05131#google: “Unifying Language Learning Paradigms”
- https://arxiv.org/abs/2204.02311#google: “PaLM: Scaling Language Modeling With Pathways”
- https://arxiv.org/abs/2204.01691#google: “Do As I Can, Not As I Say (SayCan): Grounding Language in Robotic Affordances”