‘AI scaling’ tag

Annotations sorted by machine learning into inferred 'tags'. This provides an alternative way to browse: instead of by date order, one can browse in topic order. The 'sorted' list has been automatically clustered into multiple sections & auto-labeled for easier browsing.

Beginning with the newest annotation, the sort uses the embedding of each annotation and attempts to build a chain of nearest-neighbor annotations, producing a progression of topics.
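The sorting step described above can be read as a greedy nearest-neighbor walk: start at the newest annotation, then repeatedly hop to the most similar annotation not yet visited. The sketch below is an illustration under stated assumptions, not the site's actual implementation. The embedding model, the similarity metric (cosine similarity is assumed here), the tie-breaking rule, and the helper names `cosine_similarity` and `greedy_topic_order` are all hypothetical, and the later clustering and auto-labeling steps are not modeled.

```python
from __future__ import annotations

import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as unrelated to everything
    return dot / (norm_a * norm_b)


def greedy_topic_order(embeddings: Sequence[Sequence[float]], start: int = 0) -> list[int]:
    """Hypothetical helper: order annotations by greedy nearest-neighbor hops.

    `start` is the index of the newest annotation (assumed to be 0).
    Returns a permutation of range(len(embeddings)).
    """
    n = len(embeddings)
    if n == 0:
        return []
    order = [start]
    unvisited = set(range(n)) - {start}
    current = start
    while unvisited:
        # Choose the unvisited annotation most similar to the current one;
        # on ties, prefer the lower index so the result is deterministic.
        nxt = max(
            unvisited,
            key=lambda j: (cosine_similarity(embeddings[current], embeddings[j]), -j),
        )
        order.append(nxt)
        unvisited.remove(nxt)
        current = nxt
    return order


# Toy example: two "topics" along the x-axis and y-axis directions.
print(greedy_topic_order([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]))
# prints [0, 2, 3, 1]
```

The greedy walk costs O(n²) similarity computations and, like the nearest-neighbor heuristic for the traveling-salesman problem, does not guarantee the smoothest overall path: each hop is locally optimal, so the chain can occasionally jump between unrelated topics once a neighborhood is exhausted.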

Wikipedia

Miscellaneous

Bibliography

https://arxiv.org/abs/2410.18514: “Scaling up Masked Diffusion Models on Text”, Shen Nie, Fengqi Zhu, Chao Du, Tianyu Pang, Qian Liu, Guangtao Zeng, Min Lin, Chongxuan Li

https://research.google/blog/taking-medical-imaging-embeddings-3d/: “CT Foundation: Taking Medical Imaging Embeddings 3D”, Atilla Kiraly, Madeleine Traverse

https://arxiv.org/abs/2407.04108: “Future Events As Backdoor Triggers: Investigating Temporal Vulnerabilities in LLMs”, Sara Price, Arjun Panickssery, Samuel R. Bowman, Asa Cooper Stickland

https://arxiv.org/abs/2406.19146: “Resolving Discrepancies in Compute-Optimal Scaling of Language Models”, Tomer Porian, Mitchell Wortsman, Jenia Jitsev, Ludwig Schmidt, Yair Carmon

https://arxiv.org/abs/2406.13121#google: “Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?”, Jinhyuk Lee, Anthony Chen, Zhuyun Dai, Dheeru Dua, Devendra Singh Sachan, Michael Boratko, Yi Luan, Sébastien M. R. Arnold, Vincent Perot, Siddharth Dalmia, Hexiang Hu, Xudong Lin, Panupong Pasupat, Aida Amini, Jeremy R. Cole, Sebastian Riedel, Iftekhar Naim, Ming-Wei Chang, Kelvin Guu

https://arxiv.org/abs/2406.11233: “Probing the Decision Boundaries of In-Context Learning in Large Language Models”, Siyan Zhao, Tung Nguyen, Aditya Grover

https://www.biorxiv.org/content/10.1101/2024.06.06.597716.full: “Training Compute-Optimal Protein Language Models”, Xingyi Cheng, Bo Chen, Pan Li, Jing Gong, Jie Tang, Le Song

https://arxiv.org/abs/2405.14930: “AstroPT: Scaling Large Observation Models for Astronomy”, Michael J. Smith, Ryan J. Roberts, Eirini Angeloudi, Marc Huertas-Company

https://arxiv.org/abs/2405.00332#scale: “GSM1k: A Careful Examination of Large Language Model Performance on Grade School Arithmetic”, Hugh Zhang, Jeff Da, Dean Lee, Vaughn Robinson, Catherine Wu, Will Song, Tiffany Zhao, Pranav Raja, Dylan Slack, Qin Lyu, Sean Hendryx, Russell Kaplan, Michele Lunati, Summer Yue

https://lab42.global/community-interview-jack-cole/: “Test-Time Augmentation to Solve ARC”, Jack Cole

https://arxiv.org/abs/2404.10102: “Chinchilla Scaling: A Replication Attempt”, Tamay Besiroglu, Ege Erdil, Matthew Barnett, Josh You

https://arxiv.org/abs/2404.06664: “CulturalTeaming: AI-Assisted Interactive Red-Teaming for Challenging LLMs’ (Lack Of) Multicultural Knowledge”, Yu Ying Chiu, Liwei Jiang, Maria Antoniak, Chan Young Park, Shuyue Stella Li, Mehar Bhatia, Sahithya Ravi, Yulia Tsvetkov, Vered Shwartz, Yejin Choi

https://arxiv.org/abs/2404.02905#bytedance: “Visual Autoregressive Modeling (VAR): Scalable Image Generation via Next-Scale Prediction”, Keyu Tian, Yi Jiang, Zehuan Yuan, Bingyue Peng, Liwei Wang

https://arxiv.org/abs/2403.18802#deepmind: “Long-Form Factuality in Large Language Models”, Jerry Wei, Chengrun Yang, Xinying Song, Yifeng Lu, Nathan Hu, Jie Huang, Dustin Tran, Daiyi Peng, Ruibo Liu, Da Huang, Cosmo Du, Quoc V. Le

https://arxiv.org/abs/2403.17844: “Mechanistic Design and Scaling of Hybrid Architectures”, Michael Poli, Armin W. Thomas, Eric Nguyen, Pragaash Ponnusamy, Björn Deiseroth, Kristian Kersting, Taiji Suzuki, Brian Hie, Stefano Ermon, Christopher Ré, Ce Zhang, Stefano Massaroli

https://www.wired.com/story/eight-google-employees-invented-modern-ai-transformers-paper/: “8 Google Employees Invented Modern AI. Here’s the Inside Story: They Met by Chance, Got Hooked on an Idea, and Wrote the Transformers Paper—The Most Consequential Tech Breakthrough in Recent History”, Steven Levy

https://inflection.ai/inflection-2-5: “Inflection-2.5: Meet the World’s Best Personal AI”, Inflection

https://arxiv.org/abs/2402.17152#facebook: “Actions Speak Louder Than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations (HSTU)”, Jiaqi Zhai, Lucy Liao, Xing Liu, Yueming Wang, Rui Li, Xuan Cao, Leon Gao, Zhaojie Gong, Fangda Gu, Michael He, Yinghai Lu, Yu Shi

https://arxiv.org/abs/2402.17764: “The Era of 1-Bit LLMs: All Large Language Models Are in 1.58 Bits”, Shuming Ma, Hongyu Wang, Lingxiao Ma, Lei Wang, Wenhui Wang, Shaohan Huang, Li Dong, Ruiping Wang, Jilong Xue, Furu Wei

https://arxiv.org/abs/2402.16671: “StructLM: Towards Building Generalist Models for Structured Knowledge Grounding”, Alex Zhuang, Ge Zhang, Tianyu Zheng, Xinrun Du, Junjie Wang, Weiming Ren, Stephen W. Huang, Jie Fu, Xiang Yue, Wenhu Chen

https://arxiv.org/abs/2312.15770#alibaba: “TF-T2V: A Recipe for Scaling up Text-To-Video Generation With Text-Free Videos”, Xiang Wang, Shiwei Zhang, Hangjie Yuan, Zhiwu Qing, Biao Gong, Yingya Zhang, Yujun Shen, Changxin Gao, Nong Sang

https://arxiv.org/abs/2312.04927: “Zoology: Measuring and Improving Recall in Efficient Language Models”, Simran Arora, Sabri Eyuboglu, Aman Timalsina, Isys Johnson, Michael Poli, James Zou, Atri Rudra, Christopher Ré

https://arxiv.org/abs/2312.03876: “Scaling Transformer Neural Networks for Skillful and Reliable Medium-Range Weather Forecasting”, Tung Nguyen, Rohan Shah, Hritik Bansal, Troy Arcomano, Sandeep Madireddy, Romit Maulik, Veerabhadra Kotamarthi, Ian Foster, Aditya Grover

https://arxiv.org/abs/2312.00752: “Mamba: Linear-Time Sequence Modeling With Selective State Spaces”, Albert Gu, Tri Dao

https://arxiv.org/abs/2311.15599#tencent: “UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition”, Xiaohan Ding, Yiyuan Zhang, Yixiao Ge, Sijie Zhao, Lin Song, Xiangyu Yue, Ying Shan

https://arxiv.org/abs/2311.04145#alibaba: “I2VGen-XL: High-Quality Image-To-Video Synthesis via Cascaded Diffusion Models”, Shiwei Zhang, Jiayu Wang, Yingya Zhang, Kang Zhao, Hangjie Yuan, Zhiwu Qin, Xiang Wang, Deli Zhao, Jingren Zhou

https://arxiv.org/abs/2310.16764#deepmind: “ConvNets Match Vision Transformers at Scale”, Samuel L. Smith, Andrew Brock, Leonard Berrada, Soham De

https://arxiv.org/abs/2310.09199#google: “PaLI-3 Vision Language Models: Smaller, Faster, Stronger”, Xi Chen, Xiao Wang, Lucas Beyer, Alexander Kolesnikov, Jialin Wu, Paul Voigtlaender, Basil Mustafa, Sebastian Goodman, Ibrahim Alabdulmohsin, Piotr Padlewski, Daniel Salz, Xi Xiong, Daniel Vlasic, Filip Pavetic, Keran Rong, Tianli Yu, Daniel Keysers, Xiaohua Zhai, Radu Soricut

https://arxiv.org/abs/2310.06213: “GeoLLM: Extracting Geospatial Knowledge from Large Language Models”, Rohin Manvi, Samar Khanna, Gengchen Mai, Marshall Burke, David Lobell, Stefano Ermon

https://arxiv.org/abs/2310.06694: “Sheared LLaMA: Accelerating Language Model Pre-Training via Structured Pruning”, Mengzhou Xia, Tianyu Gao, Zhiyuan Zeng, Danqi Chen

https://arxiv.org/abs/2310.03214#google: “FreshLLMs: Refreshing Large Language Models With Search Engine Augmentation”, Tu Vu, Mohit Iyyer, Xuezhi Wang, Noah Constant, Jerry Wei, Jason Wei, Chris Tar, Yun-Hsuan Sung, Denny Zhou, Quoc Le, Thang Luong

https://arxiv.org/abs/2310.02980: “Never Train from Scratch: Fair Comparison of Long-Sequence Models Requires Data-Driven Priors”, Ido Amos, Jonathan Berant, Ankit Gupta

https://arxiv.org/abs/2309.00667: “Taken out of Context: On Measuring Situational Awareness in LLMs”, Lukas Berglund, Asa Cooper Stickland, Mikita Balesni, Max Kaufmann, Meg Tong, Tomasz Korbak, Daniel Kokotajlo, Owain Evans

https://arxiv.org/abs/2308.11596#facebook: “SeamlessM4T: Massively Multilingual & Multimodal Machine Translation”, Seamless Communication, Loïc Barrault, Yu-An Chung, Mariano Cora Meglioli, David Dale, Ning Dong, Paul-Ambroise Duquenne, Hady Elsahar, Hongyu Gong, Kevin Heffernan, John Hoffman, Christopher Klaiber, Pengwei Li, Daniel Licht, Jean Maillard, Alice Rakotoarison, Kaushik Ram Sadagopan, Guillaume Wenzek, Ethan Ye, Bapi Akula, Peng-Jen Chen, Naji El Hachem, Brian Ellis, Gabriel Mejia Gonzalez, Justin Haaheim, Prangthip Hansanti, Russ Howes, Bernie Huang, Min-Jae Hwang, Hirofumi Inaguma, Somya Jain, Elahe Kalbassi, Amanda Kallet, Ilia Kulikov, Janice Lam, Daniel Li, Xutai Ma, Ruslan Mavlyutov, Benjamin Peloquin, Mohamed Ramadan, Abinesh Ramakrishnan, Anna Sun, Kevin Tran, Tuan Tran, Igor Tufanov, Vish Vogeti, Carleigh Wood, Yilin Yang, Bokai Yu, Pierre Andrews, Can Balioglu, Marta R. Costa-jussà, Onur Celebi, Maha Elbayad, Cynthia Gao, Francisco Guzmán, Justine Kao, Ann Lee, Alexandre Mourachko, Juan Pino, Sravya Popuri, Christophe Ropers, Safiyyah Saleem, Holger Schwenk, Paden Tomasello, Changhan Wang, Jeff Wang, Skyler Wang

https://arxiv.org/abs/2308.03958#deepmind: “Simple Synthetic Data Reduces Sycophancy in Large Language Models”, Jerry Wei, Da Huang, Yifeng Lu, Denny Zhou, Quoc V. Le

https://arxiv.org/abs/2307.05300#microsoft: “Unleashing the Emergent Cognitive Synergy in Large Language Models: A Task-Solving Agent through Multi-Persona Self-Collaboration”, Zhenhailong Wang, Shaoguang Mao, Wenshan Wu, Tao Ge, Furu Wei, Heng Ji

https://openai.com/index/introducing-superalignment/: “Introducing Superalignment”, Jan Leike, Ilya Sutskever

https://www.youtube.com/watch?v=lfXxzAVtdpU&t=1763s: “Gödel, Escher, Bach Author Douglas Hofstadter on the State of AI Today § What about AI Terrifies You?”, Douglas Hofstadter, Amy Jo Kim

https://arxiv.org/abs/2306.13575: “Scaling MLPs: A Tale of Inductive Bias”, Gregor Bachmann, Sotiris Anagnostidis, Thomas Hofmann

https://arxiv.org/abs/2306.15448: “Understanding Social Reasoning in Language Models With Language Models”, Kanishk Gandhi, Jan-Philipp Fränken, Tobias Gerstenberg, Noah D. Goodman

https://arxiv.org/abs/2305.15717: “The False Promise of Imitating Proprietary LLMs”, Arnav Gudibande, Eric Wallace, Charlie Snell, Xinyang Geng, Hao Liu, Pieter Abbeel, Sergey Levine, Dawn Song

https://arxiv.org/abs/2305.11863: “Scaling Laws for Language Encoding Models in FMRI”, Richard Antonello, Aditya Vaidya, Alexander G. Huth

https://www.cnbc.com/2023/05/16/googles-palm-2-uses-nearly-five-times-more-text-data-than-predecessor.html: “Google’s Newest AI Model Uses Nearly 5× More Text Data for Training Than Its Predecessor”, Jennifer Elias

https://arxiv.org/abs/2305.07759#microsoft: “TinyStories: How Small Can Language Models Be and Still Speak Coherent English?”, Ronen Eldan, Yuanzhi Li

https://arxiv.org/abs/2305.05665#facebook: “ImageBind: One Embedding Space To Bind Them All”, Rohit Girdhar, Alaaeldin El-Nouby, Zhuang Liu, Mannat Singh, Kalyan Vasudev Alwala, Armand Joulin, Ishan Misra

https://www.ft.com/content/f4f73815-6fc2-4016-bd97-4bace459e95e: “Google’s DeepMind-Brain Merger: Tech Giant Regroups for AI Battle”, Madhumita Murgia

https://arxiv.org/abs/2304.07193#facebook: “DINOv2: Learning Robust Visual Features without Supervision”, Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Mahmoud Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Hervé Jegou, Julien Mairal, Patrick Labatut, Armand Joulin, Piotr Bojanowski

https://arxiv.org/abs/2303.15343#google: “Sigmoid Loss for Language Image Pre-Training”, Xiaohua Zhai, Basil Mustafa, Alexander Kolesnikov, Lucas Beyer

https://arxiv.org/abs/2304.02015#alibaba: “How Well Do Large Language Models Perform in Arithmetic Tasks?”, Zheng Yuan, Hongyi Yuan, Chuanqi Tan, Wei Wang, Songfang Huang

https://jameswphillips.substack.com/p/securing-liberal-democratic-control: “Securing Liberal Democratic Control of AGI through UK Leadership”, James W. Phillips

https://arxiv.org/abs/2303.05511#adobe: “GigaGAN: Scaling up GANs for Text-To-Image Synthesis”, Minguk Kang, Jun-Yan Zhu, Richard Zhang, Jaesik Park, Eli Shechtman, Sylvain Paris, Taesung Park

https://arxiv.org/abs/2302.05442#google: “Scaling Vision Transformers to 22 Billion Parameters”, Mostafa Dehghani, Josip Djolonga, Basil Mustafa, Piotr Padlewski, Jonathan Heek, Justin Gilmer, Andreas Steiner, Mathilde Caron, Robert Geirhos, Ibrahim Alabdulmohsin, Rodolphe Jenatton, Lucas Beyer, Michael Tschannen, Anurag Arnab, Xiao Wang, Carlos Riquelme, Matthias Minderer, Joan Puigcerver, Utku Evci, Manoj Kumar, Sjoerd van Steenkiste, Gamaleldin F. Elsayed, Aravindh Mahendran, Fisher Yu, Avital Oliver, Fantine Huot, Jasmijn Bastings, Mark Patrick Collier, Alexey Gritsenko, Vighnesh Birodkar, Cristina Vasconcelos, Yi Tay, Thomas Mensink, Alexander Kolesnikov, Filip Pavetić, Dustin Tran, Thomas Kipf, Mario Lučić, Xiaohua Zhai, Daniel Keysers, Jeremiah Harmsen, Neil Houlsby

https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4335945: “Large Language Models As Fiduciaries: A Case Study Toward Robustly Communicating With Artificial Intelligence Through Legal Standards”, John Nay

https://arxiv.org/abs/2301.09515#nvidia: “StyleGAN-T: Unlocking the Power of GANs for Fast Large-Scale Text-To-Image Synthesis”, Axel Sauer, Tero Karras, Samuli Laine, Andreas Geiger, Timo Aila

https://arxiv.org/abs/2301.07088#bytedance: “MUG: Vision Learners Meet Web Image-Text Pairs”, Bingchen Zhao, Quan Cui, Hao Wu, Osamu Yoshie, Cheng Yang

https://arxiv.org/abs/2301.04408: “GPT-3 As Knowledge Worker: A Zero-Shot Evaluation of AI CPA Capabilities”, Jillian Bommarito, Michael Bommarito, Daniel Martin Katz, Jessica Katz

https://arxiv.org/abs/2301.03728#facebook: “Scaling Laws for Generative Mixed-Modal Language Models”, Armen Aghajanyan, Lili Yu, Alexis Conneau, Wei-Ning Hsu, Karen Hambardzumyan, Susan Zhang, Stephen Roller, Naman Goyal, Omer Levy, Luke Zettlemoyer

https://arxiv.org/abs/2301.02111#microsoft: “VALL-E: Neural Codec Language Models Are Zero-Shot Text to Speech Synthesizers”, Chengyi Wang, Sanyuan Chen, Yu Wu, Ziqiang Zhang, Long Zhou, Shujie Liu, Zhuo Chen, Yanqing Liu, Huaming Wang, Jinyu Li, Lei He, Sheng Zhao, Furu Wei

https://arxiv.org/abs/2212.14402: “GPT-3 Takes the Bar Exam”, Michael Bommarito II, Daniel Martin Katz

https://arxiv.org/abs/2212.14034: “Cramming: Training a Language Model on a Single GPU in One Day”, Jonas Geiping, Tom Goldstein

https://arxiv.org/abs/2212.09741: “One Embedder, Any Task: Instruction-Finetuned Text Embeddings (INSTRUCTOR)”, Hongjin Su, Weijia Shi, Jungo Kasai, Yizhong Wang, Yushi Hu, Mari Ostendorf, Wen-tau Yih, Noah Smith, Luke Zettlemoyer, Tao Yu

https://arxiv.org/abs/2212.07143: “Reproducible Scaling Laws for Contrastive Language-Image Learning”, Mehdi Cherti, Romain Beaumont, Ross Wightman, Mitchell Wortsman, Gabriel Ilharco, Cade Gordon, Christoph Schuhmann, Ludwig Schmidt, Jenia Jitsev

https://arxiv.org/abs/2212.04979#google: “VideoCoCa: Video-Text Modeling With Zero-Shot Transfer from Contrastive Captioners”, Shen Yan, Tao Zhu, Zirui Wang, Yuan Cao, Mi Zhang, Soham Ghosh, Yonghui Wu, Jiahui Yu

https://arxiv.org/abs/2212.05051: “VindLU: A Recipe for Effective Video-And-Language Pretraining”, Feng Cheng, Xizi Wang, Jie Lei, David Crandall, Mohit Bansal, Gedas Bertasius

https://arxiv.org/abs/2212.04356#openai: “Whisper: Robust Speech Recognition via Large-Scale Weak Supervision”, Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey, Ilya Sutskever

https://ai.facebook.com/blog/multiray-large-scale-AI-models/: “MultiRay: Optimizing Efficiency for Large-Scale AI Models”, Nikhil Gupta, Michael Gschwind, Don Husa, Christopher Dewan, Madian Khabsa

https://arxiv.org/abs/2211.09085#facebook: “Galactica: A Large Language Model for Science”, Ross Taylor, Marcin Kardas, Guillem Cucurull, Thomas Scialom, Anthony Hartshorn, Elvis Saravia, Andrew Poulton, Viktor Kerkez, Robert Stojnic

https://arxiv.org/abs/2211.08411: “Large Language Models Struggle to Learn Long-Tail Knowledge”, Nikhil Kandpal, Haikang Deng, Adam Roberts, Eric Wallace, Colin Raffel

https://arxiv.org/abs/2211.07636#baai: “EVA: Exploring the Limits of Masked Visual Representation Learning at Scale”, Yuxin Fang, Wen Wang, Binhui Xie, Quan Sun, Ledell Wu, Xinggang Wang, Tiejun Huang, Xinlong Wang, Yue Cao

https://arxiv.org/abs/2211.00241: “Adversarial Policies Beat Superhuman Go AIs”, Tony T. Wang, Adam Gleave, Tom Tseng, Kellin Pelrine, Nora Belrose, Joseph Miller, Michael D. Dennis, Yawen Duan, Viktor Pogrebniak, Sergey Levine, Stuart Russell

https://www.youtube.com/watch?v=Q-TJFyUoenc&t=2444s: “Increments Podcast: #45—4 Central Fallacies of AI Research (with Melanie Mitchell)”, Melanie Mitchell, Benny Chugg

https://arxiv.org/abs/2210.16859: “A Solvable Model of Neural Scaling Laws”, Alexander Maloney, Daniel A. Roberts, James Sully

https://arxiv.org/abs/2210.13673#nvidia: “Evaluating Parameter Efficient Learning for Generation”, Peng Xu, Mostofa Patwary, Shrimai Prabhumoye, Virginia Adams, Ryan J. Prenger, Wei Ping, Nayeon Lee, Mohammad Shoeybi, Bryan Catanzaro

https://arxiv.org/abs/2210.11416#google: “FLAN: Scaling Instruction-Finetuned Language Models”, Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Yunxuan Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, Albert Webson, Shixiang Shane Gu, Zhuyun Dai, Mirac Suzgun, Xinyun Chen, Aakanksha Chowdhery, Alex Castro-Ros, Marie Pellat, Kevin Robinson, Dasha Valter, Sharan Narang, Gaurav Mishra, Adams Yu, Vincent Zhao, Yanping Huang, Andrew Dai, Hongkun Yu, Slav Petrov, Ed H. Chi, Jeff Dean, Jacob Devlin, Adam Roberts, Denny Zhou, Quoc V. Le, Jason Wei

https://arxiv.org/abs/2210.10341#microsoft: “BioGPT: Generative Pre-Trained Transformer for Biomedical Text Generation and Mining”, Renqian Luo, Liai Sun, Yingce Xia, Tao Qin, Sheng Zhang, Hoifung Poon, Tie-Yan Liu

https://arxiv.org/abs/2210.06423#microsoft: “Foundation Transformers”, Hongyu Wang, Shuming Ma, Shaohan Huang, Li Dong, Wenhui Wang, Zhiliang Peng, Yu Wu, Payal Bajaj, Saksham Singhal, Alon Benhaim, Barun Patra, Zhun Liu, Vishrav Chaudhary, Xia Song, Furu Wei

https://arxiv.org/abs/2210.03350#allen: “Self-Ask: Measuring and Narrowing the Compositionality Gap in Language Models (Bamboogle)”, Ofir Press, Muru Zhang, Sewon Min, Ludwig Schmidt, Noah A. Smith, Mike Lewis

https://arxiv.org/abs/2210.02414#baai: “GLM-130B: An Open Bilingual Pre-Trained Model”, Aohan Zeng, Xiao Liu, Zhengxiao Du, Zihan Wang, Hanyu Lai, Ming Ding, Zhuoyi Yang, Yifan Xu, Wendi Zheng, Xiao Xia, Weng Lam Tam, Zixuan Ma, Yufei Xue, Jidong Zhai, Wenguang Chen, Peng Zhang, Yuxiao Dong, Jie Tang

https://arxiv.org/abs/2210.02441: “Ask Me Anything (AMA): A Simple Strategy for Prompting Language Models”, Simran Arora, Avanika Narayan, Mayee F. Chen, Laurel Orr, Neel Guha, Kush Bhatia, Ines Chami, Frederic Sala, Christopher Ré

https://arxiv.org/abs/2208.05516: “Quality Not Quantity: On the Interaction between Dataset Design and Robustness of CLIP”, Thao Nguyen, Gabriel Ilharco, Mitchell Wortsman, Sewoong Oh, Ludwig Schmidt

https://arxiv.org/abs/2207.06991: “PIXEL: Language Modeling With Pixels”, Phillip Rust, Jonas F. Lotz, Emanuele Bugliarello, Elizabeth Salesky, Miryam de Lhoneux, Desmond Elliott

https://arxiv.org/abs/2207.05221#anthropic: “Language Models (Mostly) Know What They Know”, Saurav Kadavath, Tom Conerly, Amanda Askell, Tom Henighan, Dawn Drain, Ethan Perez, Nicholas Schiefer, Zac Hatfield Dodds, Nova DasSarma, Eli Tran-Johnson, Scott Johnston, Sheer El-Showk, Andy L. Jones, Nelson Elhage, Tristan Hume, Anna Chen, Yuntao Bai, Samuel R. Bowman, Stanislav Fort, Deep Ganguli, Danny Hernandez, Josh Jacobson, Jackson Kernion, Shauna Kravec, Liane Lovitt, Kamal Ndousse, Catherine Olsson, Sam Ringer, Dario Amodei, Tom Brown, Jack Clark, Nicholas Joseph, Ben Mann, Sam McCandlish, Chris Olah, Jared Kaplan

https://arxiv.org/abs/2206.15472: “On-Device Training Under 256KB Memory”, Ji Lin, Ligeng Zhu, Wei-Ming Chen, Wei-Chen Wang, Chuang Gan, Song Han

https://arxiv.org/abs/2206.14486: “Beyond Neural Scaling Laws: Beating Power Law Scaling via Data Pruning”, Ben Sorscher, Robert Geirhos, Shashank Shekhar, Surya Ganguli, Ari S. Morcos

https://arxiv.org/abs/2206.04658#nvidia: “BigVGAN: A Universal Neural Vocoder With Large-Scale Training”, Sang-gil Lee, Wei Ping, Boris Ginsburg, Bryan Catanzaro, Sungroh Yoon

https://arxiv.org/abs/2206.01685: “Toward a Realistic Model of Speech Processing in the Brain With Self-Supervised Learning”, Juliette Millet, Charlotte Caucheteux, Pierre Orhan, Yves Boubenec, Alexandre Gramfort, Ewan Dunbar, Christophe Pallier, Jean-Remi King

https://arxiv.org/abs/2205.14204#google: “M3AE: Multimodal Masked Autoencoders Learn Transferable Representations”, Xinyang Geng, Hao Liu, Lisa Lee, Dale Schuurmans, Sergey Levine, Pieter Abbeel

https://arxiv.org/abs/2205.10625#google: “Least-To-Most Prompting Enables Complex Reasoning in Large Language Models”, Denny Zhou, Nathanael Schärli, Le Hou, Jason Wei, Nathan Scales, Xuezhi Wang, Dale Schuurmans, Olivier Bousquet, Quoc Le, Ed Chi

https://arxiv.org/abs/2205.09073#google: “Dialog Inpainting: Turning Documents into Dialogues”, Zhuyun Dai, Arun Tejasvi Chaganty, Vincent Zhao, Aida Amini, Qazi Mamunur Rashid, Mike Green, Kelvin Guu

https://arxiv.org/abs/2205.05131#google: “Unifying Language Learning Paradigms”, Yi Tay, Mostafa Dehghani, Vinh Q. Tran, Xavier Garcia, Dara Bahri, Tal Schuster, Huaixiu Steven Zheng, Neil Houlsby, Donald Metzler

https://arxiv.org/abs/2205.03983#google: “Building Machine Translation Systems for the Next Thousand Languages”, Ankur Bapna, Isaac Caswell, Julia Kreutzer, Orhan Firat, Daan van Esch, Aditya Siddhant, Mengmeng Niu, Pallavi Baljekar, Xavier Garcia, Wolfgang Macherey, Theresa Breiner, Vera Axelrod, Jason Riesa, Yuan Cao, Mia Xu Chen, Klaus Macherey, Maxim Krikun, Pidong Wang, Alexander Gutkin, Apurva Shah, Yanping Huang, Zhifeng Chen, Yonghui Wu, Macduff Hughes

https://arxiv.org/abs/2205.04596#google: “When Does Dough Become a Bagel? Analyzing the Remaining Mistakes on ImageNet”, Vijay Vasudevan, Benjamin Caine, Raphael Gontijo-Lopes, Sara Fridovich-Keil, Rebecca Roelofs

https://arxiv.org/abs/2205.01917#google: “CoCa: Contrastive Captioners Are Image-Text Foundation Models”, Jiahui Yu, Zirui Wang, Vijay Vasudevan, Legg Yeung, Mojtaba Seyedhosseini, Yonghui Wu

https://arxiv.org/abs/2205.01397: “Data Determines Distributional Robustness in Contrastive Language Image Pre-Training (CLIP)”, Alex Fang, Gabriel Ilharco, Mitchell Wortsman, Yuhao Wan, Vaishaal Shankar, Achal Dave, Ludwig Schmidt

https://arxiv.org/abs/2204.14198#deepmind: “Flamingo: a Visual Language Model for Few-Shot Learning”, Jean-Baptiste Alayrac, Jeff Donahue, Pauline Luc, Antoine Miech, Iain Barr, Yana Hasson, Karel Lenc, Arthur Mensch, Katie Millican, Malcolm Reynolds, Roman Ring, Eliza Rutherford, Serkan Cabi, Tengda Han, Zhitao Gong, Sina Samangooei, Marianne Monteiro, Jacob Menick, Sebastian Borgeaud, Andrew Brock, Aida Nematzadeh, Sahand Sharifzadeh, Mikolaj Binkowski, Ricardo Barreira, Oriol Vinyals, Andrew Zisserman, Karen Simonyan

https://arxiv.org/abs/2204.10149: “WebFace260M: A Benchmark for Million-Scale Deep Face Recognition”, Zheng Zhu, Guan Huang, Jiankang Deng, Yun Ye, Junjie Huang, Xinze Chen, Jiagang Zhu, Tian Yang, Dalong Du, Jiwen Lu, Jie Zhou

https://www.lesswrong.com/posts/SbAgRYo8tkHwhd9Qx/deepmind-the-podcast-excerpts-on-agi: “DeepMind: The Podcast—Excerpts on AGI”, William Kiely

https://arxiv.org/abs/2203.15556#deepmind: “Chinchilla: Training Compute-Optimal Large Language Models”, Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, Elena Buchatskaya, Trevor Cai, Eliza Rutherford, Diego de Las Casas, Lisa Anne Hendricks, Johannes Welbl, Aidan Clark, Tom Hennigan, Eric Noland, Katie Millican, George van den Driessche, Bogdan Damoc, Aurelia Guy, Simon Osindero, Karen Simonyan, Erich Elsen, Jack W. Rae, Oriol Vinyals, Laurent Sifre

https://arxiv.org/abs/2203.11171#google: “Self-Consistency Improves Chain-Of-Thought Reasoning in Language Models”, Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc Le, Ed Chi, Denny Zhou

https://arxiv.org/abs/2203.03466#microsoft: “Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer”, Greg Yang, Edward J. Hu, Igor Babuschkin, Szymon Sidor, Xiaodong Liu, David Farhi, Nick Ryder, Jakub Pachocki, Weizhu Chen, Jianfeng Gao

https://arxiv.org/abs/2203.00854: “FastFold: Reducing AlphaFold Training Time from 11 Days to 67 Hours”, Shenggan Cheng, Ruidong Wu, Zhongming Yu, Binrui Li, Xiwen Zhang, Jian Peng, Yang You

https://arxiv.org/abs/2202.12211#google: “Self-Distilled StyleGAN: Towards Generation from Internet Photos”, Ron Mokady, Michal Yarom, Omer Tov, Oran Lang, Daniel Cohen-Or, Tali Dekel, Michal Irani, Inbar Mosseri

https://www.nature.com/articles/s42003-022-03036-1: “Brains and Algorithms Partially Converge in Natural Language Processing”, Charlotte Caucheteux, Jean-Rémi King

https://arxiv.org/abs/2202.06767#huawei: “Wukong: 100 Million Large-Scale Chinese Cross-Modal Pre-Training Dataset and A Foundation Framework”, Jiaxi Gu, Xiaojun Meng, Guansong Lu, Lu Hou, Minzhe Niu, Hang Xu, Xiaodan Liang, Wei Zhang, Xin Jiang, Chunjing Xu

https://arxiv.org/abs/2202.03052#alibaba: “OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-To-Sequence Learning Framework”, Peng Wang, An Yang, Rui Men, Junyang Lin, Shuai Bai, Zhikang Li, Jianxin Ma, Chang Zhou, Jingren Zhou, Hongxia Yang

https://arxiv.org/abs/2202.02317#allen: “Webly Supervised Concept Expansion for General Purpose Vision Models”, Amita Kamath, Christopher Clark, Tanmay Gupta, Eric Kolve, Derek Hoiem, Aniruddha Kembhavi

https://arxiv.org/abs/2202.00273: “StyleGAN-XL: Scaling StyleGAN to Large Diverse Datasets”, Axel Sauer, Katja Schwarz, Andreas Geiger

https://arxiv.org/abs/2201.11990#microsoftnvidia: “Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model”, Shaden Smith, Mostofa Patwary, Brandon Norick, Patrick LeGresley, Samyam Rajbhandari, Jared Casper, Zhun Liu, Shrimai Prabhumoye, George Zerveas, Vijay Korthikanti, Elton Zhang, Rewon Child, Reza Yazdani Aminabadi, Julie Bernauer, Xia Song, Mohammad Shoeybi, Yuxiong He, Michael Houston, Saurabh Tiwary, Bryan Catanzaro

https://arxiv.org/abs/2201.11473#microsoft: “Reasoning Like Program Executors”, Xinyu Pi, Qian Liu, Bei Chen, Morteza Ziyadi, Zeqi Lin, Yan Gao, Qiang Fu, Jian-Guang Lou, Weizhu Chen

https://arxiv.org/abs/2201.10005#openai: “Text and Code Embeddings by Contrastive Pre-Training”, Arvind Neelakantan, Tao Xu, Raul Puri, Alec Radford, Jesse Michael Han, Jerry Tworek, Qiming Yuan, Nikolas Tezak, Jong Wook Kim, Chris Hallacy, Johannes Heidecke, Pranav Shyam, Boris Power, Tyna Eloundou Nekoul, Girish Sastry, Gretchen Krueger, David Schnurr, Felipe Petroski Such, Kenny Hsu, Madeleine Thompson, Tabarak Khan, Toki Sherbakov, Joanne Jang, Peter Welinder, Lilian Weng

https://arxiv.org/abs/2201.08371#facebook: “SWAG: Revisiting Weakly Supervised Pre-Training of Visual Perception Models”, Mannat Singh, Laura Gustafson, Aaron Adcock, Vinicius de Freitas Reis, Bugra Gedik, Raj Prateek Kosaraju, Dhruv Mahajan, Ross Girshick, Piotr Dollár, Laurens van der Maaten

https://arxiv.org/abs/2201.07520#facebook: “CM3: A Causal Masked Multimodal Model of the Internet”, Armen Aghajanyan, Bernie Huang, Candace Ross, Vladimir Karpukhin, Hu Xu, Naman Goyal, Dmytro Okhonko, Mandar Joshi, Gargi Ghosh, Mike Lewis, Luke Zettlemoyer

https://arxiv.org/abs/2201.06910: “ZeroPrompt: Scaling Prompt-Based Pretraining to 1,000 Tasks Improves Zero-Shot Generalization”, Hanwei Xu, Yujun Chen, Yulun Du, Nan Shao, Yanggang Wang, Haiyu Li, Zhilin Yang

https://arxiv.org/abs/2201.03545#facebook: “ConvNeXt: A ConvNet for the 2020s”, Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie

https://royalsocietypublishing.org/doi/10.1098/rstb.2020.0529: “The Evolution of Quantitative Sensitivity”, Margaret A. H. Bryer, Sarah E. Koopman, Jessica F. Cantlon, Steven T. Piantadosi, Evan L. MacLean, Joseph M. Baker, Michael J. Beran, Sarah M. Jones, Kerry E. Jordan, Salif Mahamane, Andreas Nieder, Bonnie M. Perdue, Friederike Range, Jeffrey R. Stevens, Masaki Tomonaga, Dorottya J. Ujfalussy, Jennifer Vonk

https://arxiv.org/abs/2112.05253: “MAGMA—Multimodal Augmentation of Generative Models through Adapter-Based Finetuning”, Constantin Eichenberg, Sidney Black, Samuel Weinbach, Letitia Parcalabescu, Anette Frank

https://arxiv.org/abs/2112.04426#deepmind: “Improving Language Models by Retrieving from Trillions of Tokens”, Sebastian Borgeaud, Arthur Mensch, Jordan Hoffmann, Trevor Cai, Eliza Rutherford, Katie Millican, George van den Driessche, Jean-Baptiste Lespiau, Bogdan Damoc, Aidan Clark, Diego de Las Casas, Aurelia Guy, Jacob Menick, Roman Ring, Tom Hennigan, Saffron Huang, Loren Maggiore, Chris Jones, Albin Cassirer, Andy Brock, Michela Paganini, Geoffrey Irving, Oriol Vinyals, Simon Osindero, Karen Simonyan, Jack W. Rae, Erich Elsen, Laurent Sifre

https://arxiv.org/abs/2111.12233#microsoft: “LEMON: Scaling Up Vision-Language Pre-Training for Image Captioning”, Xiaowei Hu, Zhe Gan, Jianfeng Wang, Zhengyuan Yang, Zicheng Liu, Yumao Lu, Lijuan Wang

https://arxiv.org/abs/2111.12763#google: “Sparse Is Enough in Scaling Transformers”, Sebastian Jaszczur, Aakanksha Chowdhery, Afroz Mohiuddin, Łukasz Kaiser, Wojciech Gajewski, Henryk Michalewski, Jonni Kanerva

https://arxiv.org/abs/2111.11904#microsoft: “Can Pre-Trained Language Models Be Used to Resolve Textual and Semantic Merge Conflicts?”, Jialu Zhang, Todd Mytkowicz, Mike Kaufman, Ruzica Piskac, Shuvendu K. Lahiri

https://arxiv.org/abs/2111.11133: “L-Verse: Bidirectional Generation Between Image and Text”, Taehoon Kim, Gwangmo Song, Sihaeng Lee, Sangyun Kim, Yewon Seo, Soonyoung Lee, Seung Hwan Kim, Honglak Lee, Kyunghoon Bae

https://arxiv.org/abs/2111.11432#microsoft: “Florence: A New Foundation Model for Computer Vision”, Lu Yuan, Dongdong Chen, Yi-Ling Chen, Noel Codella, Xiyang Dai, Jianfeng Gao, Houdong Hu, Xuedong Huang, Boxin Li, Chunyuan Li, Ce Liu, Mengchen Liu, Zicheng Liu, Yumao Lu, Yu Shi, Lijuan Wang, Jianfeng Wang, Bin Xiao, Zhen Xiao, Jianwei Yang, Michael Zeng, Luowei Zhou, Pengchuan Zhang

https://arxiv.org/abs/2111.10050#google: “BASIC: Combined Scaling for Open-Vocabulary Image Classification”, Hieu Pham, Zihang Dai, Golnaz Ghiasi, Kenji Kawaguchi, Hanxiao Liu, Adams Wei Yu, Jiahui Yu, Yi-Ting Chen, Minh-Thang Luong, Yonghui Wu, Mingxing Tan, Quoc V. Le

https://arxiv.org/abs/2111.08267: “Solving Probability and Statistics Problems by Program Synthesis”, Leonard Tang, Elizabeth Ke, Nikhil Singh, Nakul Verma, Iddo Drori

https://arxiv.org/abs/2111.11294: “Scaling Law for Recommendation Models: Towards General-Purpose User Representations”, Kyuyong Shin, Hanock Kwak, Su Young Kim, Max Nihlen Ramstrom, Jisu Jeong, Jung-Woo Ha, Kyung-Min Kim

https://arxiv.org/abs/2111.06377#facebook: “MAE: Masked Autoencoders Are Scalable Vision Learners”, Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, Ross Girshick

https://arxiv.org/abs/2111.05321: “Turing-Universal Learners With Optimal Scaling Laws”, Preetum Nakkiran

https://arxiv.org/abs/2111.02114#laion: “LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs”, Christoph Schuhmann, Richard Vencu, Romain Beaumont, Robert Kaczmarczyk, Clayton Mullis, Aarush Katta, Theo Coombes, Jenia Jitsev, Aran Komatsuzaki

https://arxiv.org/abs/2110.14168#openai: “Training Verifiers to Solve Math Word Problems”, Karl Cobbe, Vineet Kosaraju, Mohammad Bavarian, Jacob Hilton, Reiichiro Nakano, Christopher Hesse, John Schulman

https://arxiv.org/abs/2110.11526#deepmind: “Wide Neural Networks Forget Less Catastrophically”, Seyed Iman Mirzadeh, Arslan Chaudhry, Dong Yin, Huiyi Hu, Razvan Pascanu, Dilan Gorur, Mehrdad Farajtabar

https://arxiv.org/abs/2110.06990: “Scaling Laws for the Few-Shot Adaptation of Pre-Trained Image Classifiers”, Gabriele Prato, Simon Guiroy, Ethan Caballero, Irina Rish, Sarath Chandar

https://arxiv.org/abs/2110.02095#google: “Exploring the Limits of Large Scale Pre-Training”, Samira Abnar, Mostafa Dehghani, Behnam Neyshabur, Hanie Sedghi

https://arxiv.org/abs/2109.10686#google: “Scale Efficiently: Insights from Pre-Training and Fine-Tuning Transformers”, Yi Tay, Mostafa Dehghani, Jinfeng Rao, William Fedus, Samira Abnar, Hyung Won Chung, Sharan Narang, Dani Yogatama, Ashish Vaswani, Donald Metzler

https://arxiv.org/abs/2109.07958: “TruthfulQA: Measuring How Models Mimic Human Falsehoods”, Stephanie Lin, Jacob Hilton, Owain Evans

https://arxiv.org/abs/2109.02593#allen: “General-Purpose Question-Answering With Macaw”, Oyvind Tafjord, Peter Clark

https://arxiv.org/abs/2108.13002#microsoft: “A Battle of Network Structures: An Empirical Study of CNN, Transformer, and MLP”, Yucheng Zhao, Guangting Wang, Chuanxin Tang, Chong Luo, Wenjun Zeng, Zheng-Jun Zha

https://arxiv.org/abs/2108.08810#google: “Do Vision Transformers See Like Convolutional Neural Networks?”, Maithra Raghu, Thomas Unterthiner, Simon Kornblith, Chiyuan Zhang, Alexey Dosovitskiy

https://arxiv.org/abs/2108.07686: “Scaling Laws for Deep Learning”, Jonathan S. Rosenfeld

https://arxiv.org/abs/2107.02137#baidu: “ERNIE 3.0: Large-Scale Knowledge Enhanced Pre-Training for Language Understanding and Generation”, Yu Sun, Shuohuan Wang, Shikun Feng, Siyu Ding, Chao Pang, Junyuan Shang, Jiaxiang Liu, Xuyi Chen, Yanbin Zhao, Yuxiang Lu, Weixin Liu, Zhihua Wu, Weibao Gong, Jianzhong Liang, Zhizhou Shang, Peng Sun, Wei Liu, Xuan Ouyang, Dianhai Yu, Hao Tian, Hua Wu, Haifeng Wang

https://arxiv.org/abs/2107.01294#allen: “Scarecrow: A Framework for Scrutinizing Machine Text”, Yao Dou, Maxwell Forbes, Rik Koncel-Kedziorski, Noah A. Smith, Yejin Choi

https://arxiv.org/abs/2106.07411: “Partial Success in Closing the Gap between Human and Machine Vision”, Robert Geirhos, Kantharaju Narayanappa, Benjamin Mitzkus, Tizian Thieringer, Matthias Bethge, Felix A. Wichmann, Wieland Brendel

https://arxiv.org/abs/2106.09488#amazon: “Scaling Laws for Acoustic Models”, Jasha Droppo, Oguz Elibol

https://arxiv.org/abs/2106.04803#google: “CoAtNet: Marrying Convolution and Attention for All Data Sizes”, Zihang Dai, Hanxiao Liu, Quoc V. Le, Mingxing Tan

https://arxiv.org/abs/2106.04560#google: “Scaling Vision Transformers”, Xiaohua Zhai, Alexander Kolesnikov, Neil Houlsby, Lucas Beyer

https://arxiv.org/abs/2106.03004#google: “Exploring the Limits of Out-Of-Distribution Detection”, Stanislav Fort, Jie Ren, Balaji Lakshminarayanan

https://arxiv.org/abs/2106.00116: “Effect of Pre-Training Scale on Intra/Inter-Domain Full and Few-Shot Transfer Learning for Natural and Medical X-Ray Chest Images”, Mehdi Cherti, Jenia Jitsev

https://arxiv.org/abs/2105.12806: “A Universal Law of Robustness via Isoperimetry”, Sébastien Bubeck, Mark Sellke

https://m.koreaherald.com/view.php?ud=20210525000824#naver: “Naver Unveils First ‘Hyperscale’ AI Platform”, Kang Jae-eun

https://arxiv.org/abs/2105.11084#facebook: “Unsupervised Speech Recognition”, Alexei Baevski, Wei-Ning Hsu, Alexis Conneau, Michael Auli

https://venturebeat.com/ai/google-details-new-ai-accelerator-chips/: “Google Details New AI Accelerator Chips”, Kyle Wiggers

https://arxiv.org/abs/2105.01601#google: “MLP-Mixer: An All-MLP Architecture for Vision”, Ilya Tolstikhin, Neil Houlsby, Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, Thomas Unterthiner, Jessica Yung, Daniel Keysers, Jakob Uszkoreit, Mario Lucic, Alexey Dosovitskiy

https://arxiv.org/abs/2105.00572#facebook: “XLM-R XL: Larger-Scale Transformers for Multilingual Masked Language Modeling”, Naman Goyal, Jingfei Du, Myle Ott, Giri Anantharaman, Alexis Conneau

https://arxiv.org/abs/2104.14294#facebook: “DINO: Emerging Properties in Self-Supervised Vision Transformers”, Mathilde Caron, Hugo Touvron, Ishan Misra, Hervé Jégou, Julien Mairal, Piotr Bojanowski, Armand Joulin

https://arxiv.org/abs/2103.14586#google: “Understanding Robustness of Transformers for Image Classification”, Srinadh Bhojanapalli, Ayan Chakrabarti, Daniel Glasner, Daliang Li, Thomas Unterthiner, Andreas Veit

https://arxiv.org/abs/2103.13009#allen: “UNICORN on RAINBOW: A Universal Commonsense Reasoning Model on a New Multitask Benchmark”, Nicholas Lourie, Ronan Le Bras, Chandra Bhagavatula, Yejin Choi

https://arxiv.org/abs/2103.10957#deepmind: “Efficient Visual Pretraining With Contrastive Detection”, Olivier J. Hénaff, Skanda Koppula, Jean-Baptiste Alayrac, Aaron van den Oord, Oriol Vinyals, João Carreira

https://arxiv.org/abs/2103.07579#google: “Revisiting ResNets: Improved Training and Scaling Strategies”, Irwan Bello, William Fedus, Xianzhi Du, Ekin D. Cubuk, Aravind Srinivas, Tsung-Yi Lin, Jonathon Shlens, Barret Zoph

https://ai.facebook.com/blog/learning-from-videos-to-understand-the-world/: “Learning from Videos to Understand the World”, Geoffrey Zweig, Polina Kuznetsova, Michael Auli, Francois Fagan

https://arxiv.org/abs/2103.01988#facebook: “SEER: Self-Supervised Pretraining of Visual Features in the Wild”, Priya Goyal, Mathilde Caron, Benjamin Lefaudeux, Min Xu, Pengchao Wang, Vivek Pai, Mannat Singh, Vitaliy Liptchinsky, Ishan Misra, Armand Joulin, Piotr Bojanowski

https://arxiv.org/abs/2102.09672#openai: “Improved Denoising Diffusion Probabilistic Models”, Alex Nichol, Prafulla Dhariwal

https://arxiv.org/abs/2102.05918#google: “ALIGN: Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision”, Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc V. Le, Yunhsuan Sung, Zhen Li, Tom Duerig

https://arxiv.org/abs/2102.06171#deepmind: “NFNet: High-Performance Large-Scale Image Recognition Without Normalization”, Andrew Brock, Soham De, Samuel L. Smith, Karen Simonyan

https://arxiv.org/abs/2102.02888#microsoft: “1-Bit Adam: Communication Efficient Large-Scale Training With Adam’s Convergence Speed”, Hanlin Tang, Shaoduo Gan, Ammar Ahmad Awan, Samyam Rajbhandari, Conglong Li, Xiangru Lian, Ji Liu, Ce Zhang, Yuxiong He

https://arxiv.org/abs/2102.01951#scaling&org=deepmind: “Mind the Gap: Assessing Temporal Generalization in Neural Language Models § Scaling”, Angeliki Lazaridou, Adhiguna Kuncoro, Elena Gribovskaya, Devang Agrawal, Adam Liska, Tayfun Terzi, Mai Gimenez, Cyprien de Masson d’Autume, Tomas Kocisky, Sebastian Ruder, Dani Yogatama, Kris Cao, Susannah Young, Phil Blunsom

https://arxiv.org/abs/2003.10580#google: “Meta Pseudo Labels”, Hieu Pham, Zihang Dai, Qizhe Xie, Minh-Thang Luong, Quoc V. Le

https://cdn.openai.com/papers/Learning_Transferable_Visual_Models_From_Natural_Language_Supervision.pdf: “CLIP: Learning Transferable Visual Models From Natural Language Supervision”, Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya A. Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever

https://www.alignmentforum.org/posts/k2SNji3jXaLGhBeYP/extrapolating-gpt-n-performance: “Extrapolating GPT-N Performance”, Lukas Finnveden

https://arxiv.org/abs/2011.10650#openai: “Very Deep VAEs Generalize Autoregressive Models and Can Outperform Them on Images”, Rewon Child

https://arxiv.org/abs/2010.14701#openai: “Scaling Laws for Autoregressive Generative Modeling”, Tom Henighan, Jared Kaplan, Mor Katz, Mark Chen, Christopher Hesse, Jacob Jackson, Heewoo Jun, Tom B. Brown, Prafulla Dhariwal, Scott Gray, Chris Hallacy, Benjamin Mann, Alec Radford, Aditya A. Ramesh, Nick Ryder, Daniel M. Ziegler, John Schulman, Dario Amodei, Sam McCandlish

https://arxiv.org/abs/2010.14571#google: “Language ID in the Wild: Unexpected Challenges on the Path to a Thousand-Language Web Text Corpus”, Isaac Caswell, Theresa Breiner, Daan van Esch, Ankur Bapna

https://arxiv.org/abs/2010.10504#google: “Pushing the Limits of Semi-Supervised Learning for Automatic Speech Recognition”, Yu Zhang, James Qin, Daniel S. Park, Wei Han, Chung-Cheng Chiu, Ruoming Pang, Quoc V. Le, Yonghui Wu

https://ai.meta.com/blog/introducing-many-to-many-multilingual-machine-translation/: “The First AI Model That Translates 100 Languages without Relying on English Data”, Angela Fan

https://arxiv.org/abs/2010.11929#google: “Vision Transformer: An Image Is Worth 16×16 Words: Transformers for Image Recognition at Scale”, Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby

https://www.openphilanthropy.org/research/new-report-on-how-much-computational-power-it-takes-to-match-the-human-brain/: “New Report on How Much Computational Power It Takes to Match the Human Brain”, Joseph Carlsmith

https://arxiv.org/abs/2009.03393#openai: “Generative Language Modeling for Automated Theorem Proving”, Stanislas Polu, Ilya Sutskever

https://arxiv.org/abs/2008.09037: “Accuracy and Performance Comparison of Video Action Recognition Approaches”, Matthew Hutchinson, Siddharth Samsi, William Arcand, David Bestor, Bill Bergeron, Chansup Byun, Michael Houle, Matthew Hubbell, Michael Jones, Jeremy Kepner, Andrew Kirby, Peter Michaleas, Lauren Milechin, Julie Mullen, Andrew Prout, Antonio Rosa, Albert Reuther, Charles Yee, Vijay Gadepally

https://www.lesswrong.com/posts/Wnqua6eQkewL3bqsF/matt-botvinick-on-the-spontaneous-emergence-of-learning: “Matt Botvinick on the Spontaneous Emergence of Learning Algorithms”, Adam Scholl

https://arxiv.org/abs/2008.02217: “Hopfield Networks Is All You Need”, Hubert Ramsauer, Bernhard Schäfl, Johannes Lehner, Philipp Seidl, Michael Widrich, Lukas Gruber, Markus Holzleitner, Milena Pavlović, Geir Kjetil Sandve, Victor Greiff, David Kreil, Michael Kopp, Günter Klambauer, Johannes Brandstetter, Sepp Hochreiter

https://arxiv.org/abs/2007.06225: “ProtTrans: Towards Cracking the Language of Life’s Code Through Self-Supervised Deep Learning and High Performance Computing”, Ahmed Elnaggar, Michael Heinzinger, Christian Dallago, Ghalia Rihawi, Yu Wang, Llion Jones, Tom Gibbs, Tamas Feher, Christoph Angerer, Martin Steinegger, Debsindhu Bhowmik, Burkhard Rost

https://arxiv.org/abs/2007.03898#nvidia: “NVAE: A Deep Hierarchical Variational Autoencoder”, Arash Vahdat, Jan Kautz

https://arxiv.org/abs/2006.10621: “On the Predictability of Pruning Across Scales”, Jonathan S. Rosenfeld, Jonathan Frankle, Michael Carbin, Nir Shavit

2020-chen-2.pdf#openai: “iGPT: Generative Pretraining from Pixels”, Mark Chen, Alec Radford, Rewon Child, Jeff Wu, Heewoo Jun, Prafulla Dhariwal, David Luan, Ilya Sutskever

https://arxiv.org/abs/2006.09882#facebook: “SwAV: Unsupervised Learning of Visual Features by Contrasting Cluster Assignments”, Mathilde Caron, Ishan Misra, Julien Mairal, Priya Goyal, Piotr Bojanowski, Armand Joulin

https://openai.com/index/image-gpt/: “Image GPT (iGPT): We Find That, Just As a Large Transformer Model Trained on Language Can Generate Coherent Text, the Same Exact Model Trained on Pixel Sequences Can Generate Coherent Image Completions and Samples”, Mark Chen, Alec Radford, Ilya Sutskever

https://www.microsoft.com/en-us/research/blog/zero-2-deepspeed-shattering-barriers-of-deep-learning-speed-scale/: “ZeRO-2 & DeepSpeed: Shattering Barriers of Deep Learning Speed & Scale”, DeepSpeed Team

https://openai.com/research/jukebox: “Jukebox: We’re Introducing Jukebox, a Neural Net That Generates Music, including Rudimentary Singing, As Raw Audio in a Variety of Genres and Artist Styles. We’re Releasing the Model Weights and Code, along With a Tool to Explore the Generated Samples.”, Prafulla Dhariwal, Heewoo Jun, Christine Payne, Jong Wook Kim, Alec Radford, Ilya Sutskever

https://ai.meta.com/blog/state-of-the-art-open-source-chatbot/: “Blender: A State-Of-The-Art Open Source Chatbot”, Stephen Roller, Jason Weston, Emily Dinan

https://arxiv.org/abs/2004.10802: “Scaling Laws from the Data Manifold Dimension”, Utkarsh Sharma, Jared Kaplan

https://arxiv.org/abs/2004.08366#google: “DynamicEmbedding: Extending TensorFlow for Colossal-Scale Applications”, Yun Zeng, Siqi Zuo, Dongcai Shen

https://arxiv.org/abs/2004.07159#alibaba: “PALM: Pre-Training an Autoencoding & Autoregressive Language Model for Context-Conditioned Generation”, Bin Bi, Chenliang Li, Chen Wu, Ming Yan, Wei Wang, Songfang Huang, Fei Huang, Luo Si

https://www.technologyreview.com/2020/02/17/844721/ai-openai-moonshot-elon-musk-sam-altman-greg-brockman-messy-secretive-reality/: “The Messy, Secretive Reality behind OpenAI’s Bid to save the World: The AI Moonshot Was Founded in the Spirit of Transparency. This Is the inside Story of How Competitive Pressure Eroded That Idealism”, Karen Hao

https://arxiv.org/abs/2002.05709#google: “A Simple Framework for Contrastive Learning of Visual Representations”, Ting Chen, Simon Kornblith, Mohammad Norouzi, Geoffrey Hinton

https://www.microsoft.com/en-us/research/blog/turing-nlg-a-17-billion-parameter-language-model-by-microsoft/: “Turing-NLG: A 17-Billion-Parameter Language Model by Microsoft”, Corby Rosset

https://research.google/blog/towards-a-conversational-agent-that-can-chat-aboutanything/: “Towards a Conversational Agent That Can Chat About…Anything”, Daniel Adiwardana, Thang Luong

https://arxiv.org/abs/2001.08361#openai: “Scaling Laws for Neural Language Models”, Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, Dario Amodei

https://www.youtube.com/watch?v=kY2NHSKBi10: “The Importance of Deconstruction”, Kilian Q. Weinberger

https://openai.com/research/deep-double-descent: “Deep Double Descent: We Show That the Double Descent Phenomenon Occurs in CNNs, ResNets, and Transformers: Performance First Improves, Then Gets Worse, and Then Improves Again With Increasing Model Size, Data Size, or Training Time”, Preetum Nakkiran, Gal Kaplun, Yamini Bansal, Tristan Yang, Boaz Barak, Ilya Sutskever

https://arxiv.org/abs/1911.13299: “What’s Hidden in a Randomly Weighted Neural Network?”, Vivek Ramanujan, Mitchell Wortsman, Aniruddha Kembhavi, Ali Farhadi, Mohammad Rastegari

https://arxiv.org/abs/1911.05722#facebook: “Momentum Contrast for Unsupervised Visual Representation Learning”, Kaiming He, Haoqi Fan, Yuxin Wu, Saining Xie, Ross Girshick

https://arxiv.org/abs/1911.04252#google: “Self-Training With Noisy Student Improves ImageNet Classification”, Qizhe Xie, Minh-Thang Luong, Eduard Hovy, Quoc V. Le

https://arxiv.org/abs/1911.02116#facebook: “Unsupervised Cross-Lingual Representation Learning at Scale”, Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer, Veselin Stoyanov

https://arxiv.org/abs/1910.02054#microsoft: “ZeRO: Memory Optimizations Toward Training Trillion Parameter Models”, Samyam Rajbhandari, Jeff Rasley, Olatunji Ruwase, Yuxiong He

https://arxiv.org/abs/1909.11740: “UNITER: UNiversal Image-TExt Representation Learning”, Yen-Chun Chen, Linjie Li, Licheng Yu, Ahmed El Kholy, Faisal Ahmed, Zhe Gan, Yu Cheng, Jingjing Liu

https://arxiv.org/abs/1909.05858#salesforce: “CTRL: A Conditional Transformer Language Model For Controllable Generation”, Nitish Shirish Keskar, Bryan McCann, Lav R. Varshney, Caiming Xiong, Richard Socher

https://nv-adlr.github.io/MegatronLM: “MegatronLM: Training Billion+ Parameter Language Models Using GPU Model Parallelism”, NVIDIA ADLR

https://arxiv.org/abs/1907.11692#facebook: “RoBERTa: A Robustly Optimized BERT Pretraining Approach”, Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov

https://arxiv.org/abs/1907.02544: “Large Scale Adversarial Representation Learning”, Jeff Donahue, Karen Simonyan

https://arxiv.org/abs/1906.06669: “One Epoch Is All You Need”, Aran Komatsuzaki

https://david-abel.github.io/notes/icml_2019.pdf: “ICML 2019 Notes”, David Abel

https://arxiv.org/abs/1905.11946#google: “EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks”, Mingxing Tan, Quoc V. Le

https://arxiv.org/abs/1905.10843: “Asymptotic Learning Curves of Kernel Methods: Empirical Data versus Teacher-Student Paradigm”, Stefano Spigler, Mario Geiger, Matthieu Wyart

https://arxiv.org/abs/1905.03197: “UniLM: Unified Language Model Pre-Training for Natural Language Understanding and Generation”, Li Dong, Nan Yang, Wenhui Wang, Furu Wei, Xiaodong Liu, Yu Wang, Jianfeng Gao, Ming Zhou, Hsiao-Wuen Hon

https://arxiv.org/abs/1905.00546#facebook: “Billion-Scale Semi-Supervised Learning for Image Classification”, I. Zeki Yalniz, Hervé Jégou, Kan Chen, Manohar Paluri, Dhruv Mahajan

https://openai.com/index/better-language-models/: “Better Language Models and Their Implications”, Alec Radford, Jeffrey Wu, Dario Amodei, Daniela Amodei, Jack Clark, Miles Brundage, Ilya Sutskever

https://melaniemitchell.me/aibook/: “Artificial Intelligence: A Guide for Thinking Humans § Prologue: Terrified”, Melanie Mitchell

https://openai.com/research/how-ai-training-scales: “How AI Training Scales”, Sam McCandlish, Jared Kaplan, Dario Amodei

https://slatestarcodex.com/2018/11/26/is-science-slowing-down-2/: “Is Science Slowing Down?”, Scott Alexander

https://arxiv.org/abs/1808.01097: “CurriculumNet: Weakly Supervised Learning from Large-Scale Web Images”, Sheng Guo, Weilin Huang, Haozhi Zhang, Chenfan Zhuang, Dengke Dong, Matthew R. Scott, Dinglong Huang

https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf#page=5: “GPT-1: Improving Language Understanding by Generative Pre-Training § Model Specifications”, Alec Radford, Karthik Narasimhan, Tim Salimans, Ilya Sutskever

https://arxiv.org/abs/1805.00932#facebook: “Exploring the Limits of Weakly Supervised Pretraining”, Dhruv Mahajan, Ross Girshick, Vignesh Ramanathan, Kaiming He, Manohar Paluri, Yixuan Li, Ashwin Bharambe, Laurens van der Maaten

https://arxiv.org/abs/1801.06146: “ULMFiT: Universal Language Model Fine-Tuning for Text Classification”, Jeremy Howard, Sebastian Ruder

https://arxiv.org/abs/1706.06083: “Towards Deep Learning Models Resistant to Adversarial Attacks”, Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, Adrian Vladu

https://arxiv.org/abs/1706.01427#deepmind: “A Simple Neural Network Module for Relational Reasoning”, Adam Santoro, David Raposo, David G. T. Barrett, Mateusz Malinowski, Razvan Pascanu, Peter Battaglia, Timothy Lillicrap

https://arxiv.org/abs/1705.07750#deepmind: “Quo Vadis, Action Recognition? A New Model I3D and the Kinetics Dataset”, Joao Carreira, Andrew Zisserman

https://arxiv.org/abs/1705.05640: “WebVision Challenge: Visual Learning and Understanding With Web Data”, Wen Li, Limin Wang, Wei Li, Eirikur Agustsson, Jesse Berent, Abhinav Gupta, Rahul Sukthankar, Luc Van Gool

https://blogs.microsoft.com/ai/microsoft-researchers-win-imagenet-computer-vision-challenge/: “Microsoft Researchers Win ImageNet Computer Vision Challenge”, Allison Linn

https://arxiv.org/abs/1511.06789#google: “The Unreasonable Effectiveness of Noisy Data for Fine-Grained Recognition”, Jonathan Krause, Benjamin Sapp, Andrew Howard, Howard Zhou, Alexander Toshev, Tom Duerig, James Philbin, Li Fei-Fei

https://arxiv.org/abs/1511.02251#facebook: “Learning Visual Features from Large Weakly Supervised Data”, Armand Joulin, Laurens van der Maaten, Allan Jabri, Nicolas Vasilache

https://openaccess.thecvf.com/content_cvpr_2015/papers/Xiao_Learning_From_Massive_2015_CVPR_paper.pdf#baidu: “Clothing-1M: Learning from Massive Noisy Labeled Data for Image Classification”, Tong Xiao, Tian Xia, Yi Yang, Chang Huang, Xiaogang Wang

http://www.lrec-conf.org/proceedings/lrec2014/pdf/1097_Paper.pdf: “N-Gram Counts and Language Models from the Common Crawl”, Christian Buck, Kenneth Heafield, Bas van Ooyen

https://aclanthology.org/P13-2121.pdf: “Scalable Modified Kneser-Ney Language Model Estimation”, Kenneth Heafield, Ivan Pouzyrevsky, Jonathan H. Clark, Philipp Koehn

2010-mikolov.pdf: “Recurrent Neural Network Based Language Model”, Tomas Mikolov, Martin Karafiat, Lukas Burget, Jan Cernocky, Sanjeev Khudanpur

2010-hameed.pdf: “Understanding Sources of Inefficiency in General-Purpose Chips”, Rehan Hameed, Wajahat Qadeer, Megan Wachs, Omid Azizi, Alex Solomatnikov, Benjamin C. Lee, Stephen Richardson, Christos Kozyrakis, Mark Horowitz

https://dw2blog.com/2009/11/02/halloween-nightmare-scenario-early-2020s/: “Halloween Nightmare Scenario, Early 2020’s”, David Wood

https://web.archive.org/web/20230718144747/https://frc.ri.cmu.edu/~hpm/project.archive/robot.papers/2004/Predictions.html: “Robot Predictions Evolution”, Hans Moravec

2003-perlich.pdf: “Tree Induction vs. Logistic Regression: A Learning-Curve Analysis”, Claudia Perlich, Foster Provost, Jeffrey S. Simonoff

http://infolab.stanford.edu/~backrub/google.html: “The Anatomy of a Large-Scale Hypertextual Web Search Engine”, Sergey Brin, Lawrence Page

https://paulfchristiano.com/: “Homepage of Paul F. Christiano”, Paul F. Christiano
