‘RNN’ tag

Annotations sorted by machine learning into inferred 'tags'. This provides an alternative way to browse: instead of by date order, one can browse in topic order. The 'sorted' list has been automatically clustered into multiple sections & auto-labeled for easier browsing.

Beginning with the newest annotation, it uses the embedding of each annotation to attempt to create a list of nearest-neighbor annotations, creating a progression of topics. For more details, see the link.

Wikipedia

Miscellaneous

Bibliography

https://arxiv.org/abs/2412.07752: “FlashRNN: Optimizing Traditional RNNs on Modern Hardware”, Korbinian Pöppel, Maximilian Beck, Sepp Hochreiter

link-bibliography
https://arxiv.org/abs/2410.01201: “Were RNNs All We Needed?”, Leo Feng, Frederick Tung, Mohamed Osama Ahmed, Yoshua Bengio, Hossein Hajimirsadegh

link-bibliography
https://arxiv.org/abs/2408.15237: “The Mamba in the Llama: Distilling and Accelerating Hybrid Models”, Junxiong Wang, Daniele Paliotta, Avner May, Alexander M. Rush, Tri Dao

link-bibliography
https://arxiv.org/abs/2406.07887: “An Empirical Study of Mamba-Based Language Models”, Roger Waleffe, Wonmin Byeon, Duncan Riach, Brandon Norick, Vijay Korthikanti, Tri Dao, Albert Gu, Ali Hatamizadeh, Sudhakar Singh, Deepak Narayanan, Garvit Kulshreshtha, Vartika Singh, Jared Casper, Jan Kautz, Mohammad Shoeybi, Bryan Catanzaro

link-bibliography
https://arxiv.org/abs/2405.20233: “Grokfast: Accelerated Grokking by Amplifying Slow Gradients”, Jaerin Lee, Bong Gyun Kang, Kihoon Kim, Kyoung Mu Lee

link-bibliography
https://arxiv.org/abs/2404.08801#facebook: “Megalodon: Efficient LLM Pretraining and Inference With Unlimited Context Length”, Xuezhe Ma, Xiaomeng Yang, Wenhan Xiong, Beidi Chen, Lili Yu, Hao Zhang, Jonathan May, Luke Zettlemoyer, Omer Levy, Chunting Zhou

link-bibliography
https://arxiv.org/abs/2404.05971#eleutherai: “Does Transformer Interpretability Transfer to RNNs?”, Gonçalo Paulo, Thomas Marshall, Nora Belrose

link-bibliography
https://arxiv.org/abs/2403.17844: “Mechanistic Design and Scaling of Hybrid Architectures”, Michael Poli, Armin W. Thomas, Eric Nguyen, Pragaash Ponnusamy, Björn Deiseroth, Kristian Kersting, Taiji Suzuki, Brian Hie, Stefano Ermon, Christopher Ré, Ce Zhang, Stefano Massaroli

link-bibliography
https://arxiv.org/abs/2403.13802: “ZigMa: Zigzag Mamba Diffusion Model”, Vincent Tao Hu, Stefan Andreas Baumann, Ming Gui, Olga Grebenkova, Pingchuan Ma, Johannes Fischer, Bjorn Ommer

link-bibliography
https://arxiv.org/abs/2312.04927: “Zoology: Measuring and Improving Recall in Efficient Language Models”, Simran Arora, Sabri Eyuboglu, Aman Timalsina, Isys Johnson, Michael Poli, James Zou, Atri Rudra, Christopher Ré

link-bibliography
https://arxiv.org/abs/2312.00752: “Mamba: Linear-Time Sequence Modeling With Selective State Spaces”, Albert Gu, Tri Dao

link-bibliography
https://www.nature.com/articles/s41467-023-42875-2#deepmind: “Learning Few-Shot Imitation As Cultural Transmission”, Avishkar Bhoopchand, Bethanie Brownfield, Adrian Collister, Agustin Dal Lago, Ashley Edwards, Richard Everett, Alexandre Fréchette, Yanko Gitahy Oliveira, Edward Hughes, Kory W. Mathewson, Piermaria Mendolicchio, Julia Pawar, Miruna Pȋslar, Alex Platonov, Evan Senter, Sukhdeep Singh, Alexander Zacherl, Lei M. Zhang

link-bibliography
https://arxiv.org/abs/2310.02980: “Never Train from Scratch: Fair Comparison of Long-Sequence Models Requires Data-Driven Priors”, Ido Amos, Jonathan Berant, Ankit Gupta

link-bibliography
https://arxiv.org/abs/2306.00323: “Thought Cloning: Learning to Think While Acting by Imitating Human Thinking”, Shengran Hu, Jeff Clune

link-bibliography
https://arxiv.org/abs/2305.13048: “RWKV: Reinventing RNNs for the Transformer Era”, Bo Peng, Eric Alcaide, Quentin Anthony, Alon Albalak, Samuel Arcadinho, Huanqi Cao, Xin Cheng, Michael Chung, Matteo Grella, Kranthi Kiran GV, Xuzheng He, Haowen Hou, Przemyslaw Kazienko, Jan Kocon, Jiaming Kong, Bartlomiej Koptyra, Hayden Lau, Krishna Sri Ipsit Mantri, Ferdin, Mom, Atsushi Saito, Xiangru Tang, Bolun Wang, Johan S. Wind, Stansilaw Wozniak, Ruichong Zhang, Zhenyuan Zhang, Qihang Zhao, Peng Zhou, Jian Zhu, Rui-Jie Zhu

link-bibliography
https://arxiv.org/abs/2303.06349#deepmind: “Resurrecting Recurrent Neural Networks for Long Sequences”, Antonio Orvieto, Samuel L. Smith, Albert Gu, Anushan Fernando, Caglar Gulcehre, Razvan Pascanu, Soham De

link-bibliography
https://arxiv.org/abs/2302.13939: “SpikeGPT: Generative Pre-Trained Language Model With Spiking Neural Networks”, Rui-Jie Zhu, Qihang Zhao, Jason K. Eshraghian

link-bibliography
2023-bures.pdf: “Organic Reaction Mechanism Classification Using Machine Learning”, Jordi Burés, Igor Larrosa

link-bibliography
https://arxiv.org/abs/2212.14052: “Hungry Hungry Hippos: Towards Language Modeling With State Space Models”, Daniel Y. Fu, Tri Dao, Khaled K. Saab, Armin W. Thomas, Atri Rudra, Christopher Ré

link-bibliography
https://arxiv.org/abs/2212.10544: “Pretraining Without Attention”, Junxiong Wang, Jing Nathan Yan, Albert Gu, Alexander M. Rush

link-bibliography
https://arxiv.org/abs/2211.07638: “Legged Locomotion in Challenging Terrains Using Egocentric Vision”, Ananye Agarwal, Ashish Kumar, Jitendra Malik, Deepak Pathak

link-bibliography
https://arxiv.org/abs/2210.01117: “Omnigrok: Grokking Beyond Algorithmic Data”, Ziming Liu, Eric J. Michaud, Max Tegmark

link-bibliography
https://arxiv.org/abs/2209.11737: “Semantic Scene Descriptions As an Objective of Human Vision”, Adrien Doerig, Tim C. Kietzmann, Emily Allen, Yihan Wu, Thomas Naselaris, Kendrick Kay, Ian Charest

link-bibliography
https://arxiv.org/abs/2205.01972: “Sequencer: Deep LSTM for Image Classification”, Yuki Tatsunami, Masato Taki

link-bibliography
2022-grand.pdf: “Semantic Projection Recovers Rich Human Knowledge of Multiple Object Features from Word Embeddings”, Gabriel Grand, Idan Asher Blank, Francisco Pereira, Evelina Fedorenko

link-bibliography
https://arxiv.org/abs/2203.07852: “Block-Recurrent Transformers”, DeLesley Hutchins, Imanol Schlag, Yuhuai Wu, Ethan Dyer, Behnam Neyshabur

link-bibliography
https://arxiv.org/abs/2202.07765#deepmind: “General-Purpose, Long-Context Autoregressive Modeling With Perceiver AR”, Curtis Hawthorne, Andrew Jaegle, Cătălina Cangea, Sebastian Borgeaud, Charlie Nash, Mateusz Malinowski, Sander Dieleman, Oriol Vinyals, Matthew Botvinick, Ian Simon, Hannah Sheahan, Neil Zeghidour, Jean-Baptiste Alayrac, João Carreira, Jesse Engel

link-bibliography
2022-miki.pdf: “Learning Robust Perceptive Locomotion for Quadrupedal Robots in the Wild”, Takahiro Miki, Joonho Lee, Jemin Hwangbo, Lorenz Wellhausen, Vladlen Koltun, Marco Hutter

link-bibliography
https://arxiv.org/abs/2111.00396: “S4: Efficiently Modeling Long Sequences With Structured State Spaces”, Albert Gu, Karan Goel, Christopher Ré

link-bibliography
https://elifesciences.org/articles/66039: “A Connectome of the Drosophila Central Complex Reveals Network Motifs Suitable for Flexible Navigation and Context-Dependent Action Selection”, Brad K. Hulse, Hannah Haberkern, Romain Franconville, Daniel B. Turner-Evans, Shin-ya Takemura, Tanya Wolff, Marcella Noorman, Marisa Dreher, Chuntao Dan, Ruchi Parekh, Ann M. Hermundstad, Gerald M. Rubin, Vivek Jayaraman

link-bibliography
https://arxiv.org/abs/2107.14795#deepmind: “Perceiver IO: A General Architecture for Structured Inputs & Outputs”, Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier Hénaff, Matthew M. Botvinick, Andrew Zisserman, Oriol Vinyals, João Carreira

link-bibliography
https://proceedings.mlr.press/v139/vicol21a.html: “PES: Unbiased Gradient Estimation in Unrolled Computation Graphs With Persistent Evolution Strategies”, Paul Vicol, Luke Metz, Jascha Sohl-Dickstein

link-bibliography
2021-delul.pdf: “Shelley: A Crowd-Sourced Collaborative Horror Writer”, Pinar Yanardag Delul, Manuel Cebrian, Iyad Rahwan

link-bibliography
2021-jouppi.pdf: “Ten Lessons From Three Generations Shaped Google’s TPUv4i”, Norman P. Jouppi, Doe Hyun Yoon, Matthew Ashcraft, Mark Gottscho, Thomas B. Jablin, George Kurian, James Laudon, Sheng Li, Peter Ma, Xiaoyu Ma, Thomas Norrie, Nishant Patil, Sushma Prasad, Cliff Young, Zongwei Zhou, David Patterson

link-bibliography
https://arxiv.org/abs/2106.06981: “RASP: Thinking Like Transformers”, Gail Weiss, Yoav Goldberg, Eran Yahav

link-bibliography
https://arxiv.org/abs/2106.09488#amazon: “Scaling Laws for Acoustic Models”, Jasha Droppo, Oguz Elibol

link-bibliography
https://arxiv.org/abs/2103.03206#deepmind: “Perceiver: General Perception With Iterative Attention”, Andrew Jaegle, Felix Gimeno, Andrew Brock, Andrew Zisserman, Oriol Vinyals, Joao Carreira

link-bibliography
https://arxiv.org/abs/2102.04159: “Deep Residual Learning in Spiking Neural Networks”, Wei Fang, Zhaofei Yu, Yanqi Chen, Tiejun Huang, Timothée Masquelier, Yonghong Tian

link-bibliography
https://arxiv.org/abs/2011.12692#tencent: “Towards Playing Full MOBA Games With Deep Reinforcement Learning”, Deheng Ye, Guibin Chen, Wen Zhang, Sheng Chen, Bo Yuan, Bo Liu, Jia Chen, Zhao Liu, Fuhao Qiu, Hongsheng Yu, Yinyuting Yin, Bei Shi, Liang Wang, Tengfei Shi, Qiang Fu, Wei Yang, Lanxiao Huang, Wei Liu

link-bibliography
https://arxiv.org/abs/2008.07669: “HiPPO: Recurrent Memory With Optimal Polynomial Projections”, Albert Gu, Tri Dao, Stefano Ermon, Atri Rudra, Christopher Re

link-bibliography
https://www.lesswrong.com/posts/Wnqua6eQkewL3bqsF/matt-botvinick-on-the-spontaneous-emergence-of-learning: “Matt Botvinick on the Spontaneous Emergence of Learning Algorithms”, Adam Scholl

link-bibliography
https://deepmind.google/discover/blog/agent57-outperforming-the-human-atari-benchmark/: “Agent57: Outperforming the Human Atari Benchmark”, Adrià Puigdomènech, Bilal Piot, Steven Kapturowski, Pablo Sprechmann, Alex Vitvitskyi, Daniel Guo, Charles Blundell

link-bibliography
https://arxiv.org/abs/2002.03629: “Accelerating Feedforward Computation via Parallel Nonlinear Equation Solving”, Yang Song, Chenlin Meng, Renjie Liao, Stefano Ermon

link-bibliography
https://arxiv.org/abs/2001.08361#openai: “Scaling Laws for Neural Language Models”, Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, Dario Amodei

link-bibliography
https://arxiv.org/abs/1911.11423: “Single Headed Attention RNN: Stop Thinking With Your Head”, Stephen Merity

link-bibliography
https://openreview.net/forum?id=HyxlRHBlUB: “Legendre Memory Units: Continuous-Time Representation in Recurrent Neural Networks”, Aaron R. Voelker, Ivana Kajić, Chris Eliasmith

link-bibliography
https://arxiv.org/abs/1910.06591#deepmind: “SEED RL: Scalable and Efficient Deep-RL With Accelerated Central Inference”, Lasse Espeholt, Raphaël Marinier, Piotr Stanczyk, Ke Wang, Marcin Michalski

link-bibliography
https://arxiv.org/abs/1909.01792#deepmind: “Mogrifier LSTM”, Gábor Melis, Tomáš Kočiský, Phil Blunsom

link-bibliography
https://paperswithcode.com/task/language-modelling: “Language Modeling State-Of-The-Art Leaderboards”, paperswithcode.com

link-bibliography
https://arxiv.org/abs/1905.01320#deepmind: “Meta-Learners’ Learning Dynamics Are unlike Learners’”, Neil C. Rabinowitz

link-bibliography
https://arxiv.org/abs/1901.02860: “Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context”, Zihang Dai, Zhilin Yang, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov

link-bibliography
https://openreview.net/forum?id=r1lyTjAqYX#deepmind: “R2D2: Recurrent Experience Replay in Distributed Reinforcement Learning”, Steven Kapturowski, Georg Ostrovski, John Quan, Remi Munos, Will Dabney

link-bibliography
https://arxiv.org/abs/1801.06146: “ULMFiT: Universal Language Model Fine-Tuning for Text Classification”, Jeremy Howard, Sebastian Ruder

link-bibliography
https://arxiv.org/abs/1709.07432: “Dynamic Evaluation of Neural Sequence Models”, Ben Krause, Emmanuel Kahembwe, Iain Murray, Steve Renals

link-bibliography
https://arxiv.org/abs/1709.02755: “SRU: Simple Recurrent Units for Highly Parallelizable Recurrence”, Tao Lei, Yu Zhang, Sida I. Wang, Hui Dai, Yoav Artzi

link-bibliography
https://arxiv.org/abs/1704.05179: “SearchQA: A New Q&A Dataset Augmented With Context from a Search Engine”, Matthew Dunn, Levent Sagun, Mike Higgins, V. Ugur Guney, Volkan Cirik, Kyunghyun Cho

link-bibliography
https://arxiv.org/abs/1704.05526: “Learning to Reason: End-To-End Module Networks for Visual Question Answering”, Ronghang Hu, Jacob Andreas, Marcus Rohrbach, Trevor Darrell, Kate Saenko

link-bibliography
https://arxiv.org/abs/1608.03609: “Clockwork Convnets for Video Semantic Segmentation”, Evan Shelhamer, Kate Rakelly, Judy Hoffman, Trevor Darrell

link-bibliography
https://arxiv.org/abs/1503.08895: “End-To-End Memory Networks”, Sainbayar Sukhbaatar, Arthur Szlam, Jason Weston, Rob Fergus

link-bibliography
2010-mikolov.pdf: “Recurrent Neural Network Based Language Model”, Tomas Mikolov, Martin Karafiat, Lukas Burget, Jan Cernocky, Sanjeev Khudanpur

link-bibliography
1991-hochreiter.pdf: “Untersuchungen Zu Dynamischen Neuronalen Netzen [Studies of Dynamic Neural Networks]”, Sepp Hochreiter

link-bibliography
1989-williams.pdf: “Experimental Analysis of the Real-Time Recurrent Learning Algorithm”, Ronald J. Williams, David Zipser

link-bibliography
1988-werbos.pdf: “Generalization of Backpropagation With Application to a Recurrent Gas Market Model”, Paul Joseph Werbos

link-bibliography
1987-pineda.pdf: “Generalization of Back-Propagation to Recurrent Neural Networks”, Fernando J. Pineda

link-bibliography