‘NN sparsity’ tag

Annotations sorted by machine learning into inferred 'tags'. This provides an alternative way to browse: instead of by date order, one can browse in topic order. The 'sorted' list has been automatically clustered into multiple sections & auto-labeled for easier browsing.

Beginning with the newest annotation, it uses the embedding of each annotation to attempt to create a list of nearest-neighbor annotations, creating a progression of topics. For more details, see the link.

`neural-computation`

[see previous entry]

`model-compression neural-architecture language-models learning-optimization depth-exploration performance-tuning`

[see previous entry]

`sparse-convolution`

[see previous entry]

`efficient-inference`

[see previous entry]

Wikipedia

Autoencoder § Sparse autoencoder (SAE)⁠:

https://en.wikipedia.org/wiki/Autoencoder#Sparse_autoencoder_(SAE)

Miscellaneous

Bibliography

https://arxiv.org/abs/2403.17844: “Mechanistic Design and Scaling of Hybrid Architectures”, Michael Poli, Armin W. Thomas, Eric Nguyen, Pragaash Ponnusamy, Björn Deiseroth, Kristian Kersting, Taiji Suzuki, Brian Hie, Stefano Ermon, Christopher Ré, Ce Zhang, Stefano Massaroli

link-bibliography
https://arxiv.org/abs/2311.10770: “Exponentially Faster Language Modeling”, Peter Belcak, Roger Wattenhofer

link-bibliography
https://www.sciencedirect.com/science/article/pii/S0893608023005051: “An Exact Mapping from ReLU Networks to Spiking Neural Networks”, Ana Stanojevic, Stanisław Woźniak, Guillaume Bellec, Giovanni Cherubini, Angeliki Pantazi, Wulfram Gerstner

link-bibliography
https://arxiv.org/abs/2310.17157: “Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time”, Zichang Liu, Jue Wang, Tri Dao, Tianyi Zhou, Binhang Yuan, Zhao Song, Anshumali Shrivastava, Ce Zhang, Yuandong Tian, Christopher Re, Beidi Chen

link-bibliography
https://arxiv.org/abs/2308.14711: “Fast Feedforward Networks”, Peter Belcak, Roger Wattenhofer

link-bibliography
https://arxiv.org/abs/2302.12441: “MUX-PLMs: Pre-Training Language Models With Data Multiplexing”, Vishvak Murahari, Ameet Deshpande, Carlos E. Jimenez, Izhak Shafran, Mingqiu Wang, Yuan Cao, Karthik Narasimhan

link-bibliography
https://arxiv.org/abs/2210.06313#google: “The Lazy Neuron Phenomenon: On Emergence of Activation Sparsity in Transformers”, Zonglin Li, Chong You, Srinadh Bhojanapalli, Daliang Li, Ankit Singh Rawat, Sashank J. Reddi, Ke Ye, Felix Chern, Felix Yu, Ruiqi Guo, Sanjiv Kumar

link-bibliography
https://arxiv.org/abs/2207.03620: “More ConvNets in the 2020s: Scaling up Kernels Beyond 51×51 Using Sparsity (SLaK)”, Shiwei Liu, Tianlong Chen, Xiaohan Chen, Xuxi Chen, Qiao Xiao, Boqian Wu, Mykola Pechenizkiy, Decebal Mocanu, Zhangyang Wang

link-bibliography
https://arxiv.org/abs/2205.03983#google: “Building Machine Translation Systems for the Next Thousand Languages”, Ankur Bapna, Isaac Caswell, Julia Kreutzer, Orhan Firat, Daan van Esch, Aditya Siddhant, Mengmeng Niu, Pallavi Baljekar, Xavier Garcia, Wolfgang Macherey, Theresa Breiner, Vera Axelrod, Jason Riesa, Yuan Cao, Mia Xu Chen, Klaus Macherey, Maxim Krikun, Pidong Wang, Alexander Gutkin, Apurva Shah, Yanping Huang, Zhifeng Chen, Yonghui Wu, Macduff Hughes

link-bibliography
https://arxiv.org/abs/2204.00595: “Monarch: Expressive Structured Matrices for Efficient and Accurate Training”, Tri Dao, Beidi Chen, Nimit Sohoni, Arjun Desai, Michael Poli, Jessica Grogan, Alexander Liu, Aniruddh Rao, Atri Rudra, Christopher Ré

link-bibliography
https://arxiv.org/abs/2203.06850: “Efficient Language Modeling With Sparse All-MLP”, Ping Yu, Mikel Artetxe, Myle Ott, Sam Shleifer, Hongyu Gong, Ves Stoyanov, Xian Li

link-bibliography
https://arxiv.org/abs/2202.07415#deepmind: “NeuPL: Neural Population Learning”, Siqi Liu, Luke Marris, Daniel Hennes, Josh Merel, Nicolas Heess, Thore Graepel

link-bibliography
https://arxiv.org/abs/2106.09685#microsoft: “LoRA: Low-Rank Adaptation of Large Language Models”, Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen

link-bibliography
https://greydanus.github.io/2020/12/01/scaling-down/: “Scaling down Deep Learning”, Sam Greydanus

link-bibliography
https://arxiv.org/abs/1905.11946#google: “EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks”, Mingxing Tan, Quoc V. Le

link-bibliography
https://arxiv.org/abs/1803.10615: “SqueezeNext: Hardware-Aware Neural Network Design”, Amir Gholami, Kiseok Kwon, Bichen Wu, Zizheng Tai, Xiangyu Yue, Peter Jin, Sicheng Zhao, Kurt Keutzer

link-bibliography