- See Also
- Gwern
- Links
- “Scaling the Codebook Size of VQGAN to 100,000 With a Utilization Rate of 99%”, Zhu et al 2024
- “Visual Autoregressive Modeling (VAR): Scalable Image Generation via Next-Scale Prediction”, Tian et al 2024
- “Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data”, Gerstgrasser et al 2024
- “Neural Network Parameter Diffusion”, Wang et al 2024
- “Attention versus Contrastive Learning of Tabular Data—A Data-Centric Benchmarking”, Rabbani et al 2024
- “Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet”
- “GIVT: Generative Infinite-Vocabulary Transformers”, Tschannen et al 2023
- “Sequential Modeling Enables Scalable Learning for Large Vision Models”, Bai et al 2023
- “Finite Scalar Quantization (FSQ): VQ-VAE Made Simple”, Mentzer et al 2023
- “DeWave: Discrete EEG Waves Encoding for Brain Dynamics to Text Translation”, Duan et al 2023
- “Finding Neurons in a Haystack: Case Studies With Sparse Probing”, Gurnee et al 2023
- “TANGO: Text-To-Audio Generation Using Instruction-Tuned LLM and Latent Diffusion Model”, Ghosal et al 2023
- “ACT: Learning Fine-Grained Bimanual Manipulation With Low-Cost Hardware”, Zhao et al 2023
- “Bridging Discrete and Backpropagation: Straight-Through and Beyond”, Liu et al 2023
- “Low-Bitrate Redundancy Coding of Speech Using a Rate-Distortion-Optimized Variational Autoencoder”, Valin et al 2022
- “IRIS: Transformers Are Sample-Efficient World Models”, Micheli et al 2022
- “Understanding Diffusion Models: A Unified Perspective”, Luo 2022
- “Vector Quantized Image-To-Image Translation”, Chen et al 2022
- “Draft-And-Revise: Effective Image Generation With Contextual RQ-Transformer”, Lee et al 2022
- “UViM: A Unified Modeling Approach for Vision With Learned Guiding Codes”, Kolesnikov et al 2022
- “Closing the Gap: Exact Maximum Likelihood Training of Generative Autoencoders Using Invertible Layers (AEF)”, Silvestri et al 2022
- “AvatarCLIP: Zero-Shot Text-Driven Generation and Animation of 3D Avatars”, Hong et al 2022
- “AdaVAE: Exploring Adaptive GPT-2s in Variational Autoencoders for Language Modeling”, Tu et al 2022
- “NaturalSpeech: End-To-End Text to Speech Synthesis With Human-Level Quality”, Tan et al 2022
- “VQGAN-CLIP: Open Domain Image Generation and Editing With Natural Language Guidance”, Crowson et al 2022
- “TATS: Long Video Generation With Time-Agnostic VQGAN and Time-Sensitive Transformer”, Ge et al 2022
- “Diffusion Probabilistic Modeling for Video Generation”, Yang et al 2022
- “Polarity Sampling: Quality and Diversity Control of Pre-Trained Generative Networks via Singular Values”, Humayun et al 2022
- “Vector-Quantized Image Modeling With Improved VQGAN”, Yu et al 2022
- “Variational Autoencoders Without the Variation”, Daly et al 2022
- “Truncated Diffusion Probabilistic Models and Diffusion-Based Adversarial Autoencoders”, Zheng et al 2022
- “MLR: A Model of Working Memory for Latent Representations”, Hedayati et al 2022
- “CM3: A Causal Masked Multimodal Model of the Internet”, Aghajanyan et al 2022
- “Design Guidelines for Prompt Engineering Text-To-Image Generative Models”, Liu & Chilton 2022b
- “DiffuseVAE: Efficient, Controllable and High-Fidelity Generation from Low-Dimensional Latents”, Pandey et al 2022
- “ERNIE-ViLG: Unified Generative Pre-Training for Bidirectional Vision-Language Generation”, Zhang et al 2021
- “High-Resolution Image Synthesis With Latent Diffusion Models”, Rombach et al 2021
- “Discovering State Variables Hidden in Experimental Data”, Chen et al 2021
- “VQ-DDM: Global Context With Discrete Diffusion in Vector Quantized Modeling for Image Generation”, Hu et al 2021
- “Vector Quantized Diffusion Model for Text-To-Image Synthesis”, Gu et al 2021
- “Passive Non-Line-Of-Sight Imaging Using Optimal Transport”, Geng et al 2021
- “L-Verse: Bidirectional Generation Between Image and Text”, Kim et al 2021
- “Unsupervised Deep Learning Identifies Semantic Disentanglement in Single Inferotemporal Face Patch Neurons”, Higgins et al 2021
- “Telling Creative Stories Using Generative Visual Aids”, Ali & Parikh 2021
- “Illiterate DALL·E Learns to Compose”, Singh et al 2021
- “MeLT: Message-Level Transformer With Masked Document Representations As Pre-Training for Stance Detection”, Matero et al 2021
- “Score-Based Generative Modeling in Latent Space”, Vahdat et al 2021
- “NWT: Towards Natural Audio-To-Video Generation With Representation Learning”, Mama et al 2021
- “Vector Quantized Models for Planning”, Ozair et al 2021
- “VideoGPT: Video Generation Using VQ-VAE and Transformers”, Yan et al 2021
- “TSDAE: Using Transformer-Based Sequential Denoising Autoencoder for Unsupervised Sentence Embedding Learning”, Wang et al 2021
- “Symbolic Music Generation With Diffusion Models”, Mittal et al 2021
- “Deep Generative Modeling: A Comparative Review of VAEs, GANs, Normalizing Flows, Energy-Based and Autoregressive Models”, Bond-Taylor et al 2021
- “Greedy Hierarchical Variational Autoencoders (GHVAEs) for Large-Scale Video Prediction”, Wu et al 2021
- “CW-VAE: Clockwork Variational Autoencoders”, Saxena et al 2021
- “Denoising Diffusion Implicit Models”, Song et al 2021
- “DALL·E 1: Creating Images from Text: We’ve Trained a Neural Network Called DALL·E That Creates Images from Text Captions for a Wide Range of Concepts Expressible in Natural Language”, Ramesh et al 2021
- “VQ-GAN: Taming Transformers for High-Resolution Image Synthesis”, Esser et al 2020
- “Multimodal Dynamics Modeling for Off-Road Autonomous Vehicles”, Tremblay et al 2020
- “Very Deep VAEs Generalize Autoregressive Models and Can Outperform Them on Images”, Child 2020
- “NVAE: A Deep Hierarchical Variational Autoencoder”, Vahdat & Kautz 2020
- “Jukebox: A Generative Model for Music”, Dhariwal et al 2020
- “Jukebox: We’re Introducing Jukebox, a Neural Net That Generates Music, including Rudimentary Singing, As Raw Audio in a Variety of Genres and Artist Styles. We’re Releasing the Model Weights and Code, along With a Tool to Explore the Generated Samples.”, Dhariwal et al 2020
- “RL Agents Implicitly Learning Human Preferences”, Wichers 2020
- “Encoding Musical Style With Transformer Autoencoders”, Choi et al 2019
- “Generating Furry Face Art from Sketches Using a GAN”, Yu 2019
- “BART: Denoising Sequence-To-Sequence Pre-Training for Natural Language Generation, Translation, and Comprehension”, Lewis et al 2019
- “Bayesian Parameter Estimation Using Conditional Variational Autoencoders for Gravitational-Wave Astronomy”, Gabbard et al 2019
- “In-Field Whole Plant Maize Architecture Characterized by Latent Space Phenotyping”, Gage et al 2019
- “Generating Diverse High-Fidelity Images With VQ-VAE-2”, Razavi et al 2019
- “Bit-Swap: Recursive Bits-Back Coding for Lossless Compression With Hierarchical Latent Variables”, Kingma et al 2019
- “Hierarchical Autoregressive Image Models With Auxiliary Decoders”, Fauw et al 2019
- “Practical Lossless Compression With Latent Variables Using Bits Back Coding”, Townsend et al 2019
- “An Empirical Model of Large-Batch Training”, McCandlish et al 2018
- “How AI Training Scales”, McCandlish et al 2018
- “Neural Probabilistic Motor Primitives for Humanoid Control”, Merel et al 2018
- “Piano Genie”, Donahue et al 2018
- “IntroVAE: Introspective Variational Autoencoders for Photographic Image Synthesis”, Huang et al 2018
- “InfoNCE: Representation Learning With Contrastive Predictive Coding (CPC)”, Oord et al 2018
- “The Challenge of Realistic Music Generation: Modeling Raw Audio at Scale”, Dieleman et al 2018
- “Self-Net: Lifelong Learning via Continual Self-Modeling”, Camp et al 2018
- “GANomaly: Semi-Supervised Anomaly Detection via Adversarial Training”, Akcay et al 2018
- “XGAN: Unsupervised Image-To-Image Translation for Many-To-Many Mappings”, Royer et al 2017
- “VQ-VAE: Neural Discrete Representation Learning”, Oord et al 2017
- “Vision-Based Multi-Task Manipulation for Inexpensive Robots Using End-To-End Learning from Demonstration”, Rahmatizadeh et al 2017
- “β-VAE: Learning Basic Visual Concepts With a Constrained Variational Framework”, Higgins et al 2017
- “Neural Audio Synthesis of Musical Notes With WaveNet Autoencoders”, Engel et al 2017
- “Prediction and Control With Temporal Segment Models”, Mishra et al 2017
- “Discovering Objects and Their Relations from Entangled Scene Representations”, Raposo et al 2017
- “Categorical Reparameterization With Gumbel-Softmax”, Jang et al 2016
- “The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables”, Maddison et al 2016
- “Improving Sampling from Generative Autoencoders With Markov Chains”, Creswell et al 2016
- “Language As a Latent Variable: Discrete Generative Models for Sentence Compression”, Miao & Blunsom 2016
- “Neural Photo Editing With Introspective Adversarial Networks”, Brock et al 2016
- “Early Visual Concept Learning With Unsupervised Deep Learning”, Higgins et al 2016
- “Improving Variational Inference With Inverse Autoregressive Flow”, Kingma et al 2016
- “How Far Can We Go without Convolution: Improving Fully-Connected Networks”, Lin et al 2015
- “Semi-Supervised Sequence Learning”, Dai & Le 2015
- “MADE: Masked Autoencoder for Distribution Estimation”, Germain et al 2015
- “Analyzing Noise in Autoencoders and Deep Networks”, Poole et al 2014
- “Stochastic Backpropagation and Approximate Inference in Deep Generative Models”, Rezende et al 2014
- “Auto-Encoding Variational Bayes”, Kingma & Welling 2013
- “Building High-Level Features Using Large Scale Unsupervised Learning”, Le et al 2011
- “A Connection Between Score Matching and Denoising Autoencoders”, Vincent 2011
- “Reducing the Dimensionality of Data With Neural Networks”, Hinton & Salakhutdinov 2006
- “Generating Large Images from Latent Vectors”, Ha 2016
- “Transformers As Variational Autoencoders”
- “Randomly Traversing the Manifold of Faces (2): Dataset: Labeled Faces in the Wild (LFW); Model: Variational Autoencoder (VAE) / Deep Latent Gaussian Model (DLGM).”
- Sort By Magic
- Wikipedia
- Miscellaneous
- Bibliography
See Also
Gwern
“Anime Neural Net Graveyard”, Gwern 2019
Links
“Scaling the Codebook Size of VQGAN to 100,000 With a Utilization Rate of 99%”, Zhu et al 2024
“Visual Autoregressive Modeling (VAR): Scalable Image Generation via Next-Scale Prediction”, Tian et al 2024
“Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data”, Gerstgrasser et al 2024
“Neural Network Parameter Diffusion”, Wang et al 2024
“Attention versus Contrastive Learning of Tabular Data—A Data-Centric Benchmarking”, Rabbani et al 2024
“Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet”
“GIVT: Generative Infinite-Vocabulary Transformers”, Tschannen et al 2023
“Sequential Modeling Enables Scalable Learning for Large Vision Models”, Bai et al 2023
“Finite Scalar Quantization (FSQ): VQ-VAE Made Simple”, Mentzer et al 2023
“DeWave: Discrete EEG Waves Encoding for Brain Dynamics to Text Translation”, Duan et al 2023
“Finding Neurons in a Haystack: Case Studies With Sparse Probing”, Gurnee et al 2023
“TANGO: Text-To-Audio Generation Using Instruction-Tuned LLM and Latent Diffusion Model”, Ghosal et al 2023
“ACT: Learning Fine-Grained Bimanual Manipulation With Low-Cost Hardware”, Zhao et al 2023
“Bridging Discrete and Backpropagation: Straight-Through and Beyond”, Liu et al 2023
“Low-Bitrate Redundancy Coding of Speech Using a Rate-Distortion-Optimized Variational Autoencoder”, Valin et al 2022
“IRIS: Transformers Are Sample-Efficient World Models”, Micheli et al 2022
“Understanding Diffusion Models: A Unified Perspective”, Luo 2022
“Vector Quantized Image-To-Image Translation”, Chen et al 2022
“Draft-And-Revise: Effective Image Generation With Contextual RQ-Transformer”, Lee et al 2022
“UViM: A Unified Modeling Approach for Vision With Learned Guiding Codes”, Kolesnikov et al 2022
“Closing the Gap: Exact Maximum Likelihood Training of Generative Autoencoders Using Invertible Layers (AEF)”, Silvestri et al 2022
“AvatarCLIP: Zero-Shot Text-Driven Generation and Animation of 3D Avatars”, Hong et al 2022
“AdaVAE: Exploring Adaptive GPT-2s in Variational Autoencoders for Language Modeling”, Tu et al 2022
“NaturalSpeech: End-To-End Text to Speech Synthesis With Human-Level Quality”, Tan et al 2022
“VQGAN-CLIP: Open Domain Image Generation and Editing With Natural Language Guidance”, Crowson et al 2022
“TATS: Long Video Generation With Time-Agnostic VQGAN and Time-Sensitive Transformer”, Ge et al 2022
“Diffusion Probabilistic Modeling for Video Generation”, Yang et al 2022
“Polarity Sampling: Quality and Diversity Control of Pre-Trained Generative Networks via Singular Values”, Humayun et al 2022
“Vector-Quantized Image Modeling With Improved VQGAN”, Yu et al 2022
“Variational Autoencoders Without the Variation”, Daly et al 2022
“Truncated Diffusion Probabilistic Models and Diffusion-Based Adversarial Autoencoders”, Zheng et al 2022
“MLR: A Model of Working Memory for Latent Representations”, Hedayati et al 2022
“CM3: A Causal Masked Multimodal Model of the Internet”, Aghajanyan et al 2022
“Design Guidelines for Prompt Engineering Text-To-Image Generative Models”, Liu & Chilton 2022b
“DiffuseVAE: Efficient, Controllable and High-Fidelity Generation from Low-Dimensional Latents”, Pandey et al 2022
“ERNIE-ViLG: Unified Generative Pre-Training for Bidirectional Vision-Language Generation”, Zhang et al 2021
“High-Resolution Image Synthesis With Latent Diffusion Models”, Rombach et al 2021
“Discovering State Variables Hidden in Experimental Data”, Chen et al 2021
“VQ-DDM: Global Context With Discrete Diffusion in Vector Quantized Modeling for Image Generation”, Hu et al 2021
“Vector Quantized Diffusion Model for Text-To-Image Synthesis”, Gu et al 2021
“Passive Non-Line-Of-Sight Imaging Using Optimal Transport”, Geng et al 2021
“L-Verse: Bidirectional Generation Between Image and Text”, Kim et al 2021
“Unsupervised Deep Learning Identifies Semantic Disentanglement in Single Inferotemporal Face Patch Neurons”, Higgins et al 2021
“Telling Creative Stories Using Generative Visual Aids”, Ali & Parikh 2021
“Illiterate DALL·E Learns to Compose”, Singh et al 2021
“MeLT: Message-Level Transformer With Masked Document Representations As Pre-Training for Stance Detection”, Matero et al 2021
“Score-Based Generative Modeling in Latent Space”, Vahdat et al 2021
“NWT: Towards Natural Audio-To-Video Generation With Representation Learning”, Mama et al 2021
“Vector Quantized Models for Planning”, Ozair et al 2021
“VideoGPT: Video Generation Using VQ-VAE and Transformers”, Yan et al 2021
“TSDAE: Using Transformer-Based Sequential Denoising Autoencoder for Unsupervised Sentence Embedding Learning”, Wang et al 2021
“Symbolic Music Generation With Diffusion Models”, Mittal et al 2021
“Deep Generative Modeling: A Comparative Review of VAEs, GANs, Normalizing Flows, Energy-Based and Autoregressive Models”, Bond-Taylor et al 2021
“Greedy Hierarchical Variational Autoencoders (GHVAEs) for Large-Scale Video Prediction”, Wu et al 2021
“CW-VAE: Clockwork Variational Autoencoders”, Saxena et al 2021
“Denoising Diffusion Implicit Models”, Song et al 2021
“DALL·E 1: Creating Images from Text: We’ve Trained a Neural Network Called DALL·E That Creates Images from Text Captions for a Wide Range of Concepts Expressible in Natural Language”, Ramesh et al 2021
“VQ-GAN: Taming Transformers for High-Resolution Image Synthesis”, Esser et al 2020
“Multimodal Dynamics Modeling for Off-Road Autonomous Vehicles”, Tremblay et al 2020
“Very Deep VAEs Generalize Autoregressive Models and Can Outperform Them on Images”, Child 2020
“NVAE: A Deep Hierarchical Variational Autoencoder”, Vahdat & Kautz 2020
“Jukebox: A Generative Model for Music”, Dhariwal et al 2020
“Jukebox: We’re Introducing Jukebox, a Neural Net That Generates Music, including Rudimentary Singing, As Raw Audio in a Variety of Genres and Artist Styles. We’re Releasing the Model Weights and Code, along With a Tool to Explore the Generated Samples.”, Dhariwal et al 2020
“RL Agents Implicitly Learning Human Preferences”, Wichers 2020
“Encoding Musical Style With Transformer Autoencoders”, Choi et al 2019
“Generating Furry Face Art from Sketches Using a GAN”, Yu 2019
“BART: Denoising Sequence-To-Sequence Pre-Training for Natural Language Generation, Translation, and Comprehension”, Lewis et al 2019
“Bayesian Parameter Estimation Using Conditional Variational Autoencoders for Gravitational-Wave Astronomy”, Gabbard et al 2019
“In-Field Whole Plant Maize Architecture Characterized by Latent Space Phenotyping”, Gage et al 2019
“Generating Diverse High-Fidelity Images With VQ-VAE-2”, Razavi et al 2019
“Bit-Swap: Recursive Bits-Back Coding for Lossless Compression With Hierarchical Latent Variables”, Kingma et al 2019
“Hierarchical Autoregressive Image Models With Auxiliary Decoders”, Fauw et al 2019
“Practical Lossless Compression With Latent Variables Using Bits Back Coding”, Townsend et al 2019
“An Empirical Model of Large-Batch Training”, McCandlish et al 2018
“How AI Training Scales”, McCandlish et al 2018
“Neural Probabilistic Motor Primitives for Humanoid Control”, Merel et al 2018
“Piano Genie”, Donahue et al 2018
“IntroVAE: Introspective Variational Autoencoders for Photographic Image Synthesis”, Huang et al 2018
“InfoNCE: Representation Learning With Contrastive Predictive Coding (CPC)”, Oord et al 2018
“The Challenge of Realistic Music Generation: Modeling Raw Audio at Scale”, Dieleman et al 2018
“Self-Net: Lifelong Learning via Continual Self-Modeling”, Camp et al 2018
“GANomaly: Semi-Supervised Anomaly Detection via Adversarial Training”, Akcay et al 2018
“XGAN: Unsupervised Image-To-Image Translation for Many-To-Many Mappings”, Royer et al 2017
“VQ-VAE: Neural Discrete Representation Learning”, Oord et al 2017
“Vision-Based Multi-Task Manipulation for Inexpensive Robots Using End-To-End Learning from Demonstration”, Rahmatizadeh et al 2017
“β-VAE: Learning Basic Visual Concepts With a Constrained Variational Framework”, Higgins et al 2017
“Neural Audio Synthesis of Musical Notes With WaveNet Autoencoders”, Engel et al 2017
“Prediction and Control With Temporal Segment Models”, Mishra et al 2017
“Discovering Objects and Their Relations from Entangled Scene Representations”, Raposo et al 2017
“Categorical Reparameterization With Gumbel-Softmax”, Jang et al 2016
“The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables”, Maddison et al 2016
“Improving Sampling from Generative Autoencoders With Markov Chains”, Creswell et al 2016
“Language As a Latent Variable: Discrete Generative Models for Sentence Compression”, Miao & Blunsom 2016
“Neural Photo Editing With Introspective Adversarial Networks”, Brock et al 2016
“Early Visual Concept Learning With Unsupervised Deep Learning”, Higgins et al 2016
“Improving Variational Inference With Inverse Autoregressive Flow”, Kingma et al 2016
“How Far Can We Go without Convolution: Improving Fully-Connected Networks”, Lin et al 2015
“Semi-Supervised Sequence Learning”, Dai & Le 2015
“MADE: Masked Autoencoder for Distribution Estimation”, Germain et al 2015
“Analyzing Noise in Autoencoders and Deep Networks”, Poole et al 2014
“Stochastic Backpropagation and Approximate Inference in Deep Generative Models”, Rezende et al 2014
“Auto-Encoding Variational Bayes”, Kingma & Welling 2013
“Building High-Level Features Using Large Scale Unsupervised Learning”, Le et al 2011
“A Connection Between Score Matching and Denoising Autoencoders”, Vincent 2011
“Reducing the Dimensionality of Data With Neural Networks”, Hinton & Salakhutdinov 2006
“Generating Large Images from Latent Vectors”, Ha 2016
https://blog.otoro.net/2016/04/01/generating-large-images-from-latent-vectors/
“Transformers As Variational Autoencoders”
“Randomly Traversing the Manifold of Faces (2): Dataset: Labeled Faces in the Wild (LFW); Model: Variational Autoencoder (VAE) / Deep Latent Gaussian Model (DLGM).”
Sort By Magic
Annotations sorted by machine learning into inferred 'tags'. This provides an alternative way to browse: instead of by date order, one can browse in topic order. The 'sorted' list has been automatically clustered into multiple sections & auto-labeled for easier browsing.
Beginning with the newest annotation, it uses the embedding of each annotation to attempt to create a list of nearest-neighbor annotations, creating a progression of topics. For more details, see the link.
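The ordering described above can be sketched as a greedy walk through embedding space. This is a hypothetical minimal reconstruction, not Gwern’s actual implementation: the function name `magic_sort`, the cosine-distance metric, and starting from index 0 (the newest annotation) are all assumptions for illustration.

```python
import numpy as np

def magic_sort(embeddings):
    """Greedy nearest-neighbor ordering: start from the newest annotation
    (index 0, by assumption) and repeatedly hop to the closest unvisited
    embedding, yielding a smooth progression of topics."""
    n = len(embeddings)
    order = [0]
    remaining = set(range(1, n))
    while remaining:
        last = embeddings[order[-1]]
        # Pick the unvisited annotation with the smallest cosine distance.
        nxt = min(
            remaining,
            key=lambda i: 1 - np.dot(last, embeddings[i])
            / (np.linalg.norm(last) * np.linalg.norm(embeddings[i]) + 1e-9),
        )
        order.append(nxt)
        remaining.remove(nxt)
    return order
```

For example, with embeddings `[[1, 0], [0, 1], [0.9, 0.1]]`, the walk visits item 2 (nearly parallel to item 0) before item 1, so near-duplicate topics end up adjacent in the browsing order.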
anomaly-detection
contrastive-learning
compression
generative-models
Wikipedia
Miscellaneous
Bibliography
- https://arxiv.org/abs/2406.11837: “Scaling the Codebook Size of VQGAN to 100,000 With a Utilization Rate of 99%”
- https://arxiv.org/abs/2404.02905#bytedance: “Visual Autoregressive Modeling (VAR): Scalable Image Generation via Next-Scale Prediction”
- https://arxiv.org/abs/2312.02116: “GIVT: Generative Infinite-Vocabulary Transformers”
- https://arxiv.org/abs/2309.15505: “Finite Scalar Quantization (FSQ): VQ-VAE Made Simple”
- https://arxiv.org/abs/2304.13731: “TANGO: Text-To-Audio Generation Using Instruction-Tuned LLM and Latent Diffusion Model”
- https://arxiv.org/abs/2304.13705: “ACT: Learning Fine-Grained Bimanual Manipulation With Low-Cost Hardware”
- https://arxiv.org/abs/2209.00588: “IRIS: Transformers Are Sample-Efficient World Models”
- https://arxiv.org/abs/2205.08535: “AvatarCLIP: Zero-Shot Text-Driven Generation and Animation of 3D Avatars”
- https://arxiv.org/abs/2205.04421#microsoft: “NaturalSpeech: End-To-End Text to Speech Synthesis With Human-Level Quality”
- https://arxiv.org/abs/2204.03638#facebook: “TATS: Long Video Generation With Time-Agnostic VQGAN and Time-Sensitive Transformer”
- https://arxiv.org/abs/2203.01993: “Polarity Sampling: Quality and Diversity Control of Pre-Trained Generative Networks via Singular Values”
- https://arxiv.org/abs/2110.04627#google: “Vector-Quantized Image Modeling With Improved VQGAN”
- https://arxiv.org/abs/2201.07520#facebook: “CM3: A Causal Masked Multimodal Model of the Internet”
- 2022-liu-2.pdf: “Design Guidelines for Prompt Engineering Text-To-Image Generative Models”
- https://arxiv.org/abs/2112.15283#baidu: “ERNIE-ViLG: Unified Generative Pre-Training for Bidirectional Vision-Language Generation”
- https://arxiv.org/abs/2112.10752: “High-Resolution Image Synthesis With Latent Diffusion Models”
- https://arxiv.org/abs/2111.11133: “L-Verse: Bidirectional Generation Between Image and Text”
- https://arxiv.org/abs/2106.04615#deepmind: “Vector Quantized Models for Planning”
- https://arxiv.org/abs/2104.10157: “VideoGPT: Video Generation Using VQ-VAE and Transformers”
- https://openai.com/research/dall-e: “DALL·E 1: Creating Images from Text: We’ve Trained a Neural Network Called DALL·E That Creates Images from Text Captions for a Wide Range of Concepts Expressible in Natural Language”
- https://arxiv.org/abs/2011.10650#openai: “Very Deep VAEs Generalize Autoregressive Models and Can Outperform Them on Images”
- https://arxiv.org/abs/2007.03898#nvidia: “NVAE: A Deep Hierarchical Variational Autoencoder”
- https://cdn.openai.com/papers/jukebox.pdf: “Jukebox: A Generative Model for Music”
- https://openai.com/research/jukebox: “Jukebox: We’re Introducing Jukebox, a Neural Net That Generates Music, including Rudimentary Singing, As Raw Audio in a Variety of Genres and Artist Styles. We’re Releasing the Model Weights and Code, along With a Tool to Explore the Generated Samples.”
- https://arxiv.org/abs/1910.13461#facebook: “BART: Denoising Sequence-To-Sequence Pre-Training for Natural Language Generation, Translation, and Comprehension”
- https://openai.com/research/how-ai-training-scales: “How AI Training Scales”
- 2011-vincent.pdf: “A Connection Between Score Matching and Denoising Autoencoders”