See Also

Links
- “Transfusion: Predict the Next Token and Diffuse Images With One Multi-Modal Model”, Zhou et al 2024
- “JPEG-LM: LLMs As Image Generators With Canonical Codec Representations”, Han et al 2024
- “MAR: Autoregressive Image Generation without Vector Quantization”, Li et al 2024
- “STAR: Scale-Wise Text-To-Image Generation via Auto-Regressive Representations”, Ma et al 2024
- “Chameleon: Mixed-Modal Early-Fusion Foundation Models”, Team 2024
- “Visual Autoregressive Modeling (VAR): Scalable Image Generation via Next-Scale Prediction”, Tian et al 2024
- “IconShop: Text-Guided Vector Icon Synthesis With Autoregressive Transformers”, Wu et al 2023b
- “Rejuvenating Image-GPT As Strong Visual Representation Learners”, Ren et al 2023
- “Image Captioners Are Scalable Vision Learners Too”, Tschannen et al 2023
- “Artificial Intelligence and Art: Identifying the Esthetic Judgment Factors That Distinguish Human & Machine-Generated Artwork”, Samo & Highhouse 2023
- “VALL-E: Neural Codec Language Models Are Zero-Shot Text to Speech Synthesizers”, Wang et al 2023
- “Retrieval-Augmented Multimodal Language Modeling”, Yasunaga et al 2022
- “Draft-And-Revise: Effective Image Generation With Contextual RQ-Transformer”, Lee et al 2022
- “CogVideo: Large-Scale Pretraining for Text-To-Video Generation via Transformers”, Hong et al 2022
- “CogView2: Faster and Better Text-To-Image Generation via Hierarchical Transformers”, Ding et al 2022
- “MaskGIT: Masked Generative Image Transformer”, Chang et al 2022
- “CM3: A Causal Masked Multimodal Model of the Internet”, Aghajanyan et al 2022
- “ERNIE-ViLG: Unified Generative Pre-Training for Bidirectional Vision-Language Generation”, Zhang et al 2021
- “Emojich—Zero-Shot Emoji Generation Using Russian Language: a Technical Report”, Shonenkov et al 2021
- “LAFITE: Towards Language-Free Training for Text-To-Image Generation”, Zhou et al 2021
- “NÜWA: Visual Synthesis Pre-Training for Neural VisUal World CreAtion”, Wu et al 2021
- “L-Verse: Bidirectional Generation Between Image and Text”, Kim et al 2021
- “Telling Creative Stories Using Generative Visual Aids”, Ali & Parikh 2021
- “Unifying Multimodal Transformer for Bi-Directional Image and Text Generation”, Huang et al 2021
- “Illiterate DALL·E Learns to Compose”, Singh et al 2021
- “What Users Want? WARHOL: A Generative Model for Recommendation”, Samaran et al 2021
- “ImaginE: An Imagination-Based Automatic Evaluation Metric for Natural Language Generation”, Zhu et al 2021
- “Chinese AI Lab Challenges Google, OpenAI With a Model of 1.75 Trillion Parameters”, Du 2021
- “M6-UFC: Unifying Multi-Modal Controls for Conditional Image Synthesis”, Zhang et al 2021
- “CogView: Mastering Text-To-Image Generation via Transformers”, Ding et al 2021
- “GODIVA: Generating Open-DomaIn Videos from NAtural Descriptions”, Wu et al 2021
- “VideoGPT: Video Generation Using VQ-VAE and Transformers”, Yan et al 2021
- “China’s GPT-3? BAAI Introduces Superscale Intelligence Model ‘Wu Dao 1.0’: The Beijing Academy of Artificial Intelligence (BAAI) Releases Wu Dao 1.0, China’s First Large-Scale Pretraining Model.”, Synced 2021
- “Paint by Word”, Bau et al 2021
- “Generating Images With Sparse Representations”, Nash et al 2021
- “M6: A Chinese Multimodal Pretrainer”, Lin et al 2021
- “DALL·E 1: Creating Images from Text: We’ve Trained a Neural Network Called DALL·E That Creates Images from Text Captions for a Wide Range of Concepts Expressible in Natural Language”, Ramesh et al 2021
- “Taming Transformers for High-Resolution Image Synthesis”, Esser et al 2020
- “Text-To-Image Generation Grounded by Fine-Grained User Attention”, Koh et al 2020
- “X-LXMERT: Paint, Caption and Answer Questions With Multi-Modal Transformers”, Cho et al 2020
- “IGPT: Generative Pretraining from Pixels”, Chen et al 2020
- “Image GPT (iGPT): We Find That, Just As a Large Transformer Model Trained on Language Can Generate Coherent Text, the Same Exact Model Trained on Pixel Sequences Can Generate Coherent Image Completions and Samples”, Chen et al 2020
- “The Messy, Secretive Reality behind OpenAI’s Bid to save the World: The AI Moonshot Was Founded in the Spirit of Transparency. This Is the inside Story of How Competitive Pressure Eroded That Idealism”, Hao 2020
- “Conceptual Captions: A Cleaned, Hypernymed, Image Alt-Text Dataset For Automatic Image Captioning”, Sharma et al 2018
- “Image Transformer”, Parmar et al 2018
- “VQ-VAE: Neural Discrete Representation Learning”, Oord et al 2017
- “Categorical Reparameterization With Gumbel-Softmax”, Jang et al 2016
- “The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables”, Maddison et al 2016
- “Borisdayma/dalle-Mini: DALL·E-Mini”
- “Kingnobro/IconShop: (SIGGRAPH Asia 2023) Code of "IconShop: Text-Guided Vector Icon Synthesis With Autoregressive Transformers"”
- “IconShop”
- “The Little Red Boat Story (Make-A-Scene): Our Own Model Was Used to Generate All the Images in the Story, by Providing a Text and Simple Sketch Input”
Sort By Magic
Annotations are sorted by machine learning into inferred 'tags'. This provides an alternative way to browse: instead of by date order, one can browse in topic order. The sorted list has been automatically clustered into multiple sections & auto-labeled for easier browsing.
Beginning with the newest annotation, the sort uses each annotation's embedding to build a chain of nearest-neighbor annotations, creating a progression of topics (a minimal sketch of this greedy ordering follows the tag list below). For more details, see the link.
- multimodal-synthesis
- bidirectional-generation
- generative-models
- image-synthesis
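
As a rough illustration of the ordering described above, here is a minimal sketch of greedy nearest-neighbor chaining over annotation embeddings. It assumes each annotation comes with a precomputed, real-valued embedding vector and that index 0 is the newest annotation; the function name and data layout are hypothetical, not the site's actual implementation:

```python
# Hypothetical sketch of the "Sort By Magic" ordering: starting from the
# newest annotation, repeatedly hop to the most-similar (cosine) annotation
# not yet visited, yielding a progression of topics.
import numpy as np

def sort_by_magic(annotations, embeddings):
    """Order annotations by topic via greedy nearest-neighbor chaining."""
    emb = np.asarray(embeddings, dtype=float)
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)  # unit-normalize
    remaining = set(range(len(annotations)))
    current = 0                      # index 0 = newest annotation
    order = [current]
    remaining.remove(current)
    while remaining:
        idx = list(remaining)
        sims = emb[idx] @ emb[current]       # cosine similarity to current
        current = idx[int(np.argmax(sims))]  # nearest unvisited neighbor
        order.append(current)
        remaining.remove(current)
    return [annotations[i] for i in order]

# Example:
# sort_by_magic(["newest", "a", "b"], [[1, 0], [0, 1], [0.9, 0.1]])
# → ["newest", "b", "a"]
```

Note that chaining like this only produces the topic progression; the clustering into sections and the auto-labeling of tags are separate steps.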
Miscellaneous
Bibliography
- https://arxiv.org/abs/2405.09818#facebook: “Chameleon: Mixed-Modal Early-Fusion Foundation Models”
- https://arxiv.org/abs/2404.02905#bytedance: “Visual Autoregressive Modeling (VAR): Scalable Image Generation via Next-Scale Prediction”
- 2023-wu-2.pdf: “IconShop: Text-Guided Vector Icon Synthesis With Autoregressive Transformers”
- https://arxiv.org/abs/2312.02147: “Rejuvenating Image-GPT As Strong Visual Representation Learners”
- 2023-samo.pdf: “Artificial Intelligence and Art: Identifying the Esthetic Judgment Factors That Distinguish Human & Machine-Generated Artwork”
- https://arxiv.org/abs/2301.02111#microsoft: “VALL-E: Neural Codec Language Models Are Zero-Shot Text to Speech Synthesizers”
- https://arxiv.org/abs/2211.12561#facebook: “Retrieval-Augmented Multimodal Language Modeling”
- https://arxiv.org/abs/2205.15868: “CogVideo: Large-Scale Pretraining for Text-To-Video Generation via Transformers”
- https://arxiv.org/abs/2204.14217#baai: “CogView2: Faster and Better Text-To-Image Generation via Hierarchical Transformers”
- https://arxiv.org/abs/2201.07520#facebook: “CM3: A Causal Masked Multimodal Model of the Internet”
- https://arxiv.org/abs/2112.15283#baidu: “ERNIE-ViLG: Unified Generative Pre-Training for Bidirectional Vision-Language Generation”
- https://arxiv.org/abs/2111.11133: “L-Verse: Bidirectional Generation Between Image and Text”
- https://en.pingwest.com/a/8693#baai: “Chinese AI Lab Challenges Google, OpenAI With a Model of 1.75 Trillion Parameters”
- https://arxiv.org/abs/2105.14211#alibaba: “M6-UFC: Unifying Multi-Modal Controls for Conditional Image Synthesis”
- https://arxiv.org/abs/2105.13290#baai: “CogView: Mastering Text-To-Image Generation via Transformers”
- https://arxiv.org/abs/2104.10157: “VideoGPT: Video Generation Using VQ-VAE and Transformers”
- https://syncedreview.com/2021/03/23/chinas-gpt-3-baai-introduces-superscale-intelligence-model-wu-dao-1-0/#baai: “China’s GPT-3? BAAI Introduces Superscale Intelligence Model ‘Wu Dao 1.0’: The Beijing Academy of Artificial Intelligence (BAAI) Releases Wu Dao 1.0, China’s First Large-Scale Pretraining Model.”
- https://openai.com/research/dall-e: “DALL·E 1: Creating Images from Text: We’ve Trained a Neural Network Called DALL·E That Creates Images from Text Captions for a Wide Range of Concepts Expressible in Natural Language”
- https://arxiv.org/abs/2009.11278#allen: “X-LXMERT: Paint, Caption and Answer Questions With Multi-Modal Transformers”
- 2020-chen-2.pdf#openai: “IGPT: Generative Pretraining from Pixels”
- https://openai.com/index/image-gpt/: “Image GPT (iGPT): We Find That, Just As a Large Transformer Model Trained on Language Can Generate Coherent Text, the Same Exact Model Trained on Pixel Sequences Can Generate Coherent Image Completions and Samples”
- https://www.technologyreview.com/2020/02/17/844721/ai-openai-moonshot-elon-musk-sam-altman-greg-brockman-messy-secretive-reality/: “The Messy, Secretive Reality behind OpenAI’s Bid to save the World: The AI Moonshot Was Founded in the Spirit of Transparency. This Is the inside Story of How Competitive Pressure Eroded That Idealism”
- 2018-sharma.pdf#google: “Conceptual Captions: A Cleaned, Hypernymed, Image Alt-Text Dataset For Automatic Image Captioning”