‘diffusion model’ tag

Annotations sorted by machine learning into inferred 'tags'. This provides an alternative way to browse: instead of by date order, one can browse in topic order. The 'sorted' list has been automatically clustered into multiple sections & auto-labeled for easier browsing.

Beginning with the newest annotation, it uses the embedding of each annotation to attempt to create a list of nearest-neighbor annotations, creating a progression of topics. For more details, see the link.

Wikipedia

Miscellaneous

Bibliography

https://arxiv.org/abs/2412.20292: “An Analytic Theory of Creativity in Convolutional Diffusion Models”, Mason Kamb, Surya Ganguli

link-bibliography
https://arxiv.org/abs/2412.06771#deepmind: “Proactive Agents for Multi-Turn Text-To-Image Generation Under Uncertainty”, Meera Hahn, Wenjun Zeng, Nithish Kannen, Rich Galt, Kartikeya Badola, Been Kim, Zi Wang

link-bibliography
https://arxiv.org/abs/2410.10629#nvidia: “SANA: Efficient High-Resolution Image Synthesis With Linear Diffusion Transformers”, Enze Xie, Junsong Chen, Junyu Chen, Han Cai, Haotian Tang, Yujun Lin, Zhekai Zhang, Muyang Li, Ligeng Zhu, Yao Lu, Song Han

link-bibliography
https://arxiv.org/abs/2409.17410: “Copying Style, Extracting Value: Illustrators’ Perception of AI Style Transfer and Its Impact on Creative Labor”, Julien Porquet, Sitong Wang, Lydia B. Chilton

link-bibliography
https://arxiv.org/abs/2409.15997#novelai: “Improvements to SDXL in NovelAI Diffusion V3”, Juan Ossa, Eren Doğan, Alex Birch, F. Johnson

link-bibliography
https://arxiv.org/abs/2404.01291: “Evaluating Text-To-Visual Generation With Image-To-Text Generation”, Zhiqiu Lin, Deepak Pathak, Baiqi Li, Jiayao Li, Xide Xia, Graham Neubig, Pengchuan Zhang, Deva Ramanan

link-bibliography
https://arxiv.org/abs/2403.13802: “ZigMa: Zigzag Mamba Diffusion Model”, Vincent Tao Hu, Stefan Andreas Baumann, Ming Gui, Olga Grebenkova, Pingchuan Ma, Johannes Fischer, Bjorn Ommer

link-bibliography
https://arxiv.org/abs/2401.08741: “Fixed Point Diffusion Models”, Xingjian Bai, Luke Melas-Kyriazi

link-bibliography
https://arxiv.org/abs/2312.02139: “DiffiT: Diffusion Vision Transformers for Image Generation”, Ali Hatamizadeh, Jiaming Song, Guilin Liu, Jan Kautz, Arash Vahdat

link-bibliography
https://arxiv.org/abs/2311.18829#microsoft: “MicroCinema: A Divide-And-Conquer Approach for Text-To-Video Generation”, Yanhui Wang, Jianmin Bao, Wenming Weng, Ruoyu Feng, Dacheng Yin, Tao Yang, Jingxu Zhang, Qi Dai Zhiyuan Zhao, Chunyu Wang, Kai Qiu, Yuhui Yuan, Xiaoyan Sun, Chong Luo, Baining Guo

link-bibliography
https://arxiv.org/abs/2311.16465: “TextDiffuser-2: Unleashing the Power of Language Models for Text Rendering”, Jingye Chen, Yupan Huang, Tengchao Lv, Lei Cui, Qifeng Chen, Furu Wei

link-bibliography
https://arxiv.org/abs/2311.17042#stability: “Adversarial Diffusion Distillation”, Axel Sauer, Dominik Lorenz, Andreas Blattmann, Robin Rombach

link-bibliography
https://arxiv.org/abs/2311.12092: “Concept Sliders: LoRA Adaptors for Precise Control in Diffusion Models”, Rohit Gandikota, Joanna Materzynska, Tingrui Zhou, Antonio Torralba, David Bau

link-bibliography
https://arxiv.org/abs/2311.09257#google: “UFOGen: You Forward Once Large Scale Text-To-Image Generation via Diffusion GANs”, Yanwu Xu, Yang Zhao, Zhisheng Xiao, Tingbo Hou

link-bibliography
https://arxiv.org/abs/2311.04145#alibaba: “I2VGen-XL: High-Quality Image-To-Video Synthesis via Cascaded Diffusion Models”, Shiwei Zhang, Jiayu Wang, Yingya Zhang, Kang Zhao, Hangjie Yuan, Zhiwu Qin, Xiang Wang, Deli Zhao, Jingren Zhou

link-bibliography
https://arxiv.org/abs/2310.16825: “CommonCanvas: An Open Diffusion Model Trained With Creative-Commons Images”, Aaron Gokaslan, A. Feder Cooper, Jasmine Collins, Landan Seguin, Austin Jacobson, Mihir Patel, Jonathan Frankle, Cory Stephenson, Volodymyr Kuleshov

link-bibliography
https://arxiv.org/abs/2309.15807#facebook: “Emu: Enhancing Image Generation Models Using Photogenic Needles in a Haystack”, Xiaoliang Dai, Ji Hou, Chih-Yao Ma, Sam Tsai, Jialiang Wang, Rui Wang, Peizhao Zhang, Simon Vandenhende, Xiaofang Wang, Abhimanyu Dubey, Matthew Yu, Abhishek Kadian, Filip Radenovic, Dhruv Mahajan, Kunpeng Li, Yue Zhao, Vladan Petrovic, Mitesh Kumar Singh, Simran Motwani, Yi Wen, Yiwen Song, Roshan Sumbaly, Vignesh Ramanathan, Zijian He, Peter Vajda, Devi Parikh

link-bibliography
https://arxiv.org/abs/2309.06380: “InstaFlow: One Step Is Enough for High-Quality Diffusion-Based Text-To-Image Generation”, Xingchao Liu, Xiwen Zhang, Jianzhu Ma, Jian Peng, Qiang Liu

link-bibliography
https://arxiv.org/abs/2307.01952#stability: “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis”, Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna, Robin Rombach

link-bibliography
https://arxiv.org/pdf/2307.01952#page=3&org=stability: “SDXL § Micro-Conditioning: Conditioning the Model on Image Size”, Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna, Robin Rombach

link-bibliography
https://arxiv.org/abs/2306.07691: “StyleTTS 2: Towards Human-Level Text-To-Speech through Style Diffusion and Adversarial Training With Large Speech Language Models”, Yinghao Aaron Li, Cong Han, Vinay S. Raghavan, Gavin Mischler, Nima Mesgarani

link-bibliography
2023-samo.pdf: “Artificial Intelligence and Art: Identifying the Esthetic Judgment Factors That Distinguish Human & Machine-Generated Artwork”, Andrew Samo, Scott Highhouse

link-bibliography
https://arxiv.org/abs/2305.16269: “UDPM: Upsampling Diffusion Probabilistic Models”, Shady Abu-Hussein, Raja Giryes

link-bibliography
https://arxiv.org/abs/2303.14389: “Masked Diffusion Transformer Is a Strong Image Synthesizer”, Shanghua Gao, Pan Zhou, Ming-Ming Cheng, Shuicheng Yan

link-bibliography
https://arxiv.org/abs/2303.01469#openai: “Consistency Models”, Yang Song, Prafulla Dhariwal, Mark Chen, Ilya Sutskever

link-bibliography
https://arxiv.org/abs/2302.04222: “Glaze: Protecting Artists from Style Mimicry by Text-To-Image Models”, Shawn Shan, Jenna Cryan, Emily Wenger, Haitao Zheng, Rana Hanocka, Ben Y. Zhao

link-bibliography
https://arxiv.org/abs/2302.01329#google: “Dreamix: Video Diffusion Models Are General Video Editors”, Eyal Molad, Eliahu Horwitz, Dani Valevski, Alex Rav Acha, Yossi Matias, Yael Pritch, Yaniv Leviathan, Yedid Hoshen

link-bibliography
https://raw.githubusercontent.com/flavioschneider/master-thesis/main/audio_diffusion_thesis.pdf: “Archisound: Audio Generation With Diffusion”, Flavio Schneider

link-bibliography
https://arxiv.org/abs/2212.10562#google: “Character-Aware Models Improve Visual Text Rendering”, Rosanne Liu, Dan Garrette, Chitwan Saharia, William Chan, Adam Roberts, Sharan Narang, Irina Blok, R. J. Mical, Mohammad Norouzi, Noah Constant

link-bibliography
https://arxiv.org/abs/2212.08751#openai: “Point·E: A System for Generating 3D Point Clouds from Complex Prompts”, Alex Nichol, Heewoo Jun, Prafulla Dhariwal, Pamela Mishkin, Mark Chen

link-bibliography
https://arxiv.org/abs/2211.09788: “DiffusionDet: Diffusion Model for Object Detection”, Shoufa Chen, Peize Sun, Yibing Song, Ping Luo

link-bibliography
https://arxiv.org/abs/2211.09800: “InstructPix2Pix: Learning to Follow Image Editing Instructions”, Tim Brooks, Aleksander Holynski, Alexei A. Efros

link-bibliography
https://arxiv.org/abs/2211.01324#nvidia: “EDiff-I: Text-To-Image Diffusion Models With an Ensemble of Expert Denoisers”, Yogesh Balaji, Seungjun Nah, Xun Huang, Arash Vahdat, Jiaming Song, Karsten Kreis, Miika Aittala, Timo Aila, Samuli Laine, Bryan Catanzaro, Tero Karras, Ming-Yu Liu

link-bibliography
https://arxiv.org/abs/2210.07508#sony: “Hierarchical Diffusion Models for Singing Voice Neural Vocoder”, Naoya Takahashi, Mayank Kumar, Singh, Yuki Mitsufuji

link-bibliography
https://arxiv.org/abs/2210.03142#google: “On Distillation of Guided Diffusion Models”, Chenlin Meng, Ruiqi Gao, Diederik P. Kingma, Stefano Ermon, Jonathan Ho, Tim Salimans

link-bibliography
https://arxiv.org/abs/2209.12892: “g.pt: Learning to Learn With Generative Models of Neural Network Checkpoints”, William Peebles, Ilija Radosavovic, Tim Brooks, Alexei A. Efros, Jitendra Malik

link-bibliography
https://arxiv.org/abs/2208.12242#google: “DreamBooth: Fine Tuning Text-To-Image Diffusion Models for Subject-Driven Generation”, Nataniel Ruiz, Yuanzhen Li, Varun Jampani, Yael Pritch, Michael Rubinstein, Kfir Aberman

link-bibliography
https://arxiv.org/abs/2208.09392: “Cold Diffusion: Inverting Arbitrary Image Transforms Without Noise”, Arpit Bansal, Eitan Borgnia, Hong-Min Chu, Jie S. Li, Hamid Kazemi, Furong Huang, Micah Goldblum, Jonas Geiping, Tom Goldstein

link-bibliography
https://arxiv.org/abs/2208.01618: “An Image Is Worth One Word: Personalizing Text-To-Image Generation Using Textual Inversion”, Rinon Gal, Yuval Alaluf, Yuval Atzmon, Or Patashnik, Amit H. Bermano, Gal Chechik, Daniel Cohen-Or

link-bibliography
https://arxiv.org/abs/2207.09814#microsoft: “NUWA-∞: Autoregressive over Autoregressive Generation for Infinite Visual Synthesis”, Chenfei Wu, Jian Liang, Xiaowei Hu, Zhe Gan, Jianfeng Wang, Lijuan Wang, Zicheng Liu, Yuejian Fang, Nan Duan

link-bibliography
https://arxiv.org/abs/2205.16007#microsoft: “Improved Vector Quantized Diffusion Models”, Zhicong Tang, Shuyang Gu, Jianmin Bao, Dong Chen, Fang Wen

link-bibliography
https://arxiv.org/abs/2205.11487#google: “Imagen: Photorealistic Text-To-Image Diffusion Models With Deep Language Understanding”, Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily Denton, Seyed Kamyar Seyed Ghasemipour, Burcu Karagol Ayan, S. Sara Mahdavi, Rapha Gontijo Lopes, Tim Salimans, Jonathan Ho, David J. Fleet, Mohammad Norouzi

link-bibliography
https://arxiv.org/abs/2205.07460: “Diffusion Models for Adversarial Purification”, Weili Nie, Brandon Guo, Yujia Huang, Chaowei Xiao, Arash Vahdat, Anima Anandkumar

link-bibliography
https://arxiv.org/abs/2112.10752: “High-Resolution Image Synthesis With Latent Diffusion Models”, Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, Björn Ommer

link-bibliography
https://arxiv.org/abs/2112.05744: “More Control for Free! Image Synthesis With Semantic Diffusion Guidance”, Xihui Liu, Dong Huk Park, Samaneh Azadi, Gong Zhang, Arman Chopikyan, Yuxiao Hu, Humphrey Shi, Anna Rohrbach, Trevor Darrell

link-bibliography
https://arxiv.org/abs/2106.09685#microsoft: “LoRA: Low-Rank Adaptation of Large Language Models”, Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen

link-bibliography
https://cascaded-diffusion.github.io/: “CDM: Cascaded Diffusion Models for High Fidelity Image Generation”, Jonathan Ho, Chitwan Saharia, William Chan, David J. Fleet, Mohammad Norouzi, Tim Salimans

link-bibliography
https://arxiv.org/abs/2105.05233#openai: “Diffusion Models Beat GANs on Image Synthesis”, Prafulla Dhariwal, Alex Nichol

link-bibliography
https://arxiv.org/abs/2104.07636#google: “Image Super-Resolution via Iterative Refinement”, Chitwan Saharia, Jonathan Ho, William Chan, Tim Salimans, David J. Fleet, Mohammad Norouzi

link-bibliography
https://arxiv.org/abs/2102.09672#openai: “Improved Denoising Diffusion Probabilistic Models”, Alex Nichol, Prafulla Dhariwal

link-bibliography
2018-sharma.pdf#google: “Conceptual Captions: A Cleaned, Hypernymed, Image Alt-Text Dataset For Automatic Image Captioning”, Piyush Sharma, Nan Ding, Sebastian Goodman, Radu Soricut

link-bibliography
2011-vincent.pdf: “A Connection Between Score Matching and Denoising Autoencoders”, Pascal Vincent

link-bibliography
https://www.jmlr.org/papers/volume6/hyvarinen05a/hyvarinen05a.pdf: “Estimation of Non-Normalized Statistical Models by Score Matching”, Aapo Hyvärinen

link-bibliography