January 2021 Gwern.net newsletter with links on AI scaling up and down.
January 2021’s Gwern.net newsletter is now out; previous, December 2020 (archives). This is a collation of links and summary of major changes, overlapping with my Changelog & /r/gwern; brought to you by my donors on Patreon.
Writings
- “Danbooru2020: A Large-Scale Crowdsourced & Tagged Anime Illustration Dataset”
- Gwern.net: +return-to-top floating button; popups: can now be disabled (use the ‘gear’ icon); final reimplementation (dynamic JS now; memoizing the recursive inlining, however clever & elegant, turns out to have painful edge-cases & still not be efficient enough—web browsers really don’t like loading hundreds of kilobytes of extra HTML)
Links
AI
- Scaling up:
- “DALL·E 1: Creating Images from Text”, OpenAI (GPT-3-12.5b generating 1280 tokens → VQ-VAE pixels; generates illustrations & photos); “CLIP (Contrastive Language-Image Pre-training): Connecting Text and Images”, OpenAI (et al 2021: zero-shot image understanding via text description—useful for much more than just ranking DALL·E 1 samples by quality)
Further blessings of scale: simple contrastive training on n = 400m image-text pairs leads to remarkable generalization & combinatorial flexibility of image generation by DALL·E 1, and CLIP learns to reach image-classification SOTA zero-shot on many datasets, with more human-like errors & less degradation out-of-sample than rivals, while costing the same to train. OpenAI released their smallest CLIP model (the “ViT-B/32”-equivalent) and people are discovering it seems able to do just about anything without any further training—the paper notes that it does everything from “fine-grained object classification, geo-localization, action recognition in videos, and OCR”, but there’s so much more, and you can use it to generate image captions/descriptions, classify your anime images, pull a specific target image matching a description out of another neural network (such as an ImageNet BigGAN or TADNE StyleGAN2-ext) by gradient ascent (or, why not, synthesize images embodying abstract concepts like emoji or words like “nightmare fuel” or “confusion”!), search your image datasets by embedding, or find mislabeled images (eg. by using “upside down” as the prompt); a minimal zero-shot sketch appears below. One wonders, as with GPT-3, how much better the largest CLIP (“L/14-336px”) is and how many ways of using it (or DALL·E 1) remain to be found? And why prediction losses work so well in one place, but contrastive losses elsewhere?
For perspective: there are newly-minted PhDs going on the job market who got excited about deep learning because of these new “resnet” things; undergrads who applied to grad school because BERT et al were blowing open NLP & extending neural supremacy to natural language would not yet have passed quals; and it has been only 1 academic semester since GPT-3 was announced. Or to put it quantitatively, for just sequence modeling: it has been 8,478 days since LSTM RNNs were published; 3,045 days since AlexNet’s ImageNet scores were released; 1,880 days since residual networks were published in a paper; 1,330 days since “Attention Is All You Need” hit Arxiv; 844 days since BERT’s paper was published; 718 days since GPT-2 was announced; 353 days since SimCLR, and 249 days since GPT-3 was; and 27 days since CLIP/DALL·E 1.1 Spring is coming. (Some still insist we need not worry about “overpopulation on Mars” for >18,264 more days…)
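As a concrete illustration of the zero-shot trick: a minimal sketch using the released ViT-B/32 checkpoint through OpenAI’s `clip` package (the image path and candidate prompts here are placeholders, and prompt wording matters in practice):

```python
import torch
import clip                      # OpenAI's released CLIP package
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)   # the released smallest model

image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)       # placeholder image
prompts = ["a photo of a cat", "a photo of a dog", "an upside-down photo"]  # made-up label set
text = clip.tokenize(prompts).to(device)

with torch.no_grad():
    img_emb = model.encode_image(image)
    txt_emb = model.encode_text(text)
    # zero-shot classification = nearest text embedding by (scaled) cosine similarity
    img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
    txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)
    probs = (100.0 * img_emb @ txt_emb.T).softmax(dim=-1)[0]

for prompt, prob in zip(prompts, probs.tolist()):
    print(f"{prob:.3f}  {prompt}")
```

The same embeddings support the other tricks: dataset search is nearest-neighbor lookup over `encode_image` outputs, and mislabeled-image hunting is just ranking a dataset by similarity to a diagnostic prompt.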
- “Meta Pseudo Labels”, et al 2020 (90% on ImageNet by pretraining a meta-learning teacher using JFT-300M on a TPUv3-2048)
- “Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity”, et al 2021 (1.57t-parameter GShard followup; the mixture-of-experts approach, while scaling stably, starts showing its limits)
- Scaling down:
- “DeiT: Training data-efficient image transformers & distillation through attention”, et al 2020 (scaling Transformer classifiers down to ImageNet+1-GPU); “BoTNet: Bottleneck Transformers for Visual Recognition”, et al 2021 / “Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet”, et al 2021 (hybrids); “not-so-BigGAN: Generating High-Fidelity Images on Small Compute with Wavelet-based Super-Resolution”, et al 2020 / “VQGAN: Taming Transformers for High-Resolution Image Synthesis”, et al 2020 (training >1024px Transformer GANs on just 2 GPUs)
Transformer supremacy in image-related tasks continues, and GANs are becoming increasingly hybridized. Do pure GANs have a future, now that VAEs and autoregressive models are making such inroads into both the highest-quality & lowest-compute sample generation? To take the GAN/DRL analogy seriously, perhaps they were ultimately a dead end, akin to trying to learn everything from rewards, and an adversarial GAN loss ought to be only the cherry on the cake of a large unsupervised/semi-supervised generative model.
- “ZeRO-Offload: Democratizing Billion-Scale Model Training”, et al 2021 (offloading optimizer state & gradients to CPU memory enables training 13b-parameter models on 1 V100 GPU, scaling to 128 GPUs)
- “Prefix-Tuning: Optimizing Continuous Prompts for Generation”, 2021 (could the PET & CLIP trick of averaging multiple embeddings to yield much better performance be reused for GPT-3 prompts to greatly improve prompting? The fact that prefix-tuning, by directly optimizing the prompt embeddings, yields better performance than even single optimized text prompts suggests so. The user could provide 3 or 4 similar prompts, and synthesize them into a single super-prompt to better program GPT-3… A rough sketch of the core prefix-tuning mechanism appears after this list.)
- “Scaling down Deep Learning”, 2020 (cute: parametric simplified-MNIST for rapid iteration on highly-optimized tiny NNs: experiments in lottery-ticket & meta-learning of LRs/activations)
- “The neural network of the Stockfish chess engine” (very lightweight NN designed for incremental recomputation over changing board states)
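The incremental-recomputation trick (NNUE) is worth spelling out: the first layer is a linear map over sparse binary board features, so a move only adds/subtracts a handful of weight rows from a cached accumulator rather than recomputing the whole layer. A toy sketch of just that accumulator logic (the sizes and feature indices are made up, not Stockfish’s actual HalfKP encoding):

```python
import numpy as np

N_FEATURES, HIDDEN = 41024, 256   # illustrative sizes only
W = np.random.randn(N_FEATURES, HIDDEN).astype(np.float32)   # first-layer weight rows
b = np.zeros(HIDDEN, dtype=np.float32)

def full_refresh(active):
    """Recompute the first-layer accumulator from scratch (only occasionally needed)."""
    return b + W[sorted(active)].sum(axis=0)

def incremental_update(acc, added, removed):
    """After a move, adjust the cached accumulator by only the changed features."""
    return acc + W[sorted(added)].sum(axis=0) - W[sorted(removed)].sum(axis=0)

acc = full_refresh({3, 1_000, 20_000})                    # some starting position
acc = incremental_update(acc, added={5}, removed={3})     # one piece moved: O(#changes) work
assert np.allclose(acc, full_refresh({5, 1_000, 20_000}), atol=1e-4)
```

The remaining layers are small dense layers, cheap enough to rerun from the accumulator at every evaluated position.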
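And the prefix-tuning sketch promised above: the essential move is to keep the language model frozen and learn only a short sequence of continuous ‘virtual token’ embeddings prepended to the input. A toy version of that mechanism, using Hugging Face GPT-2 as a stand-in and a placeholder training example (the actual method also reparameterizes the prefix and injects it at every layer):

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tok = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
for p in model.parameters():
    p.requires_grad = False                        # the language model stays frozen

PREFIX_LEN = 10
prefix = torch.nn.Parameter(0.02 * torch.randn(PREFIX_LEN, model.config.n_embd))
opt = torch.optim.Adam([prefix], lr=1e-3)           # only the prefix embeddings are trained

def prefix_loss(text):
    ids = tok(text, return_tensors="pt").input_ids
    tok_emb = model.transformer.wte(ids)                          # ordinary token embeddings
    inputs = torch.cat([prefix.unsqueeze(0), tok_emb], dim=1)     # prepend the learned prefix
    labels = torch.cat([torch.full((1, PREFIX_LEN), -100, dtype=torch.long), ids], dim=1)
    return model(inputs_embeds=inputs, labels=labels).loss        # -100 masks prefix positions

for step in range(200):                             # placeholder loop; use a real task dataset
    opt.zero_grad()
    loss = prefix_loss("name: Chair | price: $20 -> The Chair costs $20.")
    loss.backward()
    opt.step()
```

The speculated ‘super-prompt’ would then amount to initializing (or regularizing) such a prefix with the embeddings of several hand-written prompts instead of random noise.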
- “Transformers in Vision: A Survey”, et al 2021
- OpenAI departures: Dario Amodei, Sam McCandlish, Tom Brown, Tom Henighan, Chris Olah, Jack Clark, Ben Mann, Paul Christiano et al leave—most for an unspecified new entity (“the elves leave Middle Earth”?)
And the rest:
- “2020 AI Alignment Literature Review and Charity Comparison”, Larks
- “Grounded Language Learning Fast and Slow”, et al 2020
- “DeBERTa: Decoding-enhanced BERT with Disentangled Attention”, et al 2020 (SuperGLUE falls)
- “Solving Mixed Integer Programs Using Neural Networks”, et al 2020 / et al 2021
- “Towards Fully Automated Manga Translation”, et al 2020
- “UPDeT: Universal Multi-agent Reinforcement Learning via Policy Decoupling with Transformers”, et al 2021
- “FERM: A Framework for Efficient Robotic Manipulation”, et al 2021 (contrastive semi-supervised learning + data augmentation for sample-efficiency)
- “XMC-GAN: Cross-Modal Contrastive Learning for Text-to-Image Generation”, et al 2021
Genetics
Everything Is Heritable:
- “Nurture might be nature: cautionary tales and proposed solutions”, et al 2021
- “A genetic perspective on the association between exercise and mental health in the era of genome-wide association studies”, de 2020; “Evidence for shared genetics between physical activity, sedentary behavior and adiposity-related traits”, et al 2020
- “Antidepressant Response in Major Depressive Disorder: A Genome-wide Association Study”, et al 2020
- “Genome wide analysis of gene dosage in 24,092 individuals shows that 10,000 genes modulate cognitive ability”, et al 2020 (yep, still polygenic)
- “GWAS of three molecular traits highlights core genes and pathways alongside a highly polygenic background”, Sinnott-Armstrong et al 2021
- “Genome-scale sequencing and analysis of human, wolf and bison DNA from 25,000-year-old sediment”, et al 2021 (incredible this is possible)
- “Disentangling sex differences in the shared genetic architecture of PTSD, traumatic experiences, and social support with body size and composition”, et al 2021 (LCV)
Recent Evolution:
- “African genetic diversity and adaptation inform a precision medicine agenda”, et al 2021; “The influence of evolutionary history on human health and disease”, et al 2021; “Local adaptation and archaic introgression shape global diversity at human structural variant loci”, et al 2021
- “Natural Selection in Contemporary Humans is Linked to Income and Substitution Effects”, Hugh-Jones 2021
- “The diversity and function of sourdough starter microbiomes”, et al 2021 (crowdsourced sourdough starters show little trace of geographic origin?)
Engineering:
Statistics/Meta-Science
- “The Quantum Field Theory on Which the Everyday World Supervenes”, 2021 (“…we have reason to be confident that the laws of physics underlying the phenomena of everyday life are completely known” because all unknown particles/fields are constrained to being extremely rare/weak, eg. by et al 2009)
- “How accurate are citations of frequently cited papers in biomedical literature?”, et al 2020 (includes original author’s evaluation of whether a citation of their work is correct)
- “Energy-Efficient Algorithms”, et al 2016 (reversible computing asymptotics: constant-factor stacks/arrays, 𝒪(log n) time/energy AVL trees, 𝒪(n) space sorts, & various 𝒪(Vertex+Edge) time/space/energy graph searches)
- “The Optimizer’s Curse: Skepticism and Postdecision Surprise in Decision Analysis”, Smith & Winkler 2006 (regression to the mean is everywhere; another example of why Bayes & decision theory are two great flavors that go great together; a toy simulation below)
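A toy simulation of the postdecision surprise, under the simplest assumptions (all options truly equal, unbiased noisy estimates):

```python
import numpy as np

# The optimizer's curse in miniature: k options, all with true value 0, each estimated with
# unbiased unit-variance noise; always choosing the apparently-best option means the chosen
# estimate is inflated by roughly E[max of k standard normals], even though every estimate is unbiased.
rng = np.random.default_rng(0)
k, trials = 10, 100_000
estimates = rng.normal(0.0, 1.0, size=(trials, k))    # unbiased estimates of k equal options
chosen = estimates.max(axis=1)                         # the value we *expect* from the chosen option

print("mean estimated value of chosen option:", chosen.mean())    # ~1.54
print("mean realized value of chosen option:", 0.0)                # true value: postdecision surprise
```

Shrinking each estimate toward a prior (the Bayesian fix Smith & Winkler describe) removes most of the surprise.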
Politics/Religion
- “The Mechanisms of Cult Production: An Overview”, Xavier 2020 (see previously his blog roundup)
- “When Prophecy Fails and Faith Persists: A Theoretical Overview”, 1999
- “Why We Fight Over Fiction”, Robin Hanson
Psychology/Biology
- “Still Alive”, Scott Alexander (announcement of SSC’s return as the Substack newsletter ‘Astral Codex Ten’ & the launch of a low-cost psychiatry clinic, ‘Lorien Psychiatry’)
- “The Temporal Dynamics of Opportunity Costs: A Normative Account of Cognitive Fatigue and Boredom”, et al 2020
- “A unified framework for association and prediction from vertex-wise grey-matter structure”, Couvy-Duchesne et al 2020 (more morphometricity)
- Common phenomena: “Sounds from seeing silent motion: Who hears them, and what looks loudest?”, 2018 (on ‘visual ear’; previously: 2008, et al 2017)
- “Predicting Mental Health From Followed Accounts on Twitter”, et al 2021 (Registered Report: who you choose to follow says a lot about you—everything is correlated)
- “No evidence for general intelligence in a fish”, et al 2021
- Delirium tremens (why the pink elephants in Dumbo? The inmates temporarily ran the asylum.)
- “Microbiome connections with host metabolism and habitual diet from 1,098 deeply phenotyped individuals”, et al 2021
- “Universal DNA methylation age across mammalian tissues”, et al 2021; “Whole-body senescent cell clearance alleviates age-related brain inflammation and cognitive impairment in mice”, et al 2021
- “BENDR: using transformers and a contrastive self-supervised learning task to learn from massive amounts of EEG data”, et al 2021 (towards brain imitation learning)
- The Parker-Hulme murder case; the Slender Man stabbing (paracosms?)
- Correction: Programming competition skills do not inversely correlate with job performance after all
Technology
- “Baffles and Bastions: The Universal Features of Fortifications”, et al 2007
- Footnote 36: “Redisturbed”: a unicase font experiment
Economics
- “Businesses Aim to Pull Greenhouse Gases From the Air. It’s a Gamble”
- “Does Advertising Actually Work?” (what could be more obvious than “advertising works”, and trivial to confirm with correlational data? Yet the tedious saying “correlation ≠ causation” stubbornly insists on being true); “Digital Paywall Design: Implications for Content Demand and Subscriptions”, 2020 (NYT nag-paywall caused −9.9% reading; in line with all the other results)
- “Who Gains and Who Loses from Credit Card Payments? Theory and Calibrations”, Schuh et al 2010 (a compelling case for getting a rewards credit card if you’re a debit card user—why subsidize them so much?)
Fiction
- “St Martin’s Four Wishes”, Anonymous medieval poet (trans. Dubin 2013)
Miscellaneous
But it’ll still be too many days ’till we say we’re sorry.↩︎