A side effect of preference-learning approaches like RLHF is a severe loss of ‘diversity’ or ‘creativity’ in outputs, analogous to broader mode collapse in generative models.
Mode collapse is the observation that many generative deep learning models lose their ‘creativity’ or ‘diversity’ of responses after post-training improvements like RLHF or instruction-tuning. The loss goes beyond what is needed for the intended ‘chatbot’ or ‘assistant’ persona, and has side-effects.
Mode collapse harms esthetics: outputs start to sound the same, like “AI slop” or “ChatGPTese”, or look somehow similar, like “the Midjourney look.” This cripples creative uses like creative writing. And this damage can manifest in strange ways, like models refusing to write non-rhyming poetry & subtly steering non-rhyming inputs towards rhyming, or being unable to generate random numbers.
It also makes the temperature sampling hyperparameter useless, due to flattened logits, which makes it harder to search for correct answers using a model: you cannot raise the temperature to sample many different completions, because the distribution is the same at all temperatures (WYSIWYG), and this renders best-of or inner-monologue sampling less effective. Further, the models lose their ability to simulate diverse populations or content, undermining attempts to do in silico psychology/economics/sociology/etc with LLMs.
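To make the temperature point concrete, here is a toy sketch (not a measurement of any real model). It assumes that collapse shows up as one next-token logit dominating all the others by a wide margin, which is only one way the effect might manifest; the logit values, the 5-token vocabulary, and the function names are all made up for illustration.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy_bits(probs):
    """Shannon entropy in bits; near 0 means the same token almost every time."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical next-token logits over a 5-token vocabulary (made-up numbers).
base_like = [2.0, 1.5, 1.0, 0.5, 0.0]        # probability spread over several tokens
collapsed_like = [30.0, 1.5, 1.0, 0.5, 0.0]  # assumption: one token dominates

for t in (0.5, 1.0, 2.0):
    h_base = entropy_bits(softmax_with_temperature(base_like, t))
    h_collapsed = entropy_bits(softmax_with_temperature(collapsed_like, t))
    print(f"T={t}: base-like {h_base:.3f} bits, collapsed-like {h_collapsed:.6f} bits")
```

In this toy, the entropy of the base-like distribution moves noticeably as the temperature changes, while the collapsed-like distribution stays at essentially zero entropy across the same range, so drawing many samples would return nearly the same completion each time.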
Why does mode collapse persist? Because it often greatly increases the quality of the ‘default’ user-friendly outputs and looks shiny, while the tradeoff of loss of diversity can only be seen corpus-wide and not on initial casual use (eg. image models like DALL·E 2 vs 3), and the other side-effects are subtle and hard to trace to their origin in the post-training tuning. Further, because of the extensive use of ChatGPT Internet-wide, particularly among researchers & corporations using the OpenAI API to generate, rate, rewrite, and process samples, often in order to create competing models, even models which are not RLHF-tuned wind up imitating ChatGPT’s mode collapse! (Curiously, the Claude-2/3 family of models, using RLAIF/Constitutional AI, has a ChatGPT-esque assistant personality, but the esthetics of Claude outputs do not seem as revolting.) So while mode collapse has been widely noticed in many guises by users, it has been largely ignored by AI researchers & corporations, as apparently they consider it either desirable or a minor nuisance.