MAMBA: i love when a new base model has novel behavior! I was running my usual Jabberwocky test (see if it can recite the whole thing from "'Twas brillig" alone) and not only did it succeed, it's also happy to make up new stanzas (pic2), unlike transformer models! temp=0.0

Dec 6, 2023 · 5:24 AM UTC

i've never seen this sort of thing with zero temperature on GPT or llama (at least without explicit instruction on instruct/chat/RLHF models), and i've been running jabberwocky tests since gpt3 davinci
normally transformers at temp=0.0 can NOT get creative with this sort of stuff. they are stuck in memorization loops, and at the very best repeat lines that came before. Mamba also eventually gets stuck repeating, but prior to that there's a spark of something new :)
to be fully fair, the two screenshots are not from the same run, pic2's prompt had the full first stanza of jabberwocky. so i wanted to make sure to post pic1 where only the first line is in the prompt. doesn't affect the conclusion though
i feel like i should clarify, transformers have been able to memorize Jabberwocky since 2020, and now pretty much any off-the-shelf llama will have it fully memorized. the novel behavior is in how it got creative with it after, without explicit instruction at zero temperature
gwern.net/gpt-3#fn12 here's the context for when we were trying really hard to get gpt-3 davinci to generate creative variations of jabberwocky, and how hard it was to find a prompt that would snap it out of reciting memorized lines
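the scoring part of a test like this can be sketched in a few lines. this is a hypothetical helper (all names here are made up for illustration, not from any real eval library): given the model's greedy continuation and the memorized poem, it flags which generated lines are verbatim recitation vs. novel, which is basically the distinction the screenshots show.

```python
# Hypothetical "Jabberwocky test" scorer: classify each generated line as
# verbatim recitation (appears in the memorized poem) or novel. A transformer
# stuck in a memorization loop scores near 0.0 novelty; new stanzas score higher.

def recitation_report(continuation: str, poem: str) -> list[tuple[str, bool]]:
    """Return (line, is_memorized) for each non-empty line of the continuation."""
    poem_lines = {line.strip().lower() for line in poem.splitlines() if line.strip()}
    report = []
    for line in continuation.splitlines():
        stripped = line.strip()
        if stripped:
            report.append((stripped, stripped.lower() in poem_lines))
    return report

def novelty_fraction(continuation: str, poem: str) -> float:
    """Fraction of generated lines that are NOT verbatim lines from the poem."""
    report = recitation_report(continuation, poem)
    if not report:
        return 0.0
    novel = sum(1 for _, memorized in report if not memorized)
    return novel / len(report)

# toy example: one recited line, one invented line -> novelty 0.5
poem = "'Twas brillig, and the slithy toves\nDid gyre and gimble in the wabe:"
continuation = "Did gyre and gimble in the wabe:\nAnd frumious snarks did blicket free"
print(novelty_fraction(continuation, poem))  # -> 0.5
```

this is only exact-match, of course; the gpt-3 footnote linked above shows the harder part was the prompting, not the scoring.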
Replying to @mayfer
can it do novel iambic pentameter ? aka ask it to write a sonnet...
this is a base model so i can't "ask it", needs some prompt engineering. send me a prompt if you have ideas. would that be something gpt can't do? i don't think this model will be any different due to auditory awareness requirements?
Replying to @mayfer
Did you train it from scratch?
Replying to @mayfer
What is a “solid bloc of solid stuff”?