MAMBA:
i love when a new base model has novel behavior!
I was running my usual Jabberwocky test (see if it can recite the whole thing from "'Twas brillig" alone)
and not only did it succeed, but it's also happy to make up new stanzas (pic2), unlike transformer models!
temp=0.0
Dec 6, 2023 · 5:24 AM UTC
i've never seen this sort of thing with zero temperature on GPT or llama (at least without explicit instruction on instruct/chat/RLHF models), and i've been running jabberwocky tests since gpt3 davinci
normally transformers at temp=0.0 can NOT get creative with this sort of stuff. they are stuck in memorization loops, and at the very best repeat lines that came before
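(for anyone unfamiliar with what temp=0.0 implies here: temperature divides the logits before the softmax, and as it goes to zero the distribution collapses onto the single most likely token, i.e. pure greedy decoding — which is why the outputs are deterministic and why memorized text tends to just replay. a toy sketch with made-up logits, function name is mine, not any real library's API:)

```python
import math

def next_token_probs(logits, temperature):
    """Softmax over temperature-scaled logits; temperature=0.0 means greedy (argmax)."""
    if temperature == 0.0:
        # greedy decoding: all probability mass on the most likely token
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# toy logits for three candidate tokens
logits = [2.0, 1.0, 0.5]
print(next_token_probs(logits, 0.0))  # all mass on token 0 (greedy)
print(next_token_probs(logits, 1.0))  # softer distribution over all tokens
```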
Mamba also eventually gets stuck repeating, but prior to that there's a spark of something new :)
to be fully fair, the two screenshots are not from the same run: pic2's prompt had the full first stanza of jabberwocky, so i wanted to make sure to post pic1, where only the first line is in the prompt.
doesn't affect the conclusion though
i feel like i should clarify,
transformers have been able to memorize Jabberwocky since 2020, and now pretty much any off-the-shelf llama will have it fully memorized
the novel behavior is how it got creative with it afterward, without explicit instruction, at zero temperature
gwern.net/gpt-3#fn12
here's the context for when we were trying really hard to get gpt-3 davinci to generate creative variations of jabberwocky, and how hard it was to find a prompt that would snap it out of reciting memorized lines