DALLE-2 has a secret language!? | Theories and explanations

Science & Technology

DALLE-2 has a secret language? No, it’s rather a secret vocabulary. Let’s see what happens and why the model behaves like this.
SPONSOR: Weights & Biases 👉 wandb.me/ai-co...
📺 Diffusion models and GLIDE explained: • Diffusion models expla...
📺 Imagen diffusion model: • Imagen, the DALL-E 2 c...
Check out our daily #MachineLearning Quiz Questions: / aicoffeebreak
➡️ AI Coffee Break Merch! 🛍️ aicoffeebreak....
Paper 📜: Daras, Giannis and Alexandros G. Dimakis. “Discovering the Hidden Vocabulary of DALLE-2.” (2022). arxiv.org/abs/... or giannisdaras.g...
🔗 Original post by the authors: / 1531693093040230402
Author’s thread addressing critique: / 1532605363232444416
🔗 Joscha Bach’s take: / 1531711345585860609
🔗 Benjamin Hilton’s take: / 1531780892972175361
🔗 rapha gontijo lopes’ take: / 1532579141542850560
🔗 k1uge’s take: / 1531736708903051265
Thanks to our Patrons who support us in Tier 2, 3, 4: 🙏
Don Rosenthal, Dres. Trost GbR, banana.dev -- Kyle Morris, Julián Salazar, Edvard Grødem, Vignesh Valliappan, Kevin Tsai, Mutual Information, Mike Ton
Outline:
00:00 DALL-E 2 has a secret vocabulary
01:05 Weights&Biases (Sponsor)
02:34 How DALL-E 2 responds to gibberish
05:00 Why does this happen?
07:59 Security implications (adversarial attacks)
▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
🔥 Optionally, pay us a coffee to help with our Coffee Bean production! ☕
Patreon: / aicoffeebreak
Ko-fi: ko-fi.com/aico...
▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
🔗 Links:
AICoffeeBreakQuiz: / aicoffeebreak
Twitter: / aicoffeebreak
Reddit: / aicoffeebreak
KZread: / aicoffeebreak
#AICoffeeBreak #MsCoffeeBean #MachineLearning #AI #research​
Music 🎵 : Built to Last (Instrumental) - NEFFEX
Video editing: Nils Trost

Comments: 21

  • @zahar1875 (2 years ago)

    Well, the neural network HAS TO map any input to an image, so random strings would have to mean something
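The point above can be illustrated with a toy sketch (hypothetical embeddings, nothing from DALL-E 2): because the encoder is a total function, any string, gibberish or not, lands somewhere in embedding space and is therefore nearest to *some* concept.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical concept vectors standing in for a learned embedding table.
concepts = {name: rng.normal(size=8) for name in ["bird", "vehicle", "vegetable", "insect"]}

def embed(text: str) -> np.ndarray:
    """Toy text encoder: a deterministic pseudo-embedding built from character codes.
    The point is only that it is total: every string gets a vector."""
    vec = np.zeros(8)
    for i, ch in enumerate(text):
        vec[(i + ord(ch)) % 8] += ord(ch) % 7 - 3
    return vec

def nearest_concept(text: str) -> str:
    """Every input has an embedding, so every input has a nearest concept."""
    v = embed(text)
    return max(concepts, key=lambda c: float(np.dot(v, concepts[c])))

# Gibberish from the paper's examples still maps to *something*:
print(nearest_concept("Apoploe vesrreaitais"))
```

There is no "reject" option in this toy encoder, just as there is none in a text-to-image model's text encoder: the model cannot refuse to assign meaning.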

  • @stevenmitchell7697 (2 years ago)

    Great analysis and also impressed with your nonchalant pronunciation of the gibberish.

  • @AICoffeeBreak (2 years ago)

    🤣🤣 thanks

  • @salomeshunamon (1 year ago)

    Some experiments I read about today suggest that the VAE/autoencoder is lossy: it throws away lots of detail and can make small, downsampled text illegible, even when it is just upscaling the same image it was given as input. The result of using ONLY a VAE looked eerily like the text jumble these models produce at all sizes. So I wonder whether a big enough text encoder just figures the text out, or, where there is no text, imputes it better, while smaller text encoders, or cases where guidance is needed, cannot make sense of the scrambled eggs of lossy autoencoders. Maybe. Just a thought to think about?

  • @DerPylz (2 years ago)

    I like the silly words that DALL-E 2 comes up with 🤣

  • @victorplacidorangel9707 (2 years ago)

    I would LOVE DALL-E 2 language classes lol. Talking with a friend, I noticed that all the gibberish sounded somewhat... Latin-esque. It makes sense that the AI is relying on taxonomy, which is always treated grammatically as Latin (using the case structure etc.) but can be derived from pretty much any language (which is actually one of our own practices when recording endangered languages). I've played a lot with this same friend, trying to understand how AIs approximate a word's meaning from context and frequency analysis. Surfing the Wordle wave, we've been playing Pimantle, which traces a 2D map of Google's word2vec embeddings, and it's frequently frustrating how winning the game is not always about thinking in terms of meaning but in terms of word morphology, or how sometimes very generic or ultra-specific, far-fetched words score high in proximity to the secret word.

  • @victorplacidorangel9707 (2 years ago)

    Funny to remember all the commotion back in 2017 when Facebook FAIR's end-to-end negotiator bot started speaking in its own shortcut simplification of English, which people couldn't follow at first.

  • @theosalmon (2 years ago)

    I think Ms. Coffee Bean is correct.

  • @jonatan01i (2 years ago)

    I'm high.

  • @linkanjarad2841 (2 years ago)

    That's quite interesting! I tried to use the same method to extract GLIDE's vocabulary, but the text it outputs is unreadable. Instead, I've just tried putting random gibberish into the GLIDE model. It seems that "lkjnf" means airplanes (GLIDE almost always outputs images of planes) and "dnfnfjwnv" means some kind of ambulance (GLIDE almost always outputs vehicles with sirens and checkered patterns that resemble ambulances). However, when I put in "a red dnfnfjwnv", it outputs images of red trucks. I'm guessing the vector representation of "dnfnfjwnv" is just close to that of "vehicle".
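The guess at the end of the comment above (a gibberish token whose embedding happens to sit near "vehicle") can be sketched with toy vectors. None of these embeddings are GLIDE's; they only illustrate nearest-neighbour behaviour in an embedding space.

```python
import numpy as np

# Hypothetical word embeddings, 3-d for readability.
vocab = {
    "vehicle":  np.array([0.9, 0.1, 0.0]),
    "airplane": np.array([0.7, 0.6, 0.1]),
    "flower":   np.array([0.0, 0.1, 0.9]),
}
# A "gibberish" token that, by accident of training, landed near "vehicle".
gibberish = np.array([0.85, 0.15, 0.05])

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# The nearest real word decides what the model draws for the gibberish token,
# which is why "a red dnfnfjwnv" could compose like "a red vehicle".
nearest = max(vocab, key=lambda w: cosine(gibberish, vocab[w]))
print(nearest)  # vehicle
```

Under this reading, the gibberish isn't a word with its own meaning; it's a point in embedding space that falls inside an existing concept's neighbourhood.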

  • @AICoffeeBreak (2 years ago)

    It's very interesting what you say, but wait: you have access to GLIDE? 🤯

  • @Micetticat (2 years ago)

    I'm wondering if this type of behavior could also happen inside biological networks at deep levels, and whether there are "protection" layers that prevent external stimuli from triggering strange memory recollections or strange language associations.

  • @giantbee9763 (2 years ago)

    I think what Joscha Bach is trying to say is that DALL-E 2 might be good at associating one concept with a word, but not multiple concepts at once (i.e. not a language, just vocabulary). :D Though the "language" isn't fully gibberish after all, the effect is still very interesting, like accidentally stepping into another alien dimension.

  • @AICoffeeBreak (2 years ago)

    Thanks, you are the first one here to deliver an explanation on Joscha Bach's take. 👏

  • @AICoffeeBreak (2 years ago)

    But I still don't understand what he means. DALL-E 2 should capture two concepts at once, otherwise it could not do astronauts riding horses. Am I missing something?

  • @giantbee9763 (2 years ago)

    @@AICoffeeBreak I might be wrong, but I think he's suggesting that the gibberish text only encodes one semantic concept at a time. Yet the gibberish text lives in DALL-E 2's latent space, which means that if DALL-E 2 can handle more than one concept, the gibberish can as well. He was probably fooled by the fact that everyone testing the gibberish treats it as single words, or rather that single-word, single-concept gibberish occurs more frequently. I don't have access to DALL-E 2 (I wish :P), but if I did, I would test whether there's some sort of grammar in the gibberish and whether it can be broken down into more than one semantic concept.

  • @justinwhite2725 (2 years ago)

    What's interesting to me is that when you pronounce those gibberish words, my brain just assumes you are talking in your native language, whereas if someone who lives locally were to pronounce them in an accent I'm more familiar with, I'd recognize them as gibberish. I don't know if that's because my brain doesn't have enough training data for different accents, or if yours has more (as people on other continents tend to have broader experience with other languages than most people in North America; I'm in Canada).

  • @AICoffeeBreak (2 years ago)

    I totally understand what you mean. 😅 Yes, I do fall back on my native-language pronunciation of the gibberish (otherwise I'd fail at the confident-pronunciation part), so it does not sound very English.

  • @justinwhite2725 (2 years ago)

    @@AICoffeeBreak It's kind of like adversarial neural networks, isn't it? I didn't recognize the 'fake result' because it was presented with a modifier I wasn't familiar with 😂

  • @AICoffeeBreak (2 years ago)

    @@justinwhite2725 Haha, I like the analogy.

  • @manamsetty2664 (2 years ago)

    🤯
