AI that makes thumbnails (or any image)

Science and Technology

A video about using AI to generate YouTube thumbnails. I explore the classic GAN method and compare it with a newer method called diffusion. One turns out to be better than the other!
Reviewed by Andrew Carr: / andrew_n_carr
Disclaimer: All thumbnails were deleted after use. I do not aggregate YouTube data.
The losses are not exactly inverse for the generator and discriminator because the two are not trained on the same data.
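A minimal sketch of why the two losses need not mirror each other, assuming the common binary cross-entropy formulation with a non-saturating generator loss (the video does not state which formulation it uses). The discriminator is scored on both real and generated images, while the generator is scored only on its own outputs. The function names and the example scores are hypothetical.

```python
import math

def bce(prediction, target):
    """Binary cross-entropy for a single probability strictly between 0 and 1."""
    return -(target * math.log(prediction) + (1 - target) * math.log(1 - prediction))

def discriminator_loss(real_scores, fake_scores):
    # The discriminator sees BOTH real and generated images.
    real_term = sum(bce(p, 1.0) for p in real_scores) / len(real_scores)
    fake_term = sum(bce(p, 0.0) for p in fake_scores) / len(fake_scores)
    return real_term + fake_term

def generator_loss(fake_scores):
    # The generator is scored only on its own images
    # (non-saturating form: it wants them labelled "real").
    return sum(bce(p, 1.0) for p in fake_scores) / len(fake_scores)

# Hypothetical discriminator outputs (probability that an image is real).
real_scores = [0.9, 0.8, 0.95]
fake_scores = [0.2, 0.1, 0.3]

d_loss = discriminator_loss(real_scores, fake_scores)
g_loss = generator_loss(fake_scores)
print(f"discriminator loss: {d_loss:.3f}")
print(f"generator loss:     {g_loss:.3f}")
```

With these made-up scores the discriminator is winning, so its loss is small and the generator's is large, but the two values are not negatives of one another because the real-image term appears only in the discriminator's loss.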
LINKS
Twitter: / max_romana
Discord: / discord
Patreon: / emergentgarden
The life engine: thelifeengine.net
SOURCES
Original GAN Paper: proceedings.neurips.cc/paper/...
Face interpolation: • StyleGAN2 Interpolatio...
BigGAN Paper: arxiv.org/abs/1809.11096
thispersondoesnotexist.com
Flower Gan: / 1527890938386857984
Katydid: • The Katydid (Leaf Bug)
Mantis: • Praying Mantis Hunts a...
Diffusion Paper: arxiv.org/abs/2006.11239
Diffusion beats GANs: arxiv.org/abs/2105.05233?curi...
Blog Post: gretel.ai/blog/diffusion-mode...
Diffusion explanation: • Diffusion Models | Pap...
Diffusion Visualization: / 1537042940475883520
Water Diffusion: • demo - (hot and cold w...
Dall-E 2: openai.com/dall-e-2/
Imagen: imagen.research.google/
CogView: arxiv.org/abs/2105.13290
Parti: parti.research.google/
TIMESTAMPS
(0:00) Intro
(0:32) The Goal
(1:11) The Data
(2:15) Latent Image Generators
(3:03) GANs
(4:35) GAN training
(7:45) Diffusion
(8:55) Diffusion training
(10:43) ☆Generated thumbnails☆
(13:58) Diffusion beats GANs
(15:27) Conclusion
(16:28) Outro
MUSIC
• Closed Circuits

Comments: 61

@markmarketing7365 1 year ago
It would be super awesome to have a GAN trained to do camouflage. In fact, there are papers that describe this already. They train a GAN with one NN to colour a triangle on a random position on a random background, and a second NN to try to detect this one. As a result the triangles take on patterns that are harder to make out. There's cool websites where you can try to spot these triangles yourself. I've always wanted to do a few variants of this. Firstly I'd love to see this done, but with a "poisonous" triangle added to the image along with the camouflaged triangle, one with some very distinct pattern. Then the spotting NN is penalised for detecting the position of that triangle. It would be awesome to see if aside from camouflage, mimicry would evolve - and which one would be more likely. Secondly a variant where some parameter influences the contour of the triangle as well, like a frayed edge, would be cool. I'm sure you'd get some crazy good results.
@darkwise8628
1 year ago
hey, can you please post a link to such a website? thanks in advance!
@0xdecaf 1 year ago
Fantastic video, this channel deserves much more attention. You have a real talent for breaking down complex ideas and making them easy to understand. Thanks!
@Crasterius 1 year ago
This is an awesome project. I hope you can take this to the next level.
@HotDiceMiniatures 1 year ago
What an amazing video. Thank you Emergent Garden. What a great channel name btw. You're awesome.
@MakerBen 1 year ago
This is super neat! Amazing explanation of diffusion!
@sohamjobanputra2914 1 year ago
Idea #1: AI that can generate great comments. Idea #2: AI that can generate a script for a movie or story. Idea #3: AI that can generate tips for demotivated people. Idea #4: AI that can tell approximately when human civilization will end. Idea #5: AI that can generate ideas like this.
@Jennn 1 year ago
Omg. This is the best sick day ever. Thank you sir for taking the time to teach us all that you have! I just cannot stop watching your content!
@SantosEnoque 1 year ago
🎉 I am glad YouTube recommended your channel 🔥🔥🔥🔥 this is something else 🎉
@literailly 1 year ago
Awesome video and explanation, thank you!
@ninjalacoon 1 year ago
You know what, I am realizing that the process of diffusion is a lot like Reddit's r/place event that they have done in the past. People would contribute pixel colors individually anywhere on the canvas, and it was always amazing to me to see how the image would "evolve" over time. People would organically arrive at recognizable images by seeing the patterns that emerged from others who had laid down pixel colors before them. As the images began to take shape, more pixels would iteratively come to fill in the gaps and hone each one into a final form that would resemble a flag, a person's face, a logo, etc. In this case, however, it was an image that was collectively well known to the people who participated. Obviously a lot of the images were coordinated and didn't undergo this process, but there were a lot of areas where this seemed to be the case.
@HM-rf2ov 1 year ago
My experience with GANs is exactly the same. After solving many bugs and issues, I end up with "mode collapse".
@franzfungis4264 1 year ago
Love the presentation!
@FrostCraftedMC 1 year ago
Okay, seeing the thumbnail change to what it is now made me watch this. I'm making this comment before I watch because I want to say that it made me wonder if you gave the AI access to the YouTube statistics to try and learn to make better thumbnails for this video. If you did, then that's cool, and that's why I wanted to watch this. But if you didn't, I'd love your input on whether that's a good idea or even a possible idea.
@andreivlasenko527
1 year ago
Yes, it sounds like you could extend a diffusion model with something like this. A diffusion model has an internal measure of how good an image is (it is trained to measure noise, i.e. unrealisticness), so you could try to put such statistics into that space, so that worse thumbnails count as worse in the same sense that noisy images are worse, but it can be tricky that way. You could also use conditioning, as in text- or class-conditioned diffusion, to get a "slider" for generating better or worse thumbnails. One of the real problems is how to get such statistics: you cannot just use the number of views or something like that, because it depends on so many things, like channel popularity, trends, YouTube's recommendation system and so on. Something like the rate of clicks per thumbnail view would probably be good, but that is internal YouTube info we're not going to get.
@Bjarkediedrage 1 year ago
Great video!! I sort of agree with the other comment. I'd say this video is deserving of a little better title that reflects its educational value and content. I only clicked because I was subscribed and I wanted to see if I had to unsubscribe. I'm picky!^^ and I think I've come to associate some clickbait with poor quality video content, and this is definitely not that!
@literailly 1 year ago
Can you similarly walk the latent space with a diffusion model by modifying the input noise?
@JadeFoxy 1 year ago
As long as diffusion models cannot generate samples in one forward pass, I think GANs have a reason to exist in use cases where synthesis speed is an important factor.
@Kram1032 1 year ago
larger datasets may not even be necessary. You can accomplish an increase in diversity by deduplicating the data you already got, potentially actually increasing performance with a smaller dataset! Deduplication may be tricky. But one method might be to train up a purposefully relatively small network to simply distinguish images. If it thinks two images are the same, chances are, the images are really similar. And to further improve this, you can train up *multiple* such networks and go with if like more than half of them think they are the same image, they are too similar and should be picked at random as a group - i.e. you group up "similar" images, then randomly select groups, and finally randomly select an image from each group. Alternatively, more easily, you can just discard all but one of the images of each group to shrink your dataset down to only sufficiently unique thumbnails.
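The group-then-sample idea above can be sketched as follows. Note that this sketch swaps in a simple average hash with a Hamming-distance threshold in place of the small trained networks the comment proposes, and it treats images as tiny grayscale grids. All function names, the threshold, and the toy data are hypothetical.

```python
import random

def average_hash(pixels):
    """Bit tuple: 1 where a pixel is brighter than the image mean, else 0."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return tuple(1 if value > mean else 0 for value in flat)

def hamming(a, b):
    """Number of positions where two equal-length bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))

def group_similar(images, max_distance=1):
    """Greedy grouping: an image joins the first group whose representative hash is close enough."""
    groups = []  # list of (representative_hash, [image indices])
    for index, pixels in enumerate(images):
        image_hash = average_hash(pixels)
        for representative, members in groups:
            if hamming(representative, image_hash) <= max_distance:
                members.append(index)
                break
        else:
            groups.append((image_hash, [index]))
    return [members for _, members in groups]

def deduplicate(images, max_distance=1, seed=0):
    """Keep one randomly chosen image index per group of near-duplicates."""
    rng = random.Random(seed)
    return [rng.choice(members) for members in group_similar(images, max_distance)]

# Toy 2x2 grayscale "thumbnails" (assumed data, for illustration only).
thumbnails = [
    [[10, 200], [10, 200]],   # index 0
    [[12, 198], [11, 205]],   # index 1: near-duplicate of index 0
    [[200, 10], [200, 10]],   # index 2: mirrored, clearly different
]
groups = group_similar(thumbnails)
kept = deduplicate(thumbnails)
print(groups)
print(kept)
```

Sampling a group first and then an image within it, as the comment suggests, would keep the full dataset while still flattening the distribution; the `deduplicate` helper shows the simpler variant that discards all but one image per group.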
@pooriaarab 1 year ago
Can you share a link to the dataset generated or a tutorial on how to do it? Also, is it feasible to create YouTube thumbnails with the current state of the art of AI?
@OMGitshimitis 1 year ago
Is there a hybrid approach? Like using diffusion to generate images and then a GAN to tune that model? I'm not a computer scientist and I may be either saying something super stupid or super obvious but I'm genuinely curious.
@jnotjequel 1 year ago
can't wait to play the new MIAECROOFT: MRGROOTBU update 13:48
@realastropulse 1 year ago
Just a little after this was released, the most impressive diffusion-based image generator yet was open-sourced. Stable Diffusion is the most promising AI image creator yet, at least until Parti's techniques are perfected and data researchers go back to the drawing board.
@gw6667
6 months ago
Who's Parti?
@kalilinuxhikida9216 1 year ago
So with thumbnails you could include the text of the video, so it would have to make a good thumbnail and text prompt
@CathrinMachinArt 1 year ago
amazing video
@RafaelSCalsaverini 1 year ago
What happens if you use a diffusion network as generator for a GAN?
@frost7423 1 year ago
this is the first time i click on a clickbaity thumbnail and get good content
@andreivlasenko527 1 year ago
Seems like my comment got deleted because of an arXiv link. I was saying you could try StyleGAN-XL, as it showed quite good performance with diverse datasets like ImageNet and trains relatively fast (despite its big size). My second piece of advice is to use fine-tuning instead of training from scratch; it's much faster and more stable for GANs.
@EmergentGarden
1 year ago
Oh yes, the best way to do it would be to fine-tune a big pretrained model like StyleGAN. But I'd rather do that with a diffusion model first, and maybe StyleGAN for comparison.
@fungi42021 1 year ago
very cool
@CmdrTigerKing 1 year ago
so generating the perfect image is like a slot machine
@Graverman 1 year ago
great video
@DerfaelB 1 year ago
600/10
@pelodofonseca6106 1 year ago
If it takes 24 hours to train a batch of images, how do Wombo and DALL-E generate images in less than a minute?
@enesmahmutkulak 1 year ago
Hi, can you share your datasets? Or is it from Kaggle?
@CharlesVanNoland 1 year ago
The more I've learned about neural networks, particularly while watching Machine Learning Street Talk, the more I doubt when people say "these people do not exist" about the deepfaked face images. I believed it blindly before, but now I am concerned that it's basically just interpolating between faces, which is pretty great regardless, but I think someone needs to take the input images the network was trained on and compare them to the best, most concise outputs it generates, and see which faces it most resembles, if not matches almost perfectly. Sure, with a GAN it's encoding down to a much lower dimensional latent variable, and then decoding back up to image resolution, but that still could just mean that it's just showing us faces it's learned, and interpolating between them within the latent variable space. At any rate, I'd just like to see comparisons between the "random" outputs and the actual images that the network is trained on.
@willguggn2
1 year ago
That's the point of latent spaces of human faces … ? Given the right parameters it should be able to generate every possible picture of a human face within and outside the training data.
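The comparison asked for at the top of this thread can be sketched as a nearest-neighbour search: for each generated image, find the closest training image and look at how close it is. This is only an illustration, assuming images are flattened to lists of pixel values and using plain Euclidean distance in pixel space, which is a crude stand-in for perceptual similarity. The function names and data are hypothetical.

```python
import math

def l2_distance(a, b):
    """Euclidean distance between two equal-length pixel vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_training_image(generated, training_set):
    """Return (index, distance) of the training image closest to `generated`."""
    distances = [l2_distance(generated, image) for image in training_set]
    best = min(range(len(distances)), key=distances.__getitem__)
    return best, distances[best]

# Toy flattened "images" (assumed data, for illustration only).
training_set = [
    [0.0, 0.0, 1.0, 1.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.5, 0.5, 0.5, 0.5],
]
generated = [0.1, 0.0, 0.9, 1.0]

index, distance = nearest_training_image(generated, training_set)
print(index, round(distance, 3))
```

A very small distance for many generated samples would suggest memorisation of training images, while consistently larger distances would point toward genuinely new faces.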
@dunar1005 9 months ago
5:02 Is that by brute force? So the discriminator will see 1,000,000 pure-noise pictures, and when by chance the generator generates two black pixels beside each other, it will decide that it prefers that picture?
@programorprogrammed 1 year ago
Looking at the Patreon, we must all be broke
@yorkwestenhaver8680 1 year ago
Fucking Great video man!!
@iDrewa 1 year ago
Imagen is pronounced I-mi-gen. Great vid!
@Landee 1 year ago
gg !
@CmdrTigerKing 1 year ago
This video's script was entirely AI generated, then read by an AI, with pictures created by an AI
@michaelmam1490 1 year ago
12:23 Does anyone know if the Chinese text is intelligible?
@errantwashere 1 year ago
Errant was here
@JamesEdwardTracy
1 year ago
Well we DID "invite" you. Now guess how.
@goranjosic 1 year ago
I recently played with Stable Diffusion beta 1.5 (they have a trial with some points for everyone to try their model), and my impression is that their diffusion model is really great only for generating artistic images and paintings. In many other situations it looks either too artificial or overfitted, copying too much (that is my impression), and all faces look awful, especially when compared to Midjourney's model and images. I guess more work is needed on this type of neural network...?! _I'm not an expert, just a hobby programmer_
@raptordarwish887 1 year ago
Make a thumbnail of a bot sitting in front of the computer making/editing a thumbnail
@petergibbons607 1 year ago
i have a friend that is an "artist" (not really that good but thinks they are), and she is pissed about this AI thing, it's pretty funny, sucks to be an artist now :D (or soon when this thing gets really REALLY good)
@Agony. 1 year ago
As if YouTubers' jobs weren't easy and lazy enough.
@RafaelSCalsaverini 1 year ago
Have you generated Adam Neely?
@alan2here 1 year ago
annealing diffusion waveform collapse …
@artem945 1 year ago
So the discriminator is basically solving the Turing test...
@HighlyRegardted 1 year ago
Unfortunately…Oil is still the oil of the 21st century …
@EmergentGarden
1 year ago
true lol
@ellinikoptero7355 1 year ago
Man, love the vid, but for real, change the thumbnail. Your current one just falls too much into the uncanny valley, as you too know, and I didn’t click on the vid for many hours, even considering unsubbing because I thought it was junk cluttering up my “subscribed” feed.
@caesaroftampa1266
1 year ago
Worst possible reaction ever. I did not have the same reaction, I saw it and was intrigued. Reminded me of some of VSauce's thumbnails!
@EmergentGarden
1 year ago
Not the reaction I was going for lol! I'll be messing around with the thumbnail/title. I figured uncanny ones would catch the eye, but they can also freak people out.