AI that makes thumbnails (or any image)

Science and Technology

A video about using AI to generate YouTube thumbnails. I explore the classic GAN method and compare it with a newer method called diffusion. One turns out to be better than the other!
Reviewed by Andrew Carr: / andrew_n_carr
Disclaimer: All thumbnails were deleted after use. I do not aggregate YouTube data.
The losses are not exactly inverse for the generator and discriminator because the two are not trained on the same data.
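
One way to read the note above: in a standard GAN the discriminator is scored on both real and generated images, while the generator is scored only on its own generated images, so the two loss values are not simply negatives of each other. The following is a minimal sketch, assuming the usual binary cross-entropy objective with the non-saturating generator loss. The function names and example scores are hypothetical and are not taken from the video's training code.

```python
import math


def bce(prediction, target):
    """Binary cross-entropy for a single probability in (0, 1)."""
    eps = 1e-12  # clamp to avoid log(0)
    p = min(max(prediction, eps), 1.0 - eps)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def discriminator_loss(real_scores, fake_scores):
    """The discriminator is trained on real AND generated images."""
    real_term = sum(bce(s, 1.0) for s in real_scores) / len(real_scores)
    fake_term = sum(bce(s, 0.0) for s in fake_scores) / len(fake_scores)
    return real_term + fake_term


def generator_loss(fake_scores):
    """The generator is trained only on the scores of its own images."""
    return sum(bce(s, 1.0) for s in fake_scores) / len(fake_scores)


# Hypothetical discriminator outputs (probability that an image is real).
real_scores = [0.9, 0.8]
fake_scores = [0.3, 0.1]

print("discriminator loss:", discriminator_loss(real_scores, fake_scores))
print("generator loss:", generator_loss(fake_scores))
```

Because the real-image term appears only in the discriminator's loss, the two curves can move independently rather than mirroring each other.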
LINKS
Twitter: / max_romana
Discord: / discord
Patreon: / emergentgarden
The life engine: thelifeengine.net
SOURCES
Original GAN Paper: proceedings.neurips.cc/paper/...
Face interpolation: • StyleGAN2 Interpolatio...
BigGAN Paper: arxiv.org/abs/1809.11096
thispersondoesnotexist.com
Flower Gan: / 1527890938386857984
Katydid: • The Katydid (Leaf Bug)
Mantis: • Praying Mantis Hunts a...
Diffusion Paper: arxiv.org/abs/2006.11239
Diffusion beats GANs: arxiv.org/abs/2105.05233?curi...
Blog Post: gretel.ai/blog/diffusion-mode...
Diffusion explanation: • Diffusion Models | Pap...
Diffusion Visualization: / 1537042940475883520
Water Diffusion: • demo - (hot and cold w...
Dall-E 2: openai.com/dall-e-2/
Imagen: imagen.research.google/
CogView: arxiv.org/abs/2105.13290
Parti: parti.research.google/
TIMESTAMPS
(0:00) Intro
(0:32) The Goal
(1:11) The Data
(2:15) Latent Image Generators
(3:03) GANs
(4:35) GAN training
(7:45) Diffusion
(8:55) Diffusion training
(10:43) ☆Generated thumbnails☆
(13:58) Diffusion beats GANs
(15:27) Conclusion
(16:28) Outro
MUSIC
• Closed Circuits

Comments: 61

  • @markmarketing7365
    @markmarketing7365 1 year ago

    It would be super awesome to have a GAN trained to do camouflage. In fact, there are papers that describe this already. They train a GAN with one NN to colour a triangle on a random position on a random background, and a second NN to try to detect this one. As a result the triangles take on patterns that are harder to make out. There's cool websites where you can try to spot these triangles yourself. I've always wanted to do a few variants of this. Firstly I'd love to see this done, but with a "poisonous" triangle added to the image along with the camouflaged triangle, one with some very distinct pattern. Then the spotting NN is penalised for detecting the position of that triangle. It would be awesome to see if aside from camouflage, mimicry would evolve - and which one would be more likely. Secondly a variant where some parameter influences the contour of the triangle as well, like a frayed edge, would be cool. I'm sure you'd get some crazy good results.

  • @darkwise8628

    @darkwise8628

    1 year ago

    hey, can you please post a link to such a website? thanks in advance!

  • @0xdecaf
    @0xdecaf 1 year ago

    Fantastic video, this channel deserves much more attention. You have a real talent for breaking down complex ideas and making them easy to understand. Thanks!

  • @Crasterius
    @Crasterius 1 year ago

    This is an awesome project. I hope you can take this to the next level.

  • @HotDiceMiniatures
    @HotDiceMiniatures 1 year ago

    What an amazing video. Thank you Emergent Garden. What a great channel name btw. You're awesome.

  • @MakerBen
    @MakerBen 1 year ago

    This is super neat! Amazing explanation of diffusion!

  • @sohamjobanputra2914
    @sohamjobanputra2914 1 year ago

    Idea #1: AI that can generate great comments. Idea #2: AI that can generate a script for a movie or stories. Idea #3: AI that can generate tips for demotivated people. Idea #4: AI that can tell approximately when human civilization will end. Idea #5: AI that can generate ideas like this.

  • @Jennn
    @Jennn 1 year ago

    Omg. This is the best sick day ever. Thank you sir for taking the time to teach us all that you have! I just cannot stop watching your content!

  • @SantosEnoque
    @SantosEnoque 1 year ago

    🎉 I am glad YouTube recommended your channel 🔥🔥🔥🔥 this is something else 🎉

  • @literailly
    @literailly 1 year ago

    Awesome video and explanation, thank you!

  • @ninjalacoon
    @ninjalacoon 1 year ago

    You know what. I am realizing that the process of diffusion is a lot like reddit's "the place" event that they have done in the past. People would contribute pixel colors individually to anywhere on the page canvas and it was always amazing to me to see how the image would "evolve" over time. People would organically arrive at recognizable images by seeing the patterns that would emerge from others that had laid down pixel colors before them. As the images begin to take shape, in an iterative way more pixels would come to fill in the gaps and hone it into a final form that would resemble a flag, a person's face, a logo, etc. In this case however, it was an image that was collectively well known to the people that participated. Obviously a lot of the images were coordinated and didn't undergo this process but there were a lot of areas where this seemed to be the case.

  • @HM-rf2ov
    @HM-rf2ov 1 year ago

    My experience with GANs is exactly the same. After solving many bugs and issues, I end up with "mode collapse".

  • @franzfungis4264
    @franzfungis4264 1 year ago

    Love the presentation!

  • @FrostCraftedMC
    @FrostCraftedMC 1 year ago

    okay seeing the thumbnail change to what it is now made me watch this. im making this comment before i watch cause i wanna say that it made me wonder if you gave the ai access to the youtube statistics to try and learn to make better thumbnails for this video, and if you did then thats cool thats why i wanted to watch this. but if you didnt, id love your input on if thats a good idea or even a possible idea.

  • @andreivlasenko527

    @andreivlasenko527

    1 year ago

    Yes, it sounds like you could extend a diffusion model with something like this. A diffusion model has an internal measure of how good an image is (it's trained to measure noise, i.e. unrealisticness), so you could try to put such statistics in that space, so that worse thumbnails are worse in the same sense that noisy images are worse, but that can be tricky. You could also use conditioning, as in text- or class-conditioned diffusion, to get a "slider" for generating better or worse thumbnails. One of the real problems is how to get such statistics: you can't just use the number of views or something like that, because it depends on so many things, like channel popularity, trends, YouTube's recommendation system and so on. Something like the click-through rate per thumbnail view would probably be good, but that's internal YouTube info we're not going to get.

  • @Bjarkediedrage
    @Bjarkediedrage 1 year ago

    Great video!! I sort of agree with the other comment. I'd say this video is deserving of a little better title that reflects its educational value and content. I only clicked because I was subscribed and I wanted to see if I had to unsubscribe. I'm picky!^^ and I think I've come to associate some clickbait with poor quality video content, and this is definitely not that!

  • @literailly
    @literailly 1 year ago

    Can you similarly walk the latent space with a diffusion model by modifying the input noise?

  • @JadeFoxy
    @JadeFoxy 1 year ago

    As long as diffusion models cannot generate samples in one forward pass, I think GANs have a reason to exist in use cases where synthesis speed is an important factor.

  • @Kram1032
    @Kram1032 1 year ago

    larger datasets may not even be necessary. You can accomplish an increase in diversity by deduplicating the data you already got, potentially actually increasing performance with a smaller dataset! Deduplication may be tricky. But one method might be to train up a purposefully relatively small network to simply distinguish images. If it thinks two images are the same, chances are, the images are really similar. And to further improve this, you can train up *multiple* such networks and go with if like more than half of them think they are the same image, they are too similar and should be picked at random as a group - i.e. you group up "similar" images, then randomly select groups, and finally randomly select an image from each group. Alternatively, more easily, you can just discard all but one of the images of each group to shrink your dataset down to only sufficiently unique thumbnails.
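
The grouping idea in the comment above can be sketched roughly as follows. This is a minimal illustration only: it swaps the small trained networks the comment proposes for a much simpler average hash, and it assumes every image is a small grayscale grid of the same size given as a list of rows. All names (`average_hash`, `group_similar`, `deduplicate`) and the `max_distance` threshold are hypothetical.

```python
import random


def average_hash(pixels):
    """Bit tuple with 1 wherever a pixel is brighter than the image mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return tuple(1 if value > mean else 0 for value in flat)


def hamming(a, b):
    """Number of positions where two equal-length bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def group_similar(images, max_distance=1):
    """Greedy grouping: an image joins the first group whose representative
    hash is within max_distance bits, otherwise it starts a new group."""
    groups = []  # list of (representative_hash, [image indices])
    for index, image in enumerate(images):
        image_hash = average_hash(image)
        for representative, members in groups:
            if hamming(image_hash, representative) <= max_distance:
                members.append(index)
                break
        else:
            groups.append((image_hash, [index]))
    return [members for _, members in groups]


def deduplicate(images, max_distance=1, seed=0):
    """Keep one randomly chosen image index from each group."""
    rng = random.Random(seed)
    return [rng.choice(members) for members in group_similar(images, max_distance)]


# Three tiny 2x2 "thumbnails": the first two are near-duplicates.
thumbnails = [
    [[0, 255], [0, 255]],
    [[10, 250], [5, 245]],
    [[255, 0], [255, 0]],
]
print(group_similar(thumbnails))
print(deduplicate(thumbnails))
```

Keeping one index per group corresponds to the "discard all but one" variant; sampling a group first and then an image within it would correspond to the other variant the comment describes.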

  • @pooriaarab
    @pooriaarab 1 year ago

    Can you share a link to the dataset generated or a tutorial on how to do it? Also, is it feasible to create YouTube thumbnails with the current state of the art of AI?

  • @OMGitshimitis
    @OMGitshimitis 1 year ago

    Is there a hybrid approach? Like using diffusion to generate images and then a GAN to tune that model? I'm not a computer scientist and I may be either saying something super stupid or super obvious but I'm genuinely curious.

  • @jnotjequel
    @jnotjequel 1 year ago

    can't wait to play the new MIAECROOFT: MRGROOTBU update 13:48

  • @realastropulse
    @realastropulse 1 year ago

    Just a little after this was released, the most impressive diffusion based image generator yet was open-sourced. Stable Diffusion is the most promising AI image creator yet, at least until Parti's techniques are perfected and data researchers go back to the drawing board.

  • @gw6667

    @gw6667

    6 months ago

    Who's Parti?

  • @kalilinuxhikida9216
    @kalilinuxhikida9216 1 year ago

    So with thumbnails you could include the text of the video, so it would have to make a good thumbnail and text prompt

  • @CathrinMachinArt
    @CathrinMachinArt 1 year ago

    amazing video

  • @RafaelSCalsaverini
    @RafaelSCalsaverini 1 year ago

    What happens if you use a diffusion network as generator for a GAN?

  • @frost7423
    @frost7423 1 year ago

    this is the first time i click on a click baity thumbnail and get good content

  • @andreivlasenko527
    @andreivlasenko527 1 year ago

    Seems like my comment got deleted because of the arXiv link. I was saying you could try StyleGAN XL, as it showed quite good performance with diverse datasets like ImageNet and trains relatively fast (despite its big size). My second piece of advice is to use fine-tuning instead of training from scratch; it's much faster and more stable for GANs.

  • @EmergentGarden

    @EmergentGarden

    1 year ago

    Oh yes, the best way to do it would be to fine-tune a big pretrained model like StyleGAN. But I'd rather do that with a diffusion model first, and maybe StyleGAN for comparison.

  • @fungi42021
    @fungi42021 1 year ago

    very cool

  • @CmdrTigerKing
    @CmdrTigerKing 1 year ago

    so generating the perfect image is like a slot machine

  • @Graverman
    @Graverman 1 year ago

    great video

  • @DerfaelB
    @DerfaelB 1 year ago

    600/10

  • @pelodofonseca6106
    @pelodofonseca6106 1 year ago

    If it takes 24 hours to train on a batch of images, how do Wombo and DALL-E generate images in less than a minute?

  • @enesmahmutkulak
    @enesmahmutkulak 1 year ago

    Hi, can you share your datasets? Or is it from Kaggle?

  • @CharlesVanNoland
    @CharlesVanNoland 1 year ago

    The more I've learned about neural networks, particularly while watching Machine Learning Street Talk, the more I doubt when people say "these people do not exist" about the deepfaked face images. I believed it blindly before, but now I am concerned that it's basically just interpolating between faces, which is pretty great regardless, but I think someone needs to take the input images the network was trained on and compare them to the best most concise outputs it generates, and see which faces it most resembles - if not matches almost perfectly. Sure, with a GAN it's encoding down to a much lower dimensional latent variable, and then decoding back up to image resolution, but that still could just mean that it's just showing us faces it's learned, and interpolating between them within the latent variable space. At any rate, I'd just like to see comparisons between the "random" outputs and the actual images that the network is trained on.

  • @willguggn2

    @willguggn2

    1 year ago

    That's the point of latent spaces of human faces … ? Given the right parameters it should be able to generate every possible picture of a human face within and outside the training data.

  • @dunar1005
    @dunar1005 9 months ago

    5:02 is that by brute force? So the discriminator will see 1,000,000 pure noise pictures, and when, by chance, the generator generates two black pixels beside each other, it will decide that it prefers that picture?

  • @programorprogrammed
    @programorprogrammed 1 year ago

    Looking at the patreon, we must all be broke

  • @yorkwestenhaver8680
    @yorkwestenhaver8680 1 year ago

    Fucking Great video man!!

  • @iDrewa
    @iDrewa 1 year ago

    Imagen is pronounced I-mi-gen. Great vid!

  • @Landee
    @Landee 1 year ago

    gg !

  • @CmdrTigerKing
    @CmdrTigerKing 1 year ago

    This video's script was entirely AI generated, then read by an AI, and the pictures were created by an AI

  • @michaelmam1490
    @michaelmam1490 1 year ago

    12:23 Does anyone know if the Chinese text is intelligible?

  • @errantwashere
    @errantwashere 1 year ago

    Errant was here

  • @JamesEdwardTracy

    @JamesEdwardTracy

    1 year ago

    Well we DID "invite" you. Now guess how.

  • @goranjosic
    @goranjosic 1 year ago

    I recently played with Stable Diffusion beta 1.5 (they have a trial with some points so everyone can try their model), and my impression is that their diffusion model is really great only for generating artistic images and paintings. In many other situations it looks either too artificial or overfitted, as it copies too much (that is my impression), and all faces look awful, especially when compared to Midjourney and their model and images. I guess more work is needed on this type of neural network...?! _I'm not an expert, just a hobby programmer_

  • @raptordarwish887
    @raptordarwish887 1 year ago

    Make a thumbnail of a bot sitting in front of the computer making/editing a thumbnail

  • @petergibbons607
    @petergibbons607 1 year ago

    i have a friend that is an "artist" (not really that good but thinks they are), and she is pissed about this AI thing, it's pretty funny, sucks to be an artist now :D (or soon when this thing gets really REALLY good)

  • @Agony.
    @Agony. 1 year ago

    As if YouTubers' jobs weren't easy and lazy enough.

  • @RafaelSCalsaverini
    @RafaelSCalsaverini 1 year ago

    Have you generated Adam Neely?

  • @alan2here
    @alan2here 1 year ago

    annealing defusion waveform collapse …

  • @artem945
    @artem945 1 year ago

    So the discriminator is basically solving the Turing test...

  • @HighlyRegardted
    @HighlyRegardted 1 year ago

    Unfortunately…Oil is still the oil of the 21st century …

  • @EmergentGarden

    @EmergentGarden

    1 year ago

    true lol

  • @ellinikoptero7355
    @ellinikoptero7355 1 year ago

    Man, love the vid, but for real change the thumbnail. Your current one just falls too much into the uncanny valley, as you too know, and I didn’t click on the vid for many hours, even considering unsubbing because I thought it was junk cluttering up my “subscribed” feed.

  • @caesaroftampa1266

    @caesaroftampa1266

    1 year ago

    Worst possible reaction ever. I did not have the same reaction, I saw it and was intrigued. Reminded me of some of VSauce's thumbnails!

  • @EmergentGarden

    @EmergentGarden

    1 year ago

    Not the reaction I was going for lol! I'll be messing around with the thumbnail/title, I figured uncanny ones would catch the eye but they can also freak people out.
