AI Art Explained: How AI Generates Images (Stable Diffusion, Midjourney, and DALLE)

Learn how AI image generation works. This video goes over the AI components of AI image generation models like Stable Diffusion and explains how they work and how they're trained.
Blog post: jalammar.github.io/illustrate...
---
Twitter: / jayalammar
Blog: jalammar.github.io/
Mailing List: jayalammar.substack.com/
--
Introduction (0:00)
Text-to-image and image-to-image (1:32)
The components of Stable Diffusion - high-level overview (3:06)
The three models inside the AI Image Generator (5:48)
Generating images with reverse diffusion (8:36)
Images emerging from noise (11:09)
How the model is trained. 1 - Diffusion (12:46)
How the model is trained. 2 - Compression (17:44)
The importance of language models for image generation (20:43)
How CLIP is trained (training on both text and images) (22:55)
Guiding image generation with text prompts (25:57)
Conclusion (28:07)
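
For readers who want something concrete behind the "How the model is trained. 1 - Diffusion" chapter, here is a minimal sketch of how one training example could be built: noise is mixed into an image at a chosen step, and the model's job is to predict that noise. The function names (`alpha_bar_schedule`, `make_training_example`), the linear noise schedule, and its values are assumptions for illustration, not taken from the video or from Stable Diffusion's code.

```python
import math
import random


def alpha_bar_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear beta schedule (assumed values)."""
    alpha_bars = []
    running = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def make_training_example(pixels, t, alpha_bars, rng):
    """Noise a flat list of pixel values to step t.

    Returns (noisy_pixels, noise). A noise predictor would be trained to
    output `noise` when given `noisy_pixels` and `t`.
    """
    noise = [rng.gauss(0.0, 1.0) for _ in pixels]
    signal_scale = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    noisy = [signal_scale * p + noise_scale * n for p, n in zip(pixels, noise)]
    return noisy, noise


rng = random.Random(0)
alpha_bars = alpha_bar_schedule(1000)
image = [0.5, -0.25, 1.0, 0.0]  # a toy 4-pixel "image"
noisy, noise = make_training_example(image, 999, alpha_bars, rng)
```

At early steps the noisy image is almost the original; at the last step it is almost pure noise, which is why generation can start from random noise and remove predicted noise step by step.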

Comments: 44

  • @omidsajedi5
    @omidsajedi5 1 year ago

    You have a very unique way of explaining deep learning concepts. The illustrations are very concise and to the point which really helps focus on the core concepts and not get distracted by technical details. Thanks for making this great video!

  • @herrbonk3635

    @herrbonk3635

    5 months ago

    Good for you, I understood nothing. Some concrete technical detail would have helped me.

  • @Tjeminee
    @Tjeminee 5 months ago

    As a visual thinker, SD can be quite overwhelming under the hood. I have been using the graphical interface "ComfyUI" and it has taken me quite a distance in understanding the dynamics of SD. Your video and page helped me a lot in taking the next step to the more advanced features and expanding my options. Thanks Jay!

  • @sabooshubham
    @sabooshubham 1 year ago

    Great explanation, loved it!

  • @trajesh81
    @trajesh81 1 year ago

    Thanks Jay! Just like your NLP Transformer series, which still stands tall against the test of time, this is one more added to my list of go-to references! You are indeed a master in the art of teaching!!

  • @paresh1930
    @paresh1930 1 year ago

    Thank you for this great explanation!

  • @pierrelebreton7634
    @pierrelebreton7634 1 year ago

    Thank you, really nicely explained!

  • @maxkhan4485
    @maxkhan4485 1 year ago

    Thank you! I finally understand Stable Diffusion!

  • @sanyahyde3959
    @sanyahyde3959 8 months ago

    Excellent video, thank you!

  • @DrNoureddinSadawi
    @DrNoureddinSadawi 1 year ago

    Nice explanation, thanks!

  • @nqnam12345
    @nqnam12345 1 year ago

    great Jay!

  • @andrechoi2553
    @andrechoi2553 1 year ago

    Good video, very inspiring😁

  • @rachidbensaid6629
    @rachidbensaid6629 1 year ago

    Great Work, Good luck

  • @XishanAfzal
    @XishanAfzal 6 months ago

    More than useful. Thanks

  • @justaguy2365
    @justaguy2365 2 months ago

    Oppose to the end!

  • @RodrigoRibeiroGomes
    @RodrigoRibeiroGomes 1 month ago

    Excellent!!!

  • @adeelgilll
    @adeelgilll 1 year ago

    excellent

  • @user-cm7wo3bi1p
    @user-cm7wo3bi1p 3 months ago

    thank you

  • @itsnotthattough7588
    @itsnotthattough7588 1 year ago

    Thanks, sir!

  • @abhishek-tandon
    @abhishek-tandon 1 year ago

    Brilliant

  • @daveonvr2192
    @daveonvr2192 11 months ago

    Thanks Jay - I had been looking for something that does more than describe the denoising process, and the attention bit related to prompts is what I was missing. That said, I still can't quite understand how you get a completely new image. I can understand that you should be able to get back to an original image (say a dog, or a flower) via the noisification and reverse process, but how can it, say, create an image with a flower and the dog such that they are integrated in some way? Where does that data come from? A visual example of the earlier stages which show this would be helpful. The examples you had jumped from basically noise to an image (albeit unrefined) in 3 steps - I'd like to see this broken down so I can "see" what is happening. It still requires a level of acceptance without evidence that I am not happy with....

  • @muhammedaneesk.a4848
    @muhammedaneesk.a4848 1 year ago

    Thanks for the explanation. Can you please make a 1 hr or 2 hr video with a deeper dive into the internals? Maybe you already have it recorded, I guess. Thanks.

  • @mostlynotworking4112
    @mostlynotworking4112 1 year ago

    Simple question: does that mean it can't create a prompt (or specific word) that it hasn't been trained on? Thank you for your video!

  • @UnderstandingCode
    @UnderstandingCode 1 year ago

    Love from Saudi Arabia!

  • @d.p.5874
    @d.p.5874 1 year ago

    Thanks Jay for all your efforts to share a bit of your knowledge in AI. I am not an expert, by far, but I came to the conclusion that AI is mainly a construction of hundreds of Lego bricks, assembled together into specific architectures and trained with the same gradient backpropagation algorithm. Some of them perform well, some others don't. Therefore, the only genuine piece of AI theory is the mathematical background of the training algorithm. The rest is pure heuristics, more or less well explained, a kind of AI cookbook with ad hoc recipes. The training algorithm itself seems very limited (even if highly powerful), since it is applied in a centralized way onto a predefined architecture and does not participate in defining the architecture's topology. In other words, the topology is defined before the training while, intuitively, the training should probably define the topology. Therefore incremental learning remains a big issue in most of the AI architectures, if not all. This lack of a consistent and unified AI theory (there are not, to my limited knowledge, any AI theorems or demonstrations that some sort of optimum is reached using a given architecture) makes me believe that we are at the very beginning of a new science still to come. Could you react to the above humble considerations and share your thoughts? Kind regards,

  • @karthik8972
    @karthik8972 9 months ago

    Thanks Jay for the video, the concept of converting a noised image to a clear image is understood. How does it create an image which doesn't exist in its training? It is understood that the model doesn't understand the concepts of the image and only focuses on the patterns. But how are the operations below performed?

    1. Creating a cartoon image of a cat based on a caption, e.g. "Place a hat on top of the cat". How does it create a cartoon image of a cat? How does it know the exact location of the cat's head? How does it know to place the hat exactly on the head?

    2. "A closeup shot of a dog facing the sun". How does it know to create a close-up shot of a dog? How does it know to place the sun in the background? How does it make the object turn towards the sun?

    No videos exist to explain this concept. It would be of great help if you could make a video on this.

  • @10FACTSABOUTGAMES
    @10FACTSABOUTGAMES 1 year ago

    Would you kindly tell me if it is possible to sell the artwork that I made with Stable Diffusion? Does the administration allow this, and how can I communicate with them (I mean the management or support for this program)? And where can the pictures be sold as pieces of art? I do not speak English, help me

  • @JohnGilbertmoore
    @JohnGilbertmoore 7 months ago

    It renders the image from text instead of a 3D model. It's like Maya, but with words, and using 1B+ "pre-trained models" (images with their text descriptions) from the Internet wired up with plain English, so you don't have to build the models in 3D. You can just type what you want to create using plain English, and the AI renders out the image.

  • @jamiewatts333
    @jamiewatts333 11 months ago

    Is this simplified explanation of the process of noise in Stable Diffusion true? It's like teaching an artist about our visual world -- object definitions, shapes, dimensions, etc., and how they correspond to the person who commissioned the art (text prompts). The artist then watches a mosaic - say of an ice cream - being inserted by hundreds of tesserae (rectangular slabs used to create a mosaic) and then removed to restore the original mosaic. During this, the artist learns how to understand, recreate, and reinterpret the ‘ice cream’ image in other mosaics. The artist goes through this with millions of other depictions in mosaics (objects, locations, etc.) so they can create entirely new mosaics based on the requests (or text prompts) of the person commissioning them. Sampling steps are like commissioning an artist to interpret and construct a mosaic quickly or carefully. The more detail or accuracy you want, the more work and time have to go into it.

  • @treksis
    @treksis 1 year ago

    👆just like the transformers series, excellent

  • @anilsharma32g
    @anilsharma32g 6 months ago

    Dear Sir, I am your subscriber. I want to create a tool that finds text errors in an image. For example: if I forgot to write CONTACT US, BUY NOW, or a CONTACT NUMBER, or made a SPELLING MISTAKE, etc. in my social media post, the tool finds the error and suggests what is missing or what is incorrect in the post. 🙏 Please guide me and suggest what course I need to buy or what I need to learn to create this tool. Thank you!

  • @CptBlaueWolke
    @CptBlaueWolke 1 year ago

    *AI Pictures. Art means craftsmanship and personal expression

  • @nerdfinite

    @nerdfinite

    1 year ago

    Not all photographs are art, but photography can be an art. The nonsense I draw in a game of Pictionary has no craftsmanship or personal expression. However, illustration is an important form of art. Not only would the AI never produce art on its own, it would never produce anything. The amount of craftsmanship and personal expression being put into the image is dependent on the person using it. A low effort random prompt to the AI is arguably not art, but that's not really the point.

  • @51_percent_

    @51_percent_

    1 year ago

    Writing the prompts is personal expression

  • @avistryfe4534

    @avistryfe4534

    1 year ago

    @@51_percent_ nope. It aint shit. Even with a shortcut. You will still have zero talent or expression. Anyone can say those words. So you have the same skill and expressive power as a toddler. Enjoy. Pretend with your orgy of robots all you like. But you are not special.

  • @mingkko1

    @mingkko1

    1 year ago

    @@51_percent_ so is ordering food at a restaurant but that does not make you a chef.😉

  • @CptBlaueWolke

    @CptBlaueWolke

    1 year ago

    @@51_percent_ No, it isn't. Writing a full text by yourself is.

  • @simawpalmer7721
    @simawpalmer7721 1 year ago

    Thanks, great video again, but your voice has a lot of sibilants, making the listening experience atrocious. If you make enough money making these videos, I suggest hiring a professional audio producer/mixing guy to clean up the audio. Email me, I'll suggest someone.

  • @anneallison6402
    @anneallison6402 1 year ago

    This is not art, don't be silly

  • @mpavankumar6695

    @mpavankumar6695

    1 year ago

    No, this is revolution