Stable Diffusion - How to build amazing images with AI

Science & Technology

This video is about Stable Diffusion, the AI method for building amazing images from a prompt.
If you like this material, check out LLM University from Cohere!
llm.university
Get the Grokking Machine Learning book!
manning.com/books/grokking-ma...
Discount code (40%): serranoyt
(Use the discount code at checkout)
0:00 Introduction
1:27 How does Stable Diffusion work?
2:55 Embeddings
12:55 Diffusion Model
15:00 Numerical Example
17:39 Embedding Example
19:37 Image Generator Example
28:37 The Sigmoid Function
34:39 Diffusion Model Example
41:03 Summary

Comments: 46

  • @shafiqahmed3246
    14 days ago

    Serrano, you are a genius, bro. Your channel is so underrated.

  • @amirkidwai6451
    5 months ago

    Arguably the greatest teacher alive

  • @SerranoAcademy
    5 months ago

    Thank you :)

  • @thebigFIDDLES
    6 months ago

    These videos are always incredibly helpful, informative, and understandable. Very grateful

  • @krajanna
    4 months ago

    I am a fan of your work. I read your "Grokking Machine Learning"; it's awesome. I am totally impressed. I've stopped watching other AI videos and now follow you for most of this stuff. Simple and practical explanations. Thanks a lot, and I'm grateful to you for spreading the knowledge.

  • @jasekraft430
    4 months ago

    Always impressed with how understandable yet detailed your videos are. Thank you!

  • @NigusBasicEnglish
    1 month ago

    You are the best explainer ever. You are amazing.

  • @enginnakus9550
    6 months ago

    I respect your concise explanation.

  • @wanggogo1979
    6 months ago

    Amazing, I hope to truly understand the mechanism of Stable Diffusion through this video!

  • @avijitsen8096
    5 months ago

    Superb, such an elegant explanation. Big thanks, Sir!

  • @kyn-ss4kc
    6 months ago

    Amazing!! Thanks for this high-level overview. It was really helpful and fun 👍

  • @anthonymalagutti3517
    6 months ago

    Excellent explanation, thank you so much!

  • @MikeTon
    3 months ago

    Really incredible job of stepping through the HELLO WORLD of image generation, especially how the video compresses the key output to a 4x4 pixel grid and clearly hand-computes each step of the way!

  • @abhaymishra-uj6jp
    3 months ago

    Really amazing work, easy to understand and grasp. You're doing a great deal for the community. Thanks a lot.

  • @skytoin
    5 months ago

    Great video, it gives good intuition about deep network architecture. Thanks!

  • @reyhanehhashempour7157
    6 months ago

    Amazing as always!

  • @priyankavarma1054
    5 months ago

    Thank you so much!!!

  • @AravindUkrd
    5 months ago

    Thank you for such a wonderful visualization that conveys an overview of complex mathematical concepts. Can you please do a video detailing the underlying architecture of the neural network that forms the diffusion model? Also, are Generative Adversarial Networks (GANs) not used anymore for image generation?

  • @samirelzein1095
    6 months ago

    Amazing deep dismantling of complex structures. That's real ML/AI democratization.

  • @BigAsciiHappyStar
    1 month ago

    Muy BALL-issimo 😄 Loved the puns!!!!!😋😋😋

  • @qwertyntarantino1937
    4 months ago

    thank you

  • @olesik
    6 months ago

    Thanks for teaching, Mr. Luis! I still fondly remember you teaching me machine learning basics over drinks in SF.

  • @SerranoAcademy
    6 months ago

    Thanks Jon!!! Great to hear from you! How’s it going?

  • @hamidalavi5595
    14 days ago

    Thank you for your amazing educational videos! I have a question, though: are there any transformers (+ attention mechanism) involved in the text2image generator (the diffusion model)? If not, then how are the semantics in the text captured?

  • @melihozcan8676
    3 months ago

    Serrano Academy: The Art of Understanding
    Luis Serrano: The GOD of Understanding

  • @SerranoAcademy
    3 months ago

    Thank you so much, what an honour! :)

  • @melihozcan8676
    3 months ago

    @SerranoAcademy Thank you, the honour is ours! :)

  • @abhishek-zm7tx
    3 months ago

    Hi @Luis. Your videos are very informative and I love them. Thank you so much for sharing your knowledge with us. I wanted to know if "Fourier Transforms in AI" is in your pipeline. Please share some intuition around that in a video. Thanks in advance.

  • @SerranoAcademy
    3 months ago

    Thanks for the suggestion! It's definitely a great idea. In the meantime, 3blue1brown has great videos on Fourier transforms; take a look!

  • @olesik
    6 months ago

    So can we just use the diffusion model to denoise low-quality or nighttime shots?

  • @SerranoAcademy
    6 months ago

    Yes, absolutely: they can be used to denoise existing images.
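
    For readers curious what that looks like concretely, here is a minimal sketch of an iterative denoising loop, assuming a trained denoiser network that takes an image plus a noise level; the names and step count are illustrative, not from the video:

        import torch

        # Hypothetical loop: start from a noisy photo and repeatedly apply a
        # trained denoiser, as the video describes for the diffusion model.
        def denoise_image(denoiser, noisy_image, num_steps=50):
            x = noisy_image
            for t in reversed(range(num_steps)):
                # Each pass removes a little noise; t tells the model how
                # noisy x currently is.
                x = denoiser(x, torch.tensor([t]))
            return x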

  • @aswinosbalaji4224
    4 days ago

    In the intermediate result, it is said that after the sigmoid we will not get a sharp image of the ball and bat. How can there be fractional pixel values? Since it is monochromatic, each pixel should be either 0 or 1, right? Rounding off to the nearest integer would give the same result as before the sigmoid. And even if it's not monochrome, pixels can't be fractions, right?
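
    For readers with the same question: a fractional value after the sigmoid is a valid grayscale intensity between 0 (black) and 1 (white); the network outputs real-valued numbers, so the sigmoid rarely lands exactly on 0 or 1, and the display just quantizes the result. A tiny sketch, with made-up logit values (not the video's numbers):

        import numpy as np

        def sigmoid(z):
            # Squash any real number into the open interval (0, 1).
            return 1.0 / (1.0 + np.exp(-z))

        # Hypothetical 2x2 patch of raw network outputs (logits).
        logits = np.array([[ 4.0, -4.0],
                           [ 0.5, -0.5]])

        pixels = sigmoid(logits)       # fractional intensities: ~0.98, 0.02, 0.62, 0.38
        print(np.round(pixels * 255))  # what an 8-bit grayscale display would show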

  • @NVHdoc
    6 months ago

    (At 17:25) in the image on the right, the baseball and bat should have 3 gray squares, right? Very nice channel; I just subscribed.

  • @SerranoAcademy
    6 months ago

    Thank you! Yes, the ball and bat should be three gray or black squares. Since these images are not exact, there could also be dark gray or some variation.

  • @ASdASd-kr1ft
    6 months ago

    Could it be that the diffusion model is trained to learn what amount of noise has to be removed from the input image, instead of the image with less noise? That is what I understood from other sources, because they say that is easier for the model. Thank you, and good video; very enlightening!
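
    For context, this matches how DDPM-style diffusion models are commonly trained: the network predicts the noise that was added rather than the cleaner image, and the cleaner image is then recovered from that prediction. A minimal sketch of such a training step; the model and noise schedule here are assumptions, not the video's example:

        import torch

        def training_step(model, x0, alphas_cumprod):
            # One noise-prediction training step on a batch of clean images x0.
            batch = x0.shape[0]
            t = torch.randint(0, len(alphas_cumprod), (batch,))    # random timesteps
            a_bar = alphas_cumprod[t].view(batch, 1, 1, 1)         # cumulative alphas
            noise = torch.randn_like(x0)                           # epsilon ~ N(0, I)
            x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * noise   # noised images
            pred_noise = model(x_t, t)                             # network's guess
            # The regression target is the noise itself, not the denoised image.
            return torch.nn.functional.mse_loss(pred_noise, noise)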

  • @AI_Financier
    6 months ago

    Finally, the diffusion penny dropped for me. Many thanks!

  • @maxxu8818
    3 months ago

    Hello Serrano, is there a paper like "Attention Is All You Need" for Stable Diffusion?

  • @SerranoAcademy
    3 months ago

    Good question, I'm not fully sure. There's this, but I'm not 100% sure if it's the original: stability.ai/news/stable-diffusion-public-release. I always use this explanation as a reference; there may be some good leads there: jalammar.github.io/illustrated-stable-diffusion/

  • @maxxu8818
    3 months ago

    Thanks, @SerranoAcademy 🙂

  • @parmarsuraj99
    6 months ago

    🙏

  • @850mph
    1 month ago

    This is wonderful… perhaps the best low-level description of the diffusion process I've seen. But discrete images of bats and balls represented as single pixels are a long way from a PHOTO-REALISTIC pirate standing on a ship at sunrise. What I can't get my head around is how these discrete images (which actually exist in the multi-dimensional dataset space) are combined, really grafted together (parts pulled from each existing image), into a single image with correct composition, scaling, coloring, shadows, etc. Even if I lay specifically chosen (by the NN) bat and ball pictures over each other to produce a "fuzzy" combined image (a composition), and then use another NN to sharpen the fuzzy image into a crisp composition with all the attributes defined in the prompt and pointed to by the embeddings… there's still too much magic inside the DIFFUSION black box that I just don't understand, even understanding the denoising and self-attention processes.

  • @850mph
    1 month ago

    I guess what I have not been able to determine, after watching maybe 30-35 hours of diffusion videos, is specifically how the black box COMPOSES a complicated scene BEFORE the process begins that "tightens" the image up by removing noise between the given and the target in successive passes of the decoder. I get fact one: the prompts correspond to embeddings, and the embeddings point to some point in multi-dimensional space which contains all sorts of related info, and perhaps a close image representation of the prompted request… or perhaps not. I get fact two: the diffusion process is able to generate virtually any complicated scene starting from random noise when gently persuaded toward a target by the prompt. What I don't understand is how the black box builds a complicated FUZZY image once the various "parts" of the composition are identified. Does the composing process start with a single image, if available in the dataset, and scale individual attributes to correspond with the prompt? Or does the composing process start with segmented attributes, scale them all appropriately, and combine them into a single image? A closer look at how scene COMPOSITION works would be a great addition to your very helpful library of vids, thnx.

  • @850mph
    1 month ago

    OK… for those with the same "problem": the missing part, at least for me, is the "classifier" portion of the model, which I have NOT seen explained in the high-level diffusion explanation vids. This tripped me up. Here is a good vid and corresponding paper which help in understanding the "feature" set extraction within the image convolution process, which ultimately creates an "area/segment-aware" dataset (image) that can be directed to include the visual requirements described in a text prompt: kzread.info/dash/bejne/gGVpz8yfcai2odo.htmlsi=6sZxibtFvjrVNHeE In a nutshell… the features extracted from each image are MUCH more descriptive than I had pictured, allowing for much better interpolation, composition, and reconstruction of multiple complex forms in each image. Of course, the cues to build these complex images all happen as the model interpolates its learned data, converging on the visual representation of the text prompt somewhere in the multi-dimensional space which we cannot comprehend… so in a sense it's still all a black box. I don't pretend to understand it all, but it does give the gist of how certain abstract features within the model's convolutional layers blow themselves up into full-blown objects.
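
    For anyone digging into how the prompt steers those extracted features, the usual mechanism in Stable Diffusion is cross-attention: image features act as queries, and the text-embedding tokens act as keys and values. A stripped-down sketch; all dimensions and weight matrices here are illustrative assumptions:

        import torch
        import torch.nn.functional as F

        def cross_attention(image_feats, text_embeds, W_q, W_k, W_v):
            # image_feats: (pixels, d_img); text_embeds: (tokens, d_txt)
            Q = image_feats @ W_q                   # queries from image features
            K = text_embeds @ W_k                   # keys from prompt tokens
            V = text_embeds @ W_v                   # values from prompt tokens
            scores = Q @ K.T / K.shape[-1] ** 0.5   # scaled dot-product attention
            # Each image location mixes in the prompt tokens it attends to,
            # which is how the text steers composition.
            return F.softmax(scores, dim=-1) @ V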

  • @850mph
    1 month ago

    Another good short vid which shows how diffusion accomplishes image COMPOSITION: kzread.info/dash/bejne/qqig2qWzY5efh7g.htmlsi=PJl_vWueiQdZxLn1

  • @850mph
    1 month ago

    Another good vid which gets into composition: kzread.info/dash/bejne/ZZZrza-vorPAiJs.htmlsi=AwNQJAjABKn-iV4F

  • @850mph
    12 days ago

    Another good set of vids which get into IMAGE COMPOSITION: kzread.info/dash/bejne/qK2a05WMl7u3qbg.htmlsi=ShiOXaQH_0baU8Z- Especially helpful is the last vid; URL posted above.
