ControlNet paper explained - Adding Conditional Control to Text-to-Image Diffusion Models

ControlNet is one of the first papers to enable precise spatial control over the outputs of text-to-image diffusion models. It won the Best Paper Award (Marr Prize) at the prestigious ICCV 2023 conference.
This video covers the ControlNet architecture, the idea of classifier-free guidance, and how the paper modifies it with resolution weighting. It also covers the qualitative results and ablation studies.
⌚️ ⌚️ ⌚️ TIMESTAMPS ⌚️ ⌚️ ⌚️
0:00 Introduction to ControlNet
1:45 Neural Network Blocks
2:04 ControlNet Architecture
3:02 ControlNet with Stable Diffusion
5:05 ControlNet Training
6:39 Classifier-free Guidance Resolution Weighting
6:56 Classifier Guidance
8:58 Classifier-free Guidance
9:46 Classifier-free Guidance Resolution Weighting
11:08 Ablation Studies
🛠 🛠 🛠 MY SOFTWARE TOOLS 🛠 🛠 🛠
✍️ Notion - affiliate.notion.so/aibites-yt
✍️ Notion AI - affiliate.notion.so/ys9rqzv2vdd8
📹 OBS Studio for video recording - obsproject.com
📼 Manim for some animations - www.manim.community
🎵 My music - www.bensound.com and
📚 📚 📚 BOOKS I HAVE READ, REFER TO AND RECOMMEND 📚 📚 📚
📖 Deep Learning by Ian Goodfellow - amzn.to/3Wnyixv
📙 Pattern Recognition and Machine Learning by Christopher M. Bishop - amzn.to/3ZVnQQA
📗 Machine Learning: A Probabilistic Perspective by Kevin Murphy - amzn.to/3kAqThb
📘 Multiple View Geometry in Computer Vision by R Hartley and A Zisserman - amzn.to/3XKVOWi
MY KEY LINKS
YouTube: / @aibites
Twitter: / ai_bites
Patreon: / ai_bites
Github: github.com/ai-bites
WHO AM I?
I am a Machine Learning researcher/practitioner who has seen the grind of both academia and start-ups. I started my career as a software engineer 15 years ago. Thanks to my love for Mathematics (coupled with a glimmer of luck), I graduated with a Master's in Computer Vision and Robotics in 2016, just as the current AI revolution was starting. Life has changed for the better ever since.
#machinelearning #deeplearning #aibites

Comments: 6

@abcd45058
4 months ago
Great work. Interesting paper read indeed. At 7:27, Bayes' theorem is incorrect. It should be P(X|Y) = P(Y|X)·P(X) / P(Y). The rest of the math that follows is fine.
@AIBites
4 months ago
Well spotted, thank you. I think I noticed it after the video was published, but left it as YouTube doesn't allow uploading newer versions of a video. I think I should start writing errata in the comments :)
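The corrected Bayes' theorem in the comment above is the usual starting point for deriving classifier guidance and, from it, classifier-free guidance. A sketch of the standard steps follows; the notation (x for the image, y for the condition, s for the guidance scale, \epsilon_c and \epsilon_{uc} for the conditional and unconditional noise predictions) is an assumption here and may differ from the symbols used in the video.

```latex
% Bayes' theorem
p(x \mid y) = \frac{p(y \mid x)\, p(x)}{p(y)}

% Take logs and the gradient w.r.t. x; p(y) does not depend on x
\nabla_x \log p(x \mid y) = \nabla_x \log p(x) + \nabla_x \log p(y \mid x)

% Classifier guidance: scale the classifier term by s
\nabla_x \log p(x) + s \, \nabla_x \log p(y \mid x)

% Classifier-free guidance: substitute
% \nabla_x \log p(y \mid x) = \nabla_x \log p(x \mid y) - \nabla_x \log p(x)
\nabla_x \log p(x) + s \left( \nabla_x \log p(x \mid y) - \nabla_x \log p(x) \right)

% Noise-prediction form, using \epsilon = -\sigma_t \nabla_x \log p
\tilde{\epsilon} = \epsilon_{uc} + s \, (\epsilon_c - \epsilon_{uc})
```

The last step works because both score terms share the same -\sigma_t factor, so the linear combination carries over directly to the noise predictions.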
@frazuppi4897
7 months ago
Great video, but it's not clear how one trains it. One needs pairs of ControlNet inputs and output images, right?
@AIBites
5 months ago
Yes, we need paired datasets, e.g. depth maps or poses together with images. Computer vision already has several depth and pose datasets. The problem is that these datasets are tiny compared to the scale at which LLMs or LVMs are trained. ControlNet is the solution: the pretrained model stays locked, and only a trainable copy of its encoder blocks, connected through zero-initialized convolutions, is trained on these "small" datasets. As a result, we can control the spatial layout of the generated image at inference time. Hope that clarifies :)
@frazuppi4897
5 months ago
@AIBites yeah, but I guess ControlNet is around 50M
@AIBites
4 months ago
That's the upper bound, I guess. I'm not sure what the lower bound for training is.
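The training setup described in the reply above (lock the pretrained block, train a copy, join them with zero-initialized convolutions) can be sketched in plain Python. This is a toy illustration, not the paper's code: a "1x1 convolution" is modelled as a linear map on a feature vector, and the class and function names (Linear, zero_linear, ControlNetBlock) are hypothetical. The block computes y_c = F(x) + Z(F_copy(x + Z(c))).

```python
# Toy ControlNet-style block (hypothetical names, pure Python for illustration).
# A frozen block F is copied; the copy is trainable and is wired in through two
# "zero convolutions" (here: linear maps on feature vectors) initialised to zero.
import copy


class Linear:
    """Toy linear layer standing in for a 1x1 convolution on a feature vector."""

    def __init__(self, weight, bias):
        self.weight = weight  # list of rows, shape (out, in)
        self.bias = bias      # list, shape (out,)

    def __call__(self, x):
        return [sum(w * xi for w, xi in zip(row, x)) + b
                for row, b in zip(self.weight, self.bias)]


def zero_linear(dim):
    """Zero-initialised layer: outputs all zeros until training updates it."""
    return Linear([[0.0] * dim for _ in range(dim)], [0.0] * dim)


def add(a, b):
    return [x + y for x, y in zip(a, b)]


class ControlNetBlock:
    def __init__(self, frozen_block, dim):
        self.frozen = frozen_block                    # locked pretrained block, never updated
        self.trainable = copy.deepcopy(frozen_block)  # trainable copy of the same weights
        self.zero_in = zero_linear(dim)               # zero conv on the condition c
        self.zero_out = zero_linear(dim)              # zero conv on the copy's output

    def __call__(self, x, c):
        # y_c = F(x) + Z_out(F_copy(x + Z_in(c)))
        y = self.frozen(x)
        h = self.trainable(add(x, self.zero_in(c)))
        return add(y, self.zero_out(h))


if __name__ == "__main__":
    pretrained = Linear([[1.0, 2.0], [3.0, 4.0]], [0.5, -0.5])
    block = ControlNetBlock(pretrained, dim=2)
    # At initialisation the condition has no effect: output equals F(x).
    print(block([1.0, 1.0], [10.0, -3.0]))  # [3.5, 6.5]
```

At initialization the block reproduces the pretrained output exactly, so training starts from the original model's behaviour. The zero layers still receive non-zero gradients for their weights, since those gradients depend on their (non-zero) inputs, which lets the control signal grow gradually during training.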
