Stanford CS236: Deep Generative Models | 2023 | Lecture 16 - Score Based Diffusion Models

For more information about Stanford's Artificial Intelligence programs visit: stanford.io/ai
To follow along with the course, visit the course website:
deepgenerativemodels.github.io/
Stefano Ermon
Associate Professor of Computer Science, Stanford University
cs.stanford.edu/~ermon/
Learn more about the online course and how to enroll: online.stanford.edu/courses/c...
To view all online courses and programs offered by Stanford, visit: online.stanford.edu/

Comments: 3

  • @CPTSMONSTER · 7 days ago

    4:40 Estimating the noise (denoising) is equivalent to estimating the score of the noise-perturbed data distribution (sigma model). Knowing how to denoise is knowing which direction to perturb the image to increase the likelihood most rapidly.
    5:50? Taylor approximation of the likelihood around each data point.
    12:45 Set of Gaussian conditional densities; encoder interpretation of the Markovian joint distribution.
    15:20 A typical VAE maps a data point through a neural network that outputs mean and standard deviation parameters for the distribution over the latents. In diffusion, you just add noise to the data; nothing is learned.
    16:15 T times the dimension of the original data point; the mapping is not invertible.
    17:00 Closed form of the joint distribution (Gaussian); see the forward-jump sketch after these notes.
    18:20 Same way as generating training data for the denoising score matching procedure (i.e., perturbing the data samples for each sigma_i).
    22:20? The process is invertible if the score is known; variationally learn an operator (ELBO) as a decoder instead of an invertible mapping (which doesn't need the score, but if it is Gaussian then the score is needed anyway).
    23:05 The initial data is not required to be a mixture of Gaussians, but the model has to be continuous and the transition kernel has to be Gaussian; in latent diffusion models discrete data can be embedded in a continuous space.
    26:00 The exact denoising distribution (reverse kernel) is unknown; variational approximation.
    28:15 Similar to a VAE decoder: the reverse process is defined variationally through conditionals parameterized by neural networks.
    30:05 Alpha parameter; define a diffusion process such that, run for a sufficiently long time, it reaches a steady state of pure noise.
    31:00 Langevin equivalence; variational training gives mu, which is the score.
    32:15? Langevin corrections on top of the vanilla reverse Gaussian kernel.
    32:30 The transition is Gaussian and therefore stochastic (same as a VAE decoder); the neural network parameters are deterministic.
    33:50 A flavor of VAE: a sequence of latent variables indexed by time; the encoder does not learn features; interpreted as a VAE with a fixed encoder that adds noise.
    35:00? VAE ELBO (formula referenced in the previous lecture); the second term encourages high entropy.
    37:35? Hierarchical VAE evidence lower bound formula.
    38:40 In the usual VAE q is learnable; in diffusion q is fixed.
    40:00? The ELBO loss is equivalent to the denoising score matching loss: minimizing the negative ELBO (maximizing the lower bound on the average log-likelihood) is exactly the same as estimating the scores of the noise-perturbed data distributions.
    41:40 In a score-based model you sample with Langevin; in a diffusion model you sample from the decoder (see the sampling sketches after these notes).
    41:50? Denoising diffusion probabilistic model training procedure; equivalence to the denoising score matching loss (see the training-loop sketch after these notes).
    43:20? The encoder is fixed; the decoder minimizes the KL divergence (maximizes the ELBO), which amounts to inverting the fixed noising process (and this turns out to require estimating scores).
    44:10 Training both encoder and decoder gives better ELBOs but worse sample quality; equivalence to one-step Langevin; the score-based perspective arises as the limit of infinitely many noise levels (tricky to get from the VAE perspective).
    45:35 Optimizing likelihood does not correlate with sample quality.
    45:45 Even if the encoder is fixed and as simple as adding Gaussian noise, the inverse is nontrivial.
    47:15 Expensive computation graph, but trained incrementally layer by layer (locally) without having to look at the whole process.
    48:00 Efficient training thanks to the structure of q: forward jumps.
    48:45 Solving the loss of the vanilla VAE yields the same loss function as in the diffusion model; the fixed-encoder loss function is the same as the denoising score matching loss.
    49:15 The equivalence to a VAE allows sampling without Langevin.
    50:45 A fixed encoder for a one-step VAE would be very complicated to invert; the diffusion model splits this into 1000 subproblems.
    51:30 The argument of epsilon_theta is the perturbed data point (a sample from q(x_t | x_0)); the architecture is the same as a noise-conditional score model.
    51:50 Not learning a separate decoder for every t; the epsilon_theta network is amortized.
    52:40 U-Net for learning denoising; transformers are also used.
    1:03:15? Training objective.
    1:05:00? A score-based model fixes the errors of a basic numerical SDE solver by running Langevin steps at each time step.
    1:05:40 Transitions are Gaussian; the marginals are not.
    1:05:50 DDPM is a particular discretization of the SDE.
    1:07:35 Converting the VAE (SDE) to a flow (ODE) with equivalent marginals; the ODE is a deterministic invertible mapping, so likelihoods can be computed exactly (change-of-variables formula). See the probability flow sketch after these notes.
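    A minimal PyTorch sketch of the closed-form forward jump from the 17:00 and 48:00 notes (my own illustrative schedule and names, not the lecture's): x_t can be drawn directly from x_0 without simulating the chain step by step.

        import torch

        T = 1000
        betas = torch.linspace(1e-4, 0.02, T)       # illustrative linear noise schedule
        alphas = 1.0 - betas
        alpha_bars = torch.cumprod(alphas, dim=0)   # bar{alpha}_t = product of alpha_s up to step t

        def q_sample(x0, t, eps=None):
            # Sample x_t ~ q(x_t | x_0) in a single jump (t is an integer step index here).
            if eps is None:
                eps = torch.randn_like(x0)
            a_bar = alpha_bars[t]
            return a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * eps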
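    A sketch of the training loop from the 41:50 / 51:30 notes, assuming a single amortized network eps_theta(x_t, t) (e.g. a U-Net): predicting the added noise is, up to a weighting factor, the denoising score matching objective.

        def ddpm_loss(eps_theta, x0):
            # Simplified DDPM objective: predict the noise that was mixed into x_t.
            t = torch.randint(0, T, (x0.shape[0],))                   # one random noise level per example
            eps = torch.randn_like(x0)
            a_bar = alpha_bars[t].view(-1, *([1] * (x0.dim() - 1)))   # reshape for broadcasting
            x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * eps      # forward jump, as above
            return ((eps_theta(x_t, t) - eps) ** 2).mean()            # score matching up to weighting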
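    A sketch of the reverse (decoder) sampling from the 31:00 / 41:40 notes: each transition is a Gaussian whose mean is a fixed function of the predicted noise, which is where the one-step-Langevin reading comes from. The variance choice sigma_t^2 = beta_t is one common option, assumed here.

        @torch.no_grad()
        def ddpm_sample(eps_theta, shape):
            # Ancestral sampling: start from pure noise, apply the learned Gaussian decoder T times.
            x = torch.randn(shape)
            for t in reversed(range(T)):
                eps_hat = eps_theta(x, torch.full((shape[0],), t))
                mean = (x - betas[t] / (1.0 - alpha_bars[t]).sqrt() * eps_hat) / alphas[t].sqrt()
                noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
                x = mean + betas[t].sqrt() * noise           # sigma_t^2 = beta_t (one common choice)
            return x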
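    For the 1:05:50 / 1:07:35 notes, one rough way to see the SDE-to-ODE conversion: replace the stochastic update with a deterministic probability flow step that shares the same marginals. This is a heuristic Euler discretization for intuition, not the exact solver discussed in the lecture.

        @torch.no_grad()
        def prob_flow_sample(eps_theta, shape):
            # Deterministic sampling: crude Euler steps on the probability flow ODE (no injected noise).
            x = torch.randn(shape)
            for t in reversed(range(T)):
                score = -eps_theta(x, torch.full((shape[0],), t)) / (1.0 - alpha_bars[t]).sqrt()
                x = x + 0.5 * betas[t] * (x + score)         # reverse-time Euler step
            return x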

  • @harshitmeena1625 · 25 days ago

    Please bring at least one new CS / AI course from Stanford every 1-2 weeks or months.

  • @harshitmeena1625 · 25 days ago

    @stanfordonline