Stanford CS236: Deep Generative Models | 2023 | Lecture 14 - Energy Based Models

For more information about Stanford's Artificial Intelligence programs visit: stanford.io/ai
To follow along with the course, visit the course website:
deepgenerativemodels.github.io/
Stefano Ermon
Associate Professor of Computer Science, Stanford University
cs.stanford.edu/~ermon/
Learn more about the online course and how to enroll: online.stanford.edu/courses/c...
To view all online courses and programs offered by Stanford, visit: online.stanford.edu/

Comments: 1

  • @CPTSMONSTER, 14 days ago

    7:00 Sliced score matching is slower than denoising score matching because it needs extra derivatives of the score network (a sketch of this cost is below)
    13:45 Denoising score matching wants a small sigma so the perturbed density stays close to the data, but the smallest sigma is not optimal for perturbing data when sampling
    27:15 Annealed Langevin dynamics, on the order of 1000 sigmas (sampler sketch below)
    38:50 The Fokker-Planck PDE couples the scores across noise levels; enforcing that interdependence is intractable, so the loss functions (scores) are treated as independent
    45:00? Weighted combination of denoising score matching losses: estimate the score of the data perturbed by each sigma_i, then take a weighted combination of the per-level losses (loss sketch below)
    48:15 As efficient as estimating a single unconditional score network; joint estimation of the scores is amortized by one noise-conditional network
    49:50? Smallest to largest noise during training, largest to smallest noise during inference (Langevin)
    52:10? Notation: p_sigma_i is equivalent to the earlier q (the noise-perturbed data density)
    57:20 Mixture denoising score matching is expensive at inference time (many Langevin steps), but the deep computation graph does not have to be unrolled at training time, since no samples are generated during training
    1:07:00 An SDE describes the perturbation iterations in continuous time
    1:08:50 Inference (largest to smallest noise) is described by a reverse SDE that depends only on the score functions of the noise-perturbed data densities
    1:12:00 Euler-Maruyama discretizes time to solve the SDE numerically (sampler sketch below)
    1:13:25 Numerically integrating the reverse SDE goes from noise to data
    1:15:00? SDE predictor combined with a Langevin corrector (predictor-corrector sketch below)
    1:20:25 Infinitely deep computation graph (refer to 57:20)
    1:21:45 It is possible to convert the SDE model to a normalizing flow and get latent variables
    1:22:00 The SDE can be described by an ODE with the same marginals (probability-flow sketch below)
    1:23:15 This machinery defines a continuous-time normalizing flow whose invertible mapping is given by solving an ODE; ODE solution paths with different initial conditions can never cross (hence invertible, a normalizing flow); the flow is trained not by maximum likelihood but by score matching; it is a flow of infinite depth, and likelihoods can be obtained
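On the 7:00 note, a minimal PyTorch sketch of why sliced score matching pays for derivatives: each random projection needs an extra backward pass through the score network, which denoising score matching avoids. The variance-reduced form of the objective and the analytic stand-in for the network are assumptions of this sketch.

```python
import torch

def sliced_score_matching_loss(score_net, x, n_projections=1):
    # Sliced score matching replaces the intractable Jacobian trace
    # tr(grad_x s(x)) with random projections v^T (grad_x s(x)) v. Each
    # projection needs an extra backward pass through the network -- the
    # derivative cost that denoising score matching avoids entirely.
    x = x.requires_grad_(True)
    s = score_net(x)
    loss = 0.5 * (s ** 2).sum(dim=1)              # variance-reduced form
    for _ in range(n_projections):
        v = torch.randn_like(x)
        sv = (s * v).sum()                        # sum of v^T s over the batch
        jv = torch.autograd.grad(sv, x, create_graph=True)[0]  # v^T grad_x s
        loss = loss + (jv * v).sum(dim=1) / n_projections
    return loss.mean()

# Toy usage: the true score of N(0, I) as a stand-in network.
print(sliced_score_matching_loss(lambda x: -x, torch.randn(8, 2)))
```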
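For the annealed Langevin procedure at 27:15 and 49:50, a minimal NumPy sketch, assuming an analytic Gaussian score in place of a trained network; the 10-level geometric noise schedule and the step-size rule alpha_i = eps * (sigma_i / sigma_L)^2 are illustrative choices, not the lecture's exact settings.

```python
import numpy as np

def perturbed_score(x, sigma):
    # Analytic stand-in for a trained score network s_theta(x, sigma):
    # for data ~ N(0, I), the sigma-perturbed density is N(0, (1 + sigma^2) I),
    # whose score is -x / (1 + sigma^2).
    return -x / (1.0 + sigma**2)

def annealed_langevin(score_fn, sigmas, n_steps=100, eps=2e-5, dim=2, rng=None):
    # Run Langevin MCMC at each noise level, largest sigma first, warm-starting
    # each level from the previous level's sample.
    rng = np.random.default_rng(0) if rng is None else rng
    x = rng.normal(scale=sigmas[0], size=dim)        # start from broad noise
    for sigma in sigmas:                             # sigmas sorted large -> small
        alpha = eps * (sigma / sigmas[-1]) ** 2      # per-level step size
        for _ in range(n_steps):
            z = rng.normal(size=dim)
            x = x + alpha * score_fn(x, sigma) + np.sqrt(2.0 * alpha) * z
    return x

sigmas = np.geomspace(10.0, 0.01, num=10)  # the lecture mentions ~1000 levels
print(annealed_langevin(perturbed_score, sigmas))
```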
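For the weighted combination of denoising score matching losses at 45:00 and its amortization by a single conditional network at 48:15, a hedged PyTorch sketch; `score_net(x_tilde, sigma)` is a hypothetical interface, and the lambda(sigma) = sigma^2 weighting is one common choice rather than the only possibility.

```python
import torch

def ncsn_dsm_loss(score_net, x, sigmas):
    # One conditional network amortizes all noise levels. The score of
    # q_sigma(x_tilde | x) = N(x, sigma^2 I) is -(x_tilde - x) / sigma^2, and
    # weighting each level by lambda(sigma) = sigma^2 reduces the per-level
    # denoising score matching loss to || sigma * score + z ||^2, z ~ N(0, I).
    idx = torch.randint(len(sigmas), (x.shape[0],))       # random level per example
    sigma = sigmas[idx].view(-1, *([1] * (x.dim() - 1)))  # broadcast over data dims
    z = torch.randn_like(x)
    x_tilde = x + sigma * z                               # perturbed data
    score = score_net(x_tilde, sigma)
    return ((sigma * score + z) ** 2).sum(dim=tuple(range(1, x.dim()))).mean()

# Toy usage with an analytic stand-in for s_theta (data ~ N(0, I)):
net = lambda x_t, s: -x_t / (1.0 + s**2)
print(ncsn_dsm_loss(net, torch.randn(8, 2), torch.tensor([0.01, 0.1, 1.0, 10.0])))
```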
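For the reverse SDE and its Euler-Maruyama discretization at 1:08:50 through 1:13:25, a self-contained NumPy sketch on a toy Ornstein-Uhlenbeck forward SDE, where the perturbed marginals stay Gaussian and the true score is available in closed form; the particular SDE and all constants are assumptions made so the sketch runs without a trained model.

```python
import numpy as np

# Toy forward SDE (Ornstein-Uhlenbeck): dx = -x dt + sqrt(2) dw, so f(x,t) = -x
# and g(t)^2 = 2. For data x0 ~ N(m, s0^2) the perturbed marginal p_t stays
# Gaussian, so the true score stands in for a learned score network.
m, s0, T = 3.0, 0.5, 5.0

def score(x, t):
    mean = m * np.exp(-t)
    var = s0**2 * np.exp(-2.0 * t) + 1.0 - np.exp(-2.0 * t)
    return -(x - mean) / var

def reverse_sde_sample(n=10_000, n_steps=1000, rng=None):
    # Euler-Maruyama for the reverse SDE
    #   dx = [f(x,t) - g(t)^2 * score(x,t)] dt + g(t) dw_bar,
    # integrated from t = T (noise) back to t = 0 (data).
    rng = np.random.default_rng(0) if rng is None else rng
    dt = T / n_steps
    x = rng.normal(size=n)                    # p_T is close to N(0, 1) here
    for i in range(n_steps):
        t = T - i * dt
        drift = -x - 2.0 * score(x, t)        # f - g^2 * score
        x = x - drift * dt + np.sqrt(2.0 * dt) * rng.normal(size=n)
    return x

samples = reverse_sde_sample()
print(samples.mean(), samples.std())          # should approach m = 3.0, s0 = 0.5
```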
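For the Langevin corrector at 1:15:00, the same toy example extended to a predictor-corrector sampler (it reuses `score` and `T` from the reverse-SDE sketch above); the signal-to-noise step-size heuristic for the corrector is one published choice, not necessarily the lecture's.

```python
def pc_sample(n=10_000, n_steps=500, n_corr=1, snr=0.1, rng=None):
    # Predictor-corrector sampling on the same toy SDE: each reverse-SDE
    # Euler-Maruyama (predictor) step is followed by a few Langevin MCMC
    # (corrector) steps that target p_t at the current noise level.
    rng = np.random.default_rng(1) if rng is None else rng
    dt = T / n_steps
    x = rng.normal(size=n)
    for i in range(n_steps):
        t = T - i * dt
        drift = -x - 2.0 * score(x, t)              # predictor step
        x = x - drift * dt + np.sqrt(2.0 * dt) * rng.normal(size=n)
        t_new = max(t - dt, 0.0)
        for _ in range(n_corr):                     # corrector steps
            z = rng.normal(size=n)
            g = score(x, t_new)
            # step size from a signal-to-noise heuristic (an assumed choice)
            eps = 2.0 * (snr * np.linalg.norm(z) / np.linalg.norm(g)) ** 2
            x = x + eps * g + np.sqrt(2.0 * eps) * z
    return x

pc_samples = pc_sample()
print(pc_samples.mean(), pc_samples.std())
```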
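For the ODE with the same marginals at 1:22:00 and the continuous-time normalizing flow view at 1:23:15, a deterministic probability-flow sketch continuing the same toy (it reuses `score`, `m`, `s0`, and `T` from the reverse-SDE sketch above); plain Euler is used here in place of an adaptive ODE solver.

```python
def probability_flow_sample(n=10_000, n_steps=1000, rng=None):
    # Deterministic probability-flow ODE with the same marginals as the SDE:
    #   dx/dt = f(x,t) - 0.5 * g(t)^2 * score(x,t),
    # solved with plain Euler from t = T down to t = 0. The map from x_T to
    # x_0 is invertible (ODE paths never cross), i.e. a continuous-time
    # normalizing flow, so likelihoods can be obtained.
    rng = np.random.default_rng(2) if rng is None else rng
    dt = T / n_steps
    x = rng.normal(size=n)
    for i in range(n_steps):
        t = T - i * dt
        x = x - (-x - score(x, t)) * dt   # f = -x, 0.5 * g^2 = 1
    return x

flow_samples = probability_flow_sample()
print(flow_samples.mean(), flow_samples.std())   # again approaches N(3.0, 0.5^2)
```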
