Deep Learning with Differential Privacy (DP-SGD explained)

We're performing a technical deep dive into differential privacy: preventing models from memorising private data. Theory + Colab notebook using Tensorflow Privacy!
Social:
Twitter: / mukulrathi_
My website (+ blog): mukulrathi.com/
My email newsletter: newsletter.mukulrathi.com
-----------------
Links:
DP-SGD paper: arxiv.org/pdf/1607.00133.pdf
Tensorflow Privacy Tutorials: github.com/tensorflow/privacy...
Tensorflow Privacy: github.com/tensorflow/privacy
Pytorch Opacus: github.com/pytorch/opacus
Moments accountant implementation: github.com/marcotcr/tf-models...
GPT-2 memorises private data: ai.googleblog.com/2020/12/pri...
Netflix dataset deanonymised: www.wired.com/2007/12/why-ano...
Netflix deanonymisation paper: www.cs.utexas.edu/~shmat/shma...
Strava heatmap leaks: www.zdnet.com/article/strava-...
------------------------------------------------------------------------------------
Timestamps:
00:00 Introduction
00:32 Overview
01:26 Why Anonymisation Isn't Enough
02:38 Intuition for Differential Privacy
03:12 Example: Predict whether Bob has Cancer
04:11 Privacy Intuition
04:51 Privacy Loss Definition
05:26 Definition of Differential Privacy
06:40 Role of Noise in DP
07:08 Privacy Amplification Theorem
07:26 Fundamental Law of Information Recovery
07:51 Composition in DP
08:19 DP-SGD
09:20 Moments Accountant
12:55 Google Colab Notebook
14:39 Limitations of DP-SGD
---------------------------------------------------------------------------
Music: Coffee Break by Pyrosion is licensed under a Creative Commons License.
creativecommons.org/licenses/...
/ pyrosion
Support by RFM - NCM: bit.ly/2xGHypM

Comments: 50

  • @mingxiao9651 · 3 years ago

    Very clear. Thanks! A question: why can higher-order moments achieve a tighter bound?

  • @MukulRathi · 3 years ago

    Great question! I'll give a short answer and then a longer answer (using the notation in the paper) that'll help if you want to read the paper in more depth.

    The short answer: we compute a bound for each of the moments, and each moment gives you a different bound. We pick the tightest bound, which is not necessarily the one from the highest moment, so we try bounds from moments up to the 32nd to give ourselves a range to choose from.

    Okay, a deeper dive. First the wider context in case you're unfamiliar with moments: there is a whole field of concentration inequalities that I'd encourage you to check out (start with Markov's inequality and Chebyshev's inequality): en.wikipedia.org/wiki/Concentration_inequality From this field, the maths in the DP-SGD paper uses a specific technique, moment-generating functions, to get our bounds.

    In the paper, the moment generating function is written as α_M(λ). It takes in a number λ and returns the λth moment of the mechanism M's privacy loss distribution; e.g. if I pass in the number 1, it returns the 1st moment. Essentially the "body" of this function α_M(λ) is an expression for the λth moment of our privacy loss distribution. In the paper the authors say the value of α_M(λ) is computed through numerical integration, and they also give an upper bound for the case of a mixture of two Gaussians: α_M(λ) ≤ q²λ(λ + 1)/((1 − q)σ²) + O(q³/σ³). The details don't matter (they're messy and relegated to the appendix); the point is that it's a general expression involving λ.

    We take this expression for the λth moment of the privacy loss distribution, from which we can derive a bound on the tail of the distribution (the part where the privacy loss is > ε). In the paper this is the tail bound: δ = min_λ exp(α_M(λ) − λε).
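    To make that concrete, here's a rough Python sketch of the bookkeeping: it plugs the approximate per-step bound on α(λ) quoted above into the composition and tail-bound steps. The function names and parameter values are just for illustration, not the paper's code or TF Privacy's API.

```python
import numpy as np

def alpha_per_step(lmbda, q, sigma):
    # Approximate lambda-th log moment of one sampled-Gaussian DP-SGD step:
    # alpha(lambda) <= q^2 * lambda * (lambda + 1) / ((1 - q) * sigma^2),
    # ignoring the O(q^3 / sigma^3) remainder and the paper's validity conditions.
    return q ** 2 * lmbda * (lmbda + 1) / ((1 - q) * sigma ** 2)

def delta_for_epsilon(epsilon, q, sigma, T, max_lambda=32):
    # Composability: the log moments add up over the T training steps.
    # Tail bound: delta = min over lambda of exp(T * alpha(lambda) - lambda * epsilon).
    lambdas = np.arange(1, max_lambda + 1)
    log_deltas = T * alpha_per_step(lambdas, q, sigma) - lambdas * epsilon
    return np.exp(log_deltas.min())

# Toy numbers: sampling rate q = 0.01, noise multiplier sigma = 4, T = 10,000 steps.
print(delta_for_epsilon(epsilon=2.0, q=0.01, sigma=4.0, T=10_000))
```

    Note how the tightest bound comes from whichever λ minimises the expression, which is not necessarily λ = 32 itself.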

  • @mingxiao9651 · 3 years ago

    @MukulRathi I appreciate your detailed explanation. It's very clear. Thanks a lot! Looking forward to your video about Rényi DP, CDP, zCDP and their relations!

  • @zark4346 · 2 years ago

    @MukulRathi Very nice explanation. Just wondering if you missed a λ in the third equation, which I think should be ε.

  • @sarahsimionescu7926 · 2 years ago

    This video is exactly what I was looking for and I feel like you went into just the right amount of detail to provide a little bit of a deeper understanding. Thank you so much!

  • @yunathanielkang · 11 months ago

    Very clear explanation; thank you for saving my life!

  • @musicbuff81 · 2 years ago

    Thank you for a really clear explanation.

  • @xenowits · 5 months ago

    It was a great informative video, thanks man!

  • @hengxuantang2806 · 2 years ago

    Excellent, you cleared up most of my doubts. Looking forward to your further videos!

  • @MukulRathi · 2 years ago

    Awesome, thank you!

  • @shreejaltrivedi9731 · 2 years ago

    Hey Mukul, great video. There are very few resources where federated learning is explained in layman's terms. I've actually started researching this topic a bit lately. Maybe you could make a series on Federated Learning where you explain the papers, starting from an overview and moving on to specific papers such as FedAvg, q-FedAvg, etc. Looking forward to that initiative. Again, great explanation. Keep it up!

  • @ucsdvc · 3 years ago

    Very useful intro for someone new to this topic, thanks!

  • @MukulRathi · 3 years ago

    Glad you found it useful!

  • @-long- · 3 years ago

    Mukul Rathi, you have a great ability to explain the problem clearly. Keep it up! Thank you.

  • @MukulRathi · 3 years ago

    Thank you very much!

  • @user-fw9lc4qj9m · 2 months ago

    absolutely clear :D thanks a lot!!!

  • @loukas371 · 2 years ago

    Awesome video, very helpful to clear some things up, thank you :)

  • @MukulRathi · 2 years ago

    Glad it was helpful!

  • @user-bt7si4ov7s · 2 years ago

    Such an informative video!! And I really appreciate your video 👏

  • @MukulRathi · 2 years ago

    My pleasure!

  • @user-br9gc7sj8e · 3 months ago

    Thank you so much.

  • @egesarca9550 · 6 months ago

    Hey Mukul, great video and blog, thanks a lot! My senior design project is Differential Privacy in Federated Learning and I have found your content extremely useful. In your blog you mention that there will be a Part 2 about the moments accountant, yet I see that you have stopped creating content. Can you give me any advice or useful links on this topic? I study electronics engineering, so I'm a newbie in the field of AI. I'd appreciate any help!

  • @VatsalVenkatkrishna · 9 months ago

    Thanks for the video! Very well explained. I had a question though: doesn't one lose out on potentially critical information if the confidence of a given model is clipped? For example, if Bob would achieve a probability of 0.8 when his data is added to the training set, isn't it an unfair estimate to report a probability of 0.57? The seriousness with which one takes this prediction would be impacted.

  • @macknightxu2199 · 2 years ago

    At 10:28, why is the privacy loss that the moments accountant is calculating actually a random variable of its own?

  • @sonjaadomeit9600 · 1 year ago

    Hi Mukul, I'm a bit late to the game. Thanks for your video! In your example of Bob having cancer, I think it's confusing how you used log(0.8/0.5) in the formula for privacy loss, because in Dwork and Roth the formula looks at the two distributions for a given outcome (e.g., 0.55) and does not compare two different outcomes. Maybe you could comment on this?

  • @michaeljang299 · 2 years ago

    Very informative video. I study ML fairness for my Ph.D. and am new to the privacy literature. It helped me a lot. Could you make a video about the Facebook team that you are involved in?

  • @MukulRathi · 2 years ago

    Thanks! Yes I will do, I'm currently editing a video on what life has been like as a new software engineer at Facebook :)

  • @eduardojreis · 1 year ago

    3:16 - If I have the ability to provide Bob's data to train the model, wouldn't that mean I already need the information I want to leak, prior to leaking it?

  • @user-tg6ot1mb2i · 3 years ago

    Good ~

  • @MukulRathi · 3 years ago

    Thanks!

  • @aninditasaha0307 · 1 year ago

    Can someone please explain to me the delta added to the equation?

  • @1UniverseGames · 2 years ago

    Nice video. Would it be possible for you to share a coding implementation of this kind of privacy for federated learning, for beginners? Thank you.

  • @MukulRathi · 2 years ago

    linked in the description!

  • @810602jay · 2 years ago

    What does the big O( ) mean at 10:14? In the epsilon term of the (ε, δ)-DP formula.

  • @MukulRathi · 2 years ago

    Big O() is asymptotic notation: we're measuring how much the overall privacy loss changes as a function of q, ε and T, and big O notation lets us disregard constants. en.wikipedia.org/wiki/Asymptotic_computational_complexity

  • @810602jay · 2 years ago

    @MukulRathi Thanks very much, Mukul! 🥰 But I'm still confused about the details of computing each ε_i at each step of the gradient update. How do we use the "Tail Bound" formula of the original paper (Theorem 2) to compute each ε_i during the training process?

  • @MukulRathi · 2 years ago

    @810602jay The authors use numerical integration to compute α(λ). Then, since δ is fixed, we can rearrange the Tail Bound formula to get the value of ε in terms of α(λ) and δ.
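    A minimal sketch of that rearrangement in Python (using the same approximate α(λ) bound quoted in the earlier comment; purely illustrative, not the paper's code): with δ fixed, ε = (T·α(λ) + log(1/δ)) / λ, minimised over λ.

```python
import numpy as np

def epsilon_for_delta(delta, q, sigma, T, max_lambda=32):
    # Approximate per-step log moment from the paper, composed over T steps:
    # T * alpha(lambda), with alpha(lambda) <= q^2 * lambda * (lambda + 1) / ((1 - q) * sigma^2).
    lambdas = np.arange(1, max_lambda + 1)
    alpha_total = T * q ** 2 * lambdas * (lambdas + 1) / ((1 - q) * sigma ** 2)
    # Rearranged tail bound with delta fixed:
    # epsilon = (alpha_total(lambda) + log(1 / delta)) / lambda, minimised over lambda.
    return ((alpha_total + np.log(1.0 / delta)) / lambdas).min()

# Toy numbers: q = 0.01, sigma = 4, T = 10,000 steps, delta = 1e-5.
print(epsilon_for_delta(delta=1e-5, q=0.01, sigma=4.0, T=10_000))
```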

  • @810602jay · 2 years ago

    @MukulRathi Thanks again! 🥰 I tried to rearrange the Tail Bound formula: i.imgur.com/Xnfb67h.png But I find that the training step T doesn't appear in either the Tail Bound or the asymptotic bound α(λ) formula. Does that mean I can obtain the total privacy loss ε using only the q, λ, σ and δ parameters, without T? I think I'm missing some concept, because the experimental results show the privacy loss ε increasing with the number of epochs (related to the training step T). Can you give me a brief example of how to use these two formulas (the Tail Bound and the asymptotic bound α(λ)) to obtain the privacy loss ε when the training step is T? Thank you in advance! 🥰 🥰 🥰 You've really helped me learn a lot of concepts.

  • @MukulRathi · 2 years ago

    @810602jay Regarding rearranging the tail bound formula: when we minimise the tail bound with respect to lambda, we try values up to 32 and pick the lambda that gives the smallest value of the expression. This is not necessarily when lambda = 32, so you can't just substitute 32 into the expression. I re-read the paper and the proofs contained in the appendix are pretty thorough. I'll try to fill in the gaps as to where the bounds in the proof presented in the appendix come from. The proof for Theorem 1 should fall out by convincing yourself first that α(λ) ≤ q²λ(λ + 1)/((1 − q)σ²) + O(q³/σ³) holds for a single step, then summing this over the T steps via composability (which is where T enters) and applying the tail bound.

  • @detpekandefingret · 3 months ago

    06:13 "adding noise to the data", while noisy smoke detector simultaneously chirping in the background. Seriously, don't die in a fire, please replace those batteries, for your and everyone else's safety.

  • @viralgupta5630 · 2 years ago

    Can you clearly define privacy loss?

  • @MukulRathi · 2 years ago

    It's defined at 4:26, no?

  • @viralgupta5630 · 2 years ago

    @MukulRathi Thanks for pointing that out. I was confused about the privacy loss when looking at 8:45; how it is computed isn't obvious. At 4:26 it talks about probabilities with D and D', but at 8:45 we are learning just a single theta.

  • @MukulRathi · 2 years ago

    @viralgupta5630 Hey, so the process of learning theta leaks information about the data we are training on. However, if we add Gaussian noise and clip the gradients (controlled by C and sigma), then we can bound the privacy loss for each gradient descent step. We compute this using the moments accountant. As discussed from 10:24 onwards, the intuition behind the moments accountant technique is that we treat the privacy loss (defined at 4:26) as a random variable and then compute bounds on its distribution; the plot of the distribution at 10:50 gives you an intuition for what epsilon and delta refer to. I'd suggest looking at the paper (plus its appendix) to see the specific bounds and proofs. I deliberately gave a high-level overview as this is an introductory video; however, if you look at the replies to the other comments, I've gone into the details in response to specific questions about the proofs. I think HengJay's question was similar to yours, so perhaps that response also answers your question.
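    For completeness, here's a minimal NumPy sketch of the per-step mechanism being described (per-example clipping to L2 norm C, then Gaussian noise with standard deviation sigma·C on the summed gradient). It's illustrative pseudocode rather than the Colab / TF Privacy implementation, and the names and values are made up.

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, lr, C, sigma, rng):
    # Clip each example's gradient to L2 norm at most C.
    clipped = [g / max(1.0, np.linalg.norm(g) / C) for g in per_example_grads]
    # Sum the clipped gradients and add Gaussian noise N(0, (sigma * C)^2) per coordinate.
    noisy_sum = np.sum(clipped, axis=0) + rng.normal(0.0, sigma * C, size=params.shape)
    # Average over the batch and take an ordinary SGD step.
    return params - lr * noisy_sum / len(per_example_grads)

# Toy usage with made-up gradients for a 3-parameter model.
rng = np.random.default_rng(0)
params = np.zeros(3)
grads = [rng.normal(size=3) for _ in range(8)]  # one gradient per example in the batch
params = dp_sgd_step(params, grads, lr=0.1, C=1.0, sigma=1.1, rng=rng)
print(params)
```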

  • @macknightxu2199 · 2 years ago

    Hi, at 5:05, what's the meaning of higher std. dev?