An intuitive introduction to Propensity Score Matching

Propensity score matching is a common technique used to estimate the effects of a treatment or program when you don't have a randomized controlled experiment. In particular, it's used when you have observational data that includes pre-program characteristics that determine whether or not each individual received the treatment.
In this video, I work through a simple example of how it works and give you the basic intuition for the method. I also talk briefly about how to assess how well the method works, and discuss the method's advantages and disadvantages relative to multiple regression.
Intended audience: Folks who have had some exposure to linear regression models, but want to learn more statistical methods.
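The workflow described above (estimate each unit's probability of treatment, pair every treated unit with the control whose score is closest, then compare average outcomes) can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the video: the scores and outcomes are made-up numbers, `match_nearest` and `att` are hypothetical helper names, and it assumes the propensity scores have already been estimated and that matching is one-to-one with replacement.

```python
def match_nearest(treated_ps, control_ps):
    """For each treated unit, return the index of the control unit whose
    propensity score is closest (controls may be reused)."""
    return [
        min(range(len(control_ps)), key=lambda j: abs(control_ps[j] - p))
        for p in treated_ps
    ]


def att(treated_y, control_y, matches):
    """Average treatment effect on the treated: mean treated outcome
    minus mean outcome of the matched controls."""
    matched_y = [control_y[j] for j in matches]
    return sum(treated_y) / len(treated_y) - sum(matched_y) / len(matched_y)


# Made-up illustration data (not the values from the video).
treated_ps = [0.80, 0.60, 0.55, 0.70]   # propensity scores, treated units
control_ps = [0.10, 0.50, 0.75, 0.20]   # propensity scores, control units
treated_y = [10, 15, 22, 17]            # outcomes, treated units
control_y = [30, 25, 19, 28]            # outcomes, control units

matches = match_nearest(treated_ps, control_ps)
print(matches)                           # [2, 1, 1, 2]
print(att(treated_y, control_y, matches))  # -6.0
```

Note that two of the four controls are never used and the other two are each used twice, which is the usual behavior of matching with replacement when treated and control units overlap only partly.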

Comments: 119

@christopherzimmer
2 years ago
Among the dozens of PSM videos, this stands out as simply the best. The central example, shown clearly with the intuitive elements highlighted, and the discussion at the end regarding what PSM does *not* do, are crucial and critical! One suggestion: insert a slide showing the logit regression model to really highlight where the probabilities are coming from.
@namgaydorji3344
4 years ago
Extremely helpful to someone who is just beginning to learn the PSM approach. Thank you very much.
@DermDrNik
4 years ago
This is excellent, refreshing to see a tutorial where you can tell someone knows what they're doing
@moqaraza
9 years ago
Extremely helpful, especially with the simple, minimalistic data example. Thank you.
@gelodude07
4 years ago
This is so much better than most books!
@katyasotiris9667
2 years ago
Intuitive indeed! Love the simplicity and clarity in your explanation, thank you!
@myyoutubechannel2858
3 years ago
Thank you --- wonderful video. When I read "intuitive", I was skeptical. But you truly made it intuitive.
@svalbard01
9 years ago
This was really helpful and intuitive. Thank you!
@Dave48797
3 years ago
Loved the video. Best explanation of propensity score matching I've come across thus far.
@andeslam7370
2 years ago
I don't know what to say, but your teaching is way better than my professor's.
@zhulin2531
5 years ago
Well done. It's very clear, and I like that you explained the advantages and disadvantages of propensity score matching. Very useful for interviews.
@dougmckee673
9 years ago
Thanks so much for the positive feedback!
@anupamghosh6578
5 years ago
Excellently presented intuitive explanation of p-score matching! Thank you
@daniloamfreire
9 years ago
Very easy to understand. Thanks a lot!
@szai6068
1 year ago
Okay the second time watching this I finally understood. Thank you!
@montanabuntragulpoontawee4065
4 years ago
So easy to understand. As a clinician, I have a hard time studying statistics. Really appreciate your work. Thank you so much. Please do more videos like this! P.S. I still have a hard time figuring out inverse probability weighting following propensity score use.
@32deepan
5 years ago
Thanks for the excellent video, Doug. Very informative and intuitive.
@chaiwuty
9 years ago
Thank you very much. This made me understand a lot more, and I'm looking forward to your videos on propensity scores. I use them in medical research.
@olajumokeolateju1104
2 years ago
Your example made it easy to understand. Thanks so much.
@popo-je8ze
2 years ago
great explanation
@triong
5 years ago
Just beautiful! Thanks a lot, Doug.
@siyuhou1957
4 years ago
I don't quite understand the reasoning behind why we can use people's characteristics to predict whether a person is assigned to the treatment group or not. Why are we assuming that the assignment is based on the characteristics, and hence build a logistic regression to predict the assignment using these characteristics, then use the probability as a measure of 'similarity'? I am sure it's right, just don't understand why...
@fksons4161
2 years ago
Thank you for this explanation
@Haz2288
7 years ago
Huge thanks for this, Doug!
@richardmuhindo1439
6 years ago
Indeed, I needed this at this point in my PhD studies.
@sangheepark07
7 years ago
This is an amazing explanation! Thank you!
@projectkfw8201
4 years ago
Thank you very much, sir. After watching many videos, I finally understood it from yours.
@yumik4990
4 years ago
I love how your examples are small. There are pros and cons to propensity score matching versus multivariate regression. But if one believes that the propensity score can be used to explain causal effects, a multivariate regression model is just as able to explain causal effects, since both eliminate confounding factors.
@Potencyfunction
4 months ago
😃 What an interesting score.
@FlywithZahanat
2 years ago
very clear Dear
@roraaa11
1 year ago
Great explanation!
@paolo4401
11 months ago
My problem is: how do I interpret the new dataset generated after PSM? How do I create a table showing percentages of each categorical covariate I've chosen for matching?
@RightAIopen
1 month ago
Really good
@ceciliapisoni
6 years ago
The video is excellent. Thank you, very clear and helpful.
@Jhonnydonny
6 years ago
This is an amazing explanation. Thanks.
@ripples1984
2 years ago
quite intuitive and helpful, thanks!
@Has_1990
5 years ago
Thank you Doug! This was very helpful
@indikamallawaarachchi7188
6 years ago
Very good explanation. Thank you!!!
@jacksheng7650
1 year ago
God, this is so good!
@melodydaccache4189
2 years ago
This is excellent
@masudparvez9133
2 years ago
It's really helpful, but can you please tell me how you calculated ps1? How can I do it in Stata?
@sumitmandal3901
2 years ago
amazingly explained! Thanks
@Lake_mondota
4 years ago
Very clear! Great example! Thanks.
@toobaahmedalvi7008
1 year ago
How did you summarize the result as lowering the infant mortality rate by 7 deaths per 1000? Was 1000 your sample population among treated and non-treated infants?
@maxi01v
3 years ago
better than my textbook!
@eviirawan48
7 years ago
Very clear explanation
@anmolpardeshi3138
1 year ago
Why are you considering weights when calculating the effect size, e.g. 0.25*() - 0.25*()? Where did this 0.25 come from, and why?
@ericlau6435
5 years ago
Great work
@douglasespindola5185
5 years ago
Man, I LOVE YOU! Hahaha! Greetings from Brazil! Nice job!
@linpershey
2 years ago
Brilliant! Learned a lot from it!
@timte5924
2 years ago
Excellent video, thank you very much! Can you maybe quickly explain how you calculated and displayed PS1 in Stata? I understand how to run the regression but I struggle to find the PS1 outputs per line, so I can actually match one line to another
@hangsu5294
5 years ago
Really really helpful, you saved my ass! THANKS!!!!! You earned yourself a subscriber!
@pricillajeyapaul
3 months ago
Thanks a lot bro 🎉
@f2harrell
3 years ago
It doesn't follow that a large number of control observations are irrelevant if the treatment is very imbalanced. Matching methods tend to discard very applicable controls just because they came later in the dataset. The resulting loss of sample size makes matching inefficient.
@TheAkshaykher
4 years ago
Awesome Video!
@kellermartinezsolis5926
2 years ago
Thanks for the video! It is very clear, just a quick question: how did you compute in Stata the column "ps1"?
@250IZ
3 years ago
This was well simplified
@paigetao6758
3 years ago
Very helpful. Thank you so much
@chris6925
1 year ago
Awesome!
@johnnychiu9715
5 years ago
This is great! Thank you!
@paulinavazquezquintana5662
1 year ago
Which program do you use for this analysis? Are there code packages that can be used once the data are uploaded? Thanks!
@SNSDjennifer
7 years ago
Dear Doug, thank you for making this great and easy-to-understand video. However, I have a small question regarding the computation of the predicted probability of treatment: could you show me the calculation of one ps1 value in the example? Thank you :)
@powermod6772
2 years ago
Logistic regression models the Posterior P(T|X) as a Bernoulli. So for some x value, the logistic regression model returns a probability p for T=1, i.e. p = P(T=1|X=x). This is the propensity score. Note that in classification p is the predicted probability for T being 1. To make a class label (for which purpose logistic regression is most often used) you simply predict class 1 if p > 0.5. But this class label prediction step is omitted here.
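The reply above can be made concrete with a small sketch: fit a logistic regression of the treatment indicator on the covariates, then read off p = P(T=1|X=x) for each unit as its propensity score. The code below is a plain-Python illustration under stated assumptions, not the Stata routine used in the video: the one-covariate data set is made up, `fit_logit` and `propensity` are hypothetical names, and the fit uses simple gradient ascent on the log-likelihood rather than the Newton-type solver a statistics package would use.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logit(X, t, lr=0.5, steps=5000):
    """Fit logistic regression coefficients (intercept first) by
    gradient ascent on the average log-likelihood."""
    n, k = len(X), len(X[0])
    beta = [0.0] * (k + 1)
    for _ in range(steps):
        grad = [0.0] * (k + 1)
        for xi, ti in zip(X, t):
            p = sigmoid(beta[0] + sum(b * x for b, x in zip(beta[1:], xi)))
            err = ti - p                      # score contribution of this unit
            grad[0] += err
            for j, x in enumerate(xi):
                grad[j + 1] += err * x
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def propensity(beta, xi):
    """Predicted probability of treatment, P(T=1 | X=xi)."""
    return sigmoid(beta[0] + sum(b * x for b, x in zip(beta[1:], xi)))


# Made-up data: one covariate, treatment more likely at higher values.
X = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6], [0.7], [0.8]]
t = [0, 0, 1, 0, 1, 0, 1, 1]

beta = fit_logit(X, t)
scores = [propensity(beta, xi) for xi in X]   # one propensity score per unit
```

As the reply notes, the usual classification step (predict class 1 when p > 0.5) is skipped; the probabilities themselves are what get matched on.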
@mayastoyanovawarner7997
4 years ago
Yes! Thank you! I had so many aha moments watching this!
@bijaya7764
6 years ago
Thanks for the teaching... Do you also have a video that shows how you calculated the individual ps1 values? Thanks.
@garbour456
6 years ago
Great video, thanks for doing this
@hm.91
3 years ago
Great video! Thanks a lot!
@user-xh4lp5ts8g
9 years ago
This is great:D Thanks!
@yulinliu850
2 years ago
Thanks!
@valeriablanco03
3 years ago
Hi! Here you calculate ATT = -7, how do you obtain ATE in this simple example?
@AbhishekSharma-mt8yz
5 years ago
This is very helpful. What happens if the balancing property is not satisfied?
@bharathkumar32
4 years ago
Hello Doug, I learned a great deal from your video. I have one challenge in application: my treatment observations outnumber my control observations. In this case, how does the matching work? What challenges would this data set generally have?
@ziceru8381
4 years ago
Could you tell me how you preprocessed your data? My logit regression result is different from yours.
@spencerfrank8837
4 years ago
Really helpful. Thanks!
@gelodude07
4 years ago
Good. However, in the logistic regression, why wasn't the predictive accuracy of the model factored in? One could use the confusion matrix and sensitivity.
@user-rv3ic2dz9x
3 years ago
informative
@kayjang4901
5 years ago
Thank you so much for your great presentation. It is really intuitive. I have seen an article that used a multiple regression with matched samples instead of using one approach alone. What do you think of that? Could you advise me?
@NZegg
7 years ago
Dear Doug, thank you for this very helpful video. I have a question regarding the selection of covariates when using teffects in Stata. The dataset I'm using contains 2.8 million observations, and I want to estimate the causal effect of Brazil's Bolsa Família program (similar to Mexico's Oportunidades, on which you've also uploaded a video) on educational outcomes. I'm not sure which variables I should match the treatment and control groups on. Could you please give any suggestions on how one should choose the right variables for matching? Thank you in advance =)
@MrCuongnguyendang
5 years ago
Thank you for this video, it is very helpful. I need to use propensity score matching, and my dependent variable is a dummy. Could you give me a suggestion for evaluating the difference between the control and treatment groups? Thank you so much.
@fernandojackson7207
7 years ago
Thanks, nice presentation, Prof. Please check if my understanding is correct. I just saw a claim that school X has a graduation rate higher than all other schools with students from similar socioeconomic backgrounds. Would PSM work to make sure that the student groups being compared on graduation have similar social backgrounds?
@3foss191
7 years ago
Is it lowering the infant mortality by 7...? Sorry, I'm not quite catching the pronunciation. Thanks.
@dharman.bhatta7042
8 years ago
Dear Doug, your videos are very informative and easy to follow, could you please provide the PSM Stata commands for RCT study designs. Your first video related to DiD is very easy to follow with stata commands. Thank you
@dougmckee673
8 years ago
+Dharma N. Bhatia Glad you like the video! If your RCT is truly randomized, you shouldn't need to do any adjustment using matching--Just use a simple t-test to compare means of continuous variables in your treatment group to your control group.
@dharman.bhatta7042
8 years ago
+Doug McKee , Thank you for your response, yes true, just I wanted to cross check the DiD (impact) with matching or without matching. Thank you.
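The comparison of means suggested in this exchange can be sketched as follows. This is an assumption-laden illustration rather than what any particular package does: it computes Welch's unequal-variance t statistic (one common variant of the "simple t-test" mentioned above), `welch_t` is a hypothetical helper name, and the outcome values are made up.

```python
import math
from statistics import mean, variance


def welch_t(treated, control):
    """Welch's t statistic for the difference in group means
    (uses sample variances, does not assume equal variances)."""
    se = math.sqrt(variance(treated) / len(treated)
                   + variance(control) / len(control))
    return (mean(treated) - mean(control)) / se


# Made-up outcomes for a randomized treatment and control group.
treated_outcomes = [10, 12, 14]
control_outcomes = [20, 22, 24]

t_stat = welch_t(treated_outcomes, control_outcomes)
```

A large negative statistic here would indicate a lower mean outcome in the treatment group; turning it into a p-value additionally requires the t distribution with Welch's degrees of freedom, which is left out of this sketch.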
@danielmillian2024
3 years ago
Where did you get the 0.25 from?
@wgeorge1602
5 years ago
really good
@nikolov901
8 years ago
I'm trying to learn more about matching and stumbled upon your video. It seems that you frame the question as regression vs. matching, while other articles I read (including Wikipedia) seem to use matching as a preprocessing step in a regression. What's up with this discrepancy?
@dougmckee673
8 years ago
Both are correct. Classic propensity score matching (what I describe here) is an alternative to regression--You use the covariates to identify close matches between observations of treatment and control. More recently it's become popular to combine regression and propensity scores. That is, you can use the inverse of the propensity score for each observation as a weight in a regression analysis.
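The weighting idea in this reply can be sketched in Python. The reply only says that the inverse of the propensity score is used as a weight; the specific convention below (treated units weighted by 1/p, controls by 1/(1-p), with weights normalized within each group) is one standard choice for estimating the average treatment effect and is an assumption here, as are the hypothetical name `ipw_ate` and the made-up numbers.

```python
def ipw_ate(y, t, ps):
    """Inverse-probability-weighted difference in mean outcomes.

    y  : outcomes
    t  : treatment indicators (1 = treated, 0 = control)
    ps : estimated propensity scores P(T=1 | X)
    """
    num_t = den_t = num_c = den_c = 0.0
    for yi, ti, pi in zip(y, t, ps):
        if ti == 1:
            w = 1.0 / pi            # treated units weighted by 1/p
            num_t += w * yi
            den_t += w
        else:
            w = 1.0 / (1.0 - pi)    # controls weighted by 1/(1-p)
            num_c += w * yi
            den_c += w
    return num_t / den_t - num_c / den_c


# Made-up illustration data.
y = [10, 12, 20, 22]
t = [1, 1, 0, 0]

# With identical scores the weights cancel and this reduces to the
# plain difference in group means (11 - 21).
print(ipw_ate(y, t, [0.5, 0.5, 0.5, 0.5]))   # -10.0

# With unequal scores, units that look unusual for their group count more.
effect = ipw_ate(y, t, [0.8, 0.5, 0.5, 0.2])
```

Unlike matching, every observation contributes to the estimate, which is one reason this approach is often combined with regression.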
@cocoagardenia
6 years ago
So helpful!
@artwork2179
8 years ago
What are the 0.25 and -0.25 written in the blue equation on slide 13? Thanks for the video. It's insightful.
@dougmckee673
8 years ago
+Soumya Upadhyay I'm computing the average in the treatment group by just adding the four outcomes together and dividing by 4 (aka multiplying by 0.25) and then doing the same thing for the matched control group. Hope this clears things up!
@kareemmohammed7862
3 years ago
At 10:12, where the matches were 6 and 5, the formula has -0.25*(19+25+25+25). It should have been -0.25*(25+19+19+19).
@graysonbuning500
3 years ago
No, observation 5 was matched three times, and thus we use observation 5's outcome of 25 three times.
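The arithmetic discussed in this exchange can be written out as follows. Only the matched-control values (19 once, 25 three times) are quoted in the thread; the four treated outcomes are not, so they are left symbolic here as $Y_i^{\text{treated}}$, a notation introduced for illustration.

```latex
\widehat{\mathrm{ATT}}
  = \frac{1}{4}\sum_{i=1}^{4} Y_i^{\text{treated}}
    \;-\; \frac{1}{4}\,(19 + 25 + 25 + 25)
  = \bar{Y}^{\text{treated}} - 23.5
```

The factor 0.25 is simply 1/4, because there are four treated units and therefore four matched controls. A control that is matched to three treated units enters the second sum three times.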
@douglasmangini8744
5 years ago
helped a lot, thank you!
@zeinebouni8764
8 years ago
Thank you for this video, it is very helpful. I need to use propensity score matching, and my dependent variable is ordinal. I am using Stata 14. I just want to know if there is a specification for ordinal outcomes. In Stata 14 we have the choice between continuous outcomes, binary outcomes, count outcomes, fractional outcomes, nonnegative outcomes, and survival outcomes, but not ordinal outcomes. Thank you.
@dougmckee673
8 years ago
+Zeineb Ouni I don't know of anything built in, but I think you could use propensity score matching to create your matched control group, and then use something like a Wilcoxon rank-sum test to see if the distributions are significantly different in the two groups. You could also run an ologit with a single independent variable (the treatment dummy) on the combined treatment and matched control data set to quantify the differences. Hope this helps!
@zeinebouni8764
8 years ago
+Doug McKee Thank you, Mr. Doug, for your response. It's very helpful. I have another idea. This is the situation: the dependent variable is firm ratings (1 to 7; 1 is a low rating and 7 is high). Independent variables: D1 (treatment); D2 (time). I thought I could transform my dependent variable and create a binary variable according to the average rating. So Rating2 = 1 if Rating > Average; 0 if Rating And use propensity score matching for binary outcomes using Rating2. What do you think? Thank you so much.
@dougmckee673
8 years ago
+Zeineb Ouni This throws away a little information, but it should work.
@zeinebouni8764
8 years ago
Hi Mr. Doug, I am very confused about the commands for endogenous treatment effects (eteffects in Stata) and linear regression with endogenous treatment effects (etregress in Stata). What's the main difference, and when should I use one rather than the other? Really confused. Thank you for the help.
@dougmckee673
8 years ago
+Zeineb Ouni Great question and believe it or not, this is the first I've heard of either of these commands! Sorry I can't be of any help at all! I recommend spending some quality time with the TE (Treatment Effects) Stata manual.
@zeinebouni8764
8 years ago
Thank you very much for your interest and for the recommendations.
@michellesaksena1226
8 years ago
Doug, I was wondering if PSM can be used when there is no apparent selection bias, but rather to make a comparison between the treated and non-treated groups. For example, if I were to designate birth cohort as my "treatment", where obviously birth year is not an individual decision, the PSM would essentially boil down to pair-wise controlling of treated and non-treated individuals based on whatever J attributes. As in, the distributions of p-scores should be the same for treated and non-treated groups. For an example, I have seen gender used as a "treatment" to compare wage differentials between men and women within subsets of STEM disciplines, and gender is, for the most part, not an individual decision. However, this was a tautological exercise, so I am not sure if this is actually practiced in real-life research. Basically, are there other benefits of PSM, other than ameliorating selection bias, that are used in practice to justify using PSM? Thanks, Michelle
@dougmckee673
8 years ago
+Michelle Saksena Sometimes people use propensity score matching when they believe the treatment might have very different effects on different groups and they want the control group to look as much as possible like the treatment group. In the situation you describe where you have two groups that are not systematically different, a t-test is the most straight-forward way to compare outcomes. If there is a lot of variation that can be explained by observable characteristics, most people would simply use a regression to increase the precision of the estimate of the difference. Hope this helps!
@michellesaksena1226
8 years ago
this helps! thank you!!
@niveditasrivastava654
2 years ago
Where is the 0.25 in the equation coming from?
@masonwang9218
4 years ago
nice video
@roypeijen
8 years ago
Dear Doug, thanks for this video since it already helped me a lot. I have a question though I would like to ask. After you computed ps1 by logistic regression (controlled for vector X), you create match1. How did you create this match1 variable? Did you do this just by hand or is there any stata command that looks at the best match given the scores in ps1? In my large dataset I cannot do it by hand, that is why I am asking. Thanks in advance.
@dougmckee673
8 years ago
+Roy Peijen Great question--I used Stata's "teffects" command. Specifically: . teffects psmatch (imrate) (T povrate pcdocs), gen(match) atet
@ghadaabu-sheasha4278
7 years ago
Amazing
@mekonnendemlie2028
3 years ago
It is good and clear; however, it would be clearer with a practical example.
@omarfrikhat5191
1 year ago
Interesting (y)
@3foss191
7 years ago
thks for the video
@artwork2179
8 years ago
Mr. McKee, you said that the command is logistic. Isn't it psmatch2 in Stata?
@dougmckee673
8 years ago
+Soumya Upadhyay People used to use 3rd party plugins to do propensity score matching in Stata, but in version 13, Stata added the teffects command which is quite powerful and does ps matching along with several other things.
@felipeestradadeaguirre5692
8 years ago
How can you do the matching on PS using Stata?
@user-tw8qp4wr7p
7 years ago
Felipe Estrada de Aguirre Try the psmatch2 cmd, hope it helps