ROC and AUC in R

This tutorial walks you through, step by step, how to draw ROC curves and calculate AUC in R. We start with a basic ROC graph, learn how to extract thresholds for decision making, calculate AUC and partial AUC, and layer multiple ROC curves on the same graph.
You can get a copy of the code from the StatQuest GitHub, here:
github.com/StatQuest/roc_and_...
NOTE: This StatQuest builds on the example in the original ROC and AUC StatQuest:
• THIS VIDEO HAS BEEN UP...
Also, if you're curious, here are some links to StatQuests about...
...Logistic Regression
• StatQuest: Logistic Re...
...and Random Forests...
• StatQuest: Random Fore...
For a complete index of all the StatQuest videos, check out:
statquest.org/video-index/
If you'd like to support StatQuest, please consider...
Buying The StatQuest Illustrated Guide to Machine Learning!!!
PDF - statquest.gumroad.com/l/wvtmc
Paperback - www.amazon.com/dp/B09ZCKR4H6
Kindle eBook - www.amazon.com/dp/B09ZG79HXC
Patreon: / statquest
...or...
KZread Membership: / @statquest
...a cool StatQuest t-shirt or sweatshirt:
shop.spreadshirt.com/statques...
...buying one or two of my songs (or go large and get a whole album!)
joshuastarmer.bandcamp.com/
...or just donating to StatQuest!
www.paypal.me/statquest
Lastly, if you want to keep up with me as I research and create new StatQuests, follow me on twitter:
/ joshuastarmer
#statquest #ROC #AUC

Comments: 367

  • @statquest
    @statquest 3 years ago

    You can get a copy of the code from the StatQuest GitHub, here: github.com/StatQuest/roc_and_auc_demo/blob/master/roc_and_auc_demo.R Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/

  • @falaksingla6242

    @falaksingla6242

    2 years ago

    Hi Josh, Love your content. Has helped me to learn a lot & grow. You are doing an awesome work. Please continue to do so. Wanted to support you but unfortunately your Paypal link seems to be dysfunctional. Please update it.

  • @ryanmckenna2047

    @ryanmckenna2047

    1 year ago

    The code would not run when I downloaded it from github

  • @statquest

    @statquest

    1 year ago

    @@ryanmckenna2047 What part didn't run? I just re-ran it and it worked fine.

  • @ashishdayal172

    @ashishdayal172

    1 year ago

    did u make this in python too??

  • @statquest

    @statquest

    1 year ago

    @@ashishdayal172 not yet

  • @ripsu100
    @ripsu100 5 years ago

    "The only man who never makes mistakes is the man who never does anything." Thank you ;)

  • @statquest

    @statquest

    5 years ago

    No, thank you! Your comment was very helpful and spared me a lot of future embarrassment. The video was only seen by 100 or so people (not 1,000s) before you pointed out the error.

  • @marcianocaliman8601
    @marcianocaliman8601 5 years ago

    Dude, your videos are great. I never found something so clearly on the internet. Congratulations!!!

  • @marcoventura9451
    @marcoventura9451 2 years ago

    Impressive video. Theory and examples with software are the best way to learn. There is much going on in this video, one of the best ever. Thank you, Josh, greetings from Italy and a happy new year to you, your loved ones and all the people who follow your amazing lessons.

  • @statquest

    @statquest

    2 years ago

    Wow, thanks!

  • @kumarrishabh8904
    @kumarrishabh8904 4 years ago

    Such an awesome channel I came across! ....gonna share it with everyone under my umbrella !!! You are doing really great bro!

  • @statquest

    @statquest

    4 years ago

    Thank you! :)

  • @EdySold
    @EdySold 11 months ago

    Complex things in simple and understandable language. I have never met a better teacher!

  • @statquest

    @statquest

    11 months ago

    Thank you very much! :)

  • @rigae2
    @rigae2 9 months ago

    Your explanation of the process and logic behind each function and line are so helpful. I hope you'll make more of these videos. Thank you so much, this content is uniquely valuable.

  • @statquest

    @statquest

    9 months ago

    Thanks!

  • @yvnasu5714
    @yvnasu5714 2 months ago

    Crazy how good you are at explaining. You explain the little things I always start to struggle with other teachers/tutors! Thank you so much for these Videos

  • @statquest

    @statquest

    2 months ago

    Happy to help!

  • @geocarvalhont
    @geocarvalhont 5 years ago

    Hey Josh, thanks again. During my studies I reproduced everything using R in Colab (really recommended for anyone studying Josh's R code).

  • @SurrenderPink
    @SurrenderPink 4 years ago

    Best song ever, Josh. StatQuest keeps gettin’ better and better! Many thanks.

  • @statquest

    @statquest

    4 years ago

    Thank you so much! :)

  • @rylieedwards2641
    @rylieedwards2641 1 year ago

    Great explanation of everything including each parameter in the graphs. Loved it!

  • @statquest

    @statquest

    1 year ago

    Thank you!

  • @Zahumny
    @Zahumny 5 years ago

    Thank you for helping me with my credit risk class :)

  • @arike9289
    @arike9289 3 years ago

    Good job and well-done. I like your style of teaching, it's great!!!

  • @statquest

    @statquest

    3 years ago

    Thank you! 😃

  • @horseheadmd6844
    @horseheadmd6844 3 years ago

    Thank you for this informative video. It helped me a lot. Great work!

  • @statquest

    @statquest

    3 years ago

    Glad it helped!

  • @archowdhury007
    @archowdhury007 4 years ago

    Wonderful tutorial!!.....thank you so much Josh :)

  • @statquest

    @statquest

    4 years ago

    Thanks! :)

  • @meenakshidevi5425
    @meenakshidevi5425 3 years ago

    Hey..... Love the way you present ❤️

  • @statquest

    @statquest

    3 years ago

    Thank you so much 😀

  • @happygolucky4350
    @happygolucky4350 3 years ago

    These are the best videos. When I need to relax, I watch your videos

  • @statquest

    @statquest

    3 years ago

    Glad you like them!

  • @happygolucky4350

    @happygolucky4350

    3 years ago

    @@statquest If you have two output neurons in an ANN (for a two-class classification problem, {1,0; 0,1}), is it okay to build the ROC just by comparing the output of any one of those neurons with its corresponding target?

  • @happygolucky4350

    @happygolucky4350

    3 years ago

    Thanks Josh, I changed it to {1,0} as output as the AUC for the two neurons {1or0} in the {1,0;0,1} architecture were not the same.

  • @benguo661
    @benguo661 2 years ago

    Thank you sooooo much Josh! You are a life saver!!😄

  • @statquest

    @statquest

    2 years ago

    Happy to help!

  • @esan120au
    @esan120au 1 year ago

    Thanks for your wonderful and detailed videos!

  • @statquest

    @statquest

    1 year ago

    Thank you so much for supporting StatQuest! BAM! :)

  • @akshay_up
    @akshay_up 5 years ago

    You are amazing man, thanks for the video and keep making more videos like these. BAM!!

  • @statquest

    @statquest

    5 years ago

    Double BAM!!!! Thanks for the encouragement! :)

  • @bitclear670

    @bitclear670

    5 years ago

    Double Bam!!

  • @peterh5960
    @peterh5960 3 years ago

    Incredibly helpful, thank you!

  • @statquest

    @statquest

    3 years ago

    Thanks!

  • @justchiful
    @justchiful 4 years ago

    Dear, I have enjoyed your video. Such clarity of thought!

  • @statquest

    @statquest

    4 years ago

    Thank you so much 🙂

  • @famin7794
    @famin7794 27 days ago

    You solved my headache. Thanks a lot.

  • @statquest

    @statquest

    27 days ago

    Happy to help!

  • @dylanz52
    @dylanz52 5 years ago

    Great video! One quick question. Do you know how to plot ROC-AUC graph for SVM and adaboost?

  • @ogunsadebenjaminadeiyin2729
    @ogunsadebenjaminadeiyin2729 4 years ago

    Thanks man, very clear and helpful

  • @statquest

    @statquest

    4 years ago

    Thanks! :)

  • @xyliu3758
    @xyliu3758 6 months ago

    hey bro, i love your videos so much, please hang in and i will continue to support you!

  • @statquest

    @statquest

    6 months ago

    Thank you very much!

  • @thuli5209
    @thuli5209 3 years ago

    Thank you sooooo much for your lessons. Super helpful

  • @statquest

    @statquest

    3 years ago

    Thanks!

  • @nabilmahmoud608
    @nabilmahmoud608 5 years ago

    This video is absolutely amazing! But how can I use code, rather than direct extrapolation from the logit curve, to determine the threshold/cutoff weight that corresponds to the threshold probability deciding whether a subject is obese or not?
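One possible way to do this in code, sketched under assumptions: the data below are simulated in the spirit of the video, and `prob_to_weight` is a hypothetical helper, not something from the video's script. Because logistic regression models the log-odds as a straight line in weight, a probability threshold can be converted to a weight by inverting that line.

```r
# Simulated data in the spirit of the video (values are illustrative only).
set.seed(420)
num.samples <- 100
weight <- sort(rnorm(n = num.samples, mean = 172, sd = 29))
obese <- ifelse(runif(n = num.samples) < (rank(weight) / num.samples), 1, 0)

glm.fit <- glm(obese ~ weight, family = binomial)

# The model is log(p / (1 - p)) = b0 + b1 * weight, so the weight at which
# the predicted probability equals p is (logit(p) - b0) / b1.
# prob_to_weight is a hypothetical helper written for this sketch.
prob_to_weight <- function(p, fit) {
  b <- coef(fit)
  unname((qlogis(p) - b[1]) / b[2])
}

cutoff <- prob_to_weight(0.5, glm.fit)
cutoff

# Round trip: predicting at the cutoff weight returns the probability we asked for.
predict(glm.fit, newdata = data.frame(weight = cutoff), type = "response")
```

This only works for a model with a single predictor; with several predictors a probability threshold corresponds to a whole surface of predictor values rather than one cutoff.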

  • @odearjafter9426
    @odearjafter9426 2 years ago

    thank you for such an informative tutorial

  • @statquest

    @statquest

    2 years ago

    Glad it was helpful!

  • @tojama
    @tojama 3 years ago

    Great again! I would be interested to see how to make combined ROCs for, say 2-4 different biomarker candidates. This would be to see if their combined use would result in higher AUCs than that of individual markers.

  • @statquest

    @statquest

    3 years ago

    Noted

  • @yulinliu850
    @yulinliu850 5 years ago

    Many Thanks Josh!

  • @statquest

    @statquest

    5 years ago

    You're welcome! :)

  • @ManyBadVids
    @ManyBadVids 1 year ago

    The silly songs, the calm voice and the bams gives this vibes as if the course is narrated by Forrest Gump. Love it.

  • @statquest

    @statquest

    1 year ago

    Thanks! :)

  • @tynna333
    @tynna333 5 years ago

    Is there any way to suppress plotting the top and right axes? I tried bty='n', and axes=FALSE with the axes added back later using axis(1) and axis(2), but neither of those worked.

  • @b1ndaboymetz
    @b1ndaboymetz 3 years ago

    VERY helpful - thank you!

  • @statquest

    @statquest

    3 years ago

    Glad it was helpful!

  • @DailyKosia
    @DailyKosia 4 years ago

    Thank you very much!

  • @elmonovagales2929
    @elmonovagales2929 5 years ago

    I got an error, Error in roc.data.frame(trainData, fitModelTrai$votes[, 1], plot = TRUE, : 'response' argument should be the name of the column, optionally quoted. the only difference between your code and mine is that I have many parameters/columns/features (approx 35) not only one (weight)
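A likely cause, offered as a guess from the error text alone: when the first argument to `roc()` is a data frame, pROC dispatches to its data-frame method, which expects the response and predictor to be column names. Passing the outcome vector itself as the first argument avoids that. The sketch below uses a made-up `trainData` with a hypothetical `label` column and a logistic regression in place of the random forest; the number of feature columns does not matter.

```r
library(pROC)

set.seed(1)
# Hypothetical stand-in for trainData: an outcome column plus several features.
trainData <- data.frame(
  label = rbinom(200, 1, 0.5),
  x1 = rnorm(200),
  x2 = rnorm(200),
  x3 = rnorm(200)
)

# Any model that yields one score per row will do; glm is used for brevity.
fit <- glm(label ~ ., data = trainData, family = binomial)

# First argument: the vector of known outcomes (not the whole data frame).
# Second argument: the predicted probabilities, one per row.
roc.obj <- roc(trainData$label, fitted(fit), levels = c(0, 1), direction = "<")
auc(roc.obj)
```

With a random forest, the second argument would be the vote or probability column for the positive class, as in the video.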

  • @michalispapadopoulos5090
    @michalispapadopoulos5090 2 years ago

    Thanks a lot sir! You are very helpful!

  • @statquest

    @statquest

    2 years ago

    Most welcome!

  • @JulioCCavalcanti
    @JulioCCavalcanti 3 years ago

    You are amazing, man! Thanks!!!

  • @statquest

    @statquest

    3 years ago

    Thanks

  • @nalliwok
    @nalliwok 1 year ago

    Thank you so much for this video!

  • @statquest

    @statquest

    1 year ago

    Glad it was helpful!

  • @AromaVancouver
    @AromaVancouver 1 year ago

    Keep up the good work .. Thank u🤩

  • @statquest

    @statquest

    1 year ago

    Thanks!

  • @vivektanwar628
    @vivektanwar628 2 months ago

    YOU ARE MARVELOUS,EXTRAORDINARY .I WISH YOU COULD HAVE EXPLAINED IN PYTHON

  • @statquest

    @statquest

    2 months ago

    One day I will.

  • @anikshah8796
    @anikshah8796 3 years ago

    Thanks for the videos, Josh! I have a question about AUC. Even though the AUC for the random forest in this video is lower than for logistic regression, isn't the forest a better alternative here, since there exists a threshold that generates a higher true positive rate for the same false positive rate compared to logistic regression? This makes the significance of AUC subjective in comparison.

  • @statquest

    @statquest

    3 years ago

    What you have to do is pick a range of thresholds that are acceptable. Once you do that, you can compare the AUC between those thresholds to determine which method is best.
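A minimal sketch of that kind of comparison, assuming the acceptable range is expressed as a range of specificities (which is how pROC defines partial AUC) and using two made-up classifiers, `score.a` and `score.b`, on simulated data:

```r
library(pROC)

set.seed(420)
n <- 200
x <- rnorm(n)
y <- rbinom(n, 1, plogis(2 * x))

score.a <- x                       # hypothetical classifier A
score.b <- x + rnorm(n, sd = 2)    # hypothetical, noisier classifier B

roc.a <- roc(y, score.a, levels = c(0, 1), direction = "<")
roc.b <- roc(y, score.b, levels = c(0, 1), direction = "<")

# Area under each curve restricted to specificities from 100% down to 90%,
# i.e. false positive rates between 0% and 10%.
pauc.a <- auc(roc.a, partial.auc = c(1, 0.9), partial.auc.focus = "specificity")
pauc.b <- auc(roc.b, partial.auc = c(1, 0.9), partial.auc.focus = "specificity")

c(A = as.numeric(pauc.a), B = as.numeric(pauc.b))
```

The uncorrected partial area can be at most 0.1 here, the width of the specificity range, so the two numbers are only meaningful relative to each other.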

  • @marco1anziano84
    @marco1anziano84 1 year ago

    I mean, the stats tutorial is indeed very well done, but the intro song was already enough to make me immediately click on the like button.

  • @statquest

    @statquest

    1 year ago

    bam! :)

  • @qurrataayunkartika1496
    @qurrataayunkartika1496 2 years ago

    waaa.. i'm so thankful found this video. Thanks a lot. Stay healthy cool people :)

  • @statquest

    @statquest

    2 years ago

    Thanks!

  • @christelleleitzingerphd7491
    @christelleleitzingerphd7491 2 years ago

    Thanks for the video and explanations! What statistical test would you use to compare 2 ROC curves?

  • @statquest

    @statquest

    2 years ago

    There are a bunch of options. This tool (in R) implements them: bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-12-77
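As one illustration, the pROC package provides `roc.test()`, which includes DeLong's test for comparing two ROC curves built on the same samples. The data and the two predictors below are simulated purely for the sketch.

```r
library(pROC)

set.seed(420)
n <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
# Simulated outcome that depends strongly on x1 and weakly on x2.
y <- rbinom(n, 1, plogis(1.5 * x1 + 0.3 * x2))

roc1 <- roc(y, x1, levels = c(0, 1), direction = "<")
roc2 <- roc(y, x2, levels = c(0, 1), direction = "<")

# DeLong's test compares the two AUCs; the curves share the same response,
# so they are treated as paired.
result <- roc.test(roc1, roc2, method = "delong")
result$p.value
```

`roc.test()` also offers other methods, such as bootstrap, which may be preferable for partial AUCs.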

  • @zainabkhan2475
    @zainabkhan2475 2 years ago

    I thank God I found this channel 2 years ago... 😇

  • @statquest

    @statquest

    2 years ago

    bam!

  • @zainabkhan2475

    @zainabkhan2475

    2 years ago

    @@statquest 😄😄😄

  • @songuihamedkone6600
    @songuihamedkone6600 4 years ago

    Thanks a lot !

  • @user-io2em1ld4n
    @user-io2em1ld4n 1 year ago

    Hey Josh, great videos on ROC curves, your teaching is refreshingly concise and clear. I just have one question that I hope you could expand on. When we first generate 100 samples from a normal distribution, why do we need to sort them from low to high? And what would the dangers be if we didn't do this? Thanks for the great content!

  • @statquest

    @statquest

    1 year ago

    What time point in the video, minutes and seconds, are you asking about?

  • @user-io2em1ld4n

    @user-io2em1ld4n

    1 year ago

    @@statquest roughly around 2:55

  • @statquest

    @statquest

    1 year ago

    @@user-io2em1ld4n Technically, you don't need to sort them, but it makes it easier to look at the data. When we print out the values for the "obese" variable at 4:11, the output is way easier to interpret because the values for weight were sorted.

  • @Davidravaux
    @Davidravaux 5 years ago

    Thank you so much! Would you consider making a video about limited dependent variable models (tobit, Heckman...)? It would be very helpful for us! All the best.

  • @statquest

    @statquest

    5 years ago

    OK. I'll put it on the to-do list, but it will be a while before I get to it.

  • @Davidravaux

    @Davidravaux

    5 years ago

    Thank you! This is a short bibliography about the topic: J. Scott Long, Regression Models for Categorical and Limited Dependent Variables; Alfred DeMaris, Regression With Social Data: Modeling Continuous and Limited Response Variables; Wooldridge, Introductory Econometrics. I can share the books with you if needed.

  • @statquest

    @statquest

    5 years ago

    @@Davidravaux OK. However, just know that my to-do list is huge (it has about 200 things on it - I get about 3 or 4 requests every day), so it might take me a long time to get to it. However, if a lot of people start asking for a certain topic, that topic gets moved closer to the top of the to-do list. So, if you know of a ton of people interested in this subject, you should have them add to this comment.

  • @Davidravaux

    @Davidravaux

    5 years ago

    Ok, I totally understand, thank you for clarifying.

  • @pitiwatkittiwimonchai4656
    @pitiwatkittiwimonchai4656 2 years ago

    So good Thanks for the video

  • @statquest

    @statquest

    2 years ago

    Glad you enjoyed it!

  • @marynapolyakova8722
    @marynapolyakova8722 3 years ago

    Thank you for your great lectures! The thresholds that you derive here are between 0 and 1. Can we translate these thresholds to the actual cut-off values?

  • @statquest

    @statquest

    3 years ago

    In these examples, the thresholds are the actual cut-off values. In other words, if the logistic regression predicts that the probability that a mouse is obese is 0.9, then we would compare that to the threshold that we obtained from the ROC graph to make a final classification.

  • @curvesettermcatprep1400
    @curvesettermcatprep1400 3 years ago

    Love your content! Quick q: from a conceptual standpoint, are you just testing the hypothesis that the underlying distribution of the weights (which you defined as a gaussian) is not a uniform distribution

  • @statquest

    @statquest

    3 years ago

    ROC graphs give us a sense of how accurate our models are given different thresholds for making decisions. For more details, see: kzread.info/dash/bejne/Zp6GpLR9kq3LnbA.html

  • @vishakhakumar3854
    @vishakhakumar3854 4 years ago

    great video!!

  • @statquest

    @statquest

    4 years ago

    Thank you! :)

  • @KarenCruz-tx5nh
    @KarenCruz-tx5nh 4 years ago

    You are the savior of the little humans we are, thanke you god! I have a silly question, sometimes you use

  • @statquest

    @statquest

    4 years ago

    R is funny about the "

  • @lalita3853
    @lalita3853 5 years ago

    Thank you sir

  • @jovanpetrovic168
    @jovanpetrovic168 3 years ago

    Hi Josh, your videos are great! I have one question about choosing the best method based on the overlapping ROC graph. If we compare Logistic Regression and Random Forest, we see that Logistic Regression is better because of its bigger AUC. But does it make more sense here to choose Random Forest, because one specific instance of Random Forest (with one specific threshold) gave us the best confusion matrix? I assumed here that accurately classifying the positive and negative classes is equally important.

  • @statquest

    @statquest

    3 years ago

    It really depends on your goals. In general, Logistic Regression performs better. However, depending on what threshold works best for you, you may still choose Random Forests if it performs better at that threshold.

  • @Ryutora8
    @Ryutora8 4 years ago

    I have a problem with this. The ifelse function is giving me a different value each time I run it. Do you have a clue why this is happening?

  • @jeanmarysymon3596
    @jeanmarysymon3596 5 years ago

    could you do the same in python too?

  • @nabilmahmoud608
    @nabilmahmoud608 3 years ago

    Hey Josh, is there a way to make inferences on more than two ROC and to perform multiple comparisons? (a generalization of DeLong's test? and maybe a method to adjust alpha for multiple comparisons too?)

  • @statquest

    @statquest

    3 years ago

    Good question! Off the top of my head I don't know if there is or not.

  • @timstone5168
    @timstone5168 4 years ago

    That ROC you had really tied the room together

  • @statquest

    @statquest

    4 years ago

    :)

  • @gregorsamsa3290
    @gregorsamsa3290 4 years ago

    Please make more Videos with R! :)

  • @statquest

    @statquest

    4 years ago

    :)

  • @dariatriffon6335
    @dariatriffon6335 3 years ago

    Hi and thanks for your great videos! Could you please elaborate on the obese variable, and specifically on the "test" part in that line of code? What if I already know who is obese and who is not (let's say based on some external medical profile, call it "real") and I want to evaluate the predictions of a model that is based on some score (call it "score") that each individual has? Would I just do glm(real ~ score)? What if I wanted to find the best score, meaning the score above which I classify someone as "obese" and below which "not obese"? What's the relationship between the probability threshold in the ROC curve and a threshold on the score itself? Thanks!

  • @statquest

    @statquest

    3 years ago

    In order to draw this ROC graph, we have to know who is obese and who is not to begin with. So the situation in this video is no different from yours. If you want to find the "best" score, you then have to decide what percentage of false positives and false negatives you are willing to live with - the ROC graph will help you decide that. You can then find the corresponding value by looking at the thresholds and the probabilities predicted by your model for different scores.

  • @jethrogauld7437
    @jethrogauld7437 1 year ago

    Great video thanks

  • @statquest

    @statquest

    1 year ago

    Thank you!

  • @harmagician1
    @harmagician1 3 years ago

    Bam! Good tutorial.

  • @statquest

    @statquest

    3 years ago

    Thanks! :)

  • @diegoangulo3724
    @diegoangulo3724 4 years ago

    is it possible to print the cutoffs at seq(0.1, by=0.1) in this curve with roc() function? ...Awesome videos btw!!!!!!!!

  • @statquest

    @statquest

    4 years ago

    I'm not sure this would be easy to do, since the thresholds may not exactly equal 0.1, 0.2, 0.3 etc. For example, in this video, the thresholds start at 0, then the next one is 0.013, then 0.032, ..., 0.088, 0.1004, 0.119, etc. So you see, there is no threshold that is exactly 0.1. So you'd have to calculate the differences from different thresholds and print the one that has the smallest difference.
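The "smallest difference" idea could be sketched like this in base R. The `thresholds` vector just reuses the handful of values quoted in the reply above; in practice it would be the thresholds column of `roc.df`, and `targets` is whatever grid of cutoffs you want to approximate.

```r
# The threshold values quoted in the reply above, used here as example input.
thresholds <- c(0, 0.013, 0.032, 0.088, 0.1004, 0.119)

# Cutoffs we would like to approximate.
targets <- c(0.05, 0.1)

# For each target, find the index of the threshold with the smallest
# absolute difference from it.
nearest <- vapply(
  targets,
  function(t) which.min(abs(thresholds - t)),
  integer(1)
)

data.frame(target = targets, index = nearest, threshold = thresholds[nearest])
```

The resulting indices can then be used to pull the matching true positive and false positive percentages out of the same rows of `roc.df`.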

  • @mzw90
    @mzw90 4 years ago

    Thank you for the video. It was very easy to follow. May I know how do i obtain optimal cut off points using the ROC curve?

  • @statquest

    @statquest

    4 years ago

    I answer that question in my video that explains ROC and AUC: kzread.info/dash/bejne/Zp6GpLR9kq3LnbA.html

  • @mzw90

    @mzw90

    4 years ago

    @@statquest Thank you for your reply! I was actually wondering how to interpret the threshold numbers seen on 09:51. After head(roc.df), you get a list of TPP, FPP and thresholds. For example in the 2nd row TPP 100 FPP 97.77, what does threshold of 0.01349 mean? I also have a separate question, I am curious if it is always necessary to always create a linear model first for the ROC curve? For example I am comparing the ROC curves of age and co-morbidities against non-cancer mortality, do I have to create a linear regression for age using glm()?

  • @brianhung24241111
    @brianhung24241111 5 years ago

    I am a big fan of yours! Can you make a survival analysis video?

  • @statquest

    @statquest

    5 years ago

    Yes! I will make one this spring. Many people have asked for this topic, so it is at the top of my to-do list.

  • @NitsT01
    @NitsT01 5 years ago

    You gotta stop saying BAM!!! it's really funny :D

  • @chelseyzhao2178
    @chelseyzhao2178 3 years ago

    Loved the video! How do you relate the threshold back to the data? I.e. make a statement like the threshold between obese and not obese is 140lb

  • @statquest

    @statquest

    3 years ago

    First, you find the threshold you are interested in (these are in roc.df), then you look at the weight associated with the largest glm.fit$fitted.values < the threshold. For example, if the threshold is 0.5, then the weight is: max(weight[glm.fit$fitted.values < 0.5])
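Here is that expression in a runnable context, as a sketch on data simulated in the spirit of the video. The `above` value is an addition for illustration: together with the expression from the reply it brackets the weight at which the classification flips.

```r
# Simulated data in the spirit of the video (values are illustrative only).
set.seed(420)
num.samples <- 100
weight <- sort(rnorm(n = num.samples, mean = 172, sd = 29))
obese <- ifelse(runif(n = num.samples) < (rank(weight) / num.samples), 1, 0)

glm.fit <- glm(obese ~ weight, family = binomial)

threshold <- 0.5

# Heaviest sample still classified as "not obese" at this threshold
# (the expression from the reply above).
below <- max(weight[glm.fit$fitted.values < threshold])

# Lightest sample classified as "obese" at this threshold.
above <- min(weight[glm.fit$fitted.values >= threshold])

c(below = below, above = above)
```

Any weight between `below` and `above` separates the observed samples identically at this threshold.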

  • @vishnudut7079
    @vishnudut7079 2 years ago

    Hey josh great video. I'm having a small doubt. Is there any way to plot ROC graph for multiclass ? I ran a multinomial logistic regression model on my dry bean dataset which has 7 classes. Is there a way to plot ROC graph for this ?

  • @statquest

    @statquest

    2 years ago

    I don't know how to do that.

  • @farawayscity
    @farawayscity 5 years ago

    Great video! Very helpful. BTW, there is a discrepancy between this clip and the code shared on your website about the object roc.df (line 78). Nothing has been assigned to the object yet, so running line 78 gives an error message. Overall, very clear and handy. Thank you!

  • @statquest

    @statquest

    5 years ago

    Thanks for catching that! The problem had to do with how WordPress interprets the ">" and "

  • @farawayscity

    @farawayscity

    5 years ago

    @@statquest I see. Good to know! Thank you~ :>

  • @1292kira
    @1292kira 4 years ago

    Thank you!

  • @statquest

    @statquest

    4 years ago

    :)

  • @pavkalinowski5145
    @pavkalinowski5145 4 years ago

    Is there a way to increase the font size of the text and numbers? Great job btw

  • @statquest

    @statquest

    4 years ago

    Yes. See: stackoverflow.com/questions/4241798/how-to-increase-font-size-in-a-plot-in-r

  • @jayjayf9699
    @jayjayf9699 3 years ago

    What does par(pty='m') do? You said it's maximum, but does it change the shape of the plot?

  • @statquest

    @statquest

    3 years ago

    Yes. It uses up all available space to draw the plot, regardless of the shape of that space (so if that space is rectangular, your plot will be rectangular). In contrast, setting pty='s' forces the plot to be square.
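A small sketch of the two settings in base R. The `pdf(NULL)` line is only there so the example also runs without a screen; in an interactive session you would leave it out and draw to the normal plot window.

```r
pdf(NULL)  # off-screen device so the sketch runs non-interactively

default.pty <- par("pty")   # a fresh device starts with "m" (maximal region)

old <- par(pty = "s")       # switch to a square plotting region
square.pty <- par("pty")

# Anything drawn now is forced into a square, which suits ROC graphs
# because both axes run over the same range.
plot(c(0, 1), c(0, 1), type = "l",
     xlab = "False Positive Rate", ylab = "True Positive Rate")

par(old)                    # restore the previous setting
invisible(dev.off())
```

Note that the values are lowercase: `"s"` for square and `"m"` for maximal.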

  • @rahulg1504
    @rahulg1504 3 years ago

    Many thanks Josh, you are doing a great job. In my study, I would like to calculate and plot pROCs for a couple of maxent scenarios and glm model scenarios using 1000 iterations and a 5% omission error using pROC package in R, would be really grateful if you can guide me a bit. Thanks in advance.

  • @statquest

    @statquest

    3 years ago

    Let me know how it goes! :)

  • @rahulg1504

    @rahulg1504

    3 years ago

    @@statquest May I get the R code for the scenario I mentioned? I am still trying to figure out how to prepare data from the maxent output and then use it with pROC package to calculate and plot AUCs. I am relatively a newbie in R. Theory wise I think I am pretty clear, but struggling with codes and commands to get this job done with pROC package.

  • @statquest

    @statquest

    3 years ago

    @@rahulg1504 The code for this video is here: github.com/StatQuest/roc_and_auc_demo/blob/master/roc_and_auc_demo.R

  • @dorothymartin2477
    @dorothymartin2477 2 years ago

    Hi Sir, your videos are very helpful. Hope that you can make a video on mean decrease Gini of Random Forest

  • @statquest

    @statquest

    2 years ago

    I'll keep that in mind.

  • @dorothymartin2477

    @dorothymartin2477

    2 years ago

    thank you very much !!!! 😁😁

  • @gustavoenrique2019
    @gustavoenrique2019 3 years ago

    Hello! Any ideas on how to plot the Precision-Recall Curve?

  • @statquest

    @statquest

    3 years ago

    I'll keep that topic in mind.

  • @davidstivenarboledaprado8731
    @davidstivenarboledaprado8731 2 months ago

    Hello, thanks for the video, really useful. In this example you come up with a method to classify obese and not obese. What about when you don't know a threshold for the initial classification of obese or not obese? Does the pROC function test different thresholds?

  • @statquest

    @statquest

    2 months ago

    That's the whole idea of an ROC graph to begin with - it's used to determine the optimal threshold.

  • @PunmasterSTP
    @PunmasterSTP 4 months ago

    Ah, the pirate's favorite programming language!

  • @statquest

    @statquest

    4 months ago

    :)

  • @jakubkahoun8383
    @jakubkahoun8383 5 years ago

    Tried to get this running on predictions from sparklyr... then after 4 hours of torment I realized that the predicted data frame is not on my computer... tunnel vision can be a bitch sometimes.

  • @amulyagupta9161
    @amulyagupta9161 9 months ago

    Hey! Wonderful video. I had just one doubt: I used code similar to yours in my RStudio. Since the runif function generates random numbers, I expected the values in the obese variable to be different from the ones generated on your machine. However, eerily enough, they came out exactly the same. What sort of sorcery is this? 😮

  • @statquest

    @statquest

    9 months ago

    Did you set the seed of the random number generator? If so, we'll get the same random numbers every time.
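A short base R demonstration of that effect. The seed value 420 is arbitrary; any fixed integer behaves the same way.

```r
set.seed(420)
first <- runif(5)

set.seed(420)          # reset to the same seed...
second <- runif(5)     # ...and the same "random" numbers come out again

third <- runif(5)      # no reset, so the generator simply continues

identical(first, second)
identical(second, third)
```

So two people who run the same script with the same `set.seed()` call (and the same R random number generator settings) will see the same simulated data, while leaving the call out gives different values on every run.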

  • @SS-cp1cm
    @SS-cp1cm 3 years ago

    thank you soooo much!!!

  • @statquest

    @statquest

    3 years ago

    :)

  • @kartikrayaprolu9076
    @kartikrayaprolu9076 4 years ago

    I have multiple logistic regression models and want to plot the ROCs of all those logistic regression models in one plot. How can I do that?

  • @statquest

    @statquest

    4 years ago

    I talk about how to add multiple curves to the same graph at 13:19
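For several logistic regression models specifically, one way this might look with pROC is sketched below. The data, the two models and the colors are made up for the example; the key piece is `add = TRUE`, which draws onto the existing graph instead of starting a new one.

```r
library(pROC)

set.seed(420)
n <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- rbinom(n, 1, plogis(x1 + 0.5 * x2))

# Two hypothetical logistic regression models to compare.
fit1 <- glm(y ~ x1, family = binomial)
fit2 <- glm(y ~ x1 + x2, family = binomial)

roc1 <- roc(y, fitted(fit1), levels = c(0, 1), direction = "<")
roc2 <- roc(y, fitted(fit2), levels = c(0, 1), direction = "<")

pdf(NULL)  # off-screen device; omit this line in an interactive session
par(pty = "s")
plot(roc1, col = "#377eb8", lwd = 4, legacy.axes = TRUE)
plot(roc2, col = "#4daf4a", lwd = 4, add = TRUE)   # layered on the first curve
legend("bottomright", legend = c("x1 only", "x1 + x2"),
       col = c("#377eb8", "#4daf4a"), lwd = 4)
invisible(dev.off())
```

Each additional model just needs its own `roc()` object and one more `plot(..., add = TRUE)` call.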

  • @Sara-su1bi
    @Sara-su1bi 2 years ago

    How do you calculate the p-value of the AUC (obtained from a logistic regression model)?

  • @statquest

    @statquest

    2 years ago

    See: stats.stackexchange.com/questions/386468/does-auc-roc-curve-return-a-p-value

  • @nxtou90
    @nxtou90 3 years ago

    Is it possible to compute the significance level of AUC using the pROC package? Something similar to the SPSS output.

  • @statquest

    @statquest

    3 years ago

    As far as I know, you can do confidence intervals. For more details, see: www.rdocumentation.org/packages/pROC/versions/1.16.2
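A minimal sketch of the confidence interval route, using pROC's `ci.auc()` on simulated data. If the whole interval sits above 0.5, that is informal evidence the classifier does better than chance; this is an interpretation aid, not a replacement for a formal test.

```r
library(pROC)

set.seed(420)
n <- 200
x <- rnorm(n)
y <- rbinom(n, 1, plogis(1.5 * x))   # simulated outcome driven by x

roc.obj <- roc(y, x, levels = c(0, 1), direction = "<")

# 95% confidence interval for the AUC; the result holds the lower bound,
# the AUC itself and the upper bound, in that order.
ci <- ci.auc(roc.obj, conf.level = 0.95)
ci
```

For a full (not partial) AUC, `ci.auc()` uses DeLong's method unless a bootstrap is requested.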

  • @baiyuncao3353
    @baiyuncao3353 4 years ago

    Excuse me, but is this the same thing as "Lift Chart"?

  • @statquest

    @statquest

    4 years ago

    They are similar, but not the same.

  • @emkahuda776
    @emkahuda776 2 years ago

    Thank you for another great video. I have a question, what if we have multiple problems for classifications? Not only two classifications (obese and not obese). For example, we want to classify 10 cell types (let's say cell type 1, cell type 2, ..., cell type 10) whether these cell types are present or not in the tissue sample? How can we use this roc() function to plot the ROC curve?

  • @statquest

    @statquest

    2 years ago

    To be honest, I don't know the answer to that off the top of my head.

  • @emkahuda776

    @emkahuda776

    2 years ago

    @@statquest I have made my own function to plot the ROC curve with similar condition I mentioned. However, I need to make another function to calculate the AUC and was hoping I could use the roc() function which seems providing more information and can include much more information, such as AUC and partial AUC as well. 😰

  • @mcan543
    @mcan543 4 years ago

    So even if it has a lower AUC, it seems that random forest is a better choice. Right?

  • @statquest

    @statquest

    4 years ago

    It always depends on how important it is to avoid false positives or false negatives. Once you define those, you can figure out which curve makes more sense for your application.

  • @julieyananzhu1134
    @julieyananzhu1134 3 years ago

    Hi, Josh! A big fan of yours! Thanks for so many wonderful videos! I wonder if I can ask for help. I am using the glmer function in R to fit a mixed-effects logistic regression to my longitudinal data. However, I am having trouble extracting fitted values from my model to draw an ROC curve, like what you did with glm.fit$fitted.values. I have been searching for a way, but in vain. I'd appreciate it if you could give me a clue! Thanks very much!

  • @statquest

    @statquest

    3 years ago

    This might help: stats.idre.ucla.edu/r/dae/mixed-effects-logistic-regression/

  • @julieyananzhu1134

    @julieyananzhu1134

    3 years ago

    @@statquest Thanks for your kind reply! The web page didn't solve my problem directly, but it's very informative! Thanks!

  • @jiayoongchong2606
    @jiayoongchong2606 3 years ago

    I installed and loaded pROC, but it says it couldn't find the function roc... Which editor did you use? I used RStudio. Help, please?

  • @statquest

    @statquest

    3 years ago

    I used RStudio as well. Sorry you're having trouble.

  • @AravindHan008
    @AravindHan008 4 years ago

    I have been following Python for data science so far and got stuck after I saw this video. The best people like you are using R instead of Python, so what should I do? Which one is best for data science, now and in the future, R or Python? Kindly let me know and enlighten me. Thanks in advance! little BAM

  • @statquest

    @statquest

    4 years ago

    They are both very useful. Python is a great language used in a lot of different situations and has a lot of good machine learning libraries. In contrast, R is very useful for doing statistics.... So I would recommend learning both if you have time.

  • @pierfrancescovisaggi7984
    @pierfrancescovisaggi7984 4 years ago

    Hello, thank you for your videos. If the thresholds refer to the "weight threshold", why are they expressed as numbers between 0 and 1 (zero point something)? I can't understand the meaning. Can you help with this, please?

  • @statquest

    @statquest 4 years ago

    Each weight corresponds to a probability of classifying someone as "obese", and traditionally the thresholds are in terms of these probabilities. Since probabilities go from 0 to 1, these thresholds go from 0 to 1. However, we could transform the thresholds to be in terms of weight if we wanted to.

  • @pierfrancescovisaggi7984

    @pierfrancescovisaggi7984 4 years ago

    @statquest Thank you very much for the answer, it's very kind of you. Could I ask how to transform such probabilities into weights? I only managed to do it without a linear model, just applying roc(myresponse, mypredictor). How could I transform all those thresholds into numbers corresponding to my predictor? Thank you

  • @statquest

    @statquest 4 years ago

    @pierfrancescovisaggi7984 Each value for weight has a corresponding probability stored in glm.fit$fitted.values. See 4:40. You can use the index for any weight to access the associated probability.
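
A minimal base-R sketch of that index lookup, assuming simulated data in the style of the video (the `weight` and `obese` vectors are made up here, and `target` is a hypothetical threshold). It pairs each weight with its fitted probability and, because there is only one predictor, also solves the fitted equation for weight directly:

```r
# Simulated data (an assumption; not necessarily the video's numbers)
set.seed(420)
num.samples <- 100
weight <- sort(rnorm(n = num.samples, mean = 172, sd = 29))
obese <- ifelse(runif(n = num.samples) < (rank(weight) / num.samples), 1, 0)

# Logistic regression with weight as the only predictor
glm.fit <- glm(obese ~ weight, family = binomial)

# fitted.values is in the same order as weight, so row i pairs them up
lookup <- data.frame(weight = weight, probability = glm.fit$fitted.values)

# Hypothetical probability threshold to express as a weight
target <- 0.5
i <- which.min(abs(lookup$probability - target))
lookup[i, ]   # observed weight whose probability is closest to the target

# With a single predictor the fit is log(p / (1 - p)) = b0 + b1 * weight,
# so the threshold can also be solved for weight directly
b <- coef(glm.fit)
weight.cutoff <- unname((qlogis(target) - b[1]) / b[2])
weight.cutoff
```

The direct inversion only works with a single predictor; with several predictors there is no single weight that corresponds to a probability.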

  • @anoriginalnick
    @anoriginalnick 2 months ago

    Excellent video

  • @statquest

    @statquest 2 months ago

    Thanks!

  • @mathiasschmidt93
    @mathiasschmidt93 1 year ago

    Great video! I was wondering if it is possible to plot this graph for a multinomial logistic regression?

  • @statquest

    @statquest 1 year ago

    Hmmm...I'm not sure.

  • @mathiasschmidt93

    @mathiasschmidt93 1 year ago

    @statquest Ah okay, what about a multiple logistic regression? Any ideas about that one?

  • @statquest

    @statquest 1 year ago

    @mathiasschmidt93 As long as your predicted value is binary, it shouldn't matter how many variables you use to make predictions - the process is the exact same as illustrated in this video. To see how it is done in R, see: kzread.info/dash/bejne/o5eqo9N6eJmWido.html
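
A sketch of that point in base R, assuming simulated data with two made-up predictors (`weight` and `age`). The model collapses all predictors into one fitted probability per sample, and the ROC/AUC calculation only ever sees that single column. The AUC here is computed with the rank-based (Mann-Whitney) formula rather than with pROC:

```r
# Simulated data with two predictors (all values are assumptions)
set.seed(42)
n <- 200
weight <- rnorm(n, mean = 172, sd = 29)
age <- rnorm(n, mean = 45, sd = 12)
# Hypothetical true relationship, used only to generate labels
obese <- rbinom(n, size = 1, prob = plogis(-12 + 0.06 * weight + 0.03 * age))

# Multiple logistic regression: several predictors, one binary outcome
fit <- glm(obese ~ weight + age, family = binomial)
prob <- fit$fitted.values   # one probability per sample

# AUC = probability that a random positive is ranked above a random negative
n.pos <- sum(obese == 1)
n.neg <- sum(obese == 0)
auc <- (sum(rank(prob)[obese == 1]) - n.pos * (n.pos + 1) / 2) / (n.pos * n.neg)
auc
```

With pROC loaded, passing `obese` and `fit$fitted.values` to `roc()` as in the video should give the same kind of curve, regardless of how many predictors went into the model.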

  • @sanketmaiti5086
    @sanketmaiti5086 5 years ago

    What is a threshold?

  • @hamzaouazzanitouhami8170
    @hamzaouazzanitouhami8170 3 years ago

    Hello, I want to know how I can plot a ROC curve from the glm() function with Excel data.

  • @statquest

    @statquest 3 years ago

    Maybe someone else with experience in this can help out.
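
One possible route, sketched in base R under a few assumptions: the spreadsheet has been saved from Excel as a CSV file, and it has a numeric predictor column and a 0/1 outcome column. The file name and the `weight`/`obese` column names are hypothetical, and the script writes a small simulated file first so that it can run on its own. Packages such as readxl can read .xlsx files directly, but that is not shown here.

```r
# Stand-in for a sheet exported from Excel as CSV (simulated, hypothetical name)
set.seed(7)
csv.file <- file.path(tempdir(), "my_excel_export.csv")
sim <- data.frame(weight = round(rnorm(80, mean = 172, sd = 29), 1))
sim$obese <- rbinom(80, size = 1, prob = plogis((sim$weight - 172) / 15))
write.csv(sim, csv.file, row.names = FALSE)

# Read the exported data and fit the logistic regression
dat <- read.csv(csv.file)
fit <- glm(obese ~ weight, data = dat, family = binomial)

# ROC points by hand: sweep a threshold over the fitted probabilities
prob <- fit$fitted.values
thresholds <- c(Inf, sort(unique(prob), decreasing = TRUE), -Inf)
tpr <- sapply(thresholds, function(t) mean(prob[dat$obese == 1] >= t))
fpr <- sapply(thresholds, function(t) mean(prob[dat$obese == 0] >= t))

# Draw the curve with the chance line for reference
plot(fpr, tpr, type = "l",
     xlab = "False positive rate", ylab = "True positive rate")
abline(0, 1, lty = 2)
```

With pROC installed, passing `dat$obese` and `fit$fitted.values` to `roc()` as in the video would replace the hand-rolled threshold sweep.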

  • @animeshkansal7746
    @animeshkansal7746 5 years ago

    The AUC for logistic regression is higher than the AUC for RF, but if you consider only the corner-most points for both, RF does better. So who is the winner in this case?

  • @statquest

    @statquest 5 years ago

    Which corner are you looking at? I don't see RF doing better in either one. Or are you looking at the very edges?

  • @animeshkansal7746

    @animeshkansal7746 5 years ago

    StatQuest with Josh Starmer At the north-west corner, RF at one point has better TPP and FPP. So isn't RF better than logistic regression?

  • @statquest

    @statquest 5 years ago

    @animeshkansal7746 North east? You are right. RF is a little better up there. This is a good example of when a partial AUC might be more informative.

  • @animeshkansal7746

    @animeshkansal7746 5 years ago

    Thank you so much, your videos are really great

  • @statquest

    @statquest 5 years ago

    @animeshkansal7746 Thanks!

  • @Leo-wd8vq
    @Leo-wd8vq 5 years ago

    Thank you for your video. BTW, can you make one for Python?

  • @statquest

    @statquest 5 years ago

    I'll work on it. I'm doing a lot more Python coding these days, so it makes sense.

  • @ccuny1

    @ccuny1 4 years ago

    @statquest A year later, I have suddenly woken up to StatQuest. Python implementation, please. Perhaps scikit-learn also has built-in computations for these and other metrics. I'll check...

  • @hannahhillman3593
    @hannahhillman3593 1 year ago

    Is it expected that the number of sensitivity/specificity values determined by the roc function (that we stored in the data frame) may not match the number of predictor/response values that I input? For example, my input predictor/response vectors contained 46 objects, but the roc function returned only 12 sensitivity/specificity values.

  • @statquest

    @statquest 1 year ago

    I believe this is possible if there are fewer thresholds that make a difference. In other words, some thresholds might result in the same number of false positives, true positives etc. and in that case, those "duplicate" thresholds will be omitted.
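
A small base-R sketch of one way the counts can differ, under the assumption that candidate thresholds are placed between neighbouring distinct predictor values (pROC's exact rules may differ). The `predictor` and `response` vectors are invented, chosen to give 46 observations with only 11 distinct scores:

```r
# 46 observations, but the predictor only takes 11 distinct values
predictor <- rep(seq(0, 1, by = 0.1), length.out = 46)
response <- as.integer(predictor + rep(c(-0.2, 0.2), length.out = 46) > 0.5)

# One threshold between each pair of neighbouring distinct values, plus the
# two extremes: observations that share a score cannot be separated
u <- sort(unique(predictor))
thresholds <- (c(-Inf, u) + c(u, Inf)) / 2

# Sensitivity and specificity at each candidate threshold
roc.points <- data.frame(
  threshold = thresholds,
  sensitivity = sapply(thresholds, function(t) mean(predictor[response == 1] > t)),
  specificity = sapply(thresholds, function(t) mean(predictor[response == 0] <= t))
)

length(predictor)   # 46 observations
nrow(roc.points)    # 12 rows: 11 distinct values + 1
```

In this toy case the number of ROC points tracks the number of distinct predictor values, not the number of observations.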

  • @hannahhillman3593

    @hannahhillman3593 1 year ago

    @statquest Okay great this is exactly what I thought was happening--just wasn't sure if that was a possible outcome. Thanks so much for your reply and for all the great videos!!!

  • @jarednesvet2826
    @jarednesvet2826 4 years ago

    So do these thresholds correlate to the probabilities that are used to separate the obese vs. not obese? Is there a way to figure out how to convert the thresholds back to the actual weights themselves that are used as the cutoff?

  • @statquest

    @statquest 4 years ago

    The thresholds, with the exception of -infinity and +infinity, are the exact same as the probabilities. -infinity corresponds to a probability of 0 and +infinity corresponds to a probability of 1. Thus, you can compare thresholds to the original glm.fit$fitted.values and match those to the original array of "weight" values.

  • @jarednesvet2826

    @jarednesvet2826 4 years ago

    @statquest Great, thanks for the help!

  • @redgreenskittles

    @redgreenskittles 4 years ago

    @statquest Many thanks for a great video. Could you kindly explain how exactly we can do this? I am looking to convert these thresholds to actual cut-off values.

  • @statquest

    @statquest 4 years ago

    @redgreenskittles First, I would look at the ROC curve to find my threshold. For the example, we might pick a False Positive Percentage of 20 to be the threshold. Then I would look in roc.info to find the threshold associated with that false positive percentage. We can do that by just printing roc.info to the screen and looking at it, or with the command... roc.df[min(which(roc.df$fpp
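
The lookup described above can be sketched in base R as follows. This is an illustrative reconstruction, not the command that is cut off above: the data are simulated, the ROC table is computed by hand instead of with pROC, and the 20% limit is just the example value from the reply. It also assumes that the fitted probability rises with weight.

```r
# Simulated data (an assumption; not necessarily the video's numbers)
set.seed(420)
num.samples <- 100
weight <- sort(rnorm(n = num.samples, mean = 172, sd = 29))
obese <- ifelse(runif(n = num.samples) < (rank(weight) / num.samples), 1, 0)
glm.fit <- glm(obese ~ weight, family = binomial)
prob <- glm.fit$fitted.values

# Candidate thresholds: every distinct fitted probability, plus the extremes
thresholds <- c(-Inf, sort(unique(prob)), Inf)

# Classify as "obese" when probability >= threshold, then tabulate
roc.df <- data.frame(
  tpp = sapply(thresholds, function(t) mean(prob[obese == 1] >= t) * 100),
  fpp = sapply(thresholds, function(t) mean(prob[obese == 0] >= t) * 100),
  thresholds = thresholds
)

# Among rows that keep false positives at or below 20%, take the one with
# the highest true positive percentage
max.fpp <- 20
ok <- roc.df[roc.df$fpp <= max.fpp, ]
best <- ok[which.max(ok$tpp), ]
best

# Smallest weight whose fitted probability meets the chosen threshold
# (valid as a cut-off only if probability increases with weight)
weight.cutoff <- min(weight[prob >= best$thresholds])
weight.cutoff
```

Anyone at or above `weight.cutoff` would be classified as obese at that threshold, which turns the probability threshold into a cut-off on the original scale.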

  • @redgreenskittles

    @redgreenskittles 4 years ago

    @statquest Wow, that was a super quick response. Works like a treat! Thank you