Is this a controversial opinion about statistics?

Here are some learning objectives for this video:
Hypotheses are tied to specific parameters
Two ways of doing research (hypothesis-driven versus data mining)
Why it's a red flag when someone uses regression with more than 3 predictors
Here's a link to my textbook: quantpsych.net/stats_modeling/

Comments: 46

  • @scortez221
    2 years ago

    Really enjoyed this, Dustin. More vlogs with random thoughts, please! That said, I've also noticed that researchers (especially in the epi, med, and psych fields, in my experience) use multiple regression to try to find good predictors instead of defining a hypothesis and then using models to test it. They often come to me with 20+ variables and cannot clearly state their hypothesis. I call it fishing.

  • @QuantPsych
    2 years ago

    Exactly!
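To put numbers on the "fishing" described in the comment above: if none of k candidate predictors is truly related to the outcome and each gets its own test at alpha = 0.05, the chance that at least one comes out "significant" is 1 - (1 - alpha)^k, about 0.64 for 20 variables. The sketch below is illustrative only: it assumes independent tests, screens pure-noise predictors one at a time with a correlation test (a simplified stand-in for a multiple regression), and approximates p-values with Fisher's z. The function names and defaults are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean


def familywise_error(k: int, alpha: float = 0.05) -> float:
    """Probability that at least one of k independent null tests gives p < alpha."""
    return 1 - (1 - alpha) ** k


def pearson_r(x: list[float], y: list[float]) -> float:
    """Sample Pearson correlation between x and y."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def p_value_fisher_z(r: float, n: int) -> float:
    """Two-sided p-value for H0: rho = 0, using Fisher's z normal approximation."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def fishing_trip(k: int = 20, n: int = 100, alpha: float = 0.05,
                 reps: int = 500, seed: int = 1) -> float:
    """Share of simulated studies where at least one of k noise predictors is 'significant'."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        y = [rng.gauss(0, 1) for _ in range(n)]  # outcome: pure noise
        for _ in range(k):
            x = [rng.gauss(0, 1) for _ in range(n)]  # candidate predictor: also pure noise
            if p_value_fisher_z(pearson_r(x, y), n) < alpha:
                hits += 1
                break  # one "finding" is enough to write up this study
    return hits / reps


print(round(familywise_error(20), 3))  # 0.642
print(fishing_trip())  # simulated rate, which should land near the value above
```

Correlated predictors or a joint model would change the exact rate, but the direction of the problem is the same: the more variables you screen without a hypothesis, the more likely a spurious "hit" becomes.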

  • @m.rijalussholihin1102
    2 years ago

    I love this kind of video: learning things from an actual problem and questioning our approach to solving things. I think interpretation is key when research aims to explain a phenomenon with statistical models. Sometimes it's hard to draw a good conclusion from a model with too many predictors. But I also understand that sometimes it's hard to form a hypothesis and choose the predictors in the first place. Maybe it's worth making a separate video/playlist on your approach to forming a good hypothesis and choosing a subset of good predictors when you have a bunch of variables to begin with. That would be great. Thanks for the awesome videos.

  • @glennschexnayder3720
    2 years ago

    I second this suggestion, because I’m horribly confused.

  • @kimwaters2053
    1 year ago

    love love love...writing phonetic/phonological ling dissertation on dialect bias... black vs white, southern vs non....you're saving my bacon. Have watched GLM 1-3...old school girl...19 v full pages of notes & how to use w my data... nothing better than pen (fine nib fountain with Waterman violet ink) on paper to get something in head...will do 4-6 tomorrow....eyes crossing. BTW, agree with SCG et al. > more vlogs good. This was a groovy groove vibe to finish the day...Thank you...peace

  • @planetary-rendez-vous
    1 year ago

    This channel is a gold mine. I feel like I'm finally learning some real statistics instead of being a goblin who only looks at p-values.

  • @dr.donnadietz4363
    2 years ago

    I agree with you on this one. Professor Dietz here, from American U. My degree is in mathematics, even though I moonlight in data science and statistics...

  • @luisa1551
    2 years ago

    Thanks for the explanation from different points of view and how they relate to statistics. As a biologist with a TERRIBLE stats education, I find this kind of information eye-opening.

  • @nabeelsiddiqui3377
    2 years ago

    Enjoyed this. Please do more. Also, maybe you could do some videos where you look at the data from a paper and walk through the paper's statistical claims. It’s hard to understand how this all works in practice for those without domain-specific knowledge.

  • @QuantPsych
    2 years ago

    Great suggestion!

  • @dr.donnadietz4363
    2 years ago

    I just showed this to Michael Robinson, also at AU doing math, stats, and data science. He also agreed with you.

  • @taotaotan5671
    2 years ago

    I feel that how many (and which) variables to include in a multiple regression really depends on domain knowledge, i.e., we should try to draw DAGs to help us visualize and make decisions. From a statistical point of view, adding more variables will, in general, produce less biased estimators, but the downside is that it increases estimator variance (making it harder to reach the p-value cutoff). It’s really hard to say what good practice is.
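The variance side of the tradeoff in the comment above has a standard closed form: the sampling variance of predictor j's OLS coefficient is multiplied by the variance inflation factor VIF_j = 1 / (1 - R_j²), where R_j² is the R² from regressing x_j on the other predictors. A minimal sketch, assuming two predictors that correlate at rho (so R_j² = rho²) and an invented outcome y = x1 + x2 + noise; the helper names are hypothetical, and only the variance side is simulated here.

```python
import math
import random
from statistics import fmean, pvariance


def vif(r2_j: float) -> float:
    """Variance inflation factor for predictor j, given R^2 of x_j on the other predictors."""
    if not 0 <= r2_j < 1:
        raise ValueError("r2_j must be in [0, 1)")
    return 1 / (1 - r2_j)


def ols_two(x1, x2, y):
    """OLS slopes (b1, b2) for y ~ 1 + x1 + x2, via the two-predictor closed form."""
    m1, m2, my = fmean(x1), fmean(x2), fmean(y)
    d1 = [a - m1 for a in x1]
    d2 = [a - m2 for a in x2]
    dy = [a - my for a in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(a * a for a in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * b for a, b in zip(d1, dy))
    s2y = sum(a * b for a, b in zip(d2, dy))
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


def slope_variance(rho: float, n: int = 50, reps: int = 1000, seed: int = 7) -> float:
    """Monte Carlo variance of b1 when corr(x1, x2) = rho and y = x1 + x2 + N(0, 1) noise."""
    rng = random.Random(seed)
    b1s = []
    for _ in range(reps):
        x1 = [rng.gauss(0, 1) for _ in range(n)]
        x2 = [rho * a + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1) for a in x1]
        y = [a + b + rng.gauss(0, 1) for a, b in zip(x1, x2)]
        b1s.append(ols_two(x1, x2, y)[0])
    return pvariance(b1s)


print(vif(0.6 ** 2))  # about 1.5625, i.e. standard errors grow by a factor of 1.25
print(slope_variance(0.6) / slope_variance(0.0))  # simulated ratio, roughly the VIF
```

With rho = 0.6 the formula gives a VIF of about 1.56; the simulated variance ratio should come out close to it (slightly above in small samples like n = 50).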

  • @jekamito
    1 year ago

    Please carry on with this!! It is fantastic to envision how you think. Really. Thank you!

  • @TheBjjninja
    2 years ago

    Hypothesis-driven vs. exploration-driven

  • @nachete34
    2 years ago

    Always nice to hear these reflections. I share your view btw. I feel the same when I see scientists removing/adding fixed/random effects like crazy just to hit the lowest AIC...If the aim is to predict with the most parsimonious model, then I see the point. But for testing pre-established hypotheses, it does not make any sense. D'you know what I mean?
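For a linear model with Gaussian errors, AIC can be written (up to a constant shared by all models fit to the same data) as n·ln(RSS/n) + 2k, where k counts estimated parameters; BIC replaces 2k with k·ln(n). A minimal sketch of what that means for chasing the lowest AIC: adding one parameter lowers AIC whenever it cuts RSS by more than a factor of exp(-2/n), and, under the usual large-sample chi-square approximation for nested models, a predictor with no real effect clears that bar about 16% of the time. This assumes ordinary Gaussian linear models rather than the mixed models mentioned in the comment; the function names are illustrative.

```python
import math
from statistics import NormalDist


def gaussian_aic(rss: float, n: int, k: int) -> float:
    """AIC of a Gaussian linear model, dropping constants shared by all models on the same data."""
    return n * math.log(rss / n) + 2 * k


def gaussian_bic(rss: float, n: int, k: int) -> float:
    """BIC counterpart: the penalty per parameter is ln(n) instead of 2."""
    return n * math.log(rss / n) + k * math.log(n)


def min_rss_drop_for_aic(n: int) -> float:
    """Fractional RSS reduction one extra parameter needs before AIC prefers the bigger model."""
    return 1 - math.exp(-2 / n)


def aic_false_inclusion_rate() -> float:
    """Asymptotic chance a useless extra parameter lowers AIC: P(chi-square with 1 df > 2)."""
    return 2 * (1 - NormalDist().cdf(math.sqrt(2)))


print(round(min_rss_drop_for_aic(100), 4))  # 0.0198: at n = 100, cutting RSS by about 2% is enough
print(round(aic_false_inclusion_rate(), 3))  # 0.157
```

That 16% figure is why AIC minimization suits the prediction goal the comment mentions better than confirmatory testing: it behaves like a much looser threshold than p < .05.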

  • @MaxMagnificence
    2 years ago

    I absolutely agree that this is a "data-mining" approach, and it is not well suited for confirmatory testing of specific hypotheses. However, I really enjoy data-driven approaches, and if something in the data "jumps into your face," I think discussing it carefully as a post-hoc finding (not an a priori hypothesis) that needs to be confirmed and tested in future studies makes it a different approach. In line with your recommendations, I would then recommend methods other than the GLM. EDIT: I know of one case where it might be appropriate: studying cortisol, for example, is quite difficult because it is influenced by many demographic, chronobiological, and "environmental" variables (e.g., smoking, taking contraceptives). These models sometimes appear rather complex, but again, I would not recommend a multiple regression in this regard.

  • @tzvetanzlatanov6349
    2 years ago

    Very useful, keep on with the random thoughts :)!

  • @bernadettblummer108
    1 year ago

    As a 3rd-year student, I think these videos and thoughts sometimes clarify things even more than a lecture would, because lectures are tied to the curriculum and a narrow topic area. When you have the time to ponder and explain in a broader sense, you let students in on the bigger picture, parts of which they likely have not been exposed to yet, so they could not possibly see it for themselves.

  • @MKhan-zo8xo
    2 years ago

    This is cool, I enjoy this type of video!

  • @fisnikzogaj1442
    2 years ago

    If you really want to check for confounding influences, throwing in some extra variables (more than just the ones for the hypothesis) might be OK. Of course, there shouldn't be too many of them (inflating R squared, etc.), but we have metrics for that (AIC and BIC, for example). Especially in the social sciences this might be a thing, but yeah... I 100% agree with you on the two paradigms: going from a hypothesis to the model is preferable in my eyes, rather than the other way around.

  • @QuantPsych
    2 years ago

    Yep. But also see my reply to @AlpsRootnote.

  • @hypercortical7772
    2 years ago

    In research, I see specific hypotheses examined with regressions using several predictors for the sake of attempting to control for those variables.

  • @galenseilis5971
    2 years ago

    The video doesn't define what a "hypothesis" is. My impression from the examples given is that a "hypothesis" refers to statements about which variables affect other variables, and might include forms of conditional independence such as interaction terms. These statements about what affects what can be formalized into structural causal models, but they can also be used as guiding background information in a machine learning context for deciding which variables to include in a model.

  • @Saynotoclipontiescch
    1 year ago

    When anyone asks me for stats help my first question is ALWAYS "So what is your hypothesis?"

  • @danilomoggia
    2 years ago

    What's the problem with that? As you said, it's a data-mining perspective, or a bottom-up approach: from the data to the theory or model. You don't need hypotheses if your study is exploratory and you are working with archival datasets, for example, or if you are only interested in prediction (as in finance or economics; e.g., a bank wants to know whether a client will be able to repay a loan in the long term and doesn't care about the explanation or the hypotheses behind the predictors). Additionally, you can combine methods. For instance, use a random forest to select the most important variables and then enter them into a regression analysis. You can configure the regression model's method to retain or drop variables.

  • @beaumartin8132
    2 years ago

    In either case, you should probably be using some variant of a Bayesian network to ensure that all the relevant interaction effects are detected and conditioned upon. Underspecification is a real problem, and it's entirely avoidable.

  • @bikinibottom2100
    1 year ago

    Don't just throw in regressors hoping for R² to increase. As someone said, perfection is reached not when there is nothing left to add, but when there is nothing left to remove.

  • @QuantPsych
    1 year ago

    I like that!
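The comment above about throwing in regressors to push R² up can be quantified. Under standard assumptions (OLS with an intercept, normally distributed errors, and k predictors that are unrelated to the outcome), the in-sample R² follows a Beta(k/2, (n - k - 1)/2) distribution, so its expected value is k/(n - 1) even though every predictor is noise. Adjusted R² is designed to cancel that expected inflation. A minimal sketch with illustrative helper names:

```python
def expected_null_r2(n: int, k: int) -> float:
    """Mean in-sample R^2 for k pure-noise predictors fit to n observations: k / (n - 1)."""
    return k / (n - 1)


def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2)(n - 1) / (n - k - 1); requires n > k + 1."""
    if n <= k + 1:
        raise ValueError("adjusted R^2 needs n > k + 1")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


n = 50
for k in (3, 10, 20):
    print(k, round(expected_null_r2(n, k), 3))
# 3 0.061
# 10 0.204
# 20 0.408

# At the expected null R^2, the adjustment brings the value back to (numerically) zero.
print(all(abs(adjusted_r2(expected_null_r2(n, k), n, k)) < 1e-12 for k in (3, 10, 20)))  # True
```

With 50 observations, twenty useless regressors still "explain" about 41% of the variance on average, which is exactly the trap the comment warns about.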

  • @shawnabeese5146
    2 years ago

    Loved off the cuff!

  • @pedropequeno7353
    11 months ago

    Amazing

  • @writtenlike
    2 years ago

    I tend to specify hypotheses (involving 1 or 2 predictors and 1 outcome variable), but then I ‘need’ to add several control variables (based on previous literature), leaving me with 10-15 predictors. I get peer-reviewed papers published like that, so it seems to be an “OK” practice, but I’d love to hear your thoughts on adding lots (10+) of control variables.

  • @QuantPsych
    2 years ago

    I see a few problems with that:

    1. You're more likely to violate the assumption of homogeneity of regression. Regression models assume there are no unmodeled interactions. It's very possible that your predictor(s) interact with at least one of those control variables, and the more you add, the more likely it is that you will have an unmodeled interaction (and thus a bad model).

    2. You might accidentally condition on a collider. See this page for more information: www.the100.ci/2017/03/14/that-one-weird-third-variable-problem-nobody-ever-mentions-conditioning-on-a-collider/

    3. You might actually "control" for the very thing you're trying to model. I wish I had time to expand on this (and I wish I could find the article that once taught me about this), but the idea is that sometimes conditioning on things removes the very effect you're trying to model. Maybe I'll make a video about this in the future.
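Point 2 in the reply above, conditioning on a collider, is easy to reproduce in a simulation. A minimal sketch, assuming X and Y are independent standard normal variables and C = X + Y + noise is caused by both: regressing Y on X alone gives a slope near 0, but adding C as a "control" produces an X coefficient near -0.5 (the population value for this particular made-up setup), an association created entirely by the adjustment. The data-generating process and variable names are invented for illustration.

```python
import random
from statistics import fmean


def simple_slope(x, y):
    """OLS slope of y on x, with an intercept."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def ols_two(x1, x2, y):
    """OLS slopes (b1, b2) for y ~ 1 + x1 + x2, via the two-predictor closed form."""
    m1, m2, my = fmean(x1), fmean(x2), fmean(y)
    d1 = [a - m1 for a in x1]
    d2 = [a - m2 for a in x2]
    dy = [a - my for a in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(a * a for a in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * b for a, b in zip(d1, dy))
    s2y = sum(a * b for a, b in zip(d2, dy))
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


rng = random.Random(42)
n = 5000
x = [rng.gauss(0, 1) for _ in range(n)]
y = [rng.gauss(0, 1) for _ in range(n)]  # Y does not depend on X at all
c = [a + b + rng.gauss(0, 1) for a, b in zip(x, y)]  # C is a common effect of X and Y

b_marginal = simple_slope(x, y)        # Y ~ X
b_x_given_c, b_c = ols_two(x, c, y)    # Y ~ X + C ("controlling" for the collider)
print(round(b_marginal, 2))   # close to 0
print(round(b_x_given_c, 2))  # close to -0.5
```

Nothing in the data links X to Y; the negative coefficient appears only because C was added to the model.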

  • @menacetocommunity
    2 years ago

    @QuantPsych please do a video on point 3. Thanks for a great video!

  • @StatisticsSupreme
    2 years ago

    A video about why regression is bad for EDA would be cool. You made some points on this in one of your forest vids, but I guess there is more to infer from your 3 points above.

  • @galenseilis5971
    2 years ago

    You should be careful what you control for. See: ftp.cs.ucla.edu/pub/stat_ser/r493.pdf

  • @glennschexnayder3720
    2 years ago

    @StatisticsSupreme I didn’t even know he had forest vids. I guess that might help me out.

  • @TheBjjninja
    2 years ago

    The problem with random forests is that you can't get coefficients and standard errors, right? Has anyone resolved that with SHAP or LIME? With mixed models, you can use the coefficients to optimize marketing policy.

  • @vinithams6003
    2 years ago

    Hello Sir, your videos are amazing, and thank you so much for them. I am struggling with an issue in a GLMM; I'm actually from the biology field. Can I send you an email about my questions? Thanks

  • @erickcampos50
    2 years ago

    Is there a way to implement random forests on Jamovi?

  • @QuantPsych
    2 years ago

    I'm not sure. But you can in JASP!

  • @anasbitar1270
    3 months ago

    What if I want to do an exploratory analysis to test the independent effect of each variable while controlling for the other variables? In this case I don't care about the whole model's ability to predict the outcome; I want the effect of each independent variable. Wouldn't multiple regression make sense that way?

  • @QuantPsych
    3 months ago

    You could, but you risk overfitting. Ensemble methods are better for this (e.g., random forest models)

  • @anasbitar1270
    3 months ago

    @QuantPsych Thanks, my guy.

  • @OskarBienko
    1 year ago

    I don't fully agree. I've seen a lot of papers (health and econometrics) with a bouquet of explanatory variables, and the reason is that the researchers wanted to control for a number of confounders in their regressions, so it's okay, I guess.

  • @QuantPsych
    4 months ago

    There may be occasions where it makes sense, it just makes it a lot harder to make sure your model isn't violating the homogeneity of regression assumption.
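The homogeneity-of-regression concern raised in the reply above can be illustrated in two steps. First, with k predictors there are k(k - 1)/2 possible two-way interactions, so each added control variable adds more interactions that could go unmodeled. Second, when an interaction is real, an additive model reports a single slope that describes neither group. A minimal simulation, assuming a binary moderator Z where the true slope of X is 1 when Z = 0 and 3 when Z = 1; all names and effect sizes are invented for the demonstration.

```python
import random
from statistics import fmean


def n_two_way_interactions(k: int) -> int:
    """Number of distinct predictor pairs that could interact."""
    return k * (k - 1) // 2


def simple_slope(x, y):
    """OLS slope of y on x, with an intercept."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def ols_two(x1, x2, y):
    """OLS slopes (b1, b2) for y ~ 1 + x1 + x2, via the two-predictor closed form."""
    m1, m2, my = fmean(x1), fmean(x2), fmean(y)
    d1 = [a - m1 for a in x1]
    d2 = [a - m2 for a in x2]
    dy = [a - my for a in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(a * a for a in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * b for a, b in zip(d1, dy))
    s2y = sum(a * b for a, b in zip(d2, dy))
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


def subset(values, groups, level):
    """Entries of values whose group indicator equals level."""
    return [v for v, g in zip(values, groups) if g == level]


rng = random.Random(3)
n = 4000
x = [rng.gauss(0, 1) for _ in range(n)]
z = [1 if rng.random() < 0.5 else 0 for _ in range(n)]
# True model: the slope of X is 1 when z == 0 and 3 when z == 1 (an X-by-Z interaction).
y = [a + b + 2 * a * b + rng.gauss(0, 1) for a, b in zip(x, z)]

slope_z0 = simple_slope(subset(x, z, 0), subset(y, z, 0))
slope_z1 = simple_slope(subset(x, z, 1), subset(y, z, 1))
b_x_additive, _ = ols_two(x, z, y)  # additive model y ~ x + z, no interaction term

print([n_two_way_interactions(k) for k in (3, 10, 15)])  # [3, 45, 105]
print(round(slope_z0, 1), round(slope_z1, 1), round(b_x_additive, 1))  # roughly 1, 3, 2
```

Here the additive model's X coefficient lands between the two true slopes (near 2), the kind of misleading summary an unmodeled interaction produces. With 10 to 15 predictors there are dozens of such pairs to worry about.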

  • @galenseilis5971
    1 year ago

    I think if a student asked me "here, what d'ya think, how'd I do?" I would first be tempted to ask: What were you trying to do? Often there is an XY problem lurking nearby. en.wikipedia.org/wiki/XY_problem