Are nonparametric statistics useless?

This is the video that started this incident: • Robustness in Statistics
Here's a link to the cross-validated question: stats.stackexchange.com/quest...
My favorite alternatives to nonparametric statistics are generalized linear models: • Understanding Generali...
I mentioned that I believe nature behaves lawfully. As I said, other statisticians feel otherwise. This book is an excellent introduction to the different philosophies of science: www.amazon.com/What-This-Thin...
It's been a bit since I've read it, but I believe the position I subscribe to is called "realist." There are also "anti-realists," who are unconcerned with whether a model is "true"; they're more concerned with whether it is *useful*. I suppose my opinion wouldn't change if I were an anti-realist: maybe nature doesn't behave lawfully, but it seems that using models has been most useful.
Link about EDA versus CDA: • Ethics in Statistics P...
My Multivariate playlist: • Multivariate Statistics
And here's a paper I wrote about my eight step approach to data analysis: psyarxiv.com/r8g7c/
Undergraduate curriculum playlist (GLM-based approach): kzread.info?list...
Graduate curriculum playlist (also GLM-based approach): kzread.info?list...
Exonerating EDA paper: psyarxiv.com/5vfq6/
Download JASP (and visual modeling module): www.jasp-stat.org

Comments: 37

  • @TheBjjninja
    4 months ago

    Hey! I always loved the background music

  • @TheBjjninja
    4 months ago

    Dr. Fife 'won't stop, can't stop'

  • @vazquez-borsetti
    1 year ago

    congrats for your paper!!!

  • @StatisticsSupreme
    1 year ago

    But what if your variable is ordinal? Are ranks not the best way to model ordinal data?

  • @QuantPsych
    1 year ago

    Good point! I hadn't thought of that.

  • @toad8427
    1 year ago

    The loud music bit, the “oh snap” 😂😂

  • @dimitrioskioroglou4316
    1 month ago

    I totally agree with you... ranks are not that useful. The way I think about it is that ranks result from an underlying latent process. We need to understand and properly model the process, not the ranks, which represent a snapshot. It is not the easiest thing to do. But it is better to try tricky stuff than to chase ghosts.

  • @galenseilis5971
    1 year ago

    Thanks for elaborating on your perspective, Dustin. I'll be happy to respond in time. Hopefully not an entire year later though! ;-) I'll post something back here to ping you when I have posted a response.

  • @QuantPsych
    1 year ago

    Deal :)

  • @olgierd245
    11 months ago

    I wonder what the response would be

  • @galenseilis5971
    11 months ago

    @@olgierd245 I am working on a response when time allows.

  • @galenseilis5971
    9 months ago

    @@QuantPsych Hmm, well it took almost a year... Somehow the time slips away. I've put a response on my blog. I won't post it here because I think KZread will automatically delete it anyway, but it should be easy to find. I'll also try to send the link to your Rowan University email.

  • @galenseilis5971
    9 months ago

    @@olgierd245 The response can be found on my blog.

  • @seejendo3290
    1 year ago

    I officially need a video from you on what you’re referring to when you say “convergence issues” - the internet is not explaining this well for non-math folk. Pretty please?

  • @seejendo3290
    1 year ago

    And maybe just some examples of rank based models and other non-parametric models, how they’re supposed to be used, and how they’re used badly in the wild.

  • @QuantPsych
    1 year ago

    It's on my to-do list :)

  • @AbdullahN8
    1 year ago

    Thanks for the insight. Can you give practical real-life examples in R of when nonparametric methods are routinely used in biostatistics, and of when to use robust methods, loess, or random forests in those situations?

  • @QuantPsych
    1 year ago

    I'm not sure that loess/robust/RF are the alternatives for the situations I'm talking about. I would probably do something like a gamma regression model instead of a Mann-Whitney. But I'll think about doing another video.

  • @mahmoudhamza6765
    1 year ago

    Thanks a lot for the videos in general. I am starting to become addicted to your channel. Also, thanks a million for the open discussion. I have a few questions, but I will briefly describe what I think first. Please correct me if I am wrong.

    First, using the normal distribution assumption is very tempting, since we have an arsenal of classical tests that are based on it. When the assumption is violated, we have multiple options:

    1- Use the central limit theorem to assume the normality of the sampling distribution. This has limitations:
       a- It is limited to certain statistics, such as means and proportions.
       b- It needs a large enough sample size. That's vague. Sometimes a sample as large as 500 observations is not enough with highly skewed data or extreme outliers.
    2- Nonparametric rank tests (problem: not modeling the data anymore).
    3- Bootstrapping. I mean here that, for a test of the difference between two groups, we can create a distribution of the mean/median/SD/variance based on each sample and then compare these distributions.
    4- Other methods: GLM, quantile regression, robust statistics.

    My questions are:

    1- When is bootstrapping not enough for comparing differences between groups?
    2- For the other methods in item 4, which one do you use/prefer? A video/resource/thoughts would be highly appreciated.
    3- This package in R, www.danieldsjoberg.com/gtsummary/, is saving me a lot of time. It can very easily format summary stats and models as elegant tables in R. However, it uses nonparametric rank tests as the default for comparing groups in **table1 summary stats**. Is this acceptable for descriptive statistics (the table 1 of patients' characteristics) in a statistical analysis?

    Sorry for the long comment

  • @QuantPsych
    1 year ago

    A couple comments:

    re: central limit theorem. Yes, technically, models are quite robust to normality violations (because of the CLT). But they're not robust to nonlinearity. Unfortunately, nonlinearity and non-normality go hand in hand.

    re: bootstrapping. Modern robust methods use bootstrapping to estimate probabilities. But, again, that's not a model. I think it's fine for a quick and dirty estimation method, but it's probably better to find the right model. I use generalized linear models. I believe there's a playlist on my channel for those.

    re: gtsummary. Looks like a cool package for preparing tables. I'll have to check it out. If I understand your question, I don't think it's a problem to do rank tests for a basic demographics table. That's not really your model. I do gripe about doing tests of these, but not because of the type of model chosen. I'm more concerned about people getting distracted from what the actual paper is about.

  • @mahmoudhamza6765
    1 year ago

    @@QuantPsych That was insightful. Thanks!
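
The bootstrap comparison described in option 3 of the question above can be sketched roughly as follows. This is only an illustration in plain Python, not something shown in the video: the function name `bootstrap_mean_diff`, the made-up skewed data, and the choice of a percentile interval are all assumptions made for the example.

```python
import random
import statistics


def bootstrap_mean_diff(group_a, group_b, n_boot=5000, seed=0):
    """Percentile bootstrap interval for mean(group_b) - mean(group_a)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        # Resample each group with replacement, keeping its original size.
        resample_a = rng.choices(group_a, k=len(group_a))
        resample_b = rng.choices(group_b, k=len(group_b))
        diffs.append(statistics.fmean(resample_b) - statistics.fmean(resample_a))
    diffs.sort()
    # Take the 2.5th and 97.5th percentiles of the resampled differences.
    lower = diffs[int(0.025 * n_boot)]
    upper = diffs[int(0.975 * n_boot) - 1]
    observed = statistics.fmean(group_b) - statistics.fmean(group_a)
    return observed, lower, upper


# Made-up, right-skewed illustrative data (hypothetical, not from the thread).
control = [1.2, 0.8, 2.5, 1.1, 0.9, 7.4, 1.6, 1.3, 0.7, 3.2]
treated = [2.1, 1.9, 4.8, 2.4, 1.7, 9.9, 3.0, 2.2, 1.5, 5.1]

observed, lower, upper = bootstrap_mean_diff(control, treated)
print(f"difference in means: {observed:.2f}")
print(f"95% percentile bootstrap interval: [{lower:.2f}, {upper:.2f}]")
```

As the reply points out, this yields an interval for one summary statistic rather than a model of how the data were generated, which is the distinction the thread is drawing.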

  • @MikkoHaavisto1
    1 year ago

    how had you never heard of ordinal variables? that is stuff for the first statistics course...

  • @pedropequeno7353
    11 months ago

    Thanks for putting my thoughts into words, maybe I am not going crazy

  • @user-qy9fc9kb5x
    4 months ago

    Can I get some insight? I am kind of desperate, and it seems that the Wilcoxon test is my only way out. I obtained a dataset from a pre-post intervention, with n=10 and no control group. The measurement was conducted using a scale ranging from 0 to 12 to assess the outcome of a physical test before and after the intervention. The objective is to determine if there is a difference due to the intervention. I conducted a Wilcoxon test for paired data, which yielded a significant result - a good start. However, no linear model met the assumptions (not surprising). I attempted a Huber regression, but it didn't yield any changes in the outcome. I also tried modeling the difference between post- and pre-intervention scores, and then dichotomizing them into 1 (improved) and 0 (not improved). However, it appears that I lost information, as the result turned out to be non-significant. Thus, it seems to me that the only analysis I can perform that adequately accounts for paired data and a small sample size, and doesn't rely on assumptions of normality, is the Wilcoxon test.

  • @QuantPsych
    4 months ago

    Have you plotted your data? Don't use statistical tests to determine if you've violated normality. Look at the plot and see if the fitted line passes through the data. If it doesn't, then you can use generalized linear models instead of a Wilcoxon.
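
For readers unsure what the paired Wilcoxon test in the question above actually computes, here is a minimal sketch in plain Python. It ranks the absolute pre-post differences and gets an exact two-sided p-value by enumerating every possible assignment of signs. The helper names `midranks` and `signed_rank_exact` and the ten pre/post scores are hypothetical; they are not the commenter's data, and this is not a substitute for a tested library routine.

```python
from itertools import product


def midranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def signed_rank_exact(pre, post):
    """Signed-rank statistic W+ and an exact two-sided p-value by enumeration."""
    # Zero differences carry no sign information and are dropped.
    diffs = [b - a for a, b in zip(pre, post) if b != a]
    ranks = midranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    center = sum(ranks) / 2  # expected W+ if each sign were a coin flip
    extreme = 0
    total = 0
    # Under the null, every pattern of signs is equally likely.
    for signs in product((0, 1), repeat=len(ranks)):
        w = sum(r for r, s in zip(ranks, signs) if s)
        total += 1
        if abs(w - center) >= abs(w_plus - center) - 1e-12:
            extreme += 1
    return w_plus, extreme / total


# Hypothetical scores on a 0-12 scale for ten participants.
pre = [3, 5, 4, 6, 2, 7, 5, 4, 6, 3]
post = [5, 6, 4, 9, 4, 8, 9, 5, 5, 6]

w_plus, p_value = signed_rank_exact(pre, post)
print(f"W+ = {w_plus}, exact two-sided p = {p_value:.4f}")
```

Note that only the ranks of the differences enter the calculation: a change of 4 points and a change of 40 points would contribute identically if they held the same rank, which is the sense in which the test does not model the scores themselves.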

  • @StatisticsSupreme
    1 year ago

    To me ranks are models - not transformations. With a transformation you can go back to the original data, even if you have lost the original data, because you have a transformation formula. With ranks, once you have lost the original data, you cannot go back. Same with other models.

  • @QuantPsych
    1 year ago

    I'm not sure what you mean. Ranks are models even though you can untransform them?

  • @StatisticsSupreme
    1 year ago

    @@QuantPsych Ranks are models. They are not transformations. Because you can_not untransform them.
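
The claim in this exchange that ranks cannot be undone can be checked with a small sketch. The Python below contrasts a log transform, which can be inverted, with ranking, which sends two very different samples to the same output. The helper `to_ranks` and both samples are made up for illustration.

```python
import math


def to_ranks(values):
    """Replace each value by its rank (1 = smallest); assumes no ties."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]


# Two hypothetical samples with very different spacing between values.
sample_a = [1.0, 2.0, 3.0, 100.0]
sample_b = [0.5, 0.6, 0.7, 0.8]

# A log transform has an inverse, so the original values can be recovered
# (up to floating-point error) from the transformed ones alone.
logged = [math.log(v) for v in sample_a]
recovered = [math.exp(v) for v in logged]

# Ranking maps both samples to the same output, so no inverse can exist.
ranks_a = to_ranks(sample_a)
ranks_b = to_ranks(sample_b)

print("recovered from logs:", recovered)
print("ranks of sample_a:", ranks_a)
print("ranks of sample_b:", ranks_b)
```

Whether that loss of information makes ranking a "model" rather than a "transformation" is the terminological question the two commenters are debating; the sketch only shows the non-invertibility itself.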

  • @pipertripp
    1 year ago

    So are we mostly talking about using models for explanation or maybe inference vs prediction here? I'm guessing that you're more interested in trying to explain a phenomenon mathematically, and so the nonparametric models like RF aren't super useful b/c they don't yield something that explains the phenomenon with a closed-form expression like a GLM would? Sorry, I'm really new to statistics and this is over my head right now, but definitely interesting.

  • @QuantPsych
    1 year ago

    I suppose that's a fair assessment. Yes, if you're just doing prediction, maybe parametric models don't matter as much.

  • @dryinpan9860
    1 year ago

    You know what would really show Galen? Some forecasting methods in FLEXPLOT... I'm so sorry, I just want to see it.

  • @QuantPsych
    1 year ago

    Persistent one, aren't you :) You can file a feature request on github. It's been over 15 years since I've done any forecasting, but maybe it won't be too hard to modify flexplot to handle that.

  • @dryinpan9860
    1 year ago

    @@QuantPsych Oow sorry to post this in the wrong area. It doesn't HAVE to be in flexplot. It would be great to see you do a series on forecasting in R and working with time series data! Thank you again for all your teachings.

  • @ikitoki
    4 months ago

    You make me feel guilty for using rank-based, non-parametric tests in the past. But I did not know any better. This is what they taught me to use when the sample groups are so small that I can't test for the normality of the distribution. They also told me that in general, non-parametric tests are less powerful than parametric ones, so I thought it would be better to use a less powerful test and only report the most significant results. I actually thought I was being conservative in using rank-based, non-parametric tests.

  • @QuantPsych
    4 months ago

    I used to use those a lot too. I don't know that I'd go so far as saying they're bad, they're just not modeling the data. I prefer to model the data.

  • @JakeCo-pf6ty
    1 year ago

    I suppose this would be less about the model and more about the inferential procedure, but nonparametric methods like the bootstrap can be quite useful and, in some cases, not a pit stop but the end goal (or the best general test for a certain quantity). Think of mediation models and the indirect effect: the product of regression coefficients is typically not normal (except in very large samples), so the bootstrap serves as a good-to-great alternative that won't break down when methods like the Sobel test do. (That isn't to say there isn't a parametric procedure for this: you could look at the regression coefficients jointly, or use PRODCLIN, developed by MacKinnon for the product of the coefficients. But these methods would break down when assumptions are violated all the same.) In other words, are theoretical sampling distributions always (or ideally) better than empirical sampling distributions that don't try to force a form or shape onto a particular problem? I would say no, but I can see your perspective.
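
The bootstrap for an indirect effect mentioned in this comment can be sketched as below, under stated assumptions: a single mediator, ordinary least squares for both regressions, simulated data, and a plain percentile interval. The function names and the simulated coefficients (0.5, 0.4, 0.2) are hypothetical choices for the example, not values from the video or the comment.

```python
import random
import statistics


def cov(x, y):
    """Sample covariance of two equal-length lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def indirect_effect(x, m, y):
    """a*b, where a is the slope of M ~ X and b is the M slope in Y ~ X + M."""
    sxx, smm = cov(x, x), cov(m, m)
    sxm, sxy, smy = cov(x, m), cov(x, y), cov(m, y)
    a = sxm / sxx
    # Partial slope of M from the two-predictor least-squares solution.
    b = (smy * sxx - sxm * sxy) / (sxx * smm - sxm ** 2)
    return a * b


def bootstrap_indirect(x, m, y, n_boot=2000, seed=0):
    """Percentile bootstrap interval for a*b, resampling whole cases."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(
            indirect_effect([x[i] for i in idx], [m[i] for i in idx], [y[i] for i in idx])
        )
    estimates.sort()
    return estimates[int(0.025 * n_boot)], estimates[int(0.975 * n_boot) - 1]


# Simulated data with hypothetical coefficients: M = 0.5*X + noise,
# Y = 0.4*M + 0.2*X + noise.
data_rng = random.Random(1)
x = [data_rng.gauss(0, 1) for _ in range(100)]
m = [0.5 * xi + data_rng.gauss(0, 1) for xi in x]
y = [0.4 * mi + 0.2 * xi + data_rng.gauss(0, 1) for xi, mi in zip(x, m)]

estimate = indirect_effect(x, m, y)
lower, upper = bootstrap_indirect(x, m, y)
print(f"indirect effect a*b = {estimate:.3f}")
print(f"95% percentile bootstrap interval: [{lower:.3f}, {upper:.3f}]")
```

Because the interval comes from the resampled products themselves, it is free to be asymmetric around the estimate, which is the property the comment contrasts with a normal-theory test of the product.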