Validating K-means cluster analysis in SPSS

In this video I show and explain how to determine the appropriate and valid number of clusters to extract in a k-means cluster analysis.

Comments: 30

  • @009kishor
    @009kishor 6 years ago

    Very helpful video 👍🏻

  • @zhalehmohammadalipour3542
    @zhalehmohammadalipour3542 2 years ago

    Great tutorial! It helped a lot. Thanks.

  • @kanika8123
    @kanika8123 3 years ago

    Thanks a lot. Very helpful video.

  • @thanghoang1944
    @thanghoang1944 3 years ago

    THANK YOU!

  • @nataliegillepiegaskins
    @nataliegillepiegaskins 2 years ago

    Thank you for this! Nice last name!

  • @Ana-zi4mk
    @Ana-zi4mk 8 years ago

    Hi, James. Thank you for this video. I also watched your other video on K-means cluster analysis in SPSS, where you mentioned: "If we can't converge in 10 iterations, then we probably don't have good data for clustering." I am trying to learn how to do cluster analysis using some of my own data. I followed your suggestions on how to determine the number of clusters and how to validate them, and ran k-means cluster analyses specifying 2, 3, 4 and 5 clusters. For the 3-cluster solution, all post hoc tests in the Multiple Comparisons table were significant, but it took 14 iterations to reach 0.000 for all three clusters. For the 4-cluster solution, on the other hand, 0.000 was reached for all clusters in 10 iterations, but in the Multiple Comparisons table two clusters were not significantly different on a few variables. What is your opinion: is my data not suitable for cluster analysis?

  • @Gaskination

    @Gaskination

    8 years ago

    +Ana It might be suitable. The more variables you include, the harder it is to converge. So, if there are lots of variables, then more than 10 iterations is fine. I don't know if there is a published threshold or guideline.
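To make the idea of "iterations to convergence" concrete, here is a minimal sketch of the standard k-means loop (Lloyd's algorithm) in plain Python that reports how many iterations it used. It is not SPSS's implementation: the function name `kmeans`, the random choice of starting centers, and the stopping rule (no point changes cluster) are assumptions made for illustration, so the counts will not match SPSS's iteration history exactly.

```python
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centers, iterations_used)."""
    rng = random.Random(seed)
    # Assumption: start from k randomly chosen data points.
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for iteration in range(1, max_iter + 1):
        # Assign each point to its nearest center (squared Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: sum((x - m) ** 2 for x, m in zip(p, centers[c])))
            for p in points
        ]
        if new_labels == labels:
            # No assignment changed, so the centers would not move: converged.
            return labels, centers, iteration
        labels = new_labels
        # Recompute each center as the mean of its members (keep old center if empty).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers, max_iter

# Toy data: two well-separated groups in two variables.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers, iterations = kmeans(data, k=2)
print(labels, iterations)
```

Running this on your own exported data with different numbers of variables is one way to see how the iteration count behaves, but it remains a rough check rather than a published threshold.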

  • @eboamuah6811

    @eboamuah6811

    2 years ago

    @Gaskination Hi James. Your work has been very helpful. I have read about the silhouette as a method of validation in K-means cluster analysis. However, I don't know how to obtain it in SPSS. Is there any index in SPSS that can be used to validate the number of clusters chosen in K-means cluster analysis? Thank you

  • @Gaskination

    @Gaskination

    2 years ago

    @eboamuah6811 Silhouette is used in two-step cluster analysis in SPSS, but I don't know of a way to produce it for K-means.
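One possible workaround, outside SPSS, is to save the cluster membership from the K-means run, export the data, and compute the silhouette yourself. The sketch below is a plain-Python version of the standard silhouette formula, s = (b - a) / max(a, b), using Euclidean distance. The function name `silhouette_scores` and the toy data are hypothetical, and it needs at least two clusters.

```python
from math import dist

def silhouette_scores(points, labels):
    """Per-point silhouette s = (b - a) / max(a, b), using Euclidean distance."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # convention: a singleton cluster scores 0
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the distance from p to itself is 0, so dividing by len - 1 is correct).
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(dist(p, q) for q in members) / len(members)
            for other, members in clusters.items() if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores

# Toy data: 'membership' stands in for the cluster variable saved by SPSS.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
membership = [1, 1, 1, 2, 2, 2]
scores = silhouette_scores(data, membership)
print(sum(scores) / len(scores))  # average silhouette; closer to 1 is better
```

Comparing the average silhouette across the 2-, 3-, 4- and 5-cluster solutions gives one more criterion alongside the ANOVA and post hoc checks. If the variables were standardized before clustering, the same standardized values should be used here.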

  • @najeebullahahmadzai5160
    @najeebullahahmadzai5160 22 days ago

    Thank you sir!

  • @jdemontre
    @jdemontre 3 years ago

    Hey James, I enjoy your videos, especially the ones about SEM and now cluster analysis. Thank you! I ran my data and everything went well (10 variables and ca. 100 observations). The 3-cluster solution was the best on all criteria. But the Bonferroni test was not significant in 2 (out of 60) comparisons (p-value slightly higher than 0.1). Does that mean the solution was not validated?

  • @Gaskination

    @Gaskination

    3 years ago

    If it is just 2 out of 60 comparisons, then this is strong evidence that it is a good clustering solution. Nice!

  • @mayurgo10
    @mayurgo10 7 years ago

    My data contains 900 observations and I tried the k-means method. The data converges at 15 iterations for the 4-cluster solution and 16 iterations for the 10-cluster solution. Can you suggest a good test to check which cluster solution would be better?

  • @Gaskination

    @Gaskination

    7 years ago

    Check the AIC or BIC if that is an option. You want to minimize these. Also, check to see which solution is more helpful. Usually 3-5 clusters is most useful and anything more than 5 begins to be difficult to interpret or distinguish.

  • @marcelbeermann1036
    @marcelbeermann1036 4 years ago

    Thanks for the video. How can I see if a cluster actually is underrepresented?

  • @Gaskination

    @Gaskination

    4 years ago

    It's just a subjective judgment. If the sample size of the cluster is small, then perhaps it is under-represented. You can see what the profile of members of that cluster looks like to determine if it is a legitimate cluster, or just an odd outlier.

  • @kieramillar-brandt2854
    @kieramillar-brandt2854 3 years ago

    Hi James, thanks for this video. Is there a paper that can be referenced to support that a lower number of iterations is better? Or maybe a paper that indicates best practice in general for reporting the results of k-means clustering? Many thanks. Kiera

  • @Gaskination

    @Gaskination

    3 years ago

    Chapter nine of Hair et al 2010 ("Multivariate Data Analysis") is all about clustering methods.

  • @kieramillar-brandt2854

    @kieramillar-brandt2854

    3 years ago

    @Gaskination Thanks very much. That's really appreciated. Your videos are great!

  • @henrypritchard4911
    @henrypritchard4911 4 years ago

    Hi James, this has been very helpful, so firstly thank you! I was wondering if there is a way to validate/find a statistical difference between two clusters, since post hoc tests in a one-way ANOVA cannot be performed on fewer than 3 groups/clusters of data? Kind regards, Henry

  • @Gaskination

    @Gaskination

    4 years ago

    You can just use a t-test instead.
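For a two-cluster solution, the comparison on each variable reduces to a two-sample t-test between the clusters. As a rough sketch of the arithmetic, the snippet below computes Welch's unequal-variance t statistic and its approximate degrees of freedom with the Python standard library. Choosing Welch's version is an assumption on my part (it does not require equal variances in the two clusters), and the function name `welch_t` and the sample values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(group_a, group_b):
    """Return (t, df) for Welch's unequal-variance t-test on two samples."""
    n_a, n_b = len(group_a), len(group_b)
    # Squared standard error contributed by each group (sample variance / n).
    se_a = variance(group_a) / n_a
    se_b = variance(group_b) / n_b
    t = (mean(group_a) - mean(group_b)) / sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t, df

# Hypothetical scores on one clustering variable, split by cluster membership.
cluster_1 = [1, 2, 3, 4, 5]
cluster_2 = [6, 7, 8, 9, 10]
t, df = welch_t(cluster_1, cluster_2)
print(t, df)
```

The standard library has no t distribution, so this stops at t and df. If SciPy is available, `scipy.stats.ttest_ind(cluster_1, cluster_2, equal_var=False)` runs the same test and also returns the p-value. In SPSS, the Independent-Samples T Test output reports both the equal-variances and the unequal-variances rows.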

  • @henrypritchard4911

    @henrypritchard4911

    4 years ago

    @Gaskination Thank you!

  • @henrypritchard4911

    @henrypritchard4911

    4 years ago

    @Gaskination Hi James, I am sorry to be a pain with another question. I was also wondering why, in these instances, there is no need to test for normality of distribution before performing the ANOVA with post hoc tests? Thank you in advance and kind regards, Henry

  • @Gaskination

    @Gaskination

    4 years ago

    @henrypritchard4911 Normality of distribution is not required for cluster membership. We really just need sufficient sample size in each group.

  • @shantanuchakrabory5527
    @shantanuchakrabory5527 3 years ago

    K-means cluster analysis using SPSS is a really special one

  • @masharifulamin5682
    @masharifulamin5682 4 years ago

    Hello James, I'm new here. Is it possible to get the dataset to practice? Please share it with us.

  • @Gaskination

    @Gaskination

    4 years ago

    The dataset is available on the homepage of statwiki: statwiki.kolobkreations.com/

  • @statsmadeeasy7233
    @statsmadeeasy7233 1 year ago

    Hi James, can we get a copy of the file that you used? I would like to practice with it.

  • @Gaskination

    @Gaskination

    1 year ago

    It's the burgers dataset available on the homepage of statwiki.gaskination.com/

  • @karlafuentes2726
    @karlafuentes2726 3 years ago

    In Spanish, please