Introduction to Cluster Analysis with R - an Example

Provides an illustration of cluster analysis with R.
R code: github.com/bkrai/Top-10-Machi...
Data file link and more on cluster analysis: • R Programming Live - L...
For citation as reference in a research paper, use:
Meshram, A., and Rai, B. (2019). “User-Independent Detection for Freezing of Gait in Parkinson’s Disease Using Random Forest Classification,” International Journal of Big Data and Analytics in Healthcare, Vol. 4, Issue 1, 57-72.
Rai BK (2017) “Feature Selection and Predictive Modeling of Housing Data Using Random Forest,” International Journal of Social, Behavioral, Educational, Economic, Business and Industrial Engineering, Vol. 11, No. 4, 880-884.
Xiaoling, Lu., Rai, B., Yan, Z., and Li, Y. (2018). “Cluster-based Smartphone Predictive Analytics for Application Usage and Next Location Prediction,” International Journal of Business Intelligence Research, Vol. 9, No. 2, 64-80.
Topics
00:00 Read data file
00:45 Scatter plot
02:30 Data normalization
04:27 Calculate Euclidean distance
05:54 Cluster dendrogram with complete linkage
08:20 Cluster dendrogram with average linkage
08:52 Cluster membership
10:47 Cluster means
12:35 Silhouette plot
13:31 Scree plot
14:47 Non-hierarchical k-means clustering & interpretation
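
A minimal sketch of the steps listed above. It is not the code from the video: the built-in USArrests data stands in for the video's data file, which is not reproduced here, and all object names are illustrative.

```r
# Stand-in data (assumption): built-in USArrests, 50 rows, 4 numeric columns.
dat <- USArrests

# Data normalization: subtract the column mean, divide by the column sd.
m <- apply(dat, 2, mean)
s <- apply(dat, 2, sd)
z <- scale(dat, center = m, scale = s)

# Euclidean distance between every pair of rows.
distance <- dist(z)

# Cluster dendrograms: complete linkage (the hclust default) and average linkage.
hc.c <- hclust(distance)
hc.a <- hclust(distance, method = "average")
plot(hc.c, hang = -1)

# Cluster membership for a 3-cluster cut, compared across the two linkages.
member.c <- cutree(hc.c, 3)
member.a <- cutree(hc.a, 3)
table(member.c, member.a)

# Cluster means in normalized units and in original units.
aggregate(z, list(member.c), mean)
aggregate(dat, list(member.c), mean)

# Non-hierarchical k-means clustering with 3 clusters.
set.seed(123)
kc <- kmeans(z, 3)
kc$size
```

Changing the 3 in cutree() and kmeans() gives solutions with a different number of clusters.
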
Cluster analysis is an important tool for analyzing big data and for working in the data science field.
Machine Learning videos: goo.gl/WHHqWP
Becoming Data Scientist: goo.gl/JWyyQc
Introductory R Videos: goo.gl/NZ55SJ
Deep Learning with TensorFlow: goo.gl/5VtSuC
Image Analysis & Classification: goo.gl/Md3fMi
Text mining: goo.gl/7FJGmd
Data Visualization: goo.gl/Q7Q2A8
Playlist: goo.gl/iwbhnE
R is a free software environment for statistical computing and graphics, and is widely used by both academia and industry. R software works on both Windows and macOS. It was ranked no. 1 in a KDnuggets poll on top languages for analytics, data mining, and data science. RStudio is a user-friendly environment for R that has become popular.

Comments: 1,300

  • @factChecker01
    @factChecker01 5 years ago

    This is an excellent tutorial -- well presented and thorough. I followed along with my own application example (country healthcare per capita expenditure versus infant mortality rates of various types) and got very interesting results.

  • @bkrai

    @bkrai

    5 years ago

    Thanks for comments and feedback!

  • @ArcenisRojas
    @ArcenisRojas 8 years ago

    Great tutorial. I really like how you stuck to explaining the steps through a practical application. Thank you for this.

  • @bkrai

    @bkrai

    3 years ago

    Thanks for comments!

  • @markshanks9142
    @markshanks9142 5 years ago

    This is truly an excellent, clear and concise tutorial. You covered a lot of topics in a short amount of time. I will be watching your other videos. Well done!

  • @bkrai

    @bkrai

    5 years ago

    Thanks for your comments and feedback!

  • @stephenhobbs948
    @stephenhobbs948 7 years ago

    Excellent explanation and code. I took the Johns Hopkins data science course, and clustering was part of the course. This video really helps explain the concept.

  • @bkrai

    @bkrai

    7 years ago

    +Stephen Hobbs thanks 👍

  • @DineshKumarT1990
    @DineshKumarT1990 8 years ago

    Great tutorial!!...the way you explain is easy to understand...you should do more like this

  • @bkrai

    @bkrai

    8 years ago

    Thanks for the feedback!

  • @josebueno7602

    @josebueno7602

    5 years ago

    Please, how can I get the data utilities.csv? Thanks.

  • @karoargote
    @karoargote 4 years ago

    Really thank you so much!!! The best tutorial on this topic!!!

  • @bkrai

    @bkrai

    4 years ago

    You're very welcome!

  • @rarosification
    @rarosification 6 years ago

    My goodness, this video is so complete, and clearly explained with details of the script... Thank you so very much... 100 points to you...!! You have a new fan...

  • @bkrai

    @bkrai

    6 years ago

    Thanks :)

  • @sebastiansocianu5441
    @sebastiansocianu5441 4 years ago

    5-star explanation. thank you! Very much recommended for beginners and intermediate R users. You got a new follower!

  • @bkrai

    @bkrai

    4 years ago

    Awesome, thank you!

  • @ramasamythirunavukkarasu6777
    @ramasamythirunavukkarasu6777 2 years ago

    Thank you so much, Dr. B. Rai. I am inspired by your way of teaching, even online. Hopefully everyone is enjoying your teaching.

  • @bkrai

    @bkrai

    2 years ago

    You are welcome!

  • @janelutken9818
    @janelutken9818 3 years ago

    Thank you so much. This was easy to follow and I did my own analysis as we went along with almost no trouble. This was a breakthrough video for me.

  • @bkrai

    @bkrai

    3 years ago

    You are welcome! For more detailed presentation, you may refer to: kzread.info/dash/bejne/oaieuaWafca8kaQ.html

  • @kanikalungani
    @kanikalungani 6 years ago

    If i had a thousand likes you would have received them all sir. Love the way you have explained and covered the concepts

  • @bkrai

    @bkrai

    6 years ago

    Thanks, I’ll consider it 1000😊

  • @archeops.
    @archeops. 4 years ago

    Fantastic explanation! I followed along with a different dataset and it worked perfectly! Great work!!

  • @bkrai

    @bkrai

    4 years ago

    Thanks for comments!

  • @rupeshbharadwaj
    @rupeshbharadwaj 5 years ago

    Great tutorial! You are really helping a lot of people like me, and the best part is- drama, background music etc are completely missing unlike many other tutorials. Also saw some bhojpuri songs :)...thank you sir!

  • @bkrai

    @bkrai

    5 years ago

    Thanks for comments and feedback!

  • @arnab_jana
    @arnab_jana 8 years ago

    After a long time, I have seen such a good tutorial. Thanks, for your effort

  • @bkrai

    @bkrai

    8 years ago

    +Arnab Jana Thanks for the feedback!

  • @rosestube1233
    @rosestube1233 8 years ago

    Thank you for this tutorial! it's amazingly easy to follow and thanks a lot for the script/file

  • @bkrai

    @bkrai

    8 years ago

    +Roses Tube 👍

  • @harikamacharla7005
    @harikamacharla7005 7 years ago

    Wah!!! how could u explain it so well!! Great job.

  • @bkrai

    @bkrai

    3 years ago

    Thanks!

  • @tradingtraveller05
    @tradingtraveller05 7 years ago

    Thanks for such a wonderful explanation. By the way, I was working on a similar dataset, and apply didn't work for me. Although I removed all character vectors, the numeric vectors were still returning 'NA'. I used sapply instead and it solved the problem. Thanks again!!

  • @bkrai

    @bkrai

    7 years ago

    Good to hear!
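
One common cause of the behaviour described above, shown as a small sketch on a made-up data frame (the commenter's data is not available, so this may not be the exact cause there): apply() converts a data frame to a matrix first, and a single character column turns every cell into text, so mean() returns NA. sapply() works column by column and avoids the conversion.

```r
# Made-up example data (assumption): one text column, two numeric columns.
df <- data.frame(name = c("a", "b", "c"),
                 x = c(1, 2, 3),
                 y = c(10, 20, 30))

# apply() coerces the whole data frame to a character matrix,
# so mean() returns NA for every column (with a warning).
suppressWarnings(apply(df, 2, mean))

# Keep only the numeric columns, then compute means and sds per column.
num <- df[sapply(df, is.numeric)]
m <- sapply(num, mean)
s <- sapply(num, sd)
z <- scale(num, center = m, scale = s)
z
```

Checking str(df) before normalizing shows which columns are not numeric.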

  • @saikrishna2589
    @saikrishna2589 7 years ago

    Thank you for wonderful explanation. Appreciate your help with these amazing videos

  • @bkrai

    @bkrai

    7 years ago

    Thanks for your feedback!

  • @sarahroffe2142
    @sarahroffe2142 5 years ago

    This is a brilliant tutorial which is easy to understand and follow.

  • @bkrai

    @bkrai

    5 years ago

    Thanks for comments!

  • @emiltsenov7853
    @emiltsenov7853 8 years ago

    Hi Bharatendra, this is an excellent tutorial - the first one that worked for me. Great effort, keep up the good work!

  • @bkrai

    @bkrai

    8 years ago

    +Emil Tsenov Good to know, thanks for feedback!

  • @ahmetcandemir7032
    @ahmetcandemir7032 4 years ago

    Very good tutorial ! impressively well explained. Thank you

  • @bkrai

    @bkrai

    4 years ago

    You are welcome!

  • @kapilrana1153
    @kapilrana1153 3 years ago

    Great Explanation! Thank you Sir For this Video Lecture I will be watching your other videos.

  • @bkrai

    @bkrai

    3 years ago

    Thanks and welcome!

  • @metalhealth14
    @metalhealth14 8 years ago

    this is a really great detail thank you! I appreciate the detailed guidance into understanding and checking cluster membership

  • @bkrai

    @bkrai

    8 years ago

    It's good to hear your feedback! Thanks

  • @nafinks6081
    @nafinks6081 7 years ago

    Excellent tutorial! very easy to grasp.

  • @bkrai

    @bkrai

    7 years ago

    +Nafin Ks thanks for the feedback!

  • @mwambakapambwe2382
    @mwambakapambwe2382 5 years ago

    Fantastic presentation. Very helpful

  • @bkrai

    @bkrai

    5 years ago

    Thanks for comments!

  • @Aminah6623
    @Aminah6623 3 years ago

    Wow. This was extremely helpful. Thank you.

  • @bkrai

    @bkrai

    3 years ago

    You're very welcome!

  • @liamhannah6325
    @liamhannah6325 5 years ago

    This was really helpful THANK YOU! Make more! I would love it if you showed us how to do Latent Class Analysis in R, its not obvious right now

  • @bkrai

    @bkrai

    5 years ago

    Thanks for comments and suggestion!

  • @khushboobegwani1612
    @khushboobegwani1612 5 years ago

    Thank you so much sir for informative video. You really made it easy.

  • @bkrai

    @bkrai

    5 years ago

    Thanks for your comments!

  • @kandreitapomen
    @kandreitapomen 7 years ago

    Great tutorial. Thank you very much!

  • @bkrai

    @bkrai

    7 years ago

    +Kandreitapomen 👍

  • @abdulkhader101
    @abdulkhader101 5 years ago

    You are a great teacher sir, you are really awesome

  • @bkrai

    @bkrai

    5 years ago

    Thanks for comments!

  • @fredpoole6373
    @fredpoole6373 5 years ago

    Great Video! Look forward to more videos!

  • @bkrai

    @bkrai

    5 years ago

    Thanks for comments! For more machine learning videos you can use this link: goo.gl/WHHqWP

  • @bassamal-kaaki3253
    @bassamal-kaaki3253 4 years ago

    Lovely explanation:) easy to absorb.

  • @bkrai

    @bkrai

    4 years ago

    Thanks for comments!

  • @gulapakarthik3864
    @gulapakarthik3864 3 years ago

    This is really Amazing...Thank you so much 😎

  • @bkrai

    @bkrai

    3 years ago

    You are welcome!

  • @ssundaraju
    @ssundaraju 5 years ago

    Very informative, great slides and explanations. The delivery and presentation were good. I will be viewing other videos produced by Edureka. Some suggestions: show more examples. Present the limitations and good-fit scenarios for K-means clustering.

  • @bkrai

    @bkrai

    5 years ago

    Thanks for comments and feedback!

  • @txigual
    @txigual 5 years ago

    Thank you so much, very useful video.

  • @bkrai

    @bkrai

    5 years ago

    Thanks for comments!

  • @saikatkar547
    @saikatkar547 3 years ago

    thats really excellent explanation!

  • @bkrai

    @bkrai

    3 years ago

    Glad it was helpful!

  • @omkarsingh6060
    @omkarsingh6060 4 years ago

    Amazing...Really impressed

  • @bkrai

    @bkrai

    4 years ago

    Thanks for comments!

  • @EduardoFrancoChalco
    @EduardoFrancoChalco 7 years ago

    Really great tutorial, thank you very much!

  • @bkrai

    @bkrai

    7 years ago

    +Eduardo Franco Chalco 👍

  • @EduardoFrancoChalco

    @EduardoFrancoChalco

    7 years ago

    Would you please send me the script and data? email: efranco1@uc.cl

  • @zhuziyan9454
    @zhuziyan9454 6 years ago

    Dear professor, I am so lucky to know you. Could you also post a full tutorial about using Rmd and advanced models like HMM? Thank you, and I wish you a great day.

  • @bkrai

    @bkrai

    6 years ago

    Thanks for the suggestion, I've added this to my list.

  • @kishoreyarramshetty2930
    @kishoreyarramshetty2930 3 years ago

    Good Job in explaining the content along with code..

  • @kishoreyarramshetty2930

    @kishoreyarramshetty2930

    3 years ago

    can u provide us the link to download the dataset in this video to run the code.

  • @bkrai

    @bkrai

    3 years ago

    Thanks for comments!

  • @bkrai

    @bkrai

    3 years ago

    For data, there should be a link below this: kzread.info/dash/bejne/oaieuaWafca8kaQ.html

  • @prashantmishra2094
    @prashantmishra2094 4 years ago

    nice tutorial Sir. Keep making such videos

  • @bkrai

    @bkrai

    4 years ago

    Thanks for comments!

  • @asifjeelani1215
    @asifjeelani1215 2 years ago

    thank you sir, very well explained

  • @bkrai

    @bkrai

    2 years ago

    Thanks for comments!

  • @desisto007
    @desisto007 7 years ago

    Thank you so much! Very well explained. I would like to ask whether I can still use the Euclidean distance to find the elements closest to a cluster center, even if I use a dimensionality reduction approach (such as PCA or t-SNE) that uses probabilities to arrange clusters in 2 dimensions before using K-means.

  • @alicelatimier3133
    @alicelatimier3133 4 years ago

    Thank you so much for your amazing videos, everything is so clear and practical :) From a French researcher in cognitive science, I have one tricky question for you: I would like to find the best classifier/cluster analysis for a repeated-measures dataset (i.e., multiple repeated measures for one subject on the same features, as is the case in experimental psychology research, for example, or in longitudinal studies). Best

  • @bkrai

    @bkrai

    4 years ago

    You can look into this link: kzread.info/head/PL34t5iLfZddvMPAl1TzHJ_GjQcD3s6w_Z

  • @biswadeepdas5528
    @biswadeepdas5528 8 years ago

    sir, it is quite good. I would really appreciate if you upload more videos .

  • @bkrai

    @bkrai

    8 years ago

    +biswadeep das thanks for your feedback! I'll definitely create more such videos.

  • @javzmaatsend3785
    @javzmaatsend3785 4 years ago

    Thank you, Very easy

  • @bkrai

    @bkrai

    4 years ago

    You are welcome!

  • @DeepeshSinghAndroid
    @DeepeshSinghAndroid 7 years ago

    Hi Mr. Rai, great tutorial. Thanks for your effort. Just wanted to understand more about these 2 methodologies. Why and when we apply different methodologies i.e. K means and Hierarchy. It will be great help if you can make separate videos for the same. Also, as lots of people requested for data set and you have already uploaded to Dropbox, could you please share the link in your description for everyone's benefits. Thanks again :)

  • @bkrai

    @bkrai

    7 years ago

    Initially we try all methods and finally choose the one that seems more meaningful for the dataset used. It's difficult to say which method will work best beforehand. Also thanks for your feedback and suggestions.

  • @phediasdiamandis2441
    @phediasdiamandis2441 7 years ago

    Great Video. Congrats

  • @bkrai

    @bkrai

    7 years ago

    +Phedias Diamandis thanks for the feedback 👍

  • @rinoypaultharu5071
    @rinoypaultharu5071 5 years ago

    Great tutorial, it really helped with my analysis. I have some doubts. First, in the silhouette calculation, do we need to check the average silhouette value, or which value do we have to check to find the number of clusters? Please help me with that. In your analysis, what is the silhouette value for k=3, and where is it shown on the plot? Second, while calculating my Euclidean distance, I have 40 observations, so it does not show complete rows of the Euclidean matrix. Is there any other way to obtain the complete matrix?
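
A hedged sketch touching both questions, using the built-in USArrests data as a stand-in (an assumption, not the video's data). The silhouette plot from the cluster package prints the average silhouette width at the bottom; comparing that average across several values of k is one common way to pick k. Converting the distance object with as.matrix() gives a full square matrix that can be indexed or exported.

```r
library(cluster)  # provides silhouette()

# Stand-in data (assumption), normalized.
z <- scale(USArrests)
distance <- dist(z)

# Silhouette for a 3-cluster k-means solution.
set.seed(123)
kc <- kmeans(z, 3)
sil <- silhouette(kc$cluster, distance)
plot(sil)                    # average silhouette width is printed on the plot
mean(sil[, "sil_width"])     # the same average as a single number

# Average silhouette width for k = 2 to 6.
avg.width <- sapply(2:6, function(k) {
  set.seed(123)
  cl <- kmeans(z, k)$cluster
  mean(silhouette(cl, distance)[, "sil_width"])
})
names(avg.width) <- 2:6
avg.width

# Full distance matrix as an ordinary square matrix.
d.full <- as.matrix(distance)
dim(d.full)
round(d.full[1:5, 1:5], 2)   # any block of rows and columns can be shown
```

write.csv(d.full, "distance.csv") would save the whole matrix to a file; the file name here is only an example.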

  • @stephravelo
    @stephravelo 8 years ago

    This is a very informative video. I hope you would have a repository github of your data so that we can play around with the script you used.

  • @bkrai

    @bkrai

    4 years ago

    Here is the link: github.com/bkrai/Top-10-Machine-Learning-Methods-With-R

  • @AnchalSingh06
    @AnchalSingh06 8 years ago

    Thank you for posting this video. It's helpful. I have (500,226) data . Please guide me to do Kmeans and Silhouette in R

  • @hridayborah9750
    @hridayborah9750 4 years ago

    Yes, all your videos are helpful. Could you prepare a tutorial on machine learning in the tidyverse?

  • @bkrai

    @bkrai

    4 years ago

    I've added it to list of future videos. Thanks!

  • @maggief6653
    @maggief6653 4 years ago

    I'd like to change the dendrogram position (horizontal plot). What package and function can I use?
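
One possible answer, as a sketch with base R only (no extra package): convert the hclust result with as.dendrogram() and plot it with horiz = TRUE. The USArrests data is a stand-in, and the margin values are just a guess that leaves room for labels.

```r
# Stand-in data (assumption), normalized, then complete-linkage clustering.
z <- scale(USArrests)
hc <- hclust(dist(z))

# Convert to a dendrogram object and draw it horizontally.
dend <- as.dendrogram(hc)
op <- par(mar = c(4, 2, 2, 8))   # wider right margin for the labels
plot(dend, horiz = TRUE)
par(op)                          # restore the previous margins
```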

  • @vinzkyvijayaraj4035
    @vinzkyvijayaraj4035 11 months ago

    Thank you!

  • @bkrai

    @bkrai

    11 months ago

    You're welcome!

  • @santosacosta4645
    @santosacosta4645 5 years ago

    Thank you very much sir. Question: using Within group SS plot (min 14:39), isn't the optimal number of clusters 5? the variability from 4 to 5 seems very significant. Please let me know.

  • @bkrai

    @bkrai

    5 years ago

    This data has only 22 companies. As we increase number of clusters, number of companies in some clusters becomes really small, to the extent that a cluster may contain just one company. So the choice of 'k' should also consider this aspect.
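
The point about small clusters can be checked directly by tabulating cluster sizes for several values of k. A sketch, assuming the first 22 rows of the built-in USArrests data as a stand-in of the same size as the video's data:

```r
# Stand-in data (assumption): 22 rows to mirror the size of the video's data.
z <- scale(USArrests[1:22, ])
hc <- hclust(dist(z))

# Number of observations in each cluster for k = 2 to 6.
sizes <- lapply(2:6, function(k) table(cutree(hc, k)))
names(sizes) <- paste0("k=", 2:6)
sizes

# Smallest cluster for each k; a value of 1 means a single-member cluster.
sapply(sizes, min)
```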

  • @niv2419
    @niv2419 6 years ago

    Hi! Thank you so much of making this blog! Can you please make a video on feature engineering in R? Thank you!

  • @bkrai

    @bkrai

    5 years ago

    Here is the link: kzread.info/dash/bejne/iHl2w9prh7DIdaQ.html

  • @shivakazempour5603
    @shivakazempour5603 5 years ago

    you are amazing!

  • @bkrai

    @bkrai

    5 years ago

    Thx for comments!

  • @betzthomas9693
    @betzthomas9693 4 years ago

    Thank you, Sir, for the tutorial. Please explain whether there is any package in R to identify on what basis the clusters are grouped from the data we provide.

  • @bkrai

    @bkrai

    4 years ago

    Refer to the averages for each cluster and all variables.

  • @manigandanv8531
    @manigandanv8531 7 years ago

    Thank you so much for explaining this. It would be a really great help if you could upload a video using Bray-Curtis similarity.

  • @bkrai

    @bkrai

    7 years ago

    Thanks for the suggestion, I'll keep it for future.

  • @rezaamirahmadi6013
    @rezaamirahmadi6013 3 years ago

    Thanks , How can I use fuzzy k-means (FKM) to impute missing in R ?

  • @jonathanrhein7553
    @jonathanrhein7553 8 years ago

    Hi Bharatendra, great video - really helpful! Everything goes well until the point of doing the scree plot, where I am getting:

    > withinGroupSumOfSquares = (nrow(normNum)-1) * sum(apply(normNum, 2, var, na.rm=TRUE))
    > for(i in 2:20) withinGroupSumOfSquares[i] = sum(kmeans(normNum, centers=i)$withinss)
    Error in do_one(nmeth) : NA/NaN/Inf in foreign function call (arg 1)
    > plot(1:20, withinGroupSumOfSquares, type="b", xlab = "Number of Clusters", ylab = "Within group SS")
    Error in xy.coords(x, y, xlabel, ylabel, log) : 'x' and 'y' lengths differ

    Can you help me? Thank you.

  • @jonathanrhein7553

    @jonathanrhein7553

    8 years ago

    someone has deleted my comment...

  • @bkrai

    @bkrai

    8 years ago

    +Jonathan Rhein Not sure what's causing the error you got. May have something to do with data. I ran my data using the code you have, and everything seems fine.
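
A likely (not confirmed) cause of the first error is missing or non-finite values in the data, since kmeans() does not accept NA, NaN or Inf. The second error then follows because the loop stopped early and the vector has only one element. A sketch on stand-in data with a missing value injected on purpose:

```r
# Stand-in data (assumption) with one missing value added deliberately.
dat <- USArrests
dat[3, "Assault"] <- NA

# Check for missing values before clustering.
colSums(is.na(dat))

# One possible remedy: drop incomplete rows, then normalize.
dat.ok <- na.omit(dat)
normNum <- scale(dat.ok)
stopifnot(all(is.finite(normNum)))

# Within-group sum of squares for 1 to 10 clusters.
# For a single cluster this is the total sum of squares.
set.seed(123)
wss <- (nrow(normNum) - 1) * sum(apply(normNum, 2, var))
for (i in 2:10) wss[i] <- sum(kmeans(normNum, centers = i)$withinss)

# Scree plot.
plot(1:10, wss, type = "b",
     xlab = "Number of Clusters", ylab = "Within group SS")
```

A column with zero variance also produces NaN after normalization, so that is worth checking as well.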

  • @bkrai

    @bkrai

    8 years ago

    +Jonathan Rhein I still see your previous comment.

  • @azfersaeed1602
    @azfersaeed1602 8 years ago

    Great video man! Thank you very much for posting :). Could you show cluster analysis using more than 2 variables?

  • @bkrai

    @bkrai

    8 years ago

    +Azfer Saeed thanks for feedback! In the example we have cluster analysis with 8 variables. However, for the scatter plot we use two variables at a time.

  • @azfersaeed1602

    @azfersaeed1602

    8 years ago

    +Bharatendra Rai You are correct...sorry for the incorrect semantics. At 2:15, you mention that broadly, there are 3 clusters but they are based only on 2 variables. Is there a way to create clusters based on more than 2 variables?

  • @niv2419
    @niv2419 6 years ago

    Hello sir, as always your videos have been very helpful and thank you for this video too. Also, I wanted to know if there is a way to improve between cluster distance? If so can you please let us know? Thank You!

  • @bkrai

    @bkrai

    6 years ago

    You can increase or decrease number of clusters and see which one improves between cluster distance.

  • @Nit1601
    @Nit1601 2 years ago

    THE BEST !!! Could you please advise, do we need to do anything else to normalize if we are dealing with Binary columns (0,1). Thanks !

  • @bkrai

    @bkrai

    2 years ago

    We should exclude such variables.

  • @tahzeebfatima3121
    @tahzeebfatima3121 5 years ago

    Thanks for the informative video. May I please know how to deal with dichotomous variables along with continuous variables in the data if we want to include both in one cluster analysis, how do we do it please?

  • @bkrai

    @bkrai

    3 years ago

    This link has more cluster analysis topics: kzread.info/dash/bejne/oaieuaWafca8kaQ.html

  • @thejuhulikal6290
    @thejuhulikal6290 3 years ago

    sir please make the video on this K-mode also, that would be great to understand both topics and comparison

  • @bkrai

    @bkrai

    3 years ago

    Thanks, I've added it to my list.

  • @gayatritadla3844
    @gayatritadla3844 7 years ago

    Thank you.

  • @bkrai

    @bkrai

    7 years ago

    +Gayatri Tadla 👍

  • @sanjayh3897
    @sanjayh3897 8 years ago

    Excellent tutorial Bharatendra ! Do you have any example to share for Overlapping clustering - would appreciate it. Thanks !

  • @bkrai

    @bkrai

    8 years ago

    There are 52 datasets where clustering can be applied in the link below: archive.ics.uci.edu/ml/datasets.html?format=&task=clu&att=&area=&numAtt=&numIns=&type=&sort=nameUp&view=table

  • @deepaksingh9318
    @deepaksingh9318 6 years ago

    A good tutorial. Could you please also tell us: 1. When should we go for K-means and when should we go for hclust (i.e., the situations in which to select each method)? 2. What do we mean when we say above average and below average (in hclust)? I mean, if the value is 1.05, are we saying that sales in cluster x are 1.05 higher than average? An explanation will be appreciated. Everything else is explained in a really simple way, so I am subscribing to the channel :) Keep it up.

  • @bkrai

    @bkrai

    3 years ago

    For more on clustering: kzread.info/dash/bejne/oaieuaWafca8kaQ.html

  • @anigov
    @anigov 6 years ago

    Dear Sir..thank you for the time & effort that you have put in to make this wonderful video tutorial. I have a query. At 12:27 , how are the original average values displayed even though member.c is used which is obtained through a series of calculations using the normalised data? Why did not you use PCA to decide the no. of clusters for kmeans? Regards Aniruddh

  • @bkrai

    @bkrai

    6 years ago

    In the 2nd aggregation line, note that I've used utilities. That's the reason we can display original values. In the 1st aggregation, z was used. Also, here focus was on clustering, so pca is not used.

  • @anigov

    @anigov

    6 years ago

    Thank you

  • @mariaamithapennington3737
    @mariaamithapennington3737 3 years ago

    Thank you so much for the tutorial. It is extremely helpful. But my question like the other is that it would have been very kind of you if you would have linked your data set too. Thanks!

  • @bkrai

    @bkrai

    3 years ago

    You can get it from here: kzread.info/dash/bejne/oaieuaWafca8kaQ.html

  • @mariaamithapennington3737

    @mariaamithapennington3737

    3 years ago

    @@bkrai Thank you very much! Appreciate it! :)

  • @bkrai

    @bkrai

    3 years ago

    You are welcome!

  • @AdityaLandge1994
    @AdityaLandge1994 5 years ago

    nice tutorial

  • @bkrai

    @bkrai

    5 years ago

    thanks for comments!

  • @sajidurrahmannafis8476
    @sajidurrahmannafis8476 3 years ago

    Best tutorial on the internet. I have one question: why are we using Euclidean distance and then also complete linkage? I thought we needed only one distance measurement technique. I will be really grateful if someone can clarify. The answer to my question may help others too. Thank you.

  • @bkrai

    @bkrai

    3 years ago

    You can refer to this more recent one: kzread.info/dash/bejne/oaieuaWafca8kaQ.html
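
A short illustration of the distinction asked about above, on four made-up points (not from the video): Euclidean distance measures how far apart two individual observations are, while the linkage rule turns those point-to-point distances into a distance between two clusters.

```r
# Four made-up points on a line: cluster A = {a1, a2}, cluster B = {b1, b2}.
x <- matrix(c(0, 1, 4, 6), ncol = 1,
            dimnames = list(c("a1", "a2", "b1", "b2"), NULL))

# Euclidean distances between individual points.
d <- as.matrix(dist(x))

# All point-to-point distances between cluster A and cluster B.
between <- d[c("a1", "a2"), c("b1", "b2")]
between

max(between)    # complete linkage: farthest pair, 6
mean(between)   # average linkage: mean over all pairs, 4.5
min(between)    # single linkage: closest pair, 3

# The merge heights reported by hclust() follow the same rule.
hclust(dist(x), method = "complete")$height   # 1 2 6
hclust(dist(x), method = "average")$height    # 1 2 4.5
```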

  • @sajidurrahmannafis8476

    @sajidurrahmannafis8476

    3 years ago

    @@bkrai Thank you sir. I am a big fan of your teaching. I am also a research assistant in US. Thank you for your amazing lectures!

  • @sajidurrahmannafis8476

    @sajidurrahmannafis8476

    3 years ago

    @@bkrai Thank you. I got the answer to my question from your new cluster video lecture.

  • @bkrai

    @bkrai

    3 years ago

    Thanks for the update!

  • @bkrai

    @bkrai

    3 years ago

    You are welcome!

  • @Masshiara
    @Masshiara 6 years ago

    Thanks! a lot!!

  • @bkrai

    @bkrai

    3 years ago

    Welcome!

  • @shruthihariharapura
    @shruthihariharapura 7 years ago

    Hi, excellent tutorial, it helped me a lot. Can you help us with implementing density-based clustering in R? I am finding it difficult to implement.

  • @bkrai

    @bkrai

    3 years ago

    Thanks!

  • @TusharLapani
    @TusharLapani 8 years ago

    Thanks Bharatendra. Can you please upload a video on how to perform clustering when the dataset has a number of numerical attributes and categorical attributes? In this video you are eliminating the categorical attribute. What would you have done if your dataset had 10 numeric columns and 8 categorical columns? Appreciate your knowledge contribution.

  • @bkrai

    @bkrai

    8 years ago

    +Tushar Lapani For cluster analysis you must have quantitative variables. You can use categorical variables after cluster analysis to see if they show any pattern with identified clusters and use it for characterizing the clusters.

  • @rithishvikram1759
    @rithishvikram1759 4 years ago

    Nice explanation, sir! Thank you so much, great respect. Sir, would you please attach the relevant datasets with the video? Thank you once again.

  • @bkrai

    @bkrai

    4 years ago

    send your email id

  • @rithishvikram1759

    @rithishvikram1759

    4 years ago

    rithishvikram4937@gmail.com

  • @bkrai

    @bkrai

    4 years ago

    all set.

  • @rithishvikram1759

    @rithishvikram1759

    4 years ago

    thank you so much sir

  • @mfkalabdullah6966
    @mfkalabdullah6966 7 years ago

    Sir, Do you have more videos on clustering? Also, can I contact you in the future regarding clustering because I'm doing a research using data mining clustering?

  • @bkrai

    @bkrai

    3 years ago

    There is a playlist on clustering: kzread.info/dash/bejne/oaieuaWafca8kaQ.html

  • @DrMDarwish
    @DrMDarwish 7 years ago

    Thanks :)

  • @bkrai

    @bkrai

    3 years ago

    welcome!

  • @tanmaygawade1068
    @tanmaygawade1068 3 years ago

    hello sir!! actually wanted to know how to perform clustering on PCA generated scores in r and how to compare the cluster size for both.
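
One way this could be approached, sketched on the built-in USArrests data as a stand-in (an assumption): run prcomp() on the normalized data, cluster the first few principal component scores with kmeans(), and cross-tabulate against a clustering of all variables to compare cluster sizes. Keeping two components is an arbitrary choice for illustration.

```r
# Stand-in data (assumption), normalized.
z <- scale(USArrests)

# Principal components; pc$x holds the scores for each observation.
pc <- prcomp(z)
summary(pc)                  # proportion of variance per component
scores <- pc$x[, 1:2]        # keep the first two components (arbitrary choice)

# k-means on the PCA scores and on all normalized variables.
set.seed(123)
kc.pca <- kmeans(scores, 3)
set.seed(123)
kc.all <- kmeans(z, 3)

# Compare cluster sizes and memberships.
kc.pca$size
kc.all$size
table(pca = kc.pca$cluster, all = kc.all$cluster)
```

Cluster labels are arbitrary, so cluster 1 in one solution may correspond to cluster 2 in the other; the cross-tabulation shows the matching.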

  • @VenkateshDataScientist
    @VenkateshDataScientist 7 years ago

    HAPPY NEW YEAR TO YOU AND YOUR FAMILY MEMBERS . Sir ,If you have time please upload support vector machine and Sentimental analysis .

  • @bkrai

    @bkrai

    7 years ago

    A very happy new year to you and family too! I'll keep your suggestion in mind for next videos.

  • @bkrai

    @bkrai

    7 years ago

    Here is the link to SVM: kzread.info/dash/bejne/oodpybp-fseZkZc.html&list=PL34t5iLfZddtII4ssT8FSUFP27fPYDEhY&index=25

  • @sapnapatil27
    @sapnapatil27 7 years ago

    Thanks.

  • @bkrai

    @bkrai

    3 years ago

    Welcome!

  • @mattcotoia8749
    @mattcotoia8749 2 years ago

    Can you tell us what each variable means in this dataset please?

  • @rohanshetty1016
    @rohanshetty1016 5 years ago

    Sir your video lectures are really awesome! Excellent Tutorial! Can you please share the csv file used for cluster analysis?

  • @bkrai

    @bkrai

    5 years ago

    send me your email id.

  • @jalluravikiran4146
    @jalluravikiran4146 6 years ago

    good video.

  • @bkrai

    @bkrai

    3 years ago

    Thanks!

  • @aks1008
    @aks1008 5 years ago

    Sir, how do we remove multicollinearity in cluster analysis, as it is an unsupervised algorithm? There is no dependent variable.

  • @bkrai

    @bkrai

    5 years ago

    Multicollinearity is a problem only for regression models. For cluster analysis it is not an issue.

  • @zhuziyan9454
    @zhuziyan9454 6 years ago

    could you please explain why subtracting the first variable by [,-c(1,1)] rather than[,-1]? Thank you

  • @bkrai

    @bkrai

    6 years ago

    Both work fine. You can use it if you need to remove more than one variable.
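
A quick check of this on a made-up data frame (column names are illustrative): repeating an index inside a negative subscript has no extra effect, and listing different indices drops several columns at once.

```r
# Made-up data frame (assumption): one text column and three numeric columns.
df <- data.frame(Company = c("A", "B"), x = 1:2, y = 3:4, w = 5:6)

# Dropping column 1 written two ways gives the same result.
identical(df[, -c(1, 1)], df[, -1])   # TRUE

# Dropping columns 1 and 3 together.
names(df[, -c(1, 3)])                 # "x" "w"
```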

  • @ramp2011
    @ramp2011 6 years ago

    Great tutorial. Thank you... How do you handle categorical variables for clustering? In this example looks like you removed the 1st column that happened to be a factor variable. Can you please post the data file used in the comments as well if possible? Thank you

  • @bkrai

    @bkrai

    6 years ago

    Cluster analysis only works with quantitative variables. During the analysis you may note that we calculate distances, which we cannot do with categorical variables. But after finalizing number of clusters, you can plot dendrogram with a categorical variable to see if there is any obvious pattern or not. For data, send email id.

  • @Jorge-vp7of

    @Jorge-vp7of

    6 years ago

    you can use K-modes to do clustering with categorical data

  • @medardkafoutchoni6511

    @medardkafoutchoni6511

    6 years ago

    Thank you dear Sanchez. What about mixed data (i.e. including both numerical and categorical variables)?
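
For mixed data, one commonly used option (not covered in the video, so treat this as a swapped-in technique) is Gower dissimilarity from daisy() in the cluster package, followed by hierarchical clustering. The data frame below is made up for illustration.

```r
library(cluster)  # provides daisy()

# Made-up mixed data (assumption): one numeric and two categorical variables.
df <- data.frame(
  sales  = c(10, 12, 50, 55, 11, 52),
  region = factor(c("N", "N", "S", "S", "N", "S")),
  fuel   = factor(c("gas", "coal", "gas", "coal", "gas", "coal"))
)

# Gower dissimilarity handles numeric and factor columns together;
# values lie between 0 and 1.
d <- daisy(df, metric = "gower")
round(as.matrix(d), 2)

# Hierarchical clustering on the dissimilarities, cut into 2 clusters.
hc <- hclust(d, method = "average")
member <- cutree(hc, 2)
member
```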

  • @vivekwilliam3370

    @vivekwilliam3370

    6 years ago

    vivek4u.3048@gmail.com

  • @harishnagpal21
    @harishnagpal21 5 years ago

    Nice video as always. I have couple of questions. In K means cluster example, if we want a list as per the three clusters, how do we tag that. 2nd query, I have a data set of 100000 insurance customers having customer ids and their policy Face amount. I want to divide them in cluster ( say 5 cluster) and also want to know which customer comes in which cluster (same query as first) so that I can target them for a campaign. How do we do that and which clustering technique to use? Thanks in advance.

  • @bkrai

    @bkrai

    5 years ago

    You can use something similar to kc$cluster that I've used at around 16:30 time point in the video.
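
A sketch of how kc$cluster can be used to tag each record. The customer table here is simulated (ids, the face_amount column, and the choice of 5 clusters are all assumptions taken from the question, not real data).

```r
# Simulated customers (assumption): 200 ids with a policy face amount.
set.seed(123)
customers <- data.frame(
  id = sprintf("C%03d", 1:200),
  face_amount = round(rlnorm(200, meanlog = 11, sdlog = 0.6))
)

# k-means on the normalized face amount; nstart tries several random starts.
kc <- kmeans(scale(customers$face_amount), centers = 5, nstart = 20)

# Attach the cluster number to each customer.
customers$cluster <- kc$cluster
head(customers)

# Cluster sizes and the average face amount per cluster.
table(customers$cluster)
aggregate(face_amount ~ cluster, data = customers, FUN = mean)

# Ids of the customers in one cluster, e.g. for a campaign list.
subset(customers, cluster == 1)$id
```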

  • @harishnagpal21

    @harishnagpal21

    5 years ago

    Thanks

  • @ss11996
    @ss11996 5 years ago

    Hello sir, while creating the Euclidean distance I am getting an error that says "Error: cannot allocate vector of size 1094.0 Gb". How do I solve this issue?

  • @bkrai

    @bkrai

    5 years ago

    1094.0 Gb suggests there is memory problem. Probably you can close data sets that you are not using and try again.
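
Some arithmetic may help here. dist() stores n(n-1)/2 distances at 8 bytes each, so memory grows with the square of the number of rows; a request of about 1094 Gb corresponds to something on the order of half a million rows. A sketch with simulated data, where clustering a random sample is shown as one possible workaround (the helper name dist_gb and the sample size are illustrative):

```r
# Approximate memory (in Gb) needed by dist() for n rows.
dist_gb <- function(n) n * (n - 1) / 2 * 8 / 2^30
dist_gb(c(1e3, 1e4, 1e5))

# Simulated large data (assumption): 100,000 rows, 3 numeric columns.
set.seed(123)
big <- matrix(rnorm(1e5 * 3), ncol = 3)
dist_gb(nrow(big))           # what dist(big) would need; not run here

# Possible workaround: hierarchical clustering on a random sample of rows.
idx <- sample(nrow(big), 2000)
hc <- hclust(dist(big[idx, ]))
member <- cutree(hc, 3)
table(member)
```

k-means does not need the full distance matrix, so it is another option for data of this size.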

  • @Guavarosa
    @Guavarosa 4 years ago

    Please can you give me a hint? I want to give as input the initial centres for kmeans clustering. I just do not manage to select these points out of my dataset. Thank you in advance for your help!

  • @bkrai

    @bkrai

    4 years ago

    Why do you need that? The algorithm should automatically take care of finding the best clusters.

  • @Guavarosa

    @Guavarosa

    4 years ago

    @@bkrai Because I try to correlate my clusters to the physical problem. That is why I was wondering if I can give initial centres as in case of software Origin Pro. I appreciate your answer.
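
For reference, kmeans() accepts a matrix of starting centers in place of a number. A sketch on the built-in USArrests data as a stand-in; the three rows picked as starting points are an arbitrary, hypothetical choice.

```r
# Stand-in data (assumption), normalized.
z <- scale(USArrests)

# Pick rows of the (normalized) data as initial centers.
# These three states are a hypothetical choice for illustration.
start <- z[c("Vermont", "Ohio", "Florida"), ]

# One row of 'start' per cluster; the number of clusters is nrow(start).
kc <- kmeans(z, centers = start)
kc$size
kc$centers
```

The starting centers must be on the same scale as the data passed to kmeans(), which is why they are taken from z and not from the raw table.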

  • @jaatni64
    @jaatni64 5 years ago

    Sir... Please tell me that why i am getting different result when we run kmeans command twice or more

  • @bkrai

    @bkrai

    5 years ago

    you can run set.seed(123) each time before running kmeans to get same result.
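
A small demonstration of this, using the built-in USArrests data as a stand-in (an assumption). kmeans() starts from randomly chosen centers, so fixing the seed fixes the starting point; the nstart argument is an additional option that keeps the best of several random starts.

```r
# Stand-in data (assumption), normalized.
z <- scale(USArrests)

# Same seed before each call gives the same clustering.
set.seed(123)
a <- kmeans(z, 3)
set.seed(123)
b <- kmeans(z, 3)
identical(a$cluster, b$cluster)   # TRUE

# Several random starts; the solution with the lowest
# total within-cluster sum of squares is kept.
set.seed(123)
best <- kmeans(z, 3, nstart = 25)
best$tot.withinss
```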

  • @maryamaziz5064
    @maryamaziz5064 4 years ago

    would love to try it on my own

  • @bkrai

    @bkrai

    4 years ago

    Thanks!

  • @adityasharma2667
    @adityasharma2667 6 years ago

    Hello Sir, Just wanted to know, after running K-Means with 3 cluster, we found that there are two cluster which are overlapping, how to remove overlapping of these two clusters otherwise it would not be called as K-means clustering.Please correct me If I am wrong sir. Thanks

  • @bkrai

    @bkrai

    6 years ago

    Some amount of overlap is common with k-means or any other clustering method.

  • @mallorywright1453
    @mallorywright1453 4 years ago

    Do you have any examples of validating a cluster analysis using LPA?

  • @bkrai

    @bkrai

    4 years ago

    I'm adding to the list of future videos.

  • @tsmg6889
    @tsmg6889 8 years ago

    thanks.

  • @bkrai

    @bkrai

    3 years ago

    Welcome!

  • @heratpatel7174
    @heratpatel7174 7 years ago

    what can i do for minimum overlapping ?

  • @abhiagni242
    @abhiagni242 6 years ago

    Thanks for the video sir,,, .... can u Plz share the link to the dataset used

  • @bkrai

    @bkrai

    6 years ago

    email id?

  • @shubhasmitasahani1738
    @shubhasmitasahani1738 5 years ago

    Hello Sir, do you have any video on latent class clustering in R? Please share...Looking forward.

  • @bkrai

    @bkrai

    5 years ago

    Not yet, but I'm adding this to my list for future. For clustering related videos, you may refer to this link: kzread.info/head/PL34t5iLfZddvMPAl1TzHJ_GjQcD3s6w_Z

  • @sayedyavar3752
    @sayedyavar3752 6 years ago

    i want to remove multiple columns from my data set just like you removed the company. what code should I use?

  • @bkrai

    @bkrai

    6 years ago

    Let's say you want to remove columns 2 and 4 from 'data' that has 5 columns. Then, data1