Median, Mean, Mode, Percentile | Math, Statistics for data science, machine learning

Median, Mean, Mode and Percentile are essential concepts in statistics that you should understand well if you plan to make a career in the data science field. In this video, using a few very simple real-life examples, we will understand what each of these concepts means and how exactly they are used in real life to solve data science problems.
Check our Math and Statistics course for Data Science here : codebasics.io/courses/math-an...
Code: github.com/codebasics/math-fo...
Exercise: github.com/codebasics/math-fo...
Math/Stats for Data science, ML playlist: • Mathematics, statistic...
Math/Stats for Data science, ML playlist Hindi: • Mathematics, statistic...
Outlier removal using IQR Video: • Outlier detection and ...
⭐️ Timestamps ⭐️
00:00 Introduction
00:23 What is Median?
04:40 What is percentile?
10:20 What is Mode?
11:05 Python Code
18:00 Exercise
Do you want to learn technology from me? Check codebasics.io/ for my affordable video courses.
❗❗ DISCLAIMER: All opinions expressed in this video are of my own and not that of my employers'.

Comments: 109

  • @codebasics
    @codebasics 2 years ago

    Check our Math and Statistics course for Data Science here : codebasics.io/courses/math-and-statistics-for-data-science

  • @vikashdas1852
    @vikashdas1852 3 years ago

    Subscribing to this channel proved to be a lot more helpful than enrolling in college for graduation

  • @belfloretkoriciza5279
    @belfloretkoriciza5279 1 year ago

    Thank you so much Sir you're a good teacher and you're different from others because of the practice you demonstrate

  • @thomass4153
    @thomass4153 3 years ago

    Mean, median, mode and percentile are also known as 'Measures of Central Tendency'.

  • @programmingwithraahim

    @programmingwithraahim

    2 years ago

    Yeah bro, I heard about them from Khan Academy

  • @Akshay-vq1uv
    @Akshay-vq1uv 2 years ago

    Your content and examples are great😃. Please don't stop making such easily explained content.

  • @chilledvibes5700
    @chilledvibes5700 2 years ago

    I have no words to say, really awesome series!

  • @bilalzubair6843
    @bilalzubair6843 2 years ago

    The best video to understand the concept of removing outliers

  • @accountingsapayag
    @accountingsapayag 2 years ago

    As a beginner, I should say this is the best.

  • @kelvinticllahuanacohuachac9562
    @kelvinticllahuanacohuachac9562 1 year ago

    Besides being educational, this was an enjoyable video. Thanks a lot, sir.

  • @wasimrajamiddya7560
    @wasimrajamiddya7560 1 year ago

    Thank you Sir, for making such beginner-friendly videos. I really enjoyed it and learned a lot. Please make more videos like this so that we can understand easily. ❤️

  • @instagramstarstories6682

    @instagramstarstories6682

    5 months ago

    Tell me Your insta id bro...plz

  • @kakumanusridhanalakshmi3203
    @kakumanusridhanalakshmi3203 11 months ago

    Ultimate Explanation🎉 Got a good idea on using mean and median

  • @locu83
    @locu83 2 years ago

    Exactly what I wanted a mentor 👍🏻❤️🙂.

  • @ravidawade5178
    @ravidawade5178 3 years ago

    Sir, please make one video for freshers on a real-life data science project. Your teaching is so simple that everyone can understand it very easily

  • @georgetzimas6882
    @georgetzimas6882 2 years ago

    2:45 When you added an extra value, you did not sort the values in ascending order: it should be (7000, 7500, 8000) instead of (7000, 8000, 7500).

  • @zishanafzal6671
    @zishanafzal6671 2 years ago

    You are the best teacher and have the best content on data analysis. No need to go to any other channel.

  • @codebasics

    @codebasics

    2 years ago

    I am happy this was helpful to you.

  • @sathesht7532
    @sathesht7532 3 years ago

    Hi sir, thanks a lot for your extraordinary teaching, I have learned lot and did my homework by following your machine learning tutorial. Sir, Can you do for a video about Generative Adversarial Network (GAN) for regression prediction?

  • @parishjain159
    @parishjain159 2 years ago

    Sir your way of teaching is very awesome

  • @arunadang7872
    @arunadang7872 3 years ago

    This series is a masterpiece. Thank you.

  • @ankurhalke139

    @ankurhalke139

    2 years ago

    Yeah . So true ...*uck education system

  • @mivaangadewadvlogs
    @mivaangadewadvlogs 2 years ago

    Hi Sir, can we use the median for multiple NaN values, like you did in Sofia's case?

  • @soheilpalermo491
    @soheilpalermo491 2 years ago

    Thank you that was very informative content.

  • @bhuralal5299
    @bhuralal5299 2 years ago

    Thanks for making this video its very helpful

  • @d3v487
    @d3v487 1 year ago

    Hi , I have a dataset where 3 columns are independent categorical features and 5 dependent features that are 10th ,25th, 50th ,75th , 90th percentile of annual wage. How can I get values (annual wage ,which is missing) from the 5 percentile columns ?

  • @andresfrr100
    @andresfrr100 3 years ago

    Hi! At 2:44, for the median you take Tao and Prem, but the values must be sorted first. Once sorted, Prem is not counted in the median, but Sofia is. So m = (Tao + Sofia)/2?

  • @balajib.9561
    @balajib.9561 3 years ago

    Sir upload real life data science project 👍😁

  • @codebasics

    @codebasics

    3 years ago

    On YouTube, search for "codebasics data science project". You will find my videos; please watch them.

  • @pavan2926
    @pavan2926 3 years ago

    Only one word: loved your explanation

  • @abhinavkumbalwar6837
    @abhinavkumbalwar6837 1 year ago

    Very informative video.

  • @shreyas_._
    @shreyas_._ 3 years ago

    One of the best tutorials ❤️🔥

  • @codebasics

    @codebasics

    3 years ago

    Glad it was helpful!

  • @MuhammadUmar-px6ij
    @MuhammadUmar-px6ij 3 years ago

    Before exploring the Codebasics channel. I never had an interest in Math & Stat. Thanks, Bro. Love & Respect from Pakistan

  • @codebasics

    @codebasics

    3 years ago

    I am happy this was helpful to you.

  • @sudarshanm.s6736
    @sudarshanm.s6736 8 days ago

    Sir, how is the median of the data points 7500? The median has to be the average of Tao's and Sofia's income, so it will be (7000+7500)/2 = 7250, right? I mean after arranging in ascending order.

  • @kirankapruwan8892
    @kirankapruwan8892 6 months ago

    While calculating the median (when the number of data values is even), we need to sort the data values in ascending order.
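
The sorting step that several viewers point out can be sketched in a few lines of Python. This is only an illustration: the income list is hypothetical (only 7000, 7500 and 8000 are quoted in the comments), and `median_sorted` is a helper name made up here, not code from the video.

```python
from statistics import median

def median_sorted(values):
    """Median computed by hand: sort first, then take the middle element(s)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    # Even count: average the two middle values of the SORTED data.
    return (ordered[mid - 1] + ordered[mid]) / 2

# Hypothetical incomes in the order they were entered (not sorted).
incomes = [4000, 5000, 7000, 8000, 9000, 7500]

# Averaging the two middle entries without sorting gives the wrong answer.
unsorted_middle = (incomes[2] + incomes[3]) / 2

print(unsorted_middle)          # 7500.0
print(median_sorted(incomes))   # 7250.0
print(median(incomes))          # 7250.0, the standard library sorts internally
```

Library functions such as `statistics.median` sort the data themselves, so the ordering issue only matters when the median is worked out by hand.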

  • @siddharudtevaramani1055
    @siddharudtevaramani1055 3 years ago

    Example of Mode is lit 😀

  • @Baburao_Aapte
    @Baburao_Aapte 3 years ago

    Your way of teaching is incredible, I love your videos. Whenever anyone asks me where I learned all this, I share the link to your channel with my juniors.

  • @codebasics

    @codebasics

    3 years ago

    Thanks for sharing! I am happy this was helpful to you.

  • @mvcutube
    @mvcutube 3 years ago

    Thanks for such a nice tutorial

  • @codebasics

    @codebasics

    3 years ago

    Glad it was helpful!

  • @cyptowithkelv
    @cyptowithkelv 1 year ago

    do you have any full course on data analysis?

  • @vidhikapadia9700
    @vidhikapadia9700 2 years ago

    What is the difference between 0.99 and 0.999 quantile range as in exercise 0.999 is used?
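
One way to see the difference between a 0.99 and a 0.999 cutoff is to count how many rows each one trims. The sketch below uses 10,000 made-up values and a hand-written `linear_quantile` helper; both are assumptions for illustration and are not taken from the exercise.

```python
import math

def linear_quantile(values, q):
    """Quantile at fraction q (0..1) with linear interpolation between neighbours."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)          # fractional index into the sorted data
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])

values = list(range(1, 10_001))           # made-up data: 1, 2, ..., 10000

dropped_counts = {}
for q in (0.99, 0.999):
    threshold = linear_quantile(values, q)
    # Rows strictly above the threshold would be treated as outliers.
    dropped_counts[q] = sum(1 for v in values if v > threshold)

print(dropped_counts)
```

On these 10,000 rows the 0.99 cutoff drops 100 rows and the 0.999 cutoff drops 10, so the higher cutoff removes only the most extreme values. Which one is appropriate depends on how many rows the dataset has and how heavy its tail is.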

  • @sundar6323
    @sundar6323 3 years ago

    Is careerera a good institute to join as a beginner. Im final yr ECE student.

  • @kIocuchl2
    @kIocuchl2 1 year ago

    2:43 there should be sorted values and median will be equals to (7000+7500)/2

  • @harishkannan8023
    @harishkannan8023 3 years ago

    Beautiful explanation

  • @codebasics

    @codebasics

    3 years ago

    Glad it was helpful!

  • @Murlik1604
    @Murlik1604 3 years ago

    One very basic question - Should the outlier removal be applied on labels (values to be predicted) as well if outliers exist on such data labels as well ?

  • @architchaudhary1791

    @architchaudhary1791

    2 years ago

    No

  • @momincomputer9967
    @momincomputer9967 1 year ago

    great sir 🥰

  • @ParulBedi
    @ParulBedi 3 years ago

    what is the difference between Linear Quantile and Midpoint quantile ??
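
A rough sketch of how the two options differ, assuming the usual convention (the one behind the `interpolation` argument of pandas' `Series.quantile`) in which the quantile sits at fractional position `q * (n - 1)` in the sorted data. The `quantile` function and its data are illustrative, not code from the video.

```python
import math

def quantile(values, q, interpolation="linear"):
    """Quantile at fraction q (0..1) using the q * (n - 1) position rule."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)          # usually falls between two data points
    lo, hi = math.floor(pos), math.ceil(pos)
    if interpolation == "linear":
        # Move proportionally from the lower neighbour toward the upper one.
        return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])
    if interpolation == "midpoint":
        # Ignore the exact position and take the average of the two neighbours.
        return (ordered[lo] + ordered[hi]) / 2
    raise ValueError("interpolation must be 'linear' or 'midpoint'")

data = [10, 20, 30, 40]                   # made-up values

print(quantile(data, 0.25, "linear"))     # 17.5
print(quantile(data, 0.25, "midpoint"))   # 15.0
```

Here the 25th percentile falls three quarters of the way from 10 to 20, so linear interpolation gives 17.5 while midpoint simply averages the two neighbours to 15. When the position lands exactly on a data point, both options return that point.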

  • @VishalSingh-dv2vg
    @VishalSingh-dv2vg 1 month ago

    Sir what if the data is missing from or below 25% ,75% then how to find The Average.please reply

  • @philtoa334
    @philtoa334 3 years ago

    so clear, thx.

  • @codebasics

    @codebasics

    3 years ago

    Glad it helped!

  • @shantanughode275
    @shantanughode275 2 years ago

    Is the amount of statistics required for data science and data analytics the same?

  • @lathaloganathan4429
    @lathaloganathan4429 11 months ago

    So, how do we identify that there is an outlier in the dataset? Please clarify
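
One common way to flag outliers is the interquartile range (IQR) rule, the topic of the video linked in the description. The sketch below is a minimal version under assumptions: the incomes are made up, and the 1.5 × IQR fences are the conventional choice, which may differ from what the linked video uses.

```python
from statistics import quantiles

# Hypothetical monthly incomes; the last value is deliberately extreme.
incomes = [4000, 5000, 6000, 7000, 7500, 8000, 9000, 10_000_000]

# n=4 returns the three quartile cut points: Q1, Q2 (the median) and Q3.
q1, _, q3 = quantiles(incomes, n=4, method="inclusive")
iqr = q3 - q1

# Conventional fences: anything beyond 1.5 * IQR from the quartiles is suspect.
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

outliers = [x for x in incomes if x < lower or x > upper]
print(q1, q3, outliers)
```

With these numbers Q1 is 5750 and Q3 is 8250, so the upper fence is 12000 and only the 10,000,000 entry is flagged. A box plot or a quick look at the top percentiles gives the same kind of signal visually.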

  • @SURAJKUMAR-ug4oi
    @SURAJKUMAR-ug4oi 2 years ago

    Sir there could have been possibility that sofia's income would really high then median will not work well?

  • @arupgorai2320
    @arupgorai2320 3 years ago

    Sir I want to know which language is very important? Should we start with Java or python

  • @himanshusemwal1889
    @himanshusemwal1889 3 years ago

    Again Great Video Sir. I have a silly doubt. As you said we cant take average to fill null value if outlier have very large value like Elon musk(10 million$) and now we are going to take Median to fill na values.but nan values itself present at the middle of datapoints .So how we gonna calculate median if nan value is present at those points. median=(nan+nan)/2 ?

  • @abhijeetjain2098

    @abhijeetjain2098

    3 years ago

    maybe you can take the median of non-null values and fill up

  • @shutterup24-7

    @shutterup24-7

    2 years ago

    I think for taking median of dataset first we have to rearrange data to ascending order that will shift position of Nan value!!

  • @samvhora9076

    @samvhora9076

    2 years ago

    @@shutterup24-7 yes thats the first step
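
The suggestion in this thread (take the median of the non-null values, then fill) can be sketched with the standard library alone. The incomes are hypothetical and `float("nan")` stands in for the missing entry.

```python
import math
from statistics import median

# Hypothetical incomes; float("nan") marks the missing entry.
incomes = [4000, 5000, float("nan"), 7000, 8000, 7500]

# Drop the missing entries first, so NaN never takes part in the median.
known = [x for x in incomes if not math.isnan(x)]
fill_value = median(known)

# Replace each missing entry with the median of the known values.
filled = [fill_value if math.isnan(x) else x for x in incomes]
print(fill_value, filled)
```

In pandas the same idea is usually written as `df.income.fillna(df.income.median())`, where `df.income` is assumed to be the income column; `Series.median()` skips NaN by default, so the missing value never ends up in the middle of the calculation.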

  • @mimosveta
    @mimosveta 3 years ago

    am I just scatter brain, or did you not include the link to video where you explain how to use iqr to remove outliers? I only see a link to a playlist, but none of them seem to be on that particular topic? EDIT: okay, seems you explained it later in this video, but it really sounded like you had a link for us...

  • @codebasics

    @codebasics

    3 years ago

    mimosveta, you are right, I forgot to include a link, but I have added it now. Please check the video description

  • @alokbhushan9026
    @alokbhushan9026 1 month ago

    At 3:02, adding Prem to the dataset disturbs the ascending sort order. So the median should really be (7000 + 7500) / 2 = 7250.

  • @shariqueansari9921
    @shariqueansari9921 3 years ago

    Sir, I need your suggestion. Can you help me ?

  • @ExcelPro.
    @ExcelPro. 2 years ago

    Awesome learning 🆗😎👍

  • @codebasics

    @codebasics

    2 years ago

    Glad you enjoyed it

  • @prathampatel582
    @prathampatel582 1 year ago

    Why can't we use a trimmed mean?

  • @annonymous.
    @annonymous. 2 years ago

    Why don't we fill missing values with mode? Mode is the one that appears most but why we use mean and median most of the time?

  • @arshad1781
    @arshad1781 3 years ago

    nice

  • @friendonymous
    @friendonymous 11 months ago

    What is the difference between average and mean?

  • @user-fn2vo9lh5z
    @user-fn2vo9lh5z 6 months ago

    In the median example at minute 2:40 , shouldn't we order the values first before guessing about which value is the median? shouldn't the values be like that: 4,000 so, the median would be the average of 7,000 and 7,500 which is 7,250

  • @Life_rollercoaster
    @Life_rollercoaster 1 month ago

    I'm near about 50 . I have completed MCA from IGNOU and Digital marketing from NIIT imperia. I worked as a software developer and now im a digital marketer. If I want to change my career in data science after learning this field, can i get a job in data science field?

  • @shubhampathare4892
    @shubhampathare4892 1 year ago

    In the example at 3:00 you haven't sorted the data in ascending order for the median

  • @_craig_
    @_craig_ 3 years ago

    Nice video. I would like to suggest a change. 100th percentile doesn't exist, only 99th. In your example, Musk would have to be earning higher than himself to be the 100th percentile.

  • @saikatdutta1991
    @saikatdutta1991 2 months ago

    Consider my data points: 100 100 100 100 here the 50th percentile which is 100 is kinda misleading right? because 2 more 100 values are present in the right side of median. SO.. 100% of the data values are equals to 50th percentile. Can you please explain where I am confused??

  • @micagar2510
    @micagar2510 3 years ago

    Should we first learn pandas then attempt exercises?

  • @pankajjoshi8292
    @pankajjoshi8292 11 months ago

    Where can I find the Power BI course?

  • @dimpisayed9710
    @dimpisayed9710 2 years ago

    How can i code in Jupyter, just like you.

  • @HitmanBlitz15
    @HitmanBlitz15 3 years ago

    Sir can u explain the steps to become a data analyst and skills required for that

  • @codebasics

    @codebasics

    3 years ago

    On YouTube, search for "codebasics learn data analyst skills". You will find my videos; please watch them.

  • @HitmanBlitz15

    @HitmanBlitz15

    3 years ago

    @@codebasics tq sir

  • @mayur_variya1219
    @mayur_variya1219 9 months ago

    in case of even n.of data point you have not sorted them so median is wrong

  • @universal4334
    @universal4334 3 years ago

    For suppose the data is like this 4,4,6,7,40,100,110,120,1300...in this case taking median doesn't make sense right ....same for mean outlier 1300 involved...and for mode also 4,4 just repeating 4 for 2 times doesn't make sense right... What to do in this case please any one answer me ...could we find solution from this video..

  • @codebasics

    @codebasics

    3 years ago

    Taking mode of 4 is perfectly ok because you are looking for a value that is most frequently occurring and 4 is that value. It really depends on what problem you are trying to solve here. Can you suggest what type of dataset this is? You just made up the values and are generally curious about such distribution?

  • @universal4334

    @universal4334

    3 years ago

    @@codebasics I just take it as an example...but just for repeating 4 for 2 times blindly we can't take 4 for filling the missing value right because it is far less than other higher values
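
To make this thread concrete, the three measures can be computed for the nine values quoted in the question using Python's standard `statistics` module. This is only a quick check of the numbers, not a recommendation for which measure to fill with.

```python
from statistics import mean, median, mode

# The nine values quoted in the question above.
data = [4, 4, 6, 7, 40, 100, 110, 120, 1300]

data_mean = mean(data)       # pulled far upward by the single 1300
data_median = median(data)   # middle (5th) value of the sorted data
data_mode = mode(data)       # most frequent value

print(data_mean, data_median, data_mode)
```

The mean comes out near 187.9, the median is 40 and the mode is 4. All three are technically valid, yet they describe very different "typical" values, which is why the choice depends on the problem being solved, as the reply above says.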

  • @universal4334
    @universal4334 3 years ago

    It is good if you should have taught why not median and mode in some cases

  • @brendamg7298
    @brendamg7298 1 year ago

    🙏🏻

  • @sagarhirapara5455
    @sagarhirapara5455 3 years ago

    Sir, how can I contact you?

  • @ramananagavelli3055
    @ramananagavelli3055 5 months ago

    How do you know that your data has an outlier?

  • @catherinezeng4917
    @catherinezeng4917 2 years ago

    Hi, I'm a bit confused with the solution of the exercise. To me, the outlier is not simply removed by percentile, we should exclude the line with 365 availability and 0 reviews + 0 availability and 0 reviews because those lists are just "ghost" lists that no one actually rent them or just the data is not accurate. If we go further down, we should probably clean the data by review date also, I see some of them are with 2011 date, but if we are analyzing the average of this/recent year then there should be a cut off of the latest year we can use. Please let me know your thoughts. Thanks.

  • @codebasics

    @codebasics

    2 years ago

    Totally agreed with your thoughts here. Percentile is just one of the ways, using common sense simple logic is totally a legit way of treating outliers

  • @catherinezeng4917

    @catherinezeng4917

    2 years ago

    @@codebasics Thank you for replying to me so quickly, so if I apply what I said in the post first and then apply percentile, is that going to be right, or let's say with better accuracy? Also, how do we measure the accuracy? should the mean be close to the 50% percentile? how do we know our analysis is good or bad? Thank you so much!

  • @saichaitanyakumbhari274
    @saichaitanyakumbhari274 3 years ago

    why 0.999 in the exercise ?

  • @ipubg615
    @ipubg615 8 months ago

    2:44 median will be 725

  • @financewithsom485
    @financewithsom485 3 years ago

    removing elon from twitter as an outlier is also great

  • @wallahengineer9989
    @wallahengineer9989 2 months ago

    Raise your hand if you got the Sandeep Jain sir (GFG) reference 😅😅

  • @RH-hv4ir
    @RH-hv4ir 8 days ago

    The video is great but i didnt like the exercise because there is more in it than it has been covered in the video

  • @troubution
    @troubution 3 years ago

    The funniest part is if Elon Musk lives in our town😂😂

  • @codebasics

    @codebasics

    3 years ago

    Ha ha.. yes he is my neighbor ☺️🧐

  • @akshitsinghal8590
    @akshitsinghal8590 3 years ago

    Sorry sir , you miss one part in the video first we have to sort the nos. ( When the count of no is even (while finding median )

  • @aravintht3774

    @aravintht3774

    3 years ago

    1:55

  • @ankurhalke139
    @ankurhalke139 2 years ago

    This is legend . Go to hell teachers and education system...

  • @jayasurya3864
    @jayasurya3864 2 years ago

    You really wish musk to be your neighbour it seems

  • @mdfarhanrza4274
    @mdfarhanrza4274 3 years ago

    Your median answer is totally wrong

  • @aquapisces
    @aquapisces 1 year ago

    16:04 df.income.iloc[3] =Nan will work too
