Live-Feature Engineering-All Techniques To Handle Missing Values- Day 2

Telegram link: t.me/joinchat/N77M7xRvYUd403D...
GitHub link: github.com/krishnaik06/Featur...
Join the Ineuron affordable course:
ineuron1.viewpage.co/Deep-lea...
Please donate if you want to support the channel:
Gpay: krishnaik06@okicici
Please join as a member of my channel to get additional benefits like Data Science materials, live streaming for members, and much more:
@krishnaik06
Please do subscribe to my other channel too:
@krishnaikhindi
Connect with me here:
Twitter: krishnaik06
Facebook: krishnaik06
Instagram: krishnaik06

Comments: 86

  • @harikrishna-harrypth · 3 years ago

    You are a very patient teacher (especially with so many weird live comments while you do your sessions)!! God bless, man! Thanks for your videos!!! 🙏🏼🙏🏼🙏🏼💖💖💖

  • @chaitanyakumarsomagani592 · 3 years ago

    Sir, you have helped so many people; you are just amazing, like God. I am very thankful to you, sir.

  • @gaunollakalpana7866 · 3 years ago

    Amazing playlist and superb explanation, Krish. Hats off.

  • @srishtikumari6664 · 3 years ago

    Amazing as usual! You are doing great work.

  • @bhaskarg8438 · 2 years ago

    You are great!! Helping the Data Science community.

  • @marijatosic217 · 3 years ago

    Great job! Keep up the great work!

  • @jaykhade8940 · 3 years ago

    The function you wrote for random value imputation is very intuitive. 👌

  • @saidurgakameshkota1246 · 3 years ago

    Your teaching is awesome, Krish.

  • @AkashGupta-ov4cy · 3 years ago

    Awesome content on feature engineering.

  • @eternalsilvers5961 · 3 years ago

    df['Age'].dropna().sample(df['Age'].isna().sum(), random_state=0): Correction: sample() is just taking 177 as an input for generating 177 random values from the non-null values in the feature column. random_state=0 is just used for reproducibility of the sample, so that we get the same sample every time.
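A minimal sketch of that point, assuming a small hypothetical `Age` column rather than the real Titanic data: `sample(n, random_state=0)` draws `n` values from the non-null ages only, and repeating the call with the same `random_state` returns the same draw.

```python
import numpy as np
import pandas as pd

# Hypothetical toy values standing in for the Titanic 'Age' column
df = pd.DataFrame({"Age": [22.0, np.nan, 26.0, 35.0, np.nan, 54.0, 2.0, np.nan]})

# Number of values to draw = number of missing ages (3 here, 177 in the video's data)
n_missing = int(df["Age"].isna().sum())

# Draw from the NON-null ages only; random_state makes the draw reproducible
sample_a = df["Age"].dropna().sample(n_missing, random_state=0)
sample_b = df["Age"].dropna().sample(n_missing, random_state=0)

print(len(sample_a))                           # 3
print(sample_a.tolist() == sample_b.tolist())  # True: same seed, same draw
print(sample_a.isna().any())                   # False: no NaN can be drawn
```

The sampled values are ordinary observed ages; they only become "imputed" values once they are written into the missing rows.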

  • @md.muntasirulhoque8563 · 3 years ago

    Sir, watching your live session and learning from Bangladesh.

  • @virubakaran4717 · 3 years ago

    While creating the function, in the last line you also replaced the NaN values of the original Age column with the random sample values; that's why there is no difference in the distribution plot.
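A minimal sketch of the pitfall described in this comment, assuming a toy `Age` column: filling only the new column keeps the original NaNs, but if the random values are also written into the original column, the two columns become identical, so their distribution plots cannot differ.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the Titanic 'Age' column
df = pd.DataFrame({"Age": [22.0, np.nan, 26.0, 35.0, np.nan, 54.0]})
df["Age_random"] = df["Age"]  # copy of the original column

missing = df["Age"].isnull()  # mask of the originally missing rows
random_sample = df["Age"].dropna().sample(int(missing.sum()), random_state=0)
random_sample.index = df[missing].index  # align sampled values to the NaN rows

# Intended behaviour: fill only the new column; df["Age"] keeps its NaNs
df.loc[missing, "Age_random"] = random_sample
print(df["Age"].isnull().sum())  # 2

# The pitfall: also filling the ORIGINAL column with the same values
df.loc[missing, "Age"] = random_sample
# Both columns now hold identical values, so their KDE/histogram plots coincide
print((df["Age"] == df["Age_random"]).all())  # True
```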

  • @shwetaamit6810 · 3 years ago

    You are very hardworking, dude.

  • @akashprabhakar6353 · 3 years ago

    Thanks a ton!!

  • @vaibhavshukla9777 · 3 years ago

    Thank you sir 🌟

  • @swetanishad7290 · 3 years ago

    Very good session.

  • @rambaldotra2221 · 3 years ago

    Thanks Sir!!

  • @GauravKumar-mi5wm · 3 years ago

    For those who are finding it difficult to write the function: understand the logic from the video and use this code: import feature_engine.missing_data_imputers as mdi; titanic = mdi.RandomSampleImputer(variables=["Age"], random_state=0).fit_transform(titanic) (in feature_engine 1.0 and later, RandomSampleImputer is in feature_engine.imputation).

  • @harshilgandhi4397 · 3 years ago

    Hey Krish! I've been following your videos for the past few days and it's going well. Just one query: in this video at 20-22 minutes, you are replacing the Age NaNs with random values in both df['Age'] and df['Age_Random']. So it will definitely show a perfect plot. Is this query correct??

  • @akarkabkarim · 3 years ago

    Thank you, Krish.

  • @phungtruong6698 · 3 years ago

    Thank you, sir.

  • @aravindkumar9457 · 3 years ago

    Krish, I have a doubt... What is the use of capturing null values with a new feature? What can we do by creating a new feature with this method, when our aim is to fill the null values with something correct?? In this case we are still using the mean to fill the null values in the Age column... so isn't it the same as mean/median imputation?? So what is the difference?
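A minimal sketch of the "capture NaN with a new feature" idea, assuming a toy `Age` column and a median fill (the column name `Age_NAN` is illustrative): the fill step is the same as plain median imputation, but the extra 0/1 column keeps the record of which rows were originally missing, which a model can use if missingness itself carries signal.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the Titanic 'Age' column
df = pd.DataFrame({"Age": [22.0, np.nan, 26.0, 35.0, np.nan, 54.0]})

# Indicator feature: 1 where Age was missing, 0 otherwise
df["Age_NAN"] = np.where(df["Age"].isnull(), 1, 0)

# Then fill the original column exactly as in ordinary median imputation
df["Age"] = df["Age"].fillna(df["Age"].median())

print(df)
```

After this step a plain median-imputed `Age` column alone would no longer reveal which ages were filled in; the indicator column is what preserves that information.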

  • @dipikazope8742 · 3 years ago

    Sir, I have seen all your videos related to data science. I'm from a non-CS background and want to learn data science. Can you please share some information about, or a platform for, practical projects to enhance our skills? Thank you.

  • @amadoukindybarry949 · 1 year ago

    Good afternoon, sir. Your session on feature engineering techniques is fine. Is it possible to use two or more different techniques on the same dataset without an impact on the target variable?

  • @naveenrajan3765 · 2 years ago

    If we use end of distribution to fill the NaNs, then outliers will affect the mean value. Do we need to take care of outliers first, or will it work with the outliers?

  • @sandipansarkar9211 · 2 years ago

    Finished watching.

  • @prateekbhardwaj8494 · 3 years ago

    Thanks

  • @sahajrajmalla · 3 years ago

    How does adding a feature from NaN give meaning to the dataset? Please elaborate more on this!

  • @adeyinkasotunde6870 · 3 years ago

    What if our dataset contains a large number of missing values in virtually all the columns? What method is best to fill the large number of missing values in all (say 30) columns of my dataset? Also, can we handle the test data's missing values the same way we do for the train dataset, or should we concatenate the train and test datasets and then apply this imputation?

      • @chiragagrawal7104 · 3 years ago

        Don't concatenate train and test for missing-value imputation; this will lead to data leakage.
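A minimal sketch of leakage-free imputation, assuming hypothetical train and test `Age` columns: the fill value is learned from the training rows only and then reused on the test rows. Pooling both sets first would let the test values pull the statistic toward themselves.

```python
import numpy as np
import pandas as pd

# Hypothetical train/test split of an 'Age' column
train = pd.DataFrame({"Age": [22.0, np.nan, 26.0, 35.0, 54.0]})
test = pd.DataFrame({"Age": [np.nan, 80.0, 30.0]})

# Leaky: statistic computed on train + test together (shown only for comparison)
pooled_median = pd.concat([train, test])["Age"].median()  # 32.5

# Leakage-free: learn the fill value from the training data only
train_median = train["Age"].median()  # 30.5

# Apply the SAME learned value to both sets
train["Age"] = train["Age"].fillna(train_median)
test["Age"] = test["Age"].fillna(train_median)
```

The same pattern applies to the other techniques (mean, mode, end-of-distribution value, the pool for random sampling): whatever is estimated comes from the training split and is only reused on the test split.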

  • @prashanthdhananjayan1745 · 2 years ago

    What a legend

  • @mlvali1350 · 3 years ago

    There are other imputation methods that are also useful for handling missing values: hot-deck and cold-deck imputation, fancy imputation, and the EM algorithm.

  • @arjungoud3450 · 3 years ago

    Brother, can we check the correlation with the Y variable after each technique and go with the technique that gives the highest correlation score?

  • @naveenkumarjadi2915 · 3 years ago

    In the last part, End of Distribution imputation, are we deleting the NaN values? Since we are trying to place them beyond the 3rd standard deviation, does that mean we are indirectly skipping the NaN values, as with the IQR method? Is this true?

  • @ajaykushwaha-je6mw · 2 years ago

    End of distribution will remove outliers, but it will change the distribution of the feature, so is it advisable to use it?

  • @are_amay · 3 years ago

    Sir, is there any difference between data science and engineering?

  • @akashagrawal085 · 3 years ago

    Sir, you mentioned in the last video that Age is not MCAR in this dataset, so why are we using it with methods whose condition is MCAR? Are you doing so just for explanation, or am I misunderstanding this?

  • @irsadalam1604 · 3 years ago

    Sir, can you make a video on ctc_loss implementation in Keras? Please...

  • @chiragagrawal7104 · 3 years ago

    But sir, how does capturing NaN values with a new feature tell us which new feature to work on?

  • @ritvikpant7107 · 2 years ago

    Here, since NA and null values are the same (both signify missing values), if we have already dropped the NA values, how are we substituting randomly collected values from our dataset here at 21:31?

  • @rajatjain328 · 3 years ago

    Sir, please make a video series on Flask development.

  • @konathammanoharreddy7670 · 3 years ago

    Sir, I did not get any notification of the live session. Please solve this.

  • @ilayaraja1551 · 2 years ago

    At 33:22 I have a doubt: in the Age_median column, the missing values are imputed by the median, so those values should be the median... but here it is the same as random_sample. Is that correct?

  • @rakeshdesu2180 · 3 years ago

    Sir, normally we use fillna() to replace NaN, but it's not working when used with a sample, like this: df['Age'].fillna(df['Age'].dropna().sample()). If I fill with any other value it works, like df['Age'].fillna(0).

      • @asawanted · 3 years ago

        Even I had the same query. I tried to debug it, and the reason it doesn't work is that sample() returns a Series of sampled values together with their corresponding indices in the original dataframe. To fill, the indices have to match. In your case, fillna(0) is the same as mean/median imputation, only you are using zero as the value. Check Corey Schafer's cleaning data in pandas video; he has explained it well.
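A minimal sketch of the index problem described in this reply, assuming a toy Series: `fillna` with a Series fills by matching index labels, and the sampled values carry the labels of non-null rows, so nothing lines up with the NaN rows until the sample is re-labelled.

```python
import numpy as np
import pandas as pd

# Hypothetical toy 'Age' values; NaNs sit at labels 1 and 4
s = pd.Series([22.0, np.nan, 26.0, 35.0, np.nan, 54.0])
sample = s.dropna().sample(int(s.isna().sum()), random_state=0)

# sample's labels come from NON-null rows (some of 0, 2, 3, 5), so fillna
# finds no values for labels 1 and 4 and fills nothing
naive = s.fillna(sample)
print(naive.isna().sum())  # 2

# Re-label the sample with the NaN rows' index, then fillna aligns correctly
sample.index = s[s.isna()].index
fixed = s.fillna(sample)
print(fixed.isna().sum())  # 0
```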

  • @ramireddy371 · 3 years ago

    What happens if we don't replace the NaN values and directly build a model with NaN values?

  • @mahindrarao4565 · 3 years ago

    The function that you have written for Random Variable Replacement is super amazing..!!
    1. Create Age_random as a copy of Age.
    2. Put all the random values into random_sample.
    3. To have the same indexes, set the index of random_sample to the indexes where Age is null.
    4. With a .loc operation, wherever the data is null in Age_random, replace it with random_sample.
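A minimal sketch of those four steps as a function, assuming a toy `Age` column; the `impute_nan(df, variable, median)` signature follows the one quoted elsewhere in these comments, while the data and `random_state=0` are illustrative. The original column is left untouched, and the `_median` and `_random` columns only differ from it (and from each other) at the rows that were missing.

```python
import numpy as np
import pandas as pd

def impute_nan(df, variable, median):
    # Median imputation into a separate column
    df[variable + "_median"] = df[variable].fillna(median)
    # Step 1: copy the original column into <variable>_random
    df[variable + "_random"] = df[variable]
    # Step 2: draw one random non-null value per missing row
    n_missing = int(df[variable].isnull().sum())
    random_sample = df[variable].dropna().sample(n_missing, random_state=0)
    # Step 3: give the sample the index labels of the missing rows
    random_sample.index = df[df[variable].isnull()].index
    # Step 4: fill only the missing rows of the new column (index-aligned)
    df.loc[df[variable].isnull(), variable + "_random"] = random_sample

# Hypothetical toy data standing in for the Titanic 'Age' column
df = pd.DataFrame({"Age": [22.0, np.nan, 26.0, 35.0, np.nan, 54.0]})
impute_nan(df, "Age", df["Age"].median())
print(df)
```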

  • @kartiksharma-wn3sd · 2 years ago

    Can anyone explain the function part: how are we getting non-null values in it? I understood that dropna() gives the non-null values and sample() just picks 177 random samples, but how is it replacing the null values? Can anyone help?

  • @GAURAVSINGH-qy4cj · 2 years ago

    One more question: Age_end_distribution is not normally distributed and is right-skewed, so how is this distribution good just because it doesn't have outliers?!

  • @GAURAVSINGH-qy4cj · 2 years ago

    Actually, the sample function is taking random values from 0 to 177 (which is the number of NaN values) for filling the NaN values, which I don't think is needed as per the definition. It should select random observations to fill the NaN values. Can anyone clarify?!

  • @chandrashekhar-ss9hm · 3 years ago

    Do you also teach IoT classes? Please share your Telegram channel.

  • @tigerbhavesh6905 · 3 years ago

    Sir, I have one doubt about Variable_median and Variable_random: for any data, is random imputation better than median imputation, or does it depend on the data?

      • @mlvali1350 · 3 years ago

        It depends on the data.

  • @sanjaysinghgariya2707 · 1 year ago

    At 32:00, use df[variable+"_random"] = df[variable].fillna(random_sample) instead of the last line of the function.

  • @abhinandrasingh · 3 years ago

    The impute_nan function could also be written in the following manner:

        def impute_nan(df, variable, median):
            df[variable+'_median'] = df[variable].fillna(median)
            df[variable+'_random'] = df[variable]
            # to fill the NULL values in data['Age'], we are going to fill the values randomly
            random_sample = df[variable].dropna().sample(n=df[variable].isnull().sum(), random_state=0)
            # now these random_sample values need to be sent to the dataframe to replace the NaN values in the data frame
            random_sample.index = df[df[variable].isnull()].index
            #df.loc[df[variable].isnull(), variable+'_random'] = random_sample
            df[variable+'_random'] = df[variable].fillna(random_sample)

      • @JohnSmith-uu5gp · 1 year ago

        Also, you can use it like this: df[variable+'_random'] = df[variable+'_random'].fillna(random_sample) 😀😊👍

  • @boogeyman9824 · 3 years ago

    Sir, why have you used the same dataset and features for both missing completely at random and not missing completely at random?

  • @ayaansk99 · 2 years ago

    Why are we using "variable" in the function impute_nan? Can anyone explain?

  • @vagheeshmk3156 · 6 months ago

    #KingKrish

  • @anikasingh2464 · 3 years ago

    Why is the correlation impacted?

  • @ArunKumar-sg6jf · 3 years ago

    Sir, please upload this video to the channel.

  • @kunalyadav4776 · 3 years ago

    Why is there an index mismatch between random_sample and df[df["Age"].isnull()].index?

      • @gurmanbirsingh1209 · 3 years ago

        random_sample consists of values picked at random from df["Age"].dropna(), which doesn't contain the indices of the NaN records; hence the mismatch.

  • @AKHILESHKUMAR-nk2rk · 3 years ago

    Which state are you from?

  • @Raveendr1191 · 3 years ago

    Hey Krish, in In[22] you try to replace all NaN values with the random sample (video time 21:50). There you say that row 423 is replaced with 28.00, but the original value of row 423 is already 28.0; please check in the CSV file. It's actually not replacing the value.

      • @srishtikumari6664 · 3 years ago

        That code is basically creating 177 (df.Age.isnull().sum()) random samples from the non-null values of the Age column. You are right! It's not replacing the null values with random values.

      • @sanjaysinghgariya2707 · 1 year ago

        Use df[variable+"_random"] = df[variable].fillna(random_sample) instead of the last line of the function.

  • @praloysarker7639 · 3 years ago

    In the End of Distribution imputation, the 3rd std was used... What will happen if we use the 1st or 2nd std? Will you please tell?

      • @priyam66 · 1 year ago

        You may not be able to get rid of outliers.
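A minimal sketch of end-of-distribution imputation with a configurable number of standard deviations, assuming a toy `Age` column: the fill value is mean + k·std of the observed values, so with k = 1 or 2 it lands closer to the bulk of the data and is less clearly an extreme value, while k = 3 pushes it out into the tail. The helper name `end_of_distribution_value` is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the Titanic 'Age' column
df = pd.DataFrame({"Age": [22.0, np.nan, 26.0, 35.0, np.nan, 54.0, 2.0, 38.0]})

def end_of_distribution_value(series, k=3):
    # Hypothetical helper: pandas mean()/std() skip NaN; std() uses ddof=1
    return series.mean() + k * series.std()

# Compare candidate fill values for 1, 2 and 3 standard deviations
candidates = {k: end_of_distribution_value(df["Age"], k) for k in (1, 2, 3)}
print(candidates)

# Fill with the 3-std value, as described in the comment thread
extreme = candidates[3]
df["Age_end_distribution"] = df["Age"].fillna(extreme)
```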

  • @purnimamsb352 · 3 years ago

    Why are the extreme and median values equal?

  • @akashubale5199 · 3 years ago

    Hey Krish, please rename this video as "Live-Feature Engineering-All Techniques To Handle Missing Values- Day 2"; it will be convenient to find it.

  • @riteshmukhopadhyay6922 · 1 year ago

    "Buddy, I am teaching feature engineering" lol XD. The whole Ramayana is over, and then someone asks who Ram is XXXXD

  • @gurdeepsinghbhatia2875 · 3 years ago

    Sir, one doubt: dropna() will first drop all the NaN-valued rows, and after that sample() will choose 177 values from that column. So how will it fill the NaN values with those? You said it will fill the NaN values with those sample values. I am talking with respect to the video at 23:05. Please clear my doubt, sir. Thank you, sir.

      • @naveenvashistha9692 · 3 years ago

        Yeah, it's a mistake... He misunderstood the code.

      • @gurdeepsinghbhatia2875 · 3 years ago

        @naveenvashistha9692 Yeah, thanks, brother.

  • @equbalmustafa · 3 years ago

    Please provide the Telegram link for paid members (799)...

  • @stuttzzzi · 2 years ago

    When I call the function, it says the DataFrame object has no attribute called AGE.

      • @priyam66 · 1 year ago

        It is 'Age', not 'AGE'.

  • @kartiksharma-wn3sd · 2 years ago

    What is the extreme value here?

      • @priyam66 · 1 year ago

        Extreme values are regarded as outliers, which you can see visually on a boxplot. These extreme values lie outside the min and max values (the whiskers) of the box plot.
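A minimal sketch of how those boxplot "extreme values" are usually flagged, assuming hypothetical ages: points below Q1 - 1.5·IQR or above Q3 + 1.5·IQR (the default whisker length in matplotlib's `boxplot`, `whis=1.5`) are treated as outliers.

```python
import numpy as np
import pandas as pd

# Hypothetical ages; 80.0 is meant to stand out
ages = pd.Series([22.0, 26.0, 35.0, 54.0, 2.0, 38.0, 80.0, 28.0])

q1, q3 = ages.quantile(0.25), ages.quantile(0.75)  # quartiles (linear interpolation)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr     # usual boxplot whisker limits

outliers = ages[(ages < lower) | (ages > upper)]
print(lower, upper)
print(outliers.tolist())  # [80.0]
```

Here 2.0 stays inside the lower fence because the fence falls below zero, so only the high value is flagged.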

  • @dikshantgupta5539 · 3 years ago

    Why so many ads in an educational video?

  • @aliensamv3997 · 2 years ago

    But age can't be 0.95.

  • @are_amay · 3 years ago

    Sir, what are you doing now??

  • @hazimmir1019 · 3 years ago

    The teacher is amazing, but the students are donkeys!!