Live-Feature Engineering-All Techniques To Handle Missing Values- Day 2
Telegram link: t.me/joinchat/N77M7xRvYUd403D...
github link: github.com/krishnaik06/Featur...
Join the Ineuron Affordable course
ineuron1.viewpage.co/Deep-lea...
Please donate if you want to support the channel
Gpay: krishnaik06@okicici
Please join as a member of my channel to get additional benefits like data science materials, live streams for members, and much more
/ @krishnaik06
Please subscribe to my other channel too
/ @krishnaikhindi
Connect with me here:
Twitter: / krishnaik06
Facebook: / krishnaik06
instagram: / krishnaik06
Comments: 86
You are a very patient teacher (especially with so many weird live comments whilst you do your sessions)!! God bless, man! Thanks for your videos!!! 🙏🏼🙏🏼🙏🏼💖💖💖
Sir, anyone who has helped this many people is amazing, just like God. I am very thankful to you, sir.
Amazing playlist and superb explanation Krish. Hats off.
Amazing as usual! You are doing a great work.
You are great!! Helping the data science community.
Great job! Keep up the great work!
Function you wrote for random value imputation is very intuitive. 👌
Your teaching is awesome, Krish
Awesome content on feature engineering
A correction on df.sample(df['Age'].isna().sum(), random_state=0): sample just takes 177 (the number of nulls) as its input and draws 177 random values from the non-null values in the feature column. random_state=0 is only there for reproducibility, so that we get the same sample every time.
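A minimal sketch of what this correction describes, assuming a made-up toy column instead of the Titanic data (the values and the name same_again are illustrative only). It shows that the count of nulls is only the sample size, and that a fixed random_state returns the same draw each time:

```python
import numpy as np
import pandas as pd

# Toy stand-in for the 'Age' column; these values are invented for illustration.
df = pd.DataFrame({"Age": [22.0, np.nan, 38.0, 26.0, np.nan, 35.0, 54.0, np.nan]})

# Number of missing values: this is only the *size* of the sample to draw.
n_missing = int(df["Age"].isnull().sum())

# Draw n_missing values from the observed (non-null) ages.
# random_state fixes the seed, so the same values come back on every run.
random_sample = df["Age"].dropna().sample(n=n_missing, random_state=0)
same_again = df["Age"].dropna().sample(n=n_missing, random_state=0)

print(random_sample)
```

Note that the sampled values keep the index labels of the non-null rows they came from, which matters later when they are written back into the column.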
Sir, watching your live session and learning from Bangladesh.
While creating the function, in the last line you also replaced the NaN values of the original Age column with the random sample values; that's why there is no difference in the distribution plot.
U r very hard working dude
Thanks a ton!!
Thank you sir 🌟
Very good session
Thanks Sir!!
For those who are finding it difficult to write the function: understand the logic from the video and use this code: import feature_engine.missing_data_imputers as mdi; titanic = mdi.RandomSampleImputer(variables=["Age"], random_state=0).fit_transform(titanic)
Hey Krish! I have been following your videos for the past few days and it's going well. Just one query: in this video at 20-22 minutes, you replace the NaNs in Age with random values in both df['Age'] and df['Age_Random']. So the two plots will naturally match perfectly. Is my understanding correct?
thank you krish
Thank you sir
Krish, I have a doubt: what is the use of capturing null values with a new feature? What can we do with the new feature in this method, given that our aim is to fill the null values with something sensible? In this case we are still using the mean to fill the null values in the Age column, so isn't it the same as mean/median imputation? What is the difference?
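To make the question concrete, here is a small sketch of the technique being asked about, assuming a toy Age column and the column name Age_NAN (both are illustrative). The filled column looks the same as with plain median imputation; the difference is the extra 0/1 column, which records which rows were originally missing so that a model can still use that information:

```python
import numpy as np
import pandas as pd

# Toy data, invented for illustration.
df = pd.DataFrame({"Age": [22.0, np.nan, 38.0, 26.0, np.nan]})

# Capture the missingness first: 1 where Age was NaN, 0 otherwise.
df["Age_NAN"] = np.where(df["Age"].isnull(), 1, 0)

# Then fill the gaps as usual (median here; the mean works the same way).
df["Age"] = df["Age"].fillna(df["Age"].median())

print(df)
```

The indicator has to be created before the fill, otherwise there is nothing left to flag.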
Sir, I have seen all your videos related to data science. I am from a non-CS background and want to learn data science. Can you please suggest some resources or platforms for practical projects to build our skills? Thank you.
Good afternoon sir, your session on the feature engineering techniques is OK. Is it possible to use two or more different techniques on the same dataset without an impact on the target variable?
If we use end-of-distribution imputation to fill the NaNs, outliers will affect the mean. Do we need to handle the outliers first, or will it work with the outliers present?
finished watching
Thanks
How does adding a feature from NaN give meaning to the dataset? Please elaborate more on this!
What if our dataset contains a large number of missing values in virtually all the columns? What is the best method to fill a large number of missing values in all (say 30) columns of my dataset? Also, can we impute the test data's missing values just as we do for the train dataset, or can we concatenate the train and test datasets and then apply the imputation?
@chiragagrawal7104
3 years ago
Don't concatenate train and test for missing-value imputation; this will lead to data leakage.
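One common way to follow this advice, sketched under the assumption of a simple median fill on made-up train and test frames (all names and values here are illustrative): learn the fill value from the training data only, then reuse that same value on the test data.

```python
import numpy as np
import pandas as pd

# Toy train/test split, invented for illustration.
train = pd.DataFrame({"Age": [20.0, np.nan, 30.0, 40.0]})
test = pd.DataFrame({"Age": [np.nan, 80.0, 90.0]})

# Learn the statistic from the training data only.
train_median = train["Age"].median()

# Apply the same learned value to both sets.
train["Age_median"] = train["Age"].fillna(train_median)
test["Age_median"] = test["Age"].fillna(train_median)

# For comparison: the median of the concatenated data is influenced by the test rows.
leaky_median = pd.concat([train["Age"], test["Age"]]).median()
print(train_median, leaky_median)
```

In this toy case the two medians differ, which is exactly the information from the test set that would leak into training if the sets were concatenated first.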
What a legend
There are other imputation methods that are also useful for handling missing values: hot-deck and cold-deck imputation, fancy imputation, and the EM algorithm.
Brother, can we check the correlation with the Y variable after each technique and go with the technique that gives the highest correlation score?
In the last part, End of Distribution imputation, are we deleting the NaN values? Since we are placing them beyond the 3rd standard deviation, does it mean we are indirectly skipping the NaN values, as with the IQR method? Is this true?
End of distribution will remove outliers, but it will change the distribution of the feature, so is it advisable to use?
Sir, is there any difference between data science and engineering?
Sir, you mentioned in the last video that Age is not MCAR in this dataset, so why are we using it in methods that assume MCAR? Are you doing so just for explanation, or am I misunderstanding this?
Sir, can you make a video on ctc_loss implementation in Keras? Please...
But sir, when capturing NaN values with a new feature, how will we know which is the new feature to work on?
Here NA and null are the same; both signify missing values. So if we have already dropped the NA values, how are we substituting randomly drawn values from our dataset here at 21:31?
Sir, please make a video series on Flask development.
Sir, I did not get any notification of the live session. Please fix this.
At 33:22 I have a doubt: in the Age_median column the missing values are imputed with the median, so the value should be the median, but here it is the same as random_sample. Is that correct?
Sir, normally we use fillna() to replace NaN, but it is not working in the sample case, like below: df['Age'].fillna(df['Age'].dropna().sample()). If I fill with any other value it works, e.g. df['Age'].fillna(0).
@asawanted
3 years ago
I had the same query. I tried to debug it, and the reason it doesn't work is that sample() returns a Series of sampled values together with their indices from the original DataFrame. To merge, the indices have to be the same. In your case, fillna(0) is the same idea as mean/median imputation, only you are using zero as the value. Check Corey Schafer's video on cleaning data in pandas; he has explained it well.
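The index behaviour described in this reply can be reproduced with a short sketch, assuming a made-up five-value Series (the variable names unchanged and filled are illustrative). fillna with a Series aligns on index labels, so the sampled values only land in the gaps once they carry the labels of the NaN rows:

```python
import numpy as np
import pandas as pd

# Toy Series, invented for illustration; rows 1 and 3 are missing.
s = pd.Series([22.0, np.nan, 38.0, np.nan, 26.0], name="Age")
n_missing = int(s.isnull().sum())

# The sample keeps the index labels of the non-null rows it was drawn from.
random_sample = s.dropna().sample(n=n_missing, random_state=0)

# fillna aligns on index, and none of those labels belong to a NaN row,
# so nothing gets filled.
unchanged = s.fillna(random_sample)

# Relabel the sample with the index of the NaN rows, then fill again.
random_sample.index = s[s.isnull()].index
filled = s.fillna(random_sample)

print(unchanged.isnull().sum(), filled.isnull().sum())
```

A scalar such as 0 works directly because it has no index to align; it is simply broadcast to every NaN position.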
What happens if we don't replace the NaN values and directly build a model with the NaN values in place?
The function that you have written for random sample replacement is super amazing! 1. Create Age_Random as a copy of Age. 2. Put all the random values into random_sample. 3. To get the same indices, set the index of random_sample to that of the rows where Age.isnull() is true. 4. With the loc operation, wherever the data is null in Age_Random, replace it with random_sample.
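The four steps listed above can be sketched as follows. This is not the exact function from the video: the name impute_random, the copy of the DataFrame, and the toy data are assumptions made here so that the original Age column stays untouched.

```python
import numpy as np
import pandas as pd


def impute_random(df, variable, random_state=0):
    """Return a copy of df with a '<variable>_random' column (hypothetical helper)."""
    out = df.copy()  # work on a copy so the caller's DataFrame is not modified

    # 1. Start the new column as a copy of the original one.
    out[variable + "_random"] = out[variable]

    # 2. Draw one observed value for every missing value.
    missing = out[variable].isnull()
    random_sample = out[variable].dropna().sample(
        n=int(missing.sum()), random_state=random_state
    )

    # 3. Give the sample the index labels of the missing rows.
    random_sample.index = out[missing].index

    # 4. Write the sample into the missing positions of the new column only.
    out.loc[missing, variable + "_random"] = random_sample
    return out


# Toy data, invented for illustration.
df = pd.DataFrame({"Age": [22.0, np.nan, 38.0, 26.0, np.nan, 35.0]})
result = impute_random(df, "Age")
print(result)
```

Because sampling is done without replacement, this sketch needs at least as many observed values as missing ones; passing replace=True to sample would lift that restriction.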
Can anyone explain the function part, i.e. how we end up with non-null values? I understand that dropna() gives the non-null values and sample() just picks 177 random samples, but how does that replace the null values? Can anyone help?
One more question: Age_end_distribution is not normally distributed and is right-skewed, so how is this distribution good just because it doesn't have outliers?
Actually, the sample function is taking random values from 0 to 177 (which is the count of NaN values) to fill the NaN values, which I don't think is needed as per the definition. It should select random observations to fill the NaN values. Can anyone clarify?
Do u also teach IoT classes? Pl share ur telegram channel
Sir, I have one doubt about variable_median and variable_random: for any data, is random imputation better than median imputation, or does it depend on the data?
@mlvali1350
3 years ago
It depends on the data.
At 32:00, use df[variable+"_random"] = df[variable].fillna(random_sample) instead of the last line of the function.
The impute_nan function could also be written in the following manner:
def impute_nan(df, variable, median):
    df[variable+'_median'] = df[variable].fillna(median)
    df[variable+'_random'] = df[variable]
    # to fill the null values in df[variable] we are going to fill the values randomly
    random_sample = df[variable].dropna().sample(n=df[variable].isnull().sum(), random_state=0)
    # now these random_sample values need to be sent to the dataframe to replace the NaN values in the data frame
    random_sample.index = df[df[variable].isnull()].index
    # df.loc[df[variable].isnull(), variable+'_random'] = random_sample
    df[variable+'_random'] = df[variable].fillna(random_sample)
@JohnSmith-uu5gp
1 year ago
You can also write it as: df[variable+'_random'] = df[variable+'_random'].fillna(random_sample)😀😊👍
Sir, why have you used the same dataset and features for both missing completely at random and not missing completely at random?
Why are we using "variable" in the impute_nan function? Can anyone explain?
#KingKrish
Why is correlation impacted?
Sir, please upload this video to the channel.
Why do the indices mismatch between random_sample and df[df["Age"].isnull()].index?
@gurmanbirsingh1209
3 years ago
random_sample consists of values picked at random from df["Age"].dropna(), which doesn't contain the indices of the NaN records. Hence the mismatch.
Which state are you from?
Hey Krish, in In[22] (video time 21:50) you try to replace all NaN values with a random sample. There you say that 423 is replaced with 28.00, but the original value at 423 is already 28.0; please check the CSV file. It's not actually replacing the value.
@srishtikumari6664
3 years ago
That code is basically creating 177 (df.Age.isnull().sum()) random samples from the non-null values of the Age column. You are right! It's not replacing null values with random values.
@sanjaysinghgariya2707
1 year ago
Use df[variable+"_random"] = df[variable].fillna(random_sample) instead of the last line of the function.
In the End of Distribution imputation the 3rd standard deviation was used. What will happen if we use the 1st or 2nd standard deviation? Will you please explain?
@priyam66
1 year ago
You may not be able to get rid of outliers.
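To see what changes with 1, 2 or 3 standard deviations, here is a rough sketch assuming the usual mean + k * std form of end-of-distribution imputation and a made-up Age column (the dictionary candidates is illustrative). It only compares where the fill value lands relative to the observed data; it does not settle which k is best for a given model.

```python
import numpy as np
import pandas as pd

# Toy data, invented for illustration.
df = pd.DataFrame({"Age": [22.0, np.nan, 38.0, 26.0, np.nan, 35.0, 54.0, 2.0]})

mean, std = df["Age"].mean(), df["Age"].std()

# Candidate fill values at 1, 2 and 3 standard deviations above the mean.
candidates = {k: mean + k * std for k in (1, 2, 3)}

# The 3-std choice, as used in the video.
df["Age_end_distribution"] = df["Age"].fillna(candidates[3])

print(candidates, df["Age"].max())
```

In this toy column the 1-std value falls inside the range of the observed ages, while the 3-std value lies beyond the largest observed age, so the imputed rows sit clearly at the tail.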
Why are the extreme and median values equal?
Hey Krish, please rename this video to "Live-Feature Engineering-All Techniques To Handle Missing Values- Day 2"; it will be convenient to find it.
"Buddy I am teaching feature engineering" lol XD, the whole Ramayana is over and then they ask who Ram is XXXXD
Sir, one doubt: dropna() will first drop all the rows with NaN values, and then sample() will choose 177 values from that column, so how will it fill the NaN values? You said that it fills the NaN values with those sample values. I am asking about the video at 23:05. Please clear my doubt, sir. Thank you, sir.
@naveenvashistha9692
3 years ago
Yeah, it's a mistake... He misunderstood the code.
@gurdeepsinghbhatia2875
3 years ago
@naveenvashistha9692 Yeah, thanks brother.
Please provide the Telegram link for paid members (799)...
When I call the function, it says the DataFrame object has no attribute called AGE.
@priyam66
1 year ago
It is 'Age', not 'AGE'.
What is an extreme value here?
@priyam66
1 year ago
Extreme values are regarded as outliers, which you can see visually on a box plot. These extreme values lie outside the min and max values of the box plot.
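The box-plot idea in this reply can be written out numerically. This sketch assumes the common 1.5 × IQR whisker convention and a made-up list of ages; the names lower, upper and extremes are illustrative.

```python
import pandas as pd

# Toy ages, invented for illustration.
ages = pd.Series([2.0, 22.0, 24.0, 26.0, 28.0, 30.0, 35.0, 80.0], name="Age")

# Quartiles and interquartile range.
q1, q3 = ages.quantile(0.25), ages.quantile(0.75)
iqr = q3 - q1

# Whisker limits under the 1.5 * IQR convention.
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Values outside the limits are the ones a box plot draws as separate points.
extremes = ages[(ages < lower) | (ages > upper)]
print(extremes.tolist())
```

Here only the youngest and oldest toy values fall outside the limits; everything between them would sit inside the whiskers.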
Why so many ads in an educational video?
But age can't be 0.95.
Sir, what are you doing now??
The teacher is amazing, but the students are donkeys!!