Train Test Split vs K Fold vs Stratified K fold Cross Validation
In this video we will be discussing how to implement the following (a short illustrative code sketch follows the list):
1. K fold Cross Validation
2. Stratified K fold Cross Validation
3. Train Test Split
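For readers who want to follow along in code, here is a minimal, illustrative sketch of the three approaches. The dataset and model are assumptions for the example, not necessarily the exact code shown on screen:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, StratifiedKFold, cross_val_score, train_test_split

# Assumed example data and model
X, y = load_breast_cancer(return_X_y=True)
clf = LogisticRegression(max_iter=5000)

# 1. K-fold cross validation
print("k-fold:", cross_val_score(clf, X, y, cv=KFold(n_splits=5)).mean())

# 2. Stratified k-fold cross validation (preserves the class ratio in every fold)
print("stratified k-fold:", cross_val_score(clf, X, y, cv=StratifiedKFold(n_splits=5)).mean())

# 3. Single train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
print("train/test split:", clf.fit(X_train, y_train).score(X_test, y_test))
```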
Buy the best book on Machine Learning and Deep Learning with Python, sklearn, and TensorFlow from the link below:
amazon url:
www.amazon.in/Hands-Machine-L...
Connect with me here:
Twitter: / krishnaik06
Facebook: / krishnaik06
instagram: / krishnaik06
Subscribe to my unboxing channel:
/ @krishnaikhindi
Below are the various playlists created on ML, Data Science, and Deep Learning. Please subscribe and support the channel. Happy Learning!
Deep Learning Playlist: • Tutorial 1- Introducti...
Data Science Projects playlist: • Generative Adversarial...
NLP playlist: • Natural Language Proce...
Statistics Playlist: • Population vs Sample i...
Feature Engineering playlist: • Feature Engineering in...
Computer Vision playlist: • OpenCV Installation | ...
Data Science Interview Question playlist: • Complete Life Cycle of...
You can buy my book on Finance with Machine Learning and Deep Learning from the URL below:
amazon url: www.amazon.in/Hands-Python-Fi...
🙏🙏🙏🙏🙏🙏🙏🙏
YOU JUST NEED TO DO
3 THINGS to support my channel
LIKE
SHARE
&
SUBSCRIBE
TO MY YOUTUBE CHANNEL
Comments: 57
Hi Krish, when we use cross_val_score and give the cv parameter as an int (the number of folds), and the model we are using is a classifier, it chooses stratified splitting by default, right, and not the plain K-Fold type of cross validation? I found this in the sklearn library:

cv: int, cross-validation generator or an iterable, default=None
Determines the cross-validation splitting strategy. Possible inputs for cv are:
1) None, to use the default 5-fold cross validation
2) int, to specify the number of folds in a (Stratified)KFold
3) CV splitter
4) An iterable yielding (train, test) splits as arrays of indices
For int/None inputs, if the estimator is a classifier and y is either binary or multiclass, StratifiedKFold is used. In all other cases, KFold is used.

This is just a query. Please let me know if my understanding is wrong.
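The quoted sklearn documentation does say that. A small sketch (with an assumed dataset and model) illustrating the default: for a classifier with an integer cv, the scores match an explicit StratifiedKFold, while a plain KFold generally differs:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, StratifiedKFold, cross_val_score

X, y = load_breast_cancer(return_X_y=True)   # assumed binary-classification data
clf = LogisticRegression(max_iter=5000)

# cv=5 with a classifier: sklearn internally uses StratifiedKFold
scores_int = cross_val_score(clf, X, y, cv=5)

# Passing StratifiedKFold explicitly gives the same splits (no shuffling in either case)
scores_skf = cross_val_score(clf, X, y, cv=StratifiedKFold(n_splits=5))

# Plain KFold would generally give slightly different scores
scores_kf = cross_val_score(clf, X, y, cv=KFold(n_splits=5))

print(scores_int, scores_skf, scores_kf, sep="\n")
```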
I've been wondering about this topic for a while, very happy to find your content!!!
Sir your content is great, thanks for uploading such important and informational videos. These videos are very helpful. Keep making these, more power to you.
Thank you! Keep going with other video tutorials!!
You are a saviour!
excellent work
Just awesome!!!! But would it be possible for you to also share a Git repository with the code?
great video, thank you!
so clear explanation thanks
Wow, this is very enlightening!!!! Thank you sir! One question though: what if we need the confusion matrix? I am using Repeated Stratified K-Fold, and I'm curious how to obtain a reasonable and easy-to-compute confusion matrix. Any suggestions on this?
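One common way to get a confusion matrix out of k-fold validation, sketched here with assumed data and model rather than the video's code, is to collect out-of-fold predictions with cross_val_predict. Note that cross_val_predict needs each sample to fall in exactly one test fold, so it works with StratifiedKFold but not with RepeatedStratifiedKFold:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import StratifiedKFold, cross_val_predict

X, y = load_breast_cancer(return_X_y=True)    # assumed data
clf = RandomForestClassifier(random_state=0)  # assumed model

# Each row appears in exactly one test fold, so the out-of-fold predictions
# cover the whole dataset and can be summarized in a single confusion matrix.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
y_pred = cross_val_predict(clf, X, y, cv=cv)
print(confusion_matrix(y, y_pred))
```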
So my understanding is that with cross-validation we can see what score is achievable; however, it lacks interpretability, like the ability to make a confusion matrix out of it.
Namaskar Krish Ji! Great video, well done. The question I have is regarding imbalanced datasets and StratifiedKFold validation. Taking your Churn example, let's say the churn rate is 1%, which means that out of 50k observations the churners are only 500. Now, because the data is highly imbalanced with very rare events, suppose you want to do some balancing (over-sampling, under-sampling, or both) and then do StratifiedKFold validation. How would StratifiedKFold validation work in this case? Will it take the test data (let's say 10%) without balancing and build the model on the balanced dataset (90%), so that we know the validation is done on real data? Or is even the validation done on balanced data? If the latter, we would need a separate test dataset to see how the model fits the real, unbalanced data, wouldn't we? I hope it's clear. Thanks, Sachin
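One way to get the behaviour described in the first case, sketched below under the assumption that the imbalanced-learn package is available, is to put the resampler inside an imblearn Pipeline: resampling is then applied only to the training part of each fold, and every validation fold keeps the real, unbalanced class ratio:

```python
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Assumed churn-like data: ~1% positive class
X, y = make_classification(n_samples=5000, weights=[0.99, 0.01], random_state=0)

pipe = Pipeline([
    ("smote", SMOTE(random_state=0)),            # applied to the training folds only
    ("clf", LogisticRegression(max_iter=1000)),
])

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(pipe, X, y, cv=cv, scoring="roc_auc")
print(scores)   # every score is computed on an untouched, unbalanced validation fold
```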
By selecting n_splits as 4, I got the highest accuracy in the 4th, i.e. the last, fold. Any idea how to extract the exact dataset fed to train/test so that I can replicate the output of the 4th split?
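Because StratifiedKFold produces the same folds for the same data and parameters (and the same random_state if shuffle is used), the rows of any particular fold can be recovered by re-enumerating the splits. A sketch with an assumed dataset:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import StratifiedKFold

X, y = load_breast_cancer(return_X_y=True)   # assumed data
skf = StratifiedKFold(n_splits=4)            # add shuffle=True, random_state=... if you used them

# Walk the folds and keep the indices of the 4th split
for fold, (train_idx, test_idx) in enumerate(skf.split(X, y), start=1):
    if fold == 4:
        X_train, X_test = X[train_idx], X[test_idx]
        y_train, y_test = y[train_idx], y[test_idx]
        print(len(train_idx), len(test_idx))
```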
Hello, very nice video, but ONE QUESTION: what is the train/test ratio in every iteration when you use stratified k-fold cross validation? I mean, somehow combining stratified k-fold cross validation with train test split.
We are living in a wonderful universe
thanks man
Thank you for teaching. Can I get a link to the notebook?
How to get the confusion matrix and AUC-ROC curve after k-fold validation?
With stratified k-fold, the only difference is that the classes of type Yes and No are also considered when choosing the test split, and the rest is the same as k-fold cross validation. Is that so?
I love your content, it is very helpful. You are a treasure. But this video would have been loads better if you slowly allowed students to copy over the code.
@abhi9029
2 years ago
exactly.
Isn't stratified validation included by default in cross_val_score?
Thanks for the very clear explanation Krish.. can you please share the GitHub link also?
Krish, in k-fold validation you fitted the classifier on different sets of X_train and y_train and got different accuracies. This is fine for evaluating the model, but you didn't mention which data we should train the model on if we want to evaluate performance using k-fold. Are we going to train our classifier on the full data, i.e. X and y? Should the final model, which we want to use later on, be trained on the full dataset?
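A common pattern, sketched here with assumed data and model rather than the video's exact code, is to use k-fold only to estimate performance and then fit the final model on the full dataset:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = load_breast_cancer(return_X_y=True)    # assumed data
clf = RandomForestClassifier(random_state=0)  # assumed model

# k-fold gives the performance estimate...
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
print("estimated accuracy:", cross_val_score(clf, X, y, cv=cv).mean())

# ...and the model to be used later is then trained on all of the data
final_model = clf.fit(X, y)
```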
Hello Sir, thanks for the wonderful explanation. However, I have a naive question: how are RandomizedSearchCV and GridSearchCV different from K-Fold and Stratified K-Fold?
@krishnaik06
4 years ago
RandomizedSearchCV and GridSearchCV perform k-fold cross validation while selecting the right hyperparameters.
@fusionarun
4 years ago
@@krishnaik06 Hi Krish, thanks for the video. It was excellent/crisp as always. To continue on the same point: RandomizedSearchCV and GridSearchCV help us choose the right parameters for a model through various iterations. Only after choosing the right parameters do we step into K-fold or Stratified K-fold to get a sense of how accurately our model can perform in production, right? And what I also understood is that I don't have to perform the actual train_test_split before the model creation? Please correct me if I'm wrong.
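A hedged sketch of how the pieces usually fit together: GridSearchCV runs k-fold cross validation internally for each hyperparameter combination, and an outer train_test_split keeps a final, untouched test set. The dataset, model, and grid below are assumptions for illustration:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

param_grid = {"n_estimators": [100, 300], "max_depth": [None, 5]}
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),  # k-fold inside the search
)
search.fit(X_train, y_train)

# Best parameters chosen by cross validation, then scored on the held-out test set
print(search.best_params_, search.score(X_test, y_test))
```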
How about precision, recall and F-measure?
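Those can come out of the same k-fold run. A sketch (with an assumed dataset and model) using cross_validate, which accepts several scorers at once:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate

X, y = load_breast_cancer(return_X_y=True)   # assumed data
clf = LogisticRegression(max_iter=5000)      # assumed model

res = cross_validate(clf, X, y,
                     cv=StratifiedKFold(n_splits=5),
                     scoring=["precision", "recall", "f1"])

# One value per fold for each metric
print(res["test_precision"], res["test_recall"], res["test_f1"], sep="\n")
```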
Hello Krish sir... What if I need to perform a custom prediction? I need to call classifier.predict(test), but my code shows a "feature names missing" error. I'm using the Pima Diabetes dataset from Kaggle.
It would be best if you had provided a link to the dataset for practice and confirmation. Thanks.
Hello Sir, Stratified K-fold works only for categorical and multiclass target variables. What if the target variable is continuous? Is binning the target variable the solution? Thanks
@gollusingh007
2 years ago
Then go for k-fold CV.
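A sketch of the binning idea, with an assumed regression dataset: StratifiedKFold only accepts discrete labels, so one workaround is to bin y and stratify on the bins while still training and scoring on the original continuous target:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import StratifiedKFold

X, y = make_regression(n_samples=1000, n_features=5, noise=10, random_state=0)

# Bin the continuous target into 4 roughly equal quantile bins
y_binned = np.digitize(y, bins=np.quantile(y, [0.25, 0.5, 0.75]))

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, test_idx in skf.split(X, y_binned):     # stratify on the bins...
    model = LinearRegression().fit(X[train_idx], y[train_idx])  # ...train on continuous y
    print(r2_score(y[test_idx], model.predict(X[test_idx])))
```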
Sir, in the SKF part you have missed line number 86: skf = StratifiedKFold()
Sir, if I run the stratified k-fold code multiple times, will the result vary? It shouldn't, should it? But mine does, and I don't know why. Also, if I change the number of folds, my accuracy changes.
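A possible cause, sketched below as an assumption rather than a diagnosis of your exact code: if shuffle=True is used without a fixed random_state, or the model itself is randomized (e.g. a random forest), results will differ between runs. Fixing both random states makes the folds and the model reproducible; accuracy can still change with the number of folds because the train/test sizes change.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)  # reproducible folds
clf = RandomForestClassifier(random_state=42)                     # reproducible model
```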
How can I find the ROC curve and confusion matrix from this project for Train Test Split vs K Fold vs Stratified K Fold Cross Validation? Please give us a video on this.
Where can I get the dataset 🙏
How to get confusion matrix after cross validation
Sir, I have a question on this. Since we had an imbalanced dataset earlier and then fixed it with some feature engineering technique, after fixing it can we use k-fold CV again, or do we have to stick with stratified CV only?
@ArjunSingh-gt2fv
3 years ago
I think we should stick to stratified CV, because even with a balanced dataset, shuffling the data to create folds can still leave some folds without a good proportion of each class.
Higher bias does not necessarily mean good accuracy; the best case is low variance and low bias.
How different is this from setting the stratify parameter (stratify=y) while splitting the data using train_test_split?
@panosp5711
2 years ago
train_test_split just gives you one train/test split with the same class ratio as your initial dataset (when stratify is used), but stratified k-fold breaks the dataset into k folds (subsets), each containing len(dataset) / k elements and each having the same class ratio.
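A small sketch (assumed synthetic data) contrasting the two: a stratified train_test_split gives one hold-out split with the class ratio preserved, while StratifiedKFold gives k such splits, so every row appears in a test fold exactly once:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, train_test_split

X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)

# One stratified hold-out split
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)
print(np.bincount(y_te) / len(y_te))                 # ~ same class ratio as y

# Five stratified folds: every test fold keeps that ratio too
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for _, test_idx in skf.split(X, y):
    print(len(test_idx), np.bincount(y[test_idx]) / len(test_idx))
```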
Is this video's notebook available on GitHub @krish?
First!
Hello sir can you share your Jupyter notebook, please.
Good sir, I like your videos very much. Sir, I have a question: in k-fold validation, after getting the score values, how can I make a confusion matrix, sir....?
Can you provide the link to the source code...
cross value is coming what does it mean
Y.iloc[number] is not working. Error is AttributeError: 'numpy.ndarray' object has no attribute 'iloc'
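.iloc exists only on pandas objects; if y has been converted to a NumPy array (for example by y.values or by train_test_split), plain bracket indexing is needed. A tiny sketch:

```python
import numpy as np

y = np.array([0, 1, 0, 1])
print(y[2])      # works on a NumPy array
# y.iloc[2]      # AttributeError: 'numpy.ndarray' object has no attribute 'iloc'
```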
If I have 1000 rows in the dataset, how can I select the first 200 rows for testing and the last 800 rows for training, instead of splitting randomly?
@rajeevu3051
3 years ago
use Leave one out CV and give value as 5, then take first value from the scores you get.
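If the goal is a fixed, non-random split of 1000 rows, here are two hedged options, with illustrative stand-in data: slice the arrays directly, or use KFold with shuffle=False, whose first of 5 folds tests on exactly the first 200 rows:

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(1000).reshape(-1, 1)   # stand-in data
y = np.arange(1000)

# (a) Slice directly: first 200 rows for testing, last 800 for training
X_test, y_test = X[:200], y[:200]
X_train, y_train = X[200:], y[200:]

# (b) KFold without shuffling: the first fold's test set is rows 0-199
kf = KFold(n_splits=5, shuffle=False)
train_idx, test_idx = next(kf.split(X))
print(test_idx[:3], test_idx[-1])    # [0 1 2] 199
```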
GitHub link for those of you who need it: github.com/krishnaik06/Hyperparameter-Optimization
Sir, please upload the GitHub link also.
This video could have been better.
May I have your email ID, sir?