Train Test Split vs K Fold vs Stratified K fold Cross Validation
In this video we will be discussing how to implement the following (a short illustrative code sketch follows the list):
1. K fold Cross Validation
2. Stratified K fold Cross Validation
3. Train Test Split
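For readers who want to follow along in code, here is a minimal, illustrative sketch of the three approaches. The dataset and model are assumptions for the example, not necessarily the exact code shown on screen:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, StratifiedKFold, cross_val_score, train_test_split

# Assumed example data and model
X, y = load_breast_cancer(return_X_y=True)
clf = LogisticRegression(max_iter=5000)

# 1. K-fold cross validation
print("k-fold:", cross_val_score(clf, X, y, cv=KFold(n_splits=5)).mean())

# 2. Stratified k-fold cross validation (preserves the class ratio in every fold)
print("stratified k-fold:", cross_val_score(clf, X, y, cv=StratifiedKFold(n_splits=5)).mean())

# 3. Single train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
print("train/test split:", clf.fit(X_train, y_train).score(X_test, y_test))
```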
Buy the best book on Machine Learning and Deep Learning with Python, sklearn, and TensorFlow from the link below:
amazon url:
www.amazon.in/Hands-Machine-L...
Connect with me here:
Twitter: / krishnaik06
Facebook: / krishnaik06
instagram: / krishnaik06
Subscribe to my unboxing channel:
/ @krishnaikhindi
Below are the various playlists created on ML, Data Science, and Deep Learning. Please subscribe and support the channel. Happy Learning!
Deep Learning Playlist: • Tutorial 1- Introducti...
Data Science Projects playlist: • Generative Adversarial...
NLP playlist: • Natural Language Proce...
Statistics Playlist: • Population vs Sample i...
Feature Engineering playlist: • Feature Engineering in...
Computer Vision playlist: • OpenCV Installation | ...
Data Science Interview Question playlist: • Complete Life Cycle of...
You can buy my book on Finance with Machine Learning and Deep Learning from the URL below:
amazon url: www.amazon.in/Hands-Python-Fi...
🙏🙏🙏🙏🙏🙏🙏🙏
YOU JUST NEED TO DO
3 THINGS to support my channel
LIKE
SHARE
&
SUBSCRIBE
TO MY YOUTUBE CHANNEL
Comments: 57
Hi Krish, when we use cross_val_score and give the cv parameter as an int (the number of folds), and the model we are using is a classifier, it chooses stratified splitting by default, right, and not the plain K-Fold type of cross validation? I found this in the sklearn library:

cv: int, cross-validation generator or an iterable, default=None
Determines the cross-validation splitting strategy. Possible inputs for cv are:
1) None, to use the default 5-fold cross validation
2) int, to specify the number of folds in a (Stratified)KFold
3) CV splitter
4) An iterable yielding (train, test) splits as arrays of indices
For int/None inputs, if the estimator is a classifier and y is either binary or multiclass, StratifiedKFold is used. In all other cases, KFold is used.

This is just a query. Please let me know if my understanding is wrong.
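The quoted sklearn documentation does say that. A small sketch (with an assumed dataset and model) illustrating the default: for a classifier with an integer cv, the scores match an explicit StratifiedKFold, while a plain KFold generally differs:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, StratifiedKFold, cross_val_score

X, y = load_breast_cancer(return_X_y=True)   # assumed binary-classification data
clf = LogisticRegression(max_iter=5000)

# cv=5 with a classifier: sklearn internally uses StratifiedKFold
scores_int = cross_val_score(clf, X, y, cv=5)

# Passing StratifiedKFold explicitly gives the same splits (no shuffling in either case)
scores_skf = cross_val_score(clf, X, y, cv=StratifiedKFold(n_splits=5))

# Plain KFold would generally give slightly different scores
scores_kf = cross_val_score(clf, X, y, cv=KFold(n_splits=5))

print(scores_int, scores_skf, scores_kf, sep="\n")
```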
I've been wondering about this topic for a while, very happy to find your content!!!
Sir your content is great, thanks for uploading such important and informational videos. These videos are very helpful. Keep making these, more power to you.
Thank you! Keep going with other video tutorials!!
You are a saviour!
excellent work
Just awesome!!!! But would it be possible for you to also share a Git repository with the code?
great video, thank you!
so clear explanation thanks
Wow, this is very enlightening!!!! Thank you sir! One question though: what if we need the confusion matrix? I am using Repeated Stratified K-Fold, and I'm curious how to obtain a reasonable and easy-to-compute confusion matrix. Any suggestions on this?
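One common way to get a confusion matrix out of k-fold validation, sketched here with assumed data and model rather than the video's code, is to collect out-of-fold predictions with cross_val_predict. Note that cross_val_predict needs each sample to fall in exactly one test fold, so it works with StratifiedKFold but not with RepeatedStratifiedKFold:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import StratifiedKFold, cross_val_predict

X, y = load_breast_cancer(return_X_y=True)    # assumed data
clf = RandomForestClassifier(random_state=0)  # assumed model

# Each row appears in exactly one test fold, so the out-of-fold predictions
# cover the whole dataset and can be summarized in a single confusion matrix.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
y_pred = cross_val_predict(clf, X, y, cv=cv)
print(confusion_matrix(y, y_pred))
```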
So my understanding is that with cross-validation we can see what score is achievable; however, it lacks interpretability, like the ability to make a confusion matrix out of it.
Namaskar Krish Ji! Great video, well done. The question I have is regarding imbalanced datasets and StratifiedKFold validation. Taking your Churn example, let's say the churn rate is 1%, which means that out of 50k observations the churners are only 500. Now, because the data is highly imbalanced with very rare events, suppose you want to do some balancing (over-sampling, under-sampling, or both) and then do StratifiedKFold validation. How would StratifiedKFold validation work in this case? Will it take the test data (let's say 10%) without balancing and build the model on the balanced dataset (90%), so that we know the validation is done on real data? Or is even the validation done on balanced data? If the latter, we would need a separate test dataset to see how the model fits the real, unbalanced data, wouldn't we? I hope it's clear. Thanks, Sachin
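One way to get the behaviour described in the first case, sketched below under the assumption that the imbalanced-learn package is available, is to put the resampler inside an imblearn Pipeline: resampling is then applied only to the training part of each fold, and every validation fold keeps the real, unbalanced class ratio:

```python
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Assumed churn-like data: ~1% positive class
X, y = make_classification(n_samples=5000, weights=[0.99, 0.01], random_state=0)

pipe = Pipeline([
    ("smote", SMOTE(random_state=0)),            # applied to the training folds only
    ("clf", LogisticRegression(max_iter=1000)),
])

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(pipe, X, y, cv=cv, scoring="roc_auc")
print(scores)   # every score is computed on an untouched, unbalanced validation fold
```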
By selecting n_splits as 4, I got the highest accuracy in the 4th, i.e. the last, fold. Any idea how to extract the exact dataset fed to train/test so that I can replicate the output of the 4th split?
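Because StratifiedKFold produces the same folds for the same data and parameters (and the same random_state if shuffle is used), the rows of any particular fold can be recovered by re-enumerating the splits. A sketch with an assumed dataset:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import StratifiedKFold

X, y = load_breast_cancer(return_X_y=True)   # assumed data
skf = StratifiedKFold(n_splits=4)            # add shuffle=True, random_state=... if you used them

# Walk the folds and keep the indices of the 4th split
for fold, (train_idx, test_idx) in enumerate(skf.split(X, y), start=1):
    if fold == 4:
        X_train, X_test = X[train_idx], X[test_idx]
        y_train, y_test = y[train_idx], y[test_idx]
        print(len(train_idx), len(test_idx))
```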
Hello, very nice video, but ONE QUESTION: what is the train/test ratio in every iteration when you use stratified k-fold cross validation? I mean, somehow combining stratified k-fold cross validation with train test split.
We are living in a wonderful universe
thanks man
Thank you for teaching. Can I get a link to the notebook?
How to get the confusion matrix and AUC-ROC curve after k-fold validation?
With stratified k-fold, the only difference is that the classes of type Yes and No are also considered when choosing the test split, and the rest is the same as k-fold cross validation. Is that so?
I love your content, it is very helpful. You are a treasure. But this video would have been loads better if you slowly allowed students to copy over the code.
@abhi9029
2 years ago
exactly.
Isn't stratified validation included by default in cross_val_score?
Thanks for the very clear explanation Krish.. can you please share the GitHub link also?
Krish, in k-fold validation you fitted the classifier on different sets of X_train and y_train and got different accuracies. This is fine for evaluating the model, but you didn't mention which data we should train the model on if we want to evaluate performance using k-fold. Are we going to train our classifier on the full data, i.e. X and y? Should the final model, which we want to use later on, be trained on the full dataset?
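A common pattern, sketched here with assumed data and model rather than the video's exact code, is to use k-fold only to estimate performance and then fit the final model on the full dataset:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = load_breast_cancer(return_X_y=True)    # assumed data
clf = RandomForestClassifier(random_state=0)  # assumed model

# k-fold gives the performance estimate...
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
print("estimated accuracy:", cross_val_score(clf, X, y, cv=cv).mean())

# ...and the model to be used later is then trained on all of the data
final_model = clf.fit(X, y)
```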
Hello Sir, thanks for the wonderful explanation. However, I have a naive question: how are RandomizedSearchCV and GridSearchCV different from K-Fold and Stratified K-Fold?
@krishnaik06
4 years ago
RandomizedSearchCV and GridSearchCV perform k-fold cross validation while selecting the right hyperparameters.
@fusionarun
4 years ago
@@krishnaik06 Hi Krish, thanks for the video. It was excellent/crisp as always. To continue on the same point: RandomizedSearchCV and GridSearchCV help us choose the right parameters for a model through various iterations. Only after choosing the right parameters do we step into K-fold or Stratified K-fold to get a sense of how accurately our model can perform in production, right? And what I also understood is that I don't have to perform the actual train_test_split before the model creation? Please correct me if I'm wrong.
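A hedged sketch of how the pieces usually fit together: GridSearchCV runs k-fold cross validation internally for each hyperparameter combination, and an outer train_test_split keeps a final, untouched test set. The dataset, model, and grid below are assumptions for illustration:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

param_grid = {"n_estimators": [100, 300], "max_depth": [None, 5]}
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),  # k-fold inside the search
)
search.fit(X_train, y_train)

# Best parameters chosen by cross validation, then scored on the held-out test set
print(search.best_params_, search.score(X_test, y_test))
```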
How about precision, recall and F-measure?
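Those can come out of the same k-fold run. A sketch (with an assumed dataset and model) using cross_validate, which accepts several scorers at once:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate

X, y = load_breast_cancer(return_X_y=True)   # assumed data
clf = LogisticRegression(max_iter=5000)      # assumed model

res = cross_validate(clf, X, y,
                     cv=StratifiedKFold(n_splits=5),
                     scoring=["precision", "recall", "f1"])

# One value per fold for each metric
print(res["test_precision"], res["test_recall"], res["test_f1"], sep="\n")
```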
Hello Krish sir... What if I need to perform a custom prediction? I need to call classifier.predict(test), but my code shows a "feature names missing" error. I'm using the Pima Diabetes dataset from Kaggle.
It would be best if you had provided a link to the dataset for practice and confirmation. Thanks.
Hello Sir, Stratified K-fold works only for categorical and multiclass target variables. What if the target variable is continuous? Is binning the target variable the solution? Thanks
@gollusingh007
2 years ago
Then go for k-fold CV.
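A sketch of the binning idea, with an assumed regression dataset: StratifiedKFold only accepts discrete labels, so one workaround is to bin y and stratify on the bins while still training and scoring on the original continuous target:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import StratifiedKFold

X, y = make_regression(n_samples=1000, n_features=5, noise=10, random_state=0)

# Bin the continuous target into 4 roughly equal quantile bins
y_binned = np.digitize(y, bins=np.quantile(y, [0.25, 0.5, 0.75]))

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, test_idx in skf.split(X, y_binned):     # stratify on the bins...
    model = LinearRegression().fit(X[train_idx], y[train_idx])  # ...train on continuous y
    print(r2_score(y[test_idx], model.predict(X[test_idx])))
```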
Sir, in the SKF part you have missed line number 86: skf = StratifiedKFold()
Sir, if I run the stratified k-fold code multiple times, will the result vary? It shouldn't, should it? But mine does, and I don't know why. Also, if I change the number of folds, my accuracy changes.
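A possible cause, sketched below as an assumption rather than a diagnosis of your exact code: if shuffle=True is used without a fixed random_state, or the model itself is randomized (e.g. a random forest), results will differ between runs. Fixing both random states makes the folds and the model reproducible; accuracy can still change with the number of folds because the train/test sizes change.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)  # reproducible folds
clf = RandomForestClassifier(random_state=42)                     # reproducible model
```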
How can I find the ROC curve and confusion matrix from this project for Train Test Split vs K Fold vs Stratified K Fold Cross Validation? Please give us a video on this.
Where can I get the dataset 🙏
How to get confusion matrix after cross validation
Sir, I have a question on this. Since we had an imbalanced dataset earlier and then fixed it with some feature engineering technique, after fixing it can we use k-fold CV again, or do we have to stick with stratified CV only?
@ArjunSingh-gt2fv
3 years ago
I think we should stick to stratified CV, because even with a balanced dataset, shuffling the data to create folds can still leave some folds without a good proportion of each class.
Higher bias does not necessarily mean good accuracy; the best case is low variance and low bias.
How different is this from setting the stratify parameter (stratify=y) while splitting the data using train_test_split?
@panosp5711
2 years ago
train_test_split just gives you one train/test split with the same class ratio as your initial dataset (when stratify is used), but stratified k-fold breaks the dataset into k folds (subsets), each containing len(dataset) / k elements and each having the same class ratio.
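A small sketch (assumed synthetic data) contrasting the two: a stratified train_test_split gives one hold-out split with the class ratio preserved, while StratifiedKFold gives k such splits, so every row appears in a test fold exactly once:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, train_test_split

X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)

# One stratified hold-out split
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)
print(np.bincount(y_te) / len(y_te))                 # ~ same class ratio as y

# Five stratified folds: every test fold keeps that ratio too
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for _, test_idx in skf.split(X, y):
    print(len(test_idx), np.bincount(y[test_idx]) / len(test_idx))
```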
Is this video's notebook available on GitHub @krish?
First!
Hello sir can you share your Jupyter notebook, please.
Good sir, I like your videos very much. Sir, I have a question: in k-fold validation, after getting the score values, how can I make a confusion matrix, sir....?
Can you provide the link to the source code...
cross value is coming what does it mean
Y.iloc[number] is not working. Error is AttributeError: 'numpy.ndarray' object has no attribute 'iloc'
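.iloc exists only on pandas objects; if y has been converted to a NumPy array (for example by y.values or by train_test_split), plain bracket indexing is needed. A tiny sketch:

```python
import numpy as np

y = np.array([0, 1, 0, 1])
print(y[2])      # works on a NumPy array
# y.iloc[2]      # AttributeError: 'numpy.ndarray' object has no attribute 'iloc'
```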
If I have 1000 rows in the dataset, how can I select the first 200 rows for testing and the last 800 rows for training, instead of splitting randomly?
@rajeevu3051
3 years ago
use Leave one out CV and give value as 5, then take first value from the scores you get.
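If the goal is a fixed, non-random split of 1000 rows, here are two hedged options, with illustrative stand-in data: slice the arrays directly, or use KFold with shuffle=False, whose first of 5 folds tests on exactly the first 200 rows:

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(1000).reshape(-1, 1)   # stand-in data
y = np.arange(1000)

# (a) Slice directly: first 200 rows for testing, last 800 for training
X_test, y_test = X[:200], y[:200]
X_train, y_train = X[200:], y[200:]

# (b) KFold without shuffling: the first fold's test set is rows 0-199
kf = KFold(n_splits=5, shuffle=False)
train_idx, test_idx = next(kf.split(X))
print(test_idx[:3], test_idx[-1])    # [0 1 2] 199
```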
GitHub link for those of you who need it: github.com/krishnaik06/Hyperparameter-Optimization
Sir, please upload the GitHub link also.
This video could have been better.
May I have your email ID, sir?