299 - Evaluating sklearn model using KFold cross validation in python

Science & Technology

Code generated in the video can be downloaded from here:
github.com/bnsreenu/python_fo...
Let us start by understanding binary classification the way most of us normally approach it, using sklearn (SVM).
In this example, we will split our dataset the normal way into train and test groups.
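For reference, a single-split baseline could look roughly like the sketch below. It makes two assumptions. First, it uses sklearn's bundled copy of the Wisconsin breast cancer data (load_breast_cancer) instead of the Kaggle CSV. Second, it uses a MinMaxScaler + SVC pipeline. A commenter mentions a MinMaxScaler pipeline, but the exact preprocessing and SVM settings used in the video are not confirmed here.

```python
# Baseline: one train/test split with an SVM classifier.
# Assumption: sklearn's bundled breast cancer data stands in for the Kaggle CSV.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)  # 569 samples, 30 features

# Hold out 20% as a fixed test set; stratify keeps the class ratio in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Assumed pipeline: scale features to [0, 1], then an RBF-kernel SVM.
# The scaler is fit on the training data only when the pipeline is fit.
model = make_pipeline(MinMaxScaler(), SVC(kernel="rbf"))
model.fit(X_train, y_train)

test_accuracy = model.score(X_test, y_test)
print(f"Single-split test accuracy: {test_accuracy:.3f}")
```

The weakness of this baseline is that the score depends on which 114 samples happened to land in the test set. K-fold cross-validation addresses this.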
We will then learn to divide the data using KFold splits.
We will iterate through each split to train and evaluate our model.
We will finally use the cross_val_score() function to perform the evaluation.
It takes the model, the dataset, and a cross-validation configuration, and returns an array of
scores, one per fold.
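As a minimal sketch of that last step, cross_val_score can be given a KFold object as its cv argument. This sketch also assumes sklearn's built-in breast cancer data and a MinMaxScaler + SVC pipeline rather than the video's exact setup.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# KFold does not shuffle by default; shuffle in case the rows are ordered
kf = KFold(n_splits=5, shuffle=True, random_state=42)

# Assumed pipeline. cross_val_score clones and refits the whole pipeline on each
# training fold, so the scaler never sees the corresponding test fold.
model = make_pipeline(MinMaxScaler(), SVC(kernel="rbf"))
scores = cross_val_score(model, X, y, cv=kf, scoring="accuracy")

print("Per-fold accuracy:", scores.round(3))
print(f"Mean {scores.mean():.3f} +/- {scores.std():.3f}")
```

Keeping the scaler inside the pipeline is the design choice that matters here. It means preprocessing is learned per fold instead of once on the full dataset, which would leak test-fold statistics into training.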
K-fold cross-validation is a model validation technique.
Evaluating the model across multiple folds gives a more reliable estimate of its performance than a single train/test split.
The KFold class in sklearn provides train/test indices to split data into train/test sets. It splits the dataset into k consecutive folds (without shuffling by default).
Each fold is then used once as the validation set while the k - 1 remaining folds
form the training set.
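The following small sketch shows the "consecutive folds, no shuffling" behaviour concretely. It uses a made-up array of 10 samples; X_toy is purely illustrative.

```python
import numpy as np
from sklearn.model_selection import KFold

X_toy = np.arange(10).reshape(-1, 1)  # hypothetical data: 10 samples, indices 0..9

kf = KFold(n_splits=5)  # default shuffle=False, so folds are consecutive blocks
folds = [(train.tolist(), test.tolist()) for train, test in kf.split(X_toy)]

for i, (train, test) in enumerate(folds):
    print(f"fold {i}: test={test} train={train}")
# fold 0: test=[0, 1] train=[2, 3, 4, 5, 6, 7, 8, 9]
# ...
# fold 4: test=[8, 9] train=[0, 1, 2, 3, 4, 5, 6, 7]
```

Every index appears in exactly one test fold, and each training set is the other k - 1 folds.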
The split() method of KFold generates indices to split the data into training and test sets. It divides the data into n_splits groups of roughly n_samples/n_splits samples each.
Each group is used once for testing while the remaining n_splits - 1 groups are used for training.
So every sample ends up in the test set exactly once over the full cross-validation.
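Here is the same idea written as an explicit loop over split(), which is the work cross_val_score automates. It is a sketch that assumes sklearn's built-in breast cancer data (569 samples) and a MinMaxScaler + SVC pipeline. With n_splits=5, the test folds come out at about 569/5 samples each.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import KFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
kf = KFold(n_splits=5, shuffle=True, random_state=42)

fold_sizes, fold_scores = [], []
for train_idx, test_idx in kf.split(X):
    # Build a fresh (assumed) pipeline per fold so nothing carries over between folds
    model = make_pipeline(MinMaxScaler(), SVC(kernel="rbf"))
    model.fit(X[train_idx], y[train_idx])
    fold_scores.append(model.score(X[test_idx], y[test_idx]))
    fold_sizes.append(len(test_idx))

print("Test-fold sizes:", fold_sizes)  # [114, 114, 114, 114, 113]
print(f"Mean accuracy: {np.mean(fold_scores):.3f}")
```

Since 569 is not divisible by 5, KFold gives the first 569 % 5 = 4 folds one extra sample. Passing the same kf object to cross_val_score() should reproduce these per-fold scores, because the splits are identical and SVC training is deterministic.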
Wisconsin breast cancer example
Dataset link: www.kaggle.com/datasets/uciml...

Comments: 20

@Master_of_Chess_Shorts 1 year ago
You are one of the best data science teachers out there. Thanks for your good work and approach. You explain a wide range of topics very well.
@newcooldiscoveries5711 1 year ago
Been enjoying this KFold series. Looking forward to the next one. Thanks.
@caiyu538 1 year ago
I used this module a lot during my work. Thanks for these great free libraries; they make data scientists' lives easier. Most of the work is gluing the data to these libraries.
@DmitriiTarakanov 1 year ago
Dear Sreeni, thank you so much for your work! Have a good one!
@joebi-den4761 1 year ago
Hi, thanks for doing everything and providing it for free. I'm a final-year EE engineer, not doing great academically, but I hope I can do better in the future.
@hannukoistinen5329
8 months ago
Hi!! Don't be ashamed!! You are probably on a very demanding curriculum. My advice: learn R!! You can do everything Python can and much more!! And you don't have to punch in code you don't necessarily even need!! Python is just "fashion"! You can do all the research, all the math, all the visualization with R. Success, and God bless you in your studies!!!
@joebi-den4761
7 months ago
@@hannukoistinen5329 Duly noted, very practical advice, thanks. So I should be using RStudio, correct? Or do you have more to say or share?
@maheshmaskey4592 1 month ago
Good post. By the way, how do we select the best model after cross-validation? I am more interested in regression than classification. Have you tried using a multivariate polynomial regression model so that we could establish an empirical relation?
@malithabasuri4491 1 year ago
Hi, great video series. Could you start a video series about medical image processing and ML, like 3D MRI processing, preventing leaky validation, etc.? It would be really useful because there aren't many resources.
@guiomoff2438 1 year ago
Before doing cross-validation, shouldn't you use a dimensionality reduction technique to determine whether all features are necessary for your training? Thanks in advance if you take the time to answer me!
@Gingeey23 1 year ago
Great video. Just to clarify, is the purpose of cross-validation to tune the hyperparameters of models on a variety of different train_test splits to avoid overfitting? Cheers!
@DigitalSreeni
1 year ago
Yes, the main purpose of cross-validation is to estimate the performance of a model on an independent dataset and to tune the hyperparameters of the model to avoid overfitting.
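One common way to serve both purposes named in this reply is sklearn's GridSearchCV. It scores each hyperparameter combination with K-fold cross-validation on the training data and then refits the best one. This is a minimal sketch with several assumptions: sklearn's built-in breast cancer data, a MinMaxScaler + SVC pipeline, and a hypothetical grid over C and gamma. The held-out test set stays out of the search, so it still gives an independent performance estimate.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, KFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Assumed pipeline; the step name "svc" prefixes each tuned parameter
pipe = Pipeline([("scale", MinMaxScaler()), ("svc", SVC())])
param_grid = {"svc__C": [0.1, 1, 10], "svc__gamma": ["scale", 0.1]}  # hypothetical grid

# Each of the 6 combinations is scored by 5-fold CV on the training data only
search = GridSearchCV(pipe, param_grid, cv=KFold(n_splits=5, shuffle=True, random_state=42))
search.fit(X_train, y_train)  # refit=True (default) retrains the best model on all of X_train

print("Best params:", search.best_params_)
print(f"Best mean CV accuracy: {search.best_score_:.3f}")
test_accuracy = search.score(X_test, y_test)  # independent estimate on untouched data
print(f"Held-out test accuracy: {test_accuracy:.3f}")
```

The same pattern is a reasonable answer to the "how do we select the best model after cross-validation" question above. Compare candidates by mean CV score, then report the winner's score on data that was never used for selection.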
@11111653 1 year ago
How do I plot an ROC curve for the overall cross-validation? I have been trying, but it shows an error, apparently because I get different numbers of TPRs/FPRs on each fold, which prevents the code from showing the curve.
@ajay0909 1 year ago
Hi sir, I have been trying to implement video classification using CNNs. All the content and tutorials out there are quite hard to implement, or maybe I got used to your detailed explanations. Please do a tutorial on how to load video data. Thanks for all the high-quality content.
@Athens1992 1 year ago
Nice video. One silly question: you are using MinMaxScaler in a pipeline, so how does cross_val_score know to apply the MinMax scaling to X_array? I know it's a silly question, but I ask because you don't transform X_array with your pipeline.
@maryamshehu8842 1 year ago
Hi, thanks for the video. The generated code is not in the GitHub link you shared.
@marcinmaleszewski2023 1 year ago
Thanks!
@DigitalSreeni
1 year ago
Thank you very much.
@DineshSereno 10 months ago
Thanks!
@DigitalSreeni
10 months ago
Welcome! Thank you.