Logistic Regression for Classification | Working with a real-world dataset from Kaggle

💻 For real-time updates on events, connections & resources, join our community on WhatsApp: jvn.io/wTBMmV0
In this lesson we will learn about using Logistic Regression for Classification. Logistic Regression is a commonly used technique for solving binary classification problems. You can experiment with the notebook used in the above video here 👉 jovian.ai/aakashns/python-skl...
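A minimal sketch of the idea behind that sentence, assuming scikit-learn and NumPy are installed. Logistic regression computes a linear score from the inputs and passes it through the sigmoid function to get the probability of the positive class. The toy data is made up for illustration and is not the Kaggle dataset used in the lecture.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Made-up binary data: one feature; label 1 when the feature is large
X = np.array([[0.5], [1.0], [1.5], [2.0], [3.0], [3.5], [4.0], [4.5]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

model = LogisticRegression()
model.fit(X, y)


def sigmoid(z):
    """Squash any real-valued score into the (0, 1) range."""
    return 1.0 / (1.0 + np.exp(-z))


# Linear score w.x + b, turned into a probability by the sigmoid
scores = X @ model.coef_.ravel() + model.intercept_[0]
manual_probs = sigmoid(scores)

# Column 1 of predict_proba is P(class 1); it matches the manual computation
sk_probs = model.predict_proba(X)[:, 1]
print(np.allclose(manual_probs, sk_probs))  # True

# predict() assigns class 1 when that probability exceeds 0.5
print(model.predict([[0.5], [4.5]]))  # [0 1]
```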
🔗 Check out this playlist for the complete lecture series on Gradient Boosting Machines: • Machine Learning with ...
🎯 Topics Covered
• Downloading a real-world dataset from Kaggle
• Splitting a dataset into training, validation & test sets
• Imputing and scaling numeric features
• Encoding categorical columns as one-hot vectors
• Training a logistic regression model using Scikit-learn
• Evaluating a model using a validation set and test set
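The topics above can be chained together roughly as follows. This is a compact sketch, not the notebook's code. It assumes scikit-learn and pandas, and it uses a small synthetic DataFrame with hypothetical column names (`humidity`, `temp`, `wind_dir`, `rain_tomorrow`) in place of the Kaggle data. It wraps the steps in `Pipeline`/`ColumnTransformer` so that the imputer, scaler and encoder are fitted on the training split only.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Synthetic stand-in for the Kaggle data (hypothetical columns)
rng = np.random.default_rng(42)
n = 200
df = pd.DataFrame({
    "humidity": rng.uniform(20, 100, n),
    "temp": rng.uniform(5, 35, n),
    "wind_dir": rng.choice(["N", "S", "E", "W"], n),
})
df["rain_tomorrow"] = (df["humidity"] > 70).astype(int)
df.loc[rng.choice(n, 15, replace=False), "humidity"] = np.nan  # inject missing values

numeric_cols = ["humidity", "temp"]
categorical_cols = ["wind_dir"]
input_cols = numeric_cols + categorical_cols
target_col = "rain_tomorrow"

# 60% train, 20% validation, 20% test
train_df, rest_df = train_test_split(df, test_size=0.4, random_state=42)
val_df, test_df = train_test_split(rest_df, test_size=0.5, random_state=42)

# Impute then scale numeric columns; one-hot encode categorical columns
preprocess = ColumnTransformer([
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="mean")),
        ("scale", MinMaxScaler()),
    ]), numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])
model = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression(solver="liblinear")),
])

# fit() learns preprocessing statistics and weights from the training split only
model.fit(train_df[input_cols], train_df[target_col])

for name, split in [("train", train_df), ("validation", val_df), ("test", test_df)]:
    acc = model.score(split[input_cols], split[target_col])  # accuracy
    print(f"{name} accuracy: {acc:.3f}")
```

Wrapping everything in a `Pipeline` is one design choice among several. Doing each step by hand, as a step-by-step lesson might, makes the intermediate DataFrames easier to inspect.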
❓ Ask Questions here: jovian.ai/forum/t/lesson-2-lo...
classification/17915
⌚ Time Stamps:
00:00 Introduction
05:16 Problem Statement
25:43 Downloading a real-world dataset from Kaggle
35:35 Exploratory data analysis and visualization
47:06 Splitting a dataset into training, validation & test sets
01:03:04 Filling/Imputing missing values in numeric columns
01:21:55 Scaling numeric features to the (0, 1) range
01:28:10 Encoding categorical columns as one-hot vectors
01:39:02 Training a logistic regression model using Scikit-learn
01:53:41 Evaluating a model using a validation set and test set
02:19:38 Saving a model to disk and loading it back
02:36:28 Summary and Conclusion
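For the "Saving a model to disk and loading it back" step, one common approach is `joblib`, which is installed as a scikit-learn dependency. The sketch below assumes that approach. It bundles the model and its input column names into a dictionary whose keys are hypothetical. Anything else needed at prediction time, such as a fitted imputer, scaler or encoder, could be stored in the same dictionary.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression

# Tiny made-up training set
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
model = LogisticRegression().fit(X, y)

# Bundle the model with metadata needed at prediction time (keys are hypothetical)
artifact = {"model": model, "input_cols": ["feature"]}

path = os.path.join(tempfile.mkdtemp(), "model.joblib")
joblib.dump(artifact, path)  # serialize to disk

loaded = joblib.load(path)  # read it back
# The reloaded model makes the same predictions as the original
print(np.array_equal(loaded["model"].predict(X), model.predict(X)))  # True
```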
⚡ Free Certification Course
"Machine Learning with Python: Zero to GBMs(Gradient Boosting Machine)" is a practical and beginner-friendly introduction to supervised machine learning, decision trees, and gradient boosting using Python. You will solve 3 coding assignments & build a course project where you'll train ML models using a large real-world dataset. Enroll now: zerotogbms.com
🔗 Visit the logistic regression lecture page here: jovian.ai/learn/machine-learn...
🎤 About the speaker
Aakash N S is the co-founder and CEO of Jovian - a community learning platform for data science & ML. Previously, Aakash has worked as a software engineer (APIs & Data Platforms) at Twitter in Ireland & San Francisco and graduated from the Indian Institute of Technology, Bombay. He’s also an avid blogger, open-source contributor, and online educator.
#GBM #MachineLearning #Python #Certification #Course #Jovian
-
Learn Data Science the right way at www.jovian.ai
Interact with a global community of like-minded learners jovian.ai/forum/
Get the latest news and updates on Machine Learning at / jovianml
Connect with us professionally on / jovianml
Follow us on Instagram at / jovian.ml
Subscribe for new videos on Artificial Intelligence / jovianml

Comments: 73

  • @anuragthakur5787 · 3 years ago

    That was intense!!! This is probably the first time I have watched a tutorial this long without any break. You are awesome, sir!

  • @SillyLittleMe · 1 year ago

    This video is still one of the best. A literal game changer!

  • @kizzavincent · 3 years ago

    Thanks a lot Aakash for the fabulous explanations and infectious passion to empower others. These tutorials are simply unmatched! Bravo!

  • @jovianhq · 3 years ago

    Thanks for the feedback, help us spread the word :)

  • @bongogappo38 · 1 year ago

    @jovianhq Sir, what can we do if there is a column of string values, like disease names and symptoms?

  • @TheAnugupta · 3 years ago

    Nicely explained, Aakash and the Jovian team. This was probably the most thorough and clearly explained tutorial I have come across.

  • @parastooaghr · 3 years ago

    Great video! I learned a lot! Thank you!

  • @hemangdhanani9434 · 3 years ago

    great explanation with reasonable depth for this topic, such a great video...

  • @jovianhq · 3 years ago

    Thanks for the feedback, help us spread the word :)

  • @rlm3574 · 3 years ago

    Really, a lecture full of knowledge

  • @jovianhq · 3 years ago

    Thanks for the feedback, help us spread the word :)

  • @ektakumari4496 · 2 years ago

    Great content, Aakash sir, and free too... really amazed and impressed by Jovian!

  • @jovianhq · 2 years ago

    Glad you liked it!

  • @sahilmalhotra7295 · 1 year ago

    Thank you, this was very beginner friendly and it helped me understand a lot of practical topics.

  • @jovianhq · 1 year ago

    You're very welcome! Glad it was helpful.

  • @mdalamgirhossain6192 · 2 years ago

    Salute Boss. This is wholesome 💝💝

  • @tapomayeebasu3047 · 2 years ago

    Thank you for such a detailed lecture. Very, very helpful. Would love to learn more.

  • @jovianhq · 2 years ago

    Glad it was helpful! Go to zerotogbms.com for more lectures on Machine Learning

  • @danielm5729 · 2 years ago

    Thank you very much.🙏

  • @anuphp3432 · 1 year ago

    excellent brother!

  • @foodforthought8415 · 3 years ago

    Very good tutorial, elaborate and detailed. Thanks!

  • @jovianhq · 3 years ago

    Thanks for the feedback, help us spread the word :)

  • @NehaSingh-fb8kj · 4 months ago

    Great content

  • @harshvardhansalve8537 · 4 months ago

    Nice lecture

  • @dataninjaa · 11 months ago

    Thanks a lot, bro! It's a nice dataset and you covered it very nicely from start to end.

  • @UsmanKhan-tc4sk · 1 year ago

    I was working on a mini data science project in which I was given test.csv and train.csv datasets. I trained my model using the training data. Now, if I want to find the accuracy score of my model on the testing data, what should I do? If I write model.predict(test_data), how will I compare the predicted testing values to the true values? There are no target values in the testing dataset.
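Here is a sketch of one common way to handle this situation. It assumes scikit-learn and pandas, and it uses made-up data in place of the commenter's files. Because test.csv has no target column, its predictions cannot be scored locally. Instead, part of train.csv is held out as a validation set and accuracy is measured there. The column names `x1`, `x2` and `target` are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical stand-ins: train.csv has a target column, test.csv does not
rng = np.random.default_rng(0)
train = pd.DataFrame({"x1": rng.normal(size=300), "x2": rng.normal(size=300)})
train["target"] = (train["x1"] + train["x2"] > 0).astype(int)
test = pd.DataFrame({"x1": rng.normal(size=50), "x2": rng.normal(size=50)})

features = ["x1", "x2"]

# Hold out a quarter of the labelled rows as a validation set
X_tr, X_val, y_tr, y_val = train_test_split(
    train[features], train["target"], test_size=0.25, random_state=0
)

model = LogisticRegression().fit(X_tr, y_tr)

# Accuracy needs true labels, so it is measured on the validation split
val_acc = accuracy_score(y_val, model.predict(X_val))
print(f"validation accuracy: {val_acc:.3f}")

# For the unlabelled test.csv rows we can only produce predictions
test_preds = model.predict(test[features])
```

If test.csv comes from a competition, the organizer typically keeps its labels and scores the submitted predictions.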

  • @gurjeet333 · 3 years ago

    Nice video... really appreciated. Can you also cover setting up data preprocessing pipelines in future sessions?

  • @jovianhq · 3 years ago

    Thanks for the feedback and suggestion!

  • @mayankraj4763 · 3 months ago

    Hello. I have a question. Should we scale the features after the imputation or before because here you imputed the raw_df dataframe which is not imputed? Thanks

  • @thakurprathiksinghrajput7135 · 5 months ago

    1:45:00 Even though you fitted the transformed columns into your model, I am still getting a type error.

  • @anuphp3432 · 1 year ago

    Hey, isn't it also common practice to scale the validation and test data with a scaler fitted only on the training data?

  • @sarimahsan6341 · 5 months ago

    At 1:35:35, at the encoder transform step, I am getting the error "Columns must be same length as key". Please tell me how to resolve it.

  • @sandipansarkar9211 · 2 years ago

    finished watching

  • @sharkk2979 · 2 years ago

    thanks u so good! thanks again

  • @jovianhq · 2 years ago

    You're welcome!

  • @siddharthsahu5048 · 2 years ago

    (1:53:40) When you plot the weights, the negative weights are not considered. But negative weights also affect the model, just in the opposite direction. What are your thoughts: should the negative weights be considered?

  • @jovianhq · 2 years ago

    Yes, the negative weights should be considered. In fact, you can try ignoring the columns whose weights are very close to 0. Both negative and positive weights affect the model.

  • @asifsaad5827 · 3 years ago

    would you mind switching to dark mode? TIA

  • @georgevavolil7005 · 2 years ago

    I have a doubt. When we do imputation, we replace the missing values with the mean of each column computed over the entire data. Those means will differ from the means of train_df, val_df and test_df taken separately, which will create some discrepancy in the final result. What's your position on this? Should we impute based on the entire dataframe or on its subsets?

  • @jovianhq · 2 years ago

    A sample of the data should represent the entire dataset. Also, the validation and test sets should be independent of the training set, so imputation can be done separately for the validation and training sets.
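A widely used convention, which differs somewhat from the reply above, is to compute the imputation statistics from the training set only. Those same statistics are then reused for the validation and test sets, so no information from those sets leaks into preprocessing. Here is a minimal sketch with scikit-learn's `SimpleImputer` and a hypothetical `temp` column:

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Made-up data with missing values in both splits
train_df = pd.DataFrame({"temp": [10.0, 12.0, np.nan, 14.0]})
val_df = pd.DataFrame({"temp": [30.0, np.nan, 32.0]})

imputer = SimpleImputer(strategy="mean")
imputer.fit(train_df[["temp"]])  # the mean is learned from training rows only

train_df[["temp"]] = imputer.transform(train_df[["temp"]])
val_df[["temp"]] = imputer.transform(val_df[["temp"]])  # reuses the training mean

print(imputer.statistics_)      # [12.]
print(val_df["temp"].tolist())  # [30.0, 12.0, 32.0]
```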

  • @datahistory2411 · 1 year ago

    Thanks sir... but how do we deploy it on a website?

  • @shantanusingh2198 · 2 years ago

    Thank you🙂

  • @jovianhq · 2 years ago

    Welcome!

  • @krupamehta8705 · 1 year ago

    1:26:54 I can't understand why the max value in some columns is not 1. It should be 1...

  • @rlm3574 · 3 years ago

    3 hrs worth watching

  • @jovianhq · 3 years ago

    Thanks for watching!

  • @sandipansarkar9211 · 2 years ago

    FINISHED CODING FULL

  • @lion87563 · 2 years ago

    So the higher the weight, the more important the column is (but only if the numerical columns are scaled)? If the data is not scaled, we cannot draw this conclusion?

  • @jovianhq · 2 years ago

    True! And it's not just higher weights: the more negative a weight is, the more important its column is too. The weights closest to 0 have the least importance.
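Here is a sketch of ranking features by the magnitude of their weights. It assumes scikit-learn and pandas and uses synthetic features with hypothetical names (`pushes_up`, `pushes_down`, `noise`). The features are min-max scaled first because, as the question above notes, raw weights are only comparable when the inputs share a common scale. Sorting by absolute value ranks strongly negative weights alongside strongly positive ones.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import MinMaxScaler

# Synthetic features (hypothetical names): one raises P(y=1), one lowers it, one is noise
rng = np.random.default_rng(1)
n = 500
X = pd.DataFrame({
    "pushes_up": rng.uniform(0, 1, n),
    "pushes_down": rng.uniform(0, 1, n),
    "noise": rng.uniform(0, 1, n),
})
y = (2 * X["pushes_up"] - 2 * X["pushes_down"] + rng.normal(0, 0.3, n) > 0).astype(int)

# Put all features on the same [0, 1] scale before comparing weights
X_scaled = pd.DataFrame(MinMaxScaler().fit_transform(X), columns=X.columns)
model = LogisticRegression().fit(X_scaled, y)

weights = pd.DataFrame({"feature": X.columns, "weight": model.coef_[0]})
weights["abs_weight"] = weights["weight"].abs()
# Rank by magnitude, so large negative weights rank as high as large positive ones
ranked = weights.sort_values("abs_weight", ascending=False)
print(ranked)
```

Even with scaled inputs, correlated features can share or trade weight. This ranking is therefore a rough guide rather than a definitive importance measure.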

  • @user-ds2vu7uy2q · 1 year ago

    amazing

  • @jovianhq · 1 year ago

    THANK YOU!

  • @truptpatel2597 · 2 years ago

    Information leakage, timestamp 1:25:10: He fitted the scaler on the whole numerical dataset and then used it to transform the train, validation and test sets. But isn't that information leakage, because the scaler saw the validation and test data while fitting?

  • @jovianhq · 2 years ago

    Well, if you have access to the validation dataset, you can fit the scaler on both the training and validation data. Generally, you won't have access to the test dataset, so you shouldn't fit the scaler/encoder on it.
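A stricter variant of the reply's advice is to fit the scaler on the training rows only. It can be sketched with scikit-learn's `MinMaxScaler` and made-up numbers. The scaler learns its minimum and maximum from the training data, so values outside that range in another split map outside [0, 1]. For the same reason, when a scaler is fitted on one set of rows and applied to another, a column's maximum after scaling need not be exactly 1.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up single-feature data
train = np.array([[10.0], [20.0], [30.0]])
val = np.array([[15.0], [40.0]])

scaler = MinMaxScaler()
scaler.fit(train)  # min (10) and max (30) are learned from the training rows only

train_scaled = scaler.transform(train)  # maps 10, 20, 30 to 0, 0.5, 1
val_scaled = scaler.transform(val)      # 15 -> (15-10)/(30-10) = 0.25; 40 -> 1.5, outside [0, 1]

print(train_scaled.ravel(), val_scaled.ravel())
```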

  • @neurax6688 · 1 year ago

    waoo

  • @arjunbhandari5554 · 4 months ago

    1:39:11

  • @pallapothubhargavramfromib2244 · 3 years ago

    Sir, will you continue these videos?

  • @shreeyansjavangula8908 · 3 years ago

    Yeah, this is a course on ML. The structure of the new videos is provided on his website: jovian.ai/learn/machine-learning-with-python-zero-to-gbms

  • @fet_hsc2300 · 3 years ago

    0:58:00

  • @fet_hsc2300 · 3 years ago

    1:06:09

  • @fet_hsc2300 · 3 years ago

    1:00:55

  • @rabbitazteca23 · 2 years ago

    Hi, I noticed that at 1:53:44 you are making a prediction using the train inputs (X_train)... but shouldn't you be making a prediction using the validation inputs instead? I don't think you have passed X_val into any of the logistic regression model's predictions... or am I just confused? HAAHHA.

  • @jovianhq · 2 years ago

    Please check kzread.info/dash/bejne/pZ593Mh8ZKS1eZM.html: at first we're predicting with the training set, and later we also predict with the validation and test sets.
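Here is a sketch of evaluating one fitted model separately on the training, validation and test sets, as the reply describes. It assumes scikit-learn and synthetic data. The helper name `evaluate` is hypothetical. It reports accuracy and a confusion matrix normalized over each true class.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic, linearly separable data
rng = np.random.default_rng(7)
X = rng.normal(size=(400, 3))
y = (X[:, 0] - X[:, 1] > 0).astype(int)

# 60% train, 20% validation, 20% test
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=7)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=7)

model = LogisticRegression().fit(X_train, y_train)


def evaluate(model, inputs, targets, name):
    """Predict on one split; print accuracy and a row-normalized confusion matrix."""
    preds = model.predict(inputs)
    acc = accuracy_score(targets, preds)
    cm = confusion_matrix(targets, preds, normalize="true")
    print(f"{name} accuracy: {acc:.3f}")
    print(cm)
    return acc


results = {
    name: evaluate(model, X_s, y_s, name)
    for name, X_s, y_s in [("train", X_train, y_train),
                           ("validation", X_val, y_val),
                           ("test", X_test, y_test)]
}
```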

  • @rabbitazteca23 · 2 years ago

    @jovianhq I am sorry, ahahah. You are right. I must have missed this part.

  • @kmishy · 1 year ago

    Bookmark 1:03:15. For me, the important part starts here.

  • @debojitmandal8670 · 3 years ago

    What's a solver?

  • @jovianhq · 3 years ago

    Hey, please go through this blog post to learn more about solvers: towardsdatascience.com/dont-sweat-the-solver-stuff-aea7cddc3451
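In scikit-learn, the `solver` argument of `LogisticRegression` chooses the optimization algorithm used to fit the weights; the model being fitted stays the same. The small sketch below uses synthetic data and assumes scikit-learn and NumPy. It shows several solvers arriving at similar coefficients. They are not guaranteed to match exactly: `liblinear`, for example, also penalizes the intercept, and each solver stops at a slightly different point within its tolerance.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic data: the true weights are 1.5, -2.0, 0.5 and 0.0
rng = np.random.default_rng(3)
X = rng.normal(size=(300, 4))
y = (X @ np.array([1.5, -2.0, 0.5, 0.0]) + rng.normal(0, 0.5, 300) > 0).astype(int)

# Each solver is a different optimizer for the same L2-regularized logistic loss
coefs = {}
for solver in ["lbfgs", "liblinear", "newton-cg", "saga"]:
    model = LogisticRegression(solver=solver, max_iter=5000)
    model.fit(X, y)
    coefs[solver] = model.coef_[0]
    print(solver, np.round(model.coef_[0], 3))
```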

  • @adityabenere6004 · 2 years ago

    1:56:49 nice

  • @rubayetalam8759 · 3 years ago

    Please add subtitles.

  • @jovianhq · 3 years ago

    Hey, we are in the process of adding subtitles to the videos; they will be added soon. Thanks!

  • @rubayetalam8759 · 3 years ago

    @jovianhq Thanks! You are doing great!

  • @imaksinsights7202 · 2 years ago

    Here is another simplified Logistic Regression tutorial if you are a beginner: kzread.info/dash/bejne/ppeetJqDibbIaag.html

  • @yskasells4014 · 2 years ago

    1:18:01

  • @fet_hsc2300 · 3 years ago

    1:08:30
