Machine Learning in Python: Building a Classification Model

Science & Technology

In this video, I will show you how to build a simple machine learning model in Python. Specifically, we will use the scikit-learn package to build a simple classification model (for classifying Iris flowers) with the random forest algorithm.
🌟 Buy me a coffee: www.buymeacoffee.com/dataprof...
📎CODE: github.com/dataprofessor/code...
⭕ Playlist:
Check out our other videos in the following playlists.
✅ Data Science 101: bit.ly/dataprofessor-ds101
✅ Data Science YouTuber Podcast: bit.ly/datascience-youtuber-p...
✅ Data Science Virtual Internship: bit.ly/dataprofessor-internship
✅ Bioinformatics: bit.ly/dataprofessor-bioinform...
✅ Data Science Toolbox: bit.ly/dataprofessor-datascie...
✅ Streamlit (Web App in Python): bit.ly/dataprofessor-streamlit
✅ Shiny (Web App in R): bit.ly/dataprofessor-shiny
✅ Google Colab Tips and Tricks: bit.ly/dataprofessor-google-c...
✅ Pandas Tips and Tricks: bit.ly/dataprofessor-pandas
✅ Python Data Science Project: bit.ly/dataprofessor-python-ds
✅ R Data Science Project: bit.ly/dataprofessor-r-ds
⭕ Subscribe:
If you're new here, it would mean the world to me if you would consider subscribing to this channel.
✅ Subscribe: kzread.info...
⭕ Recommended Tools:
Kite is a FREE AI-powered coding assistant that will help you code faster and smarter. The Kite plugin integrates with all the top editors and IDEs to give you smart completions and documentation while you’re typing. I've been using Kite and I love it!
✅ Check out Kite: www.kite.com/get-kite/?...
⭕ Recommended Books:
✅ Hands-On Machine Learning with Scikit-Learn : amzn.to/3hTKuTt
✅ Data Science from Scratch : amzn.to/3fO0JiZ
✅ Python Data Science Handbook : amzn.to/37Tvf8n
✅ R for Data Science : amzn.to/2YCPcgW
✅ Artificial Intelligence: The Insights You Need from Harvard Business Review: amzn.to/33jTdcv
✅ AI Superpowers: China, Silicon Valley, and the New World Order: amzn.to/3nghGrd
⭕ Stock photos, graphics and videos used on this channel:
✅ 1.envato.market/c/2346717/628...
⭕ Follow us:
✅ Medium: bit.ly/chanin-medium
✅ FaceBook: / dataprofessor
✅ Website: dataprofessor.org/ (Under construction)
✅ Twitter: / thedataprof
✅ Instagram: / data.professor
✅ LinkedIn: / chanin-nantasenamat
✅ GitHub 1: github.com/dataprofessor/
✅ GitHub 2: github.com/chaninlab/
⭕ Disclaimer:
Recommended books and tools are affiliate links that give me a portion of sales at no cost to you, which will contribute to the improvement of this channel's content.
#dataprofessor #machinelearning #datascienceproject #iris #classification #randomforest #decisiontree #python #learnpython #pythonprogramming #datascience #datamining #bigdata #datascienceworkshop #dataminingworkshop #dataminingtutorial #datasciencetutorial #ai #artificialintelligence #tutorial #dataanalytics #dataanalysis #machinelearningmodel

Comments: 100

  • @DataProfessor
    @DataProfessor 4 years ago

    Code of this tutorial is available as a Jupyter notebook via GitHub (link below). 📎CODE: github.com/dataprofessor/code/tree/master/python/iris

  • @ramprasadsapkota1013

    @ramprasadsapkota1013

    3 years ago

    When I open that link and save the file to my computer, then open it in Jupyter Notebook, it shows JSON instead of the notebook code.

  • @bruhm0ment767

    @bruhm0ment767

    2 years ago

    @ramprasadsapkota1013 Are you doing it through conda, or have you downloaded JupyterLab separately?

  • @forestsunrise26
    @forestsunrise26 2 years ago

    I learned more from this 20-minute video than in one semester at university. Thank you so much!

  • @DataProfessor

    @DataProfessor

    2 years ago

    Glad to hear that :)

  • @nicholflowers2077
    @nicholflowers2077 3 years ago

    I really appreciate your organized approach to making this video very clear and simple. Thank you Professor!

  • @DataProfessor

    @DataProfessor

    3 years ago

    You're very welcome! Thanks for the kind words!

  • @friederikebauer7810
    @friederikebauer7810 1 year ago

    Ran my first test for my thesis! Super informative, thanks :)

  • @sc4tterw1nd
    @sc4tterw1nd 1 month ago

    Thank you so much! I was able to come up 17 places in this ML competition because of this. Short and to the point.

  • @thomastimjensen
    @thomastimjensen 1 year ago

    Hi, Data Professor! Thank you so much for a very lucid and well structured walk-through of how to build a classification model. I am a master student in data driven organisational change at a university in Denmark, and your course is just perfect to expand my knowledge. Thank you!

  • @DataProfessor

    @DataProfessor

    1 year ago

    Thanks Thomas for the kind words!

  • @soeleos2846
    @soeleos2846 2 years ago

    Hey prof, I'm a prof @FSU and direct an ML lab. Great tutorial, I will use this for onboarding new students. Many thanks!

  • @galnahum4349
    @galnahum4349 4 years ago

    Teacher Chanin, this video is very good. Looking forward to the next video in the series. Thank you very much 🙏🏻

  • @DataProfessor

    @DataProfessor

    4 years ago

    Thanks for the comment and kind words!

  • @siddharthachaganti5639
    @siddharthachaganti5639 3 years ago

    I never thought people who study biology were interesting, but now I've changed my opinion. You are the best.

  • @DataProfessor

    @DataProfessor

    3 years ago

    Thanks!

  • @shiyuran625
    @shiyuran625 1 year ago

    Thank you so much for your patient explanation! I wonder what X[[0]] means?

  • @dd15277
    @dd15277 3 years ago

    Thank you for sharing, great explanation!!

  • @wen-chiyeh4332
    @wen-chiyeh4332 2 years ago

    Super helpful!!! Thank you so much!!!

  • @marcofestu
    @marcofestu 4 years ago

    Nice video. I've just started using Python as well, so I hope you can keep posting videos on Python as well as R 😁

  • @DataProfessor

    @DataProfessor

    4 years ago

    Thanks Marco for your support! More coming up.

  • @sushmaramesh7902
    @sushmaramesh7902 2 months ago

    Oh my God !! This is great stuff !! Thankyou so much !!

  • @ullaskunder
    @ullaskunder 3 years ago

    Awesome... better than our college syllabus.

  • @DataProfessor

    @DataProfessor

    3 years ago

    Thanks, awesome!

  • @ahmadaltaweel4981
    @ahmadaltaweel4981 2 years ago

    My first project is to spell your name correctly :) Love you professor.

  • @maths4you819
    @maths4you819 2 years ago

    You have explained it nicely. Please explain machine learning in Python using the Brain Arteries data set.

  • @DataOverEverything
    @DataOverEverything 1 year ago

    Such a good tutorial.. there aren't many that cover non binary classification. Thank you

  • @DataProfessor

    @DataProfessor

    11 months ago

    You're very welcome!

  • @nguyendaominh1078
    @nguyendaominh1078 3 years ago

    Very useful video. Thanks a lot!

  • @DataProfessor

    @DataProfessor

    3 years ago

    Glad to hear that!

  • @kusumakusuma1150
    @kusumakusuma1150 2 years ago

    Thanks Data Professor, clear and informative video! A question on the model performance step: why do we compare X_test and Y_test, since they are subsets of the data? Why not Y_test and Y_predict?

  • @DataProfessor

    @DataProfessor

    2 years ago

    There are two ways to evaluate model performance: (1) via the score() method, as in model.score(X_test, Y_test), which automatically uses R² for regression models and accuracy for classification models; internally, score() predicts from X_test and compares the predictions with Y_test. (2) via r2_score(Y_test, Y_test_pred) for a regression model or accuracy_score(Y_test, Y_test_pred) for a classification model.
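The two evaluation routes from the reply above can be sketched as follows. This is a minimal illustration, assuming the Iris data and an 80/20 split; the variable names and the random_state values are illustrative choices, not taken from the video.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load the Iris data and hold out 20% of it for testing (assumed split size).
iris = load_iris()
X_train, X_test, Y_train, Y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

clf = RandomForestClassifier(random_state=42)
clf.fit(X_train, Y_train)

# Route 1: score() predicts from X_test internally and compares with Y_test.
score_via_model = clf.score(X_test, Y_test)

# Route 2: predict explicitly, then compare true and predicted labels.
Y_test_pred = clf.predict(X_test)
score_via_metric = accuracy_score(Y_test, Y_test_pred)

print(score_via_model, score_via_metric)
```

Both routes should report the same accuracy for a classifier, which is why X_test and Y_test are passed to score() rather than Y_test and the predictions.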

  • @kusumakusuma1150

    @kusumakusuma1150

    2 years ago

    @DataProfessor Noted, thanks.

  • @thanzaw3883
    @thanzaw3883 4 years ago

    Thank you for the great video; the explanation is very clear for me (a beginner). I don't have much programming background, but I could still understand your video very well. I'm a new ML student and I need to do my first ML project for school. Could you suggest where I can get a simple dataset and which algorithm I should use? Thank you so much, Sir. I wish you lots of success and happiness in your life.

  • @DataProfessor

    @DataProfessor

    4 years ago

    Hi Than. Start with simple datasets whose features you understand, such as the Iris dataset for classification and the Boston housing dataset for regression. I’m sure there are others, but these 2 came to mind. As for the algorithm, I would recommend linear regression for regression tasks and a tree-based method like decision tree or random forest for classification tasks.

  • @thanzaw3883

    @thanzaw3883

    4 years ago

    Dear Sir, thank you so much for your reply. I will take note of all your suggestions.

  • @bruhm0ment767

    @bruhm0ment767

    2 years ago

    Try the MNIST dataset; it's for handwritten digits. Or you could just make your own input vectors and expected output vectors with floats and predict those, right?

  • @blessingadeyemi1289
    @blessingadeyemi1289 2 years ago

    Thanks a lot for this tutorial. I wanted to ask why you made a prediction with an instance from the training set instead of using new data. Doesn't this cause overfitting?

  • @DataProfessor

    @DataProfessor

    2 years ago

    Yes, that is correct. It was used only for the initial demo of the scikit-learn classifier function; later in the video, the train_test_split function is used to split the data, followed by model building and evaluation on the test data.
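A minimal sketch of the workflow mentioned in the reply above: split the data first, fit on the training part, and predict only on flowers the model has not seen. The 80/20 split and the variable names are assumptions for illustration. A score computed on the training data tends to be optimistic, so the held-out score is the one to report.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

iris = load_iris()
X, Y = iris.data, iris.target

# Hold out 20% of the flowers; the model never sees them during training.
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=42
)

clf = RandomForestClassifier(random_state=42)
clf.fit(X_train, Y_train)

# Predict a single held-out flower. The double brackets keep the input
# two-dimensional (1 row, 4 columns), which predict() expects.
new_flower = X_test[[0]]
predicted_class = clf.predict(new_flower)

# Compare accuracy on seen and unseen data.
train_accuracy = clf.score(X_train, Y_train)
test_accuracy = clf.score(X_test, Y_test)
print(predicted_class, train_accuracy, test_accuracy)
```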

  • @nuramirahsyahirahzainurin6151
    @nuramirahsyahirahzainurin6151 3 years ago

    can I do a classification model in burnout?

  • @kanimozhipanneerselvam3017
    @kanimozhipanneerselvam3017 4 years ago

    Great video, Professor, as always! Kindly upload videos on handling sensor-collected / IoT-related time series data and model building. Thanks in advance! 🙂

  • @DataProfessor

    @DataProfessor

    4 years ago

    Thanks for the suggestion, I’ll definitely consider this for future videos 😃

  • @thestorm4633

    @thestorm4633

    3 years ago

    @DataProfessor Please do. I am currently working on an IoT-related project, and I need to use machine learning to detect DDoS traffic and then use other algorithms to block it. Could you give advice on how to go about it? My department rejects the use of sklearn libraries. Your candid advice will be greatly appreciated.

  • @l3gcy337

    @l3gcy337

    9 months ago

    @thestorm4633 How did your project go?

  • @moniquediaz674
    @moniquediaz674 3 years ago

    loved the video. You teach very well

  • @DataProfessor

    @DataProfessor

    3 years ago

    Thank you 😊

  • @Alok-lk4ql
    @Alok-lk4ql 4 years ago

    Sir, what should we do if the classes of the target variable are not balanced?

  • @DataProfessor

    @DataProfessor

    4 years ago

    Great question! In that case we will have to balance the target variable by performing undersampling or oversampling.
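One way to sketch the oversampling idea from the reply above is with sklearn.utils.resample. The imbalanced dataset here is artificial: it is built by keeping only 10 flowers of one Iris class, purely as an assumption for illustration. Random oversampling with replacement is only one of several balancing techniques.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.utils import resample

iris = load_iris()
X, Y = iris.data, iris.target

# Build an artificially imbalanced dataset: keep only 10 samples of class 2.
keep = np.concatenate([np.where(Y != 2)[0], np.where(Y == 2)[0][:10]])
X_imb, Y_imb = X[keep], Y[keep]

# Oversample the minority class (with replacement) up to 50 samples.
minority = Y_imb == 2
X_min_up, Y_min_up = resample(
    X_imb[minority], Y_imb[minority],
    replace=True, n_samples=50, random_state=0
)

# Recombine the untouched majority classes with the upsampled minority class.
X_bal = np.vstack([X_imb[~minority], X_min_up])
Y_bal = np.concatenate([Y_imb[~minority], Y_min_up])

print(np.bincount(Y_imb), np.bincount(Y_bal))
```

When evaluating a model, oversampling should be applied to the training split only, so that duplicated rows do not leak into the test data.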

  • @todymaverick
    @todymaverick 11 months ago

    great job man!

  • @thestorm4633
    @thestorm4633 3 years ago

    Hi Prof, I am currently working on an IoT-related project, and I need to use machine learning to detect DDoS traffic and then use other algorithms to block it. Could you give advice on how to go about it? My department rejects the use of sklearn libraries. I am currently studying in Africa. Your candid advice will be greatly appreciated.

  • @DataProfessor

    @DataProfessor

    3 years ago

    Cybersecurity isn't really my domain. I would approach the problem by asking experts in the field what the current gold standard method for this task is. Then I would read research papers in the field and aggregate all the information to plan my own approach. That's how I would do it. Hope this helps.

  • @shwetaredkar734
    @shwetaredkar734 4 years ago

    Simply awesome.

  • @DataProfessor

    @DataProfessor

    4 years ago

    Thanks again for the kind words 😃

  • @carlosventura1308
    @carlosventura1308 3 years ago

    Hi, thank you for the video. When I ran print(clf.feature_importances_) I got a ValueError that says "Found input variables with inconsistent numbers of samples: [150, 3]". Could it be that I typed something wrong? Thank you for the help.

  • @DataProfessor

    @DataProfessor

    3 years ago

    Hi, thanks for watching! feature_importances_ should work after the model is built via model.fit(X, Y). Can you try again and make sure that all code cells were run?

  • @carlosventura1308

    @carlosventura1308

    3 years ago

    @DataProfessor I got it. Thank you so much for the quick reply.

  • @larrysizemore2891
    @larrysizemore2891 2 years ago

    How exactly should we do this on our own?

  • @shailabhshankar884
    @shailabhshankar884 3 years ago

    Let's say I have a data set that has two features to be included in the Y set. How do we do that? My data set has the columns class name, X, Y, W, H and cluster number. I want the model to predict the class name based on X, Y, W, H and the cluster number, and for each cluster number I want the model to consider only the X, Y, W, H values of that cluster. The cluster numbers are actually template numbers of invoices, X, Y, W, H are coordinates, and the class names are field names. So the problem is: we know the cluster number and the X, Y, W, H coordinates, and we want the system to predict which set of coordinates is which data field. The model must therefore only take into account the X, Y, W, H values for the specific cluster number rather than all X, Y, W, H values. Thanks in advance.

  • @nicholflowers2077
    @nicholflowers2077 3 years ago

    Hi Professor, 1. Was "Feature Importance" added to the original dataset? How/when is it calculated for new data that has not already been modeled? 2. How does the dataset.load method know where to get your dataset from?

  • @DataProfessor

    @DataProfessor

    3 years ago

    Hi Nichol, 1. Feature importance is computed after a model has been built using the random forest algorithm (which has built-in feature importance). We can get the importances as follows (notice the feature_importances_ attribute):

    clf = RandomForestClassifier()
    clf.fit(X_train, Y_train)
    clf.feature_importances_

  • @DataProfessor

    @DataProfessor

    3 years ago

    Q: How/When is it calculated for new data that has not already been modeled? A: Feature importance can be calculated only for the data that was used to train the model. We can incorporate new data into the dataset, rebuild the model and recompute the feature importance.
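A minimal sketch of the feature importance step described in the replies above, assuming the Iris data and an 80/20 split (the split and the random_state values are illustrative). The importances belong to the fitted model, so they are recomputed whenever the model is refit on new data.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, Y_train, Y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

clf = RandomForestClassifier(random_state=42)
clf.fit(X_train, Y_train)

# feature_importances_ is an attribute that only exists after fit().
# It holds one value per input feature, and the values sum to 1.
importances = clf.feature_importances_
for name, importance in zip(iris.feature_names, importances):
    print(f"{name}: {importance:.3f}")
```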

  • @DataProfessor

    @DataProfessor

    3 years ago

    Q: 2. How does the dataset.load method know where to get your dataset from? A: The datasets.load_iris function loads the Iris dataset from the Scikit-learn package and assigns it to a variable that we specify, such as a variable called "iris":

    iris = datasets.load_iris()

    In addition, there are several other datasets provided by Scikit-learn, as listed in the Scikit-learn documentation: scikit-learn.org/stable/modules/classes.html#module-sklearn.datasets For example, we can replace "load_iris" in "datasets.load_iris()" with load_boston (removed in scikit-learn 1.2), load_breast_cancer, load_diabetes, load_digits or load_wine to use these other datasets. Hope this helps.
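A short sketch of how the bundled datasets mentioned above can be loaded. The loaders read data files that ship with the scikit-learn package, so no download or file path is needed. Only load_iris and load_wine are shown; the variable names are illustrative.

```python
from sklearn import datasets

# Each loader returns a dictionary-like Bunch object.
iris = datasets.load_iris()
wine = datasets.load_wine()

# The features, labels and their names are stored under separate keys.
print(iris.data.shape)          # feature matrix
print(iris.target.shape)        # class labels
print(iris.feature_names)
print(list(iris.target_names))

# Swapping the loader is enough to work with a different dataset.
print(wine.data.shape)
```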

  • @nicholflowers2077

    @nicholflowers2077

    3 years ago

    @DataProfessor Oh! I didn't know that. Thanks for pointing that out.

  • @nicholflowers2077

    @nicholflowers2077

    3 years ago

    @DataProfessor Makes perfect sense. Thanks!

  • @1UniverseGames
    @1UniverseGames 3 years ago

    Help please: when I type clf.fit(X, Y), the only output I get is RandomForestClassifier(). It does not show the full details in my notebook. I wrote the same code as above, but this line shows a different output. Can you help me solve this? Is there any syntax to get the full detailed output?

  • @DataProfessor

    @DataProfessor

    3 years ago

    Hi, what output are you getting? It should show the parameters used for building the model as the output.

  • @dhwanitrivedi5604

    @dhwanitrivedi5604

    3 years ago

    @DataProfessor Hello Sir, I am facing the same issue. I was able to do the prediction; however, when printing the score I got 0% accuracy. I wrote the same code as shown in the notebook. Please help. Thank you very much.

  • @DataProfessor

    @DataProfessor

    3 years ago

    @dhwanitrivedi5604 Please check that the data is loaded properly and that the data variable is passed to the fit function. Also check that the variable names match, since if nothing is read in, it will not produce the desired results.
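On the question in this thread about clf.fit(X, Y) printing only RandomForestClassifier(): recent scikit-learn releases print only the parameters that differ from their defaults, so the short output is most likely expected rather than an error. A minimal sketch of two ways to see every parameter, assuming a reasonably recent scikit-learn version:

```python
from sklearn import set_config
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
X, Y = iris.data, iris.target

clf = RandomForestClassifier()
clf.fit(X, Y)

# Option 1: ask the model for all of its parameters as a dictionary.
params = clf.get_params()
print(params)

# Option 2: tell scikit-learn to print every parameter, not just changed ones.
set_config(print_changed_only=False)
print(clf)
```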

  • @pavankalyan6927
    @pavankalyan6927 4 months ago

    thank you so much

  • @viet-bacnguyen1830
    @viet-bacnguyen1830 4 years ago

    Many thanks!

  • @DataProfessor

    @DataProfessor

    4 years ago

    A pleasure! Thanks for watching 😃

  • @mankomyk
    @mankomyk 2 years ago

    I'm trying to use this guide, but I noticed that I need to choose how to import the CSV file: as a pandas DataFrame or as a NumPy array. Your instructions and code are for a NumPy array. My CSV file has 35000 rows and 280 columns. The first row holds the column names. The first column holds the string target class (y), 'good'/'bad', and all other columns are numeric features. What should I choose?

  • @mankomyk

    @mankomyk

    2 years ago

    And when I import it as a NumPy array with numpy.genfromtxt('file.csv',dtype=None,delimiter=';',names=True), I get a strange array shape (35000,) with the column names shown in the variable viewer. When I import it with numpy.genfromtxt('file.csv',dtype=None,delimiter=';'), I get the array shape (35001,280), but the column names are imported as the first row :(

  • @DataProfessor

    @DataProfessor

    2 years ago

    Hi, have you tried importing using pandas?

    import pandas as pd
    df = pd.read_csv('file.csv')

    Afterwards you can separate df into X and y.

  • @mankomyk

    @mankomyk

    2 years ago

    @DataProfessor Thanks. Finally I've done it like this:

    data = pd.read_csv('datafile.csv',sep=';')
    data = pd.DataFrame(imp.transform(data), columns=data.columns)
    dataArray = data.to_numpy()
    X = dataArray[:,1:]
    X.astype('float64')
    Y = dataArray[:,0]
    Y.astype('float64')
    Y = pd.to_numeric(Y)
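A minimal sketch of the pandas route discussed in this thread, for a semicolon-separated file whose first column is a 'good'/'bad' label and whose remaining columns are numeric. The column names (label, f1, f2, f3) and the tiny inline dataset are hypothetical stand-ins for the commenter's file; io.StringIO is used only so the example runs without a file on disk.

```python
import io

import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical stand-in for 'file.csv': header row, label first, ';' separator.
csv_text = """label;f1;f2;f3
good;1.0;2.0;3.0
bad;4.0;5.0;6.0
good;1.5;2.5;3.5
bad;4.5;5.5;6.5
"""

# read_csv uses the first row as column names by default.
df = pd.read_csv(io.StringIO(csv_text), sep=";")

# First column is the target; all remaining columns are numeric features.
Y = df.iloc[:, 0].to_numpy()
X = df.iloc[:, 1:].to_numpy(dtype="float64")

# scikit-learn classifiers accept string class labels directly.
clf = RandomForestClassifier(random_state=0)
clf.fit(X, Y)
print(X.shape, Y.shape, clf.classes_)
```

For a real file, the io.StringIO(csv_text) argument would be replaced by the file path.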

  • @abhipsatripathy3934
    @abhipsatripathy3934 4 years ago

    Prof., I am new to Python and follow your videos regularly. I have a problem. When I create a matrix myself in Jupyter Notebook, the "shape" command works, i.e. it shows the number of rows and columns. But when I import the iris dataset from sklearn using from sklearn.datasets import load_iris and iris = load_iris(), and then use "iris.shape", an error occurs: KeyError "shape". What can be the reason? Please suggest something, because I have been stuck on this.

  • @DataProfessor

    @DataProfessor

    4 years ago

    Can you try assigning X = iris.data and Y = iris.target, where iris.data contains the 4 X variables and iris.target contains the Y variable (the species class label)? Afterwards you can run X.shape and Y.shape.

  • @abhipsatripathy3934

    @abhipsatripathy3934

    4 years ago

    @DataProfessor I'll try and tell you. I split the data as X = iris.data and Y = iris.target, and then checked the shapes. The shape of X is (150, 4) and that of Y is (150,). Is that okay?

  • @DataProfessor

    @DataProfessor

    4 years ago

    Yes, that is correct: 150 means that there are 150 rows and 4 means that there are 4 columns.
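The fix discussed in this thread can be sketched as follows. The object returned by load_iris() is a dictionary-like container rather than an array, so it has no "shape" of its own; the arrays stored inside it do.

```python
from sklearn.datasets import load_iris

iris = load_iris()

# The container itself has keys such as 'data' and 'target', but no 'shape'.
has_shape_key = "shape" in iris
print(sorted(iris.keys()), has_shape_key)

# The NumPy arrays inside the container do have a shape.
X = iris.data     # 150 rows (flowers) x 4 columns (measurements)
Y = iris.target   # one class label per flower
print(X.shape, Y.shape)
```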

  • @abhipsatripathy3934

    @abhipsatripathy3934

    4 years ago

    @DataProfessor Thanks a ton, Prof.

  • @DataProfessor

    @DataProfessor

    4 years ago

    @abhipsatripathy3934 You're welcome 😃

  • @Mayglie
    @Mayglie 3 years ago

    Hi Data Professor, could you demo the CIFAR-10 dataset and also teach how to save the trained data, load it again and make predictions? Thank you!

  • @DataProfessor

    @DataProfessor

    3 years ago

    Great suggestion! I will definitely take a look and consider for future videos 😃

  • @ismaelnadaf2870
    @ismaelnadaf2870 4 years ago

    A great video, sir. I am a UG student and I want to build a machine learning web app for multi-language detection (e.g. English, French, Chinese, Japanese). Please guide me on how to do it from the basics.

  • @DataProfessor

    @DataProfessor

    4 years ago

    Thanks for your comment. To make this web app, you can use the Shiny package in R, where you can build an ML model and plug it into the web app. I made several videos on this topic; please check them out below.
    Building the Machine Learning Web Application in R | Shiny Tutorial Ep 4 kzread.info/dash/bejne/lZmbma-GgbHSnps.html
    There are 4 other related videos that guide you from the beginning:
    1. Building your First Web Application in R | Shiny Tutorial Ep 1 kzread.info/dash/bejne/ppqCk5KChbuffNI.html
    2. Build Interactive Histogram Web Application in R | Shiny Tutorial Ep 2 kzread.info/dash/bejne/nndlps1vl7jIlZM.html
    3. Building Data-Driven Web Application in R | Shiny Tutorial Ep 3 kzread.info/dash/bejne/dY2M2Lium8-9grA.html
    5. Build BMI Calculator web application in R for health monitoring | Shiny Tutorial Ep 5 kzread.info/dash/bejne/a3mFmMWwcrTWptI.html

  • @pramishprakash
    @pramishprakash 2 years ago

    thanks a lot sir

  • @HimaniChauhan
    @HimaniChauhan 4 years ago

    Sir, how do I use the k-means algorithm with a training dataset in Python?

  • @DataProfessor

    @DataProfessor

    4 years ago

    You'll need to import the necessary class with from sklearn.cluster import KMeans and then use KMeans().
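A minimal sketch of the KMeans usage mentioned in the reply above, assuming the Iris measurements as input. The choice of 3 clusters and the n_init and random_state values are assumptions for illustration. Unlike the random forest classifier, k-means is unsupervised, so fit() receives only X and no class labels.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data

# Group the 150 flowers into 3 clusters using only their measurements.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
kmeans.fit(X)

# Each flower gets a cluster index; each cluster has a center in feature space.
labels = kmeans.labels_
centers = kmeans.cluster_centers_
print(labels[:10], centers.shape)

# New measurements can be assigned to the nearest learned cluster.
new_cluster = kmeans.predict(X[[0]])
print(new_cluster)
```

The cluster indices are arbitrary and do not correspond to the species labels in iris.target.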

  • @HimaniChauhan

    @HimaniChauhan

    4 years ago

    @DataProfessor Thank you, sir.

  • @brokerkamil5773
    @brokerkamil5773 9 months ago

    thx😀

  • @ramprasadsapkota1013
    @ramprasadsapkota1013 3 years ago

    Hi, how can I find the file on GitHub? I tried but got different files.

  • @DataProfessor

    @DataProfessor

    3 years ago

    Hi, the link to the Jupyter notebook file in the GitHub repo is normally in the video description. Here’s the link again: github.com/dataprofessor/code/tree/master/python/iris

  • @ramprasadsapkota1013

    @ramprasadsapkota1013

    3 years ago

    Thanks heaps, your tutorial is awesome !!!

  • @Mohamm-ed
    @Mohamm-ed 4 years ago

    Thanks for the video. Could you please build a signal classifier model in Python, such as for EEG and ECG signals, or recommend materials for that? Thanks in advance.

  • @DataProfessor

    @DataProfessor

    4 years ago

    Thanks for the comment. You might want to check out this GitHub topic page for repositories that contain code for EEG classifiers: github.com/topics/eeg-classification

  • @Mohamm-ed

    @Mohamm-ed

    4 years ago

    @DataProfessor Thanks so much.

  • @RaselAhmed-ix5ee
    @RaselAhmed-ix5ee 2 years ago

    hello how can i contact you?

  • @kelvinedozieobed4899
    @kelvinedozieobed4899 3 years ago

    great work and I LOVE YOUR NAME

  • @DataProfessor

    @DataProfessor

    3 years ago

    Thanks for watching!

  • @T-BWT
    @T-BWT 5 months ago

    my professor is a finger print candidate
