Linear Regression Python Sklearn [FROM SCRATCH]

Download Code: / pythonmaraton
Join Patreon: / pythonmaraton
^Downloadable code & more!
Linear regression in Python with sklearn. In this video we will learn how to use scikit-learn (sklearn) for linear regression in Python, and you can follow along with this example.

The first thing we need to do is import. We will import pydataset to get a dataset for this example, along with pandas and NumPy. Next we import what we need from sklearn: LinearRegression from the linear_model module and train_test_split. Lastly, let's bring in matplotlib so we can visualize the model.

First of all, let's get our data. We will be using the Pima women data. If you ever want to see details about a dataset, you can enter its keyword. Let's check whether this data is approximately linear. In this example we will see if triceps skin-fold measurements can predict body mass index (BMI). We can use pandas' plotting capabilities with kind='scatter'. There is the plot. This looks decently linear, so we will proceed with the model.

Now we are going to do a train/test split, because we are doing supervised learning. Basically, we create the model using only the training data, and then we use that model to see how well it predicts the testing data. Let's plot the train/test split so you can see what I mean: everything in red will be used to create our line, and that line will be tested against the green data.

Okay, now let's actually create the linear model. We call LR.fit() and plug in X_train and y_train. We reshape X_train because the input must be two-dimensional, so .reshape(-1,1) (one column, with the number of rows inferred) works just fine. Now let's use this model to predict on the test data. We will plot that against a scatter plot of the actual test data. Here it is: the line is the model's prediction and the green points are the actual data. It seems to do a decent job of following the overall trend, though there may be some outliers. Suppose we want to see how the model will predict a specific skin-fold measurement, say 50.
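Here is a minimal sketch of the workflow described above: split the data, fit on a 2D X, predict on the test set, and predict for a single skin-fold value of 50. It is not the video's exact notebook. The column names `skin` and `bmi` are assumed from the MASS `Pima.tr` table that pydataset appears to provide. The data are hypothetical synthetic numbers, so the sketch runs without pydataset. `test_size=0.3` and `random_state=42` are arbitrary choices, and the matplotlib plots are left out.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# The video loads real data with pydataset, roughly:
#   from pydataset import data
#   df = data("Pima.tr")  # assumed dataset name
# Here we build hypothetical synthetic data with the same two columns instead.
rng = np.random.default_rng(0)
skin = rng.uniform(10, 60, size=200)            # triceps skin fold
bmi = 20 + 0.35 * skin + rng.normal(0, 4, 200)  # noisy linear BMI
df = pd.DataFrame({"skin": skin, "bmi": bmi})

# Training data builds the line; testing data checks it.
X_train, X_test, y_train, y_test = train_test_split(
    df["skin"], df["bmi"], test_size=0.3, random_state=42
)

# sklearn expects X with shape (n_samples, n_features), so the 1D Series
# is turned into an (n, 1) NumPy array before fitting.
LR = LinearRegression()
LR.fit(X_train.values.reshape(-1, 1), y_train)

# Predictions for the test set (what the video plots against the green points).
y_pred = LR.predict(X_test.values.reshape(-1, 1))

# Prediction for one specific skin-fold measurement, e.g. 50.
bmi_at_50 = LR.predict(np.array([[50]]))[0]
print(f"slope={LR.coef_[0]:.3f}, intercept={LR.intercept_:.3f}")
print(f"predicted BMI at skin=50: {bmi_at_50:.2f}")
```

Reshaping after the split, as the video does, keeps the split working on pandas objects. Reshaping before the split works just as well.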
Let's plug that in and see. Alright, now we will score the model using sklearn's built-in score method, and it came out at about 0.39. For a regression model like this one, score returns R² (the coefficient of determination), so the maximum it could get is 1. I want you to think about what that score means and leave what you think in the comments below. Do you think this is a good model?

So there you have it: that is how you can use Python's sklearn to create a linear regression model. Please check out some of my other Python videos, and please subscribe for more Python content. :D This is a Python Anaconda tutorial for help with coding, programming, or computer science. These are short Python videos dedicated to troubleshooting Python problems and learning Python syntax. For more videos, see the Python Help playlist by Rylan Fowers.
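For a regressor such as LinearRegression, the score discussed above is the coefficient of determination, R² = 1 - SS_res / SS_tot. A value of 1 is a perfect fit, and 0 is no better than always predicting the mean of the targets. On new data it can even be negative. Below is a small sketch on hypothetical synthetic data that checks `score` against sklearn's `r2_score` and against the formula. For brevity it scores on the same data it was fitted on, whereas the video scores on the test split.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Hypothetical data: a noisy linear relationship.
rng = np.random.default_rng(1)
x = rng.uniform(10, 60, size=100).reshape(-1, 1)
y = 20 + 0.35 * x.ravel() + rng.normal(0, 4, 100)

LR = LinearRegression().fit(x, y)
y_pred = LR.predict(x)

# R^2 = 1 - SS_res / SS_tot
ss_res = np.sum((y - y_pred) ** 2)    # squared error the model leaves unexplained
ss_tot = np.sum((y - y.mean()) ** 2)  # squared spread of y around its mean
r2_manual = 1 - ss_res / ss_tot

print(LR.score(x, y), r2_score(y, y_pred), r2_manual)  # all three agree
```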
0:00 Intro
0:10 Preparing Data
1:36 Train Test Split
3:09 Training
5:13 Predicting
5:47 Scoring
✅Subscribe: / @pythonmaraton
📺Channel: / @pythonmaraton
🎵Theme Music: www.bensound.com/royalty-free...
View the documentation: scikit-learn.org/stable/modul...
#PythonMarathon #LearnPython #PythonTutorial Learn Python:
Python Book (English): amzn.to/3HcwgLd
Python Book (Spanish): amzn.to/47woAhQ
Affordable Laptop: amzn.to/48L30Hb
Machine Learning Book: amzn.to/3RNmwfs
Machine Learning Book (Spanish): amzn.to/3RVAFXU
Neural Networks for Babies: amzn.to/41SELoi
Video Equipment:
Background Color Light: amzn.to/3SgBDzG
Key Light: amzn.to/3NYwXLZ
Microphone: amzn.to/3H9UK89
Other:
Underrated Cheap Basketball: amzn.to/3RVzJTo
Amazing Basketball shorts: amzn.to/3vyRDUM

Comments: 67

  • @dome8116
    5 years ago

    Considering your question, I guess for a linear regression model it is pretty okay. Much higher accuracy is probably not possible with LR; other ML models would have to be considered.

  • @bradwang3648
    3 years ago

    very helpful, thank you!!! I can finally do my HW after watching this video!

  • @shrishsharma8333
    4 years ago

    An awesome video and great explanation. Why it ain't got any views i wonder!!!! Thanks a lot!!

  • @edsonwinnerify
    3 years ago

    Thanks bro. I already got subscribed and no doubt I will watch all your videos as you are a great teacher. God bless you!

  • @deekshantchoudhary8454
    4 years ago

    Great video, you've explained it nicely. Thanks!!

  • @hudsontorrent6672
    1 year ago

    Thanks for the video. Just to make a contribution, there is an outlier with high leverage in the training set (the observation with coordinates around (100, 35)). This is affecting the estimation of the slope coefficient, making its estimated value smaller than it should be. As a result, the estimated line does not fit the testing set well. There are no outliers in the testing set. Thanks again. Cheers.
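The leverage effect this comment describes can be illustrated with hypothetical synthetic data, not the Pima data. One point sits far to the right of the others and well below the trend; it is placed near (100, 35) to echo the comment, and it pulls the fitted slope down. Leverage is computed here as the diagonal of the hat matrix H = X (XᵀX)⁻¹ Xᵀ, where X includes an intercept column. The leverages always sum to the number of fitted parameters (2 here).

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical clean data following a linear trend.
rng = np.random.default_rng(2)
skin = rng.uniform(10, 50, size=100)
bmi = 20 + 0.4 * skin + rng.normal(0, 2, 100)

# Add one hypothetical high-leverage point: far to the right of the other
# x values and well below where the trend line would put it.
skin_out = np.append(skin, 100.0)
bmi_out = np.append(bmi, 35.0)

slope_clean = LinearRegression().fit(skin.reshape(-1, 1), bmi).coef_[0]
slope_out = LinearRegression().fit(skin_out.reshape(-1, 1), bmi_out).coef_[0]

# Leverage = diagonal of the hat matrix H = X (X^T X)^-1 X^T,
# with a column of ones in X for the intercept.
X = np.column_stack([np.ones_like(skin_out), skin_out])
H = X @ np.linalg.inv(X.T @ X) @ X.T
leverage = np.diag(H)

print(f"slope without outlier: {slope_clean:.3f}")
print(f"slope with outlier:    {slope_out:.3f}")
print(f"outlier leverage {leverage[-1]:.2f} vs mean leverage {leverage.mean():.3f}")
```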

  • @joseordonez7738
    2 years ago

    Dude, I don't know if you'll understand this, but your video saved me. You're great.

  • @mridulagarwal5881
    4 years ago

    Nice video! Short and crisp.

  • @jongcheulkim7284
    3 years ago

    Thank you so much. I learned a lot.

  • @michaelcstorm3808
    3 years ago

    thanks. this helps me to do a data science assignment

  • @jeffgalef121
    3 years ago

    That was great. Thank you!

  • @aravindkramesh
    2 years ago

    *Thank you, man. I understood.*

  • @paarthmadan1315
    4 years ago

    At 3:32, what was the reshaping criterion: why reshape to (-1,1) and not anything else? I didn't understand that part.

  • @toihirhalim
    3 years ago

    thanks Rylan you are awesome dude !

  • @ced4030
    3 years ago

    Looks good and it helped me a lot. I did this for a class project a few months ago, but it was a great refresher. A question: if I wanted to plug my predictions back into the actual data, for example to tie a prediction to a woman's name if it existed in the original data set, how would we do that?

  • @imtiazsajwani4239
    2 years ago

    Rylan, thanks for the great video. Do you happen to know why I am getting the fit error ValueError: Input contains NaN, infinity or a value too large for dtype('float64')?

  • @abhishekambawale7456
    1 year ago

    Great Tutorial. Thanks

  • @ricardosalas7048
    3 years ago

    Thanks for existing.

  • @akshatbhadani1710
    11 months ago

    i think it was a great model and u are a great person tysm for making this vid

  • @spitfirelast8761
    2 years ago

    How do you make their values appear normal again after running the model? Like for example: I had a value of 3070.55 then after processing the data, the machine made the value from 3070.55 to 7.189879, then after running the model i get 0.46598782 on mean square error and 0.47839596 for cross validation score. How do i return the value of 7.189879 to original 3070.55 so that i can output the value to original amount?

  • @jaywiji
    3 years ago

    Very clear video, thanks a lot. One question I have: why do we need to reshape the data? And why do we need to use .values? Wouldn't it work if we just used X_train, Y_train instead of X_train.values?

  • @wobblyjelly345
    3 years ago

    That's what I want to know too.

  • @amikhalsa3173
    3 years ago

    The reason is essentially the data type: X needs to be a 2D array. You will get an error if you just use X_train, because at this point it is a pandas Series, which is 1D. You can convert it to a numpy.ndarray with X_train = X_train.to_numpy() and then reshape to (-1, 1), OR you can take the values stored in the Series and reshape them directly with X_train.values.reshape(-1, 1). I think this is because lr.fit() expects a 2D X of shape (n_samples, n_features), and a Series has only one dimension. Hope that helps!
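Here is a small sketch of the reshape discussion above, using a hypothetical four-row DataFrame with `skin` and `bmi` columns. sklearn's `fit` expects X with shape (n_samples, n_features), but a single pandas column is a 1D Series, so passing it directly raises a ValueError. Either of these gives a valid 2D input:

- reshaping the underlying NumPy array to (-1, 1);
- selecting the column with double brackets to get a one-column DataFrame.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical one-feature data (bmi = 20 + 0.35 * skin exactly).
df = pd.DataFrame({"skin": [20.0, 30.0, 40.0, 50.0],
                   "bmi": [27.0, 30.5, 34.0, 37.5]})

s = df["skin"]                     # a single column is a 1D Series
print(type(s).__name__, s.shape)   # Series (4,)

# Passing the 1D Series as X makes fit raise
# "ValueError: Expected 2D array, got 1D array instead".
raised = False
try:
    LinearRegression().fit(s, df["bmi"])
except ValueError as err:
    raised = True
    print("ValueError:", str(err).splitlines()[0])

# Option 1: take the NumPy array and reshape it to (n, 1).
# A Series has no .reshape method, which is why .values / .to_numpy() comes first.
X1 = s.to_numpy().reshape(-1, 1)

# Option 2: double brackets select a one-column DataFrame, which is already 2D.
X2 = df[["skin"]]
print(X1.shape, X2.shape)          # (4, 1) (4, 1)

model = LinearRegression().fit(X1, df["bmi"])
print(model.coef_[0], model.intercept_)
```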

  • @jeandy4495
    1 year ago

    The accuracy of this particular model over this data is pretty good (~40%). The linear model is pretty good at catching the general (linear) trend of the datapoints. But it will be difficult to improve the accuracy with this model, as the datapoints are distributed with a wide variance around the linear model. Other regressors could be more accurate.

  • @muhammadhamza7369
    2 years ago

    Love it 🥰🔥

  • @gongjiaji2489
    3 years ago

    Did the fit() function do all the training? Why is it so quick?
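On why `fit()` is so quick: LinearRegression fits ordinary least squares. That model has a direct solution (a single linear least-squares solve) rather than a long iterative training loop, and one feature with a few hundred rows is a tiny problem. The sketch below uses hypothetical synthetic data. It compares sklearn's coefficients with those from NumPy's `np.linalg.lstsq`, applied to a design matrix that has a column of ones for the intercept.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical synthetic data.
rng = np.random.default_rng(3)
x = rng.uniform(10, 60, size=300)
y = 20 + 0.35 * x + rng.normal(0, 4, 300)

LR = LinearRegression().fit(x.reshape(-1, 1), y)

# The same least-squares problem solved in one step:
# minimise ||A @ beta - y||^2 with A = [1, x] (intercept column plus the feature).
A = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)

print("sklearn:", LR.intercept_, LR.coef_[0])
print("lstsq:  ", beta[0], beta[1])
```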

  • @sanketchore657
    3 years ago

    Thanks bro

  • @prateekyadav7679
    3 years ago

    i am working on an excel file but I get key error for 'height' which is the first column in my data.

  • @gnanashrishetty1465
    3 years ago

    Can anyone explain 3:40? I couldn't get the output[7]; my output was just LinearRegression(), and because of that I couldn't use .predict either.

  • @Continentalky
    2 years ago

    @@pythonmaraton I am still having the same problem. I used the LR = LinearRegression code, but it still just returns LinearRegression() when I run the next line of code.

  • @theh1ve
    2 years ago

    @@Continentalky Did you find a solution? I am having the exact same issue.
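On the `LinearRegression()` output in this thread: `fit()` returns the fitted estimator itself. In recent scikit-learn versions an estimator's repr lists only parameters changed from their defaults, so a default model displays as just `LinearRegression()`. That output is expected, and `.predict` should work on the same object. Separately, the `fit() missing 1 required positional argument: 'y'` error asked about further down is commonly caused by writing `LR = LinearRegression` without parentheses. That cause is an assumption here, since that commenter's code isn't shown. Below is a sketch on a hypothetical three-point dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data where y = 2x exactly.
X = np.array([[1.0], [2.0], [3.0]])
y = np.array([2.0, 4.0, 6.0])

# With parentheses, LR is an instance. fit() returns that same instance,
# which is what a notebook cell shows as its output.
LR = LinearRegression()
fitted = LR.fit(X, y)
print(repr(fitted))          # LinearRegression()
print(fitted is LR)          # True
pred = LR.predict([[4.0]])   # the fitted instance can predict
print(pred)

# Without parentheses, LR_wrong is the class itself, so LR_wrong.fit(X, y)
# passes X where `self` should go and y where X should go. Older versions
# report "missing 1 required positional argument: 'y'"; newer versions may
# raise a different error, but the call fails either way.
LR_wrong = LinearRegression
wrong_call_failed = False
try:
    LR_wrong.fit(X, y)
except (TypeError, AttributeError) as err:
    wrong_call_failed = True
    print(type(err).__name__, err)
```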

  • @skhdukes1888
    3 years ago

    Great video. To answer your question, since the model scored under 70% wouldn't it be considered poor performance?

  • @farelferdinand1089
    2 years ago

    I think it depends: if you are going to predict the percentage of people surviving after an operation, 70 might be a low number.

  • @alokeveer
    4 years ago

    hey, I just love to work in a dark background. How did you make your background dark... ??

  • @alokeveer
    4 years ago

    @@pythonmaraton Exactly, man... Would you mind telling me the name of the Chrome plugin, or if possible sending the link to it? Thanks for the reply, by the way.

  • @alokeveer
    4 years ago

    @@pythonmaraton Thank you so much, man. Your tutorial was also awesome!!

  • 4 years ago

    Is the score the R2?

  • @hannesvideo
    1 year ago

    @Python Marathoón: can you explain the reshape? Is it just the selection of 2 features from possibly more features? Why -1?

  • @pythonmaraton
    1 year ago

    Hi, thanks for the question. Sklearn wants the feature array X to be vertical: 2D, with one row per sample and one column per feature. The (-1, 1) is just a shortcut to turn the 1D array into a single column. It's like saying reshape to size (N, 1) (N rows and 1 column), where the -1 tells NumPy to work out N from the array's length. Likewise, if you reshape to (1, -1), it becomes size (1, N) (1 row and N columns).
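The author's (N, 1) versus (1, N) explanation can be shown directly in NumPy on a hypothetical six-element array. In `reshape`, a `-1` means "infer this dimension from the total number of elements".

```python
import numpy as np

a = np.arange(6)          # 1D array, shape (6,)
col = a.reshape(-1, 1)    # -1 inferred as 6: shape (6, 1), N rows and 1 column
row = a.reshape(1, -1)    # -1 inferred as 6: shape (1, 6), 1 row and N columns
grid = a.reshape(-1, 2)   # -1 inferred as 3: shape (3, 2)

print(a.shape, col.shape, row.shape, grid.shape)  # (6,) (6, 1) (1, 6) (3, 2)
# sklearn treats rows as samples, so (N, 1) means "N samples with 1 feature each",
# and a single new sample with one feature is shaped (1, 1).
```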

  • @hannes672
    1 year ago

    @@pythonmaraton thanks, that explains it. Great video!

  • @oddnumber8149
    3 years ago

    how to import pydataset in jupyter notebook?

  • @ibrahimnadeem1064
    9 months ago

    how to get this dataset?

  • @sachindilhan7504
    2 years ago

    how to find dataset

  • @zombotman8776
    3 years ago

    is there a way to predict "x" using a specific "y" value?

  • @aadhuu
    8 months ago

    I mean just feed y instead of x into the model
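Here is a sketch of predicting "x" from a specific "y" on hypothetical synthetic data. The usual approach is to fit a second model with the roles swapped (y as the input, x as the target). Passing a y value into the model trained on x does not do this. The swapped regression is also generally not the same as algebraically inverting the original line. The two slopes multiply to r², the squared correlation, so the two answers differ unless the points lie exactly on a line.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical synthetic data.
rng = np.random.default_rng(4)
x = rng.uniform(10, 60, size=200)
y = 20 + 0.35 * x + rng.normal(0, 4, 200)

# Model 1: predict y from x (the direction used in the video).
y_on_x = LinearRegression().fit(x.reshape(-1, 1), y)
# Model 2: predict x from y, a separate model with the roles swapped.
x_on_y = LinearRegression().fit(y.reshape(-1, 1), x)

y_value = 40.0
x_pred = x_on_y.predict([[y_value]])[0]
# Solving model 1's line for x gives a different answer in general.
x_inverted = (y_value - y_on_x.intercept_) / y_on_x.coef_[0]

r = np.corrcoef(x, y)[0, 1]
print(f"x from swapped regression: {x_pred:.2f}, from inverted line: {x_inverted:.2f}")
print(y_on_x.coef_[0] * x_on_y.coef_[0], r ** 2)  # equal up to rounding
```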

  • @mandalamtarun5414
    4 years ago

    I am getting an error: fit() missing 1 required positional argument: 'y' Any suggestions on removing this?

  • @zackmorey
    4 years ago

    @@pythonmaraton Could you help me understand why the reshape is important and what it's doing?

  • @jaymanhire
    2 years ago

    My model score is very low, but the predictions are very close. Interesting.

  • @estebanduarte1792
    2 years ago

    Extreme outliers?
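Outliers are one cause; here is another way a low score and close predictions can coexist. R² compares the squared errors with the spread of the test targets around their own mean. If the test targets barely vary, even small absolute errors give a low or even negative R². A sketch with made-up numbers:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, r2_score

# Hypothetical test targets with very little spread (mean 30.0),
# and predictions that are each off by about 0.5.
y_true = np.array([30.0, 30.4, 29.8, 30.2, 29.6])
y_pred = np.array([30.5, 29.9, 30.3, 29.7, 30.1])

mae = mean_absolute_error(y_true, y_pred)  # small absolute error
r2 = r2_score(y_true, y_pred)              # negative here: SS_res > SS_tot
print("MAE:", mae)
print("R^2:", r2)
```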

  • @Huy-G-Le
    2 years ago

    ModuleNotFoundError: No module named 'pydataset'

  • @bellatrix625
    2 years ago

    pip install pydataset

  • @crazycxr
    3 years ago

    yo, good succinct video. thanks

  • @raminlakin7888
    2 years ago

    Can we have the code from the notebook?

  • @yeahjustlikethat
    2 years ago

    Sometimes a less accurate but simpler model is better for getting others' "buy-in". I guess this one could use some help, though.

  • @ishikakesarwani6278
    3 years ago

    At 4:00 what if we don't reshape?

  • @ishikakesarwani6278
    3 years ago

    @@pythonmaraton 👍

  • @afifkhaja
    1 year ago

    I tried installing and then importing sklearn, but Python didn't recognize it. I had to install scikit-learn instead.

    # Go to File -> Settings -> Python Interpreter and install the pydataset and scikit-learn packages
    # scikit-learn is called sklearn when using the import statement
    from sklearn.linear_model import LinearRegression  # For linear regression
    from sklearn.model_selection import train_test_split  # To split data into train and test

  • @prafuldhakde3601
    5 months ago

    Bro, it's not working for me... I typed the code just as you wrote it, even copy-pasted it, but it's giving an error.

  • @vasudhashrikhandey2194
    4 years ago

    My score is coming out as 0.0348. Am I still correct? I have done all the same steps.

  • @prafuldhakde3601
    5 months ago

    Please, if you have any contact details, could you share them with me?

  • @devilzwishbone
    2 years ago

    No, it's not a good model, as it's 39% accurate. Ideally you want it at the 3/4 mark or more (75% accuracy) for it to be an OK-ish model, and 90% or more for it to be brilliant.

  • @ihsanayyach3696
    2 years ago

    at least do something to improve ur model 0.3 R is very low