Stochastic Gradient Descent vs Batch Gradient Descent vs Mini Batch Gradient Descent | DL Tutorial 14

Stochastic gradient descent, batch gradient descent, and mini-batch gradient descent are three flavors of the gradient descent algorithm. In this video I will go over the differences among these three and then implement them in Python from scratch using a housing price dataset. At the end of the video there is an exercise for you to solve.
🔖 Hashtags 🔖
#stochasticgradientdescentpython #stochasticgradientdescent #batchgradientdescent #minibatchgradientdescent #gradientdescent
Do you want to learn technology from me? Check codebasics.io/?... for my affordable video courses.
Next Video: • Chain Rule | Deep Lear...
Previous video: • Implement Neural Netwo...
Code of this tutorial: github.com/codebasics/deep-le...
Exercise: Go to the end of the above link to find the exercise description
Deep learning playlist: • Deep Learning With Ten...
Machine learning playlist: kzread.info?list...
Prerequisites for this series:
1: Python tutorials (first 16 videos): kzread.info?list...
2: Pandas tutorials (first 8 videos): • Pandas Tutorial (Data ...
3: Machine learning playlist (first 16 videos): kzread.info?list...
#️⃣ Social Media #️⃣
🔗 Discord: / discord
📸 Dhaval's Personal Instagram: / dhavalsays
📸 Instagram: / codebasicshub
🔊 Facebook: / codebasicshub
📝 Linkedin (Personal): / dhavalsays
📝 Linkedin (Codebasics): / codebasics
📱 Twitter: / codebasicshub
🔗 Patreon: www.patreon.com/codebasics?fa...

Comments: 250

  • @codebasics (2 years ago)

    Check out our premium machine learning course with 2 industry projects: codebasics.io/courses/machine-learning-for-data-science-beginners-to-advanced

  • @user-zy8sf7tv2f (3 years ago)

    I followed your words and implemented the mini-batch gradient descent algorithm myself, and learned a lot after watching your implementation. Thank you very much.

  • @ryansafourr3866 (2 years ago)

    The world is better with you in it!

  • @codebasics (2 years ago)

    Glad you liked it, Ryan, and thanks for the donation

  • @girishtripathy275 (2 years ago)

    After so many videos I watched to learn ML (self-taught, I am a complete noob in ML currently), this playlist might be the best one I have found on YouTube! Kudos, man. Much respect.

  • @sanjivkumar8187 (2 years ago)

    Hello Sir, I am following your tutorials from Germany. You make things so simple; better than Udemy, Coursera, etc. courses. I highly recommend them. Please take care of your health as well, and hopefully you will be fatter in the coming videos 🙂

  • @spiralni (2 years ago)

    When you understand the topic you can explain it easily, and you, sir, are a master. Thanks.

  • @kasyapdharanikota8570 (2 years ago)

    When you explain, I find deep learning very easy and interesting. Thank you sir!

  • @kaiyunpan358 (3 years ago)

    Thank you for your patient and easily understood explanation, which solved my question!!!

  • @bestineouya5716 (1 year ago)

    I spent days trying to learn gradient descent and its types. Happy you cleared the mess. Thanks again teacher

  • @watch_tolearn (4 months ago)

    You are the best teacher I have come across. You bring understanding in a humble way. Stay blessed.

  • @malharlumbhani8700 (3 years ago)

    You teach absolutely brilliantly, sir. Truly top class :)))))

  • @user-qi8xj8jh9m (11 months ago)

    This is called teaching. Love your teaching, sir!!

  • @nahidakhter8646 (3 years ago)

    Video was fun to watch and the jokes helped keep me focused. Thanks for this :)

  • @codebasics (3 years ago)

    Glad you enjoyed it!

  • @prashantbhardwaj7041 (2 years ago)

    At about 14:43, a clarification may help someone as to why the transpose is required. For a matrix product, the rule of thumb is that the columns of the 1st matrix must match the rows of the 2nd matrix. Since our "w" has 2 columns, "X_scaled" has to be transposed from a 22x2 matrix into a 2x22 matrix. The resulting matrix will have 1 row and 22 columns: one prediction per sample.

  • @mikeguitar-michelerossi8195 (1 year ago)

    Why don't we use np.dot(scaled_X, w)? It should give the same result, without the transpose operation.

  • @ankitjhajhria7443 (1 year ago)

    w.shape is (2, 1), which means 1 column, and x_scaled.T is (2, 20), which means 2 rows? Your rule does not seem to hold; why?
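
    A quick shape check may reconcile this thread (a minimal sketch; the 22-sample, 2-feature shapes follow the video, and treating w as a flat array of 2 weights is an assumption):

        import numpy as np

        scaled_X = np.random.rand(22, 2)  # 22 samples, 2 features
        w = np.array([0.5, 0.5])          # one weight per feature, shape (2,)

        y1 = np.dot(w, scaled_X.T)  # (2,) dot (2, 22) -> 22 predictions
        y2 = np.dot(scaled_X, w)    # (22, 2) dot (2,) -> the same 22 predictions
        print(np.allclose(y1, y2))  # True: both orderings are equivalent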

  • @tiyasachakraborty4786 (2 years ago)

    You are my best teacher. I am becoming a big fan of such a great teacher.

  • @vincemegasonic (2 years ago)

    Good day to you sir! I'm an undergraduate in Computer Science, currently working on a paper that uses this neural network. This tutorial helped me understand the neural network pretty quickly and helped me adjust our software to function how we intend it to. Please keep up the good work, and I hope that other students like me can come across this and use it in their upcoming studies!! Godspeed on your future content!!

  • @codebasics (2 years ago)

    Best of luck! I am happy this video helped.

  • @zhaoharry4113 (3 years ago)

    love how you always put memes in your videos HAHA, great work!

  • @zhaoharry4113 (3 years ago)

    and thank you for the videos Sir :3

  • @kumudr (3 years ago)

    Thanks, I finally understood gradient descent, SGD & mini batch.

  • @yen__0515 (2 years ago)

    Sincerely appreciate your enriching content; it helps me a lot!

  • @codebasics (2 years ago)

    Thanks for the generous donation 🙏👍

  • @harshalbhoir8986 (1 year ago)

    Thank you so much sir! Now I really don't have a problem with gradient descent, and the exercise at the end helps a lot!!

  • @yogeshbharadwaj6200 (3 years ago)

    Thanks a lot for the detailed explanation... learned a lot...

  • @NguyenNhan-yg4cb (3 years ago)

    Lol, I do not want to go to sleep and I don't have enough money to watch Netflix, so I just take care of my career, sir.

  • @rociodelarosa1549 (2 years ago)

    Excellent explanation, keep up the good work 👏

  • @piyalikarmakar5979 (2 years ago)

    Sir, your videos always answer all my queries around the topics... Thank you so much sir.

  • @raom2127 (2 years ago)

    Great videos; the simplicity and detailed explanation with coding are super.

  • @vishaljaiswar6441 (2 years ago)

    Thank you so much, sir! I think you taught way better than my university lecturer and helped me understand much better!

  • @codebasics (2 years ago)

    👍👍👍☺️🎉

  • @rofiqulalamshehab8528 (11 months ago)

    Your explanation is excellent. It would be great if you could make a computer vision playlist. Do you have any plans for it?

  • @optimizedintroverts668 (2 months ago)

    Hats off to you for making this topic easy to understand.

  • @priyajain6791 (1 year ago)

    @codebasics Loving your videos so far. The way you present the examples and explanations, things really seem easy to understand. Thanks a lot for the thoughtful content! Just one request: can you please share the PPT you're using as well?

  • @tarunjnv1995 (1 year ago)

    @codebasics Yes, your content is really outstanding. Also, for quick revision of all these concepts we need the PPT. Could you please provide it?

  • @waseemabbas5078 (2 years ago)

    Hi Sir! I am from Pakistan and I am following your tutorials. Thank you very much for such amazing guiding material.

  • @sunilkumar-pp6eq (3 years ago)

    Your videos are really helpful. You are so good at coding that it takes time for me to understand, but thank you so much for making it simple!

  • @codebasics (3 years ago)

    I am happy this was helpful to you.

  • @fariya6119 (3 years ago)

    I think you have just made everything easy and clear. Thanks a lot. You have allayed my fears about learning deep learning.

  • @codebasics (3 years ago)

    Glad to hear that

  • @vin-deep (1 year ago)

    Super explanation skill that you have!!!

  • @chalmerilexus2072 (1 year ago)

    Lucid explanation. Thank you

  • @ramimoustafa (1 year ago)

    Thank you man for this perfect explanation

  • @humourin144p (1 year ago)

    Sir, big fan... best and simple explanation.

  • @dutta.alankar (3 years ago)

    Really well explained in simple terms!

  • @codebasics (3 years ago)

    😊👍

  • @adityabhatt4173 (1 year ago)

    Good bro, the way you used memes is exceptional. It makes learning fun.

  • @suenosn562 (2 years ago)

    You are a great teacher. Thank you so much sir.

  • @fahadreda3060 (3 years ago)

    Thanks for the video , wish you all the best

  • @codebasics (3 years ago)

    I am glad you liked it

  • @Breaking_Bold (7 months ago)

    Great explanation !!!

  • @siddharthsingh2369 (2 years ago)

    If someone is having trouble with the values of w_grad and b_grad, here is my explanation; please correct me if I am wrong anywhere. The error is calculated using the formula (y_predicted - y_true)**2, as you'll notice at the start, so the total error is the mean of all the individual errors. When you take the derivative of the squared term, error**2, the power rule brings a 2 out front, which is why a 2 appears in front of the weight term. The negative value you are seeing is just the reversal of (y_true - y_predicted) in this video; in the previous video it was (y_predicted - y_true). Also, if the transpose-based implementation shown here confuses you because it differs a little from video 13, you can use the code below for w_grad and b_grad; it gives exactly the same values. Following video 13's style for finding w1, w2 and bias: w_grad = (2 / total_samples) * np.dot(np.transpose(x), (y_predicted - y_true)) and b_grad = 2 * np.mean(y_predicted - y_true).
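
    A self-contained version of that calculation (a sketch; the names follow the video's notation, and x, y_true, y_predicted are assumed to be NumPy arrays of matching length):

        import numpy as np

        def gradients(x, y_true, y_predicted):
            # d/dw of mean((y_predicted - y_true)**2): the power rule yields
            # the leading 2, and the mean yields the 1/total_samples factor.
            total_samples = x.shape[0]
            w_grad = (2 / total_samples) * np.dot(x.T, (y_predicted - y_true))
            b_grad = 2 * np.mean(y_predicted - y_true)
            return w_grad, b_grad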

  • @very_nice_777 (1 year ago)

    Thanks a lot sir. Love from Bangladesh!

  • @ashimanazar1193 (3 years ago)

    The explanation was very clear. What if the input data X has outliers? If one takes a small batch size, one can't just compare the last two values of theta or the cost function. What should the convergence condition be then? Please explain.

  • @abhisheknagar9000 (3 years ago)

    Very nice explanation. Could you please tell me the parameter values used while training (for SGD, mini batch and batch) in Keras?

  • @shashisaini7919 (1 year ago)

    Thank you sir, good tutorial.❣💯

  • @shuaibalghazali3405 (8 months ago)

    Thanks for making this tutorial. I think I am getting somewhere.

  • @ritik444 (2 years ago)

    You are an actual legend

  • @dimmak8206 (3 years ago)

    You have a talent for teaching, cheers!

  • @codebasics (3 years ago)

    Glad you enjoyed it

  • @nasgaroth1 (3 years ago)

    Awesome teaching skills, nice work

  • @codebasics (3 years ago)

    Glad you think so!

  • @swaralipibose9731 (3 years ago)

    You are truly talented in teaching

  • @codebasics (3 years ago)

    👍☺️

  • @spg2731476 (2 years ago)

    At 3:21, why do you need 20 million derivatives? It would just be 3 derivatives: 2 for the features and 1 for the bias, wouldn't it? If so, please update it so that the audience is not confused.

  • @JunOfficial16 (2 years ago)

    I have the same question. For the first epoch, 3 derivatives; the second would be 3 more, and so on. So the number of derivatives depends on how many epochs we go through, right?

  • @JunOfficial16 (2 years ago)

    And with SGD, at every sample we calculate 3 derivatives until the error is minimized. If the error is not minimized to 0, it would go through 10M samples, and that would be 10M x 3 = 30M derivatives.
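
    One way to reconcile the 3-versus-20-million counts (an interpretation; the video itself doesn't break this down): there are only 3 parameters, but in batch gradient descent each parameter's derivative is a sum with one term per sample, so the work per epoch scales with samples times weights:

        # Illustrative counting only, using the video's example numbers
        samples = 10_000_000
        weights, biases = 2, 1

        parameters = weights + biases          # 3 derivatives per update step
        per_sample_terms = samples * weights   # 20,000,000 terms summed per epoch
        print(parameters, per_sample_terms)    # 3 20000000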

  • @shamikgupta2018 (2 years ago)

    17:26 --> Sir, it looks like the derivative formulae for w1 and bias are different from what you had shown in the previous video.

  • @williammartin4416 (5 months ago)

    Excellent lecture

  • @GaneshMuralidharan (2 years ago)

    Excellent bro.

  • @satinathdebnath5333 (2 years ago)

    Thanks for uploading such informative and helpful videos. I am really enjoying them and looking forward to using them in my MS work. Please let me know where I can find the input data, such as the .CSV file. I could not find it in the link provided in the description.

  • @Mathmagician73 (3 years ago)

    Waiting 😍... Also, please make a video on optimizers.

  • @codebasics (3 years ago)

    👍😊

  • @spicytuna08 (2 years ago)

    Thanks, you are really good.

  • @VikramReddyAnapana (2 years ago)

    Wonderful as always.

  • @codebasics (2 years ago)

    Glad it was helpful!

  • @spoonstraw7522 (7 months ago)

    Thank you so much, and that cat trying to learn mini batch gradient descent is so relatable. In fact, that's the reason I'm here. My cat is a nerd. We were partying, and then my cat, the party pooper he is, asked what mini batch gradient descent is, and he kind of ruined the party. He always does this; last time he was annoying everyone by trying to explain what Boolean algebra is. What a nerd.

  • @otsogileonalepelo9610 (3 years ago)

    Great content and tutorials, thank you so much. 🙏 But I have a few questions: when do you implement early stopping to prevent overfitting? Aren't you supposed to stop training the moment the loss function value increases compared to the last iteration? For instance, is the zig-zag pattern of the loss displayed by SGD just fine?

  • @mohdsyukur1699 (4 months ago)

    You are the best my boss

  • @mayurkumawatmusic (3 years ago)

    great series

  • @JH-kj3xk (2 years ago)

    many thanks!

  • @user-zy8sf7tv2f (3 years ago)

    This video is indeed really good!

  • @codebasics (3 years ago)

    Glad it was helpful!

  • @alidakhil3554 (1 year ago)

    Very nice lesson

  • @benlyazid (2 years ago)

    Good explanation, thank you for your effort. Keep going, bro ;)

  • @codebasics (2 years ago)

    I am happy this was helpful to you.

  • @RAJIBLOCHANDAS (2 years ago)

    Great video.

  • @tinanajafpour7214 (1 year ago)

    Thank you for the video.

  • @Chinmay4luv (3 years ago)

    Ha ha, this is a new style of teaching; I liked it very much 😍 😍 😍 And I am definitely going to open the solution part; however, I already have vaccines for my computer in codebasics....

  • @codebasics (3 years ago)

    Ha ha. So Chinmay Ram is the first one to invent the vaccine for corona 😊 You should get a Nobel Prize, buddy 🤓 How is it going, by the way? Are you still in Orissa or back in Mumbai?

  • @Chinmay4luv (3 years ago)

    @codebasics That prize will be dedicated to codebasics. I am in Odisha, continuing WFH....

  • @navid9495 (2 years ago)

    Really useful, thank you.

  • @ISandrucho (3 years ago)

    Thanks for the video. I noticed one thing: in SGD you didn't change the partial derivative formula of the cost function (but the cost function had changed).

  • @r0cketRacoon (28 days ago)

    Same question; I wonder why we need the derivatives divided by total samples when we only pick a single stochastic sample? Have you figured out the answer?
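
    For what it's worth: with a single random sample, n is effectively 1, so the division reduces to a constant factor that only rescales the learning rate. A minimal sketch of one SGD step under that reading (illustrative names, not the video's exact code):

        import numpy as np

        def sgd_step(x_i, y_i, w, b, learning_rate=0.01):
            # Squared error for ONE sample, so n == 1; the leftover factor
            # of 2 just rescales the effective learning rate.
            y_pred = np.dot(w, x_i) + b
            error = y_i - y_pred
            w = w + learning_rate * 2 * error * x_i
            b = b + learning_rate * 2 * error
            return w, b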

  • @sandipansarkar9211 (3 years ago)

    Great session

  • @harshalbhoir8986 (1 year ago)

    Thank you so much sir

  • @danielniels22 (2 years ago)

    love the party cat!

  • @ahmetcihan8025 (1 year ago)

    Thanks a lot.

  • @vishalsiram1305 (3 years ago)

    Also make a video on optimizers please, sir.

  • @9427gyan (3 years ago)

    I think there is a need for improvement in the explanation of scaling near the 9:45 mark. As per your explanation, the scaling is what makes the data look 2D, but I think that since the data is derived from a column, its natural form is a column, so it appears 2D.

  • @mohamedyassinehaouam8956 (2 years ago)

    very interesting

  • @AliAkbar-bv7zp (3 years ago)

    That's totally great.

  • @codebasics (3 years ago)

    Glad you liked it.

  • @aryac845 (1 year ago)

    I was following your playlist and it's very helpful. But where can I get the data you used, so that I can work on it?

  • @bangarrajumuppidu8354 (3 years ago)

    Superb!

  • @sheruthom6339 (2 months ago)

    Thank you bhai

  • @sahinmuratogur7556 (2 years ago)

    I have a question: why do you calculate the cost for each epoch? If you would like to plot the costs every 5 or 10 steps, wouldn't it be more logical to calculate the cost only at every 5th or 10th step?
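
    Computing the cost only when you log it is fine, since the weight updates depend on the gradients, not on the stored cost value. A sketch of such a guard (the epoch/cost_list naming mirrors the video's code, which is an assumption):

        import numpy as np

        def log_cost(epoch, y_true, y_predicted, cost_list, every=10):
            # The update step needs gradients, not the cost itself, so the
            # cost can be computed only on the epochs where it is recorded.
            if epoch % every == 0:
                cost_list.append(np.mean(np.square(y_true - y_predicted)))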

  • @RV-qf1iz (1 year ago)

    I like your way of teaching: less theory, more coding.

  • @farrugiamarc0 (4 months ago)

    Thank you for sharing your knowledge on the subject with a very good and detailed explanation. I have a question with reference to the slide shown at 3:29. When configured to do batch gradient descent, with 2 features and 1 million samples, why is the total number of derivatives equal to 2 million? Isn't it 2 derivatives per epoch? After going through all the 1 million samples you calculate the MSE and then do backpropagation to optimize w1 and w2. Am I missing something?

  • @vinny723 (1 year ago)

    Great series of tutorials. I would like to know, for this tutorial (#14), why the implementations of stochastic gradient descent and batch gradient descent did not include an activation function? Thanks.

  • @r0cketRacoon (28 days ago)

    No need to do that because this is a regression task; it's classification problems that use sigmoid or softmax.
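
    Schematically, the difference is only in the output layer (a comparison sketch, not the video's code):

        import numpy as np

        def predict_regression(X, w, b):
            return np.dot(X, w) + b        # linear output, e.g. a price

        def predict_binary(X, w, b):
            z = np.dot(X, w) + b
            return 1 / (1 + np.exp(-z))    # sigmoid squashes to (0, 1)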

  • @MrBemnet1 (3 years ago)

    Question: why do you have to do 20 million derivatives for 10 million samples? The number of derivatives you have to do should equal the number of w's and b's.

  • @danielahinojosasada3158 (2 years ago)

    Remember that there are multiple features. One sample --> multiple features. This means calculating multiple derivatives per sample.

  • @rahulnarayanan5152 (2 years ago)

    @golden water Same question.

  • @uttamagrahari (1 year ago)

    Here, each of the 10 million samples contributes a term to the derivative of each of the 2 weights, so one epoch evaluates about 20 million derivative terms while updating the weights and bias, even though the number of parameters itself stays small.

  • @shouyudu936 (3 years ago)

    I have a question: why do we also need to divide by n in stochastic gradient descent? Isn't it that we are going through each point separately?

  • @r0cketRacoon (28 days ago)

    Same question; do you have an answer for that?

  • @AlonAvramson (3 years ago)

    Thank you!

  • @codebasics (3 years ago)

    You're welcome!

  • @sanooosai (6 months ago)

    Thank you.

  • @1980chetansingla (3 years ago)

    Sir, I tried this code for more than 2 inputs and it gives an "array with a sequence" error in the last line. What to do?

  • @md.muntasirulhoque8563 (3 years ago)

    Sir, can you tell me why you used MinMaxScaler? Can't we use StandardScaler?

  • @pranavtiwari1883 (3 years ago)

    Waiting for CNN and deep learning projects.

  • @vikrantgsai7327 (1 year ago)

    For mini batch gradient descent, can the samples for the mini batch be picked in any order from the main batch?

  • @muhammadafifhidayat2566 (2 years ago)

    tutorial+meme = epic combination

  • @codebasics (2 years ago)

    Glad you enjoyed it

  • @010-haripriyareddy5 (3 months ago)

    Can we say that with large training datasets, SGD converges faster compared to batch gradient descent?

  • @ashishmalhotra2230 (8 months ago)

    Hi, why did you do "y_predicted = np.dot(w, X.T) + b"? Why is the X transpose required here?

  • @abhaydadhwal1521 (2 years ago)

    Sir, I have a question... in stochastic GD you wrote -(2/total_samples) in the formula for w_grad and b_grad, but in mini-batch you have written -(2/len(Xj)). Why the difference?
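
    A plausible reading (an assumption about intent, not a correction from the author): the -(2/total_samples) in the SGD cell is carried over from the batch version, and the consistent choice is to divide by however many samples the step actually used. A sketch in the video's notation:

        import numpy as np

        def minibatch_gradients(Xj, yj, w, b):
            # Divide by the number of samples used in THIS step: the whole
            # dataset for batch GD, len(Xj) for a mini batch, 1 for SGD.
            y_pred = np.dot(Xj, w) + b
            w_grad = -(2 / len(Xj)) * np.dot(Xj.T, (yj - y_pred))
            b_grad = -(2 / len(Xj)) * np.sum(yj - y_pred)
            return w_grad, b_grad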

  • @diligentguy4679 (5 months ago)

    Great content! I have a question as to why you did not use an activation function here. Is it something we can do?

  • @jeethendra374 (2 months ago)

    Dude, that's the same question I got. In case you know the answer, please share it with me.

  • @diligentguy4679 (2 months ago)

    @jeethendra374 Nope.