Stochastic Gradient Descent vs Batch Gradient Descent vs Mini Batch Gradient Descent |DL Tutorial 14
Stochastic gradient descent, batch gradient descent, and mini-batch gradient descent are three flavors of the gradient descent algorithm. In this video I will go over the differences among these 3 and then implement them in Python from scratch using a housing price dataset. At the end of the video there is an exercise for you to solve.
🔖 Hashtags 🔖
#stochasticgradientdescentpython #stochasticgradientdescent #batchgradientdescent #minibatchgradientdescent #gradientdescent
Do you want to learn technology from me? Check codebasics.io/?... for my affordable video courses.
Next Video: • Chain Rule | Deep Lear...
Previous video: • Implement Neural Netwo...
Code of this tutorial: github.com/codebasics/deep-le...
Exercise: Go to the end of the above link to find the exercise description
Deep learning playlist: • Deep Learning With Ten...
Machine learning playlist : kzread.info?list...
Prerequisites for this series:
1: Python tutorials (first 16 videos): kzread.info?list...
2: Pandas tutorials(first 8 videos): • Pandas Tutorial (Data ...
3: Machine learning playlist (first 16 videos): kzread.info?list...
#️⃣ Social Media #️⃣
🔗 Discord: / discord
📸 Dhaval's Personal Instagram: / dhavalsays
📸 Instagram: / codebasicshub
🔊 Facebook: / codebasicshub
📝 Linkedin (Personal): / dhavalsays
📝 Linkedin (Codebasics): / codebasics
📱 Twitter: / codebasicshub
🔗 Patreon: www.patreon.com/codebasics?fa...
Comments: 250
Check out our premium machine learning course with 2 Industry projects: codebasics.io/courses/machine-learning-for-data-science-beginners-to-advanced
I followed your instructions to implement the mini-batch gradient descent algorithm myself, and I learned a lot after watching your implementation. Thank you very much.
The world is better with you in it!
@codebasics
2 years ago
Glad you liked it Ryan and thanks for the donation
After so many videos I watched to learn ML (self-taught, I am a complete noob in ML currently), this playlist might be the best one I have found on YouTube! Kudos, man. Much respect.
Hello Sir, I am following your tutorials from Germany. You made things so simple, better than Udemy, Coursera, etc. courses. I highly recommend them. Please take care of your health as well, and hopefully you will be fitter in coming videos 🙂
When you understand a topic you can explain it easily, and you, sir, are a master. Thanks.
when you explain I find deep learning very easy and interesting. Thank you sir!
Thank you for your patient and easily understood explanation, which solved my question!!!
I spent days trying to learn gradient descent and its types. Happy you cleared the mess. Thanks again teacher
You are the best teacher I have come across. you bring understanding in a humble way. Stay blessed.
You teach absolutely brilliantly, sir. Top class :)))
This is called teaching, love your teaching sir!!
Video was fun to watch and the jokes helped keep me focused. Thanks for this :)
@codebasics
3 years ago
Glad you enjoyed it!
At about 14:43, a clarification may help someone as to why the transpose is required. For a matrix product, the rule of thumb is that the number of columns of the first matrix must equal the number of rows of the second. Since our "w" has 2 columns (a 1x2 matrix), "X_scaled" has to be transposed from a 22x2 matrix into a 2x22 matrix. The resulting matrix will then have 22 columns and 1 row.
@mikeguitar-michelerossi8195
1 year ago
Why don't we use np.dot(scaled_X, w)? It should give the same result, without the transpose operation.
@ankitjhajhria7443
1 year ago
w.shape is (2, 1), meaning 1 column, and x_scaled.T is (2, 20), meaning 2 rows. Your rule doesn't seem to hold here; why?
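A small sketch may settle the thread above. The shapes below are illustrative (made-up data, not the tutorial's actual housing dataset): with a 1-D weight vector, NumPy accepts either argument order, and both produce the same vector of predictions.

```python
import numpy as np

# Illustrative shapes: 22 samples, 2 features (assumed, not the real data).
rng = np.random.default_rng(0)
X_scaled = rng.random((22, 2))     # (22, 2)
w = np.array([0.5, 0.3])           # (2,)  -- a 1-D weight vector
b = 0.1

# Both orderings are valid for a 1-D w and give the same 22 predictions:
pred_a = np.dot(w, X_scaled.T) + b   # (2,) . (2, 22) -> (22,)
pred_b = np.dot(X_scaled, w) + b     # (22, 2) . (2,) -> (22,)
assert pred_a.shape == (22,)
assert np.allclose(pred_a, pred_b)
```

So for a 1-D `w` the transpose is a stylistic choice; it only becomes mandatory once `w` is stored as a genuine 2-D column or row matrix.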
You are my best teacher. I am becoming a big fan of such a great teacher.
Good day to you, sir! I'm currently an undergraduate in Computer Science, working on a paper that uses this neural network. This tutorial helped me understand the neural network pretty quickly and helped me adjust our software to function as we intend. Please keep up the good work, and I hope other students like me can come across this and use it in their upcoming studies!! Godspeed on your future content!!
@codebasics
2 years ago
Best of luck! and I am happy this video helped
love how you always put memes in your videos HAHA, great work!
@zhaoharry4113
3 years ago
and thank you for the videos Sir :3
Thanks, I finally understood gradient descent, SGD, and mini-batch.
I sincerely appreciate your enriching content; it helps me a lot!
@codebasics
2 years ago
Thanks for the generous donation 🙏👍
Thank you so much, sir! Now I really don't have a problem with gradient descent, and the exercise at the end helps a lot!!
Thanks a lot for the detailed explanation... learned a lot...
Lol, I do not want to go to sleep and I don't have enough money for Netflix, so I'm just taking care of my career, sir.
Excellent explanation, keep up the good work 👏
Sir, your videos always answer all my queries about the topics... Thank you so much, sir.
Great videos; the simple yet detailed explanations with coding are super.
Thank you so much, sir! I think you taught way better than my university lecturer and helped me understand much better!
@codebasics
2 years ago
👍👍👍☺️🎉
Your explanation is excellent. It would be great if you could make a computer vision playlist. Do you have any plans for it?
Hats off to you for making this topic easy to understand.
@codebasics Loving your videos so far. The way you present the examples and explanations, things really seem easy to understand. Thanks a lot for the thoughtful content! Just one request: can you please share the PPT you're using as well?
@tarunjnv1995
1 year ago
@codebasics Yes, your content is really outstanding. Also, for quick revision of all these concepts, we need the PPT. Could you please provide it?
Hi, Sir! I am from Pakistan and am following your tutorials. Thank you very much for such amazing guiding material.
Your videos are really helpful, and you are so good at coding; it takes time for me to understand. Thank you so much for making it simple!
@codebasics
3 years ago
I am happy this was helpful to you.
I think you have just made everything easy and clear. Thanks a lot . You have just allayed my fears to learn Deep learning.
@codebasics
3 years ago
Glad to hear that
Super explanation skill that you have!!!
Lucid explanation. Thank you
Thank you man for this perfect explanation
Sir Big Fan ….best and simple explanation
Really well explained in simple terms!
@codebasics
3 years ago
😊👍
Good one, bro. The way you used memes is exceptional; it makes learning fun.
You are a great teacher. Thank you so much, sir.
Thanks for the video , wish you all the best
@codebasics
3 years ago
I am glad you liked it
Great explanation !!!
If someone is having trouble with the values of w_grad and b_grad, here is my explanation; please correct me if I am wrong anywhere. The error for each sample is (y_predicted - y_true)**2, as you can see at the start, so the total error is the mean of all the sample errors. Differentiating the squared term brings a factor of 2 to the front (by the derivative of x**2), which is why you see the 2 in the weight gradient. The negative value you are seeing is just the reversal of (y_true - y_predicted) in this video, whereas the previous video used (y_predicted - y_true). Also, if the transpose implementation of the matrix confuses you because it differs slightly from the one in video 13, you can use the code below for w_grad and b_grad; it gives the exact same values (same style as finding w1, w2, bias in video 13):
w_grad = (2 / total_samples) * np.dot(np.transpose(x), (y_predicted - y_true))
b_grad = 2 * np.mean(y_predicted - y_true)
Thanks a lot sir. Love from Bangladesh!
The explanation was very clear. But what if the input data X has outliers? Then, with a small batch size, one can't just compare the last two values of theta or the cost function. What should the convergence condition be then? Please explain.
Very nice explanation. Could you please tell me the parameter values to use while training (for SGD, mini-batch, and batch) in Keras?
thankyou sir, good tutorial.❣💯
Thanks for making this tutorial. I think I'm getting somewhere.
You are an actual legend
You have a talent for teaching. Cheers!
@codebasics
3 years ago
Glad you enjoyed it
Awesome teaching skills, nice work
@codebasics
3 years ago
Glad you think so!
You are truly talented in teaching
@codebasics
3 years ago
👍☺️
At 3:21, why do you need 20 million derivatives? It would just be 3 derivatives: 2 for the features and 1 for the bias. Isn't it? If so, please clarify so that the audience is not confused.
@JunOfficial16
2 years ago
I have the same question. For the first epoch, 3 derivatives; the second would be 3 more, and so on. So the number of derivatives depends on how many epochs we go through, right?
@JunOfficial16
2 years ago
And with SGD, at every sample we calculate 3 derivatives until the error is minimized. If the error is not minimized to 0, it would go through 10M samples, and that would be 10M x 3 = 30M derivatives.
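One way to reconcile the counts debated above (a hedged reading, since the video doesn't spell out its convention): per gradient update there is indeed one partial derivative per parameter, but in batch gradient descent each of those derivatives sums a term over every sample, and counting those per-sample terms is what yields figures like "20 million".

```python
# Hypothetical numbers matching the discussion: 10M samples, 2 weights.
samples = 10_000_000
weights = 2            # one per feature; the bias adds a third parameter

# One update = one partial derivative per parameter...
derivatives_per_update = weights + 1          # 3

# ...but each weight derivative sums a term over all samples:
per_sample_terms = weights * samples          # 20,000,000

print(derivatives_per_update)   # 3
print(per_sample_terms)         # 20000000
```

Under this reading, both camps are right: 3 derivative *expressions*, but tens of millions of per-sample derivative *evaluations* per batch update.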
17:26 --> Sir, it looks like the derivative formulae for w1 and the bias are different from what you showed in the previous video.
Excellent lecture
Excellent bro.
Thanks for uploading such informative and helpful videos. I am really enjoying them and looking forward to using them in my MS work. Please let me know where I can find the input data, like the .csv file; I could not find it at the link provided in the description.
Waiting 😍........ Also, please make a video on optimizers.
@codebasics
3 years ago
👍😊
Thanks, you are really good.
Wonderful as always.
@codebasics
2 years ago
Glad it was helpful!
Thank you so much! And that cat trying to learn mini-batch gradient descent is so relatable; in fact, that's the reason I'm here. My cat is a nerd. We were partying, and then my cat, the party pooper he is, asked what mini-batch gradient descent is and kind of ruined the party. He always does this; last time he was annoying everyone by trying to explain what Boolean algebra is. What a nerd.
Great content and tutorials, thank you so much. 🙏 But I have a few questions: when do you implement early stopping to prevent overfitting? Aren't you supposed to stop training the moment the loss increases compared to the last iteration? For instance, is the zig-zag loss pattern displayed by SGD just fine?
You are the best my boss
great series
many thanks!
This video is indeed really good !
@codebasics
3 years ago
Glad it was helpful!
Very nice lesson
Good explanation; thank you for your effort. Keep going, bro ;)
@codebasics
2 years ago
I am happy this was helpful to you.
Great video.
thank you for the video
Ha ha, this is a new style of teaching; I liked it very much 😍 😍 😍 and I am definitely going to open the solution part. However, I already have vaccines for my computer from codebasics...
@codebasics
3 years ago
Ha ha. So chinmay ram is the first one to invent the vaccine for corona 😊 You should get a Nobel prize, buddy 🤓 How is it going, by the way? Are you still in Orissa or back in Mumbai?
@Chinmay4luv
3 years ago
@@codebasics I will dedicate that prize to codebasics. I am in Odisha, continuing WFH...
Really useful, thank you.
Thanks for the video. I noticed one thing: in SGD you didn't change the partial derivative formula of the cost function (but the cost function had changed).
@r0cketRacoon
28 days ago
Same question. I wonder why we need the derivatives divided by total_samples when we only pick a single random sample? Have you figured out the answer?
Great session
Thank you so much sir
love the party cat!
Thanks a lot.
Also make video on optimizers pls sir
I think the explanation of scaling around the 9:45 timeline needs improvement. Per your explanation, the scaling is what makes the data look 2D, but I think that since the data is derived from a column, it is naturally column-shaped, which is why it appears 2D.
very interesting
that's totally great
@codebasics
3 years ago
glad you liked it
I was following your playlist and it's very helpful, but where can I get the data you used, so that I can work on it?
superbb
Thank you bhai
I have a question: why do you calculate the cost for each epoch? If you want to plot the cost every 5 or 10 steps, wouldn't it be more logical to calculate the cost only at every 5th or 10th step?
Like the way of your teaching less theory more coding
Thank you for sharing your knowledge on the subject with a very good and detailed explanation. I have a question about the slide shown at 3:29: when doing batch gradient descent with 2 features and 1 million samples, why is the total number of derivatives 2 million? Isn't it 2 derivatives per epoch? After going through all 1 million samples you calculate the MSE and then do backpropagation to optimise W1 and W2. Am I missing something?
Great series of tutorials. I would like to know for this tutorial (#14), why the implementations of Stochastic Gradient Descent or Batch Gradient Descent did not include an activation function? Thanks.
@r0cketRacoon
28 days ago
No need to do that, because this is a regression task; it's classification problems that use sigmoid or softmax.
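The reply above can be made concrete with a small sketch (illustrative data and function names, not the tutorial's exact code): for regression the output stays a plain linear combination (an identity "activation"), while a classifier would squash the same linear output through, e.g., a sigmoid.

```python
import numpy as np

def predict_regression(X, w, b):
    # Linear output, no activation: suitable for predicting prices.
    return np.dot(X, w) + b

def predict_classification(X, w, b):
    # Same linear core, but a sigmoid maps it into (0, 1) as a probability.
    z = np.dot(X, w) + b
    return 1.0 / (1.0 + np.exp(-z))

# Made-up scaled inputs: 2 samples, 2 features.
X = np.array([[0.2, 0.8], [0.5, 0.1]])
w = np.array([0.4, 0.6])

reg = predict_regression(X, w, 0.0)        # unbounded real values
cls = predict_classification(X, w, 0.0)    # values strictly in (0, 1)
assert np.all((cls > 0) & (cls < 1))
```

The gradient descent loop itself is unchanged; only the output transform (and the matching loss, MSE vs. log loss) differs between the two tasks.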
Question: why do you have to do 20 million derivatives for 10 million samples? The number of derivatives should equal the number of W's and B's.
@danielahinojosasada3158
2 years ago
Remember that there are multiple features. One sample --> multiple features. This means calculating multiple derivatives per sample.
@rahulnarayanan5152
2 years ago
@golden water Same question
@uttamagrahari
1 year ago
Here the 10 million samples don't mean 10 million weights; the number of weights matches the number of features. But each weight's derivative sums a per-sample term over all 10 million samples, so with 2 weights you end up evaluating about 2 x 10 million = 20 million derivative terms while updating the weights and bias.
I have a question: why do we also need to divide by n in stochastic gradient descent? Isn't it the case that we are going through each point separately?
@r0cketRacoon
28 days ago
same question, do you have an answer for that?
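A hedged sketch for the thread above (made-up numbers, not the tutorial's dataset): in a true per-sample SGD update the MSE is over a "batch" of one, so the 2/n factor is simply 2 (n = 1). Dividing by total_samples instead only rescales the gradient by a constant, which behaves like using a smaller learning rate rather than changing the direction of descent.

```python
import numpy as np

x_i = np.array([0.3, 0.7])   # one sample, two features (assumed values)
y_i = 1.2
w = np.array([0.5, 0.5])
b = 0.1
total_samples = 20

err = (np.dot(w, x_i) + b) - y_i      # prediction error for this one sample

# Derivative of (pred - y)**2 w.r.t. w, for a batch of one (n = 1):
w_grad_per_sample = 2 * err * x_i

# Dividing by total_samples, as the tutorial code does, only rescales it:
w_grad_scaled = (2 / total_samples) * err * x_i

assert np.allclose(w_grad_per_sample, total_samples * w_grad_scaled)
```

So the direction of each update is identical either way; only the effective step size differs, which the learning rate can absorb.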
Thank you!
@codebasics
3 years ago
You're welcome!
thank you
Sir, I tried this code for more than 2 inputs and it gives an "array with a sequence" error on the last line. What should I do?
Sir, can you tell me why you used MinMaxScaler? Can't we use StandardScaler?
Waiting for CNN and deep learning projects.
For mini-batch gradient descent, can the samples for each mini-batch be picked in any order from the full batch?
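A sketch answering the question above (illustrative data, not the tutorial's): yes, mini-batches are typically drawn in a random order. A common pattern is to shuffle the sample indices once per epoch and then slice off fixed-size batches, so every sample is still visited exactly once per epoch.

```python
import numpy as np

X = np.arange(20).reshape(10, 2)   # 10 samples, 2 features (made-up data)
y = np.arange(10)
batch_size = 3

perm = np.random.permutation(len(X))       # fresh random order each epoch
for start in range(0, len(X), batch_size):
    idx = perm[start:start + batch_size]
    Xj, yj = X[idx], y[idx]                # one shuffled mini-batch
    assert len(Xj) <= batch_size           # the last batch may be smaller
```

Shuffling each epoch (rather than sampling with replacement) keeps the gradient estimates unbiased while guaranteeing full coverage of the data.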
tutorial+meme = epic combination
@codebasics
2 years ago
Glad you enjoyed it
Can we say that with large training datasets, SGD converges faster than batch gradient descent?
Hi, why did you do "y_predicted = np.dot(w, X.T) + b"? Why is the X transpose required here?
Sir, I have a question: in stochastic GD you wrote -(2/total_samples) in the formulas for w_grad and b_grad, but in mini-batch you wrote -(2/len(Xj)). Why the difference?
Great content! I have a question as to why you did not use an activation function here. Is that something we can do?
@jeethendra374
2 months ago
Dude, that's the same question I had. In case you know the answer, please share it with me.
@diligentguy4679
2 months ago
@@jeethendra374 nope