Support Vector Machines Part 2: The Polynomial Kernel (Part 2 of 3)

Support Vector Machines use kernel functions to do all the hard work, and this StatQuest dives deep into one of the most popular: the Polynomial Kernel. We talk about the parameter values and how the kernel uses the dot product to compute high-dimensional relationships without ever computing the high-dimensional coordinates.
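
Here is a minimal Python sketch (mine, not from the video) of that idea: plugging two observations a and b into the polynomial kernel (a*b + r)^d returns the same number you would get by transforming the points to a higher dimension and then computing the dot product there. The dosages 9 and 14 are the example values used in the video.

    import numpy as np

    def polynomial_kernel(a, b, r=0.5, d=2):
        # The Polynomial Kernel discussed in the video: (a*b + r)^d.
        return (a * b + r) ** d

    a, b = 9.0, 14.0  # the two dosages used as an example in the video

    # The kernel trick: the high-dimensional relationship, no transformation needed.
    print(polynomial_kernel(a, b))  # 16002.25

    # The same number via the explicit transformation for r = 1/2 and d = 2:
    # (a*b + 1/2)^2 = a*b + (a^2)(b^2) + 1/4 = (a, a^2, 1/2) . (b, b^2, 1/2)
    phi = lambda x: np.array([x, x ** 2, 0.5])
    print(phi(a) @ phi(b))  # 16002.25 again
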
NOTE: This StatQuest assumes you already know about...
Support Vector Machines: • Support Vector Machine...
Cross Validation: • Machine Learning Funda...
ALSO NOTE: This StatQuest is based on...
1) The description of Kernel Functions, and associated concepts on pages 352 to 353 of the Introduction to Statistical Learning in R: faculty.marshall.usc.edu/garet...
2) The Polynomial Kernel is also based on the Kernel used by scikit-learn: scikit-learn.org/stable/modul...
For a complete index of all the StatQuest videos, check out:
statquest.org/video-index/
If you'd like to support StatQuest, please consider...
Buying The StatQuest Illustrated Guide to Machine Learning!!!
PDF - statquest.gumroad.com/l/wvtmc
Paperback - www.amazon.com/dp/B09ZCKR4H6
Kindle eBook - www.amazon.com/dp/B09ZG79HXC
Patreon: / statquest
...or...
YouTube Membership: / @statquest
...a cool StatQuest t-shirt or sweatshirt:
shop.spreadshirt.com/statques...
...buying one or two of my songs (or go large and get a whole album!)
joshuastarmer.bandcamp.com/
...or just donating to StatQuest!
www.paypal.me/statquest
Lastly, if you want to keep up with me as I research and create new StatQuests, follow me on twitter:
/ joshuastarmer
#statquest #SVM #kernel

Comments: 424

  • @statquest · 2 years ago

    Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/

  • @davidonwuteaka2642 · 1 year ago

    How do I get it from Nigeria? I'd love to.

  • @statquest · 1 year ago

    @davidonwuteaka2642 Unfortunately I don't have distribution of physical (printed) copies in Nigeria, but you can get the PDF.

  • @davidonwuteaka2642 · 1 year ago

    Yes, I have been trying to, but the site kept rejecting my card. Thanks for your reply.

  • @statquest · 1 year ago

    @davidonwuteaka2642 Bummer! I'm sorry to hear that.

  • @stanlukash33 · 3 years ago

    I will make it easy for you guys: 3:38 - BAM 4:49 - DOUBLE BAM 5:54 - TRIPLE BAM

  • @statquest · 3 years ago

    Just the hits! BAM! :)

  • @MrPikkabo · 3 years ago

    Thanks I know statistics now

  • @RHONSON100 · 2 years ago

    Your videos should be mandatory tutorials for Data Science/ML courses in all universities. Students throughout the world would benefit from watching the best ML videos. Hats off to you, great Josh Starmer.

  • @statquest · 2 years ago

    Wow, thanks!

  • @rameshmitawa2246 · 2 years ago

    Not mandatory, but my prof recommends this channel after every slide/lecture.

  • @statquest · 2 years ago

    @rameshmitawa2246 That's awesome!

  • @tinacole1450 · 11 months ago

    I believe it's because most instructors don't teach it; they simply give information. Josh actually explains difficult concepts in a simple way.

  • @madhuvarun2790 · 3 years ago

    Dude, you are amazing. The best tutorial on SVM. I searched the entire Internet to understand it but couldn't. Please continue to make videos.

  • @statquest · 3 years ago

    Thanks, will do!

  • @atharvapatil6003 · 10 months ago

    The best machine learning playlist I have encountered on YouTube. The animations and your funny way of teaching make it easy to understand the concepts. The amount of work you put into creating these videos deserves great appreciation. I would definitely recommend going through the videos to anyone reading this comment.

  • @statquest · 10 months ago

    Glad you like them!

  • @jacobwalker6891 · 1 month ago

    I have read and looked at most of the recommended books and videos on kernels, and whilst somewhat familiar with the math, I never truly understood the principles. StatQuest actually makes complex topics simple; arguably one of the best, if not the best, teachers on youtube, and definitely the best stats explanations. Thanks Josh, much appreciated 👍

  • @statquest · 1 month ago

    Thank you very much! :)

  • @itsfabiolous · 9 months ago

    Bro, you're just a blessing. Never stop with the dry humor. Lots of love for you!

  • @statquest · 9 months ago

    Thank you! Will do!

  • @marcoharfe9812 · 4 years ago

    I want to thank you so much for all your videos. I was lost in a forest of vectors, matrices, and Greek letters when I heard about these topics in lecture, and I did not understand a thing. As I was practising for the exam, I discovered your videos, and now I actually understand what is happening. Really love the practical, example-driven approach!

  • @statquest · 4 years ago

    Awesome!!!! Good luck with your exam and let me know how it goes. :)

  • @606Add · 4 years ago

    Your videos are simply amazing! And the level of abstraction is right at the sweet spot! Thank you for the extremely thoughtful and precise illustrations!

  • @statquest · 4 years ago

    Thank you very much! :)

  • @jonathannoll3386 · 4 years ago

    My man. I'm so happy I have my presentation about SVMs after your uploads... Keep up the great work!

  • @statquest · 4 years ago

    Awesome! :)

  • @kwok9298 · 2 years ago

    I really appreciate the way it is explained. Please keep up the good work!

  • @statquest · 2 years ago

    Thank you!

  • @tymothylim6550 · 3 years ago

    Thank you for this video! It was very helpful in terms of understanding the details of how the kernel function leads to certain equations that need to be solved to obtain the relevant Support Vector Classifier!

  • @statquest · 3 years ago

    Bam! :)

  • @priyangkumarpatel9317 · 4 years ago

    This is one of the best explanations of support vector machines... If anyone is interested in why dot products are integral to the idea of SVM, please refer to Professor Wilson's MIT lecture on SVM... It is another great explanation of SVM...

  • @statquest · 4 years ago

    Thanks! :)

  • @deashehu2591 · 4 years ago

    I have grown to love your little songs. They sound like Phoebe's songs!!! I have a little question: what do you use for visualization?

  • @statquest · 4 years ago

    Thanks! I draw all the pictures in Keynote.

  • @gargidwivedi7700 · 4 years ago

    That's exactly what my sister and I agreed just before we saw your comment! Haha.

  • @statquest · 3 years ago

    @Leila Mohammadzadeh Google "svm lagrange dual" and you will see how SVM uses the dot products to find the classifier.

  • @flaviodefalcao · 4 years ago

    It is awesome and satisfying to be able to learn the intuition from these videos and then read a textbook understanding everything. THANKS

  • @statquest · 4 years ago

    Awesome! I'm glad the videos are helpful! :)

  • @flaviodefalcao · 4 years ago

    @statquest BAM!!!

  • @amalboussere9270 · 4 years ago

    Thank you a lot, you are such a big help in this harsh student world. God bless you.

  • @statquest · 4 years ago

    I'm glad you like my videos! :)

  • @palashchandrakar1112 · 4 years ago

    @statquest We don't just like them, we love your videos XOXO

  • @leif1075 · 3 years ago

    @statquest This doesn't show where on earth you derive that formula from. WHY do you multiply a times b and then add r? Why not multiply all three or add all three? See what I mean? I don't see how anyone could figure it out; there's not enough info here to derive it.

  • @chenghuang4724 · 1 year ago

    Sir, this is the best video for explaining the Kernel!

  • @statquest · 1 year ago

    Glad you think so!

  • @hayskapoy · 4 years ago

    Would love to see more math after seeing the big picture behind these algorithms 😄

  • @ahming123 · 4 years ago

    What do you mean by a high-dimensional relationship??

  • @huhuboss8274 · 4 years ago

    Like the distance, but in higher dimensions.

  • @Actanonverba01 · 4 years ago

    A synonym for 'high dimension' is many features or variables. For 'relationship', think connection(s). So if we have a high-dimensional relationship, we have a set of many variables that are connected by some idea or mathematical formula. Does that help?

  • @BrandonSLockey · 4 years ago

    Watch the first video (Part 1).

  • @leif1075 · 3 years ago

    @Actanonverba01 That's what I thought, but that is irrelevant here because we only have one variable with two possible categories of values. But of course we can add more connections and variables, which I think is what you are alluding to.

  • @clapdrix72 · 2 years ago

    @leif1075 It's not actually what he means, and it's not irrelevant. High-dimensional space means we take our original input feature space (in this case just X1) and transform it into a higher-dimensional space by "making up" new dimensions that are functions of our original dimensions (X1) so that the data is linearly separable in that new space. The pairwise relationships (aka similarity) are the distances between the observations projected into that higher-dimensional space (usually referred to as latent space). So it doesn't matter how many features you have in your original dataset nor how many outcome classes you have - those are irrelevant to the SVM algorithm mechanics, they only change the scale.

  • @MrZidane1128 · 3 years ago

    First of all, thanks for your explanation. After plugging two data points, a and b, into the polynomial kernel function we get the value 16,002.25, and then you said we get a higher-dimensional relationship. Could you elaborate further on what "relationship" you were referring to, based on the value 16,002.25? Sorry, I was not quite sure about that.

  • @statquest · 3 years ago

    In some sense the "relationships" are similar to transforming the data to the higher dimension and calculating the distances between data points.

  • @vedgupta1686 · 2 years ago

    @statquest But the value 16002.25 alone is a 1-D data point. How do you suppose that helps us classify? Am I missing something?

  • @statquest · 2 years ago

    @vedgupta1686 Think of that number as a loss value that is used as input for an iterative optimization algorithm like gradient descent.

  • @HeduAI · 2 years ago

    I thought the whole point of using the kernel trick was to save on the computation cost. If we are using an iterative algorithm anyway, how is that better than transforming the data?

  • @statquest · 2 years ago

    @HeduAI Either way you would still have to use an iterative procedure. So that computation is fixed.

  • @yulinliu850 · 4 years ago

    Awesome! Josh is back.

  • @statquest · 4 years ago

    :)

  • @rrrprogram8667 · 4 years ago

    After a lonnnnggg waitttt..... MEGAA MEGAAA MEGAAAA BAMMM is back

  • @statquest · 4 years ago

    Ha! Thank you! :)

  • @shahbazsiddiqi74 · 4 years ago

    waited too long... Thanks a ton

  • @nightawaitsusall9607 · 4 years ago

    You, my friend, are a champion. Yes.

  • @statquest · 4 years ago

    Thank you! :)

  • @billykristianto3818 · 6 months ago

    Thank you very much, the explanation is easier to understand compared to my class!

  • @statquest · 6 months ago

    Glad it helped!

  • @evelillac9718 · 3 years ago

    You literally saved my homework with your videos

  • @statquest · 3 years ago

    Bam!

  • @NathanPhippsONeill · 4 years ago

    Amazing vid! Thanks for helping me prepare for my Machine Learning exam 😁

  • @statquest · 4 years ago

    Good luck and let me know how it goes. :)

  • @NathanPhippsONeill · 4 years ago

    @statquest It went well for a difficult exam. BUT I had a lot to write about thanks to this channel. Appreciate it ❤️

  • @statquest · 4 years ago

    @NathanPhippsONeill Hooray!!! That's awesome and congratulations. :)

  • @vincent-paulvincentelli2627 · 3 years ago

    Great video! It would be very nice to have such an intuitive one for kernel PCA :)

  • @statquest · 3 years ago

    I'll keep that in mind.

  • @dok3820 · 2 years ago

    Thank you Josh. Just... thank you.

  • @statquest · 2 years ago

    :)

  • @johnjung-studywithme · 1 year ago

    This is how concepts should be introduced to students... it makes so much more sense.

  • @statquest · 1 year ago

    Thank you! :)

  • @TaylorSparks · 2 years ago

    bam. love it homie. keep it up

  • @statquest · 2 years ago

    Thank you!

  • @dimitrismarkopoulos3964 · 1 year ago

    First of all, congratulations! Your videos are super explanatory! One question: does the equation of the polynomial kernel always have the same form?

  • @statquest · 1 year ago

    As far as I know. However, the variables might have different names.

  • @thawinhart-rawung463 · 1 year ago

    Good job Josh

  • @statquest · 1 year ago

    :)

  • @sornamuhilan.s.p · 4 years ago

    Josh Starmer, you are a genius, sir!!

  • @statquest · 4 years ago

    Thank you! :)

  • @berknoyan7594 · 4 years ago

    Hi Josh, thanks for the video. You are helping me a lot. I have just one question: what do you mean by "high-dimensional relationship"? It can be achieved by any 2 numbers whose product is 126, and there are infinitely many of those. It's just a dot product of two 3-dimensional data points. Cross validation uses the misclassification rate to select the best r and d, as far as I know. Does CV use these numbers in any calculation?

  • @statquest · 4 years ago

    Cross Validation does not use these high-dimensional relationships. Instead, the algorithm that finds optimal fits, given constraints (like the number of misclassifications you will allow) uses them. Although the dot product seems like it would be too simple to use, it has a geometric interpretation related to how close the points are to each other. For more details, check out the Wikipedia article: en.wikipedia.org/wiki/Dot_product
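
    For anyone who wants to see that geometric interpretation numerically, here is a tiny sketch (the numbers are made up): the dot product equals ||a|| ||b|| cos(theta), so it grows as two points line up.

        import numpy as np

        a = np.array([3.0, 4.0])
        b = np.array([6.0, 8.0])  # points in exactly the same direction as a

        dot = a @ b
        cos_theta = dot / (np.linalg.norm(a) * np.linalg.norm(b))
        print(dot, cos_theta)  # 50.0 1.0 -> perfectly aligned points
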

  • @trashantrathore4995 · 2 years ago

    Earlier I had an incomplete intuition of all the algorithms, which could not be explained to others. The concepts are getting clearer now. Thanks StatQuest team and Josh Starmer; I will contribute as soon as I get a job in the DS field.

  • @statquest · 2 years ago

    bam! :)

  • @technojos · 3 years ago

    Thanksss Josh Starmer. I am fascinated by your videos. Please make a video about how 16002.25 is used, bam? Moreover, I think you could make a video playlist about how machine learning algorithms are coded, double bamm. Keep going man, we love you, triple bamm!!!

  • @statquest · 3 years ago

    Great suggestions!

  • @kevinarmbruster2724 · 3 years ago

    @statquest How is the relationship of 16,002.25 to be interpreted? I understood that if we transfer everything to the higher dimension we can solve it, but I did not understand the part about relationships between the points and how they help.

  • @statquest · 3 years ago

    @kevinarmbruster2724 We plug the relationships into an algorithm that is similar to gradient descent and it can use them to find the optimal classifier. However, the details are pretty complex and would require another video.

  • @leonugraha · 4 years ago

    Thank you for the SVM follow-up video. By the way, do you maintain a GitHub account?

  • @statquest · 4 years ago

    I should...

  • @temesgenaberaasfaw5076 · 4 years ago

    Best tutorial on SVM. YOU DID IT, THANKS

  • @statquest · 4 years ago

    Thank you! :)

  • @muhammadiqbalbazmi9275 · 4 years ago

    Sir, will you please give us a link to the presentation that you use in these videos?

  • @muhammadavimajidkaaffah7715 · 4 years ago

    SVM for multiclass, please. I like your videos so much.

  • @hrdyam865 · 4 years ago

    Thanks for the videos 😊. Can we use SVM for multinomial classification?

  • @statquest · 4 years ago

    I believe you just create one SVM per classification, and each SVM compares one classification to all the others (i.e. a sample either has that classification or not).
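
    A hedged scikit-learn sketch of that one-vs-rest idea (the data set here is made up for illustration):

        from sklearn.datasets import make_classification
        from sklearn.multiclass import OneVsRestClassifier
        from sklearn.svm import SVC

        # Made-up data with 3 classes.
        X, y = make_classification(n_samples=150, n_features=4,
                                   n_informative=3, n_redundant=0,
                                   n_classes=3, n_clusters_per_class=1,
                                   random_state=0)

        # One SVM per classification: each compares one class to all the others.
        clf = OneVsRestClassifier(SVC(kernel="poly", degree=3, coef0=1.0))
        clf.fit(X, y)
        print(len(clf.estimators_))  # 3 -- one classifier per class
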

  • @tuongminhquoc · 4 years ago

    First comment! I have turned on notifications for your videos. I love all of your videos!

  • @statquest · 4 years ago

    Awesome! Thank you! :)

  • @commentor93 · 2 years ago

    I've understood more than I ever expected to understand on this topic, all thanks to your videos. But now I've stumbled a bit: how do you solve a constant like the one at 5:50? Or what does solving mean in that context, now that it isn't a formula? Could you please expand on that?

  • @statquest · 2 years ago

    Think of it as a loss value, and it is something we try to optimize with an iterative algorithm that is similar to Gradient Descent: kzread.info/dash/bejne/pXiqlshto5W5cps.html

  • @benardmwanjeya8371 · 4 years ago

    God bless you Josh STARmer

  • @statquest · 4 years ago

    Thank you very much! :)

  • @sinarb2884 · 3 years ago

    I could be wrong, but I think there is a slight mistake in this video. The kernel function should be of the form (ab - 1/2)^2, because the support vector classifier is essentially thresholding based on whether x > y or not. Please let me know if I am wrong. And thanks for your cool videos.

  • @statquest · 3 years ago

    Most people define it the way I defined it in the video, (ab + r)^d. For more details, see: en.wikipedia.org/wiki/Polynomial_kernel and Page 352 of the Introduction to Statistical Learning in R.
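
    A quick numeric check (my own sketch) that this definition lines up with the scikit-learn kernel linked in the description; scikit-learn computes (gamma * a.b + coef0)^degree, which matches (ab + r)^d when gamma = 1:

        import numpy as np
        from sklearn.metrics.pairwise import polynomial_kernel

        a = np.array([[9.0]])   # one observation per row
        b = np.array([[14.0]])

        print(polynomial_kernel(a, b, degree=2, gamma=1.0, coef0=0.5))  # [[16002.25]]
        print((9.0 * 14.0 + 0.5) ** 2)                                  # 16002.25
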

  • @marijatosic217 · 4 years ago

    Thank you for the video! And now, what does this number 16002.25 tell us? :D How will we know the right dosage?

  • @statquest · 4 years ago

    That's just an example of the kind of values that are used by the kernel trick to determine the optimal placement of the support vector classifier.

  • @harshitsati · 3 years ago

    Thank you angel

  • @statquest · 3 years ago

    bam! :)

  • @hamidomar3618 · 2 years ago

    Hey, great video, thanks! What happens after the transformation though? I mean, how does the final result, i.e. a scalar corresponding to the relationship between each pair of observations, help in identifying an optimally classifying hyperplane?

  • @statquest · 2 years ago

    The value is used in a way similar to how loss values are used in Gradient Descent. There is an iterative algorithm that uses the values to optimize the fit.

  • @L.-.. · 4 years ago

    After we find the dot product, how do we use that value to decide whether the new sample belongs to the positive class or the negative class? Please clarify, Josh.

  • @statquest · 4 years ago

    It's a little too much to put into a comment. The purpose of the video was only to give insight into how the kernel works, not derive the math.

  • @harithagayathri7185 · 4 years ago

    Great explanation 👍 Thanks a ton Josh!! But I'm a bit confused here about how to calculate the appropriate 'r' coefficient for the equation. I understand that the 'd' value is found using cross validation.

  • @statquest · 4 years ago

    'r' is also determined by cross validation, but I am under the impression that it doesn't have as much impact as 'd'. It basically scales things by a constant, rather than adding extra dimensions.

  • @thememace · 3 years ago

    @statquest What's the point of setting r anyway, since it later gets completely ignored?🤔

  • @statquest · 3 years ago

    @thememace I'm not sure

  • @rohanpatel702 · 2 years ago

    @thememace It doesn't get completely ignored. When r=1/2, the math works out such that the x-axis doesn't get scaled at all. But when r=1, the x-axis gets scaled by sqrt(2). Even though the third element of the vectors combined by the dot product is a constant (and thus ignored), the choice of r still affects how the dot product evaluates because of how it changes the first element of each vector.
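
    A small sketch that verifies the comment above: for d = 2, (a*b + r)^2 equals the dot product of (sqrt(2r)*a, a^2, r) with (sqrt(2r)*b, b^2, r), so r rescales the original axis by sqrt(2r) rather than vanishing.

        import numpy as np

        def kernel(a, b, r):
            return (a * b + r) ** 2

        def phi(x, r):
            # Explicit map behind the kernel: (a*b + r)^2 = 2r*a*b + a^2*b^2 + r^2
            return np.array([np.sqrt(2 * r) * x, x ** 2, r])

        a, b = 9.0, 14.0
        for r in (0.5, 1.0):
            print(kernel(a, b, r), phi(a, r) @ phi(b, r))  # the pair matches for each r
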

  • @harishh.s4701 · 2 years ago

    Hi, thanks a lot for your content. It is very easy to understand and I appreciate your way of explaining things. I had one doubt: can you please explain how cross-validation helps to determine the optimal degree of the polynomial kernel used in SVMs?

  • @statquest · 2 years ago

    I do that in this video: kzread.info/dash/bejne/anVrrpKAo6XPfLQ.html

  • @alternativepotato · 3 years ago

    I love you, my man, you really are a life saver. Just because of that I am gonna buy a t-shirt.

  • @statquest · 3 years ago

    BAM! Thank you very much! :)

  • @manaspatil4316 · 2 years ago

    God bless you !!!

  • @statquest · 2 years ago

    :)

  • @tumul1474 · 4 years ago

    this is damn amazing !!

  • @statquest · 4 years ago

    Thanks! :)

  • @61_shivangbhardwaj46 · 3 years ago

    Thanks sir, great explanation :-)

  • @statquest · 3 years ago

    Thank you! :)

  • @manasadevadas8685 · 3 years ago

    First of all, thank you so much for explaining with such amazing illustrations. One doubt: how can we actually use the relationships between points to find the support vector classifier?

  • @statquest · 3 years ago

    Unfortunately that's a difficult question to answer and I'd have to dedicate a whole video to it. However, the simple answer is that it uses a method like Gradient Descent to find the optimal values.

  • @manasadevadas8685 · 3 years ago

    @statquest Thanks for the response! Hopefully you'll dedicate a whole video to it later :)

  • @iisc2022 · 1 year ago

    thank you

  • @statquest · 1 year ago

    Welcome!

  • @abrahamjacob7360 · 3 years ago

    Josh, this is a great video. One question on the polynomial kernel derivation. So the original problem was to find a classification point for drug dosages that cure or don't cure the disease. When we set d to 2, you mentioned it introduced a second dimension. I understood how squaring the values helped to find a better marginal classifier line, but ideally there is no meaning to the y-axis here, right? Because the case remains the same: we are just finding whether the drug dosage had a positive or negative impact. We could still use the y-axis to determine its efficacy, but if we increase d to 3, what would the z-axis represent here? Sorry if the question was confusing.

  • @statquest · 3 years ago

    The new dimensions don't mean anything at all - they are just extra dimensions that allow us to curve and bend the data so that we can separate it. The more dimensions, the more we can curve and bend the data.

  • @annusrivastava4425 · 3 years ago

    To find the values of r and d, can we use GridSearchCV as well?

  • @statquest · 3 years ago

    Yes. GridSearchCV is just a way to do CV.
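
    For example, a minimal sketch (the parameter ranges are placeholders and the data is made up) of picking d (degree) and r (coef0) with GridSearchCV:

        from sklearn.datasets import make_classification
        from sklearn.model_selection import GridSearchCV
        from sklearn.svm import SVC

        X, y = make_classification(n_samples=100, random_state=0)

        param_grid = {"degree": [1, 2, 3], "coef0": [0.0, 0.5, 1.0]}
        search = GridSearchCV(SVC(kernel="poly", gamma=1.0), param_grid, cv=5)
        search.fit(X, y)
        print(search.best_params_)  # the r and d combination that cross validates best
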

  • @muhtasirimran · 2 years ago

    Mr. Starmer is almost unconsciously changing machine learning's future 😀

  • @statquest · 2 years ago

    :)

  • @axa3547 · 3 years ago

    Machine learning algorithms!!! Is it just me, or do others also have to learn these again and again to fill the gaps in knowledge?

  • @statquest · 3 years ago

    bam!

  • @ayoubmarah4063 · 4 years ago

    Great content as usual, BIG THANKS to you. I hope you are having a nice day. I have questions if you don't mind: I got confused with the problem of imbalanced classes. When the classes are imbalanced, we do either upsampling or downsampling so that we have balanced data. 1) Is the accuracy score always wrong with imbalanced data? What about the F1 score then? 2) How do we decide which sampling method is good? Should we run them both? I do my best to search for solutions, but there is so much opinion and I'm lost. I saw your video last week, but when I got my hands dirty with projects I confronted new, complicated problems. Thank you again for your help.

  • @statquest · 4 years ago

    I'm glad you like the video. For details about how to use SVMs with unbalanced data, see this discussion: stats.stackexchange.com/questions/94295/svm-for-unbalanced-data

  • @edmondkeogh4057 · 3 years ago

    the beep boop thing was hilarious

  • @statquest · 3 years ago

    :)

  • @jhfoleiss · 4 years ago

    Great explanation, thanks! One question: what happens when a and b are vectors? I understand that in this quest you wanted to give a simple example (with a single feature) to make things clear. If the answer to this question is in another quest, I'll gladly wait for it :)

  • @statquest · 4 years ago

    If 'a' and 'b' are vectors (because you have measured more than one thing per observation), then you just multiply a^T b, where a^T = a transpose.
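
    A tiny sketch of that a^T b step for observations with more than one measurement each (the numbers are made up):

        import numpy as np

        a = np.array([9.0, 2.0, 1.5])   # one observation, three measurements
        b = np.array([14.0, 1.0, 0.5])  # another observation

        r, d = 0.5, 2
        print((a @ b + r) ** d)  # (a^T b + r)^d is still a single number
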

  • @primeprover · 4 years ago

    @statquest Doesn't that assume all the features have the same impact on the outcome? I would have thought that some form of weighting in the sums in the dot product of a and b would be necessary.

  • @statquest · 4 years ago

    @primeprover That's a good point. Like PCA, SVMs are sensitive to scale, so the first thing you would do is normalize all of the variables you've measured.
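
    A hedged sketch of that advice: put a scaler in front of the SVM so every measured variable contributes on the same scale (made-up data):

        from sklearn.datasets import make_classification
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler
        from sklearn.svm import SVC

        X, y = make_classification(n_samples=100, random_state=0)

        # Normalize every variable, then fit the polynomial-kernel SVM.
        model = make_pipeline(StandardScaler(), SVC(kernel="poly", degree=2))
        model.fit(X, y)
        print(model.score(X, y))
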

  • @primeprover · 4 years ago

    @statquest Surely more than just normalization is needed? If you provide two normalized variables to a linear regression model they will each get their own coefficient. One could be 1 and the other 0.1. As far as I can see, we seem to be giving all features a coefficient of 1 in the models you described? I would have thought that all but one of the additional features (the other would be 1) would need an extra model parameter to scale it in relation to the others.

  • @statquest · 4 years ago

    @primeprover I think conceptualizing SVMs in terms of linear or logistic models can be a little misleading. The choice of the parameters for the kernels, unlike linear or logistic regression, does not represent a relationship between the data and the classification. All the SVM is doing is applying relatively arbitrary transformations to the data to increase the dimensionality in a way that might be helpful for separation.

  • @slirpslirp · 4 years ago

    Awesome, so the dot product is equal to the result of the kernel function?

  • @statquest · 4 years ago

    yep!

  • @davydfridman3001 · 1 year ago

    Does anyone have a link to a good article that explains all the math behind the kernels?

  • @aryamahima3 · 2 years ago

    At 5:09 you said that we need to calculate the dot product between each pair of points. How do we use this dot product further? Could you please make this clear to me? You are the only person on the whole Internet who can clear this up. :D

  • @statquest · 2 years ago

    We use it as input to an iterative optimization algorithm similar to gradient descent. For details on gradient descent, see: kzread.info/dash/bejne/pXiqlshto5W5cps.html

  • @aryamahima3 · 2 years ago

    @statquest Thank you so much ☺️

  • @ronitganguly3318 · 2 years ago

    The high-dimensional relationship you calculated at the end is a number that tells us what, exactly? How does it help to pseudo-transform into higher dimensions?

  • @statquest · 2 years ago

    Are you familiar with Gradient Descent? kzread.info/dash/bejne/pXiqlshto5W5cps.html SVMs use a different algorithm, but the idea is similar, and you can think of the numbers, like 16002.25 as values that the algorithm is trying to optimize.

  • @chinzzz388 · 4 years ago

    When we calculate relationships between 2 data points, do we calculate relationships between all the points with respect to all the other points? For example, if we have 4 data points (1, 2, 3, 4), do we calculate the relationship between (1,2) and (3,4), OR do we calculate the relationships between (1,2), (1,3), (1,4), (2,3), etc.?

  • @statquest · 4 years ago

    We calculate all of the relationships.
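
    In code, "all of the relationships" is an n x n matrix of kernel values, one per pair of points (a sketch with made-up 1-D dosages):

        import numpy as np
        from sklearn.metrics.pairwise import polynomial_kernel

        dosages = np.array([[1.0], [2.0], [3.0], [4.0]])  # 4 made-up observations

        K = polynomial_kernel(dosages, degree=2, gamma=1.0, coef0=0.5)
        print(K.shape)  # (4, 4) -- one value per pair, including self-pairs
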

  • @digitalzoul57 · 3 years ago

    Hi StatQuest, you said that 'a' and 'b' are two different observations. Does this mean that k(a, b) depends on the number of classes? For example, if I have 4 classes, does that mean k(a, b, c, d)?

  • @statquest · 3 years ago

    I'm not sure how this works with more than 2 classes. Usually when there are more than 2 classes, people create one classifier per class and do 1 vs all other classification. So each classifier is still only separating 2 classes.

  • @preeethan · 4 years ago

    Amazing explanation :) We find the high-dimensional relationship between 2 points to be 16002.25. Practically, what do we do with this value? How do we find the Support Vector Classifier with this value?

  • @statquest · 4 years ago

    It's quite complicated - way too complicated to be described in a comment.

  • @preeethan · 4 years ago

    StatQuest with Josh Starmer Okay. I love all your videos, especially your intro songs! Great work, keep it going Josh :)

  • @sanjivgautam9063 · 4 years ago

    I want this answer too!

  • @balasubramanian5232 · 3 years ago

    @statquest I want answers for this question too. It'll be helpful if you could share links to resources on this.

  • @statquest · 3 years ago

    @balasubramanian5232 Google "svm lagrange dual" and you will have lots and lots of resources.

  • @marcelocoip7275 · 2 years ago

    Visually thinking about the last set of data: if you can draw a line to separate the data by squaring each observation onto the y-axis, then you can draw a line independently of the scale/ratio of the x-axis. So what I see is that the only thing adding "solving/math value" is increasing the order of the xi-axis to fit a hyperplane (the d value). What does r contribute to arriving at a better solution?

  • @statquest · 2 years ago

    I don't think it adds much.

  • @stoicism-101 · 2 years ago

    Dear Sir, kernels are basically used for finding the relationship between two points using the formula. How do we then find the Support Vector Classifier?

  • @statquest · 2 years ago

    The SVC is found using an iterative process that is a lot like Gradient Descent, and the output from the kernels is like the "loss" values.

  • @redaouazzani7120 · 4 years ago

    Great explanation! But what are the mathematical reasons to choose the RBF kernel or the polynomial kernel? What does it depend on?

  • @statquest · 4 years ago

    Usually people just start with the RBF kernel and see how well it performs. If it doesn't do well, they might try the polynomial kernel.

  • @shivakiranreddy4654 · 2 years ago

    Hi Josh, I couldn't get how 16002.25 will help us in drawing the Support Vector Classifier. In the comments below you mentioned: "In some sense the "relationships" are similar to transforming the data to the higher dimension and calculating the distances between data points." Even this explanation did not help. If 16002.25 is one of the 2-dimensional relationships that we need to solve for the support vector classifier, what is the other one? How do we get the classifier?

  • @statquest · 2 years ago

    Are you familiar with Gradient Descent? kzread.info/dash/bejne/pXiqlshto5W5cps.html SVMs use a different algorithm, but the idea is similar, and you can think of the numbers, like 16002.25 as values that the algorithm is trying to optimize.

  • @zheyuanzhou3165 · 4 years ago

    Super clear tutorial, thank you very much! But as a non-native English speaker I am a little confused: what is BAM trying to express?

  • @statquest · 4 years ago

    kzread.info/dash/bejne/m2idt9ijo6qpfcY.html

  • @zheyuanzhou3165 · 4 years ago

    @statquest A tutorial for BAM! Cool lol

  • @harshitamangal8861 · 4 years ago

    Hi Josh, the explanation is amazing. I had a question: you said that the equation (a*b + r)^d is used for finding the relationship between two points; how is this relationship then used to find where the Support Vector Classifier goes?

  • @statquest · 4 years ago

    Unfortunately the details of how it is used would require a whole video and I can't cram it into a comment. However, making the video is on the to-do list.

  • @rajatsankhla9261 · 2 years ago

    Hi Josh, could you help me understand how one should choose the value of r in the kernel function?

  • @statquest · 2 years ago

    In theory, cross validation would work. This is not something I've done before but my guess is that it might not matter much.

  • @MrWincenzo · 4 years ago

    Since the kernel requires calculating the dot product for each pair of points, suppose we have 10 points. When we do it for each point with respect to the others and itself, we obtain 10 different dot products for each single point. Which one of those 10 dot products becomes the new "y" dimension of the point?

  • @statquest · 4 years ago

    None of them end up being the new "y" dimension. The kernel trick works without having to make that transformation. We use the transformation to give an intuition of how the process works, but the kernel trick itself bypasses the transformation. This is the "kernel trick", and I mention it in the first video in the series on SVMs: kzread.info/dash/bejne/l5qGk6Vvc9nOnag.html

  • @MrWincenzo · 4 years ago

    @statquest Yes, I misunderstood before; now I've got it: plugging the values into the polynomial expression is equivalent to calculating the dot product in higher dimensions. And since the SVM only depends on those dot products among points, we have just "improved" the classification by mimicking the dot product in higher dimensions, even infinitely many, as with RBF. Still, thank you for all your efforts and your gentle replies to our questions. Regards.

  • @yancheeyang2918 · 3 years ago

    Question: the video ends at getting the relationship. I wonder what it means and how we can get the optimal hyperplane from here? Thanks!

  • @statquest · 3 years ago

    It's an iterative method that is like gradient descent.

  • @pratyanshvaibhav · 1 year ago

    Respected Josh sir, thank you for such an amazing explanation. Sir, please help me, I have a doubt: do we take the dot products for every pair of points, like the first red point with all the green points and so on, or do we take the first red point with the first green point and so on?

  • @statquest · 1 year ago

    All pairs

  • @pratyanshvaibhav · 1 year ago

    Thank you sir

  • @p-niddy · 2 years ago

    What does the "relationship" between two points actually signify? Based on this video, it looks like a number without much meaning that you can map onto the graph.

  • @statquest · 2 years ago

    It has no use for us. However, the algorithm that finds the optimal support vector classifier can use those values to do its job.

  • @beshosamir8978 · 1 year ago

    Quick question: why is it useful to calculate the relationships between every two points, regardless of the dimension, and how can that be useful for calculating the decision boundary?

  • @statquest · 1 year ago

    SVM's are optimized using an iterative algorithm that is similar to Gradient Descent, and the relationship values are essentially the "loss" values and help move the SVC to the correct spot.

  • @beshosamir8978 · 1 year ago

    @statquest So how do I know which is the best dimension I'm looking for, according to the relationships between every two points?

  • @statquest · 1 year ago

    @beshosamir8978 www.cs.cmu.edu/~epxing/Class/10701-08s/recitation/svm.pdf

  • @suyashmishra8821 · 2 years ago

    Hello sir, in the above example it was clear that the new transformed axes were a and a^2, but the mechanism of how the classifier draws the line wasn't clear. Do we get the equation of that classification line from the kernel function, the dot product, or something related?

  • @statquest · 2 years ago

    The output of the kernel function (the dot-products) is fed into an iterative algorithm (similar to gradient descent) to find the optimal support vector classifier.

  • @RowoonSamshu · 2 months ago

    I don't understand why we need to calculate the dot products at all. I have a basic idea that the loss function for SVMs includes the calculation of dot products between the observations, but I don't understand the intuition behind it, i.e. what the dot products (similarities) between observations actually do in finding the hyperplane that classifies the observations. Also, they say we have to minimize |w| to get the optimal hyperplane, but what is the geometric intuition behind minimizing |w|?

  • @DeepakSingh-fo2wm · 4 years ago

    I am still not clear on what happens after finding a relationship in a higher dimension; in the video, what happened after finding 16002.25? Can you please add a short video on this if possible?

  • @statquest · 4 years ago

    It would be a long video, but it's on the to-do list.

  • @Beenum1515 · 4 years ago

    What I understood is that the function of the kernel is to transform the data into a high dimension so that there exists a classifier in that dimension which separates those points. Right? If yes, then why not just square each value instead of feeding each pair into the kernel function?

  • @statquest · 4 years ago

    The kernel provides us with the high-dimensional relationships between points without actually doing the transformation. This is the "kernel trick" and it saves time and makes it possible to determine the relationships between points in infinite-dimensions (which is what the radial kernel does).

  • @eric752 · 2 years ago

    One suggestion: if, at the beginning, all the topics were listed in a logical way, it would be even better. Big thanks for the videos, really appreciate it 🙏

  • @statquest · 2 years ago

    Thanks!

  • @eric752 · 2 years ago

    @statquest Thank you

  • @vianadnanferman9752 · 2 years ago

    Thanks for the amazing video. Please, if I have 100 training samples, each with 5 features, and I apply a degree-2 polynomial, how is my data converted to a higher dimension? In other words, what do we mean by the two input vectors in the polynomial equation?

  • @statquest · 2 years ago

    In this case, the vectors for each point, a and b, contain values for all 5 features and the arithmetic shown in the video is applied to the values for all 5 features.

  • @vianadnanferman9752 · 2 years ago

    @statquest Thanks Dr Josh. Actually, my question is about how to choose the vectors themselves. As I understand from your video, the relationship is taken between each point and all the others, so the resulting points are greater in number than before, and the feature dimensions are already enlarged! Am I right?

  • @statquest · 2 years ago

    @vianadnanferman9752 Forgive me if I'm just repeating what you wrote, but we calculate all of the different relationships between all of the points. The process of calculating the relationships gives us the features in the higher-dimensional space.

  • @tinacole1450 · 11 months ago

    Does anyone else laugh at how silly yet genius Josh is? Loved the robot... I rewound to do the robot.

  • @statquest · 11 months ago

    You are my favorite! Thank you so much! I'm glad you enjoy the silly sounds.

  • @utkarshagrawal4708 · 2 years ago

    Any resources for understanding why the dot product?

  • @statquest · 2 years ago

    I'm not sure I fully understand your question - but I'm guessing you are asking how the dot product leads to the optimized support vector classifier. Think of it as the loss function that we use for gradient descent.

  • @hassanjb83 · 4 years ago

    At 6:33 you mention that we need to determine the values of both r and d through cross validation. If we have one-dimensional data, then shouldn't d just be 2?

  • @statquest · 4 years ago

    Why do you say that?

  • @hemersontacon3168 · 4 years ago

    I think you got too attached to the example. Imagine the same example but with the two colors all mixed up. Then I think that d = 2 would not be enough to split things up!

  • @ccuny1 · 4 years ago

    @@hemersontacon3168 That's an insightful comment that actually opened my eyes. Thank you.

  • @hemersontacon3168 · 4 years ago

    @ccuny1 Glad to know and glad to help ^^

  • @repackgamers5191 · 1 year ago

    Where did you get the values (9, 14) at 5:35?

  • @statquest · 1 year ago

    Those are the x-axis coordinates for those points and correspond to different dosages of a drug.

  • @varunjindal1520 · 3 years ago

    Hello Josh, I have a question. Could you please address it? We took the dot product of a and b and found the high-dimensional relationship to be 16002.25 without even transforming to 2D. Do we need to do this for every pair and solve each separately? How do we solve this number? Is there a next part to this?

  • @statquest · 3 years ago

    We do this for every pair. Once we have the numbers, an iterative procedure, similar to Gradient Descent (but not exactly the same), can find the optimal boundary.

  • @varunjindal1520 · 3 years ago

    @statquest Thank you so much.

  • @geo1997jack · 3 years ago

    I did not understand what that 16000 value means or how it helps us. Could you please clarify? Everything else was crystal clear :)

  • @statquest · 3 years ago

    It's used as a measure of the relationship between two points. Once we calculate the relationships between all of the points, they are used in a method similar to Gradient Descent to find the optimal classifier.

  • @lonandon · 2 years ago

    What does the result of the dot product mean when it represents the relationship between two points?

  • @statquest · 2 years ago

    It's the input to an iterative algorithm, much like gradient descent, that can find the optimal classifier.

  • @abhishekanand5974 · 3 years ago

    What exactly is meant by relationships between observations?

  • @statquest · 3 years ago

    It's some metric of distance.

  • @nafassaadat8326 · 3 years ago

    BAMMMMMMMMMM!!!!!!!!!!!!!!!!!!!!! Great, thank you

  • @statquest · 3 years ago

    You're welcome!!

  • @slirpslirp · 4 years ago

    So the polynomial kernel takes the x values of two one-dimensional vectors (in this example a and b) and returns a single number? Well, how does this work when I want to classify a single vector a? I mean, do I need all the pairwise kernel relationships to do so?

  • @statquest · 4 years ago

    @Marco Unfortunately, that's beyond the scope of this video (and this comment!). That said, once you have the support vector classifier (the line separating the two categories), you simply plug 'a' into the equation for that line and "do the math". If the output is positive, you classify one way, and if the output is negative, you classify the other. For more details, see page 340 in the Introduction to Statistical Learning in R.
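
    A scikit-learn sketch of that "plug in and check the sign" step (toy data, not the video's exact numbers):

        import numpy as np
        from sklearn.svm import SVC

        X = np.array([[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]])  # toy dosages
        y = np.array([0, 0, 0, 1, 1, 1])

        clf = SVC(kernel="poly", degree=2, coef0=0.5, gamma=1.0).fit(X, y)

        a = np.array([[2.5]])            # a new observation to classify
        print(clf.decision_function(a))  # the sign decides the classification
        print(clf.predict(a))            # [0]
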

  • @ruowentan2771 · 4 years ago

    Sorry, why are the green dots below the red dots in 2D? If you compute dosage^2, the green dots should be above the first few red dots, right?

  • @statquest · 4 years ago

    What time in the video are you referring to?