How do I select features for Machine Learning?

Selecting the "best" features for your Machine Learning model will result in a better-performing, easier-to-understand, and faster-running model. But how do you know which features to select?
In this video, I'll discuss 7 feature selection tactics used by the pros that you can apply to your own model. At the end, I'll give you my top 3 tips for effective feature selection.
WANT TO JOIN MY NEXT WEBCAST? Become a member ($5/month):
/ dataschool
=== RELATED RESOURCES ===
Dimensionality reduction presentation: • Vishal Patel | A Pract...
Feature selection in scikit-learn: scikit-learn.org/stable/module...
Sequential Feature Selector from mlxtend: rasbt.github.io/mlxtend/user_g...
== WANT TO GET BETTER AT MACHINE LEARNING? ==
1) WATCH my scikit-learn video series: • Machine learning in Py...
2) SUBSCRIBE for more videos: kzread.info?su...
3) ENROLL in my Machine Learning course: www.dataschool.io/learn/
4) LET'S CONNECT!
- Newsletter: www.dataschool.io/subscribe/
- Twitter: / justmarkham
- Facebook: / datascienceschool
- LinkedIn: / justmarkham

Comments: 218

  • @mustafabohra2070 · 4 years ago

    Even Google can't provide as precise an answer on feature selection as you have given in 10 minutes! Thank you so much!

  • @dataschool · 4 years ago

    You're very welcome! 👍

  • @hadyaasghar7680 · 5 years ago

    Hey, Kevin, your content is great. I did a whole project with help solely from your content 😊

  • @dataschool · 5 years ago

    That is awesome to hear! Congratulations on your project 🙌

  • @Tessitura9 · 4 years ago

    Very concise, right to the point, and no convoluted lingo. Thank you!

  • @dataschool · 4 years ago

    Thank you!

  • @MrDavisv · 5 years ago

    Thank you so much Kevin! Your response was very succinct and clear! I actually showed your video to my colleagues during our machine learning Friday sessions at work and we all loved it. It was a timely topic for us since we're all fairly new to building ML models.

  • @dataschool · 5 years ago

    You are very welcome, Davis! Thanks so much for sharing the video with others, and I'm so glad it was helpful!

  • @AnPham-sc6eo · 2 years ago

    It is filled with information and is so easy to venture through. Thank you for making it available to all of us.

  • @dataschool · 2 years ago

    You're very welcome!

  • @marcelaugustoborssatocorta1839 · 5 years ago

    Great video, again. Thanks so much for sharing these valuable tips.

  • @dataschool · 5 years ago

    You're very welcome! Glad it was helpful to you.

  • @lonewolf2547 · 5 years ago

    This video was by far the best video on feature selection.

  • @dataschool · 5 years ago

    Awesome, thanks so much! :)

  • @datapeek · 2 years ago

    Great tutorial, and the way you simplified dimensionality reduction (aka feature selection) is awesome.

  • @dataschool · 1 year ago

    Glad it was helpful!

  • @achmadrifkiraihansyahbagja2113 · 2 years ago

    Your channel is great!! The videos are great for beginners and for people whose native language isn't English, because your voice is so clear and easy to understand.

  • @dataschool · 2 years ago

    Wow, thank you!

  • @DesiAtlas · 5 years ago

    Best school to learn from. I am learning on my own, as I don't have enough money to pay fees. I have learned all of pandas from you. Thanks a lot, fantastic work, and bless you.

  • @dataschool · 5 years ago

    That's awesome to hear! Good for you!

  • @rockroll28 · 3 years ago

    Unfortunately, the most underrated channel on KZread.

  • @dataschool · 2 years ago

    You are too kind! 🙌

  • @ahmarhussain8720 · 1 month ago

    Great explanation, no unnecessary extra stuff.

  • @dataschool · 1 month ago

    Glad it was helpful!

  • @yunes7305 · 3 years ago

    Lots of insights in your lecture. Thanks!

  • @dataschool · 3 years ago

    You're welcome!

  • @meetmeraj2000 · 4 years ago

    Wonderfully explained!!

  • @dataschool · 4 years ago

    Thank you!

  • @msnbmnt · 1 year ago

    Easily one of the best data science videos on KZread.

  • @dataschool · 1 year ago

    Thank you so much!

  • @atulmishra5892 · 2 years ago

    Hi Kevin, great video on feature selection techniques, but I have a more complex question about feature selection strategy. I have a pool of 2,000 features, and according to business knowledge, the features with LOW correlation to the target are sometimes more important than the HIGHLY correlated ones. We normally use Pearson correlation to select features, but that always returns the highly correlated features when we take the top 10. We need to improve on this, and I am exploring SelectKBest, since it also checks the significance of the correlation. What else do you suggest we do to resolve this kind of issue? Thanks, Atul

  • @djamila920 · 5 years ago

    Your explanation is easy to understand. Thank you!

  • @dataschool · 5 years ago

    You're welcome!

  • @saragorzin8797 · 5 years ago

    Thank you for your great and helpful videos.

  • @dataschool · 5 years ago

    You're very welcome!

  • @khawjafarhanDataAnalyst · 4 years ago

    Really good tips for feature selection.

  • @dataschool · 4 years ago

    Thanks!

  • @fernandonakamuta1502 · 3 years ago

    Great video!

  • @ChetanRane1993 · 5 years ago

    Awesome explanation of the concept.

  • @dataschool · 5 years ago

    Thanks!

  • @7810 · 5 years ago

    Awesome lesson! This topic is quite important in text classification, where the number of words and phrases extracted from text can be overwhelming.

  • @dataschool · 5 years ago

    Thanks! You might like this video as well: kzread.info/dash/bejne/jJ1_r6uuiczKiZM.html

  • @arzoo_singh · 3 years ago

    Feature selection and labelling are key, so what steps can we take?
    1) Focus on the question: what do you actually want? There may be many features; keep the ones that matter most and drop the useless features for that project.
    2) Visualize the data and plot it.
    3) Backtest the model: if time is not a factor, try various features and compare the output.

  • @dhristovaddx · 3 years ago

    This is a great video. The way you explain is very easy to understand. Great job! I just have a few questions, if that's okay... How do you do feature selection on categorical variables? Is it a good idea to one-hot encode them and then, for example, use the SelectKBest algorithm? (I've read that it isn't, because it's not a good idea to remove dummy variables unless you drop only the first one.) Are there any special algorithms that you use for feature selection on categorical variables, or on a mix of categorical and numerical variables? In practice, do you first do feature selection and then one-hot encode the variables?

  • @boejiden7093 · 2 years ago

    You can keep the top 10 most frequent categories and set everything else to "Other" - that's one workaround. Or you can rank the categories using another feature and then apply ordinal encoding. That way you don't increase the dimensionality, and even if the model gives more weight to a category with a larger number, the ordering is justified because it is based on another feature from the dataset.
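
The two workarounds above can be sketched in pandas. This is illustrative only: the `city` and `income` columns are invented, and the toy data keeps the top 2 categories rather than the top 10 mentioned in the comment.

```python
import pandas as pd

# Toy data with a high-cardinality categorical column (hypothetical example).
df = pd.DataFrame({"city": ["NY", "NY", "LA", "LA", "SF", "Boise", "Reno", "NY"]})

# Workaround 1: keep the N most frequent categories (here N=2), lump the rest into "Other".
top = df["city"].value_counts().nlargest(2).index
df["city_grouped"] = df["city"].where(df["city"].isin(top), "Other")

# Workaround 2: ordinal-encode categories by ranking them on another (made-up)
# feature, e.g. mean income per city, so the integer order carries information.
df["income"] = [70, 80, 60, 65, 90, 40, 45, 75]
rank = df.groupby("city_grouped")["income"].mean().rank().astype(int)
df["city_ordinal"] = df["city_grouped"].map(rank)

print(df[["city", "city_grouped", "city_ordinal"]])
```

Grouping rare levels keeps the one-hot dimensionality bounded, while the rank-based encoding avoids adding columns entirely.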

  • @ericae.2258 · 5 years ago

    Hi, you are a great teacher, very clear! I'm starting with DS, and I want to ask if you have a video of the presentation to share, to deepen the topic of dimensionality reduction. Thanks in advance, Kika

  • @dataschool · 5 years ago

    Thanks for your kind words! No, I don't have a video on that topic, sorry!

  • @bolgorwheat8753 · 3 months ago

    Just checked the database and I got 95,000 features after vectorization, lol. Seems like I really need this one.

  • @dataschool · 3 months ago

    Yes!

  • @anuragmalhotra3437 · 3 years ago

    Hi Kevin, I am looking into how to create a feature list related to human error during production releases. Do you have any data that can help forecast human error, perhaps from historical incidents and deployment data?

  • @mattmatt245 · 4 years ago

    Is it possible to apply a custom loss function in a regression model? I need to maximize the following function: if [predicted] < [actual] then [predicted] else [-actual]. Would that be possible? Thanks

  • @updeshpathak4947 · 4 years ago

    A big thank you, brother!

  • @rayrivera1830 · 4 years ago

    If you have two features to predict grass growth, like a date column and a correlated amount-of-rain column, is that easy for an ML algorithm to understand? Or should you combine them into one column with categories like "no rain", "little rain", etc. for the past 3 months?

  • @sagar786able · 4 years ago

    Great video. I learned more in one short video than I would from a huge number of articles. One question: can you use ensemble models like decision trees and random forests to look at feature importance, and then use those features to train another machine learning model (say, logistic regression)? Aren't the feature importances given by an ensemble technique specific to that model?

  • @dataschool · 4 years ago

    That's an excellent question! I think you are correct that feature importances are mostly model-specific, but you may still be able to apply that info to other models with some utility. Hope that helps!
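
The idea discussed in this thread, ranking features with a tree ensemble and then training a different model on the selected subset, can be sketched with scikit-learn. The dataset, threshold, and model choices below are illustrative assumptions, not the video's method:

```python
# Sketch: use random forest importances to pick features, then fit logistic regression.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           random_state=42)

pipe = make_pipeline(
    # keep features whose forest importance is above the median importance
    SelectFromModel(RandomForestClassifier(n_estimators=100, random_state=42),
                    threshold="median"),
    LogisticRegression(max_iter=1000),
)
score = cross_val_score(pipe, X, y, cv=5).mean()
print(round(score, 3))
```

Putting the selector inside a pipeline matters: it ensures the selection is re-fit on each training fold, so the cross-validation score is not leaked.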

  • @TheJetcross · 3 years ago

    Dear Kevin, I would like to do feature selection, but my features are both categorical and continuous. Is it possible to use one technique for the continuous features and another for the categorical ones? Or do I have to convert all the features to categorical? There are 40 features in total, and I want the best 10.

  • @jasontarimo3997 · 5 years ago

    Great one, Kevin. When are you going to do one on time series?

  • @dataschool · 5 years ago

    Thanks for the suggestion! You might find these videos to be useful: kzread.info/head/PL5-da3qGB5IBITZj_dYSFqnd_15JgqwA6

  • @jovisyang · 2 years ago

    Where can I find the slides of "A Practical Guide to Dimensionality Reduction" by Vishal Patel? Thanks.

  • @Analysis317 · 3 years ago

    Hey Kevin, first of all, thank you so much for your videos! They are amazing! I have a little question about pairwise correlation and multicollinearity. If I have already used pairwise correlation and deleted attributes that are highly correlated, is it still necessary to do a multicollinearity test? Or would it be enough to use one of them - and if so, which one would you use?

  • @mixalisk.5413 · 2 years ago

    I have the exact same question. To me, 3 (pairwise correlation) and 4 (multicollinearity) are the same thing. I don't see any difference.
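
The difference the commenters are asking about can be shown with a small synthetic example: pairwise correlation inspects features two at a time, while multicollinearity (often measured with the Variance Inflation Factor, VIF) catches a feature that is predictable from a combination of the others. The data and cutoffs below are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = rng.normal(size=500)
x3 = x1 + x2 + 0.1 * rng.normal(size=500)  # x3 is nearly a linear combo of x1 and x2
X = np.column_stack([x1, x2, x3])

# Pairwise: no single correlation looks alarming...
corr = np.corrcoef(X, rowvar=False)
print(round(abs(corr[0, 2]), 2))  # ~0.7, passes a typical 0.9 pairwise cutoff

# ...but the VIF of x3, defined as 1 / (1 - R^2) from regressing x3 on the
# other features, is enormous.
def vif(X, j):
    y = X[:, j]
    others = np.column_stack([np.ones(len(y)), np.delete(X, j, axis=1)])
    beta, *_ = np.linalg.lstsq(others, y, rcond=None)
    resid = y - others @ beta
    r2 = 1 - resid.var() / y.var()
    return 1 / (1 - r2)

high_vif = vif(X, 2)
print(high_vif > 10)  # far above the common VIF cutoff of 5-10
```

So a pairwise screen alone can miss multi-way redundancy; that is why some workflows run both checks.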

  • @fikiledube6745 · 3 years ago

    Thank you for this insightful video. I am curious whether there is a way to find the inputs that are most influential on the output of an ML model such as an ANN. Is there a way to determine this?

  • @valentinfontanger4962 · 3 years ago

    Well, you can start by visualizing the data. It all depends on what kind of data you are working with. I highly recommend going on Kaggle, looking up the Titanic dataset, and picking the most popular notebook. You will see how visualizing the data clearly helps with choosing the features.

  • @ahmedatef5654 · 3 years ago

    Creative content, not redundant at all. Really helpful!

  • @dataschool · 3 years ago

    Thanks!

  • @phuccoiinkorea3341 · 5 years ago

    Great post

  • @dataschool · 5 years ago

    Thanks!

  • @ayyasamy8730 · 5 years ago

    Good one!!

  • @dataschool · 5 years ago

    Thanks!

  • @balajee41 · 5 years ago

    Hey, thanks for the video. Can you make a video on how to identify multicollinearity, correlation, etc. in a dataset?

  • @dataschool · 5 years ago

    Thanks for your suggestion!

  • @yuvaraj2457 · 3 years ago

    Hi Kevin, great respect for you. Why haven't you touched on unsupervised and reinforcement learning topics? Expecting them!

  • @hikershike4441 · 3 years ago

    Great video

  • @dataschool · 2 years ago

    Thanks!

  • @clickethiopia8915 · 5 years ago

    Thank you for your nice video and good presentation. I have a question: my dataset is not labeled, and I want to do feature selection for classification. How can I select features for unlabeled data?

  • @carolinapelegrincuartero9287 · 5 years ago

    I'd do cluster analysis, or search Google for unsupervised learning methods :)

  • @dineshjoshi4100 · 1 year ago

    Hello, thanks for the explanation. I have one question: does using the best features help reduce the amount of training data needed? Say I do not have a large dataset, but I can build independent variables that are highly correlated with the dependent variable - will that let me get by with less training data? Your response will be highly valuable.

  • @rdubitsk · 4 years ago

    Aren't there ML libraries that can optimize the features? I.e., by running the model while dropping various features, and using that process to optimize the features included in the final model?

  • @aivoryuk · 2 years ago

    Very useful video, as I have just taken over a machine learning project. Question: if one technique, such as correlation with the target, shows a feature to have little correlation, but another, say RFE, shows it has importance - which should I trust?

  • @dataschool · 2 years ago

    Great question! It's hard to say - neither of those processes is guaranteed to be a reliable way of estimating the usefulness of a particular feature. That being said, my initial reaction is to trust the RFE result more, but it may depend on the particular situation. Hope that helps!
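
For readers unfamiliar with RFE (Recursive Feature Elimination, mentioned in this thread): it repeatedly fits a model and drops the weakest feature until a target count remains. A minimal scikit-learn sketch, with an illustrative synthetic dataset and an assumed target of 3 features:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=10, n_informative=3,
                           random_state=0)

# Eliminate features one at a time until 3 remain.
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=3)
rfe.fit(X, y)

print(rfe.support_)   # boolean mask of the features RFE kept
print(rfe.ranking_)   # 1 = selected; larger numbers were eliminated earlier
```

Unlike a simple correlation-with-target filter, RFE judges features in the context of the model and of the other features, which is why the two methods can disagree.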

  • @ananddeshmukh4939 · 5 years ago

    Superior teaching!

  • @dataschool · 5 years ago

    Thanks!

  • @PMetheney84 · 4 years ago

    Hi. I'm thinking about writing a bachelor's thesis on using ML techniques to authenticate users based on keystroke dynamics. You'd have CSV files like: key down at timestamp A, key up at timestamp B, etc., for a number of test subjects. This data would then go through feature selection and be fed to various ML algorithms. I'm trying to picture what the features for this data would even be, LOL. Any ideas?

  • @suratasvapoositkul8481 · 4 years ago

    Hi Kevin! Thanks for a very clear explanation. This video is very useful, as I'm very new to machine learning. I have one question about feature selection. I started learning ML by implementing decision trees. Most online tutorials just put all the features into the decision tree and let it select the features by itself. However, if you have tons of features (let's say 100,000 variables), is it better to perform some feature selection before building the DT model? Or does it not matter, since the DT can use Gini impurity to automatically select promising attributes?

  • @dataschool · 4 years ago

    That's a great question! Doing feature selection first is likely to help.

  • @suratasvapoositkul8481 · 4 years ago

    @@dataschool Thanks Kevin! I will try to implement it and compare the performance!

  • @jazminsutcliff4106 · 4 years ago

    Thanks dear!

  • @dataschool · 4 years ago

    You're welcome!

  • @nackyding · 2 years ago

    Do features have to be stationary when applying ML models to time series data?

  • @shaktiranjandev · 2 years ago

    Great video

  • @dataschool · 2 years ago

    Thanks!

  • @niksethi500 · 4 years ago

    Nice, sir! Love and respect from India ❣️

  • @dataschool · 4 years ago

    Thanks!

  • @WaqasAhmed-om8ph · 3 years ago

    I always appreciate you....

  • @dataschool · 3 years ago

    Thank you!

  • @adrielcabral6634 · 3 years ago

    How can I evaluate the correlation between a quantitative variable and a qualitative variable?

  • @kiranachanta9741 · 5 years ago

    Hello Kevin, can you make a video on finding multicollinearity with VIF using the sklearn library, or maybe some other library?

  • @dataschool · 5 years ago

    Thanks for your suggestion!

  • @rudzanimulaudzi7947 · 4 years ago

    Hi Kevin, love the channel. But there is a big difference between dimensionality reduction and feature selection. PCA and LDA are dimension-reducing; they form part of the preprocessing steps. When you use PCA, the output is not a subset of the original feature set, it's a lower-dimensional projection of your data. Feature selection results in a subset of your features; LASSO, Elastic Net, Information Gain, etc. are feature-reducing. We normally talk about wrapper, embedded, and filter methods in feature selection.

  • @dataschool · 4 years ago

    I'm familiar with all these terms, and I respectfully disagree with your point that feature selection is not dimensionality reduction. Dimensionality refers to the number of columns. Reducing that by any means is a reduction of dimensionality. I realize that some people use "dimensionality reduction" to mean only certain methods, but that doesn't change the fact that feature selection reduces the dimensions of your dataset.

  • @shadiaelgazzar9195 · 4 years ago

    Thank you for your great video, but I have a question: I want to use machine learning with econometrics to build a random forest classifier. Which method should I use for feature selection?

  • @dataschool · 4 years ago

    Hard for me to say, sorry!

  • @jongcheulkim7284 · 2 years ago

    Thank you ^^

  • @dataschool · 2 years ago

    You're welcome!

  • @rulala · 2 years ago

    Like your accent very much, keep going!

  • @dataschool · 2 years ago

    Thank you!

  • @betanapallisandeepra · 2 years ago

    Thank you

  • @dataschool · 2 years ago

    You're welcome!

  • @datascienceds7965 · 5 years ago

    I did Recursive Feature Elimination with cross-validation, plus the Variance Inflation Factor, for dimensionality reduction :-)

  • @dataschool · 5 years ago

    Those are two great suggestions - thanks for sharing! :)

  • @datascienceds7965 · 5 years ago

    @@dataschool You are welcome :-)

  • @ElectronicsInside · 5 years ago

    @@datascienceds7965 Can we use RFE with GridSearchCV to select the number of features?

  • @datascienceds7965 · 5 years ago

    @@ElectronicsInside I don't know; I'm unfamiliar with it.

  • @ElectronicsInside · 5 years ago

    @@datascienceds7965 Hi Kevin, can you make videos on time series analysis with ARMA models, customer behavior analysis with k-means clustering, and how to improve a random forest classifier with AdaBoost and XGBoost? Please make your next videos on these topics.

  • @esramuab1021 · 3 years ago

    Could you share the book you explained this from?

  • @karthik-ex4dm · 5 years ago

    I'm working with 2000-dimensional data. Is it okay to use PCA to reduce it to 50 dimensions and then use forward feature selection to further reduce to 20? Or is it okay to go from 2000 to 20 using PCA itself?

  • @dataschool · 5 years ago

    There's no universal answer to how it "should" be done, but I think just using PCA would be preferable.

  • @napent · 10 months ago

    Great talk! Any thoughts on the tsfresh library?

  • @dataschool · 9 months ago

    I'm not familiar with tsfresh, sorry!

  • @napent · 9 months ago

    @@dataschool It's a cool way to automatically select and validate features - you might find it really useful.

  • @TheOnlySaneAmerican · 2 years ago

    This guy embodies the look of a data scientist.

  • @owaisfarooqui6485 · 4 years ago

    Thanks for the help .......

  • @dataschool · 4 years ago

    You're welcome!

  • @sudipthazarika7628 · 4 years ago

    Sir, I have a dataset generated from 9 sensors, i.e. it has 9 features (columns). If I make a subset of the dataset containing the maximum, minimum, and some percentiles of each sensor (feature), would that be called feature extraction? The new dataset still has 9 features (columns) but less data (rows). If not, what should we call it? This was done to reduce computational cost.

  • @dataschool · 4 years ago

    That's feature engineering!

  • @kartickshow · 5 years ago

    Hi, thanks for your nice video. I am from India and I need help: I want to filter a data frame based on one column with a specific value (like "football") where the number of times its own column value is max. How do I write that? Please help.

  • @dataschool · 5 years ago

    I'm sorry, I don't quite understand your question... good luck!

  • @david-vr1ty · 4 years ago

    In the presentation from Vishal Patel that you are referring to, a workflow is presented. I have two questions about the workflow (33:00 in the video): 1. What is the difference between pairwise correlation and multicollinearity? As far as I know, to handle multicollinearity, different pairwise correlation techniques (like the Pearson correlation coefficient, chi-squared, or VIF) can be used. 2. Why would you perform either PCA or pairwise correlation/multicollinearity? If you perform PCA on (highly) correlated data, the output (principal components) still suffers from the (high) correlation, even though the principal components themselves are of course not correlated with each other (imagine you do PCA on 3 variables and 2 of them are highly correlated). Of course, the workflow diagram in the presentation is meant to be flexible, as the whole feature selection process is, but could you still share some thoughts on my questions? Many thanks, David

  • @dataschool · 4 years ago

    These are excellent questions, but beyond what I have time to address in the KZread comments... sorry!

  • @KhangTran-ml2hm · 5 years ago

    That speech clarity!

  • @dataschool · 5 years ago

    Thanks!

  • @evanchugh4330 · 5 years ago

    Do you have any tips on how to handle datasets where there is a strong class imbalance (i.e. 95% class A, 5% class B)? Thanks, these videos are extremely helpful!

  • @dataschool · 5 years ago

    To handle class imbalance, you can try downsampling the majority class, upsampling the minority class, or techniques like SMOTE. Also, make sure you have chosen an appropriate evaluation metric. This video might help if you are doing classification with scikit-learn: kzread.info/dash/bejne/ammY1suGqpzag9I.html Glad you like the videos! :)
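
One of the options mentioned in this reply, upsampling the minority class, can be sketched with scikit-learn's `resample` utility. The data and the 90/10 split below are made up for illustration:

```python
import numpy as np
from sklearn.utils import resample

X = np.arange(40).reshape(20, 2)
y = np.array([0] * 18 + [1] * 2)   # 90% / 10% class imbalance

# Resample the minority class with replacement until it matches the majority.
X_min, y_min = X[y == 1], y[y == 1]
X_up, y_up = resample(X_min, y_min, n_samples=18, replace=True, random_state=0)

X_bal = np.vstack([X[y == 0], X_up])
y_bal = np.concatenate([y[y == 0], y_up])
print(np.bincount(y_bal))   # both classes now have 18 samples
```

Note that resampling should be applied only to the training split (never before the train/test split), or the duplicated minority rows will leak into the evaluation set. SMOTE, also mentioned above, is available separately in the imbalanced-learn package.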

  • @beautyandstudyworks3532 · 2 years ago

    These are different algorithms for selecting the best features, but how do you select the algorithm, and when should you use each of them? For example: if I have a multi-class classification problem where all the features are numerical and the output is categorical, which feature selection algorithm can I use?

  • @dataschool · 2 years ago

    It depends on what library you are using. For scikit-learn, see here: scikit-learn.org/stable/modules/feature_selection.html Hope that helps!

  • @fet3595 · 3 years ago

    1:25 "Now, why do you want to perform feature selection in the first place?" The reason you do feature selection is that removing irrelevant features results in: (1) a better-performing model, (2) an easier-to-understand model, and (3) a model that runs faster. "So those are the three reasons for which feature selection is useful."

  • @fet3595 · 3 years ago

    I'm glad you like it, thanks.

  • @dataschool · 3 years ago

    Thanks for pulling out this quote!

  • @tonyhathuc · 3 years ago

    Hi, is the presentation available?

  • @dilipgawade9686 · 5 years ago

    Hey Kevin, thanks for your videos - they are extremely helpful. I have some knowledge of Python and Tableau and would like to switch my career to machine learning. I have been watching many videos on machine learning but am confused about where to start. Please guide me on how to learn it stepwise. Thanks

  • @dataschool · 5 years ago

    This might be helpful to you: www.dataschool.io/launch-your-data-science-career-with-python/

  • @nikhilkenvetil1594 · 5 years ago

    So does that mean we *may* do this on every dataset, or is it imperative that we do all of this on all datasets?

  • @dataschool · 5 years ago

    You should do it when it's useful, but no, you don't need to do it on every dataset.

  • @amrdel2730 · 5 years ago

    I am a PhD student from Algeria, and I'd like to thank you for your helpful videos and the effort you put into making them. Could you please show us an example of how to build, train, and test an AdaBoost classifier in scikit-learn, like you did with KNN? And can you tell us whether we can use an SVM as a weak learner for AdaBoost, and how the weak-learner loop in the classifier computes the parameters - the error, the learner weight alpha, and the sample-weight updates? Thanks in advance, sir.

  • @dataschool · 5 years ago

    Thanks for your suggestion!

  • @martinusgrady2380 · 2 years ago

    How about LDA?

  • @bharadwajchivukula2945 · 5 years ago

    Can you please explain one-hot encoding of various features in detail? It would be helpful for many. Thank you!

  • @dataschool · 5 years ago

    Thanks for your suggestion!

  • @ninjawarrior_1602 · 4 years ago

    Hi, can we use feature selection for an unsupervised clustering problem, where there is no target variable? Please let me know; I would be highly thankful.

  • @dataschool · 4 years ago

    I'm not sure, sorry!

  • @ninjawarrior_1602 · 4 years ago

    @@dataschool I completed a project on this, and the most useful things for feature selection in such scenarios are two parameters: the variance of each feature and the number of zeroes in each column.

  • @monuvishwakarma8133 · 5 years ago

    Sir, can you make a video on data visualization using all the distributions of statistics?

  • @dataschool · 5 years ago

    Thanks for your suggestion!

  • @vijjuu0 · 4 years ago

    Hi, can you please let me know how to start a data science project on bike sharing, in detail, step by step?

  • @dataschool · 4 years ago

    Sorry, I won't be able to help, good luck!

  • @edmkiller9117 · 3 years ago

    Best one :))

  • @spartanghost_17 · 2 years ago

    Why would you skip PCA?

  • @lydiaaidyl3328 · 5 years ago

    I am trying to learn machine learning on my own, so I can't quite follow the steps you take. Based on what you said about choosing features: if one wants to eliminate features using forward selection, should one know beforehand which algorithm one is going to use, and run forward selection with that specific algorithm? Or should one do forward selection using logistic/linear regression and then, having found the significant variables, choose an algorithm (e.g. decision trees, KNN, ...)? Thanks in advance.

  • @dataschool · 5 years ago

    Great question! The former is usually a better plan.

  • @lydiaaidyl3328 · 5 years ago

    @@dataschool Thanks so much for answering my question. Can I please ask something more? If I go with the former plan, how am I going to choose which algorithm I want? I've seen people advising to test all algorithms and see which performs best. Are you advising to test all algorithms with a full model containing all features, then choose the algorithm, and then eliminate features - or something else? Sorry, I am a beginner and I don't know if I am asking something straightforward that everyone has already figured out.

  • @dataschool · 5 years ago

    No, everyone has definitely not figured this out :) You are asking a great question, but this is not a solved problem. This might be helpful to you: www.dataschool.io/comparing-supervised-learning-algorithms/

  • @lydiaaidyl3328 · 5 years ago

    @@dataschool Thank you, I love the table you made. I think I am starting to understand this a bit more.

  • @dataschool · 5 years ago

    Great to hear!

  • @jaxayprajapati5597 · 4 years ago

    Can you share this presentation (PPT) for my personal use? Please, sir.

  • @tanveerahmedsiddiqi3447 · 4 months ago

    Please demonstrate feature selection techniques in Python or in MATLAB.

  • @rohitchandanshiv6295 · 4 years ago

    Hi, I have a dataset for multiclass classification where most of the feature columns contain negative values and values in exponential notation.

  • @rohitchandanshiv6295 · 4 years ago

    How should I deal with them?

  • @dataschool · 4 years ago

    Sorry, I won't be able to help... good luck!

  • @EdgeTechAcademy · 9 months ago

    Great

  • @dataschool · 7 months ago

    Thanks!

  • @VeynVerse · 5 years ago

    Hey, I don't quite get this part: "Tree-based feature selection is only useful if that is the model you're using - or you could theoretically use a tree-based model to look at feature importance, and then not actually use a tree-based model for the model you're building." Why is that? I'd think that because those features are important (according to the tree), we could build a great model using a tree-based algorithm. Or maybe I am missing something?

  • @dataschool · 5 years ago

    The point is this: you can use a tree-based model to determine feature importance, and those features are important regardless of which model you decide to use. Hope that helps!

  • @syedhamzajamil4490 · 4 years ago

    Sir, I have learned a lot about data science from your videos, but I have some doubts and I hope you can help clear them up. Q1: What is the difference between multicollinearity and PCA? Q2: Are multicollinearity and PCA the same thing? Q3: Is multicollinearity only relevant for regression models? Q4: Why don't we check for multicollinearity in classification models?

  • @dataschool · 4 years ago

    Sorry, I can't summarize any of these topics in a KZread comment. But they are great questions!

  • @ElectronicsInside · 5 years ago

    How do you work with Plotly and Cufflinks in Visual Studio Code?

  • @dataschool · 5 years ago

    I have no idea, sorry!

  • @ElectronicsInside · 5 years ago

    @@dataschool Can you please make videos on decision trees, random forests, SVMs, recommender systems, and PCA?

  • @dataschool · 5 years ago

    Thanks for your suggestion!

  • @beautyisinmind2163 · 2 years ago

    It would be even more awesome if you had done the coding part too.

  • @chanellioos · 2 years ago

    Kevin is a G

  • @manishsharma2211 · 4 years ago

    There's an Indian everywhere. Vishal Patel is Indian 🤩

  • @tejas8211 · 3 years ago

    Saw you on Krish Naik's channel as well

  • @manishsharma2211 · 3 years ago

    @@tejas8211 Yo, thanks mate 😀😀

  • @skn180 · 5 years ago

    Another way would be automated backward elimination with a loop.

  • @dataschool · 5 years ago

    That's right - backward selection is another option. Thanks for sharing!
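
The loop-based backward elimination suggested here can be written in a few lines: repeatedly drop whichever remaining feature hurts the cross-validated score least. The dataset, model, and stopping point (3 features) below are illustrative assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=200, n_features=8, n_informative=3,
                           random_state=0)

kept = list(range(X.shape[1]))
while len(kept) > 3:
    # score the candidate subsets with each remaining feature removed
    scores = {f: cross_val_score(LogisticRegression(max_iter=1000),
                                 X[:, [k for k in kept if k != f]],
                                 y, cv=5).mean()
              for f in kept}
    # drop the feature whose removal leaves the best score
    kept.remove(max(scores, key=scores.get))

print(sorted(kept))   # indices of the surviving features
```

This is essentially what mlxtend's `SequentialFeatureSelector` (linked in the description) automates, with `forward=False` for the backward direction.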

  • @TheAlderFalder · 5 years ago

    I'm the first! That's why I'm going to get rich before all of you!!! Except Kev, maybe.

  • @dataschool · 5 years ago

    Ha! :)

  • @TheAlderFalder · 5 years ago

    I'm Jakob from LinkedIn, btw. ;)

  • @dataschool · 5 years ago

    Ah! Nice to see you :)

  • @gabiie9839 · 5 years ago

    The XGBoost model automatically calculates feature importance.

  • @dataschool · 5 years ago

    Great point! That makes sense, since it uses an ensemble of decision trees.

  • @edmkiller9117 · 3 years ago

    I am a data scientist and I was having an issue for some days, but now it's all fixed.
