13.3.2 Decision Trees & Random Forest Feature Importance (L13: Feature Selection)
Science & Technology
This video explains how decision tree training can be regarded as an embedded method for feature selection. Then, we also look at random forest feature importance and go over two different ways it is computed: (a) impurity-based and (b) permutation-based.
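As a minimal sketch of the two variants mentioned above (not code from the video; it assumes scikit-learn's `feature_importances_` attribute and `permutation_importance` function, and uses the Wine dataset for illustration):

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=123, stratify=y)

forest = RandomForestClassifier(n_estimators=100, random_state=123)
forest.fit(X_train, y_train)

# (a) impurity-based: mean decrease in impurity,
#     accumulated over the training set during tree construction
mdi = forest.feature_importances_

# (b) permutation-based: drop in held-out accuracy when
#     a single feature column is randomly shuffled
perm = permutation_importance(
    forest, X_test, y_test, n_repeats=10, random_state=123)

print("Impurity-based:   ", mdi)
print("Permutation-based:", perm.importances_mean)
```

Note that the impurity-based scores are computed from the training data, while the permutation-based scores here are measured on the test set, which is one reason the two rankings can disagree.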
Slide link: sebastianraschka.com/pdf/lect...
Code link: github.com/rasbt/stat451-mach...
-------
This video is part of my Introduction to Machine Learning course.
Next video: • 13.4.1 Recursive Featu...
The complete playlist: • Intro to Machine Learn...
A handy overview page with links to the materials: sebastianraschka.com/blog/202...
-------
If you want to be notified about future videos, please consider subscribing to my channel: / sebastianraschka
Comments: 19
Even though this video is pretty old, it helped me. The way you explain things is easy to understand; subscribed already 😊
@SebastianRaschka
3 months ago
Nice. Glad it's still useful after all these years!
Preparing for an interview and this is a great refresher. Thanks!
@SebastianRaschka
2 years ago
Glad to hear! And best of luck with the interview!!
Thank you so much!!! Your videos are super great..
Very nicely explained. So informative. Thank you for making this video.
@SebastianRaschka
2 years ago
Thanks for the kind comment, glad to hear!
Glad you added Lecture 13 after the fact for those who are interested. Also, do you have a list of the equipment you use for video recording? The new tablet setup looks great.
great
Really good video, you should have more subscribers! Greetings from Germany
@SebastianRaschka
2 years ago
Glad to hear you liked it!
Really great!
@Sebastian Can DTs and RFs also be used to select features for regression models?
Which feature-selection method is best if a dataset has many categorical variables? I have a dataset that comprises both continuous and categorical variables. What should the approach be in this case?
Hi, when I use random forest, decision tree, and XGBoost with RFE and look at feature_importances_, they return completely different orders even though all of them are tree-based models. My dataset has 13 columns; with XGBoost, one feature's importance rank is 1, but the same feature ranks 10 with the decision tree and 7 with the random forest. How can I tell which feature is better than the others in general? If a feature is more predictive than the others, shouldn't it have the same rank across all tree-based models? I am so confused about this. It's the same with SequentialFeatureSelection.
Professor Raschka, please allow me to ask: can statistical test procedures be applied to feature importance values (such as Gini impurity)? As in the slide at 13:04, can we compare the values statistically if we obtain the mean and confidence interval of each feature importance (Proline, Flavanoids, etc.) from cross-validation instead of the train-test split (using the Friedman test, a t-test, or Wilcoxon)? I don't see any statistical restriction against applying such tests to feature importance coefficients, since they are numeric in nature, but I am afraid I have missed something, because I have never encountered a paper that statistically tests feature coefficients. An expert opinion such as yours would settle this for me. Thank you, Professor.
I was using a correlation heatmap, p-values, and information gain for feature selection. The values were pretty similar, but when I used the resulting features with all the algorithms I was trying, the accuracy decreased. Then I tried random forest feature importance, and the result from RF feature importance improved my accuracy. Please help me understand why.
@SebastianRaschka
1 year ago
I think it may depend on what your downstream model looks like. The correlation method may work better for generalized linear models than for tree-based methods, because tree-based methods have feature selection built in already.
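As a hedged illustration of the embedded-selection point in the reply above (not from the video; it assumes scikit-learn's `SelectFromModel` meta-transformer, with the Wine dataset and a `"median"` threshold chosen purely for illustration):

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

X, y = load_wine(return_X_y=True)

# Use RF impurity-based importance as an embedded selector:
# keep only features whose importance is at or above the median.
selector = SelectFromModel(
    RandomForestClassifier(n_estimators=100, random_state=123),
    threshold="median")
X_reduced = selector.fit_transform(X, y)

print(X.shape, "->", X_reduced.shape)
```

Because the forest itself already exploits the most informative features during training, a separate correlation-based pre-selection step often adds little for tree-based downstream models.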
❤