13.3.2 Decision Trees & Random Forest Feature Importance (L13: Feature Selection)
Science & Technology
This video explains how decision tree training can be regarded as an embedded method for feature selection. Then, we also look at random forest feature importance and go over two different ways it is computed: (a) impurity-based and (b) permutation-based.
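As a minimal sketch of the two variants mentioned above (not code from the video; it assumes scikit-learn's `feature_importances_` attribute and `permutation_importance` function, and uses the Wine dataset for illustration):

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=123, stratify=y)

forest = RandomForestClassifier(n_estimators=100, random_state=123)
forest.fit(X_train, y_train)

# (a) impurity-based: mean decrease in impurity,
#     accumulated over the training set during tree construction
mdi = forest.feature_importances_

# (b) permutation-based: drop in held-out accuracy when
#     a single feature column is randomly shuffled
perm = permutation_importance(
    forest, X_test, y_test, n_repeats=10, random_state=123)

print("Impurity-based:   ", mdi)
print("Permutation-based:", perm.importances_mean)
```

Note that the impurity-based scores are computed from the training data, while the permutation-based scores here are measured on the test set, which is one reason the two rankings can disagree.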
Slide link: sebastianraschka.com/pdf/lect...
Code link: github.com/rasbt/stat451-mach...
-------
This video is part of my Introduction to Machine Learning course.
Next video: • 13.4.1 Recursive Featu...
The complete playlist: • Intro to Machine Learn...
A handy overview page with links to the materials: sebastianraschka.com/blog/202...
-------
If you want to be notified about future videos, please consider subscribing to my channel: / sebastianraschka
Comments: 19
Even though this video is pretty old, it helped me. The way you explain things is easy to understand; subscribed already 😊
@SebastianRaschka
3 months ago
Nice. Glad it's still useful after all these years!
Preparing for an interview and this is a great refresher. Thanks!
@SebastianRaschka
2 years ago
Glad to hear! And best of luck with the interview!!
Thank you so much!!! Your videos are super great..
Very nicely explained. So informative. Thank you for making this video.
@SebastianRaschka
2 years ago
Thanks for the kind comment, glad to hear!
Glad you added Lecture 13 after the fact for those who are interested. Also, do you have a list of the equipment you use for video recording? The new tablet setup looks great.
great
Really good video, you should have more subscribers! Greetings from Germany
@SebastianRaschka
2 years ago
Glad to hear you liked it!
Really great!
@Sebastian Can DTs and RFs also be used to select features for regression models?
Which feature-selection method is best if a dataset has many categorical variables? I have a dataset that comprises both continuous and categorical variables. What should the approach be in this case?
Hi, when I use random forest, decision tree, and XGBoost with RFE and look at feature_importances_, they return completely different orders even though all of them are tree-based models. My dataset has 13 columns; with XGBoost, one feature's importance rank is 1, but the same feature ranks 10 with the decision tree and 7 with the random forest. How can I tell which feature is better than the others in general? If a feature is more predictive than the others, shouldn't it have the same rank across all tree-based models? I am so confused about this. It's the same with SequentialFeatureSelection.
Professor Raschka, please allow me to ask: can statistical test procedures be applied to feature importance values (such as Gini impurity)? As in the slide at 13:04, can we compare the values statistically if we obtain the mean and confidence interval of each feature importance (Proline, Flavanoids, etc.) from cross-validation instead of the train-test split (using the Friedman test, a t-test, or Wilcoxon)? I don't see any statistical restriction against applying such tests to feature importance coefficients, since they are numeric in nature, but I am afraid I have missed something, because I have never encountered a paper that statistically tests feature coefficients. An expert opinion such as yours would settle this for me. Thank you, Professor.
I was using a correlation heatmap, p-values, and information gain for feature selection. The values were pretty similar, but when I used the resulting features with all the algorithms I was trying, the accuracy decreased. Then I tried random forest feature importance, and the result from RF feature importance improved my accuracy. Please help me understand why.
@SebastianRaschka
1 year ago
I think it may depend on what your downstream model looks like. The correlation method may work better for generalized linear models than for tree-based methods, because tree-based methods have feature selection built in already.
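As a hedged illustration of the embedded-selection point in the reply above (not from the video; it assumes scikit-learn's `SelectFromModel` meta-transformer, with the Wine dataset and a `"median"` threshold chosen purely for illustration):

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

X, y = load_wine(return_X_y=True)

# Use RF impurity-based importance as an embedded selector:
# keep only features whose importance is at or above the median.
selector = SelectFromModel(
    RandomForestClassifier(n_estimators=100, random_state=123),
    threshold="median")
X_reduced = selector.fit_transform(X, y)

print(X.shape, "->", X_reduced.shape)
```

Because the forest itself already exploits the most informative features during training, a separate correlation-based pre-selection step often adds little for tree-based downstream models.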
❤