Forecastegy

I am a Machine Learning (ML) Expert & Data Scientist with 7+ years of experience helping companies globally.

Kaggle Grandmaster who achieved multiple 1st place finishes in global Kaggle competitions, and top global rank at 12th of 50,000+

Content Consultant for Applied Data Science with Venture Applications: Data-X (INDENG 135/235) at University of California, Berkeley

Site: forecastegy.com
Follow me on Twitter: twitter.com/mariofilhoml
Kaggle profile: www.kaggle.com/mariofilho
LinkedIn: linkedin.com/in/mariofilho/

🤖 Building machine learning systems since 2014
🏆 2x Prize Winner Kaggle Competitions Grandmaster
📊 Former Lead Data Scientist @ Upwork
🎓 @UCBerkeley Data-X Consultant

Comments

  • @mamyrak1114
    20 days ago

    Can I follow the same process if, instead of a week number, I have a date like yyyy-mm-dd? And how should I handle the year?

  • @bennyadrianmartinez
    23 days ago

    Thank you. You covered so much in so little time, compared with what TWO different bootcamp instructors managed in far more time...

  • @yuvrajchauhan9874
    3 months ago

    00:01 Learn feature engineering for high performance models
    02:00 Aggregation is essential for extracting useful information from tables and can be compared to the group-by function in various programming languages.
    03:56 Feature engineering involves creating customer-specific features to predict fraud in transactions.
    06:01 Feature Engineering is all about aggregation and encoding for capturing patterns and anomalies.
    08:00 Feature engineering techniques like lag, difference, rolling, and date components are significant for analyzing time series data.
    09:55 Seasonal patterns and time differences for feature engineering
    11:55 Reverse engineer feature computation from Kaggle solutions
    13:57 Feature engineering can be applied universally in tabular data for extracting features from multiple tables.
    15:47 Feature engineering techniques used in data processing
    17:41 Utilizing feature engineering to create indicators for bot usage from IP data.
    19:22 Geolocation and network features are key for advanced feature engineering.
    21:03 Graph features are important for model prediction.

  • @jackcarter97
    4 months ago

    How do I find the season effect features?

  • @chungrandy780
    5 months ago

    🎯 Key Takeaways for quick navigation:

    00:00 📊 *Understanding Feature Engineering for Tabular Data*
    - Feature engineering is essential for high-performance machine learning models.
    - The key to feature engineering is aggregation, which involves grouping and summarizing data.
    - Aggregations can be applied to various types of data, including categorical and numerical variables.

    06:22 🔄 *Common Feature Engineering Techniques*
    - Feature engineering techniques include lag, difference, rolling, date components, and time differences.
    - Lag captures the previous value of a variable in a sequence.
    - Difference calculates the difference between consecutive values in a sequence.
    - Rolling involves computing aggregations over a rolling window of data.
    - Date components extract information like month or day from dates for seasonality patterns.
    - Time differences measure the time elapsed between events.

    15:21 🧩 *Reverse Engineering Features from a Kaggle Solution*
    - Analyzing features from a Kaggle competition example.
    - Median time between bids can be computed by grouping by user and calculating time differences between bids.
    - Mean number of bids per auction is determined by grouping by user and auction, then counting bid occurrences.
    - Detecting IP addresses used by both users and bots involves complex filtering and merging based on IP data.

    21:05 🌐 *Advanced Feature Engineering*
    - Geolocation features can be important, calculating distances between locations, and spatial data aggregations.
    - Network or graph features involve representing data as graphs and computing graph-related metrics.
    - Suggests exploring the Instacart competition for advanced feature engineering with multiple tables.

    22:16 📺 *Conclusion and Next Steps*
    - Encourages viewers to like, subscribe, and leave comments.
    - Offers a link to a time series forecasting workshop for further learning.

    Made with HARPA AI
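The lag, difference, and rolling features listed in the summary above can be sketched in plain Python. This is a minimal illustration, not the code from the video: the row keys `product`, `week`, and `sales`, the function name `add_group_features`, and the toy numbers are all assumptions made for the example. Features are computed per product and only from earlier weeks, so one product's history never leaks into another's and the current week's sales are never used to describe themselves.

```python
from collections import defaultdict, deque


def add_group_features(rows, window=3):
    """Add lag, difference, and rolling-mean features per product.

    `rows` is a list of dicts with the hypothetical keys
    "product", "week", and "sales". Only past values are used.
    """
    # One bounded history of past sales per product.
    history = defaultdict(lambda: deque(maxlen=window))
    out = []
    for row in sorted(rows, key=lambda r: (r["product"], r["week"])):
        past = history[row["product"]]
        lag_1 = past[-1] if past else None
        lag_2 = past[-2] if len(past) >= 2 else None
        # Difference between the two most recent past values.
        diff_1 = lag_1 - lag_2 if lag_2 is not None else None
        # Mean of the previous `window` values, once enough history exists.
        rolling_mean = sum(past) / len(past) if len(past) == window else None
        out.append(
            {**row, "lag_1": lag_1, "diff_1": diff_1, "rolling_mean": rolling_mean}
        )
        # Update the history only after the features are computed.
        past.append(row["sales"])
    return out


rows = [
    {"product": "P1", "week": 0, "sales": 10},
    {"product": "P1", "week": 1, "sales": 12},
    {"product": "P1", "week": 2, "sales": 9},
    {"product": "P1", "week": 3, "sales": 15},
    {"product": "P2", "week": 0, "sales": 100},
    {"product": "P2", "week": 1, "sales": 90},
]
features = add_group_features(rows, window=3)
for f in features:
    print(f)
```

Because the history is keyed by product, the first row of `P2` gets no lag value instead of inheriting the last `P1` value. The same idea carries over to a dataframe library by grouping on the product column before shifting or rolling.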

  • @tom199520000
    6 months ago

    I just checked this amazing video after your feature selection engineering video! I have no idea why this video isn’t popular!!! Respect the effort you spent on this!

  • @tom199520000
    6 months ago

    I am in a Kaggle competition. Learnt a lot from this video!! Thank you so much for uploading this video for us!!

  • @pcdowling
    7 months ago

    Thank you.

  • @dy8576
    8 months ago

    Love the videos and blogs- absolute mad content, thank you very much

  • @paulkim244
    10 months ago

    Fantastic video, so many useful references, I'm glad I watched the entire thing!

  • @VG-yw2mp
    11 months ago

    Why don't we use product_code as one of the features while training?

  • @Gabriel-iw3hc
    11 months ago

    How do I forecast the future with this method? For example, how would I forecast week 52? I think I would also need to forecast the other series that are used as features.

  • @ElChe-Ko
    1 year ago

    Nice! It would be interesting to see what to do if the time series have different lengths.

  • @Septumsempra8818
    1 year ago

    Are we going to get a video on cross-validation and selecting the right model? Your time series videos have been a wealth of knowledge.

  • @user-fh7gb2yf5z
    1 year ago

    Mario, good afternoon. Do you have any tips for using an LSTM for multi-step-ahead predictions in a MISO system?

  • @zulhas9
    1 year ago

    Hi Mario, thanks for the wonderful presentation. One question: how could you use the "Sales" feature to predict sales? When you predict with the .predict function, you have to pass that feature as an argument. In reality, you would not have that information available.

  • @chengeeri
    1 year ago

    Good One!!!!! Expecting more from You!!!!!!

  • @ThePaintingpeter
    1 year ago

    I just found your video and it's great. The reference to FeatureTools was frustrating to say the least. The documentation on the site is not working and the github repo also has examples that just don't work. It's too bad

  • @dimka11ggg
    1 year ago

    Try different versions; the examples were probably written for older versions.

  • @stonesupermaster
    1 year ago

    Hello Mario, I have a question... how does the model know that we're trying to predict multiple products at once? I've been trying to train a model to predict the sales of 2000 SKUs, and my main concern now is how to do it efficiently. I watched everything that you did but I still have the same problem. Do you know where I can find an example of it? Thank you very much for your video.

  • @therussiankid7296
    1 year ago

    #getthistrending

  • @Mohammad-vr9dj
    1 year ago

    Thanks for the useful video. Sorry, is it possible to model independent spatial sequences simultaneously? I have a dataset which consists of 1000 independent spatial sequences with dimension 2*7 (2 for x and y, and length 7 for the positions at each time). I implemented it with Simple RNN, LSTM and GRU. Can I do it with transformers (attention mechanism)? Could you point me to a practical example?

  • @marmadukewynn9826
    1 year ago

    🤘 ρгό𝔪σŞm

  • @MrGhustavo22
    1 year ago

    give more, please!

  • @SuperHddf
    1 year ago

    Thank you! :) ♥

  • @gregoryoliveira8358
    1 year ago

    I used this on my last project. It is very important to read the library documentation and find these parameters for unbalanced data.

  • @garcialn
    1 year ago

    Hi, Mario. Big fan of yours from DataHackers here! Do you know if the same applies to imbalanced data sets for anomaly detection, such as default prediction or fraud detection problems? Usually the imbalance doesn't come from sampling; it comes from the nature of those problems... I don't know if it would end up creating bias or data leakage because of it? Do you know of better techniques for these kinds of problems?

  • @Forecastegy
    1 year ago

    Hi Lucas, you can use it for anomaly detection. This is just a way of telling the model to pay more attention to the less frequent examples. Just remember to calibrate your predictions if you need probabilities instead of just a ranking score.
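The reply above describes weighting as a way to make the model pay more attention to the less frequent examples. The thread does not say which parameter the video uses, so the following is only a rough sketch of two common heuristics: the `n_samples / (n_classes * class_count)` rule that scikit-learn applies for `class_weight="balanced"`, and the negatives-to-positives ratio that is often suggested as a starting value for XGBoost's `scale_pos_weight`. The function names and the toy label list are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * k) for cls, k in counts.items()}


def negative_to_positive_ratio(labels, positive=1):
    """Ratio of negative to positive examples in a binary problem."""
    n_positive = sum(1 for y in labels if y == positive)
    return (len(labels) - n_positive) / n_positive


# Toy labels: 95 normal transactions and 5 fraudulent ones.
labels = [0] * 95 + [1] * 5
weights = balanced_class_weights(labels)
ratio = negative_to_positive_ratio(labels)
print(weights)
print(ratio)
```

With either heuristic the rare class gets a much larger weight, which shifts the predicted scores upward for that class. That is why the reply recommends calibrating the predictions when actual probabilities are needed rather than just a ranking.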

  • @anwarhermuche
    1 year ago

    Very clear explanation! Thank you for the video Mario

  • @Kevin-fp6gk
    1 year ago

    Loved the way you presented.

  • @RicardoZibordi
    1 year ago

    Clear, objective and very practical - congratulations!

  • @sekiro_19
    1 year ago

    Thank you so much man crazy good explanation

  • @XiboquinhaMilGrau
    1 year ago

    I wasn't expecting that one, haha

  • @gauravmalik3911
    1 year ago

    It would be great if you could also show a demo. Thank you for the information.

  • @snk2288
    1 year ago

    Difference between time features would lead to negative values. Do we take min max scaler after that?

  • @ozan4702
    1 year ago

    You would want to compute the difference so that the earlier time is subtracted from the later one, so it's never negative.

  • @darkchoco7407
    1 year ago

    No problem having negative values as features, at all

  • @Mohammad-vr9dj
    1 year ago

    Thanks for your useful video. Sorry, if our dataset has two target columns, how can we write the code?

  • @Learner_123
    1 year ago

    Thank you for making the topic simple. Since you have combined all the product sales to train and validate your model, How can one use this model to predict sales for 'any single' product only?

  • @zabmaz10
    1 year ago

    I have the same question, but I guess one way is to convert the product code into dummy variables and use those as features in the random forest.

  • @winniethepooh4891
    1 year ago

    This channel is a hidden gem !!!

  • @kaianchan7768
    1 year ago

    Thanks for this tutorial. Will you provide some videos about many features? Thanks!

  • @faraza5161
    1 year ago

    The Simple Imputer will impute mean values for the entire column in the missing values. Shouldn't that be done product wise as well? Thanks for a wonderful lecture btw :-)

  • @favourifunanya6108
    1 year ago

    Incredible sir

  • @Orlandobelli
    2 years ago

    Good video. Can we model multiple time series with an ARIMA model?

  • @diegosccp09
    2 years ago

    You are a legend. I'm using this for a master's assessment.

  • @SimonVictor
    2 years ago

    Thank you so much! This has been very helpful in getting me to think differently about feature engineering

  • @Forecastegy
    2 years ago

    Glad it was helpful!

  • @ozan4702
    2 years ago

    Why should the difference be a feature? Given sales and lagged sales, the difference is already known.

  • @alirezajabbari2537
    2 years ago

    Thank you Mario! You saved me in my 4th year project ciao

  • @Forecastegy
    2 years ago

    Glad to hear that!

  • @sancarlitos1125
    2 years ago

    Excellent explanation! Thanks for sharing it! I was working on a similar forecast, and I was wondering: when the product number changes, let's say from 0 to 1, should the rolling window and the lag be modified? Otherwise we would be using information from the previous product. Thank you very much!

  • @Dragnar21
    2 years ago

    First of all, thank you for that video and that extraordinary explanation. I would like to know how you would structure your data if the series are not all the same length?

  • @towhidultonmoy3046
    2 years ago

    Keep it up! You have a long way to go brother. Best wishes!

  • @nehan.2199
    2 years ago

    This is very helpful thank you! Where can I find the dataset to download?

  • @Forecastegy
    2 years ago

    Great, here it is: archive.ics.uci.edu/ml/datasets/Sales_Transactions_Dataset_Weekly

  • @aacharyadhruvi8301
    2 years ago

    Where can I get Sales_Transactions_Dataset_Weekly.csv?

  • @Forecastegy
    2 years ago

    Here archive.ics.uci.edu/ml/datasets/Sales_Transactions_Dataset_Weekly