Predict Football Match Winners With Machine Learning And Python

In this video, we'll use machine learning to predict who will win football matches in the EPL.
We'll start by cleaning the EPL match data we scraped in the last video (Web Scraping Football Matches From The EPL With Python [part 1 of 2]). Don't worry if you missed the last video - you'll still be able to download the data.
We'll create predictors and train a machine learning model to predict the winner of each of the football matches.
Then we'll end by measuring error and making improvements.
You can find the data and code here - github.com/dataquestio/projec...
Chapters
00:00 Introduction
00:59 Reading match data into pandas dataframe
02:58 Investigating missing data
05:55 Cleaning our data for machine learning
08:05 Creating predictors for machine learning
14:00 Creating our initial machine learning model
22:34 Improving precision with rolling averages
31:07 Retraining our machine learning model
34:08 Combining home and away predictions
42:12 Recap and next steps
#Dataquest #Tutorial #DataScience #MachineLearning #WebScraping #Python

Comments: 206

  • @vikasparuchuri
    @vikasparuchuri 1 year ago

    Hi everyone! You can find the data and code for this tutorial here - github.com/dataquestio/project-walkthroughs/tree/master/football_matches .

  • @user-jk8dx4me8x
    @user-jk8dx4me8x 8 months ago

    I just started learning Python and machine learning. I started learning from your tutorials, and they are making me better at data science day by day. Keep it up. You are the best online teacher.

  • @chasingwildlife6584
    @chasingwildlife6584 1 year ago

    Great Video Vik. Love the work. Thanks for giving us this great resource. Now time to find the rest of the data.

  • @stephenwood6139
    @stephenwood6139 1 year ago

    This is by far the best and most practical video on football predictions I've seen online, very well explained and actually leaves you with something useful afterward. Great work!

  • @stephenwood6139

    @stephenwood6139

    1 year ago

    I managed to resolve this :)

  • @titrecords2294
    @titrecords2294 1 year ago

    Been learning ML on provided data ever since, thank you sir for teaching me in the last tutorial how to curate my own data. 🙏

  • @user-ws7ky2mk8l
    @user-ws7ky2mk8l 1 year ago

    The video content shared by this author is very good, and it provides a lot of reference directions for predicting stocks. Thank you so much.

  • @knotty2348
    @knotty2348 1 year ago

    You are a hero. Had this project in mind for years. You saved me some hundreds of hours of research and learning :) Thanks a lot!

  • @emmanuelteitelbaum
    @emmanuelteitelbaum 2 years ago

    I like that as the founder of Dataquest, you yourself are providing the tutorial (as opposed to hiring someone). Also, thanks for offering the free access to educators and students.

  • @Dataquestio

    @Dataquestio

    2 years ago

    Thanks, Emmanuel! -Vik

  • @ifkica1822

    @ifkica1822

    1 year ago

    @@Dataquestio sorry, I just joined Dataquest. can you please tell me if the free option for students is still available?

  • @user-yw3zn7lf4s
    @user-yw3zn7lf4s 2 years ago

    Bro. Literally learnt to play with data in just 2 videos. Thanks.

  • @rodi21
    @rodi21 1 year ago

    Amazing, Vic! I'm following you! Great job and explanation!

  • @sureshmakwana8709
    @sureshmakwana8709 1 year ago

    You saved my this semester's Machine Learning mini Project ❤️❤️

  • @chigstardan7285
    @chigstardan7285 2 years ago

    This video came at the right time. I was trying to figure out how to get rolling averages for a dataframe, especially that part with the 'left' argument. Thanks so much.

  • @Dataquestio

    @Dataquestio

    2 years ago

    Glad it helped! -Vik

  • @aravindgpandey
    @aravindgpandey 1 year ago

    Very nice explanation. This is what I was looking for so long. Thanks much

  • @avibm948
    @avibm948 11 months ago

    Nice video Vic, I've learned a lot from your videos recently. My only criticism is that some viewers may feel they can generate positive returns based on a predicted probability higher than 50 or 60 percent. It would be better to predict the probability of winning, because the betting reward is based on probability. So assuming we predict that a team wins with 70 percent probability and the odds reward is less than 7/10, we are going to lose on average, even though our model was right. The reason the model is able to predict with a probability higher than 50 percent is that some teams are better than others, and the betting odds reflect it. One can scrape the odds as well and do the analysis, but I believe the betting companies already use AI to predict the initial odds. There will be opportunities when the odds differ substantially from a good predictive model.

  • @goober-ll1wx

    @goober-ll1wx

    9 months ago

    yeah its basically a massive nothing burger, you'll still lose money and if by some miracle you can model it well, then your bookie will back you off before you make any money!

  • @InvestorLondon
    @InvestorLondon 1 year ago

    Amazing video! You're really helping me through my ML journey!

  • @alemassa6632
    @alemassa6632 10 months ago

    Wonderful, I literally have understood nothing but.... wonderful!

  • @zuzekavova4651
    @zuzekavova4651 10 months ago

    i hope you dont stop making these videos

  • @mhch77
    @mhch77 1 year ago

    Hey Vick, Great Video! Wanted to ask how would I go about making predictions for a single match?

  • @kenneth_wu
    @kenneth_wu 1 year ago

    Great video. Thanks for sharing. I think I am going to have a try.

  • @StartupPickMeUps
    @StartupPickMeUps 1 year ago

    This is so good! It would be good to see a video on exactly how to feed in future fixtures as I'm unclear on how this is achievable :D

  • @Dataquestio

    @Dataquestio

    1 year ago

    Hi Liam - thanks for the suggestion. What you need to do is pass in future data to the predict methods, the same way we're passing in the test set now. I can look into making a video.

  • @StartupPickMeUps

    @StartupPickMeUps

    1 year ago

    @@Dataquestio after asking this question, I actually gave it a go myself, but unless I add future data to my test data, I'm unsure how to do it, and the accuracy is way off for me :D

  • @pain-nw5lo

    @pain-nw5lo

    1 year ago

    @@Dataquestio Yes please! Im also stuck on passing future data :c

  • @harryhaz4629
    @harryhaz4629 1 year ago

    Great video thanks. But I was wondering how do you get the model to predict the upcoming football matches. Let's say Manchester United vs Liverpool etc.

  • @samdowns4786
    @samdowns4786 1 year ago

    Hi, great video. I am just wondering how to implement this onto matches in the future, predicting who would win the game this weekend for example

  • @_craig_
    @_craig_ 2 years ago

    Hi Vic, Excellent excellent video. So many tips and tricks. Thank you. A few clarifications, 1) the value counts is 1500+ , number of matches is 20 C 2 *2(home and away) *2(year 21&22) 2) it's not temporal data until rolling averages was included 3) I'm being silly here... matches played on the 1st of Jan are not in the train/test set because you didn't use >= 4)

  • @Dataquestio

    @Dataquestio

    2 years ago

    Thanks, Craig! For 1, it should be 38 (total matches per team) * 20 (total teams) * 2 seasons. For 2 - I agree, there probably won't be any issues if you do cross validation without taking the temporal aspect into account. But the opponent code in particular can leak future data into the past. I generally prefer to treat any time-ordered data carefully. 3 - yes, you're right!
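
The row count and the date-based split discussed in this thread can be checked with a short sketch. The cutoff date `2022-01-01` and the column name `date` are assumptions taken from the discussion, and the three-row frame is invented toy data, not the real match file.

```python
import pandas as pd

# One row per team per match: 38 matches per team, 20 teams, 2 seasons.
expected_rows = 38 * 20 * 2

# Toy data (hypothetical): three match dates around the assumed cutoff.
matches = pd.DataFrame(
    {"date": pd.to_datetime(["2021-12-28", "2022-01-01", "2022-01-02"])}
)
cutoff = "2022-01-01"

# Strictly before the cutoff: the training set never sees later matches.
train = matches[matches["date"] < cutoff]

# Strictly after the cutoff: matches played on the cutoff day are dropped.
test = matches[matches["date"] > cutoff]

# Using >= instead keeps matches played on the cutoff day in the test set.
test_inclusive = matches[matches["date"] >= cutoff]

print(expected_rows, len(train), len(test), len(test_inclusive))
```

Splitting on a date rather than at random is one way to keep future information out of the training rows, which is the concern raised about time-ordered data.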

  • @russelldavis5248
    @russelldavis5248 1 year ago

    Excellent tutorial. As a C# guy, I really appreciated seeing your code for interacting with the pandas.

  • @pratiek8s
    @pratiek8s 2 years ago

    Very informative. Thank you sir.

  • @avikpal6508
    @avikpal6508 2 years ago

    I'm generally opposed to the idea of using an AI/ML model for the EPL or any sport, but the concept can definitely be reused in multiple business cases. Great job mate!

  • @ctrl-shift-run8681
    @ctrl-shift-run8681 1 year ago

    This is a very cool project! I ran it across 7 leagues and it is interesting how the same set of predictors get very different results. In England and France, it does pretty well but in Brazil and Japan, not so much.

  • @Dataquestio

    @Dataquestio

    1 year ago

    That is interesting! I wonder if there is more variance there due to transfers, less data, etc.

  • @madhuacharyya6963
    @madhuacharyya6963 1 year ago

    Hi, I have enjoyed watching your demonstration of predicting the EPL game results. However, the predicted results don't reflect the actual results. So my question is, how can I predict more accurate results, and how can I train the dataset. Looking forward to hearing your reply.

  • @ILikeNoisyGoat
    @ILikeNoisyGoat 1 year ago

    Hi! Can I make the predicted value into probability? or logistic regression? Thank you!

  • @thiagotms1
    @thiagotms1 1 year ago

    This some quality video! Thanks!

  • @chrissherman6591
    @chrissherman6591 5 months ago

    Love the video, once I finish the model how do I feed in data from new games

  • @ukaszhangiel7610
    @ukaszhangiel7610 1 year ago

    Does this model completely ignore who the opponent is?! From what I see, the features used are: a) general match features - time of the game, home/away b) rolling averages for one team As a result the program tries to predict the outcome of the game completely ignoring who the opponent is. It will come with a predictions which is purely based on general match factors, and the past performance of one team, completely ignoring the specific opponent features. I.e. for a Arsenal game it will give me the same result retrospectively if Arsenal plays the 1st or the last team in the table. Do I get it right? If so, how can it make sense?

  • @ibukunalade4286
    @ibukunalade4286 6 months ago

    I really love this work. I will try with 10 seasons and make my train 70% of the dataset and my test 30%. But I want to ask, after all is done. How do I predict specific upcoming matches. I plan on adding upcoming games I want to predict to the test part and then predicting from there.

  • @siraatmedia8348
    @siraatmedia8348 5 months ago

    What you did with the rolling averages was impressive. Is there such a thing as when a ML algo creates such features for you? I.e. it randomly multiply/dividing this by that or rolling averages or random features to create a new feature?

  • @KabirKohli-rm7xm
    @KabirKohli-rm7xm 1 year ago

    Hi, thanks for the awesome video. I had one doubt (might be stupid). The aim of the model is to predict the winner of a match between two teams (suppose team A vs team B). But for training the model on a single match result, we are only giving the stats for the home team (A). Wouldn't it make more sense to add stats for team B in the same row as well, and then ask it to make the prediction?

  • @francescoscalia3541
    @francescoscalia3541 1 year ago

    hey @Dataquest amazing content. i created the algo to predict games using your tutorial. im asking now what i have to do to make the algo do the predictions for the futures games since i noticed of course it predicted the past games. Could u tell me? thanks!

  • @pstryq224
    @pstryq224 2 years ago

    Great tutorial! Do you have any advice for future matches - what values should I add to the data in my CSV file when I want to predict the results of future matches? I mean the values that we do not know yet, such as distance, shots on target, etc. All test data in the video have these values filled in, so I wonder what to put in these "empty" columns. Thank you.

  • @Dataquestio

    @Dataquestio

    1 year ago

    Hi there - distance, shots on target, etc, are only looked at for prior matches. If you're trying to predict future matches, you would use the rolling average of those columns from previous matches (this is what the video shows).
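
A rolling average over previous matches only can be sketched as follows. The column names `date` and `gf`, the window of 3, and the values are illustrative assumptions, not the tutorial's actual data. Passing `closed` to a fixed-size window needs a reasonably recent pandas release, so the sketch also shows a `shift`-based formulation that gives the same numbers.

```python
import pandas as pd

# Toy data (hypothetical): goals scored by one team in five consecutive matches.
team = pd.DataFrame({
    "date": pd.to_datetime(
        ["2022-01-01", "2022-01-08", "2022-01-15", "2022-01-22", "2022-01-29"]
    ),
    "gf": [1, 2, 3, 4, 5],
}).sort_values("date")

# closed="left" excludes the current row, so each value averages
# the 3 matches played before it.
team["gf_rolling"] = team["gf"].rolling(3, closed="left").mean()

# Same result without `closed`: shift by one match, then take a plain
# 3-match rolling mean.
team["gf_rolling_shift"] = team["gf"].shift(1).rolling(3).mean()

# The first three matches have fewer than 3 prior games and stay NaN,
# so they are dropped before training.
usable = team.dropna(subset=["gf_rolling"])
print(team)
```

Because the current match never enters its own average, the same feature can be computed for a fixture that has not been played yet.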

  • @FlisB
    @FlisB 1 year ago

    Interesting. I was running a similar model on football matches, except that I had rolling attributes of both teams as the predictors and the class was home_win, draw, away_win. A match is included only once. However I think your approach might be better.

  • @kiss-my-axe8810

    @kiss-my-axe8810

    4 months ago

    what was your win%??

  • @Qubitmyst
    @Qubitmyst 2 years ago

    Inspiring well done ! Can you use gf and ga direct columns in your predictors with no using rolling_avarage function ? Now imagine you can get a very good algorithm for prediction after you save the model , how do you use this algorithm for the next season to predict games ?? Can you give me a clue ? For example sesson 2022 - 2023 to predict one game? thank You

  • @tomi4tv126

    @tomi4tv126

    5 months ago

    You have to use rolling averages because when you try to predict the outcome of the match (before it has started) you wont know gf and ga yet. But we know average gf and ga of last 3 games the team has played. Model can be used for new seasons, but the problem is data. You will have to gather data about games after this video. That is the tricky part, but he made also video before this one about Web scraping (getting new data direct from web). Or maybe you can find some updated data set online (maybe Kaggle). From my experience, those data sets you find online wont have more detailed statistics of game, so it would be best to web scrape the data yourself.

  • @jacobdebrone
    @jacobdebrone 7 months ago

    interesting stuff bro You just got yourself a subscriber

  • @mirror1023
    @mirror1023 1 year ago

    When creating the new columns using rolling_averages, we lost the first few games of the season when we dropped na rows. We also carried rolling averages into other seasons. How do we fix this?

  • @tomkmb4120
    @tomkmb4120 1 year ago

    What's a good way to split data for training, test if it doesn't contain something like a DateTime component?

  • @paulohss2
    @paulohss2 1 year ago

    Great content! May I just ask why you did the division at the end of the tutorial? It was 27 / 40. From where the '40' figure came from?

  • @rishavmishra5786

    @rishavmishra5786

    1 year ago

    its 27 for 1 and 13 for 0 , totaling 27+13=40. and weight of 1 in total weight of 40. 27/40
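
The division in this reply is a precision calculation, which can be written out explicitly. The counts 27 and 13 come from the reply; reading them as true and false positives among the rows predicted as wins is an interpretation, not something the thread states outright.

```python
# Counts quoted in the reply: among the rows predicted as a win,
# 27 were actual wins (1) and 13 were not (0).
true_positives = 27
false_positives = 13

# Precision = correct positive predictions / all positive predictions.
precision = true_positives / (true_positives + false_positives)
print(f"precision = {precision:.3f}")
```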

  • @sushik.8043
    @sushik.8043 6 months ago

    Where can I find a whole spreadsheet like this but for the NFL or NBA?

  • @cevikyi
    @cevikyi 2 years ago

    Hi, thanks for the great video. Why didn't you involve "team" as a predictor in each model as you've used opponent team information? Doesn't this miss the relationship between team A vs team B and so on?

  • @Dataquestio

    @Dataquestio

    2 years ago

    Hi Yigit - great question. You are welcome to try it with team and measure error. The reason I didn't use it is because using a column like that can have a tendency to overfit. Some teams have performed really well in the last few seasons, but that doesn't necessarily mean they'll perform well in the future.

  • @cevikyi

    @cevikyi

    2 years ago

    @@Dataquestio Thanks for the guidance!

  • @johnowusukonduah2305
    @johnowusukonduah2305 1 year ago

    Is it positive to add the concept of time series to model the performance behavior of teams in the epl?

  • @matilda_aaaaa
    @matilda_aaaaa 1 year ago

    Hi Excellent video and thanks for this. I want to know how I can calculate the rolling averages on sql as I’m not proficient in python

  • @uncaged3076
    @uncaged3076 3 months ago

    Is there anyway I can reference your work? I am trying to use the idea of rolling averages on a project

  • @acegameboy6232
    @acegameboy6232 1 year ago

    I just finished writing this out and for the most part it works except for this line: combined, error = make_predictions(matches_rolling, predictors + new_cols) error: ValueError: Found array with 0 sample(s) (shape=(0, 12)) while a minimum of 1 is required This line in particular is giving me trouble in both the one I hand wrote myself and copying and pasting your program. I've looked through the code and some forums but nothing seems to be wrong. I think maybe it could be a year issue in that the way to write this out has changed as time went on and that this form of writing it is old. I'm not sure what the issue is so if someone could help me out that would be great. I'm planning to use this as an American Football predicter to see if the program will be able to predict which team will win. I'm doing it primarily because of my cousin and his fondness for fantasy football. It got me a little interested in the sport but I figured I'd create a model to make things a little fun for me.

  • @jamespapworth1477
    @jamespapworth1477 5 months ago

    Why do you use RandomForest Classifier for this? Is it superior in someway for this application as compared to other Machine Learning models eg KNN, ANN etc

  • @danielgonzalez5052
    @danielgonzalez5052 1 year ago

    Hi Vikas! When doing the rolling part I'm facing an issue that says: "closed only implemented for datetimelike and offset based windows" You know what can be the problem? Thank you!

  • @danielgonzalez5052
    @danielgonzalez5052 1 year ago

    Hi Vika, amazing tutorial! I have one question, how should we treat the ties in this model? Thank you!

  • @Dataquestio

    @Dataquestio

    1 year ago

    It's up to you. You could make this a 3-class classification problem, and code loss as 0, tie as 1, win as 2. You can also do what's done in the video, and code a tie as a loss.
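
Both target encodings mentioned in this reply can be sketched in a few lines. The `result` column holding the letters W, D, and L is an assumption about how the match data is laid out, and the four rows are toy data.

```python
import pandas as pd

# Toy data (hypothetical): one result letter per match.
matches = pd.DataFrame({"result": ["W", "D", "L", "W"]})

# Binary target: a win is 1, and a tie is grouped with a loss as 0.
matches["target_binary"] = (matches["result"] == "W").astype(int)

# 3-class target, using the coding from the reply: loss 0, tie 1, win 2.
matches["target_multi"] = matches["result"].map({"L": 0, "D": 1, "W": 2})

print(matches)
```

Either column can serve as the target for a classifier; only the meaning of the predicted labels changes.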

  • @Makako_Loko
    @Makako_Loko 4 months ago

    First of all, thank you for this video. I have a doubt, how do I apply this to future matches that will happen? How do I put it in the ML?

  • @Rip_Ta4
    @Rip_Ta4 8 days ago

    I was looking to extend this, however there would be a problem extending the data. The one problem with these types of predictory models is that there are financial takeovers, financial problems, key players coming in and leaving, player injuries, etc. For example, the massive spending on the Chelsea squad, and them actually doing worse, and that is something that a AI most likely would not be able to predict.

  • @megwedgomaa7831
    @megwedgomaa7831 2 years ago

    Amazing work!!!

  • @Dataquestio

    @Dataquestio

    2 years ago

    Thank you! -Vik

  • @kevwhiteford5167
    @kevwhiteford5167 10 months ago

    Is there a quick way to add and predict up and coming matches?

  • @alessandrocerri5668
    @alessandrocerri5668 5 months ago

    HI, I have a question, everything was built without taking into consideration the matches that still have to be played so there is no real prediction of future matches but only on those already actually played, correct?

  • @Captain_Roy16
    @Captain_Roy16 3 months ago

    Can we implement something like Fixture difficulty code and predict more accurately?

  • @torezo9028
    @torezo9028 4 months ago

    Is there a recently updated data set?

  • @bn_ln
    @bn_ln 1 year ago

    this is seriously underrated content

  • @Dataquestio

    @Dataquestio

    1 year ago

    Thanks, Ben!

  • @berrauniverse
    @berrauniverse 7 months ago

    Did this using logistic regression with binary classification and achieved a 70% precision. Used different parameters for training the model though. Also had to put the sleep time to 10 seconds when scraping to avoid 429 HTTP response.

  • @cgruita

    @cgruita

    3 months ago

    Wow, 70% precision is very impressive! What did you use? XGBoost, LightGBM?

  • @johanBe75
    @johanBe75 1 year ago

    So many great Reviews, but yet just youtube!

  • @adrianfong4347
    @adrianfong4347 1 year ago

    Hi Vik! I am learning so much through this video and decided to try adopt it to NBA data too:) . I am running into an issue where I merging the combined dataframe with on left_on = game_date, team and right = game_date, opponent. However, my new merged table is blank. My theory is that despite my data having the same 3 letter abbreviations for the teams (LAL, WAS, CHI, etc) in both the team and opponent, python is saying they aren't the same and not joining the tables. They are both 'object' data types (if that matters...). Any recommendations on how I can make them identical? Thank you!

  • @Dataquestio

    @Dataquestio

    1 year ago

    Hi Adrian - do you actually have data from both sides of the match? For example, if LAL played WAS, you would need a row where WAS is the team and LAL is the opponent, and a row where LAL is the team and WAS is the opponent for the same game day. If you don't have this, you would need to create those rows (by duplicating the dataframe then swapping team and opponent) before merging.

  • @Dataquestio

    @Dataquestio

    1 year ago

    You would also need to swap points for/against, etc.
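
The duplicate-and-swap step described in these two replies might look like the sketch below. The column names (`game_date`, `team`, `opponent`, `pts_for`, `pts_against`) and the scores are hypothetical; a real dataset would need its own names substituted.

```python
import pandas as pd

# Toy data (hypothetical): one row per game, from one side only.
games = pd.DataFrame({
    "game_date": ["2023-01-01", "2023-01-02"],
    "team": ["LAL", "CHI"],
    "opponent": ["WAS", "BOS"],
    "pts_for": [110, 98],
    "pts_against": [102, 105],
})

# Mirror each row: swap team with opponent and points for with points against.
mirrored = games.rename(columns={
    "team": "opponent",
    "opponent": "team",
    "pts_for": "pts_against",
    "pts_against": "pts_for",
})

# Stack both perspectives so every game has a row for each side.
both = pd.concat([games, mirrored[games.columns]], ignore_index=True)

# Join each row to the opposing side's row for the same game.
merged = both.merge(
    both,
    left_on=["game_date", "team"],
    right_on=["game_date", "opponent"],
    suffixes=("", "_opp"),
)
print(merged[["game_date", "team", "opponent", "pts_for", "pts_for_opp"]])
```

If the merge still comes back empty after this step, comparing the key columns for stray whitespace or mismatched date types is a reasonable next check.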

  • @FlisB

    @FlisB

    1 year ago

    Did you scrape the data from basketball-reference?

  • @kevinbarnes4474
    @kevinbarnes4474 1 year ago

    This is great, using goal-scoring/concession stats more (home and away) could also help with refining accuracy.

  • @stephenbube965
    @stephenbube965 1 year ago

    I'm new to this..... I was asking how one can get the predictions from the machine learning model. I'm stuck at the combined precision stage and can't find a way of extracting future predictions. Any help will be highly appreciated.

  • @youtubeuser4878
    @youtubeuser4878 2 years ago

    Hi Vikas. Thanks for the tutorial. At the end, you mentioned we can use other data points however we can't use attendance because we only know that after the game is over. Isn't that the same for other data points like shots on target, distance, etc?

  • @Dataquestio

    @Dataquestio

    2 years ago

    Hi there - some of the data points, like whether the game is home or away, you can use data from the current game as predictors. So if you're trying to predict if Arsenal will win on 7-10-2022, you can use data about whether the game on 7-10-2022 is at home or away. For other columns, like shots on target and attendance, we don't know the data for the current match until it ends. So we instead use an average of data from past matches (before 7-10-2022).

  • @chasingwildlife6584

    @chasingwildlife6584

    1 year ago

    Yes the data points like shots can't be known in advance. We use the old data, let's say the last three games in like in the video. The attendance of previous matches has no bearing (none that we know about) on the next match. However the number of shots taken in the last three games can be an indicator of what it might be in the game we are predicting.

  • @Kiirby1x
    @Kiirby1x 5 months ago

    Hello, could someone explain to me how I could input future games for it to make a prediction?

  • @Skeeyeee613
    @Skeeyeee613 1 year ago

    Thank you very much for such wonderful content. When I try running your line 65 I'm getting an error saying mapping is not defined. Any suggestion?

  • @johanBe75

    @johanBe75

    1 year ago

    it is fake tutorials with clickbait. Just look at reviews so many of then so great isn´t it?

  • @obaidulmostafa3384
    @obaidulmostafa3384 1 year ago

    Which algorithm did you use to complete this project, Brother?

  • @mrcaljoe1
    @mrcaljoe1 1 year ago

    37:50 what does the ** before map_values do?

  • @chottomtaki
    @chottomtaki 2 years ago

    Thanks for the very interesting training, can you please provide the one relating credit scoring modeling for

  • @Dataquestio

    @Dataquestio

    2 years ago

    Thanks for the suggestion - I'll consider it for a future video.

  • @velsiu
    @velsiu 6 months ago

    how to use it to predict future matches from like today or tomorrow ?

  • @bonifaceboban368
    @bonifaceboban368 1 year ago

    i got an error like this after writing below code can you please explain how to resolve it preds = rf.predict(test[predictors]) NotFittedError: This RandomForestClassifier instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator.

  • @madebymate4870
    @madebymate4870 1 year ago

    This is a very great video, but i don't understand exactly how to predict the individual matches. what parameters and how should i put in rf.predict() if i want to have the outcome of a single match?

  • @royalzikhali5295

    @royalzikhali5295

    10 months ago

    did you ever find the answer

  • @nonsobismark1846
    @nonsobismark1846 1 year ago

    Great work... By is there any prediction sites where you update the predictions

  • @Dataquestio

    @Dataquestio

    1 year ago

    Thanks! There is no live site yet, but someone can make one with this code :)

  • @tomphillips5513
    @tomphillips5513 4 months ago

    I have seen a lot of other people ask this in the comments, but there hasn't really been a solid reply... how can you apply this to predict the results of matches that haven't occurred yet? Because this is all well and good to split the data into parts that the ML algorithm sees and does not see, but it is pretty useless when applying it to life because we already know the result of that game that occurred, even if the ML doesn't. Could someone either explain to me what I am missing, or suggest the next steps for predicting matches of which there is limited data recorded already?

  • @sakariyaqaase6773
    @sakariyaqaase6773 2 years ago

    thanks Vic, i tried to run the rolling average function but it's give me this error value ValueError: closed only implemented for datetimelike and offset based windows

  • @martincal7115

    @martincal7115

    1 year ago

    I'm having the same issue. Did you find a way to fix it? Thanks

  • @alexjamarco
    @alexjamarco 2 years ago

    Hi Vikas. Very nice tutorial. I was able to code all along and i was my first ML project. Seems awesome how the computer predicts stuff like this. I have a question: we have our training and testing datasets, right? How can we ask the algorithm to predict an event that it's not on the training data? For example, let's say I have a csv of next weekend's matches. How Can I ask the algorithm to try to predict the winner? Sorry if it seems a silly question, but I actually couldn't find a more clearer way to ask. Thanks and well done once again!

  • @Dataquestio

    @Dataquestio

    2 years ago

    Hi Alexandre - you'd basically put the information for next weekend's matches (opponent code, venue code, rolling averages, etc) into a new testing set, and then make predictions on that set.

  • @kennedyogutu4099

    @kennedyogutu4099

    2 years ago

    Feed your data into your trained model.

  • @amragl

    @amragl

    2 years ago

    @@Dataquestio Hi Vikas, would it be possible to explain it in a different way? I still don't understand it. Many thanks for your videos!!

  • @DarkCode
    @DarkCode 4 months ago

    I'm trying to predict who will win the NHL championship, their divisions, and the rest of the regular season l. I need help with this project, I will be using machine language. I'm using colab. I need help with this. Any takers? Any and all help, would help!

  • @benjaminmwangi6872
    @benjaminmwangi6872 1 year ago

    Hi, 1. Kindly suggest a roadmap for me to adequately comprehend this project. I have no experience in the field nor programming background. 2.How do I run this project in the meantime as i upscale my skills? Awsome tutorial. Got yourself a believer.

  • @Dataquestio

    @Dataquestio

    1 year ago

    I would recommend following the data scientist path at Dataquest - www.dataquest.io/path/data-scientist/ . This will help you learn all of the skills (including programming) to build this model.

  • @kavinpandian
    @kavinpandian 25 days ago

    great tutorial!

  • @andreeadumitrescu1717
    @andreeadumitrescu1717 10 days ago

    Hi! What if I have all the data in a .txt file, one column, and separated rows? How do I translate that in a dataframe? Exemple: FT Greece 3 - 0 Italy Sunday 12/04/2008 FT France 1 - 2 England ....and so on

  • @user-tg7mz2qh7s
    @user-tg7mz2qh7s 1 year ago

    Big thanks for this video! Helped me a lot! Tried this method on my project with soccer data analysis and everything went fine until this function: "def make_predictions(data, predictors):". Got KeyError: "["rolling_cols"] not in index". Any advice on solving this issue? Thanks in advance!

  • @bigtomDW

    @bigtomDW

    1 year ago

    " predictions + new_cols " seems to be my issue. having predictions by itself doesnt throw the error.

  • @amragl
    @amragl 2 years ago

    Hi!, I don't think I understand how you can use the rolling_average cols on the predict dataset, you wouldn't have that information until after you match is finished, right? so, how can those columns be used in the predict dataset? , Many thanks for your great videos and content! Well explained and very educative.

  • @Dataquestio

    @Dataquestio

    2 years ago

    Hi there - the rolling average is computed on matches prior to the current one. We don't use any knowledge of the current match. -Vik

  • @amragl

    @amragl

    2 years ago

    @@Dataquestio many thanks for taking the time to respond!! You and your learning platform are awesome 😎!!!

  • @doll0101
    @doll0101 1 year ago

    Please somebody help me to plot a graph for output!(source code) pls pls

  • @agdaltarek
    @agdaltarek 2 years ago

    hello, my question is how would you deal with predicting newly promoted teams results ? especially teams that maybe are promoted for the first time in a very long time.

  • @Dataquestio

    @Dataquestio

    2 years ago

    This is a tricky one. You could build a separate model to predict how well a team will do in the first season after promotion based on lower league results.

  • @agdaltarek

    @agdaltarek

    2 years ago

    @@Dataquestio yep maybe based on previous promoted teams, i thought about that

  • @meetupadhyay9687
    @meetupadhyay9687 1 year ago

    Hey what is train test percentage?

  • @hristolakov3563
    @hristolakov3563 1 year ago

    Why are we only looking at matches that have been played? I mean, i understand it for the learning part and the back testing, but the machine hasn't actually predicted a match, that hasn't been played, from the date of the video going forward. That would have been useful. Is it like we just have to add these upcoming matches to the matches.csv? It is what i am trying to do, but it is pretty tough for a beginner, like me. Will push harder, hopefully find a solution. Thank you for the video and the great explanations.

  • @hristolakov3563

    @hristolakov3563

    1 year ago

    When we merge the 'matches' with 'shooting', we basically get rid of all the future matches. I should probably keep the not-played matches in the list somehow with NaN values under shooting?

  • @Dataquestio

    @Dataquestio

    1 year ago

    If you want to predict future matches, you can just feed them into the prediction methods. The reason we remove the rows where matches haven't been played is because we can only use data for training if we know the outcome. But once we train a model, you can feed that data in to get future predictions (the same way we feed in the test set).
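
Feeding future matches into a trained model could be sketched like this. Everything here is a toy stand-in: the two teams, the `venue_code` and `gf` columns, the 3-match window, and the fixture list are assumptions for illustration, and the tutorial's real predictors are not reproduced. The idea is that a fixture's features come from what is known before kickoff (venue, plus rolling averages of matches already played).

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Toy history (hypothetical values): one row per team per played match.
played = pd.DataFrame({
    "date": pd.to_datetime(
        ["2022-01-01", "2022-01-08", "2022-01-15", "2022-01-22", "2022-01-29"] * 2
    ),
    "team": ["A"] * 5 + ["B"] * 5,
    "venue_code": [1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
    "gf": [2, 1, 3, 0, 2, 0, 1, 1, 2, 0],
    "target": [1, 0, 1, 0, 1, 0, 0, 0, 1, 0],
}).sort_values(["team", "date"])

# Rolling average of the 3 matches before each row, computed per team.
played["gf_rolling"] = played.groupby("team")["gf"].transform(
    lambda s: s.shift(1).rolling(3).mean()
)

predictors = ["venue_code", "gf_rolling"]

# Train only on rows with a full rolling window and a known result.
train = played.dropna(subset=["gf_rolling"])
rf = RandomForestClassifier(n_estimators=50, random_state=1)
rf.fit(train[predictors], train["target"])

# Upcoming fixtures: the result is unknown, but the venue and each team's
# form over its last 3 played matches are known.
latest_form = played.groupby("team")["gf"].apply(lambda s: s.tail(3).mean())
upcoming = pd.DataFrame({"team": ["A", "B"], "venue_code": [0, 1]})
upcoming["gf_rolling"] = upcoming["team"].map(latest_form)

# Predict the same way as for the test set: pass the predictor columns.
upcoming["predicted"] = rf.predict(upcoming[predictors])
print(upcoming)
```

With so little toy data the predictions themselves mean nothing; the point is only the shape of the workflow, where fixture rows carry the same predictor columns as the training rows.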

  • @anlgoy9386
    @anlgoy9386 1 year ago

    My English is weak, so I'm using Chatgpt for translation. Can we combine data from different websites to create a CSV file and analyze it to increase our chances of winning? For example, we could gather match data and odds from Flashscore, voting results from Oddsportal for each match , and win/loss probabilities from Tablesleague. Then, we could use artificial intelligence to create a prediction program. Would you be interested in this?

  • @manasseholowoyeye3236

    @manasseholowoyeye3236

    11 months ago

    Did you later discover any means, or do you use any app currently?

  • @gabriel.o.michael9549
    @gabriel.o.michael9549 2 years ago

    I have to say, you're a natural educator. If you haven't, please consider teaching a younger audience. I bet you'll be good at it.

  • @Dataquestio

    @Dataquestio

    2 years ago

    Thank you, Gabriel! I really appreciate that. -Vik

  • @joshuakanatt7552
    @joshuakanatt7552 2 years ago

    Sir, can you help me out with this? How can I get NBA player salary, contract period, and statistics in a single dataset? If they are in separate data sheets, it might not be easy to combine them.

  • @Dataquestio

    @Dataquestio

    2 years ago

    Hi Joshua - I don't know of a single table where you can get all of that data, but you might want to look at www.basketball-reference.com/ . You could either scrape and combine multiple datasets, or find a single table with the data you want.
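
If you go the "scrape and combine" route, the combining step could look roughly like this. This is only a sketch: the two tables, their column names, and all values are hypothetical placeholders, not real NBA data or the layout of any particular site.

```python
import pandas as pd

# Hypothetical per-season statistics table (invented values).
stats = pd.DataFrame({
    "player": ["Player A", "Player B"],
    "season": [2022, 2022],
    "points_per_game": [20.1, 11.4],
})

# Hypothetical contract table scraped separately (invented values).
contracts = pd.DataFrame({
    "player": ["Player A", "Player B"],
    "season": [2022, 2022],
    "salary": [30_000_000, 8_500_000],
    "contract_years": [4, 2],
})

# Join on the keys both tables share; validate raises an error if a key
# appears more than once on either side, which catches duplicated rows.
combined = stats.merge(
    contracts, on=["player", "season"], how="left", validate="one_to_one"
)
print(combined)
```

In practice the hard part is usually making the join keys match, for example when player names are spelled differently across sources.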

  • @joshuakanatt7552

    @joshuakanatt7552

    2 years ago

    @@Dataquestio Thanks, got it. Really helpful content from your channel.

  • @karolkowalewski9832
    @karolkowalewski9832 7 months ago

    Great video

  • @user-qk3tt4fs9n
    @user-qk3tt4fs9n 2 years ago

    Hi, thanks for the amazing video. Can you give me a link to the website (or wherever) you took the CSV file from?

  • @Dataquestio

    @Dataquestio

    2 years ago

    Hi there - there's a previous video about scraping the data - kzread.info/dash/bejne/gKhruayaYszbYNY.html .

  • @user-qk3tt4fs9n

    @user-qk3tt4fs9n

    2 years ago

    @@Dataquestio Thanks!!

  • @shaunhankey
    @shaunhankey 2 years ago

    How could you include draws? I've been playing around with the code and data from another source, but I seem to only be able to predict 'win' or 'lose' e.g. 1 or 0. Pls help! Tutorial was awesome!

  • @Dataquestio

    @Dataquestio

    2 years ago

    Hi Shaun - you would have a multiclass classification problem. So just code win as `1`, draw as `2`, and loss as `3`, then you can use the same technique in the video. Alternatively, you can try to frame this as a regression problem where you're predicting the point spread. So your target would be the difference between the team's score and the opponent's score (win would be greater than 0, draw is 0, loss is less than 0). This will be more accurate. Either way, you're changing the target (what the algorithm is trying to predict).
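
Both targets described above could be built along these lines. This is a sketch on a toy table: the `result` column matches the one used elsewhere in this thread, while `gf` and `ga` (goals for and against) are assumed column names and the values are invented.

```python
import pandas as pd

# Toy data; gf/ga are assumed names for goals for and goals against.
matches = pd.DataFrame({
    "result": ["W", "D", "L", "W"],
    "gf": [2, 1, 0, 3],
    "ga": [0, 1, 2, 1],
})

# Option 1: multiclass target, using the coding from the reply above.
result_codes = {"W": 1, "D": 2, "L": 3}
matches["target"] = matches["result"].map(result_codes)

# Option 2: regression target, the goal difference
# (positive = win, zero = draw, negative = loss).
matches["goal_diff"] = matches["gf"] - matches["ga"]

print(matches)
```

A `RandomForestClassifier` accepts the multiclass target as-is, but `precision_score` then needs an explicit `average` argument (for example `average="macro"`), since its default only covers the binary case. For the goal-difference target you would swap in a regressor such as `RandomForestRegressor`.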

  • @shaunhankey

    @shaunhankey

    2 years ago

    @@Dataquestio Thank you! That’s very helpful, really appreciate it 👍🏼

  • @dcr7417

    @dcr7417

    1 year ago

    @@shaunhankey Hi Shaun, have you tried this? How would one change this code: `matches["target"] = (matches["result"] == "W").astype("int")` to add draws as a target?

  • @ericmckee8007
    @ericmckee8007 2 years ago

    Thank you greatly, this has been extremely helpful. I ran into a KeyError issue when running make_predictions telling me that all of the rolling columns were not in index (gf_rolling,..). Do you have an idea as to why this is happening? I followed the code exactly, so I'm not sure what is causing this... If I remove "+ new_cols" when calling the function it works fine. Thanks again

  • @Dataquestio

    @Dataquestio

    1 year ago

    Hi Eric - this would happen if the new columns aren't in the matches_rolling dataframe. This is the code that adds the columns: `matches_rolling = matches.groupby("team").apply(lambda x: rolling_averages(x, cols, new_cols))`
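
For reference, here is a sketch of what a `rolling_averages` helper like the one named above might look like, and how to check that the new columns exist before calling `make_predictions`. The window of 3 previous matches, the toy data, and the use of `pd.concat` over the groups (in place of `groupby(...).apply`, to sidestep version differences in how pandas treats the grouping column) are assumptions of this sketch, not a copy of the notebook.

```python
import pandas as pd

def rolling_averages(group, cols, new_cols):
    """Add the mean of each team's previous 3 matches for every column in cols."""
    group = group.sort_values("date")
    # closed="left" leaves the current match out of its own window.
    rolling_stats = group[cols].rolling(3, closed="left").mean()
    group[new_cols] = rolling_stats
    # The first 3 matches of each team have no full window, so drop them.
    return group.dropna(subset=new_cols)

# Toy data with invented values.
matches = pd.DataFrame({
    "team": ["Arsenal"] * 5 + ["Chelsea"] * 4,
    "date": pd.to_datetime(
        ["2022-01-01", "2022-01-08", "2022-01-15", "2022-01-22", "2022-01-29"]
        + ["2022-01-02", "2022-01-09", "2022-01-16", "2022-01-23"]
    ),
    "gf": [1, 2, 3, 4, 5, 0, 0, 3, 6],
})
cols = ["gf"]
new_cols = [f"{c}_rolling" for c in cols]

matches_rolling = pd.concat(
    rolling_averages(group, cols, new_cols)
    for _, group in matches.groupby("team")
)

# If any of these are missing, selecting predictors + new_cols raises KeyError.
missing = [c for c in new_cols if c not in matches_rolling.columns]
print("missing columns:", missing)
```

A KeyError like the one described usually means the frame passed to `make_predictions` is `matches` rather than `matches_rolling`, or that the cell creating `matches_rolling` was not run.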

  • @PeterKrusz91
    @PeterKrusz91 2 years ago

    At line 30, at the 17:49 mark, when we run `preds = rf.predict(test[predictors])`, I get "ValueError: Found array with 0 sample(s) (shape=(0, 4)) while a minimum of 1 is required." Is anyone running into a similar issue?

  • @Dataquestio

    @Dataquestio

    2 years ago

    Hi Peter - I'm guessing your test set is empty. You might want to check your code that splits the train and test set up. -Vik

  • @acegameboy6232

    @acegameboy6232

    1 year ago

    @@Dataquestio What about line 58? I get "ValueError: Found array with 0 sample(s) (shape=(0, 12)) while a minimum of 1 is required." What can I do to fix this? I typed everything in correctly, and I even did it 5 times, and it gives the same result.
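
One way to narrow down a "0 sample(s)" error is to look at the date range and the size of each split before fitting. This is a sketch on invented data; the `2022-01-01` cutoff is the one quoted elsewhere in this thread, and whether it suits your copy of the data is exactly what the check is meant to reveal.

```python
import pandas as pd

# Toy data with invented values.
matches = pd.DataFrame({
    "date": ["2021-12-28", "2022-01-02", "2022-01-15"],
    "result": ["W", "L", "D"],
})
matches["date"] = pd.to_datetime(matches["date"])

cutoff = "2022-01-01"
train = matches[matches["date"] < cutoff]
test = matches[matches["date"] > cutoff]

# If the newest date is before the cutoff, the test set will be empty.
print("date range:", matches["date"].min(), "to", matches["date"].max())
print("train rows:", len(train), "test rows:", len(test))

if test.empty:
    raise ValueError("No matches after the cutoff; pick an earlier cutoff date.")
```

If the split looks fine on `matches` but is empty after the rolling-average step, check how many rows survive the `dropna` there, since rows with missing values are removed at that point.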

  • @jamshidnoori1496
    @jamshidnoori1496 2 years ago

    Why do I get this error: "TypeError: list indices must be integers or slices, not list" after I write this code: `rf.fit(train[predictors], train['target'])`? Thanks.

  • @xsquirrel7091

    @xsquirrel7091

    2 years ago

    Because you are using a list as a list index. In this case you have probably forgotten to put the quotation marks in train['predictors'].

  • @Dataquestio

    @Dataquestio

    2 years ago

    Hi Jamshid - `train` should be a DataFrame, but it looks like you might have it stored as a list. The full code is here if you want to compare - github.com/dataquestio/project-walkthroughs/blob/master/football_matches/prediction.ipynb .

  • @jamshidnoori1496

    @jamshidnoori1496

    2 years ago

    @@xsquirrel7091 Hi, thank you very much. I have already defined `predictors` as a variable to choose the column names, like this: `predictors = ['venue_code','opp_code','hour','day_code']`.

  • @jamshidnoori1496

    @jamshidnoori1496

    2 years ago

    @@Dataquestio Great work thanks

  • @jamshidnoori1496

    @jamshidnoori1496

    2 years ago

    Yes, you are right. I passed `train` and `test` as lists, not DataFrames:
    train = [matches[matches["date"] < '2022-01-01']]
    test = [matches[matches["date"] > '2022-01-01']]
    But it should be like this:
    train = matches[matches["date"] < '2022-01-01']
    test = matches[matches["date"] > '2022-01-01']
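
The difference is easy to reproduce. A small sketch with invented data, showing that wrapping the filtered DataFrame in square brackets produces a Python list, which cannot be indexed with a list of column names:

```python
import pandas as pd

# Toy data with invented values.
matches = pd.DataFrame({
    "date": pd.to_datetime(["2021-12-01", "2022-02-01"]),
    "venue_code": [1, 0],
    "hour": [15, 17],
})
predictors = ["venue_code", "hour"]

# Extra brackets: this is a one-element list holding a DataFrame.
wrapped = [matches[matches["date"] > "2022-01-01"]]
try:
    wrapped[predictors]
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(type(wrapped).__name__, "->", error_message)

# Without the extra brackets it stays a DataFrame, and column selection works.
test = matches[matches["date"] > "2022-01-01"]
selected = test[predictors]
print(type(test).__name__, "->", selected.shape)
```

Printing `type(train)` right before `rf.fit(...)` is a quick way to catch this kind of mistake.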

  • @user-zr4ue2iv8l
    @user-zr4ue2iv8l 1 year ago

    Is it an ANN model?

  • @NguyenNamDuong-kx4gu
    @NguyenNamDuong-kx4gu 1 year ago

    Can you do it for future matches? :( I really need it.

  • @robnotaro8584
    @robnotaro8584 10 months ago

    How do you use this to predict the upcoming week's matches?

  • @Denis-bu4ri

    @Denis-bu4ri

    4 months ago

    +