Predicting Horse Race Winners Using Advanced Statistical Methods

Science & Technology

Conditional Logistic Regression with Frailty applied to predicting horse race winners in Hong Kong.
www.helios.ai
Since it was first proposed by Bill Benter in 1994, Conditional Logistic Regression (CLR) has been an extremely popular tool for estimating the probability of each horse winning a race.
I propose a new prediction process composed of two innovations to the common CLR model and a unique goal for parameter tuning. First, I modify the likelihood function to include a "frailty" parameter borrowed from the epidemiological use of the Cox Proportional Hazards model. Second, I add a LASSO penalty to the likelihood and tune it with profit as the target to be maximized, as opposed to the much more common goal of maximizing likelihood.
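The abstract does not write out the model. As a minimal sketch, assuming the standard conditional logit form, each horse's win probability is a softmax of linear scores taken over the horses in its own race, and the LASSO adds λ times the sum of absolute coefficients to the negative log-likelihood. The frailty term is not modeled here, and `race_win_probs`, `penalized_nll`, and `lam` are hypothetical names:

```python
import math
from typing import List, Sequence, Tuple

Race = Tuple[List[List[float]], int]  # (feature rows for each horse, index of the winner)


def race_win_probs(features: Sequence[Sequence[float]], beta: Sequence[float]) -> List[float]:
    """Conditional logit: P(horse i wins) = exp(x_i . beta) / sum_j exp(x_j . beta)."""
    scores = [sum(x * b for x, b in zip(row, beta)) for row in features]
    m = max(scores)  # subtract the max score so exp() cannot overflow
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def penalized_nll(races: Sequence[Race], beta: Sequence[float], lam: float) -> float:
    """Negative conditional log-likelihood over all races plus a LASSO (L1) penalty."""
    nll = 0.0
    for features, winner in races:
        nll -= math.log(race_win_probs(features, beta)[winner])
    return nll + lam * sum(abs(b) for b in beta)
```

Tuning `lam` for profit, as the abstract describes, would mean fitting β for each candidate `lam` and keeping the value whose simulated betting returns are highest on validation races, rather than the one with the lowest `penalized_nll`.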
Finally, I implemented a Cyclical Coordinate Descent algorithm to fit the model in high-speed parallelized code that runs on a Graphics Processing Unit (GPU), allowing me to rapidly test many tuning parameter settings.
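The abstract names the fitting algorithm but not its update rule. A minimal serial sketch of cyclical coordinate descent for an L1-penalized conditional logit, assuming the common one-coordinate step: approximate the negative log-likelihood by its second-order Taylor expansion in a single coefficient, then minimize that plus the penalty exactly by soft-thresholding. The GPU parallelization is not shown, and all names are hypothetical:

```python
import math
from typing import List, Sequence, Tuple

Race = Tuple[List[List[float]], int]  # (feature rows for each horse, index of the winner)


def _win_probs(features: Sequence[Sequence[float]], beta: Sequence[float]) -> List[float]:
    scores = [sum(x * b for x, b in zip(row, beta)) for row in features]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_threshold(z: float, t: float) -> float:
    """Shrink z toward 0 by t; anything within [-t, t] becomes exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def ccd_lasso_clr(races: Sequence[Race], n_features: int, lam: float,
                  max_sweeps: int = 200, tol: float = 1e-10) -> List[float]:
    """Cyclical coordinate descent for L1-penalized conditional logistic regression."""
    beta = [0.0] * n_features
    for _ in range(max_sweeps):
        largest_step = 0.0
        for j in range(n_features):  # visit coordinates in a fixed cycle
            grad = 0.0  # first derivative of the NLL w.r.t. beta_j
            curv = 0.0  # second derivative of the NLL w.r.t. beta_j
            for features, winner in races:
                p = _win_probs(features, beta)
                mean_j = sum(pi * row[j] for pi, row in zip(p, features))
                curv += sum(pi * (row[j] - mean_j) ** 2 for pi, row in zip(p, features))
                grad -= features[winner][j] - mean_j
            if curv < 1e-12:
                continue  # feature never varies within a race: it carries no information
            # Exact minimizer of the local quadratic plus lam * |beta_j|.
            new_bj = soft_threshold(curv * beta[j] - grad, lam) / curv
            largest_step = max(largest_step, abs(new_bj - beta[j]))
            beta[j] = new_bj
        if largest_step < tol:
            break
    return beta
```

Because each coordinate update is cheap and the per-race sums are independent, the inner loop over races is the natural place to parallelize, which fits the abstract's point about testing many tuning-parameter settings quickly.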
Historical data from 3681 races in Hong Kong were collected, and 10-fold cross-validation was used to find the optimal tuning-parameter settings. Simulated betting on a hold-out set of 20% of the races yielded a return on investment of 36.73%.
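The abstract does not say how races were assigned to folds or exactly how ROI was computed. A minimal sketch, assuming whole races (never individual horses) are split and ROI is net profit divided by total amount staked; `holdout_split`, `k_folds`, and `roi` are hypothetical names:

```python
import random
from typing import List, Sequence, Tuple


def holdout_split(race_ids: Sequence[int], holdout_frac: float = 0.2,
                  seed: int = 0) -> Tuple[List[int], List[int]]:
    """Split whole races into training and hold-out sets."""
    ids = sorted(set(race_ids))
    random.Random(seed).shuffle(ids)
    n_hold = round(len(ids) * holdout_frac)
    return ids[n_hold:], ids[:n_hold]


def k_folds(train_ids: Sequence[int], k: int = 10) -> List[List[int]]:
    """Partition the training races into k roughly equal folds for cross-validation."""
    return [list(train_ids[i::k]) for i in range(k)]


def roi(bets: Sequence[Tuple[float, float]]) -> float:
    """Return on investment: (total returned - total staked) / total staked.

    Each bet is (stake, amount_returned); amount_returned includes the stake
    on a winner and is 0 on a loser.
    """
    staked = sum(stake for stake, _ in bets)
    returned = sum(back for _, back in bets)
    return (returned - staked) / staked
```

With the abstract's figures, 20% of 3681 races is about 736 hold-out races, and an ROI of 36.73% means about 1.37 units returned per unit staked.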

Comments: 58

  • @tylergramling424
    4 years ago

    This was way before its time. Thanks for the great upload!!

  • @1minnows
    5 years ago

    This is all nice, but tell me which horse is going to win the first race at Aqueduct tomorrow.

  • @vwazp
    6 years ago

    Mr. Silverman, thanks for your talk. Could you suggest any sources for a novice who has no statistical background but wants to bet on horses using statistically proven methods? Thank you.

  • @thegoodbetdotcom3069
    5 years ago

    Five years old but still an interesting talk. I thought about adding some thoughts on a few points, but I realise there are so many different ways to predict races that my thoughts probably won't make a difference. Bill Benter's story (and his associates') is pretty far out considering the technology of the time when he was doing his thing with horse racing. Maybe he is still betting along with his academic pursuits, I don't know. Personally, I bet on horses every day using my own spreadsheet formulas, but the latest thing is that, with the help of a data science guy, we are developing a deep supervised-learning ANN using the parameters I know work best for the data sets. It's working somewhat, but time will tell how well that goes. As for ROI, accuracy of prediction and staking methods, I believe that when someone is getting tangible results over a period of time they probably won't be telling others how it's done. Even if they do, I remember something Bill Benter said which really resonated with me: many people don't want to roll up their sleeves and do the hard work. The countless hours I have put into writing formulas or making data sets, lol, I don't even want to think about it. Anyway, it's all fun :)

  • @mattwilsn
    5 years ago

    Noah, I've got two questions for you: 1) Does using conditional probability give you any advantage over just using the probability and ranking the horses grouped by race? 2) Are you able to provide any detail on the features you've used? I'm looking at doing something similar for my MSc dissertation. Thanks, Matt

  • @rdomer2010
    7 years ago

    Thanks for a great overview of your modeling. Are you using any open-source libraries for your conditional logistic regression and the LASSO optimization? Did you write this in C++ for the Mac? Thanks for any information you can provide on your algorithms.

  • @NoahSilverman
    7 years ago

    This project was all custom code, some in R and some in C++.

  • @gabrielbejenaru2549
    4 years ago

    The problem we all, as gamblers, face in the end is how the race is going to develop, knowing that the betting companies know before the start 1) the number of bets placed on a particular horse and 2) the value of those bets. Basically everything can be controlled (manipulated) for the benefit of the betting companies; otherwise these companies would cease to exist. If I misspelled something, please pardon my French; English is not my first language.

  • @SergeantKeel
    8 years ago

    Hi Noah, thanks for the great talk. I was wondering how you came up with 186 variables?! And how many of these did LASSO manage to get rid of?

  • @NoahSilverman
    7 years ago

    Thanks James. The 186 variables were deduced from reading a ton of literature on the subject, speaking to experts, and a lot of trial and error. The LASSO I used was an L2, so some variables were pushed to small numbers, but none to 0.
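For reference, the two penalties shrink a coefficient differently. In the one-dimensional problem of minimizing ½(b − z)² plus a penalty, where z stands for an unpenalized estimate, the L1 (LASSO) penalty λ|b| gives soft-thresholding, which sets small coefficients exactly to 0, while the L2 (ridge) penalty (λ/2)b² only rescales them, consistent with the reply's observation that no coefficient reached 0. A minimal sketch with hypothetical names:

```python
import math


def l1_shrink(z: float, lam: float) -> float:
    """argmin_b 0.5*(b - z)**2 + lam*|b|: soft-thresholding (LASSO)."""
    return math.copysign(max(abs(z) - lam, 0.0), z)


def l2_shrink(z: float, lam: float) -> float:
    """argmin_b 0.5*(b - z)**2 + 0.5*lam*b**2: proportional shrinkage (ridge)."""
    return z / (1.0 + lam)
```

With λ = 0.5, an estimate of 0.3 becomes exactly 0 under L1 but only shrinks to 0.2 under L2.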

  • @TrueSaintly
    7 years ago

    You can break down handicapping factors in a variety of ways: average prize money won by the jockey, the horse's success from an outside barrier, the jockey/trainer strike rate over the last 12 months. A lot can be made redundant, but one of the more successful high-profile horse players, Alan Woods, used something like 130+ factors.

  • @Ricatellez682
    4 years ago

    I need more information about econometric methods and sports betting, please.

  • @vishwajithkp1418
    8 years ago

    Dear Dr. N. Silverman, can you please help me find the parameters for the Benter correction to the Harville formula? How do I get the maximum likelihood estimator from a sample of past data?
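The Harville formula estimates the chance that horse j finishes second behind winner i as p_j / (1 − p_i). A commonly used correction, associated with Benter, replaces each p_k in that ratio with p_k^γ, and γ can be estimated by maximum likelihood on past races. A minimal sketch, assuming only the second-place exponent is fitted (a third-place exponent would be fitted the same way) and using a simple grid search; the grid range and all names are hypothetical:

```python
import math
from typing import Optional, Sequence, Tuple


def exacta_prob(p: Sequence[float], first: int, second: int, gamma: float = 1.0) -> float:
    """P(first wins and second finishes second).

    gamma = 1 gives the Harville formula; other values give the discounted form
    p_second**gamma / sum_{k != first} p_k**gamma for the second-place step.
    """
    weights = [pk ** gamma for pk in p]
    denom = sum(w for k, w in enumerate(weights) if k != first)
    return p[first] * weights[second] / denom


def fit_gamma(races: Sequence[Tuple[Sequence[float], int, int]],
              grid: Optional[Sequence[float]] = None) -> float:
    """Grid-search MLE of gamma from past races given as (win_probs, winner, second)."""
    if grid is None:
        grid = [0.5 + 0.01 * i for i in range(101)]  # 0.50 .. 1.50 (arbitrary range)

    def loglik(g: float) -> float:
        total = 0.0
        for p, w, s in races:
            weights = [pk ** g for pk in p]
            denom = sum(x for k, x in enumerate(weights) if k != w)
            total += math.log(weights[s] / denom)  # log P(second | winner)
        return total

    return max(grid, key=loglik)
```

Here `win_probs` would be the model's (or the public's) win probabilities for each past race; γ below 1 flattens the second-place distribution toward longer-priced horses.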

  • @robertspence8638
    7 years ago

    Hey Noah, excellent talk. How did you get the 0.3 to 0.4 correlation between the odds and rank outcomes? Is that a number you computed or something that comes from the academic literature? If you could provide a reference, I'd be very grateful. Thanks.

  • @NoahSilverman
    7 years ago

    Empirical correlation from dataset. If you want a formal "academic literature" reference, see my paper published on the topic.
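The reply does not give a formula. As an assumption, one reasonable reading of "correlation between the odds and rank outcomes" is a Spearman rank correlation between each horse's odds and its finishing position; a minimal sketch with hypothetical names, where ties get average ranks:

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(odds: Sequence[float], finish_pos: Sequence[float]) -> float:
    """Pearson correlation of the ranks of the two sequences."""
    return pearson(average_ranks(odds), average_ranks(finish_pos))
```

A positive value means longer odds go with numerically larger (worse) finishing positions.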

  • @chevalierdeloccident5949
    6 years ago

    Judging from the video description, the betting public underestimates the winning chance of a horse in 2 out of 10 races in Hong Kong, enough to overcome the track takeout over the long term. Is that correct? What betting strategy was simulated? Flat betting the bare minimum, or a fixed proportion of the bankroll? This is important to know because horse racing typically doesn't encourage the implementation of a Kelly strategy with a large bankroll relative to the size of the parimutuel pool.

  • @NoahSilverman
    6 years ago

    For that academic study, I used a fairly standard Kelly strategy. In "real life", it would be something more complex to manage risk
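The reply mentions a fairly standard Kelly strategy without details. A minimal sketch of the textbook Kelly fraction for a single win bet, assuming the odds are fixed when the bet is placed. In a pari-mutuel pool the bet itself shortens the final odds, and betting several horses in one race needs a multi-outcome version, so this covers only the simplest case. The `fraction` argument (fractional Kelly) is a common risk-control choice, not something the source specifies:

```python
def kelly_stake(p: float, decimal_odds: float, fraction: float = 1.0) -> float:
    """Fraction of bankroll to bet on one horse to win.

    p:            model's win probability for the horse
    decimal_odds: total return per unit staked (stake included), e.g. 5.0
    fraction:     1.0 for full Kelly, smaller values for fractional Kelly
    Returns 0.0 when the bet has no positive expected value.
    """
    b = decimal_odds - 1.0        # net profit per unit staked if the horse wins
    edge = b * p - (1.0 - p)      # expected net profit per unit staked
    if b <= 0.0 or edge <= 0.0:
        return 0.0
    return fraction * edge / b
```

For example, a horse the model rates at 25% and paying 5.0 gets 6.25% of the bankroll at full Kelly, and nothing at 10%.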

  • @shanwu2739
    6 years ago

    Super work! I am Chinese and very interested in Hong Kong racing research. How can I learn this and use your data for it?

  • @vishwajithkp1418
    8 years ago

    Dear Dr. Noah Silverman, thanks for uploading such an informative video; for my level of knowledge it is a little hard to understand. I want to know how the parameters for the Benter correction to the Harville formula can be obtained. Thanks in advance.

  • @NoahSilverman
    8 years ago

    +VISHU JITH You'll have to find that one on your own.

  • @vishwajithkp1418
    8 years ago

    +Noah Silverman Thanks for your immediate response. Sorry, that was a typographical mistake. I meant: how can the parameters for the Benter correction be obtained?

  • @jbeaz11
    8 years ago

    Noah Silverman, how can I get a copy of your study and apply it to U.S. horse racing?

  • @NoahSilverman
    8 years ago

    +Joe Beasley Data Science Ltd offers consulting services for the gaming markets.

  • @jbeaz11
    8 years ago

    +Noah Silverman what's their website address?

  • @NoahSilverman
    8 years ago

    +Joe Beasley www.datascience.io

  • @NoahSilverman
    7 years ago

    New website: www.helios.ai

  • @damien2198
    8 years ago

    Thank you so much! Since then, have you played with LSTMs or convolutional networks on this project or similar ones? Any better results?

  • @NoahSilverman
    8 years ago

    I have not. The challenge with any ANN is setting up the conditional probability (the probabilities for horses in a race must sum to 1.0)
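One common way to meet the sum-to-1 constraint with any score-producing model, a neural network included, is to apply a softmax within each race as the final step; this is the same normalization the conditional logit uses. A minimal sketch in which the scoring model is left abstract and `grouped_softmax` is a hypothetical name:

```python
import math
from collections import defaultdict
from typing import Hashable, List, Sequence


def grouped_softmax(scores: Sequence[float], race_ids: Sequence[Hashable]) -> List[float]:
    """Turn per-horse scores into win probabilities that sum to 1 within each race."""
    max_by_race = {}
    for s, r in zip(scores, race_ids):
        max_by_race[r] = max(s, max_by_race.get(r, s))  # per-race max for stability
    exps = [math.exp(s - max_by_race[r]) for s, r in zip(scores, race_ids)]
    totals = defaultdict(float)
    for e, r in zip(exps, race_ids):
        totals[r] += e
    return [e / totals[r] for e, r in zip(exps, race_ids)]
```

Training such a network on the negative log of the winner's normalized probability then gives the same conditional likelihood as CLR, with the linear score replaced by the network's output.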

  • @michaelbarson9898
    9 years ago

    Hello Noah, I have been doing similar things with a Benter-style two-step regularised conditional regression on Australian races and have read your dissertation thoroughly. My question is: does using the frailty/strength term from the odds have an effect similar to using the Kelly criterion? You are weighting horses that your model favours more than the public (odds) with a greater final probability? Are you then placing a uniform bet across all races? Wouldn't that be the same as finding a win probability that is unweighted by the odds and using a Kelly bet to modify your stake to maximize your winnings?

  • @NoahSilverman
    9 years ago

    ***** The two are not mutually exclusive. You can use weights in training AND Kelly for betting. They're separate things.

  • @michaelbarson9898
    9 years ago

    Noah Silverman Thanks for replying. I suppose you can do both, and I guess they both do a similar thing. Interesting to see how a Kelly strategy works for your already weighted system, could be more robust due to the regression but also more non-linear as similar information is being used twice. Thanks again!

  • @tonzafundetsme
    9 years ago

    Noah, was the quoted ROI calculated off closing prices?

  • @NoahSilverman
    9 years ago

    Daniel Wishart I don't actually remember. This talk was from several years ago, and things have advanced significantly beyond the work presented.

  • @joshcolbert5613
    4 years ago

    Is this only optimal in Hong Kong, or could it be used at Fonner Park in Nebraska?

  • @bodylove2009ab
    4 years ago

    By the way, Benter had hired journalists so they could get him some insider info.

  • @samiab6077
    4 years ago

    At 4:05, if I remember my 8th-grade math correctly, does ∝ mean that there is a constant in the formula, or am I an idiot?
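The slide at 4:05 is not reproduced here. Assuming it shows the usual conditional logit form, "∝" reads "is proportional to": each probability equals the right-hand side times one constant per race, and that constant is fixed by requiring the probabilities of the n horses in the race to sum to 1:

```latex
P_i \propto \exp(x_i^\top \beta)
\quad\Longleftrightarrow\quad
P_i = \frac{\exp(x_i^\top \beta)}{\sum_{j=1}^{n} \exp(x_j^\top \beta)},
\qquad \sum_{i=1}^{n} P_i = 1
```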

  • @dennismontoro7312
    7 years ago

    Are you saying you would combine the public's implied odds (strength) with your coefficients? You're using public odds as a coefficient?

  • @NoahSilverman
    7 years ago

    A lot of racing models use the public odds as *one* of several factors. There is information in there.

  • @NoahSilverman
    7 years ago

    And, to clarify: We have "factors" in the model, and then use machine learning techniques to estimate the coefficients (weights of the factors). So, the public odds is a "factor" not a coefficient

  • @dennismontoro7312
    7 years ago

    So this differs slightly from Benter, who suggested running a second logit model combining the public estimate and your fundamental estimate?

  • @NoahSilverman
    7 years ago

    There are many ways to do this.
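The two-step approach raised in the question can be sketched as a second logit whose inputs are the log of the model's ("fundamental") probability and the log of the public's probability. A minimal sketch under that assumption: public probabilities are taken as normalized inverse decimal odds, and α and β would be fitted by maximum conditional likelihood on races not used to fit the first model. The names are hypothetical, and this is only one of the many possible ways, not necessarily the one used in the talk:

```python
import math
from typing import List, Sequence


def public_probs(decimal_odds: Sequence[float]) -> List[float]:
    """Implied win probabilities: inverse odds rescaled to sum to 1 (removes the operator's margin)."""
    inv = [1.0 / o for o in decimal_odds]
    total = sum(inv)
    return [x / total for x in inv]


def combine(fundamental: Sequence[float], public: Sequence[float],
            alpha: float, beta: float) -> List[float]:
    """Second-stage logit: P_i proportional to exp(alpha*log f_i + beta*log pi_i)."""
    scores = [alpha * math.log(f) + beta * math.log(q) for f, q in zip(fundamental, public)]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

Setting α = 1, β = 0 returns the fundamental model unchanged and α = 0, β = 1 returns the public's view, so the fitted pair shows how much weight the data give each source.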

  • @dennismontoro7312
    7 years ago

    Noah Silverman, last question: do you think your model's R-squared outperforming the public model's R-squared is a good indicator of potential success (along with out-of-sample testing for ROI)?
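One common way to make "R-squared" concrete for a conditional logit is McFadden's pseudo-R², which compares the log-likelihood of the probabilities assigned to actual winners with a baseline that gives every runner 1/n. A minimal sketch with hypothetical names; computing it once for the model's probabilities and once for the public's odds-implied probabilities on the same races gives the comparison asked about:

```python
import math
from typing import Sequence


def pseudo_r2(winner_probs: Sequence[float], field_sizes: Sequence[int]) -> float:
    """McFadden-style R^2 = 1 - LL(model) / LL(null), with null = uniform 1/n per race.

    winner_probs: probability assigned to each race's actual winner
    field_sizes:  number of runners in each race
    """
    ll_model = sum(math.log(p) for p in winner_probs)
    ll_null = sum(math.log(1.0 / n) for n in field_sizes)
    return 1.0 - ll_model / ll_null
```

A value of 0 means no better than picking at random within each race; 1 would mean the winner always got probability 1.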

  • @acwchangs
    7 years ago

    What will happen if a quantum computer gives you an optimized result in a fraction of a second and ruins the whole industry?

  • @NoahSilverman
    7 years ago

    Nice fantasy, but things don't work that way. Just because a machine is "quantum" doesn't mean it has infinite insight into any phenomenon in the world.

  • @acwchangs
    7 years ago

    But if you have a model, then all that's left is to get an optimized answer, and I think Google's D-Wave quantum machine can do the rest, can't it?

  • @pwnycny
    6 years ago

    Unless someone has inside info about a race, there is no reliable way of predicting the outcome of a thoroughbred horse race. There are too many variables, not the least of which is the horse itself, whose temperament and condition at post time is known only to the horse, and the horse is keeping that a secret. The fact that even the most successful jockeys win only a small fraction of their races is proof that, presuming that the races are legitimate, the outcome is not a sure bet. Recently, in a maiden claiming race, the 75 to 1 longshot won by two lengths while the 6 to 5 favorite came in eighth. Predicting races is entertaining, but don't expect the horses to cooperate. They have other concerns that have nothing to do with money.

  • @NoahSilverman
    6 years ago

    I respectfully disagree (of course)

  • @MikeKleinsteuber
    5 years ago

    Tell Bill Benter that there's no reliable way of predicting the outcome of a horse race lol

  • @3DComputing
    5 years ago

    "Predicting races is entertaining, but don't expect the horses to cooperate. They have other concerns that have nothing to do with money." LOL FOFL My coffee nearly came out of my nose. GOOD ONE

  • @lklim3914
    5 years ago

    You would have to use Kelly and the law of large numbers to mitigate uncertainty and bad luck. Is that what you would do Noah?

  • @wesley621375
    5 years ago

    The reason people can win money in Hong Kong is that the pool is a pari-mutuel pool with many unsophisticated punters, so there is room for a difference between the true probabilities and the odds.
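The pari-mutuel point can be made concrete: the odds come from how the public splits the pool, minus the operator's takeout, so a bet has positive expectation only when the model's probability exceeds the odds-implied one by enough to cover that takeout. A minimal sketch with hypothetical names and a made-up pool; the takeout is a parameter, not a quoted Hong Kong rate:

```python
from typing import List, Sequence


def parimutuel_odds(pool_by_horse: Sequence[float], takeout: float) -> List[float]:
    """Decimal odds in a win pool: (pool after takeout) / (money bet on that horse).

    Breakage (rounding of payouts) is ignored.
    """
    net_pool = sum(pool_by_horse) * (1.0 - takeout)
    return [net_pool / amount for amount in pool_by_horse]


def expected_profit(p: float, decimal_odds: float) -> float:
    """Expected net profit per unit staked if the true win probability is p."""
    return p * decimal_odds - 1.0
```

With pools of 500, 300 and 200 and a 20% takeout, the odds are 1.6, about 2.67 and 4.0; their implied probabilities sum to 1.25 rather than 1, which is the margin a model's edge has to overcome.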

  • @Crispytastyduck
    5 years ago

    Soo.... Have you made your billions yet?
