GDC
6 years ago
305,506

Three Statistical Tests Every Game Developer Should Know

In this 2016 GDC session, Insomniac Games' Elan Ruskin gives a how-to on statistics for answering questions like "does this new camera control scheme make players happier?", "how many players do I need to test this design change on to prove whether it works better?" and "does the framerate really get faster when I do this thing or is it just a fluke of measurement?"

Comments: 225

@iestynne 2 years ago
I'm proud to have worked with Elan for several years. As you can tell, he always puts a great deal of effort into preparing for his presentations. Amazingly though, this is actually his normal level of conversational speed, clarity and humor :)
@snarfymcsnarfface2323
2 years ago
I thought he was just nervous or trying to fit everything into a short time slot lol
@0netwoguy54
2 years ago
Wait what do you mean "normal"? Does he have a turbo mode???
@Dekharen
2 years ago
@@0netwoguy54 GAS GAS GAS
@ToriTheChicken 6 years ago
Some of the GDC talks are very badly presented for KZread videos. Not this one. This was great, in just about every way.
@PrimerBlobs 2 years ago
"Any actual statisticians are totally cringing." Yep. It's not just pedantry. People will literally not know what their test means, and then they will judge whatever change they make in hindsight anyway.
@aleksaa24
6 months ago
funny seeing you here, love your vids
@Attewir
5 months ago
Easier to digest and more accurate statistics content on PrimerBlobs's channel. And the currently 1.7 million subscribers agree.
@ReeseEifler 6 years ago
Not only is this an amazingly useful talk, it's essentially a perfect presentation. Dope shit.
@WhiteThunder121
2 years ago
@CruzZ fake news
@dontfk
2 years ago
@CruzZ what are you talking about. This guy provided a ton of real world examples where statistics could help solve a problem. That doesn’t mean people will always use statistics for good though, he even mentions that in the presentation with an example. Just because big gaming companies suck at stats doesn’t mean his presentation wasn’t phenomenal!
@ailurusfulgens1849
2 years ago
@@dontfk Big gaming companies most definitely do not suck at stats; if anything, that's the one thing they master above all else. It's just that most statistics are not relevant to the players' enjoyment. They are very relevant to shareholders tho.
@dontfk
2 years ago
@@ailurusfulgens1849 You're right, I used poor word choice there. What I meant by that was that they don't always use their stats for good intentions
@_lime. 2 years ago
13:00, this is a really good one. With Minecraft, Mojang came to a realization that very few players had ever been to the Nether (based on the percent of the population that had the achievement "We need to go deeper!" which is received upon entering the Nether). They ended up realizing that very few non-hardcore players (players that didn't consume game-related content outside of the game, like videos, guides, articles, etc.) knew that the Nether existed. This is why they added obsidian monoliths and broken portals around the Overworld to give you hints.
@Sarmachus
2 years ago
Where did they say this? I’m having a hard time finding it.
@_lime.
2 years ago
@@Sarmachus Sorry I can't remember exactly. I saw it in a game development video a year or so ago, and I believe he flashed a tweet from one of the MC devs on the screen. Regardless of its authenticity it still serves as a good example and valuable lesson.
@Sarmachus
2 years ago
@@_lime. Thanks for clarifying
@infcaat 5 years ago
wow, he is a fantastic speaker. charismatic, to-the-point, funny and practical.
@NunSuperior 6 years ago
Thanks for the talk. I had to learn this stuff on the job at Big Software, Inc. when we started measuring PC boot time impact. There were large variances between each boot.
@colinstreck710 2 years ago
That was fantastic. Their presentation skills are off the charts.
@kittykittylicization 2 years ago
As a Biologist (MS)... i was indeed shouting at my screen when you were talking about P values....and then you called it out so im happy now.
@Discipol 6 years ago
Excellent presentation. THIS was simplified? I am afraid of the scenic route xD I wish to know more, and more practical applications in game dev.
@2Cerealbox
6 years ago
Extremely simplified. Statistics is, like, a whole field of mathematics.
@stephenborntrager6542
6 years ago
It matters a lot for procedural generation, as statistical distribution is a huge part of random number generation. It can also be used to approximate various things... replacing physics in some cases. Sometimes called an "analytical" solution, you can see this show up in some games' oceans, etc. The ocean is based on statistical analysis of real oceans, instead of trying to actually simulate fluid dynamics. I'm sure there are more uses than that, especially outside the game.
@LoudSodaCaleb
5 years ago
Yeah, that thing he did was called hypothesis testing. That took me a good 20 hours during a single week to figure out how to do it by hand at school. Finding out that it could be done in a minute in excel blew my mind.
@joshelguapo5563
2 years ago
As a data scientist... it's a LOT. But really, you don't need all the math to do it practically. You really just need to know the basic definitions, and what the test does. And there you go you got analysis. If you're a game dev, assuming you got some programming experience, you can already do a lot of these things in the language R, with very little effort, and even very easily build some machine learning models.
@jonaza2105
2 years ago
I essentially got most of this stuff during my semester of statistics class. As he said, he pretty much blazes through it, you mostly need time to understand when what is used, why to use it, what the downsides of using it are, etc and lastly of course, HOW to use it.
@maximeflageole770 5 years ago
Most interesting and useful presentation about statistics I've ever watched.
@KillerBearsaw 2 years ago
Absolutely fantastic presentation, would love to hear him speak more
@Vospi 6 years ago
As an educator and a grateful listener: that was bril-li-ant.
@ZZaarraakkii 2 years ago
@12:53 A thing to note is that in that example, people have been playing the "hard" puzzle before and the "easy" puzzle is a novelty, which may cause players to spend more time on it for the experiment, without it being the better solution long term.
@PR-cj8pd
9 months ago
Eiði
@jonasnockert 4 years ago
Love this talk! I spent quite some time trying to derive the 8.14 confidence interval in the first example and finally had to install Excel to verify. I couldn't see it at first but the slides actually mix five and six observations. At ~7:38 there are five observations. At 8:19, the confidence interval is calculated using six observations, i.e. T_DIFFMEANS(A2:A7, ...) rather than the 2 x 5 observations shown on the left.
@zikarisg9025 2 years ago
Excellent, used this to explain the p-Value to some colleagues, since our data science team is not able to explain their models that well...
@robelbelay4065 2 years ago
Great talk and amazing delivery :)
@ArsenicDrone 2 years ago
While I wouldn't do everything identically, I didn't have any large complaints, which is not generally what happens listening to quick statistics intros. A good talk.
@lookatnow5730 6 years ago
Wonderful talk
@KrossX 6 years ago
Happy new year!
@Joeofiowa 6 years ago
Absolutely brilliant.
@hamsandwich780 4 years ago
One of the best explanations of the T test I have ever seen, read, or perceived in any medium.
@phillipA123 4 years ago
a semester of stats in 30min. thanks guy.
@lan1ord 1 year ago
The first talk where I needed to decrease the playback speed instead of increasing. Great material! =)
@elizaknight6980 6 years ago
This is enjoyable, thanks :)
@TomiTapio 4 years ago
Worth a listen.
@RglMrn 7 months ago
Incredible talk. Thank you so much!
@Fmlad 2 years ago
Incredible talk
@kaloqnchyyy 2 years ago
the best presentation I have ever seen
@Brindlebrother 2 years ago
People are awful at five-star ratings whether that be a game, book, movie, show, item, etc. Basically, people will give 4-5 if the product was at all fun or engaging, or a 1 if there was a problem/complaint/issue or any offense taken. Good video. Statistics are fun.
@SakuraWulf
2 years ago
Chick-fil-A is not a five-star establishment, people >_>
@JohnDoe-mx1sq 2 years ago
This video has existed for almost 4 years and it feels like not a single game dev has ever watched it. Their sales division has warehouses of supercomputers simulating human brain functions trying to figure out how crap a game can be before you will buy it, and just how much you will spend on DLC just to play the game at all.
@Adaministrator 3 years ago
excellent talk
@jonwatte4293 5 years ago
"p values" aren't just complicated; they're a root cause of reproduction problems in studies with small sample sizes, and a general frequentist foible. Bayesians of the world, unite! (Interestingly, the "pick sub-samples" illustrations could lead to an IMO much better solution!)
@hamm8934
2 years ago
Bayesians can play around with their Bayes factors all they like, but at the base, they’re still operating under a frequentist model if they’re gonna do any form of null hypothesis testing. Without a criterion to reject the null (p val), you can’t falsify a hypothesis. So collect all the data you want and build up those Bayes factors, but you’re not escaping the problem of induction. :) Frequentists of the world, unite (and not be undermined by a single black swan)!
@jonwatte4293
2 years ago
@@hamm8934 The belief that you can "reject" the null hypothesis based on a single yes/no measurement IS THE PROBLEM. (Sorry, got a little loud there.) Look at the PDF. Draw conclusions about underlying behaviors. Make better predictions and test again. Do not pretend that "there's a 96% probability in this case" and "there's a 94% probability in this case" are vastly different, binary outcomes.
@hamm8934
2 years ago
@@jonwatte4293 what statistician or scientist worth their salt believes that a single positive or negative outcome is sufficient? That’s a bit of a straw man. Of course you either (1) directly replicate the result or (2) perform an extension with a different operationalization of the same hypothesis. If it isn’t replicating approximately 95% of the time, it’s quite safe to say the effect isn’t there (assuming adequate power). If it is replicating approximately 95% of the time, it’s quite safe to say the effect is there. The point I (and other frequentists) make is you have to have a criterion of falsification for null hypothesis testing. If you don’t, the very logic of hypothesis testing collapses as you are no longer able to discern a success from a failure. You have to make a judgement call for null hypothesis testing to exist. This whole notion that Bayesian stats somehow avoids or overcomes this judgement call is a complete failure to acknowledge that you are still making a judgement call, just with a different threshold. (See chp 1 and 2 of The Logic of Scientific Discovery). Get those Bayes factors as juicy as you want. It just takes 1 falsification for them to be undone. We’ll see which method is more fruitful :)
@neur0leptic782
2 years ago
@@hamm8934 bruh I feel like you're still being incredibly disingenuous about this whole thing. The key issue with NHST is that a p-value *only* tells you p(Data | H0 = TRUE)-that's it, full stop. The far more interesting question is p(H | D), and that's entirely beyond the realm of classical frequentist methods. 'Rejecting the null' with p < .05 doesn't mean that there's a 95% chance the null is indeed false, or that the alternative is actually true. What we should be doing is systematically pitting models against each other, and this, I think, is something Bayesian methods are exquisitely well-suited for. And sure, there are some rules of thumb when you're doing Bayesian model comparison and trying to figure out how 'meaningful' the difference between models is, but it's a laughably false equivalence to say that the process of multi-model inference (literally comparing the evidence in favor of competing models) is anything close to a binary NHST decision based on differences in means or a correlation. Not to mention you can compare models based not only on the parameters you include, but on your priors, or the underlying likelihood function... Shit, you don't even need to use Bayes Factors-it's super trivial to compare models via their posterior predictive densities using Bayesian cross-validation with PSIS-LOO. All of this ranting is basically just to say that 'all models are wrong, but some are useful'-and I think if we really want to find the best models that explain (or even better, can *generate*) our data, you're gunna have a bad time with frequentist NHST.
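The gap between p(D | H0) and p(H0 | D) described in this comment can be made concrete with Bayes' rule. The sketch below is only an illustration: the function name, the prior, and the power are hypothetical inputs, and it conditions on "the test came out significant at alpha" rather than on an exact p-value.

```python
def prob_null_given_significant(prior_null: float, alpha: float, power: float) -> float:
    """P(H0 is true | the test was significant), via Bayes' rule."""
    false_positive = alpha * prior_null          # H0 true, yet the test fired
    true_positive = power * (1.0 - prior_null)   # H0 false and the test caught it
    return false_positive / (false_positive + true_positive)


# Hypothetical scenario A: half of the ideas we test have no real effect.
even_odds = prob_null_given_significant(prior_null=0.5, alpha=0.05, power=0.5)

# Hypothetical scenario B: 90% of the ideas we test have no real effect.
long_shot = prob_null_given_significant(prior_null=0.9, alpha=0.05, power=0.5)

print(f"P(H0 | significant), 50% prior: {even_odds:.3f}")
print(f"P(H0 | significant), 90% prior: {long_shot:.3f}")
```

With the same alpha of 0.05, the chance that the null is still true after a significant result depends heavily on the assumed prior and power, which is why p < .05 cannot be read as "95% chance the effect is real".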
@PrimerBlobs
2 years ago
@@neur0leptic782 Preach
@perfectloveweddings 5 years ago
You talk exactly like Jesse Eisenberg from the Social Network when he's coding. It's fantastic.
@franksonjohnson 4 years ago
Watched the Spiderman talk then this one. Just, damn, passion. Awesome.
@inguanara 6 years ago
that was awesome
@FreekHoekstra 6 years ago
at 19:00 mins, do you care about the median? I think that's a rather brazen assumption! Sometimes it's better to have some people who are really invested and really care, and thus are willing to spend on your product, rather than a lot of people that will play for free but don't care enough to spend money, or come back repeatedly. Great talk overall though!!
@AngleSideSideThm
4 years ago
This depends on assumptions; the assumption here probably is "I am optimizing my game for ability for at least most of the initial group to make it through".
@Aidiakapi
4 years ago
That wasn't actually the point he was trying to make. Especially with a small sample size, outliers greatly skew the mean. As for the point of a few dedicated players willing to spend money, that only works if it's a game that does not depend on having an active online community.
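The point about outliers and small samples is easy to check numerically. This is a minimal sketch with made-up playtest numbers (the `hours_played` list is hypothetical, not data from the talk):

```python
from statistics import mean, median

# Hypothetical hours played by nine testers; the last tester is an outlier.
hours_played = [1.0, 1.5, 2.0, 2.0, 2.5, 3.0, 3.0, 3.5, 40.0]

sample_mean = mean(hours_played)      # pulled upward by the single 40-hour player
sample_median = median(hours_played)  # the middle value, unaffected by the outlier

print(f"mean   = {sample_mean:.2f} hours")
print(f"median = {sample_median:.2f} hours")
```

Here one dedicated player moves the mean to 6.5 hours while the median stays at 2.5, so which statistic to report depends on whether that player is the audience you care about.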
@stuartconrod8364
2 years ago
I know I'm really late to finding your comment, but I thought the same thing! Also, Mark Rosewater (of Magic: The Gathering) has a presentation on KZread about Game Design and in HIS opinion, that highly polarized distribution is better. It's better to make something that SOME people love even if some other people hate it, instead of something that everyone gives a 'meh' to. In game design, I think it's the difference between "cult classic that some people love and play forever" and "totally forgettable game that disappears in two weeks". If at least some group loves it, it can spread by word-of-mouth and certain reviews. Provided your budget was appropriate to build a niche game, you can have a success... while some game that everyone merely tolerates probably makes no impact and loses money.
@FreekHoekstra
2 years ago
@@stuartconrod8364 exactly :) Better to be hated by 90%, ignored by 5% and loved by 5% than hated by 20%, ignored by 80% and loved by none. Who is going to spend money on a product they don’t love when they have so many alternatives? Plus all those haters are free press too! I think we should lean into the fans more. Look at Dark Souls: it's brutal, unforgiving and very niche, but clearly doing fine. League of Legends: unforgiving, brutal player interactions, but doing fantastically well. Counter-Strike, same thing. Yes, I do think we should keep games accessible, but not at the cost of what the fans love. I think for example what Halo Infinite is doing is great, bringing back bots to practice offline before going into the fray. It allows the multiplayer to be as cutthroat and great as it always was, not with unlockable weapons that give you an edge at the start of the round. No, everyone starts with the same weapons, and you need to earn and fight over better ones, so it's a true skill matchup. That's why it's so unforgiving to new players, but also why it's so incredibly good.
@ferinzz
2 years ago
@@FreekHoekstra it really REALLY depends on how you make money off your game. If it's some recurring revenue, then you need to retain a decent number of players. If it's a game which has interaction between users, then you need a decent player pool. If it's a one-time purchase, you can keep it mediocre across the board. If it's for e-sport publicity, you better make that as balanced as possible. Make the goals easy to understand and controls simple enough to get players pouring in. Overall, no matter the game, a larger pool of players will bring more potential spenders, and of those players only 20% of them will be providing your entire income. Money keeps a business going. So making a game for only 2 people is a ridiculous endeavor unless each piece of content is a guaranteed buy and they cannot continue into the new 'season' without making their purchases... Though if only two people are playing they'll need to be spending hundreds of thousands each time you release content. in a free to play game competition drives purchases. You need some fodder for the big spenders to show off their purchases/power to, or they have no reason to buy the newest released item/cosmetic the day it comes out.
@LoudSodaCaleb 5 years ago
His style reminds me of the professor that made me fall in love with stats.
@CineGoodog 2 years ago
I took an entire statistics course in college and I can remember almost everything he said
@summonsays2610 2 years ago
God, statistics is why I can't ever tell anyone I am sure of something. "Hey does this code work like X?" "Well, I was there during requirements gathering, I wrote the code, deployed it, and no one has changed it since. So I think so!" "Yes or no?" .... uhhhhhh
@yottawatts9470 2 years ago
I didn't even watch this but scrolled through a few times and could tell this is an amazing presentation. Will watch later bravo.
@simlife445
2 years ago
it is.. but its is not its a video on how poor ftp gamers flock because lack of money... and how to get them to spend more.... and about how bad ssd are... in 2016 but are now 40-60% cheaper per gigabyte and much much faster... bravo to skipping the description and basic computer imp in the last 5-6 years....
@yottawatts9470
2 years ago
@@simlife445 Moron alert. You confirmed that it is indeed a good presentation then went on some personal rant of the content you didn't like? I don't give a damn sheesh.
@raventhorX 6 years ago
this guy is my new idol lol.
@SomethingEternal 2 years ago
24:14 I dunno, I love my coffee black and I think that study has a point xD
@aakk100011 4 years ago
20:12 When you say Fred being right is 3%, but we are using a two-tailed test. I think the conclusion should be Orange version is different than the old version, it's either better or worse.
@dominicparker6124 2 years ago
how he answered that first question was amazing, you can see he knows his shit.
@ArneBab 2 years ago
Actually your boss wants to know how large the probability is of being wrong: that you pay more than you save. So you want the t-test of the SSDs compared to (HDD minus the time difference needed to pay for the SSDs). You’re not below 0.05 for that with your 4 runs, so your boss cannot be sure enough that she’ll be right. But that’s nitpicking and I really like your video :-)
@gabrote42 2 years ago
I always tell people that basic statistics and sourcing should be taught at age 11. Would reduce the number of no-argument-freds and would reduce the fake news plausibility rate
@andrewneedham3281 4 years ago
It was great, right up to the "always use the 2-tailed value." Tons of circumstances where it's better to use a one-sided t-test.
@davidfoley8546
2 years ago
In fact, his own first example should have been a one-tailed test.
@richardsejour7731
2 years ago
Under what circumstances would a 1 tailed test be more useful? The 2 tailed t test is more stringent and the test statistic will tell you the direction of the effect.
@andrewneedham3281
2 years ago
@@richardsejour7731 A 2 tailed test splits your significance level on both tails, so it's only half as strong as a one tailed test when showing a difference between groups IN A SPECIFIC DIRECTION. Frankly, a 2-tailed test is a sloppy but acceptable way to test, but it really shouldn't be used when you have a specific direction of difference between the groups in mind. A 1 tailed test has more power at the same alpha level. It's basically weakening your hypothesis to hedge your bets by using a 2-tailed test when you should be using one. That's why I don't like this lecture. It's a computer programmer with a SINGLE statistical tool he knows, so everything looks good to apply that tool on. It's like that old adage that if you have only a hammer, everything looks like a nail. If he were a statistician, he'd know better. But he's sitting there spouting off like he does, when in fact he's dead wrong.
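The "two-tailed splits your significance level" claim in this comment can be seen in a small worked example. Note the swap: this sketch uses an exact permutation test on the difference in means rather than the t-test from the talk, because it needs only the standard library. The function name and the build times are hypothetical.

```python
from itertools import combinations
from statistics import mean


def permutation_p_values(a, b):
    """Exact permutation test on mean(a) - mean(b).

    Returns (one_sided_p, two_sided_p); the one-sided alternative is
    "group a has the smaller mean".
    """
    pooled = list(a) + list(b)
    observed = mean(a) - mean(b)
    eps = 1e-12  # tolerance so floating-point ties count as ties
    lower = extreme = total = 0
    for idx in combinations(range(len(pooled)), len(a)):
        picked = set(idx)
        group_a = [pooled[i] for i in idx]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in picked]
        diff = mean(group_a) - mean(group_b)
        total += 1
        if diff <= observed + eps:            # at least as low as observed
            lower += 1
        if abs(diff) >= abs(observed) - eps:  # at least as far from zero
            extreme += 1
    return lower / total, extreme / total


# Hypothetical build times in minutes (not the numbers from the talk).
ssd = [8.1, 8.4, 7.9, 8.6, 8.0]
hdd = [9.0, 8.2, 9.4, 8.8, 9.1]

one_sided, two_sided = permutation_p_values(ssd, hdd)
print(f"one-sided p = {one_sided:.4f}, two-sided p = {two_sided:.4f}")
```

For these made-up numbers the one-sided p is 4/252 (about 0.016) and the two-sided p is exactly twice that, 8/252 (about 0.032). The doubling is the price of also allowing for "SSDs are slower", which is the trade-off the thread is arguing about.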
@richardsejour7731
2 years ago
@@andrewneedham3281 I didn't interpret it as such. He was trying to create an antagonistic force between him (supporting the alternate hypothesis) and the...wtf animal that was (supporting the null). He was more interested in testing the efficacy of the ssd which is tested using a 2 tail test. The 2 tailed test is more conservative in general and will give you the direction of the effect which is why it's generally more preferred. One sided tests are rarely used, and are often associated with p hacking because there is rarely a scenario to assume the directionality of effect. Yes the one tailed test gives more statistical power, but that's only if you are certain that you won't see any effect in the opposite tail, which is incredibly rare. His wording was off because he should have never assumed that the ssd can only be better or the same, when the ssd could be worse. However, his approach to use the 2 tailed test was spot on for this type of question.
@andrewneedham3281
2 years ago
@@richardsejour7731 Sure. I never said that he shouldn't use a 2-tailed test in that situation. I merely said that it's foolish to say "Always use the 2-tailed value." Edit: In science, if you have a hypothesis, your hypothesis generally has directionality to it, or you've written a piss-poor hypothesis. So, frankly, I'm often using 1-tailed tests to show that X is strictly less than/strictly greater than, on some real life data, such as, "Are female babies truly smaller than male babies?" or "Did the biodiversity index for the Upper Nooksack area truly increase due to our conservation measures?" In those cases, as a scientist trying to get published in a peer reviewed paper, I'd get laughed right out of publication for trying to use a 2-tailed test in those or many other situations where I find myself relying on statistical inference. Just saying.
@jarrakul 2 years ago
Very good talk, even if I'm kind of screaming at the use of p-values as "the chance that Fred is right." But you clearly know that, and are simplifying because p-values are confusing and don't actually measure quite what we use them to measure. Which is a good reason to switch to subjectivist statistics, but you can hardly explain how to responsibly use priors in a 30-minute talk.
@buttonasas 1 year ago
Hours played for different versions being radicalised is pretty normal and there are often very good reasons for that because games have lots of humps or steep curves or brick walls. There might be something _terribly_ wrong in the tutorial that makes x% of people just not get past that. And, honestly, I prefer 20% of players go "this is amazing" and the other "bad game" than everyone saying it was "just ok".
@julio1148 1 month ago
Great intro, but as an artist, I WISH it took a year to be Rembrandt lol Great talk too!
@stefanomaggio5109 3 years ago
pls tell me the name of the book where i can find all this shit in detail specifically applied for game cases
@xGriffy93 2 years ago
But Fred didn't hypothesise that SSDs don't make any difference to build times, he was questioning the return on investment the SSDs would bring. Or am I off the mark here?
@zacsnowbank7632
2 years ago
He needed to prove SSDs had any improvement at all first. After that he had a good idea on how much it improved, and eventually he proved Fred right. It would take too many daily builds for SSDs to be worth it. But before that, he needed to know what the difference even was, and after that he used a simple formula to see how much money it saved. Poor Fred just had some words put in his mouth to make the presentation go a little smoother at the beginning.
@donanderson3653
2 years ago
To be fair, that wasn't daily builds, it was total builds, since SSDs are a one-time investment. Getting even the lowball estimate of 210 builds out of the lifetime of the SSD is probably easily achievable, so SSDs would be a worthwhile investment.
@nlb137
2 years ago
He covered that briefly with the discussion of dev time cost and how many builds you'd have to do for the SSD to pay for itself. You have to have a null hypothesis to test, and "X isn't worth it" isn't possible, IIRC. It's been a while, but I think your test *has* to basically 'touch zero'; either x=0, x>0, etc. An "even if it does save time, does it save *enough* time" hypothesis requires a test that is basically "is x >= y" (where y is the 'threshold' where SSDs pay for themselves). It's either easier to first prove that there *is* a time difference, then calculate the 'value' of the time difference, or it's not even possible to do it the other way (or at least not with 101 statistics).
@tomasxfranco
2 years ago
@@donanderson3653 Also, SSDs can speed up OS and App boot times as well as many other tasks, so it's ignoring a lot of the other benefits they give.
@iwersonsch5131 2 years ago
23:17 That's 45 two-sided tests so you go look for p values below 0.00056. That gives you a 5% false positive rate overall, but I can tell you that you're almost guaranteed to find a true positive unless the classes are carbon copies of one another
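The arithmetic behind this comment looks like a Bonferroni correction, sketched below. Two things are assumptions here rather than statements from the commenter: that the 45 tests come from 10 classes compared pairwise, and that the 0.00056 figure is the per-tail share of the corrected threshold.

```python
from math import comb

n_classes = 10   # assumption: 45 pairings is what 10 classes give you
alpha = 0.05     # desired false-positive rate across all tests combined

n_tests = comb(n_classes, 2)   # number of distinct class-vs-class pairings
bonferroni = alpha / n_tests   # cutoff for each two-sided p-value
per_tail = bonferroni / 2      # the same cutoff expressed per tail

print(f"{n_tests} tests")
print(f"two-sided cutoff per test: {bonferroni:.5f}")
print(f"per-tail cutoff per test:  {per_tail:.5f}")
```

Under these assumptions the cutoff on a two-sided p-value is about 0.0011, and the 0.00056 quoted in the comment matches the per-tail value. If your tool already reports two-sided p-values, comparing them against 0.00056 would be stricter than Bonferroni requires.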
@droidBasher
2 years ago
That works if you want all 1 vs 1 fights to be mostly fair. Think of something like Street Fighter where you can't change your character mid-match. A rock paper scissors relationship would be fair but then if you are playing rock and the opponent is paper then the match isn't a good test of skill, the game was over at the character select screen. Depending on your context (something like Team Fortress or StarCraft) you might need to instead find the Nash equilibrium to make sure all units have their niche. But looking purely at win rates might mislead you if your player base is not playing optimally. Even if you can trust your win rate statistics, finding the Nash equilibrium is NP complete, meaning that each new character class exponentially increases the complexity of the problem. And there's probably units like the SCV where the kill death ratio is exceedingly bad but you can't win without them because their role is non-combat. Or a unit like the carrier (maybe? I'm not a pro) that isn't resource efficient but is a way to force the game to end if you are already ahead in resources and tech. If that's true and you analyze the carrier per unit, it might look overpowered, if you look at it per resource it might look underpowered, but it still has a niche. I guess that all I'm saying is that it's a hard problem, and game theory might be useful, but could still be difficult to apply if you have a game that is interestingly complex.
@alfredoeleazarorozcoquesad2988 2 years ago
Hi! Great talk, thanks!! A QUICK TIP for A/B testing! (I'm an economist) You could randomly choose who goes into the experimental/control group :) That way you don't have to switch, you just have to apply the procedure to many people once, like this:
1) New player enters
2) You generate a random number (between 0 and 1, say)
3) Is it greater than 0.5? Experimental. No? Control
4) Register their group and their target number :D
Even if they play only once (you don't need multiple rounds), you can compare the means between those groups ;) Thanks again for the talk!
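The assignment steps in this comment could be sketched roughly as follows. Everything beyond the coin flip is hypothetical: the `assign_group` helper, the simulated "minutes played" metric, and the made-up 2-minute effect are there only so the sketch runs end to end.

```python
import random
from statistics import mean


def assign_group(rng: random.Random) -> str:
    """Draw a number in [0, 1) and split at 0.5, as in steps 2 and 3."""
    return "experimental" if rng.random() > 0.5 else "control"


rng = random.Random(42)  # fixed seed only so the sketch is repeatable
results = {"experimental": [], "control": []}

for _ in range(1000):  # each loop iteration is one new player entering
    group = assign_group(rng)
    # Hypothetical target number: minutes played, with a fake +2 minute effect.
    minutes = rng.gauss(30.0, 5.0) + (2.0 if group == "experimental" else 0.0)
    results[group].append(minutes)  # register the group and the target number

difference = mean(results["experimental"]) - mean(results["control"])
print(f"experimental n = {len(results['experimental'])}")
print(f"control n      = {len(results['control'])}")
print(f"difference in means = {difference:.2f} minutes")
```

Because each player is assigned once and independently, the two lists can be fed straight into a two-sample test of the kind covered in the talk.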
@mrichards 2 years ago
Wasn’t he wrong in choosing two tailed t-test? Since he is testing whether SSDs are faster, not just that SSD load times come from a different population than HDD’s
@davidfoley8546
2 years ago
Yes.
@ArsenicDrone
2 years ago
Fair question. His reasoning was pretty sound. He would want the one-tailed t-test if it were a safe assumption that SSDs are always either faster or the same (an assumption about the underlying distribution). Making that assumption (which is a bad assumption) is not the same as being mostly interested in finding out if they are faster (which is valid, but does allow for them being slower). His test concluded that they were different distributions, and he could also see that the difference was to SSDs' benefit.
@mrichards
2 years ago
@@ArsenicDrone The boss was specifically asking if SSDs were worth it (i.e. sufficiently faster that their mean speeds come from a different, faster, population than HDD mean load speeds). Wouldn't it be a mistake to intentionally test a broader hypothesis than you require just to verify your actual, narrower hypothesis by observation at the end?
@ArsenicDrone
2 years ago
@@mrichards Ah, one of many not-so-intuitive things about statistics. It really comes down to only making the assumptions that you can justify. What the boss was interested in doesn't determine what's possible to test or what assumptions are valid. Notice that his p-value is half as large for the one-tailed test (the result is even more significant). The test got substantially more powerful, but that power doesn't come for free, it comes by making this unjustified assumption. (It's not justified because before he runs the test, he really doesn't know which outcome will happen, and it could actually be slower.)
@davidfoley8546
2 years ago
@@ArsenicDrone No, he really is mistaken. Whether or not it is a safe assumption that SSDs are always faster is actually irrelevant. What is relevant is that the hypothesis he's testing is a one-sided hypothesis--that SSDs are faster. If he had measured SSDs to be slower, by any magnitude, the hypothesis would have been rejected.
@YT775 2 years ago
@15:30 "As opposed to 20 to 22", doesn't he mean 21% instead of 22% or am I missing something?
@dezimal9143
2 years ago
If you have 20% of something... let's say all IBM shares, and you increase your holdings by 5% = now you have 22%. But when you say you have increased it by 5% percentage POINT you went from 20%=>25%.
@YT775
2 years ago
@@dezimal9143 I bamboozled myself. meant to say 21% sry. How is 5% of 20 = 2 ?
@dezimal9143
2 years ago
@@YT775 Actually it isn't 2% I didn't check the math xD. And you are right it should be 21 vs 25%.
@YT775
2 years ago
Thanks, so I guess there's no hidden meaning, it was just a minor error/inaccuracy of the speaker. :)
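The conclusion this thread reaches (21% versus 25%) is the difference between a relative increase and an increase in percentage points. A minimal check, using the 20% baseline from the comments and hypothetical variable names:

```python
baseline = 0.20  # 20% of players

relative_increase = baseline * (1 + 0.05)  # "5% more" than the baseline
point_increase = baseline + 0.05           # "5 percentage points more"

print(f"5% relative increase:  {relative_increase:.0%}")
print(f"5 percentage points:   {point_increase:.0%}")
```

The first line prints 21% and the second 25%, matching the corrected figures in the replies.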
@KHamurdik 4 years ago
I feel educated
@laureven 2 years ago
Gold on KZread :)
@jerrygreenest 2 years ago
12:37 negative less? Wait, that's more!
@GameTesterBootCamp 11 months ago
As a math dummy, this talk made my brain implode.
@tanagato3721 2 years ago
Damn, I'm not a game developer. I have never googled this topic. I just wrote down the idea of some computer game that accidentally came to mind and described the game mechanics in the note app on my android smartphone and youtube immediately recommended this video to me. Coincidence? Now I do not know whether it is good or bad...
@joshuahaag7644 4 years ago
nice
@Weckacore
6 years ago
This is probably very helpful, but just forget everything he said if you're taking a class on stats... EDIT: This does an amazing job of teaching intuition and importance, good talk
@Alex-re3qm
2 years ago
This kinda stuff is what Game Dev Tycoon is missing
@roeyshapiro4878
3 years ago
Did anyone else look at the picture of Rembrandt that he had up there and think that it looked peculiarly similar to him?
@slavskee
3 years ago
Godlike speaker
@georhodiumgeo9827
4 years ago
This makes me so happy! Great talk, I learned a lot. We had 100 barrels at work that were documented to have 50 kg in each. You could quickly tell none of them were empty and it looked like our written inventory was close. The accountant (not my boss) told me to measure all of them to see how accurate we were. I measured 8 and calculated the standard deviation. Joke's on you, I'm not going to break my back and work my ass off to learn something I already know. I'm sorry if you don't understand what I'm doing; I'll send you a Wikipedia link after I'm done.
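For anyone wondering what "measure 8 and calculate the standard deviation" buys you, here is a rough Python sketch. The eight weights are hypothetical (the comment gives no measurements), and the approach assumes the barrels were picked at random and that weights are roughly normally distributed.

```python
import math
import statistics

# Hypothetical weights (kg) of the 8 sampled barrels; NOT data from the comment.
sample = [49.6, 50.3, 50.1, 49.8, 50.4, 49.9, 50.2, 49.7]

# Two-sided 95% critical value of Student's t with 7 degrees of freedom (n - 1).
T_CRIT_DF7 = 2.365


def mean_confidence_interval(values, t_crit):
    """Return (low, high) bounds for the population mean."""
    mean = statistics.mean(values)
    # Standard error of the mean: sample standard deviation / sqrt(n).
    std_err = statistics.stdev(values) / math.sqrt(len(values))
    return mean - t_crit * std_err, mean + t_crit * std_err


low, high = mean_confidence_interval(sample, T_CRIT_DF7)
print(f"average barrel weight is plausibly between {low:.2f} and {high:.2f} kg")
```

If the documented 50 kg sits comfortably inside that interval and the interval is narrow, weighing the other 92 barrels is unlikely to change the conclusion.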
@AdrianTache
2 years ago
Statistics are a fun way to compare datasets but unfortunately sample size and methodology usually mean that whatever conclusions you draw might be completely irrelevant. And as he's saying, the more questions you ask, the more likely you are to be completely wrong.
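The "more questions you ask" point can be put in numbers. This is a small Python sketch under two simplifying assumptions that are mine, not the talk's: the tests are independent, and none of the effects being tested is real.

```python
def familywise_error_rate(alpha, num_tests):
    """Chance of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** num_tests


# With the usual 5% threshold, false alarms pile up quickly.
for k in (1, 5, 20):
    print(k, "tests ->", round(familywise_error_rate(0.05, k), 3))
```

At 20 questions the chance of at least one spurious "significant" result is well over half, which is why asking many questions of one dataset calls for a stricter threshold per question.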
@lushen952
2 years ago
Problem with your cupcake mode example. Making the game easier may have a positive impact in the short term and may have a negative impact long term. Short term statistics can only measure short term results.
@jacobb5484
2 years ago
The test was simply to determine whether difficulty had an effect on time played, in either direction, greater than the margin of error for the sample size. These are great as backup tests to ensure the results aren't just a fluke without an unreasonably large sample size.
@lushen952
2 years ago
@jacobb5484 Doesn't matter. If I'm a tester and only testing the game for 10-15 minutes, and it's too hard, I'm going to report that it's too hard. If the game gets made easier and released, and I pick it up and find that 30 mins in it's too easy, I'm going to get bored and quit. I think he oversimplifies the situation.
@jacobb5484
2 years ago
@lushen952 It's a simple example of a t-test on a paired sample. This isn't for small, engaged focus groups with detailed subjective data, but rather for big-data statistics, such as the example of a sub-mode being beta tested. In this example, the t-test gives a percentage chance of either:
A. The change had the effect of either increasing OR decreasing what's being measured by a notable amount.
B. The data is probably skewed due to bad sampling and falls within the margin of error.
Once you rule that out, you can make further changes and run detailed tests to actually make an improvement.
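As a rough illustration of the paired t-test this reply describes, here is a Python sketch using only the standard library. The play-time numbers are invented, and pairing assumes the same eight players were measured before and after the change; that may not match how the talk's example was actually set up.

```python
import math
import statistics

# Hypothetical minutes played by the same 8 players, before and after a change.
before = [30, 42, 25, 38, 51, 29, 33, 45]
after = [34, 45, 24, 43, 55, 33, 35, 50]

# Two-sided 5% critical value of Student's t with 7 degrees of freedom (n - 1).
T_CRIT_DF7 = 2.365


def paired_t_statistic(xs, ys):
    """t = mean of the per-player differences / standard error of that mean."""
    diffs = [y - x for x, y in zip(xs, ys)]
    std_err = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / std_err


t = paired_t_statistic(before, after)
# |t| above the critical value: a difference this large (in EITHER direction)
# would be unusual if the change had no effect at all.
print(f"t = {t:.2f}, significant at 5%: {abs(t) > T_CRIT_DF7}")
```

Taking the absolute value of t is what makes this a two-tailed check: it flags an increase or a decrease, and only then is it worth digging into which one and why.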
@neruba2173
2 years ago
I'll throw out a question that's out of fashion these days: how many players are having fun with my game, and are thus eager to buy anything at all at my shop?
@MrDavidCollins
2 years ago
If your game has a shop you've already failed.
@drumer960
2 years ago
@MrDavidCollins That's just objectively wrong; lots of incredibly good and fun games have shops.
@motbus3
1 year ago
With moderate power comes moderate responsibility
@QuietSnake-xs5vx
3 years ago
I understood only half....need to brush up on my probability
@brandonwilbur2146
2 years ago
Okay YouTube recommendations, I clicked it.
@nimm90
2 years ago
I still have no idea how Fred is not convinced with an upgrade that generates $34 per 100 players of profit.
@gabrieldta
3 years ago
Sony is following the monocle example right now by giving 10 USD credit to random accounts. Fo'sure that's Sony's ulterior motive: measure how much more likely people are to engage with the store and (if they're lucky) top up that 10 USD to buy more expensive games... =)
@garryiglesias4074
2 years ago
14:36 - Historical and linguistic horror: Sans-culottes MEANT with pants...
@Parker--
2 years ago
17:27 Watching him shit on gullible health journalists in the COVID timeline. It's like he knew.
@Hlkpf
6 years ago
super cute!
@Preaplanes
2 years ago
Guy dismissed me in the first 21 seconds. Won't pretend I'm not tempted to continue watching. Statistics as a science (rather than bad statistics as a political tool) is the only kind of math I can say I greatly enjoy.
@IntrusiveThot420
2 years ago
Any presentation that's got a 538 joke in there is a good presentation
@andrewcamden
2 years ago
More often than not the data you do NOT have is more important than the data you do have. For instance, I and probably millions of other people didn't buy Dead Space 3 BECAUSE it was infested with microtransactions. There is no data for that though, since a lost sale literally doesn't show up on the balance sheet. Game devs who decide NOT to "leave money on the table" by adding microtransactions, rather than making real games without them, are actually leaving a great deal of money on the table in lost sales for which they don't have any data. Game devs need leadership, empathy (essential for understanding customers even if you have no moral concerns whatsoever) and common sense to make good decisions. There isn't any amount of data that can substitute for these attributes.
@frostknight7687
2 years ago
Send this to Chris
@Brodysseus113
2 years ago
Something I'd like to add to the graph at 19:00: the blue analytics are healthier because they produced a stronger reaction. Those are the people who are willing to put money into your game.
@lmartinson6963
2 years ago
I'm pretty sure people who drink their coffee black being sociopathic is entirely factual
@Daniels2l
3 years ago
If I were this guy's boss I'd be like, OK, if I buy the f*ck*ng SSDs will you shut up?!!
@Intrexa
2 years ago
"The relative risk of somebody in control group b buying pants.." Relative risk of buying pants? What are you making these pants out of?
@tach5884
2 years ago
Kryptonite.
@PoppyGaming43
2 years ago
YouTube: *recommends me this video* me, who's literally never gonna use any of this: *interesting*
@poyi1013
2 years ago
I’m more confused after the video~~~
@CraigNicoll
6 years ago
Holy sh*$ a math talk I could ACTUALLY follow! BEST math GDC talk of all time. *awards him Golden YouTube User ClickedCookie*
@anonymoususer3561
2 years ago
This could have been half as long, probably
@yungthunder2681
2 years ago
If you're a game developer, and didn't take AP statistics, please tell me how you became a game developer?
@jacobb5484
2 years ago
lots of practice by making mods, level design, digital modeling, etc.?
@MrDavidCollins
2 years ago
I took statistics and didn't become a game developer (at a company, I just make it all myself now). College costs too much
@Skronkful
2 years ago
The only thing that made me cringe was when he said people should ignore the one-sided p-value, when his example (and most things you'd want to test in real life) is a one-sided hypothesis. It's not necessarily that we assume/know that SSDs are faster; it's that if we find that SSDs are significantly slower, we shouldn't be rejecting the null hypothesis. He is actually doing a test of size 2.5% instead of 5%.
@f.p.5410
2 years ago
If only that was the only cringe part... His explanation of p-values is statistical illiteracy 101. I was really surprised when I heard him making that mistake, interpreting p-values as Pr(H0). I thought that this statistical concept entered pop culture (kind of like "correlation is not causation" already did)... Amazing that people like him have the confidence to give talks on statistics.
@richardsejour7731
2 years ago
You can tell the direction and magnitude of the effect using a two-tailed t-test. The limitation of a one-tailed t-test is that it is less stringent, and in most cases you don't have any justification to assume that an effect is greater or lower.
@richardsejour7731
2 years ago
@f.p.5410 What did he say wrong? I missed it.
@f.p.5410
2 years ago
@richardsejour7731 He always says that the p-value is the probability of the boss (can't remember the name) being right. In other words, the probability of H0 being true. So that p
@richardsejour7731
2 years ago
@f.p.5410 I see what you mean. In a lot of schools and even in some online sources, the p-value is described as the probability of incorrectly rejecting your null hypothesis, which some people can misconstrue as "proving" whether the null is correct or not. In practice, you can never really prove a null, or really any hypothesis. I think that his attempt at making the presentation lighthearted muddied the waters a little, but the general premise is still there. Speaking of which, I was sloppy in my wording. The two-sided test is more conservative since the cutoff is 0.025 in either tail instead of 0.05 in one direction. Usually, when a two-tailed test is statistically significant, a one-tailed test is significant in that direction, but there are many cases where a one-tailed test is significant and a two-tailed test is not. There are very few circumstances in which you can assume directionality before doing a statistical test, which is why one-sided tests are so rare. Most researchers will just do a two-sided test to be conservative and then calculate effect sizes, and will consider one-sided tests to be p-hacking. For the research question that he was asking (the efficacy of the SSD) he can't assume that the SSD can only be better or the same, when it's entirely possible that it could be worse. From his presentation of the question, he understands this, but used sloppy wording to make things simple. Regardless, the two-tailed test is the best approach here.
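The one-tailed versus two-tailed relationship argued over in this thread can be sketched in a few lines of Python. As a simplification of mine, this uses the normal distribution (a z-test) rather than the t-test from the talk, and the test statistic of 1.8 is a made-up value chosen to land between the two cutoffs.

```python
import math


def one_tailed_p(z):
    """P(Z >= z) for a standard normal: tests for an effect in the + direction only."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def two_tailed_p(z):
    """P(|Z| >= |z|): tests for an effect in either direction."""
    return math.erfc(abs(z) / math.sqrt(2))


z = 1.8  # hypothetical test statistic, effect in the hoped-for direction
print(f"one-tailed p = {one_tailed_p(z):.4f}")
print(f"two-tailed p = {two_tailed_p(z):.4f}")

# Same size of effect, but in the "wrong" direction: the one-tailed test can
# never flag it, while the two-tailed p-value is unchanged.
print(f"one-tailed p for z = -1.8: {one_tailed_p(-z):.4f}")
```

When the effect is in the hypothesized direction the one-tailed p-value is exactly half the two-tailed one, so a result like this clears 0.05 one-tailed and misses it two-tailed. That halving is the "free" power the earlier replies are wary of.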
@tomasxfranco
2 years ago
Not that it's the point of the presentation, but this misses the other marginal benefits of working on SSDs all the time, not just in builds. Additionally, if build time doesn't change when moving to SSDs, then the bottleneck is elsewhere and could be tackled via a different component or algorithmic improvement.
@simlife445
2 years ago
Or that this is a 5-year-old GDC session (read the description), so this data is insanely old; SSDs are 60% cheaper per gig and much faster.
@13b78rug5h
2 years ago
Yeah, and long build times are actually one of the biggest blockers to CI/CD, the lack of which is usually the best indicator of long lead time, which in turn is the best indicator of slow development: more resources trapped inside the system, more bugs, less feedback, less data, less experimentation and less revenue. Overall that means slower delivery and a lower-quality product, and/or more resources required to deliver. And in the end you should not generally build on your local machine but do it automatically on a build server.
@mano_lamancha4716
2 years ago
The question posed in the thumbnail says everything about why video game quality has plummeted in the past decade.
@crg78lf
2 years ago
Don't forget: if you buy an SSD to improve build time, make sure to put your swap on the SSD. If you don't have a lot of RAM, the extra memory used by the compiler/linker will then go on the SSD as well, drastically improving your build time.
@jbluepolarbear
2 years ago
I think the hard drive example falls flat. Having no data other than build times doesn't show anything of importance. Analyzing the stages of the build would be more beneficial. It may have shown results that point to an issue in a build step as opposed to HDD vs SSD performance.
@sirnathan8417
6 years ago
Arms lol
@mayonaise000
2 years ago
Stats are my anti-drug
@13b78rug5h
2 years ago
Time isn't the only thing you save with faster builds. The longer a build takes, the less often you make builds, which increases your lead time from feature idea to working feature, trapping value inside the system and slowing down the feedback of data or, in some cases, revenue. Also, slow opening of your project and files or whatever decreases developer productivity and gets on their nerves. But in the end, all this is a false dichotomy, as you should have a build server that does all the builds automatically and not rely on local manual builds. Continuous integration and delivery are a cornerstone of all high-performing engineering cultures for a damn good reason.