Jes Ford - Getting Started Testing in Data Science - PyCon 2019

"Speaker: Jes Ford
How do you know if your data science results are correct? Robust software usually has tests asserting that certain conditions hold, but as a data scientist it’s often not straightforward or obvious how to integrate these best practices. Our workflow includes exploration, statistical models, and one-off analysis. This talk will give concrete examples of when and how testing should play a role, and provide you with enough introduction to get started writing your first data science tests using `pytest` & `hypothesis`.
Slides can be found at: speakerdeck.com/pycon2019 and github.com/PyCon/2019-slides"
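The description mentions `pytest` and `hypothesis`. Here is a rough sketch of what a first data science test file might contain. It has two example-based tests and one property-style test. The helper `min_max_scale`, the test names, and the file name `test_scaling.py` are hypothetical and are not taken from the talk.

To stay dependency-free, the property test uses a seeded `random.Random` as a hand-rolled stand-in for Hypothesis. Hypothesis would generate the inputs for you (including pandas DataFrames via `hypothesis.extra.pandas`) and shrink any failing case.

```python
import random


def min_max_scale(values):
    """Rescale a non-empty list of numbers to the range [0, 1].

    Hypothetical helper used only to illustrate testing; not from the talk.
    """
    if not values:
        raise ValueError("values must be non-empty")
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: map everything to 0.0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def test_known_example():
    # Example-based test: a small, hand-checked input and output.
    assert min_max_scale([2, 4, 6]) == [0.0, 0.5, 1.0]


def test_constant_input():
    # Edge case: data with no spread.
    assert min_max_scale([3, 3, 3]) == [0.0, 0.0, 0.0]


def test_properties_on_random_inputs():
    # Property-style test: check invariants on many generated inputs.
    # A seeded generator keeps any failure reproducible.
    rng = random.Random(0)
    for _ in range(200):
        n = rng.randint(1, 50)
        values = [rng.uniform(-1e6, 1e6) for _ in range(n)]
        scaled = min_max_scale(values)
        assert len(scaled) == len(values)
        assert all(0.0 <= s <= 1.0 for s in scaled)
```

If this were saved as `test_scaling.py`, running `pytest` in that directory would collect and run the three `test_*` functions.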

Comments: 9

  • @anantharamaniyer9135
    2 years ago

    Very well presented. Many thanks for presenting this, especially the section on testing dataframes, which was clear and succinct.

  • @lalligood
    5 years ago

    This talk had me pumped when I got back to work after watching Jes Ford demonstrate using the Hypothesis library to generate a pandas DataFrame for testing. I also think that bringing testing practices to folks doing data science is long overdue. What a fantastic presentation!

  • @erectlocution
    5 years ago

    Fantastic presentation. Not only is it a nice introduction to testing in general, and then to specific libraries, it's also a nice peek at a couple of practical analytical methodologies.

  • @mailsiraj
    3 years ago

    Fantastic and very practical presentation. I really loved the division of work into three buckets (one-off, exploratory, and defined work), with a slightly different testing strategy for each, rather than being pedantic about testing. I learnt a number of useful ideas to improve my pandas testing. I am going to check out the Hypothesis library.

  • @Nino234mff
    5 years ago

    An excellent talk! Jes worded it differently, but I think she would agree with me: in many situations data scientists do two jobs, science and engineering. I practice TDD for the engineering and quick defensive programming for the science. As Jes noted, the science part is too exploratory; it's just not well suited to TDD. However, you will thank yourself if you write tests for the engineering part, such as feature engineering.

  • @orianabaldizan8209
    5 years ago

    Please don't use `assert` in production code. Use try/except and handle exceptions properly.

  • @user-nm6ns2cf6o
    3 years ago

    Could you explain the pros and cons of your position?

  • @yangyu7309
    3 years ago

    @user-nm6ns2cf6o Not OP, but if you run `python -O script.py`, Python will skip all assert statements. I believe some CI software runs Python in -O mode. -O stands for "optimize". Asserts can also be disabled by setting the PYTHONOPTIMIZE environment variable. Because assert can be turned off globally, it is not recommended to use `assert` outside of testing. With `assert` you also can't choose the exception type, since a failed assert always raises AssertionError. stackoverflow.com/questions/40182944/difference-between-raise-try-and-assert
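The point about `-O` can be checked directly. This minimal sketch runs the same two validators in a child interpreter, once without `-O` and once with it. The names `validate_with_assert`, `validate_with_raise`, and `InvalidProbabilityError` are hypothetical.

With `-O`, the `assert` is compiled out and the invalid value 1.5 is accepted. The explicit `raise` of a custom exception still fires and is caught with `try`/`except`.

```python
import os
import subprocess
import sys

# Script run in a child interpreter. The class and function names are
# hypothetical, chosen only to illustrate assert vs. an explicit raise.
SCRIPT = """
class InvalidProbabilityError(ValueError):
    pass

def validate_with_assert(p):
    # This statement is removed entirely when Python runs with -O.
    assert 0.0 <= p <= 1.0, "p must be in [0, 1]"
    return p

def validate_with_raise(p):
    # An explicit check still runs under -O.
    if not 0.0 <= p <= 1.0:
        raise InvalidProbabilityError(f"p must be in [0, 1], got {p}")
    return p

for check in (validate_with_assert, validate_with_raise):
    try:
        check(1.5)
        print(check.__name__, "accepted 1.5")
    except InvalidProbabilityError:
        print(check.__name__, "raised InvalidProbabilityError")
    except AssertionError:
        print(check.__name__, "raised AssertionError")
"""


def run(optimize):
    """Run SCRIPT in a fresh interpreter (optionally with -O); return stdout lines."""
    env = dict(os.environ)
    env.pop("PYTHONOPTIMIZE", None)  # keep the "normal" run genuinely unoptimized
    cmd = [sys.executable] + (["-O"] if optimize else []) + ["-c", SCRIPT]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True, env=env)
    return result.stdout.splitlines()


if __name__ == "__main__":
    print("normal: ", run(optimize=False))
    print("with -O:", run(optimize=True))
```

Without `-O`, both validators reject 1.5. With `-O`, only the explicit `raise` does. Subclassing `ValueError` also lets callers catch this specific failure by type, which a bare `assert` cannot offer because it always raises `AssertionError`.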
