Will Polars replace Pandas for Data Science?

Science and Technology

Polars is a blazingly fast alternative to pandas for working with data in Python. I couldn't believe the speed difference.
#python #datascience #dataframe

Comments: 187

  • @shlokbhakta2893
    @shlokbhakta2893 1 year ago

    Python devs will use anything but python to make python faster lol

  • @samueljehanno

    @samueljehanno 1 year ago

    Lmao

  • @bernardcrnkovic3769

    @bernardcrnkovic3769 1 year ago

    so what? that is the point of python, to be a pretty wrapper around optimized components :D

  • @lukaswalker2342

    @lukaswalker2342 1 year ago

    @@bernardcrnkovic3769 exactly that

  • @grantpeterson2524

    @grantpeterson2524 1 year ago

    Uh, yeah, exactly. Why is that a bad thing? Python is a lot faster to write; C/C++/Rust (or any compiled language) is faster to run. Most of the time, when I profile, 5% of my code takes up 95% of the runtime. Rewriting that 5% in Rust or C lets me have my cake and eat it too.
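
A minimal sketch of the profile-first workflow this comment describes, using only the standard library's `cProfile` and `pstats`. The functions `cheap_setup` and `hot_loop` and the workload are invented for illustration; the point is only that the report shows which function dominates before anything gets rewritten in a compiled language.

```python
import cProfile
import io
import pstats


def cheap_setup(n):
    # Stands in for the bulk of a program that costs very little time.
    return list(range(n))


def hot_loop(values):
    # Deliberately slow pure-Python loop: the small part that dominates runtime.
    total = 0
    for v in values:
        for k in range(50):
            total += (v * k) % 7
    return total


def main():
    values = cheap_setup(20_000)
    return hot_loop(values)


def profile_report(func, top=5):
    # Run func under cProfile and return the top entries by cumulative time.
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return buffer.getvalue()


report = profile_report(main)
print(report)
```

In a real project the function that tops this report is the candidate for a rewrite; everything else can stay in plain Python.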

  • @shlokbhakta2893

    @shlokbhakta2893 1 year ago

    @@grantpeterson2524 it’s not a bad thing, just a funny joke because it’s ironic lol

  • @Kim-re7hs
    @Kim-re7hs 1 year ago

    Polars performance benchmarks are great + developing roadmap looks promising. Looking forward to your upcoming Polars series 👍

  • @robmulla

    @robmulla 1 year ago

    Coming soon! Thanks for watching.

  • @chrstfer2452

    @chrstfer2452 7 months ago

    @@robmulla where's that polars series?

  • @abh1yan
    @abh1yan 1 year ago

    Rust is taking over everything, guys.

  • @robmulla

    @robmulla 1 year ago

    True

  • @incremental_failure

    @incremental_failure 1 year ago

    You could say, everything is rusty. Badum-tssh.

  • @Diabolic9595

    @Diabolic9595 1 year ago

    Funny joke mate!

  • @_Moonlight_22
    @_Moonlight_22 1 year ago

    Python really does include the whole North Pole 😂

  • @robmulla

    @robmulla 1 year ago

    And all the bears!

  • @user-myraklejnr

    @user-myraklejnr 7 months ago

    Santa here we come😂😂😂😂😂😂😂😂

  • @jeanchindeko5477
    @jeanchindeko5477 1 year ago

    For Polars to replace pandas, they have to up their game in terms of integration. pandas is the de facto library for data engineering and data science in Python, meaning tons of other libraries are integrated with it (SQLAlchemy, PySpark, Arrow, scikit-learn, matplotlib, etc.); basically every Python data engineering and data science library has some integration with pandas. And you also have to count all the people who know pandas and are working on making it faster with vectorisation.

  • @robmulla

    @robmulla 1 year ago

    All great points. I think it will take time. But it works well for what it was designed to do.

  • @adrianjdelgado

    @adrianjdelgado 1 year ago

    Polars also uses vectorization and has a quick and easy way to transform a polars dataframe to a pandas dataframe. In some benchmarks, it is faster to create a pandas dataframe via polars than using pandas directly.

  • @chrstfer2452

    @chrstfer2452 9 months ago

    It'll take rewriting their interface to match pandas' interface. Then it'd pretty much be a drop-in replacement. Edit: having worked with it, I still think this, but the Polars interface is better, so I'm thinking it should be a LazyFrame/DataFrame.pdcompat-type module.

  • @ctm92

    @ctm92 4 months ago

    Polars needs to be a drop-in replacement for pandas to be used in the field. Data scientists know how to use pandas, and switching over to something else has a steep learning curve that might not be worth it, especially for a new project.

  • @peteredwards7680
    @peteredwards7680 1 year ago

    mfw it's the 100th time today that I've seen something "designed for speed from the ground up, in Rust".

  • @robmulla

    @robmulla 1 year ago

    What were the other 99? I want to know.

  • @mannycalavera121
    @mannycalavera121 1 year ago

    Rust allows C-like speed without the decades of experience required to write safe and optimised C code.

  • @robmulla

    @robmulla 1 year ago

    This is an interesting take! I don't know that much about coding in C or Rust but I didn't know that was one of the benefits of Rust.

  • @mickolesmana5899
    @mickolesmana5899 1 year ago

    Waiting for a faster version of GeoPandas; sjoin-ing 100+ rows already takes long enough.

  • @robmulla

    @robmulla 1 year ago

    I’ve only done a little work with geopandas but noticed it was slow too.

  • @AgnaldoC

    @AgnaldoC 1 year ago

    Geo pandas-dask

  • @lequedicatsamarge4228
    @lequedicatsamarge4228 5 months ago

    I used to be a die-hard pandas-user and just recently switched to polars - I am not going back. It's not just speed, it's data types (ok pandas 2.0 has made huge progress here), syntax, and the kind of no-bullshit-fuckarounds with indices. I fell in love with polars, especially with the now available api to hvplot

  • @JordiRosell
    @JordiRosell 1 year ago

    I hope so. Not only for speed, but for code cleanliness.

  • @robmulla

    @robmulla 1 year ago

    You like polars code style better? I don't know how I feel about all the `pl.col()` it needs.

  • @JordiRosell

    @JordiRosell 1 year ago

    @@robmulla I think it helps writing in more chained style. I agree that pl.col isn't great. I prefer to use col importing it, but it's not ideal.

  • @avinashthakur80
    @avinashthakur80 1 year ago

    Another library which is blazingly fast because of Rust.

  • @robmulla

    @robmulla 1 year ago

    What other ones are?

  • @LeNguyen-yj9ol

    @LeNguyen-yj9ol 1 year ago

    Ruff 😊

  • @cloinca_rpe11

    @cloinca_rpe11 1 year ago

    WhiteBox Tools as well if you do spatial analysis

  • @LordPompinchu666

    @LordPompinchu666 1 year ago

    Another library fast in Rust because people never cared to learn C and spoil the shit out of performance. Try to run reverse square root in C vs Rust. You'll face the hard truth: modern programmers are way worse than the older ones, when performance mattered. Doom runs on my fridge. Try to run Rust on your coffee machine... good luck

  • @adrianjdelgado

    @adrianjdelgado 1 year ago

    @@LordPompinchu666 in the specific case of Polars, one of the main reasons it is faster is multithreading. Rust catches a lot of potential bugs in that realm at compile time. Rust makes writing a multithreaded version of pandas feasible. Doing it in C would be a minefield.

  • @eyadamin4089
    @eyadamin4089 1 year ago

    Do you think it will replace pandas? And does it have the same options as pandas?

  • @robmulla

    @robmulla 1 year ago

    Great question. For some tasks I think it will. It still lacks some functionality like native plotting. Look out for a full length video I’m going to be making about polars soon.

  • @eyadamin4089

    @eyadamin4089 1 year ago

    @robmulla Waiting for it, all of your videos and lives are very helpful and interesting tho

  • @Linkario86
    @Linkario86 1 year ago

    Polars it is then. I'm relatively new and use Jupyter Notebook but I assume I can just import Polars like Pandas as shown in the video?

  • @robmulla

    @robmulla 1 year ago

    Yes, I have a longer video where I review polars on my channel and explain. Check it out here: kzread.info/dash/bejne/iHyl0JmulszPnKg.html

  • @Linkario86

    @Linkario86 1 year ago

    @@robmulla thanks!

  • @thebosscrystal
    @thebosscrystal 1 year ago

    Is that an extension that shows the running block and its time? (Not the timeit)

  • @KevinWeatherwalks
    @KevinWeatherwalks 1 year ago

    How much does the loading of the data contribute to the time?

  • @Neura1net
    @Neura1net 7 months ago

    They should have used the syntax of pandas

  • @robmulla

    @robmulla 7 months ago

    I think they purposefully wanted to be different. There are already a lot of pandas alternatives that don't work too great. Polars is its own thing entirely.

  • @chrism6880
    @chrism6880 1 year ago

    Ugh I am literally 2/3 of the way through refactoring an old project created by a former contractor where I replaced his list and dict comprehension with pandas...guess I gotta refactor my refactor.

  • @robmulla

    @robmulla 1 year ago

    Is the main goal of the project speed? If so dict and lists are going to be hard to beat. If not then pandas should be sufficient.

  • @chrism6880

    @chrism6880 1 year ago

    @Rob Mulla the project compares very large datasets. Since pandas has a NumPy backend implemented in C, many of the operations are orders of magnitude faster than using dicts.

  • @giagoskapetanakis6033
    @giagoskapetanakis6033 7 months ago

    What ide is this?

  • @DeebzFromThe90s
    @DeebzFromThe90s 1 year ago

    Alright, there are way too many options floating around right now. I spent the last week letting a modest gaming rig run 24/7 to convert a bunch of SAS7BDAT files into Parquet files because the pyreadstat multithreaded reading in chunks didn't work as expected. For that same dataset, which is several hundred GBs on disk, I have to do some data wrangling, and I'm growing ill at the thought of how long it would take pandas to loop through it. Now I either risk learning Dask, Polars, or maybe even SQLite only to not get the desired results at a suitable speed, or stick to pandas. Thoughts?

  • @robmulla

    @robmulla 1 year ago

    I agree, it's hard to say what the best option is right now. I think the main questions I ask myself are: how fast do I need it to run? And can I do my computation on a single machine in local memory? The choice really depends on the answers to those questions.

  • @maskedvillainai
    @maskedvillainai 5 months ago

    I love how everyone is lightning faster than the other lightning faster framework lol

  • @fluffyflextail
    @fluffyflextail 7 months ago

    Only ones I know, are bidirectionally opposed from each other and stored in one source object

  • @dhaval1489
    @dhaval1489 1 year ago

    I use Polars more than pandas. Polars syntax is much simpler and way faster.

  • @robmulla

    @robmulla 1 year ago

    Nice! I still can't fully move away from pandas, but polars for major data pipelines for sure!

  • @dhaval1489

    @dhaval1489 1 year ago

    @@robmulla me neither. The pandas ecosystem is much larger and more mature, and you can always convert a Polars DataFrame to pandas and vice versa, so at the end of the day whatever gets the job done efficiently should be used.

  • @Xarxes104
    @Xarxes104 7 months ago

    Does it matter if you're just running the code once?

  • @alexandrodisla6285
    @alexandrodisla6285 1 year ago

    Polars can work with pandas beautifully!

  • @syukcode
    @syukcode 7 months ago

    You are using Python 3.8.5, what if you use Python 3.11?

  • @naseva9319
    @naseva9319 1 year ago

    For a noob like me, it takes 10 minutes just to import more than 20 modules before actually writing some functions.

  • @robmulla

    @robmulla 1 year ago

    I can relate. Copy/paste can save some time though if you do it a lot.

  • @shivamjha5202
    @shivamjha5202 5 months ago

    Nice 👍

  • @primary4075
    @primary4075 1 year ago

    In my uni, I'm still using pandas for data science. Not that much different I think for now

  • @orlandogarcia885
    @orlandogarcia885 7 months ago

    What about the newer versions of pandas? Especially since 2.0, has its speed increased?

  • @priyadarshanmohanty277
    @priyadarshanmohanty277 1 year ago

    Does it have integration with snowflake?

  • @robmulla

    @robmulla 1 year ago

    Not sure. Good question.

  • @sakatagintoki8835

    @sakatagintoki8835 6 months ago

    Well, Snowflake has a Python API, so you can use it to load data after processing it with Polars.

  • @sourajitpaul9064
    @sourajitpaul9064 3 months ago

    Why not use PySpark instead of Polars or pandas?

  • @camus83489
    @camus83489 1 year ago

    interesting wondering how this compares to say pyspark and Cudf

  • @robmulla

    @robmulla 1 year ago

    Probably depends on the dataset. My understanding is that Polars works well for operations in a single machine's memory, PySpark is more for distributing across many nodes, and cuDF is fast if your data can fit into GPU memory.

  • @camus83489

    @camus83489 1 year ago

    @@robmulla ahh cool, interesting, so this Polars thing is probably the best way to speed up data wrangling on a single computer (for at-home hobbyists). Another interesting thing would be df.apply(lambda x: etc) operations: how quickly can Polars iterate through a dataset? I think that would be a huge game changer.

  • @brandonrich4956
    @brandonrich4956 1 year ago

    Eventually people will realize that they save more time by just moving to 100% Julia instead of wasting all this time building everything in 1 language to execute it in another.

  • @robmulla

    @robmulla 1 year ago

    I guess every language is popular in its own way. I've never learned Julia.

  • @evanshlom1

    @evanshlom1 7 months ago

    Or you do it all in rust which is better than Julia

  • @PythonPlusPlus
    @PythonPlusPlus 5 months ago

    So Polars is like Pandas, but cooler? (pun intended)

  • @soffwhere
    @soffwhere 1 year ago

    Super useful

  • @robmulla

    @robmulla 1 year ago

    Thanks! Glad you found it useful.

  • @sw11500
    @sw11500 1 year ago

    What editor is this?

  • @robmulla

    @robmulla 1 year ago

    Vscode with the jupyter extension.

  • @geekyprogrammer4831
    @geekyprogrammer4831 1 year ago

    I was using dask earlier

  • @robmulla

    @robmulla 1 year ago

    Nice! You should try out polars too.

  • @aakashkhamaru9403
    @aakashkhamaru9403 1 year ago

    What ide do you use?

  • @jamesn6458

    @jamesn6458 1 year ago

    Looks like Visual Studio Code

  • @ElinLiu0823
    @ElinLiu0823 1 year ago

    I'd rather use cuDF if a GPU is available on the system; otherwise I'll use Polars.

  • @robmulla

    @robmulla 1 year ago

    Still need to do more testing with cudf. But it’s fast for sure.

  • @8koi245
    @8koi245 1 year ago

    BLAZINGLY FAST

  • @robmulla

    @robmulla 1 year ago

    🔥 🚗 🔥

  • @dung-olymzeus
    @dung-olymzeus 1 year ago

    where u get the dataset

  • @robmulla

    @robmulla 1 year ago

    Here you go: www.kaggle.com/datasets/robikscube/flight-delay-dataset-20182022

  • @dung-olymzeus

    @dung-olymzeus 1 year ago

    @@robmulla thanks

  • @tadamacky
    @tadamacky 1 year ago

    What ide is this or notebook or something

  • @skelaw

    @skelaw 1 year ago

    vsc with jupyter extension

  • @robmulla

    @robmulla 1 year ago

    Yes! VSCode and jupyter

  • @gc1979o
    @gc1979o 1 year ago

    What about dask ?

  • @robmulla

    @robmulla 1 year ago

    I have an entire video that compares dask, modin, and vaex. Check it out here: kzread.info/dash/bejne/fnmcr7Ohc9mZe8o.html

  • @andreichalapco1446
    @andreichalapco1446 1 year ago

    How many cores in pandas, how many cores in Polars?

  • @robmulla

    @robmulla 1 year ago

    Depends on the function you are running. Some pandas functions don't run multithreaded but others do. Polars is completely multithreaded I believe.

  • @user-fv1576
    @user-fv1576 13 days ago

    Now compare to MySQL and sql

  • @HaunterButIhadNameGagWtf
    @HaunterButIhadNameGagWtf 1 year ago

    What's the HW?

  • @robmulla

    @robmulla 1 year ago

    I show all my hardware in my setup video. But I have a ryzen threadripper with a lot of cores.

  • @HaunterButIhadNameGagWtf

    @HaunterButIhadNameGagWtf 1 year ago

    @@robmulla thx. Why isn't it computed on the GPU? Or can data mining tasks not be accelerated that way, just the neural networks themselves? I am a beginner and bought an RTX 3060 12GB for basic tasks. Do you have a link to that video, please?

  • @robmulla

    @robmulla 1 year ago

    @@HaunterButIhadNameGagWtf You should try out cudf if you want to process on a GPU. It is really fast but requires enough GPU memory for your data.

  • @bocchitherock-ob2bl
    @bocchitherock-ob2bl 6 months ago

    a minute of silence for those rustaceans who will use this as an excuse to say Rust is faster than C. (saying that as someone who loves Rust btw)

  • @sitrakaforler8696
    @sitrakaforler8696 1 year ago

    Dam 😮

  • @robmulla

    @robmulla 1 year ago

    Yea. Pretty crazy. Am I right?

  • @eugenex8892
    @eugenex8892 1 year ago

    Knowledge of pure SQL is much more effective...

  • @robmulla

    @robmulla 1 year ago

    Could be true but also depends on what you’re trying to accomplish.

  • @AndoroidP
    @AndoroidP 1 year ago

    Just use Rust from the ground up. It's not that hard

  • @robmulla

    @robmulla 1 year ago

    I tried. It was hard. 😂

  • @theLowestPointInMyLife

    @theLowestPointInMyLife 1 year ago

    Rust is terrible for most things

  • @bhavyakukkar

    @bhavyakukkar 7 months ago

    also terrible for everything until you get the hang of it

  • @code2compass
    @code2compass 8 months ago

    Damn

  • @panda_dva2261
    @panda_dva2261 1 year ago

    Real Data Scientists will wait 10 hours for their Refresh Data in excel. Patience is virtue. All these new scallywags with their tik toks and 5 second attention spans looking for the fastest thing possible.

  • @robmulla

    @robmulla 1 year ago

    Patience is a virtue!

  • @bhavyakukkar

    @bhavyakukkar 7 months ago

    patience is virtue = help i can't keep up

  • @bigkatoan5076
    @bigkatoan5076 7 months ago

    Actually, 2.8 s and 600 ms are still the same, because it's one click either way :))

  • @throwaway6288
    @throwaway6288 1 year ago

    Wow 4 times faster!!! 😐

  • @robmulla

    @robmulla 1 year ago

    I take it you’re not impressed…

  • @NeArMe.
    @NeArMe. 1 year ago

    Still new and learning about pandas 😢

  • @robmulla

    @robmulla 1 year ago

    We all have to start somewhere!

  • @jaybestemployee
    @jaybestemployee 7 months ago

    Yeah, learn a new library to save seconds at a time.

  • @donaldli4755
    @donaldli4755 1 year ago

    Short story: no

  • @robmulla

    @robmulla 1 year ago

    But also: maybe?

  • @blackpilledbuddha4944
    @blackpilledbuddha4944 1 year ago

    Will my boss pay me 4 times as much...

  • @robmulla

    @robmulla 1 year ago

    Guaranteed!

  • @johannesmphaka7433
    @johannesmphaka7433 1 year ago

    Vaex, is much better. Have a look at it.

  • @robmulla

    @robmulla 1 year ago

    I made a video about it already. Check my channel for the video about pandas alternatives

  • @johannesmphaka7433

    @johannesmphaka7433 1 year ago

    @@robmulla Thanks.

  • @stevenpaulsen5975
    @stevenpaulsen5975 1 year ago

    y’all could do this in excel with fast results 😭

  • @robmulla

    @robmulla 1 year ago

    nooooooooooooooooooooooooooooooooo 😂

  • @pineapple3832
    @pineapple3832 1 year ago

    why do people use pandas or polars when sql exists?

  • @robmulla

    @robmulla 1 year ago

    There are a few situations where it might be more appropriate to use pandas over SQL for a particular task. When working with small or medium-sized datasets: pandas is generally faster and more convenient than SQL here, especially if the data is already in a structured format (such as a CSV file). When the data you are working with is not stored in a database. When you need to perform complex data manipulation tasks: pandas provides a wide range of functions and methods for manipulating and summarizing data, which is particularly useful for tasks that would be difficult or time-consuming to accomplish using SQL alone.
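
To make the comparison concrete, here is the same aggregation done both ways: once with a pandas `groupby` in memory, once as a SQL query against an in-memory SQLite database. This is a sketch that assumes pandas is installed; the `sales` table and its values are invented for illustration.

```python
import sqlite3

import pandas as pd

# Hypothetical sales data, invented for illustration.
df = pd.DataFrame(
    {"region": ["east", "west", "east", "west"], "amount": [100, 250, 50, 150]}
)

# pandas: aggregate directly on the in-memory DataFrame.
pandas_totals = df.groupby("region")["amount"].sum().to_dict()

# SQL: load the same rows into SQLite and let the database aggregate.
conn = sqlite3.connect(":memory:")
df.to_sql("sales", conn, index=False)
sql_result = pd.read_sql_query(
    "SELECT region, SUM(amount) AS amount FROM sales GROUP BY region", conn
)
conn.close()
sql_totals = dict(zip(sql_result["region"], sql_result["amount"]))

print(pandas_totals)
print(sql_totals)
```

For a simple group-by the two are interchangeable; the difference shows up when the data already lives in a database (SQL avoids moving it) or when the next step is plotting or model fitting (pandas already has it in memory).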

  • @pineapple3832

    @pineapple3832 1 year ago

    @@robmulla yeah okay that all sounds like it makes sense. So basically, SQL is for really large datasets, data that's already in a database, and there's certain "complex manipulation tasks" that can be done in pandas and not sql.

  • @robmulla

    @robmulla 1 year ago

    @@pineapple3832 You got it. Also data exploration can be much easier when working with the data in the computer’s memory. Check my EDA video for some examples.

  • @Jacob-bn1nj

    @Jacob-bn1nj 1 year ago

    @@robmulla How would you compare the usefulness to R? I'm currently in college and have taken a couple of courses using primarily R, and I'm split on which of the two languages I should focus on.

  • @Onrirtopia
    @Onrirtopia 1 year ago

    I don't know how python Devs have the face to call anything "lightning fast"

  • @robmulla

    @robmulla 1 year ago

    Don't gatekeep me bro

  • @Onrirtopia

    @Onrirtopia 1 year ago

    @@robmulla it's not gatekeeping. Seriously, are you dumb? If you want to get into programming, sure, here are some "lightning fast" beginner languages: Lua, Kotlin, Dart, Nim, Go and many more. It's not gatekeeping, it's the truth. Python is slow, and nothing made with Python should be called "lightning fast" considering the same thing has been created in Go and runs 3 times faster. Also, the language for data science is Julia, not Python.

  • @robmulla

    @robmulla 1 year ago

    @@Onrirtopia ok. This package is written in Rust with a Python API. Most Python packages are written in C. Saying Python is slow is hilariously ignorant.

  • @Onrirtopia

    @Onrirtopia 1 year ago

    @@robmulla the API speed still makes it slower than just using a native Go package. Above that, Cython (or C-Python) is only as fast as the person it's written by. And yes, Python is interpreted, so if you write C code in Python, that also has to be interpreted, making it slower than any compiled language, again. Stop trying to make up lies just to win an argument. Python is slow.

  • @robmulla

    @robmulla 1 year ago

    @@Onrirtopia you started it 😝

  • @LuvxJacqu4li8e
    @LuvxJacqu4li8e 1 year ago

    Too bad I'm not into data science... yet or never

  • @robmulla

    @robmulla 1 year ago

    Come on! You know you want to! 😊

  • @gaurav_r13
    @gaurav_r13 1 year ago

    Sql

  • @robmulla

    @robmulla 1 year ago

    For databases it’s great!
