Polars is a blazingly fast alternative to pandas for working with data in Python. I couldn't believe the speed difference! #python #datascience #dataframe
Comments: 187
@shlokbhakta2893 1 year ago
Python devs will use anything but Python to make Python faster lol
@samueljehanno
1 year ago
Lmao
@bernardcrnkovic3769
1 year ago
So what? That is the point of Python: to be a pretty wrapper around optimized components :D
@lukaswalker2342
1 year ago
@@bernardcrnkovic3769 exactly that
@grantpeterson2524
1 year ago
Uh, yeah, exactly. Why is that a bad thing? Python is a lot faster to write; C/C++/Rust (or any compiled language) is faster to run. Most of the time, when I profile, 5% of my code takes up 95% of the runtime. Rewriting that 5% in Rust or C lets me have my cake and eat it too.
@shlokbhakta2893
1 year ago
@@grantpeterson2524 it’s not a bad thing, just a funny joke because it’s ironic lol
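The "5% of the code takes 95% of the runtime" point is easy to check for yourself before rewriting anything in Rust or C. Below is a minimal sketch using the standard library's `cProfile` and `pstats`; `slow_part`, `fast_part`, and `pipeline` are hypothetical stand-ins for your own code.

```python
import cProfile
import io
import pstats


def slow_part(n):
    # Deliberately slow: a pure-Python loop (the "5%" worth rewriting)
    total = 0
    for i in range(n):
        total += i * i
    return total


def fast_part(n):
    # Cheap bookkeeping that barely shows up in the profile
    return n + 1


def pipeline():
    return slow_part(2_000_000) + fast_part(10)


profiler = cProfile.Profile()
profiler.enable()
result = pipeline()
profiler.disable()

# Sort by cumulative time and print the top five entries
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats(5)
print(stream.getvalue())
```

The function that dominates the cumulative listing (below the top-level driver) is the candidate for a compiled rewrite or for a faster library.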
@Kim-re7hs 1 year ago
Polars performance benchmarks are great, and the development roadmap looks promising. Looking forward to your upcoming Polars series 👍
@robmulla
1 year ago
Coming soon! Thanks for watching.
@chrstfer2452
7 months ago
@@robmulla where's that Polars series?
@abh1yan 1 year ago
Rust is taking over everything, guys.
@robmulla
1 year ago
True
@incremental_failure
1 year ago
You could say everything is rusty. Badum-tssh.
@Diabolic9595
1 year ago
Funny joke mate!
@_Moonlight_22 1 year ago
Python really does include the whole North Pole 😂
@robmulla
1 year ago
And all the bears!
@user-myraklejnr
7 months ago
Santa here we come😂😂😂😂😂😂😂😂
@jeanchindeko5477 1 year ago
For Polars to replace pandas, they have to up their game in terms of integration. Pandas is the de facto library for data engineering and data science in Python, meaning tons of other libraries are integrated with pandas (SQLAlchemy, PySpark, Arrow, scikit-learn, matplotlib, etc.); basically every Python data engineering and data science library has pandas integration. And you also have to count all the people who know pandas and work on making it faster with vectorisation.
@robmulla
1 year ago
All great points. I think it will take time. But it works well for what it was designed to do.
@adrianjdelgado
1 year ago
Polars also uses vectorization and has a quick and easy way to transform a Polars dataframe into a pandas dataframe. In some benchmarks, it is faster to create a pandas dataframe via Polars than by using pandas directly.
@chrstfer2452
9 months ago
It'll take rewriting their interface to match pandas' interface. Then it'd pretty much be a drop-in replacement. Edit: having worked with it, I still think this, but the Polars interface is better, so I'm thinking it should be a LazyFrame/DataFrame.pdcompat type module.
@ctm92
4 months ago
Polars needs to be a drop-in replacement for pandas to be used in the field. Data scientists know how to use pandas, and switching to something else has a steep learning curve and might not be worth it, especially for a new project.
@peteredwards7680 1 year ago
mfw it's the 100th time today that I've seen something "designed for speed from the ground up, in Rust".
@robmulla
1 year ago
What were the other 99? I want to know.
@mannycalavera121 1 year ago
Rust allows C-like speed without the decades of experience required to write safe and optimised C code.
@robmulla
1 year ago
This is an interesting take! I don't know that much about coding in C or Rust, and I didn't know that was one of the benefits of Rust.
@mickolesmana5899 1 year ago
Waiting for a faster version of GeoPandas; sjoin-ing 100+ rows already takes long enough.
@robmulla
1 year ago
I've only done a little work with GeoPandas but noticed it was slow too.
@AgnaldoC
1 year ago
Geo pandas-dask
@lequedicatsamarge4228 5 months ago
I used to be a die-hard pandas user and just recently switched to Polars - I am not going back. It's not just speed, it's data types (OK, pandas 2.0 has made huge progress here), syntax, and the kind of no-bullshit-fuckarounds with indices. I fell in love with Polars, especially with the now-available API to hvplot.
@JordiRosell 1 year ago
I hope so. Not only for speed, but for code cleanliness.
@robmulla
1 year ago
You like the Polars code style better? I don't know how I feel about all the `pl.col()` it needs.
@JordiRosell
1 year ago
@@robmulla I think it helps with writing in a more chained style. I agree that pl.col isn't great. I prefer importing col directly, but it's not ideal.
@avinashthakur80 1 year ago
Another library which is blazingly fast because of Rust.
@robmulla
1 year ago
What other ones are?
@LeNguyen-yj9ol
1 year ago
Ruff 😊
@cloinca_rpe11
1 year ago
WhiteBox Tools as well if you do spatial analysis
@LordPompinchu666
1 year ago
Another library that's fast in Rust because people never cared to learn C and spoil the shit out of performance. Try running the inverse square root in C vs Rust. You'll face the hard truth: modern programmers are way worse than the older ones, back when performance mattered. Doom runs on my fridge. Try to run Rust on your coffee machine... good luck
@adrianjdelgado
1 year ago
@@LordPompinchu666 in the specific case of Polars, one of the main reasons it is faster is multithreading. Rust catches a lot of potential bugs in that realm at compile time. Rust makes writing a multithreaded version of pandas feasible. Doing it in C would be a minefield.
@eyadamin4089 1 year ago
Do you think it will replace pandas? And does it have the same options as pandas?
@robmulla
1 year ago
Great question. For some tasks I think it will. It still lacks some functionality, like native plotting. Look out for a full-length video I'm going to be making about Polars soon.
@eyadamin4089
1 year ago
@robmulla Waiting for it. All of your videos and live streams are very helpful and interesting, though.
@Linkario86 1 year ago
Polars it is then. I'm relatively new and use Jupyter Notebook, but I assume I can just import Polars like pandas, as shown in the video?
@robmulla
1 year ago
Yes, I have a longer video on my channel where I review Polars and explain. Check it out here: kzread.info/dash/bejne/iHyl0JmulszPnKg.html
@Linkario86
1 year ago
@@robmulla thanks!
@thebosscrystal 1 year ago
Is that an extension that shows the running block and its time? (Not the timeit)
@KevinWeatherwalks 1 year ago
How much does loading the data contribute to the time?
@Neura1net 7 months ago
They should have used the syntax of pandas
@robmulla
7 months ago
I think they purposefully wanted to be different. There are already a lot of pandas alternatives that don't work too well. Polars is its own thing entirely.
@chrism6880 1 year ago
Ugh, I am literally 2/3 of the way through refactoring an old project created by a former contractor, where I replaced his list and dict comprehensions with pandas... guess I gotta refactor my refactor.
@robmulla
1 year ago
Is the main goal of the project speed? If so, dicts and lists are going to be hard to beat. If not, then pandas should be sufficient.
@chrism6880
1 year ago
@Rob Mulla the project compares very large datasets. Since pandas has a NumPy backend implemented in C, many of the operations are orders of magnitude faster than using dicts.
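A rough sketch of the dict-versus-vectorized difference described above, using NumPy directly (default numeric pandas columns are backed by NumPy arrays). The data is synthetic and the size is arbitrary; time both approaches on your own machine rather than trusting any particular speedup figure.

```python
import time

import numpy as np

n = 200_000  # arbitrary size for illustration
ids = np.arange(n)
values_a = np.arange(n, dtype=np.float64)
values_b = values_a * 2.0

# Dict-based approach: one Python-level subtraction per key
dict_a = dict(zip(ids.tolist(), values_a.tolist()))
dict_b = dict(zip(ids.tolist(), values_b.tolist()))

start = time.perf_counter()
diff_loop = {k: dict_b[k] - dict_a[k] for k in dict_a}
loop_seconds = time.perf_counter() - start

# Vectorized approach: a single call whose loop runs in compiled code
start = time.perf_counter()
diff_vec = values_b - values_a
vec_seconds = time.perf_counter() - start

print(f"dict loop:  {loop_seconds:.4f} s")
print(f"vectorized: {vec_seconds:.4f} s")
```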
@giagoskapetanakis6033 7 months ago
What IDE is this?
@DeebzFromThe90s 1 year ago
Alright, there are way too many options floating around right now. I spent the last week letting a modest gaming rig run 24/7 to convert a bunch of SAS7BDAT files into Parquet files because pyreadstat's multithreaded reading in chunks didn't work as expected. For that same dataset, which is several hundred GB on disk, I have to do some data wrangling, and I'm growing ill at the thought of how long it would take pandas to loop through it. Now I either risk learning Dask, Polars, or maybe even SQLite only to not get the desired results at a suitable speed, or stick to pandas. Thoughts?
@robmulla
1 year ago
I agree, it's hard to say what the best option is right now. I think the main questions I ask myself are: how fast do I need it to run? And can I do my computation on a single machine in local memory? The choice really depends on the answers to those questions.
@maskedvillainai 5 months ago
I love how everyone is lightning faster than the other lightning-faster framework lol
@fluffyflextail 7 months ago
Only ones I know, are bidirectionally opposed from each other and stored in one source object
@dhaval1489 1 year ago
I use Polars more than pandas. Polars syntax is much simpler and way faster.
@robmulla
1 year ago
Nice! I still can't fully move away from pandas, but Polars for major data pipelines for sure!
@dhaval1489
1 year ago
@@robmulla me neither. The pandas ecosystem is much larger and more mature. You can always convert a Polars dataframe to pandas and vice versa, so at the end of the day, whatever gets the job done efficiently should be used.
@Xarxes104 7 months ago
Does it matter if you're just running the code once?
@alexandrodisla6285 1 year ago
Polars can work with pandas beautifully!
@syukcode 7 months ago
You are using Python 3.8.5; what if you use Python 3.11?
@naseva9319 1 year ago
For a noob like me, it takes 10 min just to import more than 20 modules before actually writing some functions
@robmulla
1 year ago
I can relate. Copy/paste can save some time, though, if you do it a lot.
@shivamjha5202 5 months ago
Nice 👍
@primary4075 1 year ago
At my uni, I'm still using pandas for data science. Not that much different for now, I think
@orlandogarcia885 7 months ago
What about the new versions of pandas? Especially since 2.0, has its speed increased?
@priyadarshanmohanty277 1 year ago
Does it have integration with Snowflake?
@robmulla
1 year ago
Not sure. Good question.
@sakatagintoki8835
6 months ago
Well, Snowflake has a Python API, so you can use it to load the data after processing it with Polars.
@sourajitpaul9064 3 months ago
Why not use PySpark instead of Polars or pandas???
@camus83489 1 year ago
Interesting, wondering how this compares to, say, PySpark and cuDF
@robmulla
1 year ago
Probably depends on the dataset. My understanding is that Polars works well for operations in a single machine's memory, PySpark is more for distributing across many nodes, and cuDF is fast if your data can fit into GPU memory.
@camus83489
1 year ago
@@robmulla ahh cool, interesting, so this Polars thing is probably the best way to speed up data wrangling on a single computer (for at-home hobbyists). Another interesting thing would be df.apply(lambda x: etc) operations - how quickly can Polars iterate through a dataset? I think that would be a huge game changer
@brandonrich4956 1 year ago
Eventually people will realize that they save more time by just moving to 100% Julia instead of wasting all this time building everything in one language to execute it in another.
@robmulla
1 year ago
I guess every language is popular in its own way. I've never learned Julia.
@evanshlom1
7 months ago
Or you do it all in Rust, which is better than Julia
@PythonPlusPlus 5 months ago
So Polars is like pandas, but cooler? (pun intended)
@soffwhere 1 year ago
Super useful
@robmulla
1 year ago
Thanks! Glad you found it useful.
@sw11500 1 year ago
What editor is this?
@robmulla
1 year ago
VS Code with the Jupyter extension.
@geekyprogrammer4831 1 year ago
I was using Dask earlier
@robmulla
1 year ago
Nice! You should try out Polars too.
@aakashkhamaru9403 1 year ago
What IDE do you use?
@jamesn6458
1 year ago
Looks like Visual Studio Code
@ElinLiu0823 1 year ago
I'd rather use cuDF if a GPU is available on the system; otherwise I'll use Polars
@robmulla
1 year ago
Still need to do more testing with cuDF. But it's fast for sure.
@8koi245 1 year ago
BLAZINGLY FAST
@robmulla
1 year ago
🔥 🚗 🔥
@dung-olymzeus 1 year ago
Where did you get the dataset?
@robmulla
1 year ago
Here you go: www.kaggle.com/datasets/robikscube/flight-delay-dataset-20182022
@dung-olymzeus
1 year ago
@@robmulla thanks
@tadamacky 1 year ago
What IDE is this, or is it a notebook or something?
@skelaw
1 year ago
VS Code with the Jupyter extension
@robmulla
1 year ago
Yes! VS Code and Jupyter
@gc1979o 1 year ago
What about Dask?
@robmulla
1 year ago
I have an entire video that compares Dask, Modin, and Vaex. Check it out here: kzread.info/dash/bejne/fnmcr7Ohc9mZe8o.html
@andreichalapco1446 1 year ago
How many cores in pandas, how many cores in Polars?
@robmulla
1 year ago
Depends on the function you are running. Some pandas functions don't run multithreaded, but others do. Polars is completely multithreaded, I believe.
@user-fv1576 13 days ago
Now compare to MySQL and SQL
@HaunterButIhadNameGagWtf 1 year ago
What's the HW?
@robmulla
1 year ago
I show all my hardware in my setup video. But I have a Ryzen Threadripper with a lot of cores.
@HaunterButIhadNameGagWtf
1 year ago
@@robmulla thx. Why doesn't it run on the GPU? Or can data-mining tasks not be accelerated by it, only the neural networks themselves? I am a beginner and bought an RTX 3060 12GB for basic tasks. Got a link to that video, pls?
@robmulla
1 year ago
@@HaunterButIhadNameGagWtf You should try out cuDF if you want to process on a GPU. It is really fast but requires enough GPU memory for your data.
@bocchitherock-ob2bl 6 months ago
A minute of silence for those Rustaceans who will use this as an excuse to say Rust is faster than C. (Saying that as someone who loves Rust btw)
@sitrakaforler8696 1 year ago
Damn 😮
@robmulla
1 year ago
Yeah. Pretty crazy, am I right?
@eugenex8892 1 year ago
Knowledge of pure SQL is much more effective...
@robmulla
1 year ago
Could be true, but it also depends on what you're trying to accomplish.
@AndoroidP 1 year ago
Just use Rust from the ground up. It's not that hard
@robmulla
1 year ago
I tried. It was hard. 😂
@theLowestPointInMyLife
1 year ago
Rust is terrible for most things
@bhavyakukkar
7 months ago
Also terrible for everything until you get the hang of it
@code2compass 8 months ago
Damn
@panda_dva2261 1 year ago
Real Data Scientists will wait 10 hours for their Refresh Data in Excel. Patience is virtue. All these new scallywags with their TikToks and 5-second attention spans looking for the fastest thing possible.
@robmulla
1 year ago
Patience is a virtue!
@bhavyakukkar
7 months ago
patience is virtue = help i can't keep up
@bigkatoan5076 7 months ago
Actually 2.8 s and 600 ms are still the same, cus it's one click to complete :))
@throwaway6288 1 year ago
Wow 4 times faster!!! 😐
@robmulla
1 year ago
I take it you're not impressed…
@NeArMe. 1 year ago
Still new and learning about pandas 😢
@robmulla
1 year ago
We all have to start somewhere!
@jaybestemployee 7 months ago
Yeah, learn a new library to save seconds at a time.
@donaldli4755 1 year ago
Short story: no
@robmulla
1 year ago
But also: maybe?
@blackpilledbuddha4944 1 year ago
Will my boss pay me 4 times as much...
@robmulla
1 year ago
Guaranteed!
@johannesmphaka7433 1 year ago
Vaex is much better. Have a look at it.
@robmulla
1 year ago
I made a video about it already. Check my channel for the video about pandas alternatives.
@johannesmphaka7433
1 year ago
@@robmulla Thanks.
@stevenpaulsen5975 1 year ago
y'all could do this in Excel with fast results 😭
@robmulla
1 year ago
nooooooooooooooooooooooooooooooooo 😂
@pineapple3832 1 year ago
Why do people use pandas or Polars when SQL exists?
@robmulla
1 year ago
There are a few situations where it might be more appropriate to use pandas over SQL for a particular task. When working with small or medium-sized datasets: pandas is generally faster and more convenient than SQL for these, especially if the data is already in a structured format (such as a CSV file). When the data is not stored in a database: if the data you are working with isn't in a database, you don't have to load it into one just to query it. When you need to perform complex data manipulation tasks: pandas provides a wide range of functions and methods for manipulating and summarizing data in a variety of ways. This can be particularly useful for complex manipulations that would be difficult or time-consuming to accomplish with SQL alone.
@pineapple3832
1 year ago
@@robmulla yeah okay, that all sounds like it makes sense. So basically, SQL is for really large datasets and data that's already in a database, and there are certain "complex manipulation tasks" that can be done in pandas and not in SQL.
@robmulla
1 year ago
@@pineapple3832 You got it. Also, data exploration can be much easier when working with the data in the computer's memory. Check my EDA video for some examples.
@Jacob-bn1nj
1 year ago
@@robmulla How would you compare its usefulness to R? I'm currently in college and have taken a couple of courses using primarily R, and I'm split on which of the two languages I should focus on.
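To make the pandas-versus-SQL exchange above concrete, here is the kind of group-and-average query being discussed, run as SQL through Python's built-in `sqlite3` module on a hypothetical in-memory `flights` table. In pandas the equivalent would be roughly `df.groupby("airline")["dep_delay"].mean()`; which one is more convenient depends mostly on where the data already lives.

```python
import sqlite3

# Hypothetical rows: (airline, departure delay in minutes)
rows = [("AA", 5.0), ("DL", 12.0), ("AA", 15.0), ("UA", 0.0)]

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE flights (airline TEXT, dep_delay REAL)")
conn.executemany("INSERT INTO flights VALUES (?, ?)", rows)

# The aggregation runs inside the database engine, not in Python
sql_result = conn.execute(
    "SELECT airline, AVG(dep_delay) FROM flights GROUP BY airline ORDER BY airline"
).fetchall()
conn.close()

print(sql_result)
```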
@Onrirtopia 1 year ago
I don't know how Python devs have the face to call anything "lightning fast"
@robmulla
1 year ago
Don't gatekeep me bro
@Onrirtopia
1 year ago
@@robmulla it's not gatekeeping. Seriously, are you dumb? If you want to get into programming, sure, here are some "lightning fast" beginner languages: Lua, Kotlin, Dart, Nim, Go and many more. It's not gatekeeping, it's the truth. Python is slow, and nothing made with Python should be called "lightning fast" considering the same thing has been created in Go and runs 3 times faster. Also, the language for data science is Julia, not Python.
@robmulla
1 year ago
@@Onrirtopia ok. This package is written in Rust with a Python API. Most Python packages are written in C. Saying Python is slow is hilariously ignorant.
@Onrirtopia
1 year ago
@@robmulla the API speed still makes it slower than just using a native Go package. Above that, Cython (or C-Python) is only as fast as the person it's written by. And yes, Python is interpreted, so if you write C code in Python, that also has to be interpreted, making it slower than any compiled language, again. Stop trying to make up lies just to win an argument. Python is slow.
@robmulla
1 year ago
@@Onrirtopia you started it 😝
Too bad I'm not into data science... yet or never
@robmulla
1 year ago
Come on! You know you want to! 😊
SQL
@robmulla
1 year ago
For databases it's great!