Python Libraries You Should Know As A Data Engineer - Python For Beginners

Entertainment

What Python libraries should data engineers know?
Here is a list from beginner to advanced!
Beginner
- Requests
- Paramiko
- Psycopg2 or SQLAlchemy
- Datetime
Mid
- BeautifulSoup
- Airflow
- All the cloud libraries (AWS, GCP, Azure)
Advanced
- PySpark
- PyKafka
0:00 Intro
2:10 Requests
2:44 Paramiko
3:02 Psycopg2
4:00 Basic Data Engineering Project Idea
4:42 BeautifulSoup
5:02 Datetime
6:00 Airflow
6:33 All the cloud libraries (AWS, GCP, Azure)
8:30 PySpark and PyKafka
If you enjoyed this video, check out some of my other top videos.
Top Courses To Become A Data Engineer In 2022
What Is The Modern Data Stack - Intro To Data Infrastructure Part 1
If you would like to learn more about data engineering, then check out Google's GCP certificate
bit.ly/3NQVn7V
If you'd like to read up on my updates about the data field, then you can sign up for our newsletter here.
seattledataguy.substack.com/
Or check out my blog
www.theseattledataguy.com/
And if you want to support the channel, then you can become a paid member of my newsletter
seattledataguy.substack.com/s...
Tags: Data engineering projects, Data engineer project ideas, data project sources, data analytics project sources, data project portfolio
_____________________________________________________________
Subscribe: / @seattledataguy
_____________________________________________________________
About me:
I have spent my career focused on all forms of data. I have focused on developing algorithms to detect fraud, reduce patient readmission, and redesign insurance provider policy to help reduce the overall cost of healthcare. I have also helped develop analytics for marketing and IT operations in order to optimize limited resources such as employees and budget. I privately consult on data science and engineering problems, both solo and with a company called Acheron Analytics. I have experience both working hands-on with technical problems and helping leadership teams develop strategies to maximize their data.
*I do participate in affiliate programs. If a link has an "*" by it, then I may receive a small portion of the proceeds at no extra cost to you.

Comments: 28

  • @SeattleDataGuy
    @SeattleDataGuy 1 year ago

    If you guys want to learn more about data engineering, then sign up for my newsletter here seattledataguy.substack.com/

  • @shravanshenoy3873
    @shravanshenoy3873 1 year ago

    Beginner:
    1. Requests (and sftp)
    2. Psycopg2 and similar database libraries
    3. Beautifulsoup and scrapy
    4. Datetime
    5. Virtualenv
    Intermediate:
    6. Airflow
    7. Boto3 and similar libraries to interact with cloud
    8. Flask/Django
    Advanced (based on need to know):
    9. Pyspark
    10. Pyarrow

  • @ChatGPT-ef6sr

    @ChatGPT-ef6sr

    1 year ago

    Up

  • @seya2183

    @seya2183

    6 months ago

    Warnings and logging too

  • @RSKriegs
    @RSKriegs 1 year ago

    Some other cool libraries from my side:
    - Pandas - you've mentioned it, but I don't think you put it in the context of something one should know (see the case from your Facebook interviews). I think it's essential for any sort of data wrangling with Python.
    - NumPy - essential stuff for any sort of algebra if you want to dive deeper into ML
    - MyPy/Pydantic - for data validation & static typing
    - Pytest - for testing
    - matplotlib & seaborn - for data visualization in Python
    - any sort of file libraries for specific file formats like json, csv, avro-python etc.
    - ML libraries like scikit-learn
    - FastAPI as an alternative to Django/Flask
    - Selenium
    - argparse for scripting
    Although I haven't used most of these in my job on a regular basis, I think it doesn't hurt to know them :)

  • @data-dylan

    @data-dylan

    1 year ago

    sympy is more of an algebra library. I think you meant numpy is a linear algebra library. This can be a good way of thinking about it for a beginner who wants to learn ML, but I find it gets used a lot for stuff where you want to try and represent continuous mathematics as closely as possible on a computer. For example, numpy would also be good for stuff like signal processing or creating a function of best fit for your data that can be plotted.
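
The "function of best fit" idea in the comment above can be sketched roughly as follows. This is only an illustration, assuming made-up sample points and a straight-line model; the variable names are hypothetical and nothing here comes from the video itself.

```python
import numpy as np

# Made-up sample data that lies exactly on the line y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# Least-squares fit of a degree-1 polynomial.
# Coefficients come back highest degree first: [slope, intercept].
slope, intercept = np.polyfit(x, y, deg=1)

# Evaluate the fitted line at the original x values,
# e.g. to pass to a plotting library afterwards.
fitted = np.polyval([slope, intercept], x)
```

With noisy real data the fitted values would only approximate `y`; a higher `deg` fits a higher-order polynomial.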

  • @luizhenriquecudo125
    @luizhenriquecudo125 1 year ago

    Great content as usual! I'd add the json library to that

  • @shashankemani1609
    @shashankemani1609 1 year ago

    amazing thank you!

  • @SeattleDataGuy

    @SeattleDataGuy

    1 year ago

    You're very welcome!

  • @lkellermann
    @lkellermann 1 year ago

    Watching the premiere... expecting to hear about the tenacity library here xD

  • @SanjeevKumar-dr6qj
    @SanjeevKumar-dr6qj 1 year ago

    You are awesome.

  • @matthewwiese6972
    @matthewwiese6972 1 year ago

    Psycho pg2 is how I've heard folks say it too!

  • @pcargolo1
    @pcargolo1 1 year ago

    I've gone through possibly all the Python courses on Udemy but have never seen a course focused on Data Engineering and the good-to-know libraries. Sometimes there is one short chapter about one of them, but nothing complete. Does anyone have any tips?

  • @EbeneezerGumb
    @EbeneezerGumb 1 year ago

    good list, but most of your psycopg2 stuff prob would have been easier with sqlalchemy

  • @hdr-tech4350
    @hdr-tech4350 1 year ago

    Requests, Psycopg, Bigquery, Beautifulsoup & scrapy, Datetime, Boto 3, Flask, Virtualenv, Spark, Pyarrow, Pykafka, Snowflake

  • @SeattleDataGuy

    @SeattleDataGuy

    1 year ago

    Thanks! I finally added in the agenda so these are now included.

  • @redrum4486
    @redrum4486 1 year ago

    I have to use a shell script to execute mysql queries, then pass the result as an argument to my python scripts >_< wish I could just use mysql connector

  • @data-dylan
    @data-dylan 1 year ago

    How can you know pandas every which direction, but not understand a dictionary? You wouldn't know how to construct a dataframe from a dictionary of lists (often my approach when webscraping) or know how to use the map function to change categorical names. Wes McKinney (who created pandas) even says that a pandas series data structure is similar to an ordered dictionary.
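
The two dictionary-based patterns mentioned in the comment above look roughly like this. It is a minimal sketch with made-up data; the column names and the `tier_names` mapping are hypothetical and only stand in for whatever a scraper would actually collect.

```python
import pandas as pd

# Hypothetical scraped values, collected column by column.
scraped = {
    "city": ["Seattle", "Portland", "Boise"],
    "tier": ["a", "b", "a"],
}

# A dict of equal-length lists maps directly onto DataFrame columns.
df = pd.DataFrame(scraped)

# Series.map with a dict replaces each categorical value by its lookup;
# values missing from the dict would become NaN.
tier_names = {"a": "large", "b": "medium"}
df["tier_name"] = df["tier"].map(tier_names)
```

Both steps depend on plain Python dictionaries, which is the point the commenter is making.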

  • @EH-it8pj
    @EH-it8pj 1 year ago

    I'm stuck in a "data engineer" position where all my boss will let me do is debug SQL scripts, and it's killing me

  • @gavinkalaher7314

    @gavinkalaher7314

    1 year ago

    how long have you been there?

  • @jeffGordon852

    @jeffGordon852

    1 year ago

    QUIT

  • @playea123

    @playea123

    1 year ago

    Leave if you can. You are doing yourself no favors by wasting years at a job you don’t like and especially one that isn’t improving your skills

  • @gabrielkolletalves493
    @gabrielkolletalves493 1 year ago

    Regarding APIs, I always thought we should learn how to pull from them, not actually create them. So where does Flask fit into all that?

  • @playea123

    @playea123

    1 year ago

    Depends on what product is built on top of your db/dw. You might need to build an api on top of your warehouse to power your product.

  • @gabrielkolletalves493

    @gabrielkolletalves493

    1 year ago

    @@playea123 Cool. And do you know what kind of custom API could run over a DW? I could only think of such a case in an OLTP context...

  • @playea123

    @playea123

    1 year ago

    @@gabrielkolletalves493 depends on how you model your DW. If you want something similar to an OLTP, Snowflake rolled out hybrid tables a few months ago

  • @alexanderpotts8425
    @alexanderpotts8425 1 year ago

    hey! leave gcp libs alone 😂
