The NEW way to ingest data with Airflow in a single task!

Science & Technology

Get started with Airbyte with the dedicated Udemy course 👇 (FREE until June 3rd)
www.udemy.com/course/the-comp...
Or use the coupon KZread_1
Data ingestion is tricky, but it doesn't have to be!
Stop wasting time finding the right operator to transfer data.
Rule them all in a single task with Airflow and PyAirbyte!
In this video, you will:
✅ Transfer data from S3 to BigQuery efficiently with PyAirbyte
✅ Create Python Virtual Environments to save running time
✅ Use the ExternalPythonOperator to avoid dependency conflicts
✅ Check data in BigQuery with the BigQueryHook
And more!
No need to juggle between different operators or use custom scripts anymore!
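To give a rough idea of the pattern before you open the code link below, here is a minimal sketch of a single-task ingestion DAG. It is not the exact code from the video: the venv path, bucket name, stream config, GCP project/dataset, and connection ID are all placeholder assumptions.

```python
# Sketch: PyAirbyte runs inside a dedicated virtual environment via
# @task.external_python, then a second task checks the load with BigQueryHook.
# All names and paths below are illustrative placeholders.
from airflow.decorators import dag, task
from pendulum import datetime


@dag(start_date=datetime(2024, 1, 1), schedule=None, catchup=False)
def s3_to_bigquery():

    # Runs in a pre-built venv that has PyAirbyte installed,
    # so its dependencies never clash with Airflow's.
    @task.external_python(python="/usr/local/airflow/pyairbyte_venv/bin/python")
    def ingest_with_pyairbyte():
        import airbyte as ab
        from airbyte.caches import BigQueryCache

        source = ab.get_source(
            "source-s3",
            config={  # illustrative config only; adapt to your bucket and files
                "bucket": "my-demo-bucket",
                "streams": [
                    {"name": "orders", "format": {"filetype": "csv"}, "globs": ["*.csv"]},
                ],
            },
        )
        source.check()
        source.select_all_streams()

        cache = BigQueryCache(  # assumed cache settings, replace with yours
            project_name="my-gcp-project",
            dataset_name="raw",
            credentials_path="/usr/local/airflow/gcp_key.json",
        )
        source.read(cache=cache)

    @task
    def check_rows():
        from airflow.providers.google.cloud.hooks.bigquery import BigQueryHook

        hook = BigQueryHook(gcp_conn_id="google_cloud_default", use_legacy_sql=False)
        rows = hook.get_records("SELECT COUNT(*) FROM `my-gcp-project.raw.orders`")
        print(f"Loaded rows: {rows[0][0]}")

    ingest_with_pyairbyte() >> check_rows()


s3_to_bigquery()
```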
🤖 The code:
robust-dinosaur-2ef.notion.si...
🏆 BECOME A PRO: www.udemy.com/course/the-comp...
👍 Smash the like button to become an Airflow Super Hero!
❤️ Subscribe to my channel to become a master of Airflow

Comments: 10

  • @GCPollaa · 1 month ago

    Great video. I saw that the schema of the resulting table in BigQuery is mostly strings... What if we are working with parquet files with a specified schema? How do we avoid data type conversion issues between AWS and GCP? If I have Airflow running locally, do I still need the Astro CLI? Thanks Marc...

  • @yuvalinselberg5570 · 1 month ago

    Excellent video, thanks!

  • @suebrickston · 1 month ago

    Thanks Marc! I would like to know why two Python environments are needed. I installed Airflow on my machine and pip-installed Airbyte in the same environment where I installed Airflow. Do I need to specify the same environment twice, or not at all (working locally)?

  • @MarcLamberti · 1 month ago

    Great question! I think there are some PyAirbyte dependencies that conflict with Airflow's dependencies. That's why we create one Python virtual environment with PyAirbyte installed. The second one is optional but recommended: it has the source connector installed, to avoid get_source installing the S3 source each time the task runs. That saves a lot of runtime.
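
    For anyone reproducing that setup locally, here is a rough sketch of what creating those two environments in the project's Dockerfile could look like. The image tag, paths, venv names, and package names are assumptions, not the exact code from the video (see the Notion link above for that).

```dockerfile
FROM quay.io/astronomer/astro-runtime:11.3.0

# Venv 1: PyAirbyte, isolated from Airflow's own dependencies.
# This is the interpreter the ExternalPythonOperator task points at.
RUN python -m venv /usr/local/airflow/pyairbyte_venv && \
    /usr/local/airflow/pyairbyte_venv/bin/pip install airbyte

# Venv 2 (optional but recommended): pre-install the S3 source connector
# so get_source() does not reinstall it on every task run.
# The exact name/location PyAirbyte expects depends on your setup.
RUN python -m venv /usr/local/airflow/.venv-source-s3 && \
    /usr/local/airflow/.venv-source-s3/bin/pip install airbyte-source-s3
```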

  • @user-xs7pf1ut9r · 1 month ago

    Great video and thank you Marc! I have a question regarding the two virtual envs: how does the get_source() method know to use the S3 virtual environment? I see the external_python task is configured with the PyAirbyte env, but nothing mentions the S3 env, so how does PyAirbyte know where to find the source dependencies?

  • @MarcLamberti · 27 days ago

    Great question! By passing source-s3 to get_source, the function automatically looks for a Python virtual environment dedicated to that source. If it doesn't exist, it creates it and installs the Airbyte S3 connector in it. Since we already created this environment in the Dockerfile, the function reuses it.
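
    In other words, a rough sketch of that lookup (the config value is a placeholder, not the video's code):

```python
import airbyte as ab

# The connector name drives the virtual-environment lookup: PyAirbyte resolves
# a venv dedicated to "source-s3" and only creates it and installs the connector
# if it is missing, so a pre-built environment from the Dockerfile is reused.
source = ab.get_source(
    "source-s3",
    config={"bucket": "my-demo-bucket"},  # placeholder config
    install_if_missing=True,              # no-op when the venv already exists
)
```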

  • @mihaibadea7958 · 26 days ago

    Great video, but the course link in the description is not free... and it is not yet the 3rd of June.

  • @MarcLamberti · 26 days ago

    🧐 It should be free! Use this coupon KZread_1

  • @vikneshwararb3354 · 23 days ago

    @MarcLamberti When using this coupon, I get the following error on Udemy, and it is not yet the 3rd of June: "This coupon has exceeded its maximum possible redemptions and can no longer be used".
