Airflow Data Pipeline with AWS and Snowflake for Beginners | Project
👍 Smash the like button to become an Airflow Super Hero!
❤️ Subscribe to my channel to become a master of Airflow
🏆 BECOME A PRO: www.udemy.com/course/the-comp...
🚨 My Patreon: / marclamberti
Build a data pipeline in Airflow and the Astro SDK that interacts with AWS and Snowflake.
You can find the text version of this video and the original DAG here:
astro-sdk-python.readthedocs....
Materials:
➡️ orders_data_header.csv
order_id,customer_id,purchase_date,amount
ORDER1,CUST1,1/1/2021,100
ORDER2,CUST2,2/2/2022,200
ORDER3,CUST3,3/3/2023,300
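Before loading this file into the pipeline, it can help to sanity-check the schema locally, especially the `M/D/YYYY` date format, which needs explicit parsing. A minimal stdlib sketch (the `parse_orders` helper is hypothetical, not part of the video's DAG):

```python
import csv
import io
from datetime import datetime

# Sample contents of orders_data_header.csv (copied from the materials above).
CSV_TEXT = """order_id,customer_id,purchase_date,amount
ORDER1,CUST1,1/1/2021,100
ORDER2,CUST2,2/2/2022,200
ORDER3,CUST3,3/3/2023,300
"""

def parse_orders(text):
    """Parse the orders CSV, converting dates (M/D/YYYY) and amounts."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({
            "order_id": row["order_id"],
            "customer_id": row["customer_id"],
            "purchase_date": datetime.strptime(row["purchase_date"], "%m/%d/%Y").date(),
            "amount": float(row["amount"]),
        })
    return rows

orders = parse_orders(CSV_TEXT)
print(len(orders), orders[0]["purchase_date"])  # 3 2021-01-01
```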
➡️ Env vars
AIRFLOW__CORE__ENABLE_XCOM_PICKLING=True
AIRFLOW__ASTRO_SDK__SQL_SCHEMA=ASTRO_SDK_SCHEMA
➡️ SQL requests
CREATE DATABASE ASTRO_SDK_DB;
CREATE WAREHOUSE ASTRO_SDK_DW;
CREATE SCHEMA ASTRO_SDK_SCHEMA;
CREATE OR REPLACE TABLE customers_table (customer_id CHAR(10), customer_name VARCHAR(100), type VARCHAR(10) );
INSERT INTO customers_table (CUSTOMER_ID, CUSTOMER_NAME,TYPE) VALUES ('CUST1','NAME1','TYPE1'),('CUST2','NAME2','TYPE1'),('CUST3','NAME3','TYPE2');
CREATE OR REPLACE TABLE reporting_table (
CUSTOMER_ID CHAR(30), CUSTOMER_NAME VARCHAR(100), ORDER_ID CHAR(10), PURCHASE_DATE DATE, AMOUNT FLOAT, TYPE CHAR(10));
INSERT INTO reporting_table (CUSTOMER_ID, CUSTOMER_NAME, ORDER_ID, PURCHASE_DATE, AMOUNT, TYPE) VALUES
('INCORRECT_CUSTOMER_ID','INCORRECT_CUSTOMER_NAME','ORDER2','2/2/2022',200,'TYPE1'),
('CUST3','NAME3','ORDER3','3/3/2023',300,'TYPE2'),
('CUST4','NAME4','ORDER4','4/4/2022',400,'TYPE2');
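Note why `reporting_table` is seeded with a deliberately wrong row for ORDER2: the pipeline's merge step is supposed to correct it. Here is a hedged, pure-Python simulation of that join and upsert, using the data from the SQL above; the real DAG performs this as SQL inside Snowflake, so the dict-based logic below is only an illustration of the expected behavior:

```python
# Orders (from orders_data_header.csv) and customers (from customers_table).
orders = [
    {"order_id": "ORDER1", "customer_id": "CUST1", "amount": 100},
    {"order_id": "ORDER2", "customer_id": "CUST2", "amount": 200},
    {"order_id": "ORDER3", "customer_id": "CUST3", "amount": 300},
]
customers = {
    "CUST1": {"name": "NAME1", "type": "TYPE1"},
    "CUST2": {"name": "NAME2", "type": "TYPE1"},
    "CUST3": {"name": "NAME3", "type": "TYPE2"},
}
# reporting_table keyed by ORDER_ID, seeded as in the INSERT above
# (note the intentionally incorrect row for ORDER2).
reporting = {
    "ORDER2": {"customer_id": "INCORRECT_CUSTOMER_ID",
               "customer_name": "INCORRECT_CUSTOMER_NAME",
               "amount": 200, "type": "TYPE1"},
    "ORDER3": {"customer_id": "CUST3", "customer_name": "NAME3",
               "amount": 300, "type": "TYPE2"},
    "ORDER4": {"customer_id": "CUST4", "customer_name": "NAME4",
               "amount": 400, "type": "TYPE2"},
}

# MERGE semantics: matching ORDER_IDs have their columns overwritten with the
# joined orders+customers data; unmatched ORDER_IDs are inserted.
for o in orders:
    c = customers[o["customer_id"]]
    reporting[o["order_id"]] = {
        "customer_id": o["customer_id"],
        "customer_name": c["name"],
        "amount": o["amount"],
        "type": c["type"],
    }

print(sorted(reporting))                     # ORDER1 through ORDER4
print(reporting["ORDER2"]["customer_name"])  # NAME2 (corrected)
```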
Enjoy 🔥
Ready?
Let's go!
Comments: 47
Thanks Marc! Great Tutorial!
@MarcLamberti
9 months ago
You’re welcome 🫶
Thanks man, very much appreciated.
@MarcLamberti
10 months ago
You’re welcome
Learned the easiest way to build a dev environment for an Airflow data pipeline. Great!!
Still works 😄. Really cool pipeline.
@MarcLamberti
1 month ago
Good to know 🥹
For those who don't see the host field anymore: in the account field, make sure you use youraccountnumber.yourregion.yourcloud, for example nb71231.eu-west-3.aws. Basically, take everything before snowflakecomputing.com in your account URL, and leave the region field empty. Enjoy
What is the best way to pass a CSV between tasks? For example: one function parses JSON to CSV, and a second function uploads the CSV to an S3 bucket.
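On the question above, one common pattern: for small payloads, return the CSV text from one task and let the next task read it via XCom; for anything larger, write the file to S3 (or shared storage) and pass only the key. A stdlib sketch of the JSON-to-CSV step; `json_to_csv` is a hypothetical helper, not from the video:

```python
import csv
import io
import json

def json_to_csv(json_text):
    """Parse a JSON array of objects and render it as CSV text."""
    records = json.loads(json_text)
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(records[0]))
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()

# A task could return this string (XCom) or upload it to S3 and return the key.
payload = '[{"order_id": "ORDER1", "amount": 100}]'
csv_text = json_to_csv(payload)
print(csv_text.splitlines()[0])  # order_id,amount
```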
I was struggling with the Airflow installation, so I purchased your Udemy course, hoping I'll get better support.
@MarcLamberti
1 year ago
Keep me posted ;)
@datalearningsihan
1 year ago
@@MarcLamberti It did not really help. I asked Udemy for a refund. I had issues with the installation done your way; my CPU was maxing out, and nothing really worked after I managed to install Airflow in your recommended way. So it was a bad first impression of the course and I had to ask for a refund. Sorry.
@MarcLamberti
1 year ago
@@datalearningsihan You don't have to be sorry. I believe your issue is more related to Docker than to Airflow or the course. Check that you have enough memory. Otherwise, you can still install Airflow manually with pip install.
Is there a benefit to using Airflow instead of Snowpipe for this purpose?
@alejandroflorian9574
4 months ago
Imagine needing to consume and migrate not just a single table, but over 100. You'd have to create 100 pipes for inserting the data. Now, with Airflow, it's easier to customize and scale this process.
How do we manage connection credentials other than via the UI? I mean, deploy them as code with a reference to a secrets manager.
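On the question above: Airflow reads any environment variable named `AIRFLOW_CONN_<CONN_ID>` as a connection URI, and it can also delegate lookups to a secrets backend such as AWS Secrets Manager. A hedged sketch; the connection id, credentials, and prefix below are placeholders, not values from the video:

```shell
# (1) Define a connection entirely as code via an environment variable.
#     The conn id (snowflake_default) and credentials are examples only.
export AIRFLOW_CONN_SNOWFLAKE_DEFAULT='snowflake://user:pass@/?account=nb71231&region=eu-west-3'

# (2) Or point Airflow at AWS Secrets Manager as a secrets backend
#     (requires the Amazon provider to be installed).
export AIRFLOW__SECRETS__BACKEND='airflow.providers.amazon.aws.secrets.secrets_manager.SecretsManagerBackend'
export AIRFLOW__SECRETS__BACKEND_KWARGS='{"connections_prefix": "airflow/connections"}'
```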
Thanks Marc. I am facing this error when connecting to Snowflake from Airflow (running with the Docker Compose file you provided in the Udemy course): ERROR - 250001: Could not connect to Snowflake backend after 2 attempt(s). Aborting. I checked all the parameters but still face this issue (Airflow version v2.8.1).
Anyone else having issues with the Snowflake connection? I followed everything but it doesn't seem to work. Not even sure how to find out what went wrong.
@aldoaguirre9864
1 year ago
Yeah, same problem for me: 250001: Could not connect to Snowflake backend after 0 attempt(s). Aborting
Can you check on creating the connection from Airflow to Snowflake? The interface has changed slightly and now I'm unable to create a connection. I've verified that all parameters are correct, and yet the test is still failing.
@isaachernandez3094
10 months ago
Yes I have the same issue
@MarcLamberti
9 months ago
I've just released a new video that shows how to make that connection: kzread.info/dash/bejne/i46IxauiZdKddqw.htmlsi=8-8-Q8LUasYfz2V0
At the SQL Requests step, I had to execute the queries to create the warehouse and schema separately, since I ran into a "No active warehouse selected in the current session" error later when trying to insert values into the table. Also, in the Airflow UI connections, I don't have the Amazon S3 option!
@MarcLamberti
1 year ago
Use the AWS connection type instead. Thanks for sharing.
Hi Marc, do I have to run astro dev start again when I create a new DAG in the dags folder?
@MarcLamberti
8 months ago
Nope
I'm unable to see Amazon S3 in Airflow on localhost. Can you please help me with that?
@MarcLamberti
1 year ago
Did you install the Amazon provider?
@Yonatanx3
1 year ago
Hi Ruchi, I'm facing the same issue. Did you manage to solve this? Thanks
@MarcLamberti
1 year ago
@@Yonatanx3 Use Amazon Web Services for the connection type ;)
Hi, when creating connections in Airflow, the test button is greyed out and says 'Testing connections is disabled in Airflow configuration. Contact your deployment admin to enable it'. Can you please help with this so that testing is enabled? I can see in the config it's set to disabled; I just need to know how to switch it. Thanks
@MarcLamberti
9 months ago
Yes, that was introduced in 2.7. Set the configuration AIRFLOW__CORE__TEST_CONNECTION to Enabled.
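For reference, this can be set as an environment variable on the Airflow containers, in the same style as the env vars listed in the description:

```shell
# Re-enable the connection "Test" button (disabled by default since Airflow 2.7).
export AIRFLOW__CORE__TEST_CONNECTION=Enabled
```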
For me there is no option to add the host URL for the Snowflake connection type. Please suggest something.
@MarcLamberti
10 months ago
You need to install the apache-airflow-providers-snowflake==4.4.0 provider
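A sketch of the install, assuming a plain pip-based setup; with the Astro CLI used in the video, the usual approach is instead to add the package line to requirements.txt and restart with astro dev restart:

```shell
# Install the Snowflake provider so the connection type (and its fields)
# appear in the Airflow UI, then restart Airflow.
pip install 'apache-airflow-providers-snowflake==4.4.0'
```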
@kkampassi4820
10 months ago
@@MarcLamberti I tried, but it is still not working. Could you please share the git repo for the entire process? That would be a great help for us.
@MarcLamberti
9 months ago
@@kkampassi4820 Look at the pinned comment :) I will release a video tomorrow that uses Snowflake as well, with the updated approach.
I couldn't find the "Amazon S3" connection in the Airflow UI. What's going on?
@sampyism
5 months ago
Can someone explain how I can install the S3 provider package?
I can't see the Amazon S3 connection type in the Airflow web UI.
@MarcLamberti
2 months ago
It’s AWS now
@alex45688
2 months ago
@@MarcLamberti ok
I just can't stand that accent
@MarcLamberti
1 year ago
Me too 🤢
@akj3344
1 year ago
@@MarcLamberti I love your accent. Don't listen to ungrateful morons.
@MarcLamberti
1 year ago
@@akj3344 Thank you🙏