The New Way of Scheduling DAGs in Airflow with Datasets

Airflow Datasets bring a new way of scheduling your data pipelines
👍 Smash the like button to become an Airflow Super Hero!
❤️ Subscribe to my channel to become a master of Airflow
🏆 BECOME A PRO: www.udemy.com/course/the-comp...
🚨 My Patreon: / marclamberti
Ready?
Let's go!
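For context before the comments: a minimal sketch of the pattern the video demonstrates, assuming Airflow 2.4+ (the file path, DAG ids, and task bodies are illustrative):

```python
from datetime import datetime

from airflow import DAG, Dataset
from airflow.decorators import task

# The URI is just an identifier linking the two DAGs;
# Airflow does not read, poll, or watch the file itself.
my_file = Dataset("/tmp/my_file.txt")  # hypothetical path

with DAG("producer", schedule="@daily", start_date=datetime(2023, 1, 1), catchup=False):

    @task(outlets=[my_file])
    def update_dataset():
        # When this task succeeds, Airflow records a dataset event for my_file.
        with open("/tmp/my_file.txt", "a") as f:
            f.write("new data\n")

    update_dataset()

# The consumer is scheduled on the dataset instead of a cron expression:
# it runs whenever the producer's outlet task completes successfully.
with DAG("consumer", schedule=[my_file], start_date=datetime(2023, 1, 1), catchup=False):

    @task
    def read_dataset():
        with open("/tmp/my_file.txt") as f:
            print(f.read())

    read_dataset()
```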

Comments: 31

  • @brunosompreee · 6 months ago

    Great content as always Marc!

  • @trench6118 · 1 year ago

    Airflow has been on fire lately - I love the TaskFlow API and dynamic task mapping. Data-aware scheduling came out at a perfect time and simplified a real problem for me.

  • @MarcLamberti · 1 year ago

    Other great features are coming. Stay tuned ;)

  • @practicalgcp2780 · 1 year ago

    Amazing video Marc! This is a truly amazing feature! One thing I couldn't seem to find, though, is a way to pass parameters to the consumer DAG. Is there a way to access the context of what triggered the DAG, or can extra params be passed in the Dataset? This could carry useful metadata, such as the latest timestamp at which some data was updated, down to the triggered processes. Thank you!
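For context on this question: since Airflow 2.4 a Dataset can carry an `extra` dict, but it is static metadata fixed when the Dataset object is defined, not a per-run channel; at the time of this video there was no built-in way to pass run-specific values (such as a latest-updated timestamp) to the consumer, though later Airflow releases expose the triggering dataset events in the task context. A minimal sketch (URI and values hypothetical):

```python
from airflow import Dataset

# `extra` is attached to the dataset definition itself; every consumer
# sees the same dict regardless of which run produced the event.
orders = Dataset(
    "s3://my-bucket/orders.parquet",  # hypothetical URI
    extra={"owner": "data-team", "format": "parquet"},
)
```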

  • @richie.edwards · 1 year ago

    I started working more with Airflow at my job, and your videos have been very helpful when I want to switch up the learning format and get exposed to concepts without digging through the docs.

  • @MarcLamberti · 1 year ago

    Thank you 🙏

  • @RobsonLanaNarvy · 9 months ago

    Nice demonstration. I will test a MySQL table as a dataset to explore this feature.

  • @user-qh4bd7ub5x · 10 months ago

    This is awesome. No more ugly triggers, sensors, etc. Thanks for the explanation, Marc!

  • @MarcLamberti · 10 months ago

    you're welcome :)

  • @davideairaghi6763 · 1 year ago

    Hi Marc, datasets look very useful, but how can they be used to trigger a DAG based on a SQL database update? Is there any example of this? Thanks in advance.
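On this question: the dataset event comes from whatever task declares the dataset as an outlet, so a SQL update can drive scheduling as long as the update happens inside an Airflow task. A sketch assuming the Postgres provider is installed (connection id, table, and SQL are hypothetical):

```python
from datetime import datetime

from airflow import DAG, Dataset
from airflow.providers.postgres.operators.postgres import PostgresOperator

# An arbitrary identifier for the table; Airflow does not watch the database.
orders_table = Dataset("postgres://my_db/public/orders")  # hypothetical URI

with DAG("sql_producer", schedule="@hourly", start_date=datetime(2023, 1, 1), catchup=False):
    PostgresOperator(
        task_id="refresh_orders",
        postgres_conn_id="my_postgres",  # hypothetical connection id
        sql="INSERT INTO orders SELECT * FROM staging_orders;",  # hypothetical SQL
        outlets=[orders_table],  # emits a dataset event when the task succeeds
    )
```

A row written by an external application, outside any Airflow task, emits no dataset event, which is the limitation raised further down in this thread.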

  • @Jeoffrey54 · 1 year ago

    Amazing 👀

  • @lifeofindians1695 · 1 year ago

    Hi Marc, I am watching your Airflow architecture video. In the single-node architecture the executor updates the metastore, while in the multi-node architecture the executor puts the task in a queue. So in the multi-node setup, who updates the metastore after the job is done: the queue or the executor?

  • @askmuhsin · 1 year ago

    Hi Marc, this is indeed a truly amazing feature. I'm just wondering whether an instance of the consumer DAG will always be triggered for every file (URI) change. I.e., if the producer DAG updates the file while the consumer DAG is still running, will the new change cause a new consumer DAG instance to run on the new data while the previous consumer instance is still running? If that makes sense. As always, thank you for the content.

  • @MarcLamberti · 1 year ago

    yes

  • @tiankun4450 · 1 year ago

    Can I use a template variable (like ds_nodash) in a Dataset URI?

  • @ady3949 · 7 months ago

    Hi Marc, this is indeed an amazing feature. I tried dataset scheduling, but when the job finishes or fails it doesn't trigger my on_success_callback/on_failure_callback. With normal scheduling (e.g. @hourly), the callbacks do fire. Is there any config I missed, or is it a bug?

  • @rohithspal · 1 year ago

    A very useful feature! But isn't "task-aware scheduling" a more appropriate name for it, since there is no real interaction with the data?

  • @MarcLamberti · 1 year ago

    I think there will be real interaction with data at some point 😉

  • @minnieshi2934 · 1 year ago

    On the same point, a great observation: if the producer DAG's task declares the outlet but never actually accesses the file/folder, and its logic has nothing to do with the content of the URI, the consumer DAG still runs. So it is really just using the URI as a link between the two DAGs (see the sketch after this exchange).

  • @VallabhGhodkeB · 6 months ago

    Yeah, exactly: the URI just acts as a bridge between the DAGs. It does not have to point to anything real.
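A short sketch of what this exchange describes: the outlet task below never touches the URI, yet its success still triggers any consumer scheduled on it (DAG id and URI hypothetical):

```python
from datetime import datetime

from airflow import DAG, Dataset
from airflow.decorators import task

# Nothing needs to exist at this URI; it is only a label shared by both DAGs.
link = Dataset("dataset://just-a-label")  # hypothetical URI

with DAG("producer_no_io", schedule="@daily", start_date=datetime(2023, 1, 1), catchup=False):

    @task(outlets=[link])
    def unrelated_work():
        # No file is read or written here, but on success this task
        # still emits a dataset event for `link`.
        print("no I/O at all")

    unrelated_work()
```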

  • @alfahatasi · 1 month ago

    How do I use a table from a Postgres database as a dataset instead of a txt file? Is there an example video for this?

  • @Empusas1 · 1 year ago

    You mentioned that the consumer DAG triggered by the dataset always runs when the producer DAG ran successfully, not when the dataset actually changed. Let's say the producer has a compare task and only changes the dataset when necessary; in that case the consumer would always run anyway. Any way to solve that?
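One common workaround for this question: put the outlet on the task that actually writes the dataset and skip that task when nothing changed, since a skipped task emits no dataset event and the consumer stays idle. A sketch with a hypothetical comparison step:

```python
from datetime import datetime

from airflow import DAG, Dataset
from airflow.decorators import task
from airflow.operators.python import ShortCircuitOperator

my_file = Dataset("/tmp/my_file.txt")  # hypothetical path

def data_has_changed() -> bool:
    # Hypothetical comparison; returning False skips everything downstream.
    return True

with DAG("conditional_producer", schedule="@daily", start_date=datetime(2023, 1, 1), catchup=False):
    compare = ShortCircuitOperator(task_id="compare", python_callable=data_has_changed)

    @task(outlets=[my_file])
    def update_dataset():
        # Runs (and emits a dataset event) only when compare returns True.
        with open("/tmp/my_file.txt", "a") as f:
            f.write("changed\n")

    compare >> update_dataset()
```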

  • @RalfredoSauce · 1 year ago

    How do we trigger it off a SQL table update rather than a file? He mentions it's possible, but I can't seem to find documentation for it anywhere.

  • @minnieshi2934 · 1 year ago

    Very good that the video points out that an external system updating the dataset file will NOT make the consumer DAG run.

  • @MarcLamberti · 1 year ago

    Not yet. But it will be possible very soon

  • @bettatheexplorer1480 · 1 year ago

    Is this only available on Airflow >= 2.4?

  • @MarcLamberti · 1 year ago

    Yes

  • @bettatheexplorer1480 · 1 year ago

    Cloud Composer doesn't have Airflow 2.4 yet 😞

  • @imosolar · 1 year ago

    Please update the Udemy course with datasets.

  • @MarcLamberti · 1 year ago

    coming