Databricks - Change Data Feed/CDC with Structured Streaming and Delta Live Tables

In this video we see how to use the CDF/CDC features on Databricks to propagate changes and deletions through the Lakehouse. You can find the Databricks Notebooks here:
github.com/apostolos1927/Chan...
Follow me on social media:
LinkedIn: www.linkedin.com/in/apostolos-athanasiou-9a0baa119
GitHub: github.com/apostolos1927/
00:00 - Change Data Feed Intro
03:19 - CDF in Databricks with Structured Streaming and DLTs
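
For quick reference alongside the video, a minimal PySpark sketch of enabling and reading the Change Data Feed (this runs only on a Databricks / Delta Lake runtime, and the table name `bronze.orders` and starting version are illustrative placeholders, not from the video):

```python
from pyspark.sql.functions import col

# Enable Change Data Feed on an existing Delta table (illustrative name)
spark.sql("""
    ALTER TABLE bronze.orders
    SET TBLPROPERTIES (delta.enableChangeDataFeed = true)
""")

# Read the row-level changes recorded since table version 2
changes = (
    spark.read.format("delta")
    .option("readChangeFeed", "true")
    .option("startingVersion", 2)
    .table("bronze.orders")
)

# Each change row carries a _change_type column:
# insert, delete, update_preimage or update_postimage
changes.filter(col("_change_type") == "delete").show()
```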

Comments: 8

  • @jhonsen9842 · 24 days ago

    Big thank you. Can't give much, but subscribed and liked.

  • @andriifadieiev9757 · 4 months ago

    Great video, just as always!

  • @AthanasiouApostolos · 4 months ago

    Thank you, glad you find it helpful!!

  • @user-nv9fv2up5d · 3 months ago

    Quick question: if a record is dropped from the source table, i.e. a hard delete, how does apply_changes handle it?
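
For anyone following the hard-delete question above, a tiny pure-Python toy of the merge semantics (this is not the real DLT implementation — the point is that apply_changes only ever sees the events in the CDC feed, so a row hard-deleted upstream is removed from the target only if the feed actually emits a delete event for it):

```python
def apply_changes_toy(target, events):
    """Toy model of CDC apply: upsert rows, and delete only when a
    delete event arrives in the feed. A row hard-deleted upstream
    without a corresponding delete event simply stays in the target."""
    for event in sorted(events, key=lambda e: e["sequence"]):
        if event["op"] == "delete":
            target.pop(event["key"], None)  # tombstone: drop the row
        else:  # insert and update are both upserts
            target[event["key"]] = event["row"]
    return target

feed = [
    {"sequence": 1, "op": "insert", "key": 1, "row": {"name": "a"}},
    {"sequence": 2, "op": "insert", "key": 2, "row": {"name": "b"}},
    {"sequence": 3, "op": "delete", "key": 1, "row": None},
]
print(apply_changes_toy({}, feed))  # {2: {'name': 'b'}}
```

In DLT itself, the delete case is wired up with the `apply_as_deletes` argument of apply_changes, which tells the pipeline which CDC events to treat as deletions.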

  • @UltimaWeaponz · 3 months ago

    Do you know why the output of your DLT pipeline didn't give streaming tables? I wrote a similar pipeline where I read CDC from a SQL source, write into bronze using Auto Loader in DLT as a streaming table, then write into silver as SCD2 in DLT as well. Both bronze and silver tables are generated as streaming tables... In your example, you get normal Delta tables. I might be confusing concepts... Also, do you have a video covering streaming tables and materialized views? Thanks!

  • @AthanasiouApostolos · 3 months ago

    It depends on how you specify the tables. You can define them as streaming tables or as simple live tables, which are materialized views. Streaming tables are continuously updated, while materialized views get refreshed periodically or when you update them manually. docs.databricks.com/en/delta-live-tables/index.html
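
To make that distinction concrete, a hedged DLT sketch (this runs only inside a Databricks Delta Live Tables pipeline, and the table names and landing path are illustrative placeholders):

```python
import dlt  # available only inside a Databricks DLT pipeline

# Streaming table: defined over a streaming read, updated incrementally
@dlt.table(name="bronze_events")
def bronze_events():
    return (
        spark.readStream.format("cloudFiles")   # Auto Loader
        .option("cloudFiles.format", "json")
        .load("/demo/landing/events")           # placeholder path
    )

# Live table (materialized view): defined over a batch read,
# recomputed when the pipeline updates
@dlt.table(name="daily_counts")
def daily_counts():
    return dlt.read("bronze_events").groupBy("date").count()
```

Whether a table comes out as a streaming table or a materialized view follows from whether its definition reads a stream (`readStream`) or a batch source, which is why two pipelines with similar code can produce different table types.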

  • @UltimaWeaponz · 3 months ago

    @AthanasiouApostolos So only streaming tables depend on checkpoint locations then, from what I gather. At the moment my streaming tables use scheduled DLT pipelines, so they are not continuous. Thanks for your replies, btw! One of the best Databricks content creators for sure. 🤗

  • @AthanasiouApostolos · 3 months ago

    Yes, exactly: checkpoints are for streaming tables. When you use scheduled pipelines, it is essentially like running batch jobs, despite the fact that you are using streaming tables. You need to run the pipeline in continuous mode to get the actual benefits of streaming tables. That is, of course, if you have a good budget, haha. And thank you, much appreciated.
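
The scheduled-vs-continuous point can be illustrated with a tiny pure-Python toy of checkpointed incremental processing (not Spark itself, just the idea): each scheduled run behaves like a batch job, but the checkpoint means only records that arrived since the last run get processed.

```python
def run_scheduled_update(source, checkpoint):
    """Toy of a triggered/scheduled streaming run: read only the
    records past the checkpointed offset, then advance the offset.
    A continuous pipeline would do the same loop without stopping."""
    new_records = source[checkpoint["offset"]:]
    checkpoint["offset"] = len(source)
    return new_records

events = ["e1", "e2", "e3"]
ckpt = {"offset": 0}
print(run_scheduled_update(events, ckpt))  # first run: ['e1', 'e2', 'e3']

events += ["e4"]
print(run_scheduled_update(events, ckpt))  # next scheduled run: ['e4']
```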