How to use Common Table Expressions vs Subqueries vs Views vs Temp Tables like a data engineer!

The data for today's video is available here: github.com/EcZachly/video-gam...
Common Table Expression and subquery is a very common debate! The temporary table crowd is pretty loud too! Don't forget the materialized view crowd !
Check out dataexpert.io for my DataExpert community!

Пікірлер: 57

  • @stanislawcronberg3271
    @stanislawcronberg32713 ай бұрын

    When a DE at my work showed me how to rewrite some of my queries with indexed temp tables to join all the model features into a final table I was so happy

  • @Kyriaeus
    @KyriaeusАй бұрын

    The CROSS APPLY or LATERAL JOIN clause is also nice if you’re doing multiple steps of transformations on the same relation and don’t want to break out a separate CTE for each layer.

  • @anandahs6078
    @anandahs60783 ай бұрын

    fist time learning that materialized can be run concurrently with unique index. thanks for great content.

  • @EcZachly_

    @EcZachly_

    3 ай бұрын

    Thanks! Anything else you’d want to learn from me?

  • @pavelk.1558

    @pavelk.1558

    3 ай бұрын

    Yep, it's really helpful function.

  • @ssjdonkey4489

    @ssjdonkey4489

    3 ай бұрын

    Great video! How do you go about learning complex data engineer concepts? What's your learning algorithm? Thanks

  • @chrishabgood8900

    @chrishabgood8900

    2 ай бұрын

    If you want them to be current then yep

  • @peekknuf
    @peekknuf3 ай бұрын

    Man you're on fire, producing content at the pace I can't consume it :D Nice job!

  • @mikefenn00
    @mikefenn003 ай бұрын

    Subscribed on TikTok when I came across your videos a couple of months ago. Now KZread has recommended you as well. Seems like a good sign for you!

  • @user-ud2zd7nx4c
    @user-ud2zd7nx4c3 ай бұрын

    Zach…sir…this video is effin awesome!!! Like the speed of explanation and the examples!!! 👊

  • @karlnorberg7768
    @karlnorberg77682 ай бұрын

    Huge thanks for this informative video! Picking up some handy stuff even after +10 years of data engineering or whatever we call it now 🙂

  • @chandanjha3205
    @chandanjha3205Ай бұрын

    Important thing to note here is that at 6:30, if you are creating a permanent view in your DB, you need to do it only once as its a DDL statement adding a new object to your schema. It does not have to be repeated everytime else your code will start throwing error. Modern DBs may let you create temporary views whose scope is only during the execution and is not global. You may choose to use that.

  • @jhonnafg
    @jhonnafg3 ай бұрын

    this is so helpful Zach, we get table apis and do a bunch of dataset out of it to build up our sass applications. This sheds light since my current role as a DE in this company is really heavy on SQL but not that part of the etl, they kind of separated that and is more on the backend de. Im trying to bridge that since this is also my first role as de

  • @apibarra
    @apibarra3 ай бұрын

    Love the video! I am a huge proponent of CTEs. Nothing worst when you open a table’s source code and see on massive block of join to multiple tables and sub queries all with their own unique filtering logic at the bottom of the joins or nested in the sub query. I feel like I can understand the logic and read the code much quicker with the CTEs at the top.

  • @SQLBucketHatGirl
    @SQLBucketHatGirl3 ай бұрын

    If only this was posted earlier id probably pass my technical interview on materialized transformation. 😢 thank you for the great explanation and use cases.

  • @subhasishsarkar5106
    @subhasishsarkar51063 ай бұрын

    Great work, thank you so much Zach! Very informative and concise. Keep these coming!

  • @EcZachly_

    @EcZachly_

    3 ай бұрын

    I appreciate your support! Any topics in particular you'd want me to cover?

  • @subhasishsarkar5106

    @subhasishsarkar5106

    3 ай бұрын

    @@EcZachly_ I'd be interested to know your thoughts on the different architecture patterns like the lakehouse, vault, mesh, etc. Looking forward to more from you! :D

  • @diaconescutiberiu7535
    @diaconescutiberiu75353 ай бұрын

    Great content, as ever! I'd like to see a video on Data Vault (with hashes) or at least if you can recommend some resources for it... a practical example would be best. Keep up the good work

  • @madhanv160
    @madhanv1602 ай бұрын

    This is new to me. Great comparison.

  • @ingenieroriquelmecagardomo4067
    @ingenieroriquelmecagardomo40673 ай бұрын

    Awesome video! Would love to see you cover Change Data Capture - with all your breadth and experience, i'm sure it would result in an amazing video.

  • @EcZachly_

    @EcZachly_

    3 ай бұрын

    Thanks for the suggestion!

  • @sagarjoshi7097
    @sagarjoshi70973 ай бұрын

    Hey Zach, thank for such informative videos. Next can you please explain something about dbt usecases. Points like how it tests sql, orchestration, pipline. Other alternativer if any.

  • @EcZachly_

    @EcZachly_

    3 ай бұрын

    Great idea!

  • @tsveti_p
    @tsveti_p3 ай бұрын

    Great explanation Zach! This is one of the best comparisons I've heard between all these - with all their pros and cons! 🌟🌟🌟 On this topic, I would be really thankful for an advice about performance between temporary tables and using staging tables, which are truncated and reloaded on every execution of the ETL. My use case is a complex transformation in which we have multiple stages of preparing the dataset moving it from one temporary table to another after applying some logic, joining a table or deleting some rows. It all happens within one script/session, so we don't need to keep the intermediary results, only the final dataset. We use Redshift and the performance is definitely not great. Do you think we can get any performance gains if we turn these temp tables into actual staging tables and we truncate and reload them? Also, do you know if defining DISTRIBUTION/SORT key on a temporary table in Redshift is as efficient as dist/sort key on an actual table?

  • @sf-spark129

    @sf-spark129

    Ай бұрын

    Personally I use staging tables over temp tables for debugging purposes. When your end/user-facing table is returning incorrect or duplicate data, you can always trace back to an upstream staging table layer by layer to identify the root cause of the issue. Also, it's a lot easier to fine-tune the performance with staging table because you can test and validate different optimization techniques on them.

  • @strawhatSEO
    @strawhatSEO3 ай бұрын

    Great video thank you Zach! Where do stored procedures come in with this conversation? Are sprocs materialized or non-materialized? Or are sprocs used for a purpose besides storing transformations?

  • @EcZachly_

    @EcZachly_

    3 ай бұрын

    They are mostly non-materialized

  • @shiroyasha7178
    @shiroyasha71782 күн бұрын

    So good. Thanks!

  • @md.jannatultasnim2661
    @md.jannatultasnim26613 ай бұрын

    i used CTEs , views , materialized views, temp tables but never found any detail in depth explanations how they works internally and performance comparison. Now everything is clear to me Thank you for the contents you shared. Another thing, as you described here 15:14 does this happens only in the relational DBMS ? or in both relational and data warehouses like snowflake BigQuery ?

  • @EcZachly_

    @EcZachly_

    3 ай бұрын

    Great question! I haven’t tested that exact case. My guess is it’s still blocking based on how refreshes work but that might be something that varies with data warehouses

  • @pasqualesalomone8902
    @pasqualesalomone89023 ай бұрын

    I was reviewing some SQL code today, and I came across an example of nested SQL query aka an infinite series of subqueries, and I was asking myself: WHY?

  • @muhammadraza3290
    @muhammadraza32903 ай бұрын

    Hey Zach, been following you almost religiously now, cheers for the clear conceptual content. What I am not clear on is why the computation for the CTE 9:55 teammates and enemies took that long. Shouldn't the creation of a CTE and the creation of a materialized table take the same amount of time? Does it have anything to do with the complex JOIN condition?

  • @EcZachly_

    @EcZachly_

    3 ай бұрын

    No because the CTE also had an aggregation step afterward

  • @muhammadraza3290

    @muhammadraza3290

    3 ай бұрын

    @@EcZachly_ so temp table + aggregation was far lower time than CTE with aggregation. would it always be the case though?

  • @EcZachly_

    @EcZachly_

    3 ай бұрын

    @@muhammadraza3290 Temp Tables can be useful for improving performance though. It's very difficult to say something will always be the case

  • @alialfaily7829
    @alialfaily78293 ай бұрын

    thanks for great content.

  • @EcZachly_

    @EcZachly_

    3 ай бұрын

    Glad you enjoy it!

  • @tangy_ra
    @tangy_ra3 ай бұрын

    I have a question Zach. In snowflake, you cannot use join in materialized view. So, does materialized views vary from software to software? Also, there are many restrictions on using materialized view on snowflake.

  • @byte_easel
    @byte_easel2 ай бұрын

    This is really great. Can you make videos about SQL aggregate functions, triggers, and transactions? especially aggregate functions and transactions if you can

  • @EcZachly_

    @EcZachly_

    2 ай бұрын

    Thanks for the suggestion!

  • @JPY_Data
    @JPY_Data3 ай бұрын

    hi Zach, great video. What would be the difference between creating a materialized view and creating a table of the results? Would it be the same thing performance wise?

  • @EcZachly_

    @EcZachly_

    3 ай бұрын

    If the materialized view has the same indices as the table, they’ll be the same. The main difference is you can automatically refresh the materialized view. But you can incrementally build a table

  • @hanpanBR
    @hanpanBRАй бұрын

    The spark "registerAsTempTable" goes into what type of those? It works like a temporary table, saving the data just for that session? I guess I've been using it wrong =p

  • @FranLegon
    @FranLegon26 күн бұрын

    I don't really get the benefits of using a materialized view instead of a regular table other that it's a little easier to update (with regular table you would need tp update using keys and check for changed rows, probably using hash)

  • @arulkumarankumaraswamipill2516
    @arulkumarankumaraswamipill25163 ай бұрын

    Timeseries databases like TimescaleDB has continuous aggregates on materialiazed views , which is incremental.

  • @EcZachly_

    @EcZachly_

    3 ай бұрын

    That’s rad!

  • @user-bh5ws7hx1t
    @user-bh5ws7hx1t3 ай бұрын

    we have a really long sql statement that uses a sequence of CTE. Each CTE is then used in the next CTE through a join condition. Do you think we would benefit from going to temp tables instead?

  • @EcZachly_

    @EcZachly_

    3 ай бұрын

    Probably. Try it out!

  • @dn9416
    @dn94163 ай бұрын

    Intresting

  • @EcZachly_

    @EcZachly_

    3 ай бұрын

    Thanks!

  • @fernandoangelrodriguezgonz1393
    @fernandoangelrodriguezgonz13933 ай бұрын

    Good

  • @NeumsFor9
    @NeumsFor93 ай бұрын

    Materialized views are oftentimes like the fine print or the auction speak at the end of a commercial..... SO many limitations (depending on the vendor or edition or featureset). It's like dating the perfect 10...... as long as you pay the 5k dinner bill and have the person home by 8pm because that person has a data engineering conference to speak at in the morning.... 😂😂😂😂

  • @johndt1013
    @johndt10133 ай бұрын

    Hi Zach, thanks for this vid, it was such a great breakdown! I had 1 question if you have time: At 17:40 you mention how temp tables have incremental gains, I'm struggling to understand how that is, seeing as how similar to materialized views it appears. Is this because if you're adding new data 'A' to existing data 'B', temp tables can just focus on adding 'A', whereas materialized views require you to handle 'B' as well due to the refresh? This might be me not understanding the concept of incremental gains, thanks!

  • @EcZachly_

    @EcZachly_

    3 ай бұрын

    Your understanding is right!

  • @NeumsFor9
    @NeumsFor93 ай бұрын

    So many collateral contexts to any of these...