Azure Synapse Analytics - Mapping Data Flows & Delta!

Mapping Data Flows has been in Azure Data Factory for a while now, but what does the Synapse version look like? How much can we achieve through parameters & dynamic content? This week Simon digs in, comparing a mapping data flow to a similar Databricks job, and also looks at the new Delta inline source within Data Factory itself.
*CORRECTION - In this video Simon mentions Delta querying using SQL On-Demand. This is not yet possible, although we're hoping it will land before Synapse is Generally Available!
Are you just getting started with Data Factory? Why not check out the Data Factory training courses available through our website: www.advancinganalytics.co.uk/...
And for more analytics & sparky goodness, don't forget to stop by: www.advancinganalytics.co.uk/...
If you liked the video, don't forget to hit that like & subscribe button!

Comments: 26

  • @gtosXD · 4 years ago

    Really good content, thank you for that!

  • @andrewfogarty1447 · 3 years ago

    A quick video on how to look up tables and "for each" over them, around 4:27, would be great.

  • @jugnu1234 · 4 years ago

    Very informative as always. You mentioned SQL on-demand can read the Delta format, but I didn't see that in any of your videos or the Synapse documentation, unless I completely missed it.

  • @AdvancingAnalytics · 4 years ago

    That's... a very good point. With Delta being enabled in ADF, and supported through the Synapse Spark pools, I was getting confused about what's Delta compatible. It's certainly something the SQL OD guys are aware of and I'm sure it's on their roadmap, but I think you're right - support isn't there yet. I'll pop a correction on the video and see if I can dig out an estimated timeline for when we'll see it. Apologies for that! Simon

  • @jugnu1234 · 4 years ago

    @AdvancingAnalytics As SQL on-demand is an Amazon Athena competitor, hopefully Microsoft or Databricks will come up with a solution to read the Delta format, as they did for Athena. That would make it so easy to offer a data lakehouse to end users without worrying about Spark clusters and Databricks licensing plans, an end-to-end solution from within Azure Synapse itself: docs.databricks.com/delta/presto-integration.html#presto-and-athena-to-delta-lake-integration

  • @sid0000009 · 4 years ago

    Do you plan to have something on Azure Databricks security? Especially the table-level security option in the Databricks admin UI console. Sorry, it's a bit off topic. I love your clear explanations, so it would be very helpful.

  • @AdvancingAnalytics · 4 years ago

    Sure, I can add a walkthrough of Databricks security to our backlog. Likely won't get to it for a few weeks though! Simon

  • @eirikandersen9848 · 3 years ago

    Hi Simon! Great video on doing things dynamically in data flows. I have one question regarding the flatten transformation in data flows: how do you make this dynamic? I am struggling to figure this out, and I need to flatten the JSON before dumping it into an on-premises SQL Server.

  • @AdvancingAnalytics · 3 years ago

    Hey Eirik - just took a quick peek; it doesn't look like the flatten activity is currently dynamic - it's expecting a hardcoded pointer to the array column you want to flatten out. This is an area where I'd currently fall back on Spark: you can use the explode() function on a column to achieve this same flattening, and this can be completely parameterised (see the sketch below). You'd then pick up the flattened dataset and push it to your on-prem sink back in your data flow. Obviously not ideal if you want everything to stay entirely within a mapping data flow! Simon
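
    A minimal PySpark sketch of the explode() approach described above; the paths and the array column name are hypothetical placeholders, not anything from the video:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, explode

spark = SparkSession.builder.getOrCreate()

# Hypothetical parameters - in a real pipeline these would arrive as
# widget/job arguments rather than hardcoded values.
source_path = "/mnt/raw/input.json"
array_column = "items"

df = spark.read.json(source_path)

# explode() emits one output row per element of the array column.
# Because the column is addressed by a plain string, the whole step
# is parameterisable - unlike the flatten transformation.
exploded = df.withColumn(array_column, explode(col(array_column)))

# If the array elements are structs, promote their fields to top-level
# columns before writing the result back for the sink to pick up.
flat = exploded.select("*", array_column + ".*").drop(array_column)
flat.write.mode("overwrite").parquet("/mnt/staged/flattened")
```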

  • @eirikandersen9848 · 3 years ago

    @AdvancingAnalytics Cheers mate! I'll try it out.

  • @rahulwamane4145 · 3 years ago

    It's really easy to understand with you. Thanks. Is there any way at this moment to onboard a workspace into Azure DevOps to promote changes to a Test or Prod environment? This should include one-time replication as well as delta promotions (going forward).

  • @AdvancingAnalytics · 3 years ago

    I don't believe there's any Synapse integration with Azure DevOps yet. I'm certainly interested to see how it'll work, and how coherent a story we can build (i.e. I don't want to see different tools/approaches for different parts of it!). Simon

  • @peterko8871 · 1 month ago

    How is this - switching across 10-20 windows in this UI - faster than simply doing it in SQL code? It's so tiring for the mind.

  • @chidiackieffer515 · 4 years ago

    Thank you for this video, Simon. Please, what is the importance of using a Parquet file instead of a CSV or Excel file, for example? I saw that in all your videos you use a Parquet file.

  • @AdvancingAnalytics · 4 years ago

    Hey! A few reasons: 1) Parquet is columnstore, so it has AWESOME compression and you'll generally find your queries going much faster over Parquet. 2) Parquet is typed - it has a schema built in, so you don't need to specify column names, data types etc. And 3) it works with the Delta format, which gives you lots of transaction log, time travel and optimisation functionality. Basically, use Parquet :) Simon
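
    To make point 2 concrete, here is a small PySpark sketch contrasting CSV and Parquet reads; the file paths are hypothetical placeholders:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# CSV carries no schema: either declare one up front or pay for
# inference, which costs an extra pass over the data.
csv_df = (spark.read
    .option("header", True)
    .option("inferSchema", True)
    .csv("/mnt/raw/sales.csv"))

# Parquet stores column names and types in its file footer, so the
# schema comes along for free - and the columnar layout lets queries
# that touch only a few columns skip reading everything else.
parquet_df = spark.read.parquet("/mnt/raw/sales.parquet")
parquet_df.printSchema()
```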

  • @chidiackieffer515 · 4 years ago

    @AdvancingAnalytics OK, I'll try to use it. But concretely, what is Delta? Is it a dataset or what?

  • @AdvancingAnalytics · 4 years ago

    Delta is a file format. It is based on Parquet but includes a transaction log (JSON files that contain metadata about the files, an audit of operations, statistics etc). There are special libraries for working with Delta that enable lots of advanced functionality you don't normally find in lakes (merge statements, temporal queries, snapshot isolation etc). It's rapidly becoming the de facto format for Spark-based data lakes. It's open source (delta.io/) but made & promoted by Databricks.
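
    A minimal PySpark sketch of the functionality described above, assuming a cluster with the delta-spark library available; the lake path and data are hypothetical:

```python
from pyspark.sql import SparkSession
from delta.tables import DeltaTable

spark = SparkSession.builder.getOrCreate()

# Writing as Delta produces ordinary parquet data files plus a
# _delta_log folder of JSON transaction files alongside them.
df = spark.range(5).withColumnRenamed("id", "key")
df.write.format("delta").mode("overwrite").save("/mnt/lake/demo")

# Time travel: read the table as it looked at an earlier version.
v0 = spark.read.format("delta").option("versionAsOf", 0).load("/mnt/lake/demo")

# Merge (upsert) - one of the operations plain parquet lakes lack.
target = DeltaTable.forPath(spark, "/mnt/lake/demo")
updates = spark.range(3, 8).withColumnRenamed("id", "key")
(target.alias("t")
    .merge(updates.alias("u"), "t.key = u.key")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute())
```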

  • @Jaryd1224 · 3 years ago

    Writing as a Delta Lake sink does work in Synapse Studio Data Flows now.

  • @GuillaumeBerthier · 3 years ago

    I just tried to create a Synapse data flow with a Delta sink, and I can confirm that this inline sink type is now available on Synapse too (and it works).

  • @kromerm · 4 years ago

    Does it help if you think of "$$" as "this" instead? That might be a good analogy for you to better understand $$ in the column patterns.

  • @AdvancingAnalytics · 4 years ago

    Hey Mark! Absolutely, it makes sense in that context; it just threw me when I first took a look at it, so I assumed other people would have similar confusion around it. Kinda like the item() inside the forEach iterator - it works great once you get used to it :) Simon

  • @kromerm · 4 years ago

    @AdvancingAnalytics Cool... I'm thinking "this" is a more intuitive syntax, even though it has a programmer's feel to it in a data engineer's tool.

  • @AdvancingAnalytics · 4 years ago

    @@kromerm "this" makes sense as an abstract concept, but is it ever referencing something other than the column object? If not $col or $column may be more easily understood by analysts/non-coders looking through it?

  • @NeumsFor9 · 3 years ago

    @kromerm Good concepts. Sorry to ask on Simon's turf, but two things: first, when can we expect mapping data flows on self-hosted IRs? I did put in a request for this a few days ago. Secondly, any chance we can have better integration with Managed Instances in ADF/Synapse pipelines over private endpoints? The current tutorial solution in the MSFT documentation, with port forwarding, is a little kludgy and can cost an extra 150-200 a month. Otherwise, mapping data flows are getting more usable in private settings.

  • @MrMailKamath · 2 years ago

    Which one is cheaper for populating a delta lake: ADB or a mapping data flow?

  • @AdvancingAnalytics · 2 years ago

    It depends on how you see cost. The development cost is cheaper on Databricks because you can do more. Run cost might be a little less with mapping data flows.
