Data pipeline vs Dataflow vs Shortcut vs Notebook in Microsoft Fabric
FREE 40-minute Fabric fundamentals course: www.skool.com/microsoft-fabri...
So you want to move your analytics workloads from a Power BI-centric model to a Fabric-centric model? But how do you do that?
This video is the third in the series, where I discuss the different options for getting data into Microsoft Fabric, including Data Pipelines, Dataflow Gen2, Fabric Notebooks, OneLake Shortcuts and Database Mirroring.
We talk about the pros and cons of each to help you make better architectural decisions in Microsoft Fabric.
Catch up on the Power BI to Microsoft Fabric Transition Guide series here: • Power BI to Fabric Tra...
Timeline
0:00 Intro
1:08 Series review
1:29 The problem
3:07 Overview of the different methods
5:19 Data ingestion principles
7:29 Dataflow overview
7:54 Dataflow: when to use/ when not to use
10:09 Dataflow: implementation notes
13:11 Data pipeline overview
13:27 Pipelines: when to use/ not to use
15:35 Pipelines: implementation notes
17:38 Notebooks overview
18:29 Notebooks: when to use/ not to use
20:16 File/database replication overview
22:30 Shortcuts overview
22:52 Internal shortcut limitations
24:05 Shortcuts: implementation notes
25:17 Database mirroring overview
27:53 What to consider when choosing a method
#powerbi #microsoftfabric #dataanalytics
Comments: 149
Hey everyone, I'm back! If you found this video helpful, please do give it a like or share it with colleagues, it really helps grow the channel 😊 THANK YOU!
@rameshpaskarathas6512
2 months ago
Awesome video. Thoughtful considerations and nicely done.
@josecardenas2736
1 month ago
Great video thank you
@anitatrpenoska8739
1 month ago
😊Thank you! Very helpful
Thank you. I love the way you explained each concept. ⏳
⏳watched it all the way through. Thanks for the detail. Loads of options for loading data!
This is my favorite video and reshaping my knowledge framework about Fabric! Thank you, Will !
@LearnMicrosoftFabric
2 months ago
Great to hear, glad you’re finding it useful!
You have such an inclusive approach- how could we resist watching until the end? Thank you for such helpful content!
@LearnMicrosoftFabric
3 months ago
Thanks for the lovely comment, glad you’re finding the videos useful 🙌
great video, nice pacing and really useful framing of the topics covered. Thank you!
@LearnMicrosoftFabric
2 months ago
Thanks a lot for your comments, glad you enjoyed the vid 🙌
Greetings from Guatemala, Central America. Thanks, I'm learning a lot with your videos ⌛
I really enjoyed your approach to describing the features of Fabric. Thank you. ⌛️
Very well explained ! Thanks !
@LearnMicrosoftFabric
1 month ago
thanks for watching!
I really liked the way you encapsulated the methods. Much better and more high-level than Learn. And hopefully I have enough ⏳ to finish these before the exam.
Thanks a lot for all your videos. I’m new to fabric and they have been a huge help in getting my feet wet. I really like your presentation style and pacing
@LearnMicrosoftFabric
3 months ago
Thanks for watching and for the lovely comment, glad you're enjoying!! A lot more to come :)
@NicoVonHagar
3 months ago
If it makes sense for your channel, I would love to see a video or series that goes into some best practices or design patterns for incremental processing of data through a medallion architecture. I am seeing use cases where users need to periodically drop a file every so often like after a weekly payroll or after a month end. Users may need to add files, delete files, or replace a file in case a mistake happened. I’m coming up with some creative processes to manage that without reprocessing the entire set through the silver and gold layer each time but I’m not sure if there are better patterns out there.
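One way to picture the add/delete/replace problem raised in this comment is a file manifest: record a content hash per file on each run, then compare against the previous run so that only the affected files are reprocessed. This is a minimal sketch and not a pattern from the video; the function names `build_manifest` and `diff_manifests` are hypothetical, and storing the manifest between runs (for example in a Delta table) is left out.

```python
import hashlib
from pathlib import Path


def build_manifest(folder: Path) -> dict[str, str]:
    """Map each file's relative path to a SHA-256 hash of its contents."""
    manifest = {}
    for path in sorted(folder.rglob("*")):
        if path.is_file():
            key = path.relative_to(folder).as_posix()
            manifest[key] = hashlib.sha256(path.read_bytes()).hexdigest()
    return manifest


def diff_manifests(previous: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Classify files as added, deleted, or replaced since the previous run."""
    prev_keys, curr_keys = set(previous), set(current)
    return {
        "added": sorted(curr_keys - prev_keys),
        "deleted": sorted(prev_keys - curr_keys),
        # Same path, different hash: the file was overwritten with new content.
        "replaced": sorted(k for k in prev_keys & curr_keys if previous[k] != current[k]),
    }
```

Only the files listed under "added" and "replaced" would need to flow through the silver and gold layers again, while "deleted" files identify rows to remove downstream.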
UPDATE: the following features have now been released:
Database Mirroring (Public Preview), read more here: learn.microsoft.com/en-us/fabric/database/mirrored-database/overview
On-premises data gateway for data pipelines: learn.microsoft.com/en-us/fabric/data-factory/how-to-access-on-premises-data
Thank you for this explanation of data ingestion in Fabric environments ⏳
@LearnMicrosoftFabric
1 month ago
No problem, glad you enjoyed!
This is very helpful stuff as I study for the DP-600, and also eye an on-prem Postgres database that I’d like to get into a Fabric lakehouse. Thank you! ⏳
@LearnMicrosoftFabric
2 months ago
Glad you found it useful! Got more DP-600 related stuff on the way soon 👍
Very helpful ! Thank you⌛️
@LearnMicrosoftFabric
3 months ago
Ah I'm so glad you found it helpful, thanks for watching and for commenting, it really means a lot!!!
Excellent tutorial videos Will! Question: when you say small or large datasets, what sizes are we considering for each?
Perfect explanation
@LearnMicrosoftFabric
2 months ago
Thanks for watching!!
⏳ - superb, thanks Will!
@LearnMicrosoftFabric
2 months ago
Thanks for watching James, glad you enjoyed 🙌
Really useful information thanks for sharing ⏳
@LearnMicrosoftFabric
2 months ago
Thanks for watching!! Really appreciate it
Great work.
@LearnMicrosoftFabric
2 months ago
Wow thank you, that’s very generous! I really appreciate your support 🙌🏽
"How to get the data into MS Fabric?" A great explanation of the possible options for getting data into MS Fabric, i.e. using Data Pipelines, Dataflows, internal/external Shortcuts, file/database replication, Fabric Notebooks and Database Mirroring. Keep it up Will :)
@LearnMicrosoftFabric
1 month ago
Thanks!
⌛️ Thanks so much for such insightful content. Waiting for the capacity and costing considerations video.
@LearnMicrosoftFabric
3 months ago
Thanks for watching! Here's some previous content that you might have missed:
On capacities: kzread.info/dash/bejne/emqHz8aqdsXOdZM.html
On costing: kzread.info/dash/bejne/qWhsk6SMiM6Wcto.html
⏳ Thank you for sharing your knowledge and experience.
@LearnMicrosoftFabric
2 months ago
Thanks for watching!!
Thanks!
@LearnMicrosoftFabric
2 months ago
Wow very generous - really appreciate it, thanks 🙌🙌🙌
Thanks for sharing, any reference of configuring Apache Sedona in Microsoft Fabric?
@LearnMicrosoftFabric
3 months ago
Thanks for watching! It’s not something I’ve used, and I haven’t seen anyone talk about Sedona Fabric integration, I’m afraid! Good luck though ☺️
⌛Thanks, nice overview
@LearnMicrosoftFabric
2 months ago
Thanks for watching!! 🙌
⌛️very useful. Thanks a lot
@LearnMicrosoftFabric
2 months ago
No problem, thanks for watching!!
Great Video....Please do publish videos to prepare for DP 600 certification.
@LearnMicrosoftFabric
3 months ago
Thanks a lot for watching and for commenting, I really appreciate it! And yes... DP-600 is coming v soon (after I finish this series, that will most likely be the next one) 😊 Are you currently preparing for the exam?
Thanks so much Will for the very good and detailed content, easy to follow at this pace. I know your channel because many people shared your YouTube channel at FabCon 24 in Vegas. One question please: in the video you mentioned that the on-premises gateway is not available with Data Pipelines, but I can see it now in Fabric. Is it something new since the video was published, or did I misunderstand that part? Thanks again
@LearnMicrosoftFabric
2 months ago
Yup that one was announced and released at FABCON (after I recorded), I have added it to the comments section ☺️ thanks! Hope you enjoyed FABCON!
⌛️brilliant series! I’m learning fabric so I can ingest azure resource graph queries from azure tenants for use in powerbi dashboards
@LearnMicrosoftFabric
1 month ago
Awesome, good luck with that!
Good Content !
@LearnMicrosoftFabric
1 month ago
Thanks!
Nice video. For data validation and data quality testing in notebooks, which methods do you suggest for doing it in an automated way?
@LearnMicrosoftFabric
3 months ago
Thanks! I have a 1hr+ tutorial on data validation in Fabric with demos and notebooks coming out on Friday so watch out for that one :)
⌛ Thank you for the video! Do you know if Azure DB for postgreSQL is also in private preview for database mirroring?
@LearnMicrosoftFabric
3 months ago
Hey Sergio! Thanks for watching 😊 I believe Azure Cosmos DB, Azure SQL DB and Snowflake are currently in Private Preview. But they do mention they are working on SQL Server, Azure PostgreSQL, Azure MySQL and MongoDB, to be released sometime later this year. Read more here: blog.fabric.microsoft.com/en-us/blog/introducing-mirroring-in-microsoft-fabric/ Is that a feature you'll be waiting for, I'm guessing?
Very helpful! ⌛️
⌛Enjoyed the pragmatic approach. Development and engineering require knowing what tools to use and when to use them and data ingestion methods are vital in this arena. This video did not have any fluff but encapsulated the following quote: "Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away." - Antoine de Saint-Exupéry
@LearnMicrosoftFabric
2 months ago
😂 very kind, thanks for watching! 🙌
⌛excellent video!
@LearnMicrosoftFabric
2 months ago
Thanks, glad you enjoyed 🙌🏽🙌🏽
⌛️ great overview!
@LearnMicrosoftFabric
3 months ago
Thank you! And thanks for watching!
⌛️ love your videos!
@LearnMicrosoftFabric
2 months ago
Glad you like them! Thanks for watching! 😊
Good Job !
@LearnMicrosoftFabric
3 months ago
Thanks a lot for watching!! Have you implemented any of these yet?
@meriambenmustapha6251
3 months ago
@@LearnMicrosoftFabric I'm attempting to utilize a Microsoft Fabric solution with PowerBI, but there's currently a lack of comprehensive documentation on this topic! I eagerly anticipate your future videos, as they have proven to be immensely beneficial
Great videos!!! Thanks for sharing these details and comparisons. Talking about CDC and Delta time travel, a question came to my mind. Are the transactions replicated via CDC, or is it a batch process? If it is batch, we're not really getting a time-travel option on the Delta side, in the sense of being able to land on any previous time or state.
@LearnMicrosoftFabric
2 months ago
That's an interesting question, I assume you're talking about when the mirroring process does the first lift-and-shift of the current state when you first set it up? I don't know the answer, but Database Mirroring is now in Public Preview so we can test it out
@amirhosseinborghei6336
2 months ago
@@LearnMicrosoftFabric Yes that’s a good idea, thank you so much. Can you also create a video comparing Databricks connected to Unity Catalog and Microsoft Fabric?
Great video - I captured all your graphics and organized them into my Fabric OneNote notebook, so much easier than taking notes. Hope that is ok? ⏳
@LearnMicrosoftFabric
3 months ago
Thanks Christopher - and no problem, whatever helps you learn! 😊
Thanks⌛
@LearnMicrosoftFabric
2 months ago
Thanks a lot for watching!
⏳Thank you!
@LearnMicrosoftFabric
1 month ago
thanks for watching!
⌛ Great content again, thank you Will!
- I do see parameters in Dataflow Gen2, and they seem to work OK; perhaps I am misunderstanding you.
- Yes, unfortunately I am also not seeing looping or other control blocks. Before Fabric, I implemented some transformation logic using Synapse mapping data flows that utilize them (particularly your example of consuming a paginated API response). Well, I will have to do it with a notebook as you are saying (I prefer to have the looping or control encapsulated in the T activity). Although the name is misleading, I guess this is more of a direct replacement for Power Query (in Excel/Power BI) than for Synapse/Azure data flows.
- I believe pipelines CAN be run with the Run button without having to schedule them.
- I am very used (in Synapse and other Azure places) to being able to make quick changes by looking at the JSON code behind objects. Dataflows and pipelines give me a "View JSON Code" option that is unfortunately read-only :-(
- I really liked the "metadata-driven" approach for pipelines. I do not have a use case for that at the moment, but it is a very interesting concept. What I have used pipeline hierarchy for before is reusability of common components. One example: in Synapse pipelines there is no "email" activity, so I encapsulated the email logic in a reusable pipeline (with several parameters) that in turn calls an Azure Logic App over HTTP, which sends the email.
@LearnMicrosoftFabric
3 months ago
Hi Ricky, thanks for watching and for the great questions!!
- Parameters: sorry, yes, I probably wasn't clear enough. You can create 'parameters', but these are just constants; currently you cannot pass dynamic input parameters to a dataflow from a data pipeline (like you can with a notebook).
- Looping: you can use the Until activity to loop through a list in a Data Pipeline. You can also use the Pagination setting in a Copy Data activity for that specific use case (but yes, personally I prefer managing it in Fabric Notebook code, as sometimes the pagination logic is not 'obvious' or regular for some APIs).
- Yes, pipelines can be run manually using the Run button.
- JSON code is currently view-only, that's correct.
- Here are two resources if you want to read more about metadata-driven pipelines:
1: Using Lakehouse: techcommunity.microsoft.com/t5/fasttrack-for-azure/metadata-driven-pipelines-for-microsoft-fabric/ba-p/3891651
2: Using Data Warehouse: techcommunity.microsoft.com/t5/fasttrack-for-azure/metadata-driven-pipelines-for-microsoft-fabric-part-2-data/ba-p/3906749
Thanks for the great questions!!
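The notebook-side pagination loop mentioned in this reply could look roughly like the sketch below. It is illustrative only: the response shape (`items` plus a `next` cursor), the `iter_paginated` helper and the `fake_fetch` stand-in are all assumptions, and real APIs often paginate differently (page numbers, Link headers, offsets).

```python
from typing import Any, Callable, Iterator, Optional


def iter_paginated(
    fetch_page: Callable[[Optional[str]], dict],
    max_pages: int = 1000,
) -> Iterator[Any]:
    """Yield every item from a cursor-paginated API.

    fetch_page(cursor) is assumed to return
    {"items": [...], "next": <cursor string or None>}.
    """
    cursor = None
    for _ in range(max_pages):
        page = fetch_page(cursor)
        yield from page.get("items", [])
        cursor = page.get("next")
        if not cursor:
            return  # no further pages
    # Guard against an API that keeps returning a cursor forever.
    raise RuntimeError(f"Stopped after {max_pages} pages; the API may be looping")


# Hypothetical stand-in for a real API, so the loop can be tried offline.
FAKE_PAGES = {
    None: {"items": [1, 2], "next": "p2"},
    "p2": {"items": [3], "next": None},
}


def fake_fetch(cursor: Optional[str]) -> dict:
    return FAKE_PAGES[cursor]


all_items = list(iter_paginated(fake_fetch))
```

In a real notebook, `fetch_page` would wrap an HTTP call (for example `requests.get(url, params={"cursor": cursor}).json()`, where the URL and parameter name depend on the API). Keeping the fetch function separate from the loop makes irregular pagination logic easier to adapt per API.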
Thank you. ⌛️
@LearnMicrosoftFabric
2 months ago
Thanks a lot for watching!! Really appreciate it
This is by far the best Fabric series that I have ever seen (even the learning materials from MS are not so well organized). By the way, I would like to ask:
1. Can we use a shortcut between two ADLS Gen2 accounts, or must the destination of the shortcut be Fabric storage (data warehouse or lakehouse)?
2. Why did you mention that database mirroring prevents data duplication? It actually duplicates the whole set of data in Delta format, right?
Also, I am really looking forward to the cost and VNet-related video. For example, how does Microsoft bill different ingestion methods? Which ingestion method is more cost-effective? Most of the time, cost is the deal breaker in how people choose a product (Fabric or others) or a method (dataflow or Spark) to use. Cheers.
@LearnMicrosoftFabric
2 months ago
Thanks, glad you’re enjoying 🙌🏽
1. No.
2. Yes, mirroring is duplication. Not sure why I wrote that, and how I missed it when recording. Sorry about that.
And yes, interesting to hear you talking about cost and capacity usage of different methods, I’d love to do some benchmarking in the future 👍
At 12:00 you mention looping through a paginated API. Is this the same as looping through multiple CSV files that all have the same headers?
@LearnMicrosoftFabric
1 month ago
Hey! It's a different use case, but you could use the ForEach loop to solve both problems yes!
Hi Will, really useful video, thanks. ⌛ You refer to shortcuts as a live sync of the source. Are you sure about that? My understanding was that it was a live link and wasn't actually copying or moving data anywhere. For ADLS and internal shortcuts at least. Of course database mirroring is different, which I'm glad to hear is in the roadmap. Cheers.
@LearnMicrosoftFabric
3 months ago
Thanks for watching Mark and for the great question! You’re right, I could have been a bit more accurate with my description on this, I will clarify in the next video ☺️ In short, for external shortcuts to ADLS, and internal shortcuts, it’s a live ‘link’, i.e. no data copy, but for database mirroring and external shortcuts to S3, a local cache inside Fabric is needed.
@markburgess4440
3 months ago
Great, thanks for clarifying.
Thanks again Will, I can't thank you enough for what you are providing the Fabric community with. Here is the hourglass you asked for ⌛⌛⌛⌛⌛⌛ 😄😄😄😄😄😄. I have a question about the difference between shortcuts and database mirroring on one side and the three tools that you showed for data ingestion on the other. On the File/database replication slide you mentioned that shortcuts and database mirroring don't include an ETL process, but in the next point you said that Fabric would create a Delta table 'cache' of the full dataset the first time. So isn't the latter an Extract-from-source and Load-to-OneLake process, or at least a copy process? Thanks in advance.
@LearnMicrosoftFabric
1 month ago
Haha thanks 🙌 mirroring is database replication yes. With shortcuts, by default the data stays in the source location until query time, BUT there is a setting in the admin portal (for Amazon S3 and Google Cloud Storage) to Enable Caching - this creates a cache of the data in Fabric, which can help reduce your egress costs
⌛ great video
@LearnMicrosoftFabric
3 months ago
Thanks a lot for watching!! Which of these do you think you'll be using, out of interest?
Compared to the Power Platform environment, what would be the direct benefit apart from scalability? And compared to Power BI Pro, how much more do you need to pay? TY
@LearnMicrosoftFabric
1 month ago
Hi, Fabric is quite a different platform to the Power Platform tbh, feel free to check out the 38 minute fundamentals video on my channel for a full outline of capabilities. plus my video on pricing for details on pricing 👍
⏳ good job
@LearnMicrosoftFabric
3 months ago
Thanks a lot 🙌
We are exploring ways to synchronize live data from our on-premises Oracle database to Fabric. Could anyone share their experiences or suggest the best practices for implementing this? Any insights on tools or methods that work well with Microsoft technologies would be greatly appreciated.
@LearnMicrosoftFabric
1 month ago
Not sure about that one tbh!
⏳great content
@LearnMicrosoftFabric
2 months ago
Thanks, and thanks for watching!
Is it possible to import a shortcut into a semantic model and add calculated columns or even make transformations in the data?
@LearnMicrosoftFabric
3 months ago
I think it would be possible to achieve something similar to that. In general, you can't edit shortcut data. Also you can't import a shortcut into a semantic model, only into a Lakehouse/ KQL database, and then create your semantic model using that shortcut table. You can however build on top of shortcuts, so you could create a SQL view on top of the shortcut that added additional logic/ calculated columns. OR you could do it in the Power Query editor of Power BI Desktop but I wouldn't necessarily recommend that! Does that make sense?
@satellitepop
3 months ago
@@LearnMicrosoftFabric amazing, thanks !
Question: You said that Database Mirroring does not "replicate data"; however, it seems to me that it does. There is an original copy in an Azure SQL database and a second copy in Fabric, with automatic updating. Am I seeing this wrong?
@LearnMicrosoftFabric
3 months ago
Hmm not sure I said that did I? database mirroring definitely is a form of data replication 👍
@christopherpfeifer9772
3 months ago
At 27:17 in the video, the graphic says "Prevents Data Duplication." Maybe I'm interpreting the graphic wrong.
⌛😄 Well done!
@LearnMicrosoftFabric
3 months ago
Thanks! And thanks for watching 😊
⌛⌛ Very Informative video - thank you for this! ⌛⌛ Regarding using Notebooks to write scripts that fetch data from external APIs - how can we store the credentials that we need to authenticate against the APIs in a secure place? Also, like the requests library, what are the other libraries available? Can we also download & use any library we want, from some sources like pip's requirements.txt?
@LearnMicrosoftFabric
3 months ago
Thanks for watching! On your first point, you can use Azure Key Vault to store keys securely and then access them using the Azure Identity and Azure Key Vault Python packages, like this: learn.microsoft.com/en-us/azure/key-vault/secrets/quick-create-python?tabs=azure-cli
I've mostly only used 'requests' for Python API calls, which is the industry standard. But you can use any Python library you like! You can either install the package in a notebook cell using %pip install {package_name}, or install libraries at the workspace level, in Workspace Settings. Does that answer your questions?
@LaZyBuM999
3 months ago
@@LearnMicrosoftFabric yes! Thank you so much Will! On the first one: I assume we have to create a managed identity and assign it to the Fabric resource, and that it will automatically be picked up when authenticating against Key Vault. I am not sure if this is possible yet. Any idea? P.S. I am looking forward to more of your videos! ✌
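The Key Vault approach from the reply above can be sketched with the `azure-identity` and `azure-keyvault-secrets` packages used in the linked quickstart. This is a hedged sketch, not a confirmed Fabric recipe: the helper names `vault_url` and `get_api_key` are hypothetical, and whether `DefaultAzureCredential` resolves a usable identity inside a Fabric notebook is an assumption that the thread leaves open.

```python
def vault_url(vault_name: str) -> str:
    """Build the standard Key Vault URL for a vault name."""
    return f"https://{vault_name}.vault.azure.net"


def get_api_key(vault_name: str, secret_name: str) -> str:
    """Read one secret value from Azure Key Vault.

    Assumes the identity running the notebook has permission to read
    secrets from the vault (not confirmed for Fabric in the thread).
    """
    # Imported here so the helpers load even before the packages are installed.
    from azure.identity import DefaultAzureCredential
    from azure.keyvault.secrets import SecretClient

    client = SecretClient(
        vault_url=vault_url(vault_name),
        credential=DefaultAzureCredential(),
    )
    return client.get_secret(secret_name).value
```

The two packages would first be installed in a notebook cell with `%pip install azure-identity azure-keyvault-secrets`. The returned value can then be passed as a header or token to `requests` without the key ever being written into the notebook itself.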
⌛⌛!! thanks man!
@LearnMicrosoftFabric
1 month ago
Thanks for watching!
⏳!
@LearnMicrosoftFabric
1 month ago
Thanks for watching!!
⏳
It lacks the "Eventstream" ingestion. :-(
@LearnMicrosoftFabric
3 months ago
I know! Yes sorry about that. The video was already quite long and I want to cover Eventstreams in more detail (because they are quite different to the others!). I'll be doing a whole series on real-time/ KQL/ eventstreams 👍
⌛
@LearnMicrosoftFabric
3 months ago
Thanks for watching Mark!
⏳
⌛
⏳
@LearnMicrosoftFabric
2 months ago
Thanks for watching!!
⏳
@LearnMicrosoftFabric
1 month ago
Thanks for watching! Appreciate it 🙌
⏳
@LearnMicrosoftFabric
1 month ago
Thanks for watching!!
⏳
@LearnMicrosoftFabric
1 month ago
Thanks Carlos 🙌
⏳
@LearnMicrosoftFabric
2 months ago
Thanks for watching!!
⌛
⌛
@LearnMicrosoftFabric
1 month ago
Thanks for watching!! Appreciate it 🙌
⌛
@LearnMicrosoftFabric
2 months ago
Thanks for watching!!!
⌛
@LearnMicrosoftFabric
2 months ago
Thanks for watching!!
⌛
@LearnMicrosoftFabric
1 month ago
Thanks for watching!!
⌛
@LearnMicrosoftFabric
3 months ago
Thanks for watching!