Transforming data | PySpark, T-SQL & Dataflows in Microsoft Fabric | DP-600 EXAM PREP (7 of 12)

Free DP-600 study notes inside community: www.skool.com/microsoft-fabri...
In this video (7 of 12 in the series), we cover the following:
Data cleansing:
Implement a data cleansing process
Identify and resolve duplicate data, missing data, or null values
Convert data types by using Dataflows or PySpark
Filter data
Data enrichment:
Merge or join data
Enrich data by adding new columns or tables
Data modelling:
Implement a star schema for a lakehouse or warehouse, including Type 1 and Type 2 slowly changing dimensions
Implement bridge tables for a lakehouse or a warehouse
Denormalize data
Aggregate or de-aggregate data
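As a rough illustration of how the two slowly changing dimension types listed above differ, here is a minimal plain-Python sketch (not Dataflow, T-SQL or PySpark code) on a hypothetical customer dimension. The table, the column names (customer_id, city, valid_from, valid_to, is_current) and the sample values are assumptions for illustration only; in a lakehouse or warehouse the same logic would typically be written as an update or merge against a table.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class CustomerRow:
    # Hypothetical customer dimension row; field names are illustrative only.
    customer_id: int
    city: str
    valid_from: date
    valid_to: Optional[date] = None  # None = row is still open-ended
    is_current: bool = True


def apply_type1(dim: List[CustomerRow], customer_id: int, new_city: str) -> List[CustomerRow]:
    """Type 1: overwrite the attribute in place, so no history is kept."""
    return [replace(r, city=new_city) if r.customer_id == customer_id else r for r in dim]


def apply_type2(dim: List[CustomerRow], customer_id: int, new_city: str,
                change_date: date) -> List[CustomerRow]:
    """Type 2: close the current row and append a new current version."""
    current = [r for r in dim if r.customer_id == customer_id and r.is_current]
    if current and current[0].city == new_city:
        return list(dim)  # attribute unchanged, so no new version is needed
    out = [replace(r, valid_to=change_date, is_current=False)
           if r.customer_id == customer_id and r.is_current else r
           for r in dim]
    out.append(CustomerRow(customer_id, new_city, valid_from=change_date))
    return out


dim = [CustomerRow(1, "Leeds", date(2023, 1, 1))]
type1 = apply_type1(dim, 1, "York")                    # one row, city overwritten
type2 = apply_type2(dim, 1, "York", date(2024, 6, 1))  # old row expired + new row
```

Type 1 leaves a single row with the new city, while Type 2 keeps the Leeds row (closed on 2024-06-01) alongside a new current York row.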
This video is part of the DP-600 Exam Preparation series.
Timeline
0:00 Intro
1:29 Data cleansing process
2:26 Introduction to the dataset
3:31 Dataflow: data cleaning
6:55 T-SQL: data cleaning
10:51 PySpark: data cleaning
20:25 Star schema
22:41 Slowly-changing dimensions
23:36 Type 1 SCD
24:27 Type 2 SCD
27:53 Bridge tables
28:56 Implementing a bridge table in T-SQL
32:53 Normalized vs Denormalized data
34:53 Data aggregation (and de-aggregation)
37:54 Practice Questions
43:45 Outro and next steps
#microsoftfabric #dp600 #powerbi

Comments: 36

@LearnMicrosoftFabric
1 month ago
Hey everyone, thanks for watching!! How are you finding the course so far? A lot to learn??
@nagarjunabm2738
1 month ago
I find this course to be very helpful and effective in helping me learn for the DP-600 exam. Looking forward to the next one!
@LearnMicrosoftFabric
1 month ago
That's awesome, glad the course is helping 🙌
@mohamedammar2805
1 month ago
Awesome, thanks for your time and effort.
@josecardenas2736
1 month ago
Awesome, very well explained. Looking forward to passing the exam soon.
@user-data_junkie
27 days ago
Good. Thanks for putting in the work to create this.
@user-dy8xu7uj8k
24 days ago
Hi Will, your videos provide a great learning experience. Thank you for creating such good content.
@cuilanzou8638
1 month ago
It's a happy day today because we have a new video in the DP-600 series! La, la, la, la... Thank you Will!!!
@LearnMicrosoftFabric
1 month ago
Haha I hope you find it useful, thanks Norya!
@jamesbarrett1878
1 month ago
Thanks Will. I was waiting for the next video. Great stuff so far.
@LearnMicrosoftFabric
1 month ago
Thanks for watching James!! Glad you're enjoying 🙌
@padmasubbiah6259
21 days ago
Thanks for the awesome videos Will !!
@azwarmzafar
1 month ago
Man, you are doing a great job. Your content is golden and a real eye-opener into the platform. Many thanks!
@yazankabalan4775
1 month ago
A brilliant explanation of fundamental concepts in data transformation and data modelling. Thanks a lot Will, keep up the great work! 🔝
@LearnMicrosoftFabric
1 month ago
Thanks for watching!
@mattroberts9665
1 month ago
Brilliant Will. Another brilliant video. Thank you so much.
@LearnMicrosoftFabric
1 month ago
Thanks Matt! Glad you’re enjoying 🙌
@junpei0berkeley
25 days ago
Great content!!
@TheOneRichy
27 days ago
In my work we broke orders out into yearly reportable tables using a SQL constraint on an important date. We then query a SQL view that brings all the tables back together. We use partitioned view functionality to speed up the data returned, because it's smart enough to limit the tables it needs to look at. This is what came to mind for me regarding aggregation/de-aggregation.
@juanc.alcazar7507
3 days ago
👍
@moeeljawad5361
1 month ago
Hi Will, when you talked about bridge tables, was the aim to resolve the many-to-many relationship that is introduced when a Type 2 SCD is connected to the fact table?
@LearnMicrosoftFabric
1 month ago
Bridge tables were just the next data modelling concept in the list, not necessarily related to Type 2 SCDs. But yes, in general a bridge table can be used to resolve any many-to-many (M2M) relationship in your data model 👍
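For readers new to the idea, here is a minimal plain-Python sketch of a bridge table: a hypothetical customer dimension and an account-level fact, where one account can be held jointly by several customers. All names and data are assumptions for illustration; in a Fabric warehouse or lakehouse these would be tables joined with T-SQL or PySpark.

```python
from typing import Dict, List, Tuple

# Hypothetical data, for illustration only.
customers: Dict[int, str] = {1: "Ana", 2: "Ben"}
# Fact table at account grain: account_id -> balance.
account_balances: Dict[int, float] = {100: 500.0, 200: 300.0}
# Bridge table: one row per (customer_id, account_id) pair; account 100 is joint.
customer_account_bridge: List[Tuple[int, int]] = [(1, 100), (2, 100), (2, 200)]


def balance_by_customer(customers: Dict[int, str],
                        facts: Dict[int, float],
                        bridge: List[Tuple[int, int]]) -> Dict[str, float]:
    """Resolve the many-to-many link by going customer -> bridge -> fact."""
    totals = {name: 0.0 for name in customers.values()}
    for customer_id, account_id in bridge:
        totals[customers[customer_id]] += facts[account_id]
    return totals


result = balance_by_customer(customers, account_balances, customer_account_bridge)
```

Here Ana gets 500.0 and Ben gets 800.0. Each customer sees the full balance of the joint account, so adding the per-customer totals (1300.0) overstates the true total (800.0); this double counting is worth keeping in mind whenever you report through a bridge table.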
@user-dy8xu7uj8k
24 days ago
Will, I have a SQL Server stored procedure that updates, deletes and merges data into a table. How do I convert the stored procedure into a PySpark job? Is it possible to update a table in Fabric using PySpark? Please make a video on this topic.
@moeeljawad5361
1 month ago
Hello Will, it's me again :D. In the step where you were dropping duplicates with deduped = df.dropDuplicates(), it is not clear how Spark knew that it needed to drop duplicates on the combination of columns ['Branch_ID', 'Date_ID']. Is there a missing step?
@LearnMicrosoftFabric
1 month ago
Yes, dropDuplicates() also has a subset parameter if you want to check for duplicates only within certain columns. In this example, I wanted to remove a row only if every value was the same, so there was no need to pass in the subset parameter 👍
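To make the difference concrete, this plain-Python sketch mimics the two behaviours of PySpark's DataFrame.dropDuplicates: with no arguments it compares every column, and with subset it compares only the listed columns. The Sales column and the sample values are hypothetical. One caveat: this sketch always keeps the first matching row, whereas Spark does not promise which of the rows sharing a key survives when subset is used.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical rows; "Sales" is an illustrative column, not from the video.
rows: List[Dict[str, int]] = [
    {"Branch_ID": 1, "Date_ID": 20240101, "Sales": 100},
    {"Branch_ID": 1, "Date_ID": 20240101, "Sales": 100},  # exact duplicate
    {"Branch_ID": 1, "Date_ID": 20240101, "Sales": 120},  # same key, different Sales
]


def drop_duplicates(rows: Iterable[Dict[str, int]],
                    subset: Optional[List[str]] = None) -> List[Dict[str, int]]:
    """Keep the first row seen for each key.

    subset=None compares every column (like df.dropDuplicates()); a list of
    column names compares only those (like df.dropDuplicates(subset=[...])).
    """
    seen = set()
    kept = []
    for row in rows:
        cols = subset if subset is not None else sorted(row)
        key = tuple(row[c] for c in cols)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


all_columns = drop_duplicates(rows)                                    # 2 rows remain
key_columns = drop_duplicates(rows, subset=["Branch_ID", "Date_ID"])  # 1 row remains
```

Comparing every column removes only the exact duplicate, while comparing just Branch_ID and Date_ID also removes the row whose Sales value differs.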
@nguyenminhthu7064
1 month ago
Can you make a tutorial video about Type 1 and Type 2 slowly changing dimensions?
@LearnMicrosoftFabric
1 month ago
Yes I would like to go into more detail of SCDs in the future!
@gopaiahswamyvysetti3980
1 month ago
In the 5th question, don't we need the "isCurrent" flag to categorize it as a Type 2 dimension?
@LearnMicrosoftFabric
1 month ago
It's more 'optional': it can also be calculated from the dates, if need be.
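Since the flag can be derived rather than stored, here is a minimal plain-Python sketch of that calculation. It assumes each Type 2 row carries valid_from and valid_to dates (hypothetical names) and treats the validity interval as half-open; other conventions, such as inclusive end dates, would change the comparison.

```python
from datetime import date
from typing import List, Optional, Tuple


def is_current(valid_from: date, valid_to: Optional[date], as_of: date) -> bool:
    """True if as_of falls in the half-open interval [valid_from, valid_to).

    valid_to=None marks an open-ended row; some designs use a far-future
    date such as 9999-12-31 instead.
    """
    return valid_from <= as_of and (valid_to is None or as_of < valid_to)


# Hypothetical Type 2 history for one customer: (city, valid_from, valid_to).
history: List[Tuple[str, date, Optional[date]]] = [
    ("Leeds", date(2023, 1, 1), date(2024, 6, 1)),
    ("York", date(2024, 6, 1), None),
]

# Rows that are current as of today, computed instead of stored.
current_rows = [city for city, start, end in history if is_current(start, end, date.today())]
```

Passing a past as_of date answers "which row was current then?", which a stored isCurrent flag alone cannot do.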
@carlosnavia1361
1 month ago
✅
@LearnMicrosoftFabric
1 month ago
Thanks for watching Carlos!!
@drisselfigha3547
1 month ago
You speak very, very fast!!!
@LearnMicrosoftFabric
1 month ago
Sorry about that, feel free to use the Playback Speed to slow it down 👍
@Lonely.Planet.
1 month ago
Will speaks at a perfect pace with super clear British English, and his video editing is amazing. You can always reduce the playback speed as Will suggested.
@bloom6874
20 days ago
You can use the Custom option under Playback speed in the KZread player. This lets you adjust the speed to whatever pace you are comfortable with.