Transforming data | PySpark, T-SQL & Dataflows in Microsoft Fabric | DP-600 EXAM PREP (7 of 12)
Free DP-600 study notes inside community: www.skool.com/microsoft-fabri...
In this video (7 of 12 in the series), we cover the following:
Data cleansing:
Implement a data cleansing process
Identify and resolve duplicate data, missing data, or null values
Convert data types by using Dataflows or PySpark
Filter data
Data enrichment:
Merge or join data
Enrich data by adding new columns or tables
Data modelling:
Implement a star schema for a lakehouse or warehouse, including Type 1 and Type 2 slowly changing dimensions
Implement bridge tables for a lakehouse or a warehouse
Denormalize data
Aggregate or de-aggregate data
This video is part of the DP-600 Exam Preparation series.
Timeline
0:00 Intro
1:29 Data cleansing process
2:26 Introduction to the dataset
3:31 Dataflow: data cleaning
6:55 T-SQL: data cleaning
10:51 PySpark: data cleaning
20:25 Star schema
22:41 Slowly-changing dimensions
23:36 Type 1 SCD
24:27 Type 2 SCD
27:53 Bridge tables
28:56 Implementing a bridge table in T-SQL
32:53 Normalized vs Denormalized data
34:53 Data aggregation (and de-aggregation)
37:54 Practice Questions
43:45 Outro and next steps
#microsoftfabric #dp600 #powerbi
Comments: 35
Hey everyone, thanks for watching!! How are you finding the course so far? A lot to learn??
@nagarjunabm2738
1 month ago
I find this course to be very helpful and effective in helping me learn for the DP-600 exam. Looking forward to next one!
@LearnMicrosoftFabric
1 month ago
That's awesome, glad the course is helping 🙌
@mohamedammar2805
1 month ago
Awesome, thanks for your time and effort.
@josecardenas2736
1 month ago
Awesome, very well explained. Looking forward to passing the exam soon.
@user-data_junkie
22 days ago
Good. Thanks for putting in the work to create this.
Hi Will, your videos provide a great learning experience, thank you for creating such good content.
Thanks for the awesome videos, Will!!
It's a happy day today because we have a video in the DP-600 series! La, la, la, la! Thank you, Will!!!
@LearnMicrosoftFabric
1 month ago
Haha I hope you find it useful, thanks Norya!
Thanks Will. I was waiting for the next video. Great stuff so far.
@LearnMicrosoftFabric
1 month ago
Thanks for watching James!! Glad you're enjoying 🙌
Brilliant Will. Another brilliant video. Thank you so much.
@LearnMicrosoftFabric
1 month ago
Thanks Matt! Glad you’re enjoying 🙌
great content!!
In my work we broke orders out into yearly reportable tables using a SQL constraint on an important date. We then query against a view in SQL where all the other tables are gathered together again. We use partitioned view functionality to speed up the data returned, because it's smart enough to limit the tables it needs to look at. This is what came to mind regarding aggregation/de-aggregation for me.
Man, you are doing a great job, your content is golden and a real eye-opener into the platform. Many thanks.
A brilliant explanation of fundamental concepts in data transformation and data modelling. Thanks a lot Will, keep up the great work! 🔝
@LearnMicrosoftFabric
1 month ago
Thanks for watching!
Hi Will, when you talked about bridging tables, was the aim to break the many-to-many relationship that will be introduced when a Type 2 SCD is connected to the fact table?
@LearnMicrosoftFabric
1 month ago
Bridging was just the next data modelling concept in the list, not necessarily related to Type 2 SCDs. But yes, in general it can be used anywhere you have a many-to-many (M2M) relationship in your data model 👍
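To make the bridge-table idea concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than the T-SQL shown in the video. The table names (DimCustomer, DimAccount, BridgeCustomerAccount) and all the data are hypothetical and only illustrate the pattern: the bridge holds one row per valid pairing, so a many-to-many relationship becomes two one-to-many relationships.

```python
import sqlite3

# In-memory database; every table and column name here is made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DimCustomer (CustomerKey INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE DimAccount  (AccountKey  INTEGER PRIMARY KEY, Balance REAL);

-- The bridge table: one row per (customer, account) pairing.
CREATE TABLE BridgeCustomerAccount (
    CustomerKey INTEGER REFERENCES DimCustomer(CustomerKey),
    AccountKey  INTEGER REFERENCES DimAccount(AccountKey),
    PRIMARY KEY (CustomerKey, AccountKey)
);

INSERT INTO DimCustomer VALUES (1, 'Ana'), (2, 'Ben');
INSERT INTO DimAccount  VALUES (10, 500.0), (20, 300.0);
-- Account 10 is shared by both customers; Ben also holds account 20.
INSERT INTO BridgeCustomerAccount VALUES (1, 10), (2, 10), (2, 20);
""")

# Join through the bridge to total the balances each customer has access to.
balances = conn.execute("""
    SELECT c.Name, SUM(a.Balance)
    FROM DimCustomer AS c
    JOIN BridgeCustomerAccount AS b ON b.CustomerKey = c.CustomerKey
    JOIN DimAccount AS a ON a.AccountKey = b.AccountKey
    GROUP BY c.Name
    ORDER BY c.Name
""").fetchall()
conn.close()

print(balances)  # [('Ana', 500.0), ('Ben', 800.0)]
```

Note that the shared account is counted once for each customer, which is usually what you want per customer but would double count in a grand total.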
Will, I have a SQL Server stored procedure which updates, deletes and merges data into a table. How do I convert the stored procedure to a PySpark job? Is it possible to update a table in Fabric using PySpark? Please make a video on this topic.
Hello Will, it's me again :D. In the step where you were dropping duplicates, where you wrote deduped = df.dropDuplicates(), it is not clear how Spark knew that it needs to drop the duplicates on the combination of columns ['Branch_ID', 'Date_ID']. Is there a missing step?
@LearnMicrosoftFabric
1 month ago
Yes, dropDuplicates() also has the subset parameter, if you want to check for duplicates only within certain columns. In this example, I wanted to remove a row only if every value was the same, so there was no need to pass in the subset parameter 👍
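The difference between the two behaviours can be sketched in plain Python, without a Spark session. The rows and the Sales column are made up; only the column names Branch_ID and Date_ID come from the question. In PySpark the corresponding calls would be df.dropDuplicates() and df.dropDuplicates(['Branch_ID', 'Date_ID']).

```python
# Illustrative rows only; the values are invented for this sketch.
rows = [
    {"Branch_ID": 1, "Date_ID": 20240101, "Sales": 100},
    {"Branch_ID": 1, "Date_ID": 20240101, "Sales": 100},  # exact duplicate of the first row
    {"Branch_ID": 1, "Date_ID": 20240101, "Sales": 250},  # same key, different Sales value
    {"Branch_ID": 2, "Date_ID": 20240101, "Sales": 75},
]

def drop_duplicates(rows, subset=None):
    """Keep the first row seen for each distinct combination of the compared columns.

    With subset=None every column is compared (whole-row duplicates only);
    with a subset, only those columns decide whether a row is a duplicate.
    """
    seen = set()
    result = []
    for row in rows:
        cols = subset if subset is not None else sorted(row)
        key = tuple(row[c] for c in cols)
        if key not in seen:
            seen.add(key)
            result.append(row)
    return result

full_row = drop_duplicates(rows)                                 # compares all columns
by_key = drop_duplicates(rows, subset=["Branch_ID", "Date_ID"])  # compares key columns only

print(len(full_row))  # 3
print(len(by_key))    # 2
```

One difference from this sketch: it always keeps the first row it sees, whereas Spark does not guarantee which of the matching rows survives when a subset is used.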
✅
@LearnMicrosoftFabric
29 days ago
Thanks for watching Carlos!!
Can you make a tutorial video about how to implement Type 1 and Type 2 slowly changing dimensions?
@LearnMicrosoftFabric
1 month ago
Yes, I would like to go into more detail on SCDs in the future!
In the 5th question, don't we need the "isCurrent" flag to categorize it as a type 2 dimension?
@LearnMicrosoftFabric
1 month ago
It's more 'optional'; it can also be calculated from the dates, if need be.
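As a rough sketch of deriving the flag from the dates: the column names ValidFrom and ValidTo, the sample rows, and the convention that an open-ended row has ValidTo set to None are all assumptions for illustration, not taken from the video.

```python
from datetime import date

# Hypothetical Type 2 dimension: two versions of the same customer.
dim_customer = [
    {"CustomerKey": 1, "CustomerID": "C001", "City": "Leeds",
     "ValidFrom": date(2022, 1, 1), "ValidTo": date(2023, 6, 30)},
    {"CustomerKey": 2, "CustomerID": "C001", "City": "York",
     "ValidFrom": date(2023, 7, 1), "ValidTo": None},  # None = still open (assumed convention)
]

def is_current(row, as_of):
    """Derive the 'current' flag from the validity dates instead of storing it."""
    started = row["ValidFrom"] <= as_of
    not_ended = row["ValidTo"] is None or as_of <= row["ValidTo"]
    return started and not_ended

as_of = date(2024, 1, 1)  # fixed date so the result is reproducible
current = [r for r in dim_customer if is_current(r, as_of)]

print([r["City"] for r in current])  # ['York']
```

A stored flag mainly saves repeating this date comparison in every query.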
You speak very, very fast!!!
@LearnMicrosoftFabric
1 month ago
Sorry about that, feel free to use the Playback Speed to slow it down 👍
@Lonely.Planet.
24 days ago
Will speaks at a perfect pace, in super clear British English, and his video editing is amazing. You can always reduce the playback speed, as Will suggested.
@bloom6874
14 days ago
You can use the custom option with Playback speed on KZread Player. This helps you adjust the pace to your comfort.