Transforming data | PySpark, T-SQL & Dataflows in Microsoft Fabric | DP-600 EXAM PREP (7 of 12)

Free DP-600 study notes inside the community: www.skool.com/microsoft-fabri...
In this video (7 of 12 in the series), we cover the following:
Data cleansing:
Implement a data cleansing process
Identify and resolve duplicate data, missing data, or null values
Convert data types by using Dataflows or PySpark
Filter data
Data enrichment:
Merge or join data
Enrich data by adding new columns or tables
Data modelling:
Implement a star schema for a lakehouse or warehouse, including Type 1 and Type 2 slowly changing dimensions
Implement bridge tables for a lakehouse or a warehouse
Denormalize data
Aggregate or de-aggregate data
This video is part of the DP-600 Exam Preparation series.
Timeline
0:00 Intro
1:29 Data cleansing process
2:26 Introduction to the dataset
3:31 Dataflow: data cleaning
6:55 T-SQL: data cleaning
10:51 PySpark: data cleaning
20:25 Star schema
22:41 Slowly-changing dimensions
23:36 Type 1 SCD
24:27 Type 2 SCD
27:53 Bridge tables
28:56 Implementing a bridge table in T-SQL
32:53 Normalized vs Denormalized data
34:53 Data aggregation (and de-aggregation)
37:54 Practice Questions
43:45 Outro and next steps
#microsoftfabric #dp600 #powerbi

Comments: 35

  • @LearnMicrosoftFabric
    1 month ago

    Hey everyone, thanks for watching!! How are you finding the course so far? A lot to learn??

  • @nagarjunabm2738
    1 month ago

    I find this course to be very helpful and effective in helping me learn for the DP-600 exam. Looking forward to the next one!

  • @LearnMicrosoftFabric
    1 month ago

    That's awesome, glad the course is helping 🙌

  • @mohamedammar2805
    1 month ago

    Awesome, thanks for your time and effort.

  • @josecardenas2736
    1 month ago

    Awesome, very well explained. Looking forward to passing the exam soon.

  • @user-data_junkie
    22 days ago

    Good. Thanks for putting in the work to create this.

  • @user-dy8xu7uj8k
    19 days ago

    Hi Will, your videos provide a great learning experience. Thank you for creating such good content.

  • @padmasubbiah6259
    16 days ago

    Thanks for the awesome videos, Will!!

  • @cuilanzou8638
    1 month ago

    It's a happy day today because we have a video in the DP-600 series! La, la, la, la... Thank you Will!!!

  • @LearnMicrosoftFabric
    1 month ago

    Haha I hope you find it useful, thanks Norya!

  • @jamesbarrett1878
    1 month ago

    Thanks Will. I was waiting for the next video. Great stuff so far.

  • @LearnMicrosoftFabric
    1 month ago

    Thanks for watching James!! Glad you're enjoying 🙌

  • @mattroberts9665
    1 month ago

    Brilliant Will. Another brilliant video. Thank you so much.

  • @LearnMicrosoftFabric
    1 month ago

    Thanks Matt! Glad you’re enjoying 🙌

  • @junpei0berkeley
    19 days ago

    great content!!

  • @TheOneRichy
    22 days ago

    In my work we broke orders out into a yearly reportable table using a SQL constraint on an important date. We then query against a view in SQL where all the other tables are gathered together again. We use partitioned view functionality to speed up the data returned, because it's smart enough to limit the tables it needs to look at. This is what came to mind for me regarding aggregation/de-aggregation.

  • @azwarmzafar
    24 days ago

    Man, you are doing a great job. Your content is golden and a real eye-opener into the platform. Many thanks.

  • @yazankabalan4775
    1 month ago

    A brilliant explanation of fundamental concepts in data transformation and data modelling. Thanks a lot Will, keep up the great work! 🔝

  • @LearnMicrosoftFabric
    1 month ago

    Thanks for watching!

  • @moeeljawad5361
    1 month ago

    Hi Will, when you talked about bridging tables, was the aim to break the many-to-many relationship that will be introduced when a Type 2 SCD is connected to the fact table?

  • @LearnMicrosoftFabric
    1 month ago

    Bridging was just the next data modelling concept in the list, not necessarily related to Type 2 SCDs. But yes, in general it can be used to resolve a many-to-many (M2M) relationship anywhere one appears in your data model 👍
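
    The reply above can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module rather than a Fabric warehouse, and every table and column name (dim_customer, dim_account, bridge_customer_account) is hypothetical and not taken from the video. The idea it tries to show: two tables with a many-to-many relationship are connected through a bridge table that holds one row per valid pair.

```python
import sqlite3

# In-memory database; all table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_account  (account_id  INTEGER PRIMARY KEY, label TEXT);

-- Bridge table: one row per (customer, account) pair.
CREATE TABLE bridge_customer_account (
    customer_id INTEGER REFERENCES dim_customer(customer_id),
    account_id  INTEGER REFERENCES dim_account(account_id),
    PRIMARY KEY (customer_id, account_id)
);

INSERT INTO dim_customer VALUES (1, 'Ana'), (2, 'Ben');
INSERT INTO dim_account  VALUES (10, 'Joint'), (11, 'Savings');
-- Ana holds both accounts; the joint account has two holders.
INSERT INTO bridge_customer_account VALUES (1, 10), (2, 10), (1, 11);
""")

# Each side joins to the bridge with a one-to-many relationship.
rows = conn.execute("""
SELECT c.name, a.label
FROM dim_customer AS c
JOIN bridge_customer_account AS b ON b.customer_id = c.customer_id
JOIN dim_account AS a ON a.account_id = b.account_id
ORDER BY c.name, a.label
""").fetchall()

for name, label in rows:
    print(name, label)
```

    The same shape should carry over to T-SQL in a warehouse, though constraint support and syntax differ between engines, so treat this as a sketch of the modelling pattern only.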

  • @user-dy8xu7uj8k
    19 days ago

    Will, I have a SQL Server stored procedure which updates, deletes and merges data into a table. How do I convert the stored procedure to a PySpark job? Is it possible to update a table in Fabric using PySpark? Please make a video on this topic.

  • @moeeljawad5361
    1 month ago

    Hello Will, it's me again :D. In the step where you were dropping duplicates, you wrote deduped = df.dropDuplicates(). It is not clear how Spark knew that it needed to drop the duplicates on the combination of columns ['Branch_ID', 'Date_ID']. Is there a missing step?

  • @LearnMicrosoftFabric
    1 month ago

    Yes, dropDuplicates() also has a subset parameter, if you want to check for duplicates only within certain columns. In this example, I wanted to remove a row only if every value was the same as in another row, so there was no need to pass in the subset parameter 👍
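
    To make the difference concrete, here is a plain-Python sketch of the two behaviours, so it runs without a Spark session. The sample rows and the Revenue column are made up for illustration, and drop_duplicates is a hypothetical helper, not a PySpark function. In PySpark the corresponding calls would be df.dropDuplicates() and df.dropDuplicates(["Branch_ID", "Date_ID"]).

```python
# Made-up sample rows; only Branch_ID and Date_ID come from the discussion.
rows = [
    {"Branch_ID": 1, "Date_ID": 20240101, "Revenue": 100},
    {"Branch_ID": 1, "Date_ID": 20240101, "Revenue": 100},  # exact duplicate
    {"Branch_ID": 1, "Date_ID": 20240101, "Revenue": 250},  # same key, new value
    {"Branch_ID": 2, "Date_ID": 20240101, "Revenue": 75},
]


def drop_duplicates(records, subset=None):
    """Hypothetical helper: keep the first record seen for each distinct key.

    subset=None compares every column (whole-row duplicates only);
    otherwise only the listed columns form the key.
    """
    seen = set()
    kept = []
    for record in records:
        columns = subset if subset is not None else sorted(record)
        key = tuple(record[column] for column in columns)
        if key not in seen:
            seen.add(key)
            kept.append(record)
    return kept


full_row = drop_duplicates(rows)
by_key = drop_duplicates(rows, subset=["Branch_ID", "Date_ID"])

print(len(full_row))  # only the exact duplicate is removed
print(len(by_key))    # one row survives per Branch_ID/Date_ID pair
```

    One caveat: this sketch always keeps the first row per key, whereas Spark does not guarantee which row survives when a subset is given. If a specific row must win, an explicit ordering step is usually needed first.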

  • @carlosnavia1361
    29 days ago

  • @LearnMicrosoftFabric
    29 days ago

    Thanks for watching Carlos!!

  • @nguyenminhthu7064
    1 month ago

    Can you make a tutorial video about how to implement Type 1 and Type 2 slowly changing dimensions?

  • @LearnMicrosoftFabric
    1 month ago

    Yes, I would like to go into more detail on SCDs in the future!

  • @gopaiahswamyvysetti3980
    1 month ago

    In the 5th question, don't we need the "isCurrent" flag to categorize it as a type 2 dimension?

  • @LearnMicrosoftFabric
    1 month ago

    It's more 'optional': the flag can also be calculated from the dates, if need be.
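
    As a rough sketch of what "calculated from the dates" could look like, the snippet below derives a current-row flag from validity dates in plain Python. The column names (CustomerKey, ValidFrom, ValidTo), the sample rows, and the convention that an open-ended row has ValidTo set to None are all assumptions for illustration; the practice question in the video may use different names or a far-future end date instead.

```python
from datetime import date

# Hypothetical Type 2 dimension rows: two versions of the same customer.
dim_rows = [
    {"CustomerKey": 1, "CustomerID": "C001", "City": "Leeds",
     "ValidFrom": date(2022, 1, 1), "ValidTo": date(2023, 6, 1)},
    {"CustomerKey": 2, "CustomerID": "C001", "City": "York",
     "ValidFrom": date(2023, 6, 1), "ValidTo": None},  # open-ended row
]


def is_current(row, as_of):
    """Return True if the row's validity window contains as_of.

    Assumes ValidFrom is inclusive, ValidTo is exclusive, and a
    ValidTo of None means the row has not been superseded.
    """
    starts_ok = row["ValidFrom"] <= as_of
    ends_ok = row["ValidTo"] is None or as_of < row["ValidTo"]
    return starts_ok and ends_ok


as_of = date(2024, 1, 1)
flags = {row["CustomerKey"]: is_current(row, as_of) for row in dim_rows}
print(flags)
```

    Storing the flag as a column mostly saves repeating this date logic in every query; the dates alone carry the same information.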

  • @drisselfigha3547
    1 month ago

    You speak very, very fast!!!

  • @LearnMicrosoftFabric
    1 month ago

    Sorry about that, feel free to use the Playback Speed to slow it down 👍

  • @Lonely.Planet.
    24 days ago

    Will speaks at a perfect pace, in super clear British English, and his video editing is amazing. You can always reduce the playback speed, as Will suggested.

  • @bloom6874
    14 days ago

    You can use the custom option under Playback speed in the KZread player. This would help you adjust the pace to your comfort.