3.06 Mastering Common Silver and Gold zone transformations with PySpark in Microsoft Fabric

• Microsoft Fabric For B...
This video explores common transformation techniques in the Silver and Gold zones of the Medallion architecture. I explain data enrichment and type conversion transformations and demonstrate how to use PySpark APIs and methods to address these tasks.
I also demonstrate how to process historical data from the Bronze layer using Window functions. Next, I explain core Kimball dimensional modelling concepts and demonstrate how they can be implemented using PySpark methods.
Finally, I demonstrate creating aggregates.
You can download the related demo notebook from here: github.com/faz...
Chapters:
00:00- Introduction
02:21- Preview
06:19- Lakehouse historical data storage strategy
09:00- Demo start- preparing data
10:24- Creating shortcuts to Bronze tables
11:24- Notebook demo- reading data from shortcuts
12:30- Inspecting data frame schema
13:48- Data Type conversion transformations
16:05- Ordering data
20:00- Handling historical data using Window functions
24:25- Data enrichment transformations
25:45- Using regular expressions to parse text data
26:40- Generating time dimension
30:45- Dimensional modelling concepts
32:12- Slowly changing dimensions (SCD)
33:05- SCD Type-2 dimensions
34:54- Surrogate keys
35:32- Relationships between facts and dimensions
37:00- Generating surrogate keys using monotonically_increasing_id function
38:00- Distributed computing and Spark partitions
41:31- Reducing data frame partition count
43:02- How to link Fact and Dimension tables
47:14- Incremental write into destination tables
49:02- Using MERGE INTO query for destination write
50:50- Aggregation transformations
Please subscribe: @fazizov
Official Documentation:
learn.microsof...
learn.microsof...
sparkbyexample...
www.kimballgro...
spark.apache.o...
Hashtags:
#datafactory, #microsoft, #microsoftfabric, #azure, #dataengineering, #cloudcomputing, #dataanalytics, #lakehouse, #azuretutorial, #azuretraining, #datapipeline, #dataextraction, #dataintegration, #datatransfer, #dataflow, #spark, #deltalake, #synapse, #synapsedataenginering, #demo, #datalake, #transformation, #ingested, #datawarehouse, #dataintegration, #azuredatabricks, #databricks, #bigdata, #bigdatatechnologies, #pyspark, #sparksql, #notebook, #transformationvideo, #bronze, #medallion, #kimball, #dimensions, #modeling, #facts, #silver, #gold, #historical data, #dimensional

Comments: 6

  • @joseluiscorreasalazar5670, 1 month ago

    Thank you very much! This is one of the best tutorials on Fabric Lakehouses out there

  • @fazizov, 1 month ago

    Thanks for watching!

  • @kevthebandit, 6 months ago

    Thanks for breaking this down!

  • @fazizov, 6 months ago

    Thanks for feedback!

  • @digitalevidenceofthings, 6 months ago

    This is incredible, exactly what I needed to see to ensure I'm on the right track. Thank you for taking the time to do this video!

  • @fazizov, 6 months ago

    Glad it was helpful, thanks!
