SCD2 in Spark | Lec-24

In this video I have talked about Slowly Changing Dimension Type 2 (SCD2).
Directly connect with me on:- topmate.io/manish_kumar25
Discord channel:- / discord
SCD Data:-
customer_dim_data = [
    (1, 'manish', 'arwal', 'india', 'N', '2022-09-15', '2022-09-25'),
    (2, 'vikash', 'patna', 'india', 'Y', '2023-08-12', None),
    (3, 'nikita', 'delhi', 'india', 'Y', '2023-09-10', None),
    (4, 'rakesh', 'jaipur', 'india', 'Y', '2023-06-10', None),
    (5, 'ayush', 'NY', 'USA', 'Y', '2023-06-10', None),
    (1, 'manish', 'gurgaon', 'india', 'Y', '2022-09-25', None),
]
customer_schema = ['id', 'name', 'city', 'country', 'active', 'effective_start_date', 'effective_end_date']
customer_dim_df = spark.createDataFrame(data=customer_dim_data, schema=customer_schema)
sales_data = [
    (1, 1, 'manish', '2023-01-16', 'gurgaon', 'india', 380),
    (77, 1, 'manish', '2023-03-11', 'bangalore', 'india', 300),
    (12, 3, 'nikita', '2023-09-20', 'delhi', 'india', 127),
    (54, 4, 'rakesh', '2023-08-10', 'jaipur', 'india', 321),
    (65, 5, 'ayush', '2023-09-07', 'mosco', 'russia', 765),
    (89, 6, 'rajat', '2023-08-10', 'jaipur', 'india', 321)
]
sales_schema = ['sales_id', 'customer_id', 'customer_name', 'sales_date', 'food_delivery_address', 'food_delivery_country', 'food_cost']
sales_df = spark.createDataFrame(data=sales_data, schema=sales_schema)
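A minimal sketch of how these two DataFrames can be joined for the SCD2 logic; joined_data is the name used in the pinned comment below, and the join keys are an assumption based on the schemas above, not the exact code from the video.
# Join each sale to the customer's dimension rows so the delivery address
# can be compared with the currently stored city.
joined_data = sales_df.join(
    customer_dim_df,
    sales_df['customer_id'] == customer_dim_df['id'],
    'left'
)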
spark.apache.org/docs/latest/...
spark.apache.org/docs/latest/...
For more queries, reach out to me on my social media handles below.
Follow me on LinkedIn:- / manish-kumar-373b86176
Follow Me On Instagram:- / competitive_gyan1
Follow me on Facebook:- / manish12340
My Second Channel -- / @competitivegyan1
Interview series Playlist:- • Interview Questions an...
My Gear:-
Rode Mic:-- amzn.to/3RekC7a
Boya M1 Mic-- amzn.to/3uW0nnn
Wireless Mic:-- amzn.to/3TqLRhE
Tripod1 -- amzn.to/4avjyF4
Tripod2:-- amzn.to/46Y3QPu
camera1:-- amzn.to/3GIQlsE
camera2:-- amzn.to/46X190P
Pentab (Medium size):-- amzn.to/3RgMszQ (Recommended)
Pentab (Small size):-- amzn.to/3RpmIS0
Mobile:-- amzn.to/47Y8oa4 (You should definitely not buy this)
Laptop -- amzn.to/3Ns5Okj
Mouse+keyboard combo -- amzn.to/3Ro6GYl
21 inch Monitor-- amzn.to/3TvCE7E
27 inch Monitor-- amzn.to/47QzXlA
iPad Pencil:-- amzn.to/4aiJxiG
iPad 9th Generation:-- amzn.to/470I11X
Boom Arm/Swing Arm:-- amzn.to/48eH2we
My PC Components:-
intel i7 Processor:-- amzn.to/47Svdfe
G.Skill RAM:-- amzn.to/47VFffI
Samsung SSD:-- amzn.to/3uVSE8W
WD blue HDD:-- amzn.to/47Y91QY
RTX 3060Ti Graphic card:- amzn.to/3tdLDjn
Gigabyte Motherboard:-- amzn.to/3RFUTGl
O11 Dynamic Cabinet:-- amzn.to/4avkgSK
Liquid cooler:-- amzn.to/472S8mS
Antec Prizm FAN:-- amzn.to/48ey4Pj

Comments: 44

  • @manish_kumar_1 · 4 months ago

    There was one mistake in the country name of records where customer_name = Ayush: instead of food_delivery_country, it should be country. I have given the corrected code here. Please change the code accordingly.

    old_records = joined_data.where(
            (col("food_delivery_address") != col("city")) & (col("active") == "Y")
        ) \
        .withColumn("active", lit("N")) \
        .withColumn("effective_end_date", col("sales_date")) \
        .select(
            "customer_id", "customer_name", "city", "country", "active",
            "effective_start_date", "effective_end_date"
        )
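
    For readers following along, a minimal sketch of the complementary step discussed in the replies (building the new current rows and unioning them with the expired ones); the names new_records and scd2_df and the exact column selection are assumptions, not the exact code from the video.

    from pyspark.sql.functions import col, lit

    # New current rows: sales whose delivery address differs from the stored city
    # become the latest version of that customer, flagged active with no end date.
    new_records = joined_data.where(
            (col("food_delivery_address") != col("city")) & (col("active") == "Y")
        ) \
        .withColumn("active", lit("Y")) \
        .withColumn("effective_start_date", col("sales_date")) \
        .withColumn("effective_end_date", lit(None).cast("string")) \
        .select(
            "customer_id", "customer_name", "food_delivery_address",
            "food_delivery_country", "active", "effective_start_date",
            "effective_end_date"
        )

    # Expired rows plus new current rows; union matches columns by position.
    scd2_df = old_records.union(new_records)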

  • @akshaychowdhary8534 · 2 months ago

    What made you choose country? Please explain. I ran the code and found that the country changes from Russia to USA in the old_records df after this modification, but it is still not clear to me why.

  • @Someonner · a month ago

    A must-do question for experienced people. Very important.

  • @fashionate6527 · a month ago

    You explain things in very simple words in Hindi... thank you so much for that.

  • @user-rh1hr5cc1r · 2 months ago

    Thank you Bhaiya, my practicals also finished today.... You are too good at explaining. I joined an institute for Azure data engineering but did not get enough Databricks-side knowledge there. Topics I got to know from you: read modes (failfast, permissive, dropMalformed), JSON (multi-line, single-line), corrupt file handling, Parquet in detail, df write/save with bucketBy and partitionBy, lit(), union and unionAll, when/otherwise, count() as transformation and action, left anti/left semi joins, window functions, SCD2. Fundamentals: Spark UI, Catalyst Optimizer/Spark SQL engine, sort vs shuffle join, Spark memory, Adaptive Query Execution, salting.

  • @navjotsingh-hl1jg · 9 months ago

    Brother, loving your videos. Today I completed the whole practical playlist.

  • @vivekpuurkayastha1580 · 10 months ago

    Hi Manish, great video. Eagerly waiting for the video on problems faced in a Spark project... please make it next. Thank you.

  • @kmishy · 5 days ago

    Clear explanation.

  • @rahulrathore2668 · 4 months ago

    Spark explained very nicely and simply, sir.

  • @manvika · 9 months ago

    Incredible work, Manish. I just completed your Spark practical series. :)

  • @mukundraj4021 · 10 months ago

    Awesome 👌

  • @Paruu16 · 2 months ago

    Mind bending!! :D

  • @roshankumargupta46 · 7 months ago

    Thanks Manish. Very informative. Can you also make a video on the Databricks UI, and how to interpret and understand the Ganglia UI metrics?

  • @snehasingh3069 · 24 days ago

    Thank you.

  • @saivenkat6673 · 4 months ago

    I am going through your playlist; it is a wonderful playlist. But I see there are missing lectures for Spark practical: Lec 21, 22, and 23. Please provide those. Thank you.

  • @dakait0867 · 10 months ago

    Brother, I was waiting for exactly this one. I hope it will also be useful in real-time scenarios. Thanks.

  • @AkshatGupta-ou7zz · 4 months ago

    Why didn't we use surrogate keys here to implement SCD2?

  • @saumyaranjannayak9179 · 6 months ago

    Very nice tutorial, Manish, but there is one error in the select clause of the old_records DataFrame: the country column should be selected in place of food_delivery_country. It took me a lot of time to understand this error, as I am new to SCD... please update.

  • @scien-terrific7004 · 9 months ago

    @manish_kumar_1 I feel that, in the new_records df, "withColumn('active', lit('Y'))" is redundant, as we are already filtering on the (col('active') == 'Y') condition. One request from my end: make a video on Incremental Loading as well. Anyway, excellent content as always. ♥

  • @quiet8691 · 8 months ago

    Brother, which pen tablet are you using for writing? Please let me know.

  • @deepaliborde25 · 7 months ago

    Is this the last video of this playlist?

  • @Tanc369 · a month ago

    Sir, in this playlist lecture 24 comes right after lecture 20; is something missing there? And one doubt about the end part: when we are filtering out records using rank on id and active, shouldn't the row_number function be used instead? Because rank will assign the same rank to rows with the same id and active values, but row_number will give 1, 2, 3, and so on for rows with the same id and active values with respect to the date.
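
    For reference, a rough sketch of the row_number() approach described above, assuming the goal is to keep only the latest dimension row per customer; the partition and ordering columns are assumptions, not the exact code from the video.

    from pyspark.sql.functions import col, row_number
    from pyspark.sql.window import Window

    # row_number() gives a strict 1, 2, 3... ordering inside each partition, so two
    # rows with the same (id, active) values cannot both receive 1 the way rank() allows.
    w = Window.partitionBy("id").orderBy(col("effective_start_date").desc())

    latest_dim_df = customer_dim_df.withColumn("rn", row_number().over(w)) \
                                   .where(col("rn") == 1) \
                                   .drop("rn")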

  • @user-rh1hr5cc1r · 2 months ago

    Bhaiya, please make a video on Delta Lake and Delta tables and their implementation in PySpark. I could not find one on your channel.

  • @praveenkumarrai101 · 10 months ago

    Also make a video on incremental load with insert, update & delete.

  • @manish_kumar_1 · 10 months ago

    Sure.

  • @yogeshrathod7815 · 10 months ago

    How many more Spark videos will there be?

  • @cosmicgyan6732 · 9 months ago

    Manish, please complete the Spark series!!

  • @manish_kumar_1 · 9 months ago

    It's almost complete. I may add a few videos in the future. I am working on a new series on data modelling; those videos will be out soon.

  • @gkapkoti · 7 months ago

    One small error in the final data: for the inactive record for customer name Ayush, the city is NY but the country is Russia. The rest is great. Good tutorial overall. Thanks.

  • @saumyaranjannayak9179 · 6 months ago

    Yes, there he has to select the country column, but he selected food_delivery_country.

  • @seleniumautomation6552 · 3 months ago

    Sir, I am getting the error "'NoneType' object has no attribute 'union'" while doing new_records_df.union(old_records_df).

  • @manish_kumar_1 · 3 months ago

    You have probably chained .show() onto one of the DataFrames, which can cause this error.

  • @seleniumautomation6552 · 3 months ago

    @@manish_kumar_1 Yes sir, it got resolved. Sir, will completing these 24 lectures be enough to clear an interview?

  • @ShivamSharma-xt7te · 9 months ago

    Is this series complete, brother? If not, which topics and how many videos are left?

  • @manish_kumar_1 · 9 months ago

    Yes, this is complete.

  • @fascinatingworldfacts5583 · 8 months ago

    Brother, where are the further videos?

  • @manish_kumar_1 · 8 months ago

    These are all the videos in this series. For the rest, practice on LeetCode.

  • @saisri6404 · 10 months ago

    The data is not provided in the description.

  • @manish_kumar_1 · 10 months ago

    Added.

  • @sabesanj5509 · 9 months ago

    Manish bro, please provide us a document for SCD2, as I don't understand Hindi much.

  • @sankuM · 10 months ago

    @@manish_kumar_1 Bhau, you forgot to put the data in the description..! :P No problem, when else will ChatGPT come in handy??! ;)

  • @manish_kumar_1 · 10 months ago

    Added.

  • @sabesanj5509 · 9 months ago

    Manish bro, please provide me a document or website link for SCD2, as I don't understand Hindi much.

  • @manish_kumar_1 · 9 months ago

    Just check out the written code and you will be good to go. I don't have any resource as such; when I faced this problem, I implemented it the way I have shown in my videos.