lead and lag in spark | window function in pyspark | Lec-16

In this video, I have talked about window functions in PySpark, and also about the difference between lead and lag. If you want to optimize your processing in Spark, you should have a solid understanding of this concept.
Directly connect with me on:- topmate.io/manish_kumar25
Data:-
product_data = [
(1,"iphone","01-01-2023",1500000),
(2,"samsung","01-01-2023",1100000),
(3,"oneplus","01-01-2023",1100000),
(1,"iphone","01-02-2023",1300000),
(2,"samsung","01-02-2023",1120000),
(3,"oneplus","01-02-2023",1120000),
(1,"iphone","01-03-2023",1600000),
(2,"samsung","01-03-2023",1080000),
(3,"oneplus","01-03-2023",1160000),
(1,"iphone","01-04-2023",1700000),
(2,"samsung","01-04-2023",1800000),
(3,"oneplus","01-04-2023",1170000),
(1,"iphone","01-05-2023",1200000),
(2,"samsung","01-05-2023",980000),
(3,"oneplus","01-05-2023",1175000),
(1,"iphone","01-06-2023",1100000),
(2,"samsung","01-06-2023",1100000),
(3,"oneplus","01-06-2023",1200000)
]
from pyspark.sql.functions import *
emp_data = [(1,'manish',50000,'IT','m'),
(2,'vikash',60000,'sales','m'),
(3,'raushan',70000,'marketing','m'),
(4,'mukesh',80000,'IT','m'),
(5,'priti',90000,'sales','f'),
(6,'nikita',45000,'marketing','f'),
(7,'ragini',55000,'marketing','f'),
(8,'rashi',100000,'IT','f'),
(9,'aditya',65000,'IT','m'),
(10,'rahul',50000,'marketing','m'),
(11,'rakhi',50000,'IT','f'),
(12,'akhilesh',90000,'sales','m')]
emp_schema=['id','name','salary','dept','gender']
emp_df = spark.createDataFrame(data=emp_data,schema=emp_schema)
emp_df = emp_df.select('id','name','salary','gender','dept')
emp_df.show()
Flight Data link:- github.com/databricks/Spark-T...
For more queries, reach out to me on my social media handles below.
Follow me on LinkedIn:- / manish-kumar-373b86176
Follow Me On Instagram:- / competitive_gyan1
Follow me on Facebook:- / manish12340
My Second Channel -- / @competitivegyan1
Interview series Playlist:- • Interview Questions an...
My Gear:-
Rode Mic:-- amzn.to/3RekC7a
Boya M1 Mic-- amzn.to/3uW0nnn
Wireless Mic:-- amzn.to/3TqLRhE
Tripod1 -- amzn.to/4avjyF4
Tripod2:-- amzn.to/46Y3QPu
camera1:-- amzn.to/3GIQlsE
camera2:-- amzn.to/46X190P
Pentab (Medium size):-- amzn.to/3RgMszQ (Recommended)
Pentab (Small size):-- amzn.to/3RpmIS0
Mobile:-- amzn.to/47Y8oa4 (You absolutely should not buy this)
Laptop -- amzn.to/3Ns5Okj
Mouse+keyboard combo -- amzn.to/3Ro6GYl
21 inch Monitor-- amzn.to/3TvCE7E
27 inch Monitor-- amzn.to/47QzXlA
iPad Pencil:-- amzn.to/4aiJxiG
iPad 9th Generation:-- amzn.to/470I11X
Boom Arm/Swing Arm:-- amzn.to/48eH2we
My PC Components:-
intel i7 Processor:-- amzn.to/47Svdfe
G.Skill RAM:-- amzn.to/47VFffI
Samsung SSD:-- amzn.to/3uVSE8W
WD blue HDD:-- amzn.to/47Y91QY
RTX 3060Ti Graphic card:- amzn.to/3tdLDjn
Gigabyte Motherboard:-- amzn.to/3RFUTGl
O11 Dynamic Cabinet:-- amzn.to/4avkgSK
Liquid cooler:-- amzn.to/472S8mS
Antec Prizm FAN:-- amzn.to/48ey4Pj

Comments: 123

  • @user-hl5em5jb5i · 6 months ago

    Bro, don't get demotivated. You are doing a great job. There are only a few channels on YT that actually teach you, and you are one of them.

  • @voice6905 · 2 days ago

    Hi Manish bhai, the way you explained Spark is just AMAZING! Please come up with a series on Apache KAFKA and AIRFLOW and their usage in Data Engineering; there aren't many good tutorials available online on these 2 topics. It's a request, Manish bhai! I am a working professional too, working on Spark and Python.

  • @priyankakanyan496 · 21 days ago

    Thanks, brother, for the course. You are doing a great job.

  • @Tanveer_Shaikh_330 · a year ago

    Just finished the playlist. Amazing explanation. Upload more videos ASAP!

  • @amitgupta-ty8xd · 11 months ago

    Thanks for making this playlist. I have gained good knowledge of Spark now. Hats off to your dedication. 🙌

  • @krushitmodi3882 · a year ago

    Thank you, sir, for sharing the data.

  • @user-pm7dd2nd4y · 8 months ago

    Thanks for the great and detailed content, Manish bhai. Great work!

  • @hubspotvalley580 · 7 months ago

    Thanks for making amazing lectures!

  • @sinhadeepak4101 · 7 months ago

    Thank you for your guidance, Manish.

  • @princyanghan8734 · a year ago

    Thank you for posting videos in Hindi!!! It's always easiest to learn in your mother tongue.

  • @user-tb8ry2jl7s · 9 months ago

    Sir, this is really a great video.

  • @sonajikadam4523 · a year ago

    Nice ....

  • @AbhishekKumar-rq4rt · 10 months ago

    Manish bhai, you are doing great work. Don't get demotivated; maybe you won't get a response immediately because some people watch this after a few months. So please keep going, and thank you for such precious content.

  • @rekhasingh4945 · a year ago

    Very helpful

  • @hubspotvalley580 · 7 months ago

    Manish bhai, you are the best teacher!!!!

  • @akshaychowdhary8534 · 3 months ago

    You're a GEM of a person, Manish bhai ❤

  • @venkatramana7980 · a year ago

    I can understand Hindi, but it would be really helpful if you made videos in English so that we can understand the concepts in depth. I have learned so many things from your content. Just requesting you to make it in English if possible, or a separate English series for this playlist.

  • @dipakchavan4659 · 5 months ago

    Go ahead and write a few Spark theory books 🤣🤣🤣. Actually, you mentioned a good point: until you practice, nothing will make sense; without understanding there is no confidence, and without confidence, no selection. Great. It reminded me of David Robins, Navy SEAL, world record holder, the toughest man in the world, and his statement: "If you're not getting it the first time, try a second time; if you're still not getting it, try again, try again, try again, and a point comes when your mind says, hey wait, I figured it out, I got it now." 🤗🤗🤗 Whatever the situation, you have to keep going again and again until you master it. I also started preparing from your videos and finished just by watching; now, on the second watch, I'm practicing it and I'm getting it. Thanks, Manish, for your hard efforts.

  • @user-pe2bc6yr7x · 5 months ago

    For Q1:
    window3 = Window.partitionBy("product_name")
    last_6_month_df = product_df.withColumn("last_6_month_tot_sale", sum("sales").over(window3))\
        .withColumn("per_sales_each_mon", round((col("sales") / col("last_6_month_tot_sale")) * 100, 2))
    last_6_month_df.show()
    Note: This works for the product_data given in the description, since each product has exactly 6 months of records; if there are fewer or more, we need to use rowsBetween, which you taught in later videos. Thanks.

  • @sachinragde · a year ago

    Great, Manish!

  • @vaibhavdimri7419 · 7 months ago

    Sir, don't get demotivated or angry. You are a great teacher and all the lectures are easy to understand. ❤❤❤ Please take care of your health.

  • @ABHISHEKKUMAR-wf3yh · a month ago

    Thanks for all the knowledge, Manish. Best series I have come across on PySpark. Great wishes for your efforts. Answer for the 1st question + finding the best sales month for each product:
    window = Window.partitionBy("id")
    product_df.withColumn("last6mothtotalsales", sum(col("sales_val")).over(window))\
        .withColumn("perLast6monthsales", round((col("sales_val")/col("last6mothtotalsales"))*100, 2))\
        .withColumn("best%SalesMonth", row_number().over(window.orderBy(col("perLast6monthsales").desc())))\
        .filter(col("best%SalesMonth") == 1).show()

  • @sambitmohanty1758 · a year ago

    Hi Manish, your content is amazing. How many more videos will come in the Spark series?

  • @beingdigitalite6694 · 10 months ago

    Q1. Answer:
    window = Window.partitionBy("product_id")
    last_six_month_df = product_df.withColumn("previous_six_month_total_sales", sum("sales").over(window))\
        .withColumn("perc_sales_each_month", round((col("sales")/col("previous_six_month_total_sales")) * 100, 2))
    last_six_month_df.show()

  • @nikhilhimanshu9758 · 6 months ago

    I was asked the same question, and I was able to write it.

  • @meghaonnath7 · 4 months ago

    Thanks, Manish, for making the playlist. It would be more helpful if you taught in English, as some of us are non-Hindi listeners like me. Here is the answer:
    product_window = Window.partitionBy("product_name")
    product_df.withColumn("Total_sales", sum(col("sales")).over(product_window))\
        .withColumn("Percent_sales", (col("sales")/col("Total_sales")))\
        .show()

  • @lazycool0298 · 2 months ago

    Bhaiya, the book-writing joke was good 🤣. I'm doing the full practical; thank you for the content.

  • @user-rh1hr5cc1r · 2 months ago

    1st question: using the DataFrame API we can't get the total sales using sum(), since when we use sum() in a window function we get a cumulative/running total; so for this we need to use groupBy and a join, like we did in the window function class. I tried this during the window function class and got this question here.

  • @harshalunhale4369 · 3 months ago

    @ManishKumar I think an incorrect formula is used for per_loss_gain in the video at 12:10. It should be divided by previous_month_sales, not by sales. In my view, the correct formula is: Percentage Change = (Current Sales − Previous Month Sales) / Previous Month Sales × 100. Correct me if I am wrong. Eager to learn more from you 😊

  • @ashishkulshretha8332 · a year ago

    Can you please suggest the topics I should prepare apart from what you have shared till now? I have gone through all of your videos. I have an interview tomorrow :)

  • @manish_kumar_1 · a year ago

    Sorry I saw your comment today. Hope your interview went well!

  • @ashishkulshretha8332 · a year ago

    @@manish_kumar_1 Yes. I went through your other recordings. I have the CCA175 certification (Hadoop and Spark developer), so I am clear on programming basics, but your videos prepared me for the topics beyond programming. Still, a roadmap of the topics we should learn for interviews would definitely help, since you are still in the process of making the videos. I expect the next round of interviews to happen within a week. If you can cover Spark Streaming and Kafka-related videos, it will help people; it looks like a lot of requirements are related to real-time data processing.

  • @user-dl3ck6ym4r · 4 months ago

    Can anyone tell me how to reset a Databricks password? I tried several times but am unable to resolve it; I'm still not able to log in.

  • @pranavlalwadia7359 · 11 months ago

    Sir, can you please upload the product table data in the description? You have given the employee data.

  • @manish_kumar_1 · 11 months ago

    check now

  • @pranavlalwadia7359 · 11 months ago

    Thanks

  • @yashupadwalkar6127 · 4 months ago

    window = Window.partitionBy("product_id").orderBy(desc("sales_date"))
    last_month_df = product_df.withColumn("privs_month_sales", lag(col("sales"), 1).over(window)).show()
    Is this okay? For a better understanding of the lag() function I have used desc(), to get a better feel for previous months.

  • @wayzonic · a year ago

    Next videos lecture

  • @user-tb8ry2jl7s · 9 months ago

    DOUBT: Sir, this data is month-wise, with only one record per month. What if we have one record per day? Should we then take the month of every record and order by month, or take only the min(date) over (partition by month(date)) and then proceed like this?

  • @BooksWala · a year ago

    Bhai, please start a Kafka series too, with at least one video per day.

  • @shubne · a year ago

    Yes, I'm with you. Manish, please make videos on Kafka as well.

  • @sachindubey4315 · 11 months ago

    Bhai, you should have put this data in this description; you put it in the next one, and I found that out only after a lot of searching.

  • @manish_kumar_1 · 11 months ago

    Is it? Let me check

  • @Sahil1001 · 8 days ago

    Done, brother, but you didn't teach lead, lag, and window functions in Spark SQL.

  • @manish_kumar_1 · 6 days ago

    Really? I don't remember anymore which lecture has what. But it should be there, otherwise I wouldn't have made the thumbnail like that.

  • @neutraltrader5905 · 8 months ago

    Yo, great explanation. Thank you for your vids. Please don't get offended if not many people reply; I'm watching this after 3 months. Here's the answer:
    perc_sales_6m = product_df.withColumn("total_sales_6m", sum(col("sales")).over(window_product))\
        .withColumn("perc_sales_6m", round(100*(col("sales")/col("total_sales_6m")), 2))\
        .show()
    # Thank you for the amazing learning content!

  • @vigneshbalaji_ · 5 months ago

    It would be great if you made the videos in English.

  • @manish_kumar_1 · 5 months ago

    Will upload in English also, but that will take time.

  • @shantanusaxena6863 · 5 months ago

    Answer for Q1:
    window = Window.partitionBy(col('product_id'))
    product_df.withColumn('total_sales', sum(col('sales')).over(window))\
        .withColumn('percentage', round(col('sales')*100/col('total_sales'), 2))\
        .show()

  • @grim_rreaperr · 4 months ago

    Ques 1 solution:
    windowspec = Window.partitionBy(col("product_id"))  # setting window size based on product_id
    product_df.withColumn("product_wise_total_sales", sum(col("sales")).over(windowspec))\
        .withColumn("%age_sale", round((col("sales")/col("product_wise_total_sales"))*100.00, 2))\
        .show()

  • @nayanjyotibhagawati939 · a year ago

    One session on UDFs, please.

  • @manish_kumar_1 · a year ago

    Absolutely.

  • @hemant7477 · 5 months ago

    1st question solution:
    window = Window.partitionBy("product_id")
    product_df.withColumn("sum_six_month_sales", sum("sales").over(window))\
        .withColumn("sales_per_6_month", round(col("sales") / col("sum_six_month_sales") * 100, 2))\
        .show()

  • @learnndearn6157 · 2 months ago

    Please check my solution for the question:
    window = Window.partitionBy("product_id")
    product_df = product_df.withColumn("total", sum("sales").over(window))
    percentage_df = product_df.withColumn("percentage_of_total_sales", round(col("sales")/col("total")*100, 2))
    percentage_df.show()

  • @nileshpatil8118 · 9 months ago

    Manish sir, your teaching is fantastic, but in the above example:
    ######## write code for: percentage of sale on the basis of the last month, for the last 6 months: ((current / prev) - 1) * 100
    win = Window.partitionBy("product").orderBy("date")
    df.withColumn("previous_month_sale", lag(col("sale")).over(win))\
        .withColumn("%_of_sale", ((col("sale")/col("previous_month_sale"))-1)*100)\
        .show()

  • @manojposhala4751 · 4 months ago

    Can you please explain in English?

  • @user-iz5hj1ep8s · 9 months ago

    window = Window.partitionBy("product_id").orderBy("sales_date")
    # calculate the last six months' sales over a sliding frame
    last_six_month_sale = product_df.withColumn("last_six_month_sale", sum(col('Sales')).over(window.rowsBetween(0, 5)))
    result = last_six_month_sale.withColumn("sales_percentage", (col("sales") - lag("last_six_month_sale").over(window))/lag("last_six_month_sale").over(window))
    result.show(truncate=False)

  • @vbansal345 · 10 months ago

    Hi Sir, thanks a lot for the videos.
    from pyspark.sql.window import Window
    window = Window.partitionBy("brand")
    product_df1 = product_df.withColumn("6_month_sales", sum(col("sales")).over(window))
    product_df1.withColumn("percentage_of_sales", (col("sales")/col("6_month_sales"))*100).show()

  • @shubhamalsunde3230 · 7 months ago

    next_month_df.withColumn("per_loss_gain", round((col("sales")+col("next_month_sales"))/col("sales")*100, 2)).show(truncate=False)
    Please correct me if I am wrong, Mr. Manish.

  • @pde.joggiri · a year ago

    window = Window.partitionBy('product_name')
    product_df = product_df.withColumn('total', sum('sales').over(window))
    percentage_df = product_df.withColumn("percentageSales", round((product_df.sales/product_df.total)*100, 2))
    percentage_df.show()

  • @divyaborse5866 · 4 months ago

    from pyspark.sql.functions import lag, lead, col, round
    window = Window.partitionBy("id").orderBy("id")
    df = df.withColumn("6_month_sales", sum("sales").over(window))
    df = df.withColumn("comparison", round((col("sales")/col("6_month_sales"))*100, 2))
    df.show()

  • @lakshya1375 · 7 months ago

    Solution:
    For Ques 1:
    product_window = Window.partitionBy('Product_name').orderBy('Sale_date')
    product_df.withColumn('total_sale', sum(col('sales')).over(product_window))\
        .withColumn('percentage', round((col('total_sale')-col('sales'))*100/col('total_sale'), 2))\
        .show()
    For Ques 2:
    product_window = Window.partitionBy('Product_name').orderBy('Sale_date')
    product_df.withColumn('Previous_month_sale', lag(col('sales'), 1).over(product_window))\
        .withColumn('Gain/Loss', round((col('sales')-col('Previous_month_sale'))*100/col('sales'), 2))\
        .show()

  • @BeingSam7 · 16 days ago

    w = Window.partitionBy("Product_ID")
    Total_df = df.withColumn("Total", sum(col("Sales")).over(w))
    Total_df.withColumn("Percen_Sales", round((col("Sales")*100)/col("Total"), 2)).show()

  • @jatinyadav6158 · a month ago

    Solution for problem 1:
    window = Window.partitionBy(col("product_id"))
    product_df.withColumn("result", (col('sales_amount'))/(sum(col('sales_amount')).over(window))*100).show()

  • @ashutoshkumarsingh3337 · a year ago

    My answer:
    window = Window.partitionBy(col("product_id")).orderBy("sales_date")
    prev_sales_df = product_df.withColumn("prev_month_sales", lag(col("sales_amount"), 1, 0).over(window))\
        .withColumn("prof_loss_amount", col("sales_amount")-col("prev_month_sales"))\
        .withColumn("per_loss_gain", round((col("prof_loss_amount")/col("prev_month_sales"))*100, 2))

  • @manish_kumar_1 · a year ago

    Please give the answer to the 1st question.

  • @ashutoshkumarsingh3337 · a year ago

    @@manish_kumar_1
    window = Window.partitionBy(col("product_id"))
    total_sales_df = product_df.withColumn("total_sales_month", sum(col("sales_amount")).over(window))
    total_sales_df.withColumn("percentage_sales", round((col("sales_amount")/col("total_sales_month"))*100, 2)).show()

  • @sauravchoudhary10 · 5 months ago

    window = Window.partitionBy('product_name')
    new_product_df = product_df.withColumn('Total_Sales', sum(col('sales')).over(window))\
        .withColumn('percentage_sale', (col('sales')/col('Total_Sales'))*100).show()

  • @shorakhutte1887 · 4 months ago

    window = Window.partitionBy("product_id")
    product_df.withColumn("six_mon_total", sum("sales").over(window))\
        .withColumn("percentage_each_month", round((col("sales")/col("six_mon_total")) * 100, 2)).show()

  • @ranjeetrai6558 · a month ago

    window1 = Window.partitionBy('brand')
    p_df2 = p_df1.withColumn('total sales', sum(col('sales')).over(window1))
    p_df2.withColumn('% sales', round((col('sales')/col('total sales'))*100, 2)).show()

  • @manvika · 9 months ago

    from pyspark.sql.types import *
    from pyspark.sql.functions import *
    from pyspark.sql.window import Window
    window = Window.partitionBy("product_id").orderBy("sales_date").rowsBetween(-5, 0)
    last_six_mnth_sales_df = product_df.withColumn("last_six_months_total_sales", sum("sales").over(window))
    perc_df = last_six_mnth_sales_df.withColumn("perc_df", round(col("sales")/col("last_six_months_total_sales")*100, 2))
    perc_df.show()

  • @user-ge4il7kv1n · 5 months ago

    # % of sale each month based on last 6 months' sales
    window = Window.partitionBy('product_name')
    df_sales.withColumn('total_sale', sum(col('sales')).over(window))\
        .withColumn('percentage_sale_each_month', round((col('sales')) / (col('total_sale')), 2)*100).display()

  • @navdeeplamba9884 · 2 days ago

    Answer to the first question:
    window = Window.partitionBy('product_id').orderBy('sales_date')
    last_six_month_sale_df = product_df.withColumn('total_sales', lag(sum('sales').over(window), 0).over(window))\
        .withColumn('%_last_six_month', round((col('sales')/col('total_sales'))*100, 2))
    last_six_month_sale_df.show()

  • @codjawan · 3 months ago

    window = Window.partitionBy('phone_cat')
    df.withColumn('6_Months_Sales', sum('price').over(window))\
        .withColumn('Month_Sales_Percentage', round(100*(col('price')/col('6_Months_Sales')), 2))\
        .withColumn('Month_Sales_Percentage', concat('Month_Sales_Percentage', lit(' %')))\
        .select('id', 'phone_cat', 'date', 'price', 'Month_Sales_Percentage').show()

  • @electricalsviasugar5271 · a year ago

    window = Window.partitionBy(col("product_id")).orderBy(col("sales_date"))
    total_sale_df = product_df.withColumn("total_sale", sum(col("sales")).over(window))
    total_sale_df.show()
    total_sale_df.withColumn("percentage_sale", round((col("sales")/col("total_sale"))*100, 2)).show()

  • @saikumarjakki3802 · a year ago

    val win = Window.partitionBy("product_id")
    val p_df = p.withColumn("total_sales", sum(col("sales")).over(win))
      .withColumn("sales_percentage", round((col("sales")/col("total_sales"))*100, 2))
    p_df.show()

  • @nikhilhimanshu9758 · 6 months ago

    from pyspark.sql.functions import *
    from pyspark.sql.types import *
    from pyspark.sql.window import Window
    window = Window.partitionBy("id").orderBy("price").rowsBetween(Window.unboundedPreceding, Window.unboundedFollowing)
    df12 = prd_new1_df.withColumn("row_num", sum("price").over(window))\
        .withColumn("last_six", round((col("price")/(col("row_num"))*100))).show()

  • @shubhamalsunde3230 · 7 months ago

    I think this is correct, please reply sir:
    next_month_df.withColumn("per_loss_gain", round(((col("next_month_sales") - col("sales")) / col("sales")) * 100, 2)).show(truncate=False)

  • @spark_DE · 7 months ago

    💯

  • @kushalkumarmishra2281 · a month ago

    window = Window.partitionBy("product_id").orderBy("sales_date")
    df_new = df.withColumn("last_6_month_sale", sum(col("sales")).over(window))\
        .withColumn("%age_of_sales", round((col("sales")/col("last_6_month_sale"))*100, 2))

  • @mayankkandpal1565 · 7 months ago

    window = Window.partitionBy('product_id')
    result_df = (
        product_df
        .withColumn('last_6_month_sale', sum('sales').over(window))
        .withColumn('%age_sale_from_last_6_month', round(col('sales') / col('last_6_month_sale') * 100, 2))
    )
    result_df.show()

  • @Himanshugupta-fb2ss · 6 months ago

    from pyspark.sql.window import Window
    from pyspark.sql.functions import *
    window = Window.partitionBy('prd_name')
    prd_df.withColumn('total_Sales', sum('sale').over(window))\
        .withColumn('percentage_sale', round(expr('(sale/total_Sales)*100'), 2)).display()

  • @rituhalder835 · 2 months ago

    product_df.groupBy('product_name').agg(sum('sales').alias('total_sum')).withColumn("percentage",round(col("total_sum")*(1/225000),2)).show()

  • @rpraveenkumar007 · a year ago

    window = Window.partitionBy("product_name").orderBy('sales_date')
    last_six_month_df = product_df.withColumn('six_month_sale', sum(col('sales')).over(window))
    last_six_month_df.withColumn('each_month_sale', round((col('sales')/col("six_month_sale"))*100, 2)).show()

  • @rpraveenkumar007 · a year ago

    Let me know, guys, if this is correct?

  • @saurabhkatkar · 12 days ago

    Answer for Q1:
    sales_df = sales_df.withColumn('sales_date', from_unixtime(unix_timestamp('sales_date', 'dd-MM-yyyy')))
    grouped_df = sales_df.groupBy('product_name', year('sales_date').alias('year'), month('sales_date').alias('month')).agg(sum('sales').alias('total_sales_monthly'))
    sum_window = Window.partitionBy('product_name', 'year')
    grouped_df.withColumn('total_sales', sum('total_sales_monthly').over(sum_window))\
        .withColumn('percent_month_sales_wrt_total', round(100*col('total_sales_monthly')/col('total_sales'), 2)).show()

  • @Sahil1001 · 8 days ago

    Q1.
    window = Window.partitionBy("product_name")
    product_df.withColumn("total sum", sum(col("sales")).over(window))\
        .withColumn("%sale per month", round((col("sales")/col("total sum")*100), 2))\
        .show(truncate=False)
    Q2.
    window = Window.partitionBy("product_name").orderBy("sales")
    product_df.withColumn("next month sale", lead(col("sales")).over(window))\
        .withColumn("%sale per month", round(((col("next month sale")-col("sales"))/col("sales")*100), 2))\
        .show(truncate=False)

  • @praveenkumarrai101 · 9 months ago

    wini = Window.partitionBy('product_id')
    y = df_products.withColumn('total_byproduct', sum('price').over(wini))
    y.withColumn('percent_sales_eachMonth', round((col('price')/col('total_byproduct'))*100, 2)).show()

  • @saisri6404 · 10 months ago

    winp = Window.partitionBy("product_name")
    prod_df.withColumn("lst_6_mnth_pct", round((col("sales")/(sum("sales").over(winp))*100), 2)).show()

  • @Tanc369 · a month ago

    window1 = Window.partitionBy("product")  # .orderBy("date")
    prev_df = sales_df.withColumn("sum_prod", sum("sales").over(window1))
    prev_df.withColumn("per_sales", round((col("sum_prod")-col("sales"))/col("sum_prod")*100, 2)).show()

  • @jjayeshpawar · 6 months ago

    window = Window.partitionBy("mobile").orderBy(df["date"].asc())
    df1 = df.withColumn("rn", row_number().over(window))
    df2 = df1.filter(df1['rn']

  • @helloanalyst · 6 months ago

    One more way to solve:
    windowspec = Window.partitionBy('product').orderBy('id')
    df.withColumn('partition by sum', sum('sales').over(windowspec))\
        .withColumn('%tage based on last6 months', round((col('sales')/col('partition by sum'))*100, 2))\
        .show()
    +---+-------+----------+-------+----------------+---------------------------+
    | id|product|        dt|  sales|partition by sum|%tage based on last6 months|
    +---+-------+----------+-------+----------------+---------------------------+
    |  1| iphone|01-01-2023|1500000|         8400000|                      17.86|
    |  1| iphone|01-02-2023|1300000|         8400000|                      15.48|
    |  1| iphone|01-03-2023|1600000|         8400000|                      19.05|
    |  1| iphone|01-04-2023|1700000|         8400000|                      20.24|
    |  1| iphone|01-05-2023|1200000|         8400000|                      14.29|
    |  1| iphone|01-06-2023|1100000|         8400000|                       13.1|
    |  3|oneplus|01-01-2023|1100000|         6925000|                      15.88|
    |  3|oneplus|01-02-2023|1120000|         6925000|                      16.17|
    |  3|oneplus|01-03-2023|1160000|         6925000|                      16.75|
    |  3|oneplus|01-04-2023|1170000|         6925000|                       16.9|
    |  3|oneplus|01-05-2023|1175000|         6925000|                      16.97|
    |  3|oneplus|01-06-2023|1200000|         6925000|                      17.33|
    |  2|samsung|01-01-2023|1100000|         7180000|                      15.32|
    |  2|samsung|01-02-2023|1120000|         7180000|                       15.6|
    |  2|samsung|01-03-2023|1080000|         7180000|                      15.04|
    |  2|samsung|01-04-2023|1800000|         7180000|                      25.07|
    |  2|samsung|01-05-2023| 980000|         7180000|                      13.65|
    |  2|samsung|01-06-2023|1100000|         7180000|                      15.32|
    +---+-------+----------+-------+----------------+---------------------------+

  • @user-eu4bs6cf4p · a year ago

    window = Window.partitionBy("product_id")
    last_6_month = product_df.withColumn("total_sales", sum(col("sales")).over(window))
    last_6_month.show()
    last_6_month.withColumn("percentage_sales", round(((col("sales"))/col("total_sales"))*100, 2)).show()

  • @anweshkumarsahoo3927 · 10 months ago

    window = Window.partitionBy('product_id')
    six_month_sales_df = product_df.withColumn('6 months sales', sum(col('sales')).over(window))
    six_month_sales_df.withColumn('% of sales to 6 months', round((col('sales')/col('6 months sales'))*100, 2))\
        .show()

  • @Kanom224 · 3 months ago

    window = Window.partitionBy('product_id')
    last_months_df = ll_df.withColumn('last_6_mon_sales', sum('sales').over(window))
    last_months_df.withColumn('percen_sales_6_mon', round((col('sales')/col('last_6_mon_sales'))*100, 2)).show()
