groupBy in Spark | Lec-17 | Spark interview questions
In this video I explain how to transform a DataFrame in Spark, covering groupBy and several related concepts. Please ask your doubts in the comments section.
Directly connect with me on:- topmate.io/manish_kumar25
[(1, 'manish', 50000, 'IT'),
 (2, 'vikash', 60000, 'sales'),
 (3, 'raushan', 70000, 'marketing'),
 (4, 'mukesh', 80000, 'IT'),
 (5, 'pritam', 90000, 'sales'),
 (6, 'nikita', 45000, 'marketing'),
 (7, 'ragini', 55000, 'marketing'),
 (8, 'rakesh', 100000, 'IT'),
 (9, 'aditya', 65000, 'IT'),
 (10, 'rahul', 50000, 'marketing')]
1,manish,50000,IT,india
2,vikash,60000,sales,us
3,raushan,70000,marketing,india
4,mukesh,80000,IT,us
5,pritam,90000,sales,india
6,nikita,45000,marketing,us
7,ragini,55000,marketing,india
8,rakesh,100000,IT,us
9,aditya,65000,IT,india
10,rahul,50000,marketing,us
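Conceptually, Spark's `groupBy("dept", "country").agg(sum("salary"))` over the sample rows above produces one total per (dept, country) pair. A plain-Python sketch of that aggregation (it does not use Spark itself; the rows are the sample data from this description):

```python
from collections import defaultdict

# Sample rows from the description above: (id, name, salary, dept, country)
rows = [
    (1, 'manish', 50000, 'IT', 'india'),
    (2, 'vikash', 60000, 'sales', 'us'),
    (3, 'raushan', 70000, 'marketing', 'india'),
    (4, 'mukesh', 80000, 'IT', 'us'),
    (5, 'pritam', 90000, 'sales', 'india'),
    (6, 'nikita', 45000, 'marketing', 'us'),
    (7, 'ragini', 55000, 'marketing', 'india'),
    (8, 'rakesh', 100000, 'IT', 'us'),
    (9, 'aditya', 65000, 'IT', 'india'),
    (10, 'rahul', 50000, 'marketing', 'us'),
]

# Equivalent of df.groupBy("dept", "country").agg(sum("salary"))
totals = defaultdict(int)
for _id, _name, salary, dept, country in rows:
    totals[(dept, country)] += salary

for key, total in sorted(totals.items()):
    print(key, total)
```

These totals match the `show()` output posted in the comments below (e.g. IT/india = 115000, IT/us = 180000).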
Comments: 37
I have been working as a data engineer for the last 3 years, but only now do I know when to use window functions..... thank you Manish Sir....
Sir, please discuss some examples for more clarification. Nice explanation, sir 😍😍😍
Great explanation.
You do very good work, Maqsud Bhai.
Amazing. The title should say Lecture-17.
NICE
Thank you so much for all the videos, including the PySpark theory & practical videos. Please upload some videos on Python questions for real data engineering use cases. Will the PySpark series continue after the SCD 2 videos, Manish Bhai?
Will you please teach various optimization techniques and when to use them in real time?
@manish_kumar_1
A year ago
I keep mentioning lots of techniques as I go. More are still coming in separate videos.
df.groupBy("dept", "country").agg(sum("salary")).show()
Thanks Sir Ji
Hi Sir, can you please tell how many lectures are still left for Spark, and when we will be building a project in Spark?
@manish_kumar_1
A year ago
There are still a lot of lectures left.
Sir, you didn't teach window functions; you jumped directly to joins. Please upload a video on window functions too.
@manish_kumar_1
A year ago
Window functions will come after joins, don't worry.
@KaranSingh-hx8dh
A year ago
@@manish_kumar_1 Thank you. I haven't gotten such a good explanation anywhere else.
emt.groupBy("dept", "country")\
    .agg(sum("salary").alias("total_salary")).show()
emp_df.groupBy("dept")\
    .agg(sum("salary")).show()

TypeError: unsupported operand type(s) for +: 'int' and 'str'. How to resolve this error?
@siladityasarkar5143
5 months ago
Please execute these imports before executing the mentioned PySpark command:
from pyspark.sql.functions import *
from pyspark.sql.types import *
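The TypeError in the question above typically appears when Python's builtin `sum` runs instead of `pyspark.sql.functions.sum`: the builtin tries to add the characters of the string "salary" to its integer start value 0, which is exactly the reported error. A plain-Python sketch of the failure (no Spark needed to reproduce the message):

```python
# Without `from pyspark.sql.functions import sum`, the name `sum` refers to
# Python's builtin, which attempts 0 + 's' over the string "salary".
try:
    sum("salary")
except TypeError as exc:
    message = str(exc)
    print(message)  # unsupported operand type(s) for +: 'int' and 'str'
```

Importing the Spark `sum` function (as the reply above suggests) shadows the builtin, so `agg(sum("salary"))` then builds a column expression instead of trying to add characters.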
a_df = emp_data_countrywise_df.groupBy(col("job"), col("country")).agg(sum(col("salary")))
a_df.show()
spark.sql("""
    SELECT *,
           SUM(salary) OVER (PARTITION BY dept) AS New_col,
           ROUND((salary / New_col) * 100, 2) AS percentage_of_empSalary
    FROM employee_dff
""").show()
@adarsharora6097
6 months ago
emp_df.withColumn('New_col', sum('salary').over(Window.partitionBy('dept')))\
    .withColumn('percentage_of_empSalary', round(col('salary') / col('New_col') * 100, 2))\
    .select('*').show()
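The percentage-of-department-salary logic in the two snippets above can be sanity-checked in plain Python. A sketch using the IT-department salaries from the sample data at the top of the description (not Spark code):

```python
# IT-department salaries from the sample data: manish, mukesh, rakesh, aditya
it_salaries = [50000, 80000, 100000, 65000]

# SUM(salary) OVER (PARTITION BY dept) gives every IT row the same total
dept_total = sum(it_salaries)

# ROUND((salary / dept_total) * 100, 2) computed per row
percentages = [round(s / dept_total * 100, 2) for s in it_salaries]
print(dept_total, percentages)
```

Each employee's share of the department total comes out as a percentage, and the shares of one department add up to (approximately) 100.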
df.groupBy("Dept")\
    .agg(sum("salary").alias("Sum_Salary"),
         avg("salary").alias("Avg_Salary"))\
    .show(truncate=False)
Solution for the last question:
emp_df.groupBy("emp_dept", "emp_country").agg(sum("emp_salary"))\
    .where((col("emp_country") == "india") & (col("emp_dept") == "IT")).display()
@13:33
# Q. group by dept and country columns with sum of salary
emp_country_data_df.groupBy("dept", "country")\
    .agg(sum("salary")).show()

+---------+-------+-----------+
|     dept|country|sum(salary)|
+---------+-------+-----------+
|       IT|  india|     115000|
|    sales|     us|      60000|
|marketing|  india|     125000|
|    sales|  india|      90000|
|       IT|     us|     180000|
|marketing|     us|      95000|
+---------+-------+-----------+
emp_df.groupBy("country", "dept")\
    .agg(sum("salary").alias("Total_Salary"), count("dept").alias("Total_Count"))\
    .show()

emp_sql = spark.sql("""
    select country, dept, sum(salary) as Total_salary, count(dept) as Total_count
    from emp_tbl
    group by country, dept
""")
emp_sql.show()
emp_df1.groupBy('dept', 'country')\
    .agg(sum("salary").alias("Total_Salary")).show()
I know you've already covered it, but @manish_kumar_1, while learning about group by it would have been helpful to know why group by is a wide-dependency transformation. Thanks for the content you put out & congrats on 5.03K...! 🙌🙌🙌
@manish_kumar_1
A year ago
Did I not mention it?
@sankuM
A year ago
@@manish_kumar_1 I guess not in this one. I meant that if you had shown the Spark UI while applying a join, to showcase the shuffle, it would have helped..!
emp_df1.groupBy("dept", "country")\
    .agg(sum("salary")).show()
emp_df.groupBy("country", "dept")\
    .agg(sum("salary").alias("Total_Salary"))\
    .sort(col("country")).show()
office_df.groupBy("dept", "country").agg(avg("salary")).show()
employee_df.groupBy("dept", "country")\
    .agg(sum("salary")).show()