groupBy in Spark | Lec-17 | Spark Interview Questions

In this video I talk about how to transform a DataFrame in Spark, covering groupBy and related concepts. Please ask your doubts in the comments section. The practice datasets used in the video are below.
Directly connect with me on:- topmate.io/manish_kumar25
[(1,'manish',50000,'IT'),
(2,'vikash',60000,'sales'),
(3,'raushan',70000,'marketing'),
(4,'mukesh',80000,'IT'),
(5,'pritam',90000,'sales'),
(6,'nikita',45000,'marketing'),
(7,'ragini',55000,'marketing'),
(8,'rakesh',100000,'IT'),
(9,'aditya',65000,'IT'),
(10,'rahul',50000,'marketing')]
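A minimal sketch for loading this list, assuming a SparkSession named spark and the column names id, name, salary, dept (not stated in the video description):

from pyspark.sql import SparkSession
from pyspark.sql.functions import sum as spark_sum

spark = SparkSession.builder.appName("groupby-demo").getOrCreate()

# A few rows shown for brevity; use the full tuple list above.
emp_data = [(1, 'manish', 50000, 'IT'),
            (2, 'vikash', 60000, 'sales'),
            (10, 'rahul', 50000, 'marketing')]
emp_df = spark.createDataFrame(emp_data, ["id", "name", "salary", "dept"])

# Total salary per department.
emp_df.groupBy("dept").agg(spark_sum("salary").alias("total_salary")).show()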
1,manish,50000,IT,india
2,vikash,60000,sales,us
3,raushan,70000,marketing,india
4,mukesh,80000,IT,us
5,pritam,90000,sales,india
6,nikita,45000,marketing,us
7,ragini,55000,marketing,india
8,rakesh,100000,IT,us
9,aditya,65000,IT,india
10,rahul,50000,marketing,us
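This second dataset adds a country column. A minimal sketch for reading it and grouping by dept and country; the file name emp_country.csv and the DDL schema are my assumptions, and the rows above have no header line:

from pyspark.sql import SparkSession
from pyspark.sql.functions import sum as spark_sum

spark = SparkSession.builder.appName("groupby-country-demo").getOrCreate()

# Hypothetical path; point this at wherever the CSV rows above are saved.
emp_country_df = spark.read.csv(
    "emp_country.csv",
    schema="id INT, name STRING, salary INT, dept STRING, country STRING",
)

# Total salary per (dept, country) pair -- the exercise most replies below solve.
emp_country_df.groupBy("dept", "country") \
    .agg(spark_sum("salary").alias("total_salary")) \
    .show()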
For more queries, reach out to me on my social media handles below.
Follow me on LinkedIn:- / manish-kumar-373b86176
Follow Me On Instagram:- / competitive_gyan1
Follow me on Facebook:- / manish12340
My Second Channel -- / @competitivegyan1
Interview series Playlist:- • Interview Questions an...
My Gear:-
Rode Mic:-- amzn.to/3RekC7a
Boya M1 Mic-- amzn.to/3uW0nnn
Wireless Mic:-- amzn.to/3TqLRhE
Tripod1 -- amzn.to/4avjyF4
Tripod2:-- amzn.to/46Y3QPu
camera1:-- amzn.to/3GIQlsE
camera2:-- amzn.to/46X190P
Pentab (Medium size):-- amzn.to/3RgMszQ (Recommended)
Pentab (Small size):-- amzn.to/3RpmIS0
Mobile:-- amzn.to/47Y8oa4 (You should definitely not buy this one)
Laptop -- amzn.to/3Ns5Okj
Mouse+keyboard combo -- amzn.to/3Ro6GYl
21 inch Monitor-- amzn.to/3TvCE7E
27 inch Monitor-- amzn.to/47QzXlA
iPad Pencil:-- amzn.to/4aiJxiG
iPad 9th Generation:-- amzn.to/470I11X
Boom Arm/Swing Arm:-- amzn.to/48eH2we
My PC Components:-
intel i7 Processor:-- amzn.to/47Svdfe
G.Skill RAM:-- amzn.to/47VFffI
Samsung SSD:-- amzn.to/3uVSE8W
WD blue HDD:-- amzn.to/47Y91QY
RTX 3060Ti Graphic card:- amzn.to/3tdLDjn
Gigabyte Motherboard:-- amzn.to/3RFUTGl
O11 Dynamic Cabinet:-- amzn.to/4avkgSK
Liquid cooler:-- amzn.to/472S8mS
Antec Prizm FAN:-- amzn.to/48ey4Pj

Comments: 37

  • @user-iz5hj1ep8s · 9 months ago

    I have worked as a data engineer for the last 3 years, but only now do I know when we have to use window functions... thank you Manish Sir!

  • @shreemanthmamatavbal7468 · 7 months ago

    Sir, please discuss some examples for more clarification. Nice explanation, sir 😍😍😍

  • @Abhishek_Dahariya · 10 months ago

    Great explanation.

  • @udayanbhivsane2889 · 5 months ago

    You do very good work, Maqsud Bhai.

  • @satyamrai2577 · a year ago

    Amazing. The title should say Lecture 17.

  • @lakshmikantharajun6278 · 6 months ago

    NICE

  • @ArabindaMohapatra · 4 months ago

    Thank you so much for all the videos, including the PySpark theory & practical videos. Please upload some videos on Python questions for real data engineering use cases. Will the PySpark series continue after the SCD 2 videos, Manish Bhai?

  • @KaranSingh-hx8dh · a year ago

    Will you please teach various optimization techniques and when to use them in real projects?

  • @manish_kumar_1 · a year ago

    I keep explaining lots of techniques, and more are coming in separate videos.

  • @dakait0867 · 10 months ago

    df.groupBy("dept", "country").agg(sum("salary")).show()

  • @ranvijaymehta · a year ago

    Thanks Sir Ji

  • @poojajoshi871 · a year ago

    Hi Sir, can you please tell how many lectures are still left for Spark, and when will we be making a project in Spark?

  • @manish_kumar_1 · a year ago

    There are still a lot of lectures left.

  • @KaranSingh-hx8dh · a year ago

    Sir, you didn't teach window functions; you jumped directly to joins. Please upload a video on window functions too.

  • @manish_kumar_1 · a year ago

    Window functions will come after joins, don't worry.

  • @KaranSingh-hx8dh · a year ago

    @manish_kumar_1 Thank you. I haven't found an explanation as good as yours anywhere.

  • @DpIndia · 10 months ago

  • @techworld5477 · 5 months ago

    emt.groupBy("dept", "country")\
        .agg(sum("salary").alias("total_salary")).show()

  • @luvvkatara1466 · 6 months ago

    emp_df.groupBy("dept")\
        .agg(sum("salary")).show()
    TypeError: unsupported operand type(s) for +: 'int' and 'str'. How do I resolve this error?

  • @siladityasarkar5143 · 5 months ago

    Please execute these imports before running the mentioned PySpark command:
    from pyspark.sql.functions import *
    from pyspark.sql.types import *
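A minimal sketch of the underlying cause, assuming the emp_df from the question above: without the import, sum refers to Python's builtin, and sum("salary") computes 0 + 's', which raises exactly this TypeError. The wildcard import shadows the builtin with Spark's sum; an explicit alias avoids the ambiguity altogether:

# Spark's sum under an alias, so Python's builtin sum is left untouched.
from pyspark.sql.functions import sum as spark_sum

emp_df.groupBy("dept")\
    .agg(spark_sum("salary").alias("total_salary")).show()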

  • @aryankhandelwal8517 · 10 months ago

    a_df = emp_data_countrywise_df.groupby(col("job"), col("country")).agg(sum(col("salary")))
    a_df.show()

  • @adarsharora6097 · 6 months ago

    spark.sql("""
        SELECT *,
               SUM(salary) OVER (PARTITION BY dept) AS New_col,
               ROUND((salary/New_col)*100, 2) AS percentage_of_empSalary
        FROM employee_dff
    """).show()

  • @adarsharora6097 · 6 months ago

    emp_df.withColumn('New_col', sum('salary').over(Window.partitionBy('dept')))\
        .withColumn('percentage_of_empSalary', round(col('salary')/col('New_col')*100, 2))\
        .select('*').show()
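A note on the snippet above, assuming it is run standalone: it needs Window from pyspark.sql.window plus Spark's col, round, and sum, e.g.:

# These imports shadow Python's builtin round and sum with Spark's versions,
# which is what the snippet above relies on.
from pyspark.sql.functions import col, round, sum
from pyspark.sql.window import Window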

  • @1234pk · a year ago

    df.groupBy("dept")\
        .agg(sum("salary").alias("Sum_Salary"),
             avg("salary").alias("Avg_Salary"))\
        .show(truncate=False)

  • @abdulfaheem5154 · 23 days ago

    Solution for the last question:
    emp_df.groupBy("emp_dept", "emp_country")\
        .agg(sum("emp_salary"))\
        .where((col("emp_country") == "india") & (col("emp_dept") == "IT"))\
        .display()

  • @tejasshinde5722 · 2 months ago

    @13:33 # Q. group by dept and country columns with sum of salary
    emp_country_data_df.groupBy("dept", "country")\
        .agg(sum("salary")).show()

    +---------+-------+-----------+
    |     dept|country|sum(salary)|
    +---------+-------+-----------+
    |       IT|  india|     115000|
    |    sales|     us|      60000|
    |marketing|  india|     125000|
    |    sales|  india|      90000|
    |       IT|     us|     180000|
    |marketing|     us|      95000|
    +---------+-------+-----------+

  • @ETLMasters · a year ago

    emp_df.groupBy("country", "dept")\
        .agg(sum("salary").alias("Total_Salary"), count("dept").alias("Total_Count"))\
        .show()

    emp_sql = spark.sql("""
        select country, dept, sum(salary) as Total_salary, count(dept) as Total_count
        from emp_tbl
        group by country, dept
    """)
    emp_sql.show()

  • @sauravchoudhary10 · 5 months ago

    emp_df1.groupBy('dept', 'country')\
        .agg(sum("salary").alias("Total_Salary")).show()

  • @sankuM · a year ago

    I know you've already covered it, but @manish_kumar_1, while learning about groupBy it would have been helpful to know why groupBy is a wide-dependency transformation. Thanks for the content you put out & congrats on 5.03K...! 🙌🙌🙌

  • @manish_kumar_1 · a year ago

    Did I not mention it?

  • @sankuM · a year ago

    @manish_kumar_1 I guess not in this one. I meant that if you had shown the Spark UI while applying a join to showcase the shuffle, it would have helped..!
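Picking up the wide-dependency question from this thread: a minimal sketch, assuming a SparkSession named spark and the tuple data from the description, that makes the shuffle visible without the Spark UI. The Exchange hashpartitioning node in the physical plan printed by explain() is the shuffle that makes groupBy a wide transformation:

from pyspark.sql import SparkSession
from pyspark.sql.functions import sum as spark_sum

spark = SparkSession.builder.appName("groupby-shuffle-demo").getOrCreate()

emp_df = spark.createDataFrame(
    [(1, 'manish', 50000, 'IT'), (2, 'vikash', 60000, 'sales')],
    ["id", "name", "salary", "dept"],
)

# The plan contains "Exchange hashpartitioning(dept, ...)": rows with the
# same dept must be moved to the same partition before they can be summed.
emp_df.groupBy("dept").agg(spark_sum("salary")).explain()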

  • @swapnalitoradmal9357 · 2 months ago

    emp_df1.groupBy("dept", "country")\
        .agg(sum("salary")).show()

  • @soumyaranjanrout2843 · 6 months ago

    emp_df.groupBy("country", "dept")\
        .agg(sum("salary").alias("Total_Salary"))\
        .sort(col("country")).show()

  • @yogeshsingh1621 · 6 months ago

    office_df.groupBy("dept", "country").agg(avg("salary")).show()

  • @rasikakurhade1011 · a month ago

    employee_df.groupBy("dept", "country")\
        .agg(sum("salary")).show()