DAG and Lazy Evaluation in Spark

In this video I have talked about the DAG and lazy evaluation in Spark in great detail. Please watch the video in full and ask any doubts in the comment section below.
Directly connect with me on:- topmate.io/manish_kumar25

Comments: 114

  • @tnmyk_ · 4 months ago

    Killer explanation! Finally someone explained why lazy evaluation actually works better for big data processing. Amazing examples, very nice code! Loved the way you explained each line and each job step by step.

  • @Watson22j · 1 year ago

    wow! very nicely explained. Thank you! :)

  • @kavyabhatnagar716 · 9 months ago

    Wow! Thank you for such a great explanation. ❤

  • @aasthagupta9381 · 9 days ago

    You are an excellent teacher, you make lectures so interesting! With answers like these we'll school the interviewers :D

  • @abinashpandit5297 · 10 months ago

    Very good, Bhaiya. Today I learned a lot of in-depth things from this that I didn't know before. Keep it up 👍

  • @SanjayKumar-rw2gj · 16 days ago

    Truly impressed, Manish bhai. Great explanation; as you already said, "You won't find this level of detail anywhere else."

  • @krishnavamsirangu1727 · 2 months ago

    Hi Manish, thanks for explaining the concept in detail by running the code. I have understood the concepts of DAG, lazy evaluation, and optimization.

  • @prabhakarkumar8022 · 3 months ago

    Awesome bhaiyaji!!!!!

  • @akhiladevangamath1277 · 1 month ago

    Thank you Thank you Thank you Manish for this video✨✨✨

  • @prathapganesh7021 · 3 months ago

    Thank you so much for clarifying my doubts 🙏

  • @pramod3469 · 1 year ago

    very well explained...thanks Manish

  • @rishav144 · 1 year ago

    great video Manish bro

  • @bhavindedhia3976 · 3 months ago

    amazing content

  • @manojkaransingh5848 · 11 months ago

    Wow! Very nice, bro.

  • @akashprabhakar6353 · 2 months ago

    Awesome lecture...thanks a lot!

  • @arijitsamaddar268 · 1 month ago

    Spot-on explanation!

  • @yugantshekhar782 · 3 months ago

    Great explanation sir, really helpful!

  • @SqlMastery-fq8rq · 3 months ago

    Very well explained, sir. Thank you.

  • @220piyush · 2 months ago

    Had a blast watching the video... Wow ❤

  • @ankitachauhan6084 · 1 month ago

    Thank you! Great teaching style.

  • @prabhatgupta6415 · 1 year ago

    He has mastered and crunched Spark.

  • @mission_possible · 10 months ago

    Thanks for the session, and please make a video on Spark lineage.

  • @souradeep.chatterjee · 3 months ago

    Detailed Explanation. Better than paid lectures.

  • @a26426408 · 2 months ago

    Very well explained.

  • @prasadBoyane · 2 months ago

    I think Spark considers 'sum' an action, hence 4 jobs. Great series!!!
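
    (A quick way to test that claim; a minimal sketch, assuming the video's flight_data DataFrame already exists: run each line in its own cell and watch the Jobs list in the Spark UI after each one.)

        agg_df = flight_data.groupby("DEST_COUNTRY_NAME").sum("count")  # lazy if no new job shows up
        agg_df.show()  # definitely an action: at least one new job appears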

  • @ChandanKumar-xj3md · 1 year ago

    "How does a job get created?" This question was never clear to me before, but thanks, Manish, for clearing it up; understanding lazy evaluation was a nice add-on. 👍

  • @mmohammedsadiq2483 · 9 months ago

    I have a confusion: read and inferSchema are typically used with Spark's DataFrame API, which is part of Spark SQL. They are not transformations or actions; they are part of Spark's logical and physical planning phase, which occurs before any actions are executed.
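
    (For context, a minimal sketch, assuming the dbfs path shared later in this thread: reading is lazy in principle, but inferSchema forces Spark to scan the file to work out column types, so a job appears at read time even before any explicit action.)

        # Without inferSchema, columns default to string; no full data scan is needed.
        df_lazy = spark.read.format("csv").option("header", "true") \
            .load("dbfs:/FileStore/tables/flight_data.csv")

        # With inferSchema, Spark scans the data to deduce types, launching a job.
        df_scanned = spark.read.format("csv").option("header", "true") \
            .option("inferSchema", "true") \
            .load("dbfs:/FileStore/tables/flight_data.csv")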

  • @arpitchaurasia5132 · 4 months ago

    Bhai, you teach amazingly well, I thoroughly enjoyed it!

  • @abhilovefood4102 · 11 months ago

    Sir, your teaching is good.

  • @choubeysumit246 · 1 month ago

    One action = one job is true for the RDD API only. One action on a DataFrame or Dataset can lead to multiple jobs being generated internally, and due to Adaptive Query Execution as well, multiple jobs are created in Databricks, which you can inspect with the explain method.
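
    (A minimal sketch of inspecting the plan, assuming the video's flight_data DataFrame: explain() prints the plans Spark builds without running a job, which helps to see why one action can fan out into several jobs.)

        total = flight_data.groupby("DEST_COUNTRY_NAME").sum("count")
        total.explain(True)  # parsed, analyzed, optimized and physical plans; no job runs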

  • @tahiliani22 · 3 months ago

    Awesome. By the way, do we know why it's creating 4 Spark jobs instead of 3?

  • @prateekpawar1871 · 10 months ago

    Do you have theory notes for Spark?

  • @SanjayKumar-rw2gj · 16 days ago

    Is there any cheat sheet listing which operations are transformations and which are actions, like read being an action whereas filter is a transformation?

  • @deepaliborde25 · 6 months ago

    Where is the practical session link?

  • @abhishekchaturvedi9855 · 7 months ago

    Hello Manish. You mentioned that the SQL query gets optimized by Spark. Just wanted to know: will it improve execution time if we put the optimized query in our code ourselves, so that Spark doesn't need to do it?

  • @manish_kumar_1 · 7 months ago

    Spark's optimization is quite limited, so as developers we should write optimized code to make our process run faster.

  • @ruinmaster5039 · 11 months ago

    Bro, please add a summary at the end.

  • @user-gt3pi6ir5u · 3 months ago

    Any idea now where the 4th job came from?

  • @DevendraYadav-yz2so · 9 months ago

    How do we use Databricks Community Edition, and how do we set up Spark with Databricks? Please explain this so we can write the code.

  • @techworld5477 · 4 months ago

    Hi sir, when I run this code I am getting the error: name 'col' is not defined. How do I solve it?

  • @jasvirsinghwalia401 · 1 month ago

    Sir, read and inferSchema are transformations and not actions, right? Then why did they get separate jobs?

  • @krushitmodi3882 · 11 months ago

    Sir, please finish this series a bit quickly so we can give interviews. I have watched your entire channel. Thank you.

  • @ordinary_indian · 4 months ago

    Where do I find the files? I have just started the course.

  • @pramod3469 · 1 year ago

    Does lazy evaluation consider partitioning as well? Say we have applied orderBy on the salary column and now we want to show only the two highest salaries: will lazy evaluation work here too, with Spark processing only the partition that holds those two salary records, or will it process all partitions and then extract the two highest salary records for us?

  • @manish_kumar_1 · 1 year ago

    Yes. Until you write .head(2) for the 2 highest records, your processing will not start, although in the backend it will build the DAG.
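
    (A minimal sketch of this behaviour; employee_df and its "salary" column are hypothetical names.)

        from pyspark.sql.functions import col

        ranked = employee_df.orderBy(col("salary").desc())  # transformation: only the DAG grows
        top_two = ranked.head(2)                            # action: jobs actually run now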

  • @manish_kumar_1 · 1 year ago

    Directly connect with me on:- topmate.io/manish_kumar25

  • @maurifkhan3029 · 10 months ago

    I too got confused about why the number of jobs is sometimes more or less than the number of actions. Try clearing the state via the menu option Run -> Clear state, then re-run the cell that holds the code from reading the file through everything you want to perform. I think Databricks intelligently stores the state of the system, so when you later run the same read command the job count might not match. I tried this and it seems to work.

  • @jatinyadav6158 · 5 months ago

    The jobs count is right; it is 4 because the sum() function is an action, which I guess Manish missed by mistake. Btw, @Manish, thank you so much for the amazing course.

  • @deepanshuaggarwal7042 · 2 months ago

    @jatinyadav6158 If 'sum' is an action, then why didn't it create a job before the 'show' line was added?

  • @jatinyadav6158 · 2 months ago

    @deepanshuaggarwal7042 Yes, sum is an action; I am not sure why it didn't show a job earlier.

  • @vaibhavdimri7419 · 1 month ago

    Sir, did you figure out how hitting one action created 2 jobs?

  • @ChetanSharma-oy4ge · 4 months ago

    I am trying to find out why 4 jobs are generated here although we have provided only 3 actions.

  • @prabhatsingh7391 · 10 months ago

    Hi Manish Bhaiya, in the code snippet you said there are three actions in this application (read, inferSchema, and show), but in the Spark UI 4 jobs are created. Can you please explain this?

  • @manish_kumar_1 · 10 months ago

    1 job must have been skipped. If the data is small, try running explain(); it should come to 3.

  • @ankitas4019 · 3 months ago

    Where did he explain how to download the flight data?

  • @chethanmk5852 · 4 months ago

    Why do we have 4 jobs when we are using only 3 actions in the application?

  • @aditya_1005 · 11 months ago

    Well explained... Sir, could you please clarify: 3 actions but 4 jobs created?

  • @manish_kumar_1 · 11 months ago

    Your 3 actions created 4 jobs? Did you use show? Please also paste your code in the comment section.

  • @hazard-le7ij123 · 9 months ago

    @manish_kumar_1 The code you wrote also creates 4 jobs. Can you explain that? Below is my code and the same thing is happening: 4 jobs get created. A stage is getting skipped, but why do we have an extra job, with 4 different job IDs?

        from pyspark.sql import SparkSession
        from pyspark.sql.functions import *

        spark = SparkSession.builder.master('local[5]') \
            .appName("Lazy Evaluation internal working") \
            .getOrCreate()

        flight_data = spark.read.format("csv") \
            .option("header", "true") \
            .option("inferSchema", "true") \
            .load("D:\\Spark\\flight_data.csv")

        flight_data_repartition = flight_data.repartition(3)

        us_flight_data = flight_data.filter(col("DEST_COUNTRY_NAME") == 'United States')

        us_india_data = us_flight_data.filter(
            (col("ORIGIN_COUNTRY_NAME") == 'India') |
            (col("ORIGIN_COUNTRY_NAME") == 'Singapore'))

        total_flight_ind_sing = us_india_data.groupby("DEST_COUNTRY_NAME").sum("count")
        total_flight_ind_sing.show()

        input("Enter to terminate")

  • @avanibafna6207 · 1 year ago

    In my case the same code created 5 jobs. I have imported col; will that also be treated as an action and create a new job? Is that so?

  • @manish_kumar_1 · 1 year ago

    Can you please paste your code in the comment section?

  • @avanibafna6207 · 1 year ago

    @manish_kumar_1

        from pyspark.sql.functions import col

        flight_data = spark.read.format("csv") \
            .option("header", "true") \
            .option("inferSchema", "true") \
            .load("dbfs:/FileStore/tables/flight_data.csv")

        flight_data_reparition = flight_data.repartition(3)

        us_flight_data = flight_data_reparition.filter("DEST_COUNTRY_NAME='United States'")

        us_india_data = us_flight_data.filter(
            (col("ORIGIN_COUNTRY_NAME") == 'India') |
            (col("ORIGIN_COUNTRY_NAME") == 'Singapore'))

        total_flight_ind_sing = us_india_data.groupby("DEST_COUNTRY_NAME").sum("count")
        total_flight_ind_sing.show()

    Output: (5) Spark Jobs
        Job 22 (Stages: 1/1)
        Job 23 (Stages: 1/1)
        Job 24 (Stages: 1/1)
        Job 25 (Stages: 1/1, 1 skipped)
        Job 26 (Stages: 1/1, 2 skipped)

        +-----------------+----------+
        |DEST_COUNTRY_NAME|sum(count)|
        +-----------------+----------+
        |    United States|       100|
        +-----------------+----------+

  • @snehalkathale98 · 3 months ago

    Where do I get the CSV file?

  • @AmitSharma-ow8wm · 1 year ago

    Waiting for your next video...

  • @AmitSharma-ow8wm · 1 year ago

    @rampal4570 Is it true, bro?

  • @manish_kumar_1 · 1 year ago

    It will come today.

  • @abhayjr11 · 1 month ago

    Bhai, please share the video that comes before this one; I can't find it.

  • @mahendrareddych334 · 3 months ago

    Bro, you are explaining superbly, but why don't you explain in English? Not everyone knows Hindi. I don't know Hindi, but I'm watching your videos to understand the concepts and not getting them fully because they are explained in Hindi.

  • @shrikantpandey6401 · 1 year ago

    Could you provide the notebook link? It would be good for hands-on practice.

  • @manish_kumar_1 · 1 year ago

    I don't provide notebooks or PDFs. Take notes and type every line of code yourself. This will give you confidence.

  • @sankuM · 1 year ago

    @manish_kumar_1 This is indeed a really great point! However, if possible, do share your own reference material for our benefit! Thanks! This series is really helpful. I have 4+ years of experience in DE but never tried to go into Spark internals; now, while interviewing for a switch, I'm definitely going to utilize all this! Keep 'em coming!! 🙌🏻👏🏻

  • @chandanpatra1053 · 4 months ago

    Suppose some code is written using Spark. Looking at it, how can one tell whether a given call is an 'action' or a 'transformation'?

  • @manish_kumar_1 · 4 months ago

    You should Google which ones are actions. The rest are transformations.
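
    (A rough rule of thumb, as a sketch rather than an official list, assuming the video's flight_data DataFrame: transformations return another DataFrame/RDD and stay lazy, while actions return a plain value or write output and trigger jobs.)

        from pyspark.sql.functions import col

        t = flight_data.filter(col("count") > 10)  # returns a DataFrame -> transformation
        a = flight_data.count()                    # returns a number    -> action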

  • @asif50786 · 1 year ago

    How many more videos are to come on Apache Spark?

  • @manish_kumar_1 · 1 year ago

    Around 20-30. It's just the beginning of Spark.

  • @anirbanadhikary7997 · 1 year ago

    Today you didn't give us the interview questions.

  • @manish_kumar_1 · 1 year ago

    Basic questions would come from this, like: what is a DAG, and what are the edges and vertices in it?

  • @user-ww6yf3iq8q · 2 months ago

    The job is created because of the groupBy.

  • @manish_kumar_1 · 2 months ago

    Nope.

  • @raghavsisters · 11 months ago

    Why is it called acyclic?

  • @manish_kumar_1 · 11 months ago

    Because it doesn't form a cycle. If it got into a cycle (think of it as a circle), it would run endlessly.
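
    (A minimal sketch of why the lineage is acyclic: every dataset points back to its parents, never forward to itself, so following the edges can never loop.)

        rdd = spark.sparkContext.parallelize(range(10))
        doubled = rdd.map(lambda x: x * 2)          # edge: rdd -> doubled
        filtered = doubled.filter(lambda x: x > 5)  # edge: doubled -> filtered
        print(filtered.toDebugString().decode())    # prints the one-way lineage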

  • @khurshidhasankhan4700 · 7 months ago

    Sir, how are two jobs created on reading the CSV when read is the only action we call? If possible, please clarify.

  • @manish_kumar_1 · 7 months ago

    You must also have used inferSchema; that's why it's showing up.

  • @khurshidhasankhan4700 · 7 months ago

    @manish_kumar_1 Thank you, sir. Can you please share the list of actions, i.e. how many actions there are in Spark? If possible, please share, sir.

  • @akhilgupta2460 · 1 year ago

    Hi Manish bhai, could you provide the flight data file?

  • @manish_kumar_1 · 1 year ago

    I covered that in one of the videos. Please follow all the videos in sequence.

  • @ajaysinghjadoun9799 · 1 year ago

    Please make a video on window functions.

  • @manish_kumar_1 · 1 year ago

    Sure

  • @ajaysinghjadoun9799 · 1 year ago

    Sir, also consider covering Spark's 5 S problems: spill, shuffle, storage, etc.

  • @shivakrishna1743 · 1 year ago

    Where can I get the flight_data.csv file? Please help.

  • @shivakrishna1743 · 1 year ago

    Got the file, thanks.

  • @navjotsingh-hl1jg · 1 year ago

    Bhai, please share the file for this lecture.

  • @Tushar0797 · 1 year ago

    Bhai, please clear up the doubt about how that extra job got created.

  • @manish_kumar_1 · 1 year ago

    Go to the SQL tab and see how many jobs were skipped. And share your code and a screenshot of the SQL tab with me on LinkedIn or Instagram.

  • @rohitgade2382 · 10 months ago

    @manish_kumar_1 Dude, he's talking about your video 😂

  • @DevendraYadav-yz2so · 9 months ago

    I have watched up to lecture 7. How do I run the code you are showing, in Databricks and PySpark?

  • @manish_kumar_1 · 9 months ago

    You need to watch the practical and fundamentals playlists together. I explained this in the first video itself.

  • @vsbnr5992 · 1 year ago

    NameError: name 'flight_data_repartition' is not defined. What should I do in this case? Even after importing functions and types from pyspark, I'm still stuck here.

  • @manish_kumar_1 · 1 year ago

    Seems like your df is not defined.
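
    (A sketch of the fix, with the path and options as shared elsewhere in this thread: the NameError means flight_data_repartition was never created in the current session, so define it before any cell that uses it.)

        flight_data = spark.read.format("csv") \
            .option("header", "true") \
            .option("inferSchema", "true") \
            .load("dbfs:/FileStore/tables/flight_data.csv")

        flight_data_repartition = flight_data.repartition(3)  # must run before first use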

  • @vsbnr5992 · 1 year ago

    @manish_kumar_1 OK, it's working now, thanks.

  • @Tanc369 · 3 months ago

    Where will I find the CSV, sir?

  • @manish_kumar_1 · 3 months ago

    There are 2 playlists; watch them in parallel. In the practical one you'll find the data in the description. Copy it and save it as a CSV.

  • @amlansharma5429 · 1 year ago

    us_india_data = us_flight_data.filter((col("ORIGIN_COUNTRY_NAME") == 'India') | (col("ORIGIN_COUNTRY_NAME") == 'Singapore'))

    This is throwing an error: NameError: name 'col' is not defined. How do I define it?

  • @AliKhanLuckky · 1 year ago

    You'll have to import col; it's a function, so import functions, I think.

  • @manish_kumar_1 · 1 year ago

    Correct: "from pyspark.sql.functions import *"

  • @3mixmusic564 · 9 months ago

    Guru, the "i" button never appeared, neither here nor there 😂😂😂

  • @manish_kumar_1 · 9 months ago

    Huge mistake on my part 😂

  • @aishwaryamane5732 · 4 months ago

    Hi sir, in which video of the series have you explained schemas? @manish_kumar_1