Spark Job, Stages, Tasks | Lec-11
In this video I talk about how jobs, stages, and tasks are created in Spark. If you want to optimize your Spark workloads, you need a solid understanding of these concepts.
Directly connect with me on:- topmate.io/manish_kumar25
Comments: 157
In Apache Spark, the spark.read.csv() method is neither a transformation nor an action; it initiates the reading of CSV data into a Spark DataFrame as part of the data-loading phase of Spark's processing model. The actual reading and processing of the data occur later, driven by Spark's lazy evaluation model.
@ChetanSharma-oy4ge
4 months ago
Then how are jobs being created from actions? I mean, if the number of actions equals the number of jobs, what is the better way to find out?
@roshankumargupta46
A month ago
If you don't explicitly provide a schema, Spark will read a portion of the data to infer it. This triggers a job to read data for schema inference.
Thanks, Manish bhai... please keep making these videos.
Wow, what a clear explanation; first time I understood it in one go.
I really liked this video... nobody has explained it at this level.
Bro, that was a next-level explanation... thanks for sharing your great knowledge. Keep up the good work. Thanks.
Didn't find such a detailed explanation anywhere else. Kudos!
Nice explanation; you make each and every concept clear. Keep it up.
This was so beautifully explained.
One of the best videos ever. Thank you for this. Really helpful.
Great explanation ❤
Explained so well, and bit by bit at that 👏🏻
Really awesome explanation! You won't find an explanation like this anywhere else. Thank you so much.
Great Manish. I am grateful to you for making such rare content with so much depth. You are doing a tremendous job by contributing towards the Community. Please keep up the good work and stay motivated. We are always here to support you.
glad to see you brother
great job bro, you are doing well.
Very well explained bhai.
Bhai, I was eagerly waiting for your videos.
Your channel will grow immensely bro. Keep it up❤
nice explanation.
Nicely explained
Bhai, you explain things really well.
Very good content. Please make detailed videos on Spark job optimization.
Very good explanation, bro.
Thank you so much Manish
excellent!!
thank you sir..
VERY VERY HELPFUL
Bhaiya, you are great.
Start a playlist with guided projects, so that we can apply these things in real life.
1 job for read, 1 job for print 1, 1 job for print 2, 1 job for count, 1 job for collect: total 5 jobs according to me, but I have not run the code, so I'm not sure.
Wow!
I remember wide dependency you explained in shuffling.
Bro amazing explanation
Glad to see you, Manish... Bhai, any update on the project details?
Bro, how many more days will the Spark series take, and will you make a complete DE project with Spark at the end? BTW, I watched and implemented all your theory and practical videos. Great work ❤
If you don't explicitly provide a schema, Spark will read a portion of the data to infer it. This triggers a job to read data for schema inference. If you disable schema inference and provide your own schema, you can avoid the job triggered by schema inference.
@arpanscreations6954
A day ago
Thanks for your clarification
One question: after groupBy, by default 200 partitions will be created, where each partition holds the data for individual keys. What happens if there are fewer keys, say 100? Will that lead to only 100 partitions instead of 200? And what happens if there are more than 200 distinct keys; will it create more than 200 partitions?
❤
What if spark.sql.shuffle.partitions is set to some value? In that case, what will be the number of tasks in the groupBy stage?
Num Jobs = 2; Num Stages = 4 (job 1 = 1, job 2 = 3); Num Tasks = 204 (job 1 = 1, job 2 = 203)
I have one doubt: there are 3 actions there (read, collect and count), so why is it creating only 2 jobs?
@tejathunder
2 months ago
In Apache Spark, the read operation is not considered an action; it is a transformation.
Can an executor have 2 partitions, or do partitions mean the data will be on two different machines?
Are there 200 default tasks in groupBy even if there are only 3 distinct ages? If so, what will be in the rest of the 197 tasks (which age group will they hold)?
I had one question: what should the order of operations be? I mean, if we are doing filter/select/partition/groupBy/distinct/count or anything else, what should be written first?
Hi Manish, in the second job there were 203 tasks and in the first job there was 1, so are there 204 tasks in total in the complete application? I am a bit confused between 203 and 204. Kindly clarify.
@manish_kumar_1: correction: job 2, stage 2 runs till the groupBy, and job 2, stage 3 runs till the collect.
Hi sir, I have one doubt: will collect create a task and a stage or not? Because you mentioned 203 tasks.
.count on a grouped dataset is a transformation, not an action, correct? If it were employee_df.count(), then it would be an action.
Hi bhaiya, why haven't we considered collect() as a job creator in the program you discussed?
Hi @manish_kumar_1, I have one question: in the wide transformation you said that the groupBy stage 3 will have 200 tasks, corresponding to the 200 partitions. But can you tell me why these 200 partitions were created in the first place?
What command did you use to run the job?
Hi Manish, count() is also an action, right? If not, can you please explain what count() does?
Sir, please make playlists for other data engineering tools also.
I have been waiting for your video. How many more days will it take to complete the Spark series?
Bhai, job 2 starts from collect, right? So from after the read up to the collect, only job 1 runs, right?
Thanks Sir
Can I ask about executors? How many executors will there be in a worker node? And does the number of executors depend on the number of cores in the worker node?
I tried the same example and in my case 4 jobs are created. Is there any other config needed?
One query, @Manish: spark.read is a transformation and not an action, right?
What would happen if there were one more narrow transformation after the filter, like filter -> flatMap -> select? How many tasks would that create?
After repartition(3), will the DAG still show the 200 default partitions, sir?
Nice explanation 👌 👍 👏 but can you please explain in English? Then everyone all over the world 🌎 ✨ can watch.
Manish bhai, how many videos in total will the theory and practical series have?
At 7:40, print is an action, so there should be 4 jobs in the given code, right? Correct me if I am wrong.
Count is also an action, so shouldn't there be 3 jobs?
Hi Manish, in Databricks also, when groupBy() is invoked, does it create 200 tasks by default? How can we reduce those 200 tasks when using groupBy() to optimize a Spark job?
@manish_kumar_1
A year ago
There is a configuration that can be set. Just google how to set a fixed number of partitions after a join.
Count is also an action.
@manish_kumar_1
2 months ago
And a transformation too.
Do you have English versions of your videos?
Sir, in line 14 we have .groupBy and .count. Count is an action, right? Not sure if you missed it by mistake or if it doesn't count as an action 🙁
@tanmayagarwal3481
4 months ago
I had the same doubt. Did you get the answer to this question? As per the UI also, it mentions only 2 jobs, whereas count should be an action :(
Please enable subtitles in all videos, bro.
When I ran it in a notebook, it gave 5 jobs like below for this snippet of code, not only 2. Can you explain?
Job 80 (Stages: 1/1): Stage 95: 1/1
Job 81 (Stages: 1/1): Stage 96: 1/1
Job 82 (Stages: 1/1): Stage 97: 1/1
Job 83 (Stages: 1/1, 1 skipped): Stage 98: 0/1 skipped; Stage 99: 2/2
Job 84 (Stages: 1/1, 2 skipped): Stage 100: 0/1 skipped; Stage 101: 0/2 skipped; Stage 102: 1/1
Bhai, I'm using Spark 3.4.1, and when I group data using groupBy (I have 15 records in a dummy dataset) it creates 4 jobs to process 200 partitions. Why? Is this a recent enhancement? I observed the same thing not only in the latest version but also in Spark 3.2.1. Could you please explain?
@villagers_01
27 days ago
Bhai, a job is created on an action, but the number of actions != the number of jobs, because a job is created whenever a new RDD is needed. In groupBy we need to shuffle the data, and since RDDs are immutable, a new RDD has to be created after the shuffle. So whenever a new RDD is needed, a job is created. In your case: 1 job for the read, 1 job for the schema, 1 job for the shuffle, and 1 job for the display.
Could you please upload the required files? I just want to run it and see for myself.
Why not stage 4? You said each job has at least one stage and one task, so why doesn't job 3 get a stage and a task?
@manish_kumar_1 In the previous videos you described count() as an action, but in this video you are not treating it as an action. Why?
I have a question: if one job has 2 consecutive wide transformations, then 1 narrow transformation, and then another wide one, how many stages will be created? Suppose repartition, then groupBy, then filter, and then join: how many stages will this create?
@jaisahota4062
7 days ago
Same question.
Hi Manish, I have a doubt: in groupBy, count is also an action, so why is it not counted as an action?
@manish_kumar_1
A month ago
It will become clear in some of the upcoming videos.
Hi Manish, count is also an action, and you have written count just after the groupBy in the code snippet; why is count not considered a job here?
@manish_kumar_1
11 months ago
You will get the explanation in upcoming videos. Count is both an action and a transformation; when it behaves as which is explained in detail in later videos.
So whenever there is a groupBy, should we assume 200 tasks?
@manish_kumar_1
2 months ago
Yes, if AQE is disabled. If it is enabled, then the count depends on the data volume and the default parallelism.
Why was count() not counted as an action while counting the jobs?
@manish_kumar_1
A month ago
You will understand in a later lecture.
No problem bhaiya, just please finish this series completely, because nobody on YouTube has explained it in this much detail.
Hi Manish, will count() not be considered an action?
@manish_kumar_1
8 months ago
Count is both an action and a transformation. It will become clear in later lectures.
Please enable subtitles, bro.
Where do I get the file's data? Where have you shared the data?
In the code snippet shown at the start, count is also a job, right?
@manish_kumar_1
A month ago
No, you will find out why in later lectures.
Hi Manish, why doesn't the collect() method create a new stage (stage 4) in job 2, since it needs to send the data from the 200 partitions to the driver node?
@manish_kumar_1
A year ago
Collect is an action not a transformation
@venumyneni6696
A year ago
@@manish_kumar_1 Thanks for the reply, Manish. What happens after the groupBy in this case? Spark transfers the data in 200 partitions to the driver, right? Don't we need any tasks for that process? Thanks in advance.
@manish_kumar_1
A year ago
@@venumyneni6696 I think you are missing some key points about the driver and executors. Please clear up your basics: read multiple blogs or watch my videos in sequence.
@AAMIRKHAN-qy2cl
7 months ago
Correct, but you said that one stage will be created for every action, so the total stages should be 5, @@manish_kumar_1
The number of actions equals the number of jobs. In the mentioned code snippet there were three actions (read, count, collect). As per the theory, three job IDs should be created, but in the Spark UI only two jobs are created. Can you help me with this?
@rajukundu1308
A year ago
Why were three job IDs not created?
@manish_kumar_1
A year ago
Count is both a transformation and an action. In the given example it works as a transformation, not as an action. I will be uploading the aggregation video soon; there you will learn more about count's behavior.
@rajukundu1308
A year ago
@@manish_kumar_1 Thanks for the prompt response. Sure, eagerly waiting for your new video.
I executed the same job, but it created 4 jobs, each with 1 stage and 1 task. I think it created a new job for every wide transformation. Please confirm and guide.
@villagers_01
27 days ago
Bhai, a job is created on an action, but the number of actions != the number of jobs, because a job is created whenever a new RDD is needed. In groupBy we need to shuffle the data, and since RDDs are immutable, a new RDD has to be created after the shuffle. So whenever a new RDD is needed, a job is created.
Bro, if I have created 3 partitions, then will it create 3 tasks? Is that correct?
@manish_kumar_1
A year ago
Yes
Manish bhai, count() is also an action, right? Why wasn't a separate job created for it?
@manish_kumar_1
9 months ago
I have already explained in a lecture why count shows 2 kinds of behavior.
@amlansharma5429
9 months ago
@@manish_kumar_1 Sorry, I've fallen a bit behind... I'll catch up soon ♥️
Hi, thank you for the detailed video. I tried the same code on my laptop, but I get 5 jobs in total (0 to 4), and the final task count is just 1 even though the default partition setting is 200. Could you please explain why I see multiple jobs in the DAG and a final task count of just 1? Thank you once again.
@user-zd7fg5to7g
A year ago
And I also get the same output on Databricks, GCP and PyCharm.
@manish_kumar_1
A year ago
First of all, use collect in place of show. In the case of show, I have seen it report skipped jobs in the count. And why it isn't working even after setting 200, I am not sure.
@rpraveenkumar007
A year ago
Experiencing the same.
@manish_kumar_1
A year ago
@@rpraveenkumar007 Share your code with me.
@mahnoorkhalid6496
9 months ago
@@manish_kumar_1 I am having the same issue: with the same code it created 4 jobs, each with 1 stage and 1 task. Maybe it's because of the wide transformation.
Around 6:35 it is wrong: read can avoid acting as an action if you pass a manually created schema containing a list of all the columns. Refer to the practical video: kzread.info/dash/bejne/iICdm7mMaLawdrw.html
Where do I get the Spark practical videos?
@manish_kumar_1
3 months ago
There is a playlist called Spark Practical.
@prachideokar7639
2 months ago
Hello, I want 1:1 communication... is it possible? I have some doubts related to a data engineering project. @@manish_kumar_1
Sir, count() is also an action, right?
@manish_kumar_1
8 months ago
You will find out in later lectures. It is both an action and a transformation.
Bhai, it shows me 3 jobs, 3 stages and 4 tasks: Job 0 for the load with 1 task, Job 1 for collect with 2 tasks, and Job 3 for collect with 1 task, but shown as skipped. I don't get what's wrong; I used the same code but different data, about 7 MB in size.
@manish_kumar_1
8 months ago
No need to be too rigid here. Spark does a lot of optimization, and sometimes some jobs get skipped. I did this in a controlled environment to show how it works. During project development you are not going to count how many jobs, stages or tasks there are, so even if you don't get the same numbers, just chill.
Count is also an action, so why was no job created for it?
@manish_kumar_1
2 months ago
Count is both an action and a transformation. You will find out in the later lectures.
So can we say an action and a stage are the same?
@manish_kumar_1
11 months ago
Then why did I teach this at all 🤔 In short, no: an action and a stage are 2 different things. Please watch the video carefully.
203 tasks or 204 tasks? Please clear this up.
Bhai, count() is also an action, right?
@manish_kumar_1
8 months ago
It is both an action and a transformation. You will find this in later videos.