Spark Architecture | Lec-5
In this video I have talked about Spark architecture in great detail. Please watch the video in full and ask your doubts in the comment section below.
Directly connect with me on:- topmate.io/manish_kumar25
For more queries, reach out to me on my social media handles below.
Follow me on LinkedIn:- / manish-kumar-373b86176
Follow Me On Instagram:- / competitive_gyan1
Follow me on Facebook:- / manish12340
My Second Channel -- / @competitivegyan1
Interview series Playlist:- • Interview Questions an...
My Gear:-
Rode Mic:-- amzn.to/3RekC7a
Boya M1 Mic-- amzn.to/3uW0nnn
Wireless Mic:-- amzn.to/3TqLRhE
Tripod1 -- amzn.to/4avjyF4
Tripod2:-- amzn.to/46Y3QPu
camera1:-- amzn.to/3GIQlsE
camera2:-- amzn.to/46X190P
Pentab (Medium size):-- amzn.to/3RgMszQ (Recommended)
Pentab (Small size):-- amzn.to/3RpmIS0
Mobile:-- amzn.to/47Y8oa4 (You absolutely don't need to buy this)
Laptop -- amzn.to/3Ns5Okj
Mouse+keyboard combo -- amzn.to/3Ro6GYl
21 inch Monitor-- amzn.to/3TvCE7E
27 inch Monitor-- amzn.to/47QzXlA
iPad Pencil:-- amzn.to/4aiJxiG
iPad 9th Generation:-- amzn.to/470I11X
Boom Arm/Swing Arm:-- amzn.to/48eH2we
My PC Components:-
intel i7 Processor:-- amzn.to/47Svdfe
G.Skill RAM:-- amzn.to/47VFffI
Samsung SSD:-- amzn.to/3uVSE8W
WD blue HDD:-- amzn.to/47Y91QY
RTX 3060Ti Graphic card:- amzn.to/3tdLDjn
Gigabyte Motherboard:-- amzn.to/3RFUTGl
O11 Dynamic Cabinet:-- amzn.to/4avkgSK
Liquid cooler:-- amzn.to/472S8mS
Antec Prizm FAN:-- amzn.to/48ey4Pj
Comments: 109
There is a saying that 'if you can't explain it simply, you don't understand it well enough', and it fits here so accurately. You have understood it so well that you made it even easier for others. Thank you for all the hard work.
Sir, you have poured your soul into making these videos... they are very sincere videos... my prayer is that God grants you great success.
You are a wonderful teacher. You have a gift. Please start a DE bootcamp. You’ll see great success with it I’m sure
Please be consistent and don't leave midway. I have 5 years of SQL development experience and will switch to the big data/Spark domain within 3 months. Please don't stop midway; you are making wonderful videos.
@manish_kumar_1
A year ago
I won't
@engineerbaaniya4846
A year ago
Thanks for this series
@blutoo1363
8 months ago
Did you switch? @@engineerbaaniya4846
You probably won’t see this. But I watched your videos 2 days before my DE interview and I cracked it with confidence. Like you said, the fundamentals make all the difference. My understanding was so clear that they offered me the position on the spot
What fantastic teaching, Manish bhai! In future, if anyone comes to me for guidance on where to study from, I will refer them to your channel without any doubt.
brilliantly explained. Loads of Thanks.
So Helpful ! Really a Great Explanation !
A very detailed and layman-friendly explanation, which no one else gives. Keep it up.
Bro, you can be the CodeWithHarry of the data engineering world. Keep this going, and thanks for sharing this knowledge.
Beautifully explained. Concepts are so much easier to understand with the help of diagram.
you are one of THE BEST TEACHER i have ever known
I have watched many tutorials on Spark, but you are the best. The way you teach is amazing. Sir, please don't stop uploading tutorials like this. You are great, sir. Thank you. From Bangladesh.
Literally mind-blown by your teaching! Awesome content.
You are building my confidence in the subject. Thank you bhaiya.
explained wonderfully.
Thank you so much for this explanation, Please continue the good work
Fantastic, Manish bhai. Thoroughly enjoyed it.
The video summary at the end is very useful for recalling everything from the video! Good thinking, Manish...
Superb explanation!
Rightly said... Very detailed👏👏👍👍
Crystal clear. Thanks a lot. 👏
the flow of explanation and engagement were on point 💯
Thankyou manish bhai for this wonderful video
I think there is a slight confusion between the AM (Application Master) and the driver program: 8:28 The AM launches the driver program within a container on a worker node. The driver program communicates with the AM for resource allocation and task scheduling. The AM acts as a bridge between the driver program and the cluster manager (YARN).
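For context on this point, where the driver lives depends on the deploy mode passed to spark-submit. A minimal sketch (the script name and resource numbers here are illustrative, not from the video):

```shell
# Illustrative only: in cluster mode, YARN launches the driver inside
# the Application Master's container on a worker node; in client mode,
# the driver runs on the machine that invoked spark-submit.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 5 \
  --executor-memory 25G \
  --executor-cores 5 \
  my_job.py
```

With `--deploy-mode client` instead, the same command would keep the driver on the submitting machine, which matches the "driver on the master" picture some diagrams show.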
Salute to your hard work, but I hope in the next video you will come up with the practicals too.
thanks for the video manish
Bhai, you have covered the concept in depth, but I am still very confused about containers... it is not clear even after rewatching.
Thoroughly enjoyed it, bhai. Reminded me of Khan Sir 🙂 Thanks
Please continue making videos like this with complete information... I appreciate your hard work. Take as much time as it needs... the concepts should be clear... 😅
Stunning explanation bro 👍
Thank you,This is perfect.
Wonderful explanation
God level explanation!
very nice series
Thank you, Manish. It was an absolutely crystal clear explanation. Hoping to get more in-depth videos like this.
@manish_kumar_1
A year ago
Glad you liked it
Very well explained 🤩
Thank you Manish Bhai.... You're really doing a great work🙏🏻🙏🏻.... In this series please upload the videos a bit faster... 😊
Spark Architecture: whenever a job is initiated, the Spark Context starts the Spark Session. It connects with the Cluster Manager to work out how many worker nodes (slaves) are required, and once that information is received, the Driver Program (master) starts assigning tasks to the worker nodes. Executors are responsible for carrying out all the tasks, and intermediate results are stored in cache. All worker nodes are connected to each other so they can share data and logic.
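The flow in the comment above can be caricatured in plain Python. This is a toy model, not the real Spark API; the class and method names (`Driver`, `Executor`, `submit`, `run`) are invented purely for illustration:

```python
# Toy model of the driver/executor flow described above -- NOT the Spark API.
class Executor:
    def __init__(self, node):
        self.node = node
        self.cache = {}            # intermediate results live here

    def run(self, task):
        result = task * task       # stand-in for real work
        self.cache[task] = result
        return result

class Driver:
    """Plays the 'master': assigns tasks across the executors it was given."""
    def __init__(self, executors):
        self.executors = executors

    def submit(self, tasks):
        # Round-robin task assignment across executors
        return [self.executors[i % len(self.executors)].run(t)
                for i, t in enumerate(tasks)]

executors = [Executor(node=f"w{i}") for i in range(1, 6)]   # 5 workers
driver = Driver(executors)
print(driver.submit([1, 2, 3, 4, 5, 6]))  # [1, 4, 9, 16, 25, 36]
```

The real machinery (cluster manager negotiation, shuffle between nodes, fault tolerance) is of course far richer; this only mirrors the assignment-and-cache idea from the comment.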
PERFECT BEST ONE EVER
explained very well
Great explanation bro👌👍.. It would be nice if you add subtitles.
Great
Hi Manish, I watched this completely and understood it. But most of the time in interviews people ask about the Spark Context and other views of the architecture that you did not cover. Any view on this?
Bhaiya, please cover the full syllabus; I am following your Spark series.
Hi Manish, great explanation. I have one doubt: is it possible to add more than one executor to a worker node? Asking because you demonstrated one executor going to one worker node only.
Thanks!
Hello Manish Kumar, hope you're doing well. A very well explained concept and a very good Spark series. Can you provide a PDF or link to the notes?
Thanks for the explanation, Manish. One quick question: here you created 5 executors on 5 different worker nodes. Is it possible to have more than 1 executor on the same worker node/machine? Thanks in advance.
Hi Manish sir, if an interviewer asks about cluster size, how should one answer?
Hi Manish, thank you very much for sharing such great knowledge. I currently have 10.5 years of experience in IT, including SQL/PLSQL (7 years), SQL Server T-SQL (1.5 years) and Snowflake query optimization (6 months). Two years ago I joined an MNC as a Data Engineer (Spark with Scala), but I was given a T-SQL project. I have only taken trainings, studied interview questions and cleared interviews. Right now I am on the bench; what decision should I take? Please suggest.
Hi Manish, I have a question: why can't a UDF in PySpark be converted to Java code in the Application Master?
Hi @manish, I have two questions: 1) What is the difference between a cluster manager and a resource manager? 2) How does a developer specify requirements like RAM and cores?
👍👍👍👍
Great explanation! But I have a doubt regarding the driver. Will there be an extra worker node for the driver, or can it sit alongside the executors that process the data? For instance, if we want to process 10 GB and after calculation we want 16 executors, then along with the driver will it be 17 containers, or am I missing something here?
Awesome video. Also, please share a playlist or course for SQL; I would really appreciate it.
@manish_kumar_1
A year ago
You can follow the Kudvenkat YouTube channel for SQL
When will the application driver stop working? Could you please explain again?
Hi Manish, I am learning Spark from your videos, but in this video I am a bit confused: you are saying the driver is present on a worker node, but the standard architecture diagram shows the driver on the master. Could you please clarify or elaborate on this?
The Spark code can be written in Scala itself, right? Will we still need the Application Driver if the code is written in Scala?
good
Hi Manish. If the code from the PySpark driver is getting converted into equivalent Java code, won't the UDFs get converted too? If that is true, why do we need a Python worker in the executor?
I have a question. In the video, we wanted 5 executors with 25 GB RAM and 5 cores each, and for those 5 executors you used w2, w3, w4, w7, and w8. Now, all of them have 100 GB RAM and 20 cores. Why can't we put 4 executors on a single machine? 4 x 25 = 100 GB, and 4 x 5 = 20 cores. That way our resources (executors, driver) would be spread across fewer machines. I don't know what benefits/drawbacks that might have; just curious why we can't do this.
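The packing arithmetic in this question can be sanity-checked with a few lines of Python (the numbers are the ones from the comment; the helper function is just for illustration):

```python
import math

def executors_per_node(node_mem_gb, node_cores, exec_mem_gb, exec_cores):
    """How many identically sized executors fit on one worker node,
    limited by whichever resource (memory or cores) runs out first."""
    return min(node_mem_gb // exec_mem_gb, node_cores // exec_cores)

# Numbers from the comment: 100 GB / 20 cores per node, 25 GB / 5 cores per executor
per_node = executors_per_node(100, 20, 25, 5)
print(per_node)                         # 4 executors fit on one node

nodes_needed = math.ceil(5 / per_node)  # 5 requested executors
print(nodes_needed)                     # 2 nodes would suffice
```

So packing is arithmetically possible; whether the scheduler spreads or packs in practice involves trade-offs like data locality and fault isolation (one node failing takes out more executors when packed).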
done
I have one doubt; can anyone please resolve it? The PySpark driver is created in the Application Master only if we use a UDF (user-defined function). But we write the code in PySpark and it is processed distributedly on the worker nodes. So whether I use a UDF or not, our code is in PySpark; how do the worker nodes process PySpark code when they only have a JVM and no Python worker?
I’m following 2024.
Sir, if one node fails, what do we do? This was asked in my interview; please give me the answer, it trips me up a lot in interviews.
Hi Manish, very informative video. I have one question: what exactly is an executor? As per my understanding, it is responsible for executing tasks and has cores for processing. Since each worker node has 20 cores, can I create an executor with any number of cores and any amount of memory?
@manish_kumar_1
8 months ago
Out of the worker node's resources, you get some memory in the form of a container for your Spark job, and your executor runs inside that container with the memory you asked for. So say a worker node has 64 GB RAM and a 16-core CPU, and you ask for only 10 GB with 3 cores: that is all you will get. The remaining memory goes to other jobs.
What if I try to provision more executors than are available on my cluster? Or what if I try to provision more RAM or CPU cores than my executors' capacity? Can you explain what would happen on a cluster, since I think it is difficult to replicate this locally?
@manish_kumar_1
5 months ago
You can try it locally too. If you ask for more memory than is available in your system, you will only get the available memory; there is a hardware limit, so you will be allocated whatever memory your cluster has. If multiple jobs are already running, your job will wait in a queue until memory becomes available for the run. The queue runs in FIFO order.
Bro, please share the notes for this lecture in PDF format.
Great explanation. I have two doubts: 1) what happens if we don't have 5 free workers in the cluster? 2) what if we have 5 free workers but not enough CPU cores or memory for what we requested? Thank you, and waiting for your reply.
@manish_kumar_1
11 months ago
You will have to wait in a queue. FIFO is applied by the resource manager.
Hello brother, I have a question: Spark is a distributed processing framework and is fault-tolerant. However, if the driver node fails, what happens?
@manish_kumar_1
2 months ago
You will have to re-run the job.
Hi Manish, I have one question. I have seen some job descriptions mentioning Databricks. When a posting says a candidate must know how to work with Databricks, what exactly do they mean? What are the things one should know about Databricks? Looking forward to your reply.
@manish_kumar_1
A year ago
You should know how to work with databricks. It's just a tool which you can learn very easily once you start using it
@chiragsharma9430
A year ago
Alright, thanks for the reply, Manish. Really appreciate your response.
Hello Manish, if we ask for more RAM or more cores than are available on a machine, what will happen?
@manish_kumar_1
11 months ago
It would be a waste of resources, and you won't be given the extra resources anyway, because RAM is very costly.
@bangalibangalore2404
11 months ago
One more question: how are the files brought in? I mean, the files must be lying distributed across the same cluster, so will the executor be created where the file is, or randomly? Suppose the file abc.csv is on machines 4 and 5. When we ask YARN for resources, will it create the executor containers on machines 4 and 5, or anywhere in the cluster?
Is studying the Spark ecosystem necessary only for cracking interviews, or does it also have a use in practical work?
@manish_kumar_1
A year ago
You should know it to understand the overall picture.
I did not understand the JVM main(). Since Spark supports Python, why is the JVM needed to submit a Spark application? Please explain in detail. Thanks for the wonderful session.
@rajasekhar4023
A year ago
What exactly is the use of the JVM, since Spark supports Python for coding?
@bangalibangalore2404
11 months ago
Spark is written in Java/Scala, and by default Spark does not understand Python. Think of it as a language translator that changes the Python code into Java byte code, which Spark understands. So the Python code is converted to Java code first and then run; Spark supports Python because of this translator.
Bhai, at the end you said the driver shuts down... that would be the application driver, right? And besides the application driver, is there any other driver on the master?
@manish_kumar_1
8 months ago
A job has only one driver, and once the driver shuts down, the executors shut down as well.
Can't 5 containers of 20 GB each be created on one node?
@manish_kumar_1
A month ago
Yes, they can. That is what I said in the video: containers are created based on the workload.
Can anyone explain this to me if they understood it well?
❤💌💯💢
Bhai, are the theory and practical parts of the playlist finished, or is something still left?
@manish_kumar_1
9 months ago
Yes, it is finished.
Hi, I'm following your videos and I need the PDF file; could you provide it?
@manish_kumar_1
10 months ago
I think you haven't watched the first video. I don't provide PDFs; you have to take the notes down yourself. That way you will actually benefit.
So the driver is our Application Master?
@manish_kumar_1
9 months ago
No. The Application Driver that gets created inside the Application Master's container is the driver.
@TechnoSparkBigData
9 months ago
@@manish_kumar_1 thanks
I'll have to watch this again lol
@manish_kumar_1
11 months ago
Why, what happened?
Hi Manish, I need your LinkedIn profile link to connect with you; I need some guidance.
@manish_kumar_1
A year ago
Check the description. You can find all of my social media handle links there.