Spark Out of Memory Issue | Spark Memory Tuning | Spark Memory Management | Part 1
This video is part of the Spark Interview Questions Series.
Spark memory issues are one of the most common problems faced by developers, so during Spark interviews this is a very common question. In this video we will cover the following:
What is Memory issue in spark
What components can face Out of memory issue in spark
Out of memory issue in Driver
out of memory issue in Executor
How Spark's performance is impacted by Dynamic Partition Pruning
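Of the topics above, driver out-of-memory is the easiest to picture: actions like collect() pull every partition into the single driver process. A minimal plain-Python sketch of that pattern (no Spark required; the collect/take functions here are illustrative stand-ins, not Spark APIs):

```python
# Plain-Python sketch of the driver OOM pattern: a collect()-style
# action gathers EVERY partition into the one driver process.
# Pretend each list is a partition held by a different executor.
partitions = [list(range(1000)) for _ in range(8)]

def collect(parts):
    """Mimics rdd.collect(): concatenate all partitions on the driver."""
    out = []
    for p in parts:
        out.extend(p)          # driver memory grows with total data size
    return out

def take(parts, n):
    """Mimics rdd.take(n): stop as soon as n rows are gathered."""
    out = []
    for p in parts:
        for row in p:
            out.append(row)
            if len(out) == n:
                return out
    return out

driver_rows = collect(partitions)
print(len(driver_rows))            # 8000 rows now live on the driver
print(len(take(partitions, 10)))   # 10, regardless of total data size
```

The take-style access pattern is why sampling a few rows is safe while collecting the full dataset is not: the driver's footprint stays bounded by n rather than by the data.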
Here are a few Links useful for you
Git Repo: github.com/harjeet88/
Spark Interview Questions: • Spark Interview Questions
If you are interested to join our community. Please join the following groups
Telegram: t.me/bigdata_hkr
Whatsapp: chat.whatsapp.com/KKUmcOGNiix...
You can drop me an email for any queries at
aforalgo@gmail.com
#apachespark #sparktutorial #bigdata
#spark #hadoop #spark3
Comments: 94
So well explained, even the images were very useful. Thank you very much!
It is a great video. Content is very useful. Keep it up man 👍🏻👍🏻👍🏻
Great, please don't stop uploading new content!!
Pure content, great topic, informative, interactive and simple. Thank you!!
Can you please also show code to repartition and increase executor on dummy process by changing values so that you can show us the impact on the run time of the jobs ? That will be really great to understand concepts
very well explained, thank you
Thank you so much. I have been facing this question many times in recent days. 👍
@DataSavvy
3 years ago
Thanks :)
Great video.. perfect explanation
Is the 2nd part not there yet? Your videos are AWESOME!!! :D
Lots of respect for your content ❤️
@DataSavvy
3 years ago
Thanks mate
Neatly explained, thank you....
Great information...... 👏👏👏
recently discovered this channel. this is gold
@DataSavvy
3 years ago
Thanks Nikhil :)
@nivedita5639
3 years ago
True
Very useful videos. Thank you :)
@DataSavvy
3 years ago
Thanks Nisha
Very useful. Please keep making more such videos
@DataSavvy
3 years ago
Thanks Viraaj :)
Hi Sir, could you please make a video on the factors that decide the number of tasks, stages, and jobs created after submitting our application?
Your videos on troubleshooting are pretty good.
@DataSavvy
3 years ago
Thanks Sree Ram... :)
Great video. Can you share the source of information for further reading?
Can you explain to me the difference between YARN memory overhead vs Spark reserved and user memory?
Thank you. Can you make a video about what Azure SQL is?
Very nice video!! thank you
@DataSavvy
2 years ago
Thanks Ajay
Great video! I had a question regarding the yarn memory overhead. When a pyspark job runs, my understanding is that python worker processes are started within the memory allocated to the executor. JVM then sends data back and forth to these python processes. Won't the allocated python objects use the memory of these python processes instead of the yarn memory overhead?
@Fresh-sh2gc
2 years ago
The worker nodes run on resources granted by YARN. YARN normally runs on a shared cluster, so there is always a tug of war between the tenants of the cluster for memory; as a result, one cannot always use too much memory. However, when there is ample YARN memory, there is a process called preemption which gets more memory for the executors.
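On the original question about Python worker memory: a related knob worth knowing is spark.executor.pyspark.memory (a real Spark 2.4+ property), which gives the Python workers their own cap; when it is unset, their usage is accounted against the memory overhead. A hedged sketch — the values and the pyspark_job.py filename are illustrative only:

```shell
# Illustrative values only -- tune for your cluster.
# spark.executor.pyspark.memory (Spark 2.4+) caps Python workers separately;
# without it, Python worker memory competes with everything else in the overhead.
spark-submit \
  --conf spark.executor.memory=4g \
  --conf spark.executor.memoryOverhead=1g \
  --conf spark.executor.pyspark.memory=1g \
  pyspark_job.py
```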
Is there any real-time Spark project? Please upload a video on it. It would be helpful.
Very helpful
You are one of the best mentors I have ever seen on YouTube. The way you explain is awesome, and all real-time questions. If my cluster memory is 10 GB and the data we want to process is 20 GB, will it process the data? Sir, can you please explain this topic?
@medotop330
3 years ago
No, you cannot process it
@medotop330
3 years ago
You can do it using MapReduce if it is in the batch layer and does not use iterative algorithms like machine learning algos
Waiting for Part 2 :)
@DataSavvy
3 years ago
Will come soon :)
amazing
Can you please give an example of each OOM you have explained here? Lots of blogs give the same explanations; what is extra here? Please provide examples. It would be great.
Nice video, Sir. And a mostly asked question in interviews. Could you please make one video related to other issues we face in Spark?
@DataSavvy
3 years ago
Sure Sambit... Do you have any other suggestions on questions?
@sambitkumardash9585
3 years ago
@@DataSavvy Could you please explain how to deal with semi-structured data, from ingestion to computation?
Recruiters say that you don't have production experience and POC Spark work will not help. How can we convince them despite having a good understanding of PySpark? Please suggest
When loading a file to a data frame you get an OOM error; how will you rectify it? Can we get a demo?
Waiting for part 2! :🙈
@DataSavvy
3 years ago
Working on it... Will post in a few weeks. I need to explain one related concept first before that video
I have one doubt: are reserved memory and YARN overhead memory the same? Because reserved memory also stores Spark internals. Thank you for your time.
Nice video again Harjeet :). Hey, can you make videos on test cases in Spark/Scala as well? I have seen no one talk about it.
@DataSavvy
3 years ago
Hi Ravi, test cases are generally functional and use-case specific...
@rajlakshmipatil4415
3 years ago
Ravishankar, maybe you can try using holdenkarau
@DataSavvy
3 years ago
Thanks for suggesting... Looks like a good resource... I will go through this github.com/holdenk/spark-testing-base
Hi all, I just got to know about the wonderful videos on the DataSavvy channel. In the executor OOM - big partitions slide: in Spark, every partition is of block size only, right (128 MB)? Then how can a big partition cause an issue? Can someone please explain this? A little confused here. Even if there is a 10 GB file, when Spark reads it, it creates around 80 partitions of 128 MB. Even if one of the partitions is large, it cannot exceed 128 MB, right? Then how does OOM occur?
Hello. I have 16 crore records on which I want to use a window function. But order by is taking a huge time and giving memory issues. Is there any alternative approach?
Good videos
@DataSavvy
3 years ago
Thanks Ravi :)
Dear Data Savvy, could you please clarify: if we go for a broadcast join, it copies the small file into all available executors' memory, right? How come it causes a driver out-of-memory exception?
@DataSavvy
3 years ago
That file is first brought to the driver and merged (if it has multiple partitions), then it is sent to the executors
@ANUKARTHIM
3 years ago
@@DataSavvy Thanks for the answer
@DataSavvy
3 years ago
Thanks
@svsvikky
3 years ago
@@DataSavvy Isn't broadcast done executor-to-executor, similar to BitTorrent? Please correct me if I am wrong
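For reference, the driver-side step described in this thread can be sketched in plain Python (illustrative names, not Spark APIs): the small table's partitions are first collected and merged on the driver, and only then does each executor receive a copy. If the "small" table is not actually small, it is the merge step that fails.

```python
# Plain-Python sketch of why a broadcast join can OOM the *driver*.
# Pretend each inner list is one partition of the small table.
small_table_partitions = [
    [(1, "a"), (2, "b")],
    [(3, "c")],
    [(4, "d"), (5, "e")],
]

# Step 1: the driver collects and merges every partition -- this is the
# step that blows up if the broadcast side is too large.
merged_on_driver = {}
for part in small_table_partitions:
    for key, value in part:
        merged_on_driver[key] = value

# Step 2: each executor receives a full copy of the merged table.
executors = [dict(merged_on_driver) for _ in range(3)]

print(len(merged_on_driver))                      # 5 rows held on the driver
print(all(e == merged_on_driver for e in executors))  # True
```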
Nice video. Question: when we call coalesce(1), does it cause any OOM issues in either the driver or the executor? If calling this operation does not throw any OOM, what could be the reason? Please clarify.
@DataSavvy
3 years ago
You are right... Coalesce can also cause a memory breach in a few situations...
@kiranmudradi26
3 years ago
@@DataSavvy Thanks. In that case the OOM will happen at the executor side, not the driver side. Is my understanding correct?
@DataSavvy
3 years ago
Yes...
@DataSavvy
3 years ago
Wait... A correction here... repartition(1) can cause the issue, not coalesce(1), as coalesce will not cause a shuffle and data will stay on the same machines...
@kiranmudradi26
3 years ago
@@DataSavvy Thanks. I was about to ask the same question; you replied in time. Kudos
Question: if I use PySpark, do I still get these errors? Another question: instead of collect, what other command can we use?
@vikaschavan6118
3 years ago
saveAsTextFile instead of collect
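The idea behind the suggestion above, sketched in plain Python (illustrative, no Spark required): a save-style action lets each partition be streamed to storage one at a time, so peak memory is one partition rather than the whole dataset on the driver.

```python
import os
import tempfile

# Pretend each inner list is one partition of an RDD/DataFrame.
partitions = [[f"row-{p}-{i}" for i in range(100)] for p in range(4)]

out_dir = tempfile.mkdtemp()
for idx, part in enumerate(partitions):
    # Each "task" writes its own part file, the way saveAsTextFile does;
    # only one partition is in memory at a time, never the full dataset.
    path = os.path.join(out_dir, f"part-{idx:05d}")
    with open(path, "w") as f:
        for row in part:
            f.write(row + "\n")

print(sorted(os.listdir(out_dir)))
# ['part-00000', 'part-00001', 'part-00002', 'part-00003']
```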
Spark on Kubernetes works completely differently. This works only for Spark on Hadoop.
Where is the second part ?
Issue: container killed by YARN, Spark application exited with code 1. This is the most common issue in AWS Glue or any Spark job. Increasing spark.yarn.executor.memoryOverhead and spark.executor.memory will help, but make sure the total doesn't exceed the yarn.nodemanager memory, or else there will be a configuration issue.
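A note on the property names in the comment above (hedged sketch; the values and your_job.py are illustrative): since Spark 2.3 the supported name is spark.executor.memoryOverhead, with spark.yarn.executor.memoryOverhead kept as the deprecated alias, and executor memory plus overhead must fit within yarn.nodemanager.resource.memory-mb on each node.

```shell
# Illustrative values only -- executor memory + overhead must stay within
# yarn.nodemanager.resource.memory-mb on each node, or YARN kills the container.
spark-submit \
  --conf spark.executor.memory=8g \
  --conf spark.executor.memoryOverhead=2g \
  your_job.py
```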
Why use RDDs in all the questions? Why not DataFrames?
groupByKey can also cause out of memory, right?
@DataSavvy
3 years ago
You are right... If there is skewness in the data, in the case of groupByKey we can end up facing memory issues
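The skew point can be sketched in plain Python (illustrative, no Spark required): a groupByKey-style task must hold every value of a hot key in memory at once, while a reduceByKey-style task keeps only a running aggregate per key, no matter how skewed the key is.

```python
from collections import defaultdict

# Skewed data: one hot key with 10_000 values, one cold key with 10.
pairs = [("hot", 1)] * 10_000 + [("cold", 1)] * 10

# groupByKey-style: every value for a key is materialised in one list,
# so the task handling "hot" must hold all 10_000 values at once.
grouped = defaultdict(list)
for k, v in pairs:
    grouped[k].append(v)

# reduceByKey-style: values are combined as they arrive, so memory per
# key is a single running total regardless of skew.
reduced = defaultdict(int)
for k, v in pairs:
    reduced[k] += v

print(len(grouped["hot"]), reduced["hot"])  # 10000 10000
```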
How would we know which file is small and which file is larger? An interviewer asked me this question.
@DataSavvy
6 months ago
You can list the files in the folder and see the size of each file... hdfs dfs -ls is the command
@user-dl3ck6ym4r
6 months ago
thank you@@DataSavvy
@user-dl3ck6ym4r
6 months ago
But I am using an S3 bucket, so... @@DataSavvy
Part 2??
I can't join your WhatsApp group. I am facing some issues on my local machine while setting up Spark; please let me know where to post my query.
@DataSavvy
3 years ago
Please join the telegram group and send your query there... We have moved to Telegram... http://t.me/bigdata_hkr
@amitpadhi2717
3 years ago
@@DataSavvy I already dropped a mail to aforalgo@gmail.com; could you please check the issue I faced?
I am learning the concepts, but without real-time experience I am not able to practice data collection from various sources. I am able to clean the data well using PySpark and can do ML using Spark ML via the MLlib library. But please suggest some sources to practice data collection from various sources. Thank you
@DataSavvy
3 years ago
Sure, let me look into this and I will share some links... You can join our document library and Data Savvy group... You will get a lot of relevant information there
How to avoid the collect operation?
@DataSavvy
6 months ago
You usually don't need collect... Can you give an example where you are using it? I can suggest how to avoid it and code it correctly
The whatsapp group is full
@DataSavvy
3 years ago
Yes... Please join telegram group
@RajuSharma-qd2uv
A year ago
Can you please share your telegram group name?
Who is the person who disliked this video? I think he is frustrated with life or wife... 😀😀😀
@DataSavvy
3 years ago
Ha ha ha 😀