How Spark Creates Partitions || Spark Parallel Processing || Spark Interview Questions and Answers
#SparkPartitioning #Bigdata #ByCleverStudies
In this video you will learn how Apache Spark creates partitions in local mode and cluster mode.
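As a rough sketch of the rule the video covers: in local mode the default parallelism comes from the number of local threads/cores, while in cluster mode it comes from the total executor cores. This is illustrative Python, not Spark's actual source; the helper names are my own:

```python
import os

def default_parallelism(master: str, total_executor_cores: int = 0) -> int:
    """Approximates Spark's sc.defaultParallelism for common master settings."""
    if master == "local":
        return 1                       # single-threaded local mode
    if master == "local[*]":
        return os.cpu_count() or 1     # one slot per CPU core
    if master.startswith("local[") and master.endswith("]"):
        return int(master[6:-1])       # local[N] -> N threads
    # cluster mode: total cores across all executors, with a floor of 2
    return max(total_executor_cores, 2)

def default_min_partitions(master: str, total_executor_cores: int = 0) -> int:
    """Approximates sc.defaultMinPartitions, used by sc.textFile by default."""
    return min(default_parallelism(master, total_executor_cores), 2)
```

So `sc.textFile(path)` with no explicit partition count falls back to at most 2 minimum partitions, which is why small files often land in just 1 or 2 partitions.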
Follow me on LinkedIn
/ nareshkumarboddupally
-----------------------------------------------------------------------------
Follow this link to join 'Clever Studies' official WhatsApp groups:
chat.whatsapp.com/C70cGzAEKC1...
Community: chat.whatsapp.com/GH1FPeNbDW4...
--------------------------------------------------
Follow this link to join 'Clever Studies' official telegram channel:
t.me/+eMaiZNWTPmZkYmVl
--------------------------------------------------
(Those who choose the Paid Membership option will get the following benefits)
Watch premium YT videos in our channel
Mock Interview and Feedback
Gdrive access for Bigdata Materials (Complimentary)
--------------------------------------------------
PySpark by Naresh playlist:
• PYSPARK BY NARESH
--------------------------------------------------
PySpark Software Installation:
• Latest Hadoop 3.2.2 Sp...
--------------------------------------------------
Realtime Interview playlist :
• How To Explain Project...
--------------------------------------------------
Apache Spark playlist :
• How Spark Executes A P...
--------------------------------------------------
PySpark playlist:
• PySpark | Tutorial-9 |...
--------------------------------------------------
Apache Hadoop playlist:
• APACHE HADOOP
--------------------------------------------------
Bigdata playlist:
• BIGDATA
--------------------------------------------------
Scala Playlist:
• SCALA TUTORIALS
--------------------------------------------------
SQL Playlist:
• SQL
Hello Viewers,
We at 'Clever Studies' are a YouTube channel formed by a group of experienced software professionals to fill a gap in the industry by providing free content: software tutorials, mock interviews, study materials, interview tips, and knowledge sharing by real-time working professionals, all to help freshers, working professionals, and software aspirants get a job.
If you like our videos, please subscribe and share them within your friends circle.
Contact us : shareit2904@gmail.com
Thank you !
Comments: 16
Sir, I have watched many videos related to this topic, but very few were able to explain these concepts the way you did. This video tempted me to watch the full playlist, and I definitely will. Thanks for sharing your knowledge and understanding with us. 🙏🙏🙌
Informative and well explained. Keep posting 👍
@cleverstudies
3 years ago
Sure 👍
Really informative.... neat explanation. Thank u
nice catch points explained
Another good effort for aspiring data engineering job candidates. A sound grounding for interview preparation...
@cleverstudies
3 years ago
Thank You
Could you please explain whether I'm getting this right? As I understand it, a partition is a logical division of the data into chunks (the unit of work that Spark operates on). So, for example, when we create an RDD with 4 partitions, the driver node will read the data, create the partitions, serialize them, and ship those partitions to the worker nodes (where they are deserialized) so that computations can run in parallel?
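For the `sc.parallelize` case this comment describes, the driver does slice the in-memory collection before shipping the slices to executors. A minimal Python sketch of that slicing rule (my own simplified illustration, not Spark's Scala source):

```python
def slice_collection(data, num_slices):
    # Slice i covers indices [i*n // num_slices, (i+1)*n // num_slices),
    # mirroring how Spark slices a local collection for sc.parallelize.
    n = len(data)
    return [data[i * n // num_slices:(i + 1) * n // num_slices]
            for i in range(num_slices)]

# Ten elements split into four partitions of roughly equal size:
parts = slice_collection(list(range(10)), 4)
```

Each resulting slice becomes one partition, and each partition is the unit that gets serialized and sent to a worker for parallel computation.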
Thank you
All your videos on Spark are good. Can you number them in the order they should be watched, from first to last?
@cleverstudies
3 years ago
We will try to do that. Thanks for watching the videos.
Can you please provide the download link for the CDH you are using?
As per my understanding, the driver sends the logic or program to each executor so it reads only its given partition of the data. My doubt is: how does the driver node create those instructions when it does not know exactly what data is present in the file, especially if it's a big text file with no columns, keys, or indexes? How does it make sure that all the data is read by the different executors with no overlaps?
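On the question above: for file sources, the driver does not need to inspect the contents at all. It only needs the file's size (from filesystem metadata) to hand each executor a byte range. Overlap at line boundaries is avoided by a convention inherited from Hadoop's text input format: every reader except the first skips its partial first line, and every reader finishes the line that crosses its end offset. A hedged Python sketch of the idea (simplified, hypothetical helper names):

```python
def byte_range_splits(file_size: int, num_splits: int):
    # The driver plans splits from file size alone -- no data is read here.
    split_size = file_size // num_splits
    splits, start = [], 0
    while start < file_size:
        end = min(start + split_size, file_size)
        if file_size - end < split_size:   # fold the remainder into the last split
            end = file_size
        splits.append((start, end))
        start = end
    return splits

def read_lines_in_split(text: bytes, start: int, end: int):
    # Executor-side rule: skip the partial first line (unless we start at 0),
    # and keep reading past `end` to finish the line we started inside.
    if start > 0:
        nl = text.find(b"\n", start)
        start = len(text) if nl == -1 else nl + 1
    out, pos = [], start
    while pos <= end and pos < len(text):
        nl = text.find(b"\n", pos)
        line_end = nl if nl != -1 else len(text)
        out.append(text[pos:line_end])
        pos = line_end + 1
    return out
```

Because each line is "owned" by exactly one byte range under this rule, every record is read exactly once even though the driver never looked at the data.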
Can you explain this question: how do you move all partitions to a single node?
@narasimharao7007
3 years ago
Are you asking about reducing/increasing the number of partitions? Then you can try repartition() or coalesce(). Remember that repartition() works for both increasing and decreasing the number of partitions, but coalesce() can only reduce it.
@nva1719
3 years ago
We can use df.coalesce(1) instead of repartition(1), as coalesce involves less (or no) shuffle while repartition involves a full shuffle of the data. It is preferred to keep shuffling of data minimal.
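The shuffle difference described in this reply can be sketched in plain Python (my own illustration, not Spark's implementation): coalesce merges whole adjacent partitions in place, while repartition reassigns every individual record (shown round-robin here as an assumption; Spark's exact record distribution differs):

```python
def coalesce(partitions, n):
    # Merge adjacent partitions into n groups: whole partitions move
    # (or stay put), so no record-level shuffle is needed.
    groups = [[] for _ in range(n)]
    for i, part in enumerate(partitions):
        groups[i * n // len(partitions)].extend(part)
    return groups

def repartition(partitions, n):
    # Full shuffle: every record is individually reassigned,
    # so data can move between all partitions.
    groups = [[] for _ in range(n)]
    idx = 0
    for part in partitions:
        for record in part:
            groups[idx % n].append(record)
            idx += 1
    return groups
```

Note how `coalesce` keeps each original partition intact inside its group, which is why it is cheap, while `repartition` touches every record, which is why it can rebalance skewed data at the cost of a full shuffle.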