How Spark Creates Partitions || Spark Parallel Processing || Spark Interview Questions and Answers

#SparkPartitioning #Bigdata #ByCleverStudies
In this video you will learn how Apache Spark creates partitions in local mode and in cluster mode.
Hello Viewers,
We are ‘Clever Studies’, a KZread channel formed by a group of experienced software professionals to fill a gap in the industry by providing free content: software tutorials, mock interviews, study materials, interview tips, and knowledge sharing by working professionals, to help freshers, working professionals, and software aspirants get a job.
If you like our videos, please subscribe and share them with your friends.
Contact us : shareit2904@gmail.com
Thank you !

Comments: 16

  • @onkarlondhe8131
    1 year ago

    Sir, I have watched many videos on this topic, but very few people were able to explain these concepts the way you did. This video tempted me to watch the full playlist, and I definitely will. Thanks for sharing your knowledge and understanding with us. 🙏🙏🙌

  • @tejeskhandagale5463
    3 years ago

    Informative and well explained. Keep posting 👍

  • @cleverstudies
    3 years ago

    Sure 👍

  • @vijeandran
    3 years ago

    Really informative, neat explanation. Thank you.

  • @aneksingh4496
    1 year ago

    nice catch points explained

  • @ksktest187
    3 years ago

    Another good effort for aspiring data engineering job candidates. Sound groundwork for interview preparation.

  • @cleverstudies
    3 years ago

    Thank You

  • @ayseak_
    3 years ago

    Could you please confirm whether I am getting this right? As I understand it, a partition is a logical division of the data into chunks (the unit of work that Spark operates on). So, for example, when we create an RDD with 4 partitions, does the driver node read the data, create the partitions, serialize them, and ship them to the worker nodes (where they are deserialized) so that the computation can run in parallel?

  • @selvansenthil1
    3 years ago

    Thank you

  • @kannadigainusa3751
    3 years ago

    All your videos on Spark are good. Can you number them in the order they should be watched, from first to last?

  • @cleverstudies
    3 years ago

    We will try to do that. Thanks for watching the videos.

  • @dataaholic
    3 years ago

    Can you please provide the download link for the CDH you are using?

  • @guptaashok121
    2 years ago

    As I understand it, the driver sends the logic (the program) to each executor, which then reads only its assigned partition of the data. My doubt is how the driver creates those instructions, since it does not know exactly what data is in the file. In particular, a big text file has no columns, keys, or indexes. How does it make sure that all the data is read by the different executors and that there are no overlaps?
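
    Editor's sketch for the question above. This is a plain-Python toy model, not Spark's own code, and it assumes the Hadoop-style line reader convention that Spark's text sources are generally understood to follow: a split is only a byte range (start, end), so the driver needs the file size, not the contents. Every split except the first discards bytes up to and including its first newline, and every split keeps reading lines that begin at or before its end offset. The names `make_splits` and `read_split` are hypothetical.

```python
def make_splits(total: int, size: int) -> list[tuple[int, int]]:
    """Cut a file of `total` bytes into byte ranges of at most `size` bytes."""
    return [(s, min(s + size, total)) for s in range(0, total, size)]


def read_split(data: bytes, start: int, end: int) -> list[bytes]:
    """Return the lines owned by the byte range starting at `start`."""
    pos = start
    if start != 0:
        # Discard everything up to and including the first newline:
        # the previous split is responsible for that line.
        nl = data.find(b"\n", start)
        if nl == -1:
            return []
        pos = nl + 1
    lines = []
    # Read every line that begins at or before `end`, even if it runs past it.
    while pos <= end and pos < len(data):
        nl = data.find(b"\n", pos)
        stop = len(data) if nl == -1 else nl + 1
        lines.append(data[pos:stop])
        pos = stop
    return lines


data = b"alpha\nbeta\ngamma delta\nepsilon\nzeta"
for start, end in make_splits(len(data), 10):
    print((start, end), read_split(data, start, end))
```

    Under these two rules each line is read by exactly one split, whatever the split size, which is why no overlap or gap appears even though nobody inspected the file contents beforehand.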

  • @nivedita5639
    3 years ago

    Can you explain this question: how do we move all partitions onto a single node?

  • @narasimharao7007
    3 years ago

    Are you asking about reducing or increasing the number of partitions? Then you can try repartition() or coalesce(). Remember that repartition() works for both increasing and decreasing the number of partitions, but coalesce() only reduces it.

  • @nva1719
    3 years ago

    We can use df.coalesce(1) instead of repartition(1), as coalesce involves less or no shuffle, while repartition involves a full shuffle of the data. It is preferable to keep shuffling to a minimum.
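
    Editor's sketch of the difference described in the two replies above. This is a toy model in plain Python, not Spark's implementation: partitions are modeled as lists, `coalesce` merges whole neighbouring partitions without touching individual records, and `repartition` redistributes every record round-robin (standing in for a full shuffle). Both function names mirror the Spark methods, but the bodies are assumptions made for illustration.

```python
def coalesce(partitions: list[list], n: int) -> list[list]:
    """Merge whole partitions into at most n groups; a partition is never split."""
    n = min(n, len(partitions))  # in this model coalesce cannot add partitions
    merged = [[] for _ in range(n)]
    for i, part in enumerate(partitions):
        # Neighbouring partitions land in the same group, so records stay put.
        merged[i * n // len(partitions)].extend(part)
    return merged


def repartition(partitions: list[list], n: int) -> list[list]:
    """Deal every record round-robin into n new partitions (full shuffle)."""
    out = [[] for _ in range(n)]
    k = 0
    for part in partitions:
        for record in part:
            out[k % n].append(record)
            k += 1
    return out


parts = [[1, 2], [3, 4], [5, 6], [7, 8]]
print(coalesce(parts, 2))     # whole partitions merged
print(repartition(parts, 3))  # every record moved individually
print(len(coalesce(parts, 8)), len(repartition(parts, 8)))
```

    In PySpark itself the corresponding calls are df.coalesce(1) and df.repartition(8), and df.rdd.getNumPartitions() reports the resulting count. Note that coalesce(1) funnels all the data through a single task, so it suits small outputs better than large ones.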
