Spark Session vs Spark Context | Spark Internals

This video is part of the Spark learning series. Spark application, Spark context, and Spark session are some of the least understood concepts among beginners. In this video we cover the following:
what is a Spark session
the need for a Spark session
how a Spark session is different from a Spark context
what is a Spark context
the need for a Spark context
#apachespark #sparktutorial #bigdata
#spark #hadoop #hive

Comments: 96

  • @VivekKBangaru · 9 days ago

    Very informative one. Thanks, buddy.

  • @arundhingra4536 · 5 years ago

    Very useful video. I have been working with Spark for more than two years now but never really bothered about SparkSession vs SparkContext. For me it's just the entry point and you go from there. But the idea of having multiple SparkSessions with a single underlying SparkContext makes great sense and was an eye-opener. Thanks

  • @The1Rafvas · 5 years ago

    I would say that this is by far the best explanation I have found after hours of search on the topic. Congrats!!!

  • @randommoments1263 · 5 years ago

    Detailed, Clear and straightforward, all at the same time. Superb..!

  • @swatisneha9393 · 5 years ago

    Today I understood the exact meaning of SparkContext and SparkSession. Thanks a lot, your video helped!!!

  • @DataSavvy · 5 years ago

    Thanks... I am happy that it is useful... Please provide your feedback on the other videos of this channel

  • @anandhusk7794 · 4 years ago

    very clear and simple explanation. Thanks :)

  • @Smoked93POL · 2 years ago

    Short and to the point. I like your explanation.

  • @SatishKumar-yz4tn · 3 years ago

    Nicely explained. Thank you!!

  • @saurabhgarud6690 · 3 years ago

    Very useful stuff, thank you so much

  • @Pramodkumar-mn6jd · 2 years ago

    Nice and very clear explanation, thank you sir 🙏

  • @rakeshchaudhary8255 · 4 months ago

    Still relevant as of today and frequently asked. The practical on Databricks made things crystal clear.

  • @saurav0777 · 5 years ago

    In older versions of Spark, such as 1.6, the entry point of a Spark application was the SparkContext (sc). In later versions, from Spark 2.0 onward, SparkContext is no longer the direct entry point: the various contexts were moved one level of abstraction up into SparkSession, which contains the SparkContext, SQLContext, HiveContext, etc.

  • @shayshaswishes373 · 5 years ago

    Can you please post a video on how to add SparkSession.builder to existing code?

  • @NiRmAlKuMaRindia · 5 years ago

    Great details

  • @narendrak36 · 5 years ago

    This is exactly what I was looking for. Now I know the exact difference between Context and Session. Thank you, dude. Do you know which is the best Spark certification for a Spark developer?

  • @rahulberry4806 · 3 years ago

    thanks, clearly explained

  • @nashaeshire6534 · 2 years ago

    Thx a lot, really clear.

  • @priteshpatel4316 · 3 years ago

    Hi Harjeet, thanks for the clear and simple explanations in all your videos. Can you upload a serial-wise PySpark tutorial if you have one? Most tutorials around start with creating a Spark dataframe using SparkSession and operations on the dataframe. You could also suggest any tutorial/blog to read on PySpark. Thanks man... your explanations are great

  • @kthiru5168 · 3 years ago

    Nice Explanation.

  • @unmeshasreeveni · 4 years ago

    The best explanation. Congrats

  • @amitSingh-je8vh · 4 years ago

    Congrats or Thanks 😅

  • @saurabh7337 · 3 years ago

    Under which scenarios would it be meaningful to have a separate SparkContext for each user?

  • @tirupatiraosambangij607 · 5 years ago

    Nice explanation.. Thank you

  • @DataSavvy · 5 years ago

    Thanks for appreciation :)

  • @krish808 · 3 years ago

    Excellent content in a simple and easy format. Are you providing any trainings on Databricks? If so, how do I contact you?

  • @kneelakanta8137 · 1 year ago

    Very good information. Can you please help clarify these doubts: 1. What is included in the configurations and properties of the different Spark sessions of a Spark context, and what is their effect on the cluster? 2. What is the purpose of the Spark context, and what is it responsible for? Can you make a video covering the Spark context in full?

  • @priyachauhan813 · 2 years ago

    Hi, thanks for the nice explanation. Scala works with Datasets and Python with DataFrames, and they both generate RDDs as the end result. Is my understanding correct?

  • @santhoshsandySanthosh · 5 years ago

    What is the tool or software used in this demo for creating sessions? Is it Python-based or Scala?

  • @kushagra_nigam95 · 3 years ago

    Best explanation till date 👍

  • @DataSavvy · 3 years ago

    Thanks Kushagra :)

  • @alphacharith · 3 years ago

    Thank You

  • @meenakshisundaraar7267 · 6 months ago

    Neat and clean presentation... 😊

  • @DataSavvy · 6 months ago

    Thanks a lot 😊

  • @nandhannandhan8155 · 3 years ago

    Superb

  • @karthikram1954 · 3 years ago

    Great video, sir. Just one question: on which node do the Spark context and Spark session run?

  • @truptiwaghmare3376 · 2 years ago

    What if the same table is being updated by two users at a time? Which one would be updated? Say we change the datatype of a column, rename it to the same name as the previous column, and store it back to the table, and by table I mean a global table.

  • @rajrajan51 · 5 years ago

    Thanks for the video, bro. I have a doubt: suppose user 1 is sharing table 1 and user 2 updates a value in a column of table 1. Will the change also show up in user 1's shared table?

  • @srikanthchillapalli1037 · 5 years ago

    It won't happen, as user1 and user2 have sessions isolated from one another, so one user's operation doesn't have any impact on the other user's table. Actually, you can have different data for both of these users even though the table name is the same.

  • @jayashankarnallam6945 · 3 years ago

    What is the advantage of creating multiple Spark sessions instead of having multiple Spark contexts?

  • @meenakshisundaraar7267 · 6 months ago

    Think of the Spark context as a server and a Spark session as a client.

  • @phanikumar4915 · 4 years ago

    Can we call stop on a Spark session? What will happen if we call it?

  • @sudheeryarramaneni2218 · 5 years ago

    I have a doubt: can we apply actions directly on an RDD without transformations?

  • @DataSavvy · 5 years ago

    Loading a file and creating an RDD is also a transformation... so logically you cannot run an action without a transformation. If you don't count creating an RDD as a transformation, then you can say that you can run an action without a transformation.

  • @dineshedala · 5 years ago

    In one of my interviews I faced this question: what happens if an executor that has already processed 50 records crashes unexpectedly? Will it continue from record 51 or from 0? Do we have any service that tracks the execution status of an executor?

  • @rahuldey1182 · 4 years ago

    Yes, by creating a checkpoint and specifying the checkpoint folder location in your program.

  • @indiannewyorkbabies6872 · 2 years ago

    Don't RDDs store that lineage information? When an executor fails, the RDD's lineage is handed to a new executor, which restarts the execution. That's why RDDs are fault tolerant.

  • @deepakkini3835 · 5 years ago

    I had an interview where I was asked about the Spark process. Could you please explain what happens when a Spark job is stopped midway through execution? Will it start from the beginning or from where it left off?

  • @DataSavvy · 5 years ago

    It depends on how the job was stopped... Do you mean that you killed the Spark context, or only that the running action failed? Recovery will depend on this...

  • @DataSavvy · 5 years ago

    It will also depend on whether you have any checkpoints in your job.

  • @rvalusa · 3 years ago

    Thanks, sir, for a wonderful video explaining the differences. One quick question: when we close/stop a SparkSession created from a SparkContext, does that also stop the other SparkSessions created from the same SparkContext?

  • @rvalusa · 3 years ago

    Found this, which is a weird implementation and apparently a bug in Spark: apache-spark-developers-list.1001551.n3.nabble.com/Closing-a-SparkSession-stops-the-SparkContext-td26932.html

  • @max6447 · 3 years ago

    By executor, do you mean the node manager?

  • @ravikrish006 · 3 years ago

    Can we have multiple contexts? Could you show some examples?

  • @ravulapallivenkatagurnadha9605 · 1 year ago

    Nice

  • @ankan1627 · 2 years ago

    So what happens when different users create their own Spark context (say, before Spark session was introduced)? Are multiple Spark contexts created in such cases? If yes, what do we gain by moving the abstraction away from the Spark context to the Spark session?

  • @ajithkannan522 · 1 year ago

    Only one Spark context is available. You can create multiple SparkSessions under the Spark context.

  • @rohi1350 · 5 years ago

    We can say SparkSession is similar to sqlContext in Spark 1.6.

  • @vamshi878 · 5 years ago

    Hi Harjeet, can you make a video on how to read HBase table data into a Spark dataframe, and how to insert a Spark dataframe into an HBase table? Is there any Spark-HBase connector available for Cloudera?

  • @DataSavvy · 5 years ago

    Sure, Vamsi... I will add this to my to-do list... Thanks for the suggestion :)

  • @sachinhugar · 5 years ago

    Hi Harjeet, when does this type of use case come up? Any example? Because in batch processing one Spark session is enough.

  • @DataSavvy · 5 years ago

    When you want your users to have a live connection for data analysis, etc.

  • @snehakavinkar2240 · 4 years ago

    Sorry, but I am a little confused here. What do you mean when you say every Spark context represents one application? When I submit a Spark application, am I not the only user attached to that application? How do multiple users make configuration changes to my Spark application? Don't they have to submit their own copy of the Spark application again with the config they wish to set? Thank you!

  • @DataSavvy · 4 years ago

    Imagine you already have a running app on the cluster, and whatever code needs to run arrives at run time... that is a good use case for multiple Spark sessions... Drop me an email at aforalgo@gmail.com and I will share more content to read on this.

  • @N12SR48SLC · 3 years ago

    stackoverflow.com/questions/52410267/how-many-spark-session-to-create#:~:text=4%20Answers&text=No%2C%20you%20don't%20create,in%20the%20same%20spark%20job. Why is it saying one SparkSession per application then?

  • @jasbirkumar7770 · 22 days ago

    Sir, can you tell me something about housekeeping executive Spark data? I don't understand the word Spark. The facility company JLL requires Spark experience.

  • @projjalchakraborty1806 · 5 years ago

    Hi Harjeet, why do we use multiple SparkSessions instead of multiple SparkContexts? Is there any advantage?

  • @DataSavvy · 5 years ago

    It makes it easier to share tables and cluster resources among your users... As you well know, starting a different application for each user usually causes cluster contention.

  • @vijaybigdata752 · 4 years ago

    I have a doubt: in this scenario, if we have 4 Spark sessions for a single Spark context, will all 4 Spark sessions be killed when the Spark context goes down? Please confirm.

  • @DataSavvy · 4 years ago

    Yes, Vijay... all Spark sessions will be killed.

  • @vijaydas2962 · 5 years ago

    Very informative content... I have a doubt... I opened 2 separate spark2-shells using 2 different IDs... When I hit spark.sparkContext in the two different terminals, the reference numbers were different. Shouldn't they be the same, as you explained at the beginning of this video where multiple users shared the same SparkContext object?

  • @atnafudargaso8374 · 4 years ago

    same here

  • @yashdeepkumar2495 · 2 years ago

    I think he is talking about working in a clustered environment with more than one worker node... usually that will be the scenario. If you open 2 Spark shells and check, it will create two separate contexts. I am new to this, so please let me know if you found the correct answer to your question after two years.

  • @sandeshhegde2847 · 1 year ago

    If you open 2 shells, they are 2 different applications. This video talks about having multiple Spark sessions within a single application.

  • @sreepaljsp · 4 years ago

    You said something at 2:39 that I did not get. The sentence is "I can ___ a Spark context per user". What is the missing word?

  • @srikd9829 · 4 years ago

    I think the missing word is "spun", the past tense of "spin". The phrase "spin up a server" generally means introducing a new server or node, or starting or booting it. The usage comes from the fact that starting a server spins up the hard disk to load the OS. Hope this helps.

  • @gurumoorthysivakolunthu9878 · 1 year ago

    Hi sir... very useful topic and very well explained... thank you, sir... 1. "Each user can have a different Spark session" -- does this mean different job submissions? That would mean only one Spark context for the entire cluster, handling many jobs, right? 2. Then what about the driver? Is it similar to the Spark context -- only one driver for all jobs? 3. In the demo you showed creating many Spark sessions in the same job... each session is distinct within the same job itself, am I right? But why create different sessions in the same code/job? Thank you, sir...

  • @SagarSingh-ie8tx · 1 year ago

    Yes it’s nice

  • @srikanthchillapalli1037 · 5 years ago

    I don't think we can create multiple Spark contexts in Spark 1.x either. There is a parameter, spark.driver.allowMultipleContexts=true, but it is only used in test scripts and cannot be used to create multiple contexts when coding in an IDE. And in Spark 2.x we create multiple Spark sessions. Please let me know if I'm wrong.

  • @murifedontrun3363 · 5 years ago

    There can be only one SparkContext per JVM process. If multiple SCs were running in the same JVM, it would be very difficult to handle GC tuning, communication overhead among the executors, etc.

  • @srikanthchillapalli1037 · 5 years ago

    @murifedontrun3363 Yes, but in the video the tutor explained that in old versions multiple Spark contexts were created, so I have a doubt about how that is possible.

  • @vijaypandey5371 · 4 years ago

    What will happen if the driver program fails in Spark? And how do we recover from it?

  • @DataSavvy · 4 years ago

    It depends on what settings you have for that job. If you have checkpoints and retries enabled, Spark will start to recreate those objects... otherwise the job will fail.

  • @souravsardar · 3 years ago

    @datasavvy thanks for the video. Could you please make a video on where we can practice production-level scenarios in PySpark?

  • @DataSavvy · 3 years ago

    Sure, Saurav... Let me know if you have a list of scenarios you want me to cover. Drop me an email at aforalgo@gmail.com

  • @TheVijju89 · 4 years ago

    Please post the Python code sheet.

  • @mohdrayyankhan6623 · 3 years ago

    What are the users here?

  • @mineb1842 · 1 year ago

    Plz

  • @MyTravelingJourney · 5 years ago

    I think you missed many points

  • @DataSavvy · 5 years ago

    Please suggest what you are pointing towards... I will cover it in another video.

  • @dalwindersingh9282 · 5 years ago

    Please suggest; raise a few of them.

  • @underlecht · 3 years ago

    3 ads in 5 minutes. Did not finish.

  • @DataSavvy · 3 years ago

    YouTube has increased the ads based on user stats. I unfortunately don't have a method to decrease that. Any suggestion is welcome (except a complete switch-off).