A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets - Jules Damji

Science and Technology

"Of all the developers' delight, none is more attractive than a set of APIs that make developers productive, that are easy to use, and that are intuitive and expressive. Apache Spark offers these APIs across components such as Spark SQL, Streaming, Machine Learning, and Graph Processing to operate on large data sets in languages such as Scala, Java, Python, and R for doing distributed big data processing at scale. In this talk, I will explore the evolution of three sets of APIs-RDDs, DataFrames, and Datasets-available in Apache Spark 2.x. In particular, I will emphasize three takeaways: 1) why and when you should use each set as best practices 2) outline its performance and optimization benefits; and 3) underscore scenarios when to use DataFrames and Datasets instead of RDDs for your big data distributed processing. Through simple notebook demonstrations with API code examples, you'll learn how to process big data using RDDs, DataFrames, and Datasets and interoperate among them. (this will be vocalization of the blog, along with the latest developments in Apache Spark 2.x Dataframe/Datasets and Spark SQL APIs: databricks.com/blog/2016/07/1... databricks.com/glossary/what-...)
Session hashtag: #EUdev12"

Comments: 38

  • @techoral2261 (2 years ago)

    Now I know about RDDs, DataFrames and Datasets. Thanks for explaining it more precisely. Appreciated.

  • @chiranjibghorai6950 (5 years ago)

    Such a pleasure to hear him talk!

  • @tdkboxster (5 years ago)

    What an amazing talk! Crisp and Clear! truly impressed.

  • @rahultiwari2860 (5 years ago)

    Thanks for explaining RDDs, DataFrames, and Datasets in depth...

  • @ctriz76 (5 years ago)

    this is a brilliant and fluid explanation

  • @aliwaheed906 (3 years ago)

    Amazing talk. I left Spark to move into ML when there were only RDDs; when I came back and saw DataFrames in Spark, I was totally confused. Your video helped a lot. Thank you.

  • @mjrajeshmj (4 years ago)

    Excellent talk. Thanks Jules Damji.

  • @shemantkr (4 years ago)

    It was very insightful. Such talks really help developers understand why and how to use the structured APIs.

  • @puja9689 (4 years ago)

    Amazing presentation. Very intuitive..Thanks Boss!

  • @harshtiku3240 (3 years ago)

    An excellent talk by a clear master.

  • @AllForLove3 (5 years ago)

    Amazing talk! very well explained indeed.

  • @lbasavaraj (6 years ago)

    What a brilliant talk!! Thanks

  • @abhiganta (5 years ago)

    This is the best and clearest talk on the 3 APIs.

  • @soufianebenkhaldoun7765 (5 years ago)

    Very well explained!! Thanks

  • @jijotitus1755 (4 years ago)

    Amazing Talk. Thank you!

  • @TheTambourinist (2 years ago)

    Thanks for the video. Very understandable!

  • @pauliewalnuts6734 (4 years ago)

    so good!!! thanks for this

  • @daviduzumaki (9 months ago)

    this guy is such a good speaker

  • @sayandbhattacharya1100 (7 months ago)

    I had a nice learning time thanks for the talk!

  • @anibaldk (5 years ago)

    Only 300 likes for such an informative, crystal clear talk??

  • @Dyslexic_Neuron (5 years ago)

    excellent explanation!! :D

  • @nareshgb1 (6 years ago)

    I am wondering how the "type safe" feature fits with the unstructured data that is typical of the systems Spark would be used in.

  • @Blobonat (4 years ago)

    Very good talk!

  • @varundosapati7148 (4 years ago)

    I was trying out the example you mentioned at 10:46, and since I was getting a compile-time error, I had to rewrite the final statement as below:
    parsedRdd.filter( content => content._2 == "en").map(filteredContent => filteredContent._3).reduce(_+_).take(100).foreach(reducedContent => printf(s"$reducedContent._1: $reducedContent._2"))
    I would really appreciate it if you could review the above statement.

  • @tableauvizwithvineet148 (3 years ago)

    Nice and informative video

  • @vipultyagi1369 (3 years ago)

    brilliant talk!

  • @BasemKhalaf-uj7cc (1 month ago)

    Thank you!

  • @goodyoyo0214 (4 years ago)

    Amazing Talk

  • @shankarsr1 (5 years ago)

    awwwwesome talk thanks!

  • @meravchkroun4197 (6 years ago)

    Thanks! Can you attach the links here?

  • @nithints302 (3 years ago)

    wow

  • @AbhijeetSachdev (6 years ago)

  • @ernesthert1898 (6 years ago)

    No SS

  • @ernesthert1898 (6 years ago)

    Hibud

  • @Chris_zacas (5 years ago)

    This was amazing! Pretty well explained! Thanks!

  • @shannithssachin (2 years ago)

    Great Talk
