Advancing Spark - How to pass the Spark 3.0 accreditation!
With the announcement of Spark 3.0 comes a new certification - Accredited Developer for Apache Spark 3.0! Simon recently took the exam and is here to share some advice and study notes about how you too can become a certified Spark 3.0 Developer!
If you're interested in Spark/Databricks training, don't forget to check out our website www.advancinganalytics.co.uk/... or just get in touch to find out when our next public courses are happening!
Planning on taking the exam? Already certified? Let us know in the comments!
Comments: 101
This is great. Thank you so much for posting such helpful information!
Nice simple explanation to help map out my certification journey. Thanks!
Too good! Now I have enough confidence to take the exam. Thank you
Awesome pictorial explanation of the physical architecture. The explanation of slots and how they relate to tasks was super enlightening. Thank you very much!!! :)
Superb explanation with great clarity. I haven't seen anything like this in any tutorial. Thanks for posting it. We need more from you 👌👌👏👏 I will refer my whole office team to this channel.
Thanks! Great video! I loved Spark Architecture's explanation (4:19)
This video helped me a lot to take the exam, thank youuu!!!
Thank you so much for the book recommendation. I would also highly recommend the same book, and also making your own notes from it. It took me 3 weeks of preparation to pass the exam. Thank you so much 🙏🏻
Best explanation I've ever seen
Thank you a lot for this video. I am taking the exam on Wednesday. Keep your fingers crossed for me! :)
@GhernieM
3 years ago
I nailed it. If you follow this advice, you will surely pass.
I got a lot of information from this video, which helped me pass the certification today. Thank you
@AdvancingAnalytics
3 years ago
Wahey! Congrats on passing!
@sundarkris1320
3 years ago
Is it $200 for one attempt?
@headindata
2 years ago
@@sundarkris1320 correct, as of today. If you do not pass the exam, you will have to pay $200 again to retake it.
@pankajbaghela8903
2 years ago
Can you help me pass this exam?
@niru9048
2 years ago
Hi Siva. Could you please help clarify how long this certification is valid? If I look at some of the public badges issued to people, they show an expiration date two years from the issue date. A few YouTube videos say it never expires but is tied to the specific version of Spark. Could you please help out on this? I can't seem to find clarification anywhere.
Excellent
Thanks for the informative video. I am preparing for the Spark Scala certification and felt the Python API docs, which have a lot of information and examples, are much better than the Scala API docs.
Thanks so much for this video! I read through "The Definitive Guide" and felt OK, but not super confident. I watched this (and some of your other videos) in the week leading up to the exam, and I just passed!
@AdvancingAnalytics
3 years ago
Woohoo! Congrats on passing - glad our videos helped!
@niru9048
2 years ago
@@AdvancingAnalytics Could you please help clarify how long this certification is valid? If I look at some of the public badges issued to people, they show an expiration date two years from the issue date. A few YouTube videos say it never expires but is tied to the specific version of Spark. Could you please help out on this? I can't seem to find clarification anywhere.
Thank you man :D
The exam does not require an external webcam when taken on a laptop. This video gave me some good pointers for exam day. Appreciate the work being done here👍🏻
@AdvancingAnalytics
2 years ago
Ah cool - it was stated in the instructions when I originally took it; I guess they've relaxed it as the world has gone more remote :)
This is a great video! I have a question: since this exam only tests the DataFrame API, should we go through all the PySpark functions, or are just the DataFrame and SQL functions required? Thanks!! Looking forward to more videos like this from you. :)
Thanks, I didn't even notice that there is a PDF of the Spark docs to use in the exam!
@headindata
2 years ago
Dewei Zhai, Databricks also recently published the actual PDF version of the spark doc you see in the exam here: www.webassessor.com/zz/DATABRICKS/Python_v2.html
Learning Spark with David Guetta. Tomorrow is my assessment, I hope I pass 🍀
@KZoldyck1
1 year ago
Passed!
It helped me a lot in preparing for the Spark 3.0 certification, thanks!
@divyadarshan8914
3 years ago
Any tips on practice material besides The Definitive Guide and the official docs?
This was a fantastic video - thank you so much for sharing this content! Subscribed!
@AdvancingAnalytics
3 years ago
Thanks for subscribing. I am glad it helped.
Hey, great content! Quick question: did you have any questions on Spark MLlib that required understanding of the actual algorithms, or any at all? Thanks for the info!
@AdvancingAnalytics
3 years ago
Nope, there's no requirement for knowing the data science libraries, pure spark engineering!
Hi Simon: are you aware of any full-length practice exams for the Databricks certification? I would like to take one of those mock exams before diving in. Thanks
cool, let's get it done.
@AdvancingAnalytics
3 years ago
Good luck!
Any input on resources to help prepare for the Databricks Professional Data Engineer certification? I genuinely appreciate the input!!
Hi! Great content on your channel. I was wondering if you could make a comparison of the Associate Developer and Associate Data Engineer certificates (not the Professional DE) in terms of what material one should add to prepare for the Associate DE exam. Cheers! Edit: Would be nice to see your thoughts on the Professional DE cert as well :)
@AdvancingAnalytics
2 years ago
Good suggestion, I've not dug into the various new certifications since making this video, probably worth revisiting now there's such a range out there. I should also probably actually run through the Professional Data Engineer cert at some point too! :D Simon
@micha5781
2 years ago
@@AdvancingAnalytics That would be great. Your material is always very helpful!
Hi, great content. It gives a good idea of the difficulty level of the exam. Does the exam contain questions on streaming?
@madhu1987ful
3 years ago
No questions on streaming
@joyo2122
2 years ago
There are different levels of exam
Thanks for this video. I have a question: which is the best certification for Spark? Which would you recommend and why?
@AdvancingAnalytics
3 years ago
Hey - the only Spark cert I'm really aware of is the Databricks Certified Associate Developer one; you've got a choice of Scala & Python, and it's generally a good overview of the tool that digs into understanding of the engine/architecture etc - academy.databricks.com/exam/databricks-certified-associate-developer
@nikhildavis3844
3 years ago
@@AdvancingAnalytics thank you
Could you please suggest how, or where, to practice the format of this test, so I'm prepared to manage my time?
@AdvancingAnalytics
3 years ago
Hola! I've not seen any practice tests, although there may be some around! As for actual practice/preparation - Databricks have a free community edition, it's a single-node public cluster, but great for practicing: databricks.com/try-databricks
Any suggestions on how to practice? Understanding the concepts is one thing, but until you have practiced on some sample questions or problem statements, it's a bit tough to get the confidence to sit the exam.
@AdvancingAnalytics
3 years ago
Hey, sorry - missed this during the break. The best way to practice is to spin up the Databricks community edition - it's a free learning environment! The Databricks docs have a ton of example notebooks that you can import & work through the code with. After that, pick up a personal project & work it through in anger. I'm definitely a "don't learn it till I try it out myself" person! Simon
Good morning! Could you explain in more detail how you define the ideal number of partitions for the shuffle setting?
How can I access the notebook shown in the demo?
Excellent! How about the low-level APIs? RDDs? Are there questions about those? Thank you..
@AdvancingAnalytics
4 years ago
I can't go into actual questions, but the exam is focused on the DataFrame API, so there's no driver for low-level API commands. Understanding how data is stored in RDDs & how different DataFrame transformations impact RDDs behind the scenes should put you in the right place!
@adrianajimenez523
4 years ago
@@AdvancingAnalytics thank you for your time and answer :)
@adrianajimenez523
4 years ago
@@AdvancingAnalytics I want to share with you that I passed the exam!! =D Thank you for all your videos about Databricks. They helped me a lot to complete my learning!
@AdvancingAnalytics
4 years ago
@@adrianajimenez523 woohoo! That's great to hear, congratulations! Glad the videos helped :) Simon
@nva1719
3 years ago
Hi guys, can you please let me know if there were questions on Delta Lake? I will be taking the exam in less than 2 weeks. I was planning to take 2.4 first and then 3.0; the only difference between them, syllabus-wise, is Delta Lake.
I've been looking for an explanation of slots for a long time. Please save me!
Thank you for making this video. I have 2 questions: 1) Will there be questions with more than one correct option? 2) Are there negative marks for incorrect answers?
@AdvancingAnalytics
2 years ago
I honestly cannot recall if there are questions with multiple correct answers, hopefully someone else can help! There are no negative marks for incorrect answers.
@headindata
2 years ago
Hello Sanjeev. There is only one correct answer per question.
@murifedontrun3363
2 years ago
@@headindata Thank you sir for responding :)
Is the exam only available online? Are there any test centres where we can take it?
Can you clarify whether a single task can run on multiple slots? Or must every task be granular enough to run on a single slot?
@AdvancingAnalytics
3 years ago
Hey - a single task can only run on one slot. That means a task cannot spread across multiple workers (which makes sense, as it's data held in memory). So the size of your RDD blocks / tasks affects how neatly you can utilise the available slots across your workers. Too chunky and they don't spread evenly; too small and there's an overhead to handling each task, and things slow down. It's a tricky balance :) Simon
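A toy model in plain Python can illustrate the "too chunky vs too small" balance described above. It is not Spark's scheduler; it assumes each task occupies exactly one slot, a task goes to whichever slot frees up first, and every task pays a fixed overhead. The total work, slot count and overhead values are made-up numbers chosen only to show the shape of the trade-off.

```python
import heapq


def simulate_stage(task_durations, num_slots, per_task_overhead=0.0):
    """Toy list scheduler: each task runs on exactly one slot and is handed
    to whichever slot becomes free first. Returns the stage's finish time."""
    # Min-heap of (time the slot becomes free, slot id); all slots start idle
    slots = [(0.0, slot_id) for slot_id in range(num_slots)]
    heapq.heapify(slots)
    for duration in task_durations:
        free_at, slot_id = heapq.heappop(slots)
        # A task is never split: its whole duration (plus overhead) lands on one slot
        heapq.heappush(slots, (free_at + duration + per_task_overhead, slot_id))
    # The stage ends when the busiest slot finishes its last task
    return max(free_at for free_at, _ in slots)


def split_evenly(total_work, num_tasks):
    """Split a fixed amount of work into equally sized tasks."""
    return [total_work / num_tasks] * num_tasks


if __name__ == "__main__":
    TOTAL_WORK = 80.0  # hypothetical seconds of work in the stage
    SLOTS = 8          # e.g. 8 cores across the workers
    OVERHEAD = 0.05    # hypothetical per-task scheduling overhead (seconds)
    for num_tasks in (5, 8, 64, 8000):
        elapsed = simulate_stage(split_evenly(TOTAL_WORK, num_tasks), SLOTS, OVERHEAD)
        print(f"{num_tasks:>5} tasks -> stage takes about {elapsed:.2f}s")
```

With these hypothetical numbers, 5 big tasks leave 3 of the 8 slots idle (about 16s), 8 or 64 evenly sized tasks keep every slot busy (about 10s), and 8000 tiny tasks spend most of their time on overhead (about 60s).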
Is it suitable for a fresher who doesn't know anything about Spark, or do we need prior experience before taking the exam?
Hello, do you know about this other certification? Databricks Certified Professional Data Engineer
Can an executor span multiple worker nodes? Let's say that during spark-submit I asked for 4 executors and 4 cores, and the cluster has 8 nodes (2 cores each): would the "logical" executor theoretically be spread across nodes? Or will each executor be granted only 2 cores?
@AdvancingAnalytics
2 years ago
I don't believe an executor can span across machines/nodes. Lots of managed Spark platforms assume a single executor per node, as there's not much benefit in splitting a node across multiple workers
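Below is a simplified sketch of the placement rule discussed in this thread, in plain Python: an executor must fit entirely on one node, so asking for 4-core executors on 2-core nodes (the commenter's example, read as 4 cores per executor) cannot be satisfied. It ignores memory, overheads and the specifics of real cluster managers such as YARN or Kubernetes; `Node` and `place_executors` are hypothetical names for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Node:
    cores: int  # cores available on this worker node


def place_executors(nodes, num_executors, cores_per_executor):
    """Toy placement: every executor must fit entirely on a single node.

    Returns the node index chosen for each executor, or None if the
    request cannot be satisfied (executors never span nodes here)."""
    free = [node.cores for node in nodes]
    placement = []
    for _ in range(num_executors):
        # First-fit: take the first node with enough free cores
        for index, available in enumerate(free):
            if available >= cores_per_executor:
                free[index] -= cores_per_executor
                placement.append(index)
                break
        else:
            # No single node can host this executor
            return None
    return placement


if __name__ == "__main__":
    cluster = [Node(cores=2) for _ in range(8)]  # 8 nodes x 2 cores
    print(place_executors(cluster, 4, 4))  # None: a 4-core executor cannot fit on a 2-core node
    print(place_executors(cluster, 4, 2))  # [0, 1, 2, 3]
```

In this model the same 8 x 2-core cluster can host up to 8 two-core executors or 16 one-core executors, but no executor larger than a single node.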
Hi, which course do I need to select to get the Databricks Spark 3.0 certificate?
@AdvancingAnalytics
3 years ago
Hey - there's a specific "Associate Developer For Apache Spark 3.0" course - academy.databricks.com/exam/databricks-certified-associate-developer
Can you use Ctrl+F or some other search functionality on the PDF provided?
@AdvancingAnalytics
2 years ago
Not at the time - had to get really good at scrolling :D - that said, pyspark docs have changed quite a bit since this video, not sure if the format for the exam has been changed to keep up to date!
Is browsing the documentation allowed? Since they provide a PDF, I'm wondering if we are allowed to search that same document in the browser... Thanks for this video, lots of information
@AdvancingAnalytics
3 years ago
I don't recall there being a search mechanism, everything is embedded in the testing program. Better to just be familiar with the docs and good at scrolling! :)
@veraclmartins
3 years ago
@@AdvancingAnalytics Hi! Just a quick question... How is the PDF version of the documentation organized? Is it divided into modules, each with their classes, methods and attributes... or...? I don't know, any tips? :)
@fernandosouza2388
3 years ago
@@veraclmartins Did you get an answer?
So if I do a join, then a filter, then a group-by, is that a job where I have to shuffle?
@AdvancingAnalytics
2 years ago
Hey! You will have one spark job, but that job will have multiple stages. Each time you see a stage, it means there is a shuffle. So a join/filter/group transformation could have two shuffles, one if the join is wide, one if the group is wide. You would have one job and three stages in this case. Hope that makes sense!
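The stage counting in the answer above can be mimicked with a toy model: walk a linear chain of transformations and start a new stage at every wide (shuffling) one. This is a plain-Python sketch, not Spark's DAG scheduler; the leading `read` step and the wide/narrow flags are assumptions. Real plans can differ: for example, a shuffled join gives each input side its own stage, and a broadcast join avoids the shuffle altogether.

```python
def split_into_stages(transformations):
    """Toy model of stage boundaries for a linear chain of transformations.

    `transformations` is a list of (name, is_wide) pairs. A wide
    transformation needs a shuffle, so it starts a new stage; narrow
    ones are pipelined into the current stage."""
    stages = [[]]
    for name, is_wide in transformations:
        if is_wide and stages[-1]:
            stages.append([])  # shuffle boundary: begin a new stage
        stages[-1].append(name)
    return stages


if __name__ == "__main__":
    plan = [
        ("read", False),    # assumed: scanning the source data
        ("join", True),     # assumed wide: a shuffled join
        ("filter", False),  # narrow: each partition filtered independently
        ("groupBy", True),  # wide: rows with the same key must be co-located
    ]
    stages = split_into_stages(plan)
    print(len(stages), stages)  # 3 [['read'], ['join', 'filter'], ['groupBy']]
```

Marking the join as narrow (as a broadcast join would be) drops the count in this model to two stages.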
This applies only to the Associate Developer exam, correct? Could you please share details for the Professional one?
@AdvancingAnalytics
2 years ago
Ooh - I hadn't even seen the "Certified Professional Data Engineer" course was introduced! I haven't taken the exam, if/when I do, I'll make a video! Simon
Is the laptop internal camera acceptable for this exam?
@AdvancingAnalytics
3 years ago
They list an external camera as a requirement - they ask you to set it up so they can see you from the side, including your screen. Internal laptop cam might disqualify you, not sure!
@headindata
2 years ago
As far as I know the internal camera is acceptable.
Please share the exam code number for Spark 3.0.
@AdvancingAnalytics
3 years ago
I don't believe it has a code - it's a certification backed by Databricks not Microsoft. I had a skim of the website, my purchase, the exam certificate etc and they all refer to it as "Databricks Certified Associate Developer for Apache Spark 3.0" - no code to be found! academy.databricks.com/exam/databricks-certified-associate-developer
What if I know Scala Spark but not PySpark? Does the exam take this into account?
@AdvancingAnalytics
3 years ago
Yep, there are two different flavours of the exam, one for Scala and one for Pyspark. From what we've heard, the Scala one is slightly harder as the documentation is a little harder to navigate, but if you're familiar with the Scala docs it'll be fine! Simon
@abdullahsiddique7787
3 years ago
@@AdvancingAnalytics thanks much Simon
How much time is needed to prepare for this certification?
@AdvancingAnalytics
3 years ago
Depends on your level of spark experience! If you've been using spark most days for a year or so, you'll get by with a day or two of refreshing & cramming. If you're new to spark, it could take a couple of weeks of research, learning & revising. It's very hard to say!
@headindata
2 years ago
I would argue that some of the architecture questions in the exam are quite tricky, even if you have been working with Spark for a while. So, differing from Simon, I would say that you need at least a week of review, even if you have been using Spark for a while.
First
If there are 8 cores available in total across the worker nodes and Spark's default number of shuffle partitions is 200, what happens? How does 200 make sense when only 8 slots are available? Please explain. Thanks
@AdvancingAnalytics
3 years ago
The 200 tasks are allocated across the workers, the slots will chunk through the tasks (so each of the 8 slots will likely process 25 tasks). So you generally want the default partitions to be a clean multiple of the number of cores as a rule of thumb. But yeah, it's likely that the 200 default isn't right for that size cluster. The modern spark engine (Spark 3.0 / Databricks runtime 7+) uses a few techniques to override the default during query execution and actually pick an appropriate number of shuffle partitions :)
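The arithmetic above can be checked with a couple of helper functions. This assumes equally sized partitions, one task per slot and no adaptive query execution (the runtime adjustment mentioned above would change the partition count); the setting involved is `spark.sql.shuffle.partitions`, whose default is 200. The 12-slot cluster below is hypothetical.

```python
import math


def task_waves(num_partitions, num_slots):
    """Return (waves needed, slots busy in the final wave), assuming one
    task per partition, one task per slot and equally sized tasks."""
    waves = math.ceil(num_partitions / num_slots)
    busy_in_last_wave = num_partitions - (waves - 1) * num_slots
    return waves, busy_in_last_wave


def round_up_to_multiple(num_partitions, num_slots):
    """Smallest multiple of num_slots that is >= num_partitions."""
    return math.ceil(num_partitions / num_slots) * num_slots


if __name__ == "__main__":
    print(task_waves(200, 8))             # (25, 8): every wave is full
    print(task_waves(200, 12))            # (17, 8): 4 of 12 slots idle in the last wave
    print(round_up_to_multiple(200, 12))  # 204
```

Rounding up only evens out the final wave; whether 200 (or 204) is a sensible partition count for the data volume is a separate question.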
When you lean back, the audio quality gets bad.