Top 5 Mistakes When Writing Spark Applications

Comments: 38

@johnengelhart4453 8 years ago
I would love to see an example of the salting side that is missing
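Salting spreads one hot key over several artificial keys so that no single partition receives all of its records. The talk's own code is not reproduced in this thread, so what follows is only a minimal pure-Python sketch of the two-stage idea, not the presenters' implementation. The function names, the `_` separator, and the use of `zlib.crc32` as a stand-in for a hash partitioner are all assumptions made for illustration.

```python
import random
import zlib
from collections import Counter


def partition_for(key, num_partitions):
    """Deterministic stand-in for a hash partitioner (not Spark's actual one)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def add_salt(key, num_salts, rng):
    """Append a random salt so one hot key becomes up to num_salts distinct keys."""
    return f"{key}_{rng.randrange(num_salts)}"


def remove_salt(salted_key):
    """Strip the salt that add_salt appended (everything after the last '_')."""
    return salted_key.rsplit("_", 1)[0]


def salted_count(keys, num_salts, seed=0):
    """Count records per key in two stages: per salted key, then per original key."""
    rng = random.Random(seed)
    # Stage 1: aggregate on the salted key; the hot key's work is now split up.
    partial = Counter(add_salt(k, num_salts, rng) for k in keys)
    # Stage 2: drop the salt and combine the much smaller partial results.
    total = Counter()
    for salted_key, count in partial.items():
        total[remove_salt(salted_key)] += count
    return partial, total


def replicate_for_join(key, num_salts):
    """For a salted join, the other (smaller) side needs one copy per salt value."""
    return [f"{key}_{i}" for i in range(num_salts)]


# Hypothetical skewed input: one hot key and a few rare ones.
keys = ["hot"] * 1000 + ["a", "b", "c"]
partial, total = salted_count(keys, num_salts=8)
partitions_used = {partition_for(k, 16) for k in partial if remove_salt(k) == "hot"}
print(total["hot"], len(partitions_used))
```

The final counts are the same as an unsalted count; only the intermediate work is spread out. If the missing piece was the join case, `replicate_for_join` shows the other half: the non-skewed side is duplicated once per salt value so that every salted key still finds its match.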
@user-zs4jc9zc1z 5 years ago
Five cores per executor did not work for us. The best number for us is 3 on-prem and 2 on EMR; anything larger than that gave us IO exceptions. You need to adjust case by case.
@madhavareddy3927 8 years ago
Thank you guys! You've done a great job.
@35sherminator 2 years ago
Thanks for superbly breaking down the mistakes and their solutions. Thanks for the excellent presentation.
@charlesli5809 8 years ago
awesome sharing, great thanks
@Machin396 4 years ago
I am new to Spark and after viewing this presentation I see there's a lot to learn. I liked it a lot, thanks!
@bensums 6 years ago
At 6:21 it should say divide by (1 + 0.07), not multiply by (1 - 0.07). Also, in more recent versions of Spark the memory overhead has gone up from 7% to 10%.
@35sherminator
2 years ago
Absolutely agree, the division is correct.
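The correction in this thread follows from how the YARN container is sized: the container has to hold the executor heap plus the memory overhead, so container = heap × (1 + overhead fraction), and the heap is found by dividing. Below is a small sketch of that arithmetic. The 21 GB container is recalled from the talk's example rather than stated in this thread, so treat it as an assumption; `heap_for_container` is a hypothetical helper name.

```python
MIN_OVERHEAD_MB = 384  # Spark's documented minimum memory overhead per executor


def heap_for_container(container_mb, overhead_fraction):
    """Largest heap (MB) such that heap + max(384 MB, fraction * heap) fits the container."""
    heap = container_mb / (1 + overhead_fraction)
    if heap * overhead_fraction < MIN_OVERHEAD_MB:
        # The fixed minimum dominates for small executors.
        heap = container_mb - MIN_OVERHEAD_MB
    return heap


container_mb = 21 * 1024  # assumed example container size

for fraction in (0.07, 0.10):
    divided = heap_for_container(container_mb, fraction)
    multiplied = container_mb * (1 - fraction)  # the approximation shown on the slide
    print(f"{fraction:.2f}: divide -> {divided / 1024:.2f} GB, "
          f"multiply -> {multiplied / 1024:.2f} GB")
```

At 7% the two formulas differ by about 0.1 GB and both round down to 19 GB, which is why the slide's answer still comes out right. At 10% they land on either side of 19 GB, so the distinction starts to matter.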
@gounna1795 7 years ago
Great topic, Great explanation!
@gauravkataria1 7 years ago
Thanks a lot. Very helpful!
@popicf 7 years ago
But what should you do if you only have a 7-node cluster with 4 cores and 8 GB of RAM?
@andriimed6408 4 years ago
it's awesome, thanks a lot!
@sahebbhattu 6 years ago
Hi Mark, awesome explanation of the executor and executor memory calculations. But this shows how to use the maximum number of cores or executors the environment provides in order to achieve maximum parallelism. I would like to add one more point: if we have a heavy memory load to deal with, we have to trade off the number of executors or cores against executor memory. That means that for a massive memory load we may have to go with fewer executors (fewer than 17) and more memory per executor (more than 19 GB). Please correct me if I am wrong. Thanks.
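The trade-off described above can be made concrete with the sizing arithmetic from the talk. The sketch below is a reconstruction under stated assumptions, not the presenters' code. The cluster (6 nodes, 16 cores and 64 GB each) is recalled from the talk rather than stated in this thread, and `size_executors` and its parameters are hypothetical names. It reserves one core and 1 GB per node for the OS and Hadoop daemons and one executor slot for the YARN application master.

```python
def size_executors(nodes, cores_per_node, mem_gb_per_node,
                   cores_per_executor, overhead_fraction=0.07):
    """Return (executors_per_node, total_executors, heap_gb_per_executor)."""
    usable_cores = cores_per_node - 1    # assumption: 1 core per node for OS/daemons
    usable_mem_gb = mem_gb_per_node - 1  # assumption: 1 GB per node for OS/daemons
    executors_per_node = usable_cores // cores_per_executor
    if executors_per_node < 1:
        raise ValueError("cores_per_executor exceeds the usable cores on a node")
    total_executors = nodes * executors_per_node - 1  # one slot left for the YARN AM
    container_gb = usable_mem_gb / executors_per_node
    heap_gb = container_gb / (1 + overhead_fraction)  # leave room for memory overhead
    return executors_per_node, total_executors, heap_gb


# Assumed cluster from the talk: 6 nodes, 16 cores and 64 GB each.
for cores in (3, 5, 7):
    per_node, total, heap = size_executors(6, 16, 64, cores)
    print(f"{cores} cores/executor: {per_node} per node, {total} total, {heap:.1f} GB heap")
```

With 5 cores per executor this reproduces the 3 executors per node, 17 executors, and roughly 19 GB mentioned in the comments. With 7 cores per executor it gives 11 executors with about 29 GB each, which is the direction of the trade-off suggested here. On a cluster of 7 nodes with 4 cores and 8 GB each (reading the earlier question as per-node figures), only 3 cores per node are usable under these assumptions, so a 5-core executor does not fit at all and the function raises an error.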
@kumarrohit8311 4 years ago
Anyone noticed Sameer Farooqui clicking photos when QnA started? Awesome guys, all of them!
@kambaalayashwanth123 5 years ago
What about loading small files?
@dtsleite 4 years ago
What does Cloudera know about Spark applications? They don't even update their versions.
@rangarajanrao1994 1 year ago
Excellent. Best wishes.
@JoHeN1990 4 years ago
The data quality check article mentioned at 22:52 can be found here: web.archive.org/web/20181116232422/blog.cloudera.com/blog/2015/07/how-to-do-data-quality-checks-using-apache-spark-dataframes/
@nguyen4so9 7 years ago
Very cool :) ..!
@vinothsmart1 6 years ago
What was the tool he was talking about for Spark unit testing?
@clayblythe
6 years ago
I think he said JUnit
@VasileSurdu 5 years ago
Why can't they just let them speak and finish their presentation, for god's sake? Was it that big of a problem to let them finish their last 2 mistakes? lol. The last one (caching vs persisting) was very interesting.
@CRTagadiya 8 years ago
awesome
@PizdaRusni2023 2 years ago
Great
@rajjad 6 years ago
Where are the slides?
@sakthivel021 5 years ago
What would be the solution for the 2 GB Spark shuffle size limit?
@veereshhosagoudar875
4 years ago
Limit the partitions
@veereshhosagoudar875
4 years ago
Resize the partitions
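The limit asked about here applies to a single shuffle block, so the usual remedy is to have more, smaller partitions rather than fewer. Below is a rough sketch of picking a partition count. The target of about 128 MB per partition is a rule of thumb rather than a Spark setting, and the helper names are hypothetical.

```python
import math

MB = 1024 ** 2
GB = 1024 ** 3
SHUFFLE_BLOCK_LIMIT = 2 * GB  # the 2 GB limit discussed in the talk


def suggest_partitions(shuffle_bytes, target_partition_bytes=128 * MB):
    """Number of partitions so that an evenly spread shuffle averages the target size."""
    return max(1, math.ceil(shuffle_bytes / target_partition_bytes))


def average_partition_bytes(shuffle_bytes, num_partitions):
    """Average partition size; skewed keys can still push single partitions higher."""
    return shuffle_bytes / num_partitions


shuffle_bytes = 1024 * GB  # hypothetical 1 TiB shuffle
for n in (200, suggest_partitions(shuffle_bytes)):
    avg = average_partition_bytes(shuffle_bytes, n)
    status = "ok" if avg < SHUFFLE_BLOCK_LIMIT else "over the limit"
    print(n, f"{avg / GB:.2f} GB", status)
```

The resulting number would then be applied with `repartition()` or a setting such as `spark.sql.shuffle.partitions` (default 200). The estimate assumes evenly distributed keys, so heavily skewed data may need salting as well.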
@sailpawar6164 1 year ago
Damn, 5 years ago... I absolutely loved the presentation. Being engaging is a difficult job, and you did great. Also, is it just me, or do these 2 faces look too familiar by the time the video ends?
@TheSmartTrendTrader 5 years ago
What is that special collection to do ETL?
@letscodewithvivek5191
3 years ago
I have the same question. Until now I have been doing ETL using DataFrames only and have never used any custom collections.
@AlexanderWhillas 5 years ago
These are also the top reasons Spark is still relatively unpopular :-/
@Machin396
4 years ago
Really? I thought it was already popular in 2020. If not, what else is gaining attention instead?
@nakget 3 years ago
How does each node get 3 executors at kzread.info/dash/bejne/ia2aqreHnrDbpMo.html ?
@StuggleIsSurreal 3 years ago
Spark, by itself, is not intended to handle CPU-intensive operations on your data. If you have a process against the data that requires a lot of CPU or memory resources, move that process into a microservice or a competing-consumers pattern. Otherwise it will bog down your data handling and prevent you from using Spark effectively.
@MisterKhash 5 years ago
I can't understand what he is saying!!