We cover all the topics in the Big Data world. As part of this channel, we intend to bring videos related to Big Data, Hadoop, Spark, NoSQL databases, HBase, Cassandra, machine learning, deep learning, etc.
We also help students with free interview preparation. Please reach out to us without any hesitation.
You can join our WhatsApp group and Telegram group to discuss and prepare interview questions.
Stay updated with all material on GitHub.
If you want to connect with me, please connect on LinkedIn.
Comments
Please share the doc that you are using in this video
Very informative. Thanks, buddy.
In the Lambda architecture, so far no one has explained how deduplication is handled when the batch-processed and stream-processed data are combined in the serving layer. Whatever data is processed by the streaming layer will eventually get processed by the batch layer, right? If that is true, the data previously processed by the streaming layer is no longer required. So do we need to remove the data processed by the streaming layer?
Sir, can you tell me something about 'Spark' in a housekeeping executive job posting? I don't understand the word 'Spark'. The facility company JLL requires Spark experience.
Is "flatMapGroupsWithState" a stateful operation? Do you have any tutorial on it?
To whoever is wondering when to use groupByKey() over reduceByKey(): groupByKey() can be used for non-associative operations, where the order in which the operation is applied matters. For example, if we want to calculate the median of the values for each key, we cannot use reduceByKey(), since median is not an associative operation.
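A quick way to see the point above (a plain-Python sketch, not from the original comment): merging partial medians does not give the median of the full data set, which is why pair-wise reduction cannot compute it.

```python
from statistics import median

# Median is not associative: reducing one "partition" first and then
# combining its result with the rest gives the wrong answer. That is why
# reduceByKey() (pair-wise merging) cannot compute a median, while
# groupByKey() (collect all values per key first) can.
values = [1, 2, 3, 100, 200]

true_median = median(values)                # median of the full list
partial = median([1, 2, 3])                 # "reduce" one partition first
wrong_median = median([partial, 100, 200])  # then merge with the rest

print(true_median, wrong_median)  # prints 3 100
```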
Awesome!! Wonderful explanation. Before this, I had seen so many videos, but none of them explained the steps with such clarity. Thank you for sharing.
Terrible accent… 😮
Hi sir, please help me with the following requirement. Input:
id | count
a  | 3
b  | 2
c  | 4
I need the following output using Spark: a a a b b c c c c
What do you mean by the application needing a lot of joins? Can you please clarify how the joins affect the architecture decision?
I have one doubt: are reserved memory and YARN overhead memory the same? Because reserved memory also stores Spark internals. Thank you for your time.
How do I join a small table with a big table when I want to keep all the data from the small table? The small table has 100k records and the large table has 1 million records. df = smalldf.join(largedf, smalldf.id == largedf.id, how='left_outer') runs out of memory, and I can't broadcast the small df, I don't know why. What is the best approach here? Please help.
Hi all, I just got to know about the wonderful videos on the Data Savvy channel. About the "executor OOM - big partitions" slide: in Spark, isn't every partition only of block size (128 MB)? Then how can a big partition cause an issue? Can someone please explain? I'm a little confused here. Even if there is a 10 GB file, when Spark reads it, it creates around 80 partitions of 128 MB. Even if one of the partitions is large, it cannot exceed 128 MB, right? So how does OOM occur?
Hi how are you
amazing
I have a doubt: who takes care of task scheduling?
If we maintain replicas of the data in three different racks in Hadoop and we submit a job, we get results, right? Why don't we get duplicate results from the copies of the data? What is the mechanism in Hadoop that ensures only one replica of a block is processed when two more duplicates exist?
Great content thank you
very well explained
Great content! Are you on Topmate or any other platform where I can connect with you? I need some career advice/guidance from you.
Then why aren't people using Dataset everywhere?
Hi Harjeet, I'm getting a "KafkaUtils not found" error while creating a DStream.
where is the volume?
Having a comparison table at the end of the video would be appreciated.
1. The cache() method persists the DataFrame or RDD in memory with the default storage level; it is shorthand for calling persist() with no arguments (MEMORY_ONLY for RDDs, MEMORY_AND_DISK for DataFrames).
2. The persist() method lets you specify a storage level, such as MEMORY_ONLY, MEMORY_ONLY_SER, DISK_ONLY, MEMORY_AND_DISK, etc.
map() - 1. Supported on RDDs, not on PySpark DataFrames. 2. The function is applied to every row.
mapPartitions() - heavy initialization executes only once for each partition instead of for every record (row).
1. Parsing - create an abstract syntax tree.
2. Analysis - the Catalyst analyzer performs semantic analysis on the tree. This includes resolving references, type checking, and creating a logical plan. The analyzer also infers data types.
3. Logical optimization - rewrite the plan into a more efficient form. This includes predicate pushdown and constant folding.
4. Physical planning - the logical plan is converted into one or more candidate physical plans.
5. Physical optimization - the plan is optimized further by considering factors like data partitioning, join order, and choosing the most efficient physical operators.
6. Code generation - generate Java bytecode for the optimized physical plan.
groupBy in PySpark:
1. A transformation that groups the elements of a DataFrame or RDD by the specified key(s).
2. Does not itself perform any aggregation on the grouped data.
3. Works on non-key-value data.
reduceByKey in PySpark:
1. Specifically designed for key-value pair RDDs; it groups the data by key.
2. Performs a reduction or aggregation as it groups.
3. Requires key-value pairs.
repartition:
1. Can increase or decrease the number of RDD/DataFrame partitions.
2. Performs a full shuffle; more expensive.
coalesce:
1. Can only reduce the number of partitions.
2. Avoids a full shuffle.
3. Less expensive.
1. Spark uses a master-slave architecture; the master is called the "Driver" and the slaves are called "Workers".
2. The Spark context is the entry point to your application.
Transformations:
1. Operations on RDDs or DataFrames that create a new RDD or DataFrame from an existing one.
2. Lazily evaluated: execution is deferred until an action is called.
3. Do not return a result to the driver.
Actions:
1. Operations that trigger execution and return a value to the driver program.
1. pyspark.SparkContext is the entry point to PySpark functionality; it is used to communicate with the cluster and to create RDDs.
2. There can be only one SparkContext per JVM.
3. To create another, you first need to stop the existing one using the stop() method.
4. The PySpark shell creates and provides the sc object, which is an instance of the SparkContext class.
5. Creating a SparkSession creates a SparkContext internally and exposes it via the sparkContext variable.
Lineage - logical plan
DAG - physical plan
Great explanation. Excellent teaching. Please do a deeper dive into the coalesce and repartition commands and several scenarios (e.g. coalesce after repartition, repartition after coalesce, coalesce after coalesce, etc.). I know that some of these may not be meaningful, but I have seen several of your videos and you are a GREAT teacher.
Still relevant as of today and frequently asked. The practical on Databricks made things crystal clear.
How do we decide/calculate the number of executors and the memory per executor when multiple jobs are run by multiple processes on the cluster? What is the best way to tune our process? Is it a good approach to give 50 executors with 16 GB each? Could you please explain with an example?
Does watermarking solve late-arriving records, along with flatMapGroupsWithState for complex scenarios, in Databricks?
Please add a video on optimization in Spark, and how to monitor performance in the Spark UI.
For this scenario-based question: can we create an end-to-end pipeline using Kafka and a Power BI dashboard? I.e., connect to the database with a source connector, perform some business-level transformations in ksqlDB, store the results in a Kafka topic, and then connect Power BI for the dashboard. @dataSavvy or anyone, can you check whether my thinking is right?
That music is good and painful....
Feedback noted... This is changed in all subsequent videos.
So this is how "Parquet" is spelled..
Very clearly explained. Thank you!
Thanks @shaleen
Why is it called Lambda architecture? Isn't it the same as an ETL/ELT workflow? Is it only the nature of the source data (stream/event) that makes it a Lambda architecture? My question is more about the etymology of this architecture. Also, can an architecture on the Databricks platform, like the lakehouse, handle the discussed use cases?
Glad to see you back with useful videos
Thank you Gauri
helpful video!
Thanks Naveen
It's very clear. Very helpful to upgrade ourselves in system design and architect skills 👍
Thank you Siva
Very good explanation. Thanks!
Thanks sourbh
Amazing! Can you make a video on a backfilling strategy for when we need to process historical data?