TechWithViresh

TechWithViresh specializes in technology areas such as Machine Learning, AI, Spark, Big Data, NoSQL, graph databases, Cassandra, and the Hadoop ecosystem.

Contact us at: [email protected]
Facebook: facebook.com/Tech-Greens

Comments

  • @susanthomas223 (24 days ago)

    What about the aggregateByKey function in RDDs?

  • @thelazitech (a month ago)

    To whomever is concerned about when to use groupByKey over reduceByKey: groupByKey() can be used for non-associative operations, where the order in which the operation is applied matters. For example, if we want to calculate the median of a set of values for each key, we cannot use reduceByKey(), since median is not an associative operation.
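
    A minimal spark-shell sketch of this point, using hypothetical (key, value) sample data, with groupByKey() gathering all values per key so the median can be computed locally:

      val pairs = sc.parallelize(Seq(("a", 1.0), ("a", 9.0), ("a", 4.0), ("b", 2.0), ("b", 6.0)))

      // Median is not associative, so it cannot be folded pairwise with reduceByKey();
      // groupByKey() collects the full value list per key, so we can sort it and take the middle.
      val medians = pairs.groupByKey().mapValues { values =>
        val sorted = values.toSeq.sorted
        val n = sorted.size
        if (n % 2 == 1) sorted(n / 2)
        else (sorted(n / 2 - 1) + sorted(n / 2)) / 2.0
      }

      medians.collect().foreach(println)   // e.g. (a,4.0), (b,4.0)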

  • @ldk6853 (2 months ago)

    Hindu again 🤢

  • @pankajchikhalwale8769 (4 months ago)

    Hi, I like your Spark videos. Please create a dedicated video on the top 100 most frequently used Spark commands. - Pankaj C

  • @sagarrawal7740 (6 months ago)

    The video recommendations at the end are blocking the content...

  • @pmdsngh (7 months ago)

    I see, so for an RDD the default is memory only, and for a DataFrame it is memory + disk.
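
    A quick spark-shell check of those cache() defaults (Spark 2.x/3.x behaviour), assuming the usual `spark` and `sc` shell bindings:

      import org.apache.spark.storage.StorageLevel
      import spark.implicits._

      val rdd = sc.parallelize(1 to 100)
      rdd.cache()                        // RDD default: MEMORY_ONLY
      println(rdd.getStorageLevel)

      val df = (1 to 100).toDF("n")
      df.cache()                         // DataFrame/Dataset default: MEMORY_AND_DISK
      println(df.storageLevel)

      // Either default can be overridden explicitly:
      val df2 = (1 to 100).toDF("n").persist(StorageLevel.MEMORY_ONLY)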

  • @dipakit45 (7 months ago)

    Why are you talking like you're in sleepy mode??

  • @raviyadav-dt1tb (7 months ago)

    Please provide AWS questions and answers. Thank you 🙏

  • @avinash7003 (7 months ago)

    What is MSCK?

  • @user-vl1ld3be3n (8 months ago)

    What if I have multiple Spark jobs running in parallel in one Spark session?

  • @adityamathur2284 (8 months ago)

    For ORC format, schema evolution is not limited to adding new columns.

    Backward compatibility:
    - Adding columns: new columns can be added to the schema without affecting existing data files. When old ORC files are read with a new schema that includes additional columns, the new columns are treated as optional and filled with default values.
    - Removing columns: as with Parquet, existing columns can be removed without breaking compatibility. When old ORC files are read with a new schema that excludes certain columns, those columns are ignored.
    - Changing data types: the data types of existing columns can be changed, and ORC will attempt to convert the data to the new type. As with Parquet, this conversion may lose data if the types are not compatible.

    Forward compatibility:
    - Adding columns: new columns can be added, and existing files can still be read without errors; the new columns are filled with default values when data from the old files is read.
    - Removing columns: files written with a schema that has fewer columns can still be read with a newer schema containing additional columns; the additional columns are treated as optional.
    - Changing data types: forward compatibility is generally maintained when changing data types, but careful consideration is needed to avoid potential data loss or conversion issues.

    The above points are what I found to supplement your content. Thanks for your videos and your dedication in making them; they are really helpful for my preparation.
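
    For the adding-columns case, a small spark-shell sketch (hypothetical path; the `mergeSchema` ORC option is available in Spark 3.0+), where rows written before the new column existed come back as null:

      import spark.implicits._

      val path = "/tmp/orc_evolution_demo"   // hypothetical location

      // Generation 1 has two columns; generation 2 adds an age column.
      Seq((1, "alice")).toDF("id", "name").write.mode("overwrite").orc(path)
      Seq((2, "bob", 30)).toDF("id", "name", "age").write.mode("append").orc(path)

      // Reading with schema merging: old files yield null for the column they never had.
      spark.read.option("mergeSchema", "true").orc(path).show()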

  • @YoSoyWerlix (10 months ago)

    Hi! Why do you say Avro is row-oriented? Isn't it also columnar storage?

  • @srinubathina7191 (11 months ago)

    Thank you

  • @srinubathina7191 (11 months ago)

    Super content thank you

  • @raviyadav-dt1tb (11 months ago)

    Good sir

  • @Tarasankarpaul1 (11 months ago)

    Could you please tell us the difference between partition pruning and predicate pushdown?

  • @ritikpatil4077 (8 months ago)

    Both same

  • @RohanKumar-mh3pt (11 months ago)

    Very nice and clear explanation. Before this video I was very confused about the executor tuning part; now, after this video, it is crystal clear.

  • @mdmoniruzzaman703 (a year ago)

    Hi, does 10 nodes mean including the master node? I have a configuration like this:

      "Instances": {
        "InstanceGroups": [
          { "Name": "Master nodes", "Market": "SPOT", "InstanceRole": "MASTER", "InstanceType": "m5.4xlarge", "InstanceCount": 1 },
          { "Name": "Worker nodes", "Market": "SPOT", "InstanceRole": "CORE", "InstanceType": "m5.4xlarge", "InstanceCount": 9 }
        ],
        "KeepJobFlowAliveWhenNoSteps": false,
        "TerminationProtected": false
      },

  • @venkateshgurram7707 (a year ago)

    @TechWithViresh: no recent videos. Can you please add more? Your videos are very useful, brother. Thanks.

  • @TechWithViresh (a year ago)

    Thanks, for sure videos coming soon :)

  • @micheleadriaans6688 (a year ago)

    Thanks! A great and concise explanation!

  • @jalsacentre1040 (a year ago)

    The 2nd map will not be executed, as no action is performed on the result data set after collect.
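
    A minimal spark-shell illustration of that laziness rule (hypothetical data; `sc` as provided by the shell):

      val nums = sc.parallelize(1 to 5)

      // collect() is an action, so this first map runs and its println fires.
      val collected = nums.map { x => println(s"first map: $x"); x * 2 }.collect()

      // This second map is only a transformation; with no action called on `doubled`,
      // nothing executes and nothing prints until e.g. doubled.count() is invoked.
      val doubled = nums.map { x => println(s"second map: $x"); x * 2 }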

  • @wafa0196 (a year ago)

    Hello, I find the content very interesting, especially on when the hash join is better than the sort-merge join. Could you please tell me where you found the documentation on that?

  • @terrificmenace (a year ago)

    Many thanks to you, sir. 😊 I learnt Spark from you.

  • @vishalaaa1 (a year ago)

    Very good. Please make a group of videos on Spark interview questions.

  • @vishalaaa1 (a year ago)

    nice

  • @panduranga (a year ago)

    The audio quality is not good; the content is good.

  • @snehakavinkar2240 (a year ago)

    Limit comes after order by in the query execution order, so how does using limit reduce the number of records to be sorted? Am I missing anything here?
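
    For what it's worth, the physical plan makes this visible: in Spark, an ORDER BY immediately followed by LIMIT is typically compiled into a single TakeOrderedAndProject operator, which keeps only the top N rows per partition and merges those, instead of fully sorting all records. A spark-shell sketch with hypothetical data:

      import spark.implicits._

      val events = (1 to 100000).map(i => (i, s"user$i")).toDF("score", "user")

      // Plain sort: the plan contains a global Sort with a shuffle.
      events.orderBy($"score".desc).explain()

      // Sort + limit: the plan collapses to TakeOrderedAndProject,
      // so only 10 rows per partition ever need to be kept in order.
      events.orderBy($"score".desc).limit(10).explain()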

  • @Trip-Train (a year ago)

    Why are you converting the DataFrame to an RDD?? It is very bad practice in terms of performance.

  • @ajaywade9418 (a year ago)

    In the video from 11:30, we are adding a random key to the existing towerId key. For example, with tower id 101 and salt key 67, 101+67 = 168, and the hash value of 168 would be the final value, right? What about the case where the partition column is a string datatype?

  • @TechWithViresh (a year ago)

    In the case of strings, we can add surrogate keys based on the string column values and then do the salting.
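
    A sketch of that two-stage salted aggregation in spark-shell (hypothetical skewed tower data; it works the same when towerId is a string, per the reply above):

      import org.apache.spark.sql.functions._
      import spark.implicits._

      val calls = Seq(("101", 10), ("101", 20), ("101", 30), ("205", 5)).toDF("towerId", "calls")
      val saltBuckets = 8

      // Stage 1: split the hot key across saltBuckets partial aggregates.
      val salted  = calls.withColumn("salt", (rand() * saltBuckets).cast("int"))
      val partial = salted.groupBy($"towerId", $"salt").agg(sum($"calls").as("partialCalls"))

      // Stage 2: drop the salt and combine the partials into the true totals.
      val totals = partial.groupBy($"towerId").agg(sum($"partialCalls").as("totalCalls"))
      totals.show()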

  • @SahilSharma-it6gf (a year ago)

    Brother, if you explained this in Hindi, would you lose anything??

  • @tanushreenagar3116 (a year ago)

    Perfect 👌 explanation

  • @andrewshk8441 (a year ago)

    Very good and descriptive comparison. Thank you!

  • @PrajwalSuryawanshi-ds2xs (a year ago)

    You gave all the information about Hive. Is this enough for an interview?

  • @shivankchaturvedi5875 (a year ago)

    How will the last map operation run on the driver? Up to collect(), a job is completed, and whenever we call another action it creates a new job with a new DAG, which is again distributed and run on the executors, right?

  • @utku83 (a year ago)

    Good explanation. Thank you 👍

  • @ecpavanec (a year ago)

    Can we get the PPT that you show in the videos?

  • @umeshkatighar3635 (a year ago)

    What if each node has only 8 cores? How does Spark allocate 5 cores per JVM?

  • @bhaskaraggarwal8971 (a year ago)

    Awesome✨

  • @ansariasim4463 (a year ago)

    Bro, if you have 6 blocks in Hadoop 3 then it consumes 15 blocks. Suppose we have a file which consists of 2 blocks (B1 and B2).

    1) With the current HDFS setup, we will have 2 x 3 = 6 blocks in total:
       For block B1 -> B1.1, B1.2, B1.3
       For block B2 -> B2.1, B2.2, B2.3

    2) With the EC setup, we will have 2 x 2 + 2/2 = 5 blocks in total:
       For block B1 -> B1.1, B1.2
       For block B2 -> B2.1, B2.2
       The 3rd copy of each block is XOR'ed together and stored as a single parity block: (B1.1 xor B2.1) -> Bp

    In this setup:
    - If B1.1 is corrupted, we can recompute B1.1 = Bp xor B2.1
    - If B2.1 is corrupted, we can recompute B2.1 = Bp xor B1.1
    - If both B1.1 and B2.1 are corrupted, then we have another copy of both blocks (B1.2 and B2.2)
    - If the parity block Bp is corrupted, it is recomputed again as B1.1 xor B2.1
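
    The recovery identity above is just XOR being its own inverse; a tiny Scala demonstration with stand-in bytes:

      val b1: Byte = 0x5a.toByte             // stand-in for a byte of block B1.1
      val b2: Byte = 0x3c.toByte             // stand-in for a byte of block B2.1
      val parity = (b1 ^ b2).toByte          // Bp = B1.1 xor B2.1

      // Lose B1.1: recover it from the parity and the surviving block.
      assert((parity ^ b2).toByte == b1)
      // Lose B2.1: the same identity the other way around.
      assert((parity ^ b1).toByte == b2)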

  • @ravikumark6746 (a year ago)

    @Ankit Bansal, can you please solve this using SQL?

  • @amazhobner (5 months ago)

    This isn't Instagram where you can tag channels, lol.

  • @tarunreddy5917 (a year ago)

    Are there any differences in terms of performance?

  • @atulgupta9301 (a year ago)

    Crisp, concise, and to-the-point explanation in great detail. Anyone can understand it through this video. Extremely well done. Kudos...

  • @TechWithViresh (a year ago)

    Glad it was helpful!

  • @maheshbhatm9998 (a year ago)

    Thank you

  • @TechWithViresh (a year ago)

    Welcome!

  • @chandrakamalgupta9116 (a year ago)

    Thank you

  • @TechWithViresh (a year ago)

    Welcome!

  • @vijaykumarp5882 (a year ago)

    Good content

  • @vermad6233 (a year ago)

    The voice and explanation are not clear!

  • @himanshuramekar6938 (a year ago)

    Sir, will you please make a video that explains the rand() function?

  • @SpiritOfIndiaaa (a year ago)

    How can we do percentile() while avoiding groupBy? Can you explain it?

  • @yeshwanthkumar445 (a year ago)

    Good one

  • @TechWithViresh (a year ago)

    Thank you! Cheers!