Learn more at www.scholarnest.com/
The best place to learn data engineering, Big Data, Apache Spark, Databricks, Apache Kafka, Confluent Cloud, AWS Cloud Computing, Azure Cloud, and Google Cloud - self-paced, instructor-led, and certification courses, plus practice tests.
SPARK
www.scholarnest.com/courses/spark-programming-in-python-for-beginners/
www.scholarnest.com/courses/spark-streaming-for-python-programmers/
www.scholarnest.com/courses/spark-programming-in-scala-for-beginners/
KAFKA
www.scholarnest.com/courses/apache-kafka-for-beginners/
www.scholarnest.com/courses/kafka-streams-master-class/
Find us on Udemy
Visit the link below for our Udemy courses
www.learningjournal.guru/courses/
Find us on O'Reilly
www.oreilly.com/library/view/apache-kafka-for/9781800202054/
www.oreilly.com/videos/apache-kafka/9781800209343/
www.oreilly.com/videos/kafka-streams-with/9781801811422/
Comments
What is ZooKeeper here?
Hadoop is dead
Hello sir, can you please explain how to connect to a DB2 database from PySpark? I have searched a lot of websites and other sources, but I couldn't connect to my company's IBM DB2 to access tables. Please help me with the supported jar files, or whatever I need to use to access it. Thanks.
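A minimal sketch for reading a DB2 table from PySpark over JDBC, assuming you have downloaded IBM's JDBC driver jar (e.g. db2jcc4.jar); the host, port, database, table, and credential values below are placeholders:

from pyspark.sql import SparkSession

# Point Spark at the IBM DB2 JDBC driver jar (path is a placeholder)
spark = (SparkSession.builder
         .appName("db2-read")
         .config("spark.jars", "/path/to/db2jcc4.jar")
         .getOrCreate())

# Read a table over JDBC; 50000 is DB2's default port
df = (spark.read.format("jdbc")
      .option("url", "jdbc:db2://<host>:50000/<database>")
      .option("driver", "com.ibm.db2.jcc.DB2Driver")
      .option("dbtable", "SCHEMA.TABLE_NAME")
      .option("user", "<user>")
      .option("password", "<password>")
      .load())

df.show(5)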
Very detailed... best ever explanation of a topic, Sir... This is amazing... Thank you, Sir....
Datanodes = 10, with 16 CPUs/node and 64 GB memory/node. Please tell me which cluster config we should choose?
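A rule-of-thumb sizing sketch for this config, assuming the conventions commonly used in such interview answers (reserve 1 core and 1 GB per node for the OS and Hadoop daemons, 5 cores per executor, roughly 7% of executor memory for off-heap overhead):

# Cluster sizing sketch for 10 nodes x 16 cores x 64 GB
nodes, cores_per_node, mem_per_node_gb = 10, 16, 64

usable_cores = cores_per_node - 1                 # 15 cores per node after OS reserve
executors_per_node = usable_cores // 5            # 3 executors per node at 5 cores each
total_executors = nodes * executors_per_node - 1  # 29, leaving one slot for the driver

mem_per_executor_gb = (mem_per_node_gb - 1) / executors_per_node  # ~21 GB raw
heap_per_executor_gb = mem_per_executor_gb / 1.07                 # ~19-20 GB after overhead

print(total_executors, round(heap_per_executor_gb))  # 29 executors, ~20 GB each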
Very good video, sir. My doubt is cleared now; very helpful 🙏
For the same 10 GB file, suppose we have the following resources: 38 GB worker memory with 10 cores, 8 GB driver memory with 2 cores, and shuffle partitions manually configured to 80. How will it behave?
How do I become a member to unlock the videos?
Nice explanation
Does creating a workspace incur charges?
Great explanation, to the point, no exaggeration. Thanks for the video.
best explanation ever!
Very informative video. Thanks
Easy explanation👍
Wonderful explanation. I was studying Data Cloud in Salesforce and they mentioned this data format multiple times. I was clueless, but I got clarity from your video. Thank you, sir.
Excellent video, thank you, Master
I have completed this course on Udemy and highly recommend it. It's very well explained and easy to understand.
Hello, is this page kept updated? Can we rely on it by becoming a member and stay current? If not, where are all your courses updated? I took your PySpark course on Udemy. Though the beginning was really good, the later part of the course did not have a continuous flow. How do I enroll in your batch course?
To watch all the videos in the Databricks course playlist, should we subscribe at 199/- or 399/-?
Such engaging content, you don't lose me for a second... amazing explanation... bless you, brother. In my language: "SAADAA KHUSHBU"
Great, but the follow-up question from the interviewer is: how do we arrive at 4x memory per executor?
Spark reserved memory is 300 MB, and executor memory should be at least 1.5x the reserved memory, i.e. 450 MB, which is why we take the executor memory per core as 4x the partition size; that works out to 512 MB per core per executor.
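A short sketch of that arithmetic, assuming the defaults discussed in the video (128 MB partition size, a 4x multiplier, 5 cores per executor):

# Memory-per-core reasoning sketch
partition_mb = 128                    # default spark.sql.files.maxPartitionBytes
mem_per_core_mb = 4 * partition_mb    # 512 MB per core, comfortably above the
                                      # 1.5 x 300 MB = 450 MB reserved-memory floor
cores_per_executor = 5
executor_mem_mb = cores_per_executor * mem_per_core_mb  # 2560 MB, ~2.5-3 GB per executor

print(executor_mem_mb)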
Is there still a coupon to get this course for free?
Please provide prerequisites
How do we get certified?
Awesome sir
Thanks man it worked.
Very simple and precise. Thank you
THANKS
Is this course suitable for Scala users, or do we need Python knowledge?
What if the cluster size is fixed? Also, shouldn't we take the per-node constraint into account? For example, what if the number of cores in a node is 4?
Very, very good and valuable course.
In the last step you did kinit, which pulled the TGT, and then the dev user could list the files. At what point did the client interact with the TGS using this TGT?
The course is very well organized
Thank you so much. Well explained about Root user.
incredible, thanks
In the last question, every value you took was the default (128 MB, 4x, 512 MB, 5 cores), so let's say the question is for 50 GB of data; would 3 GB still be the answer?
I get "ModuleNotFoundError: No module named 'pyspark.streaming.kafka'" when running the command spark-submit --packages org.apache.spark:spark-streaming-kafka-0-10_2.13:3.5.1 live_processing.py. Can you help, please?
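The DStream-based pyspark.streaming.kafka module was removed in Spark 3.x, and the spark-streaming-kafka-0-10 package has no Python API, so on Spark 3.5 the usual route is Structured Streaming with the spark-sql-kafka-0-10 package instead. A minimal sketch, with the topic name and servers as placeholders (match the package's Scala suffix to your Spark build, typically _2.12 for a pip-installed PySpark 3.5.x):

# Submit with:
#   spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.5.1 live_processing.py
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("live-processing").getOrCreate()

# Read from Kafka via Structured Streaming (replaces the removed DStream API)
df = (spark.readStream.format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "my-topic")          # placeholder topic
      .load())

# Kafka delivers key/value as binary; cast to strings and print to the console
(df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
 .writeStream.format("console")
 .option("checkpointLocation", "chk-point-dir")
 .start()
 .awaitTermination())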
If there are 5 cores per executor, at shuffle time 200 partitions are created by default. How are those 200 partitions created when the number of cores is smaller, given that 1 partition is processed on 1 core? Suppose my config is 2 executors, each with 5 cores. How will it create 200 partitions if I do a groupBy operation? There are 10 cores and 200 partitions to hold them, right? How is that possible?
You can set the number of partitions equal to the number of cores for maximum parallelism; of course, you cannot process 200 partitions at once in this case.
In your case, if 200 partitions are created, your degree of parallelism will be 10, which means 10 partitions are processed at a time; once those slots are free, the next 10 partitions are processed.
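In other words, shuffle partitions are tasks scheduled onto cores, not data pinned one-per-core. If you want the partition count to match the available parallelism, you can lower the setting; a small sketch, with the value purely illustrative:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("parallelism-demo").getOrCreate()

# Lower the default 200 shuffle partitions to match 10 available cores,
# so a groupBy produces 10 output partitions processed in a single wave
spark.conf.set("spark.sql.shuffle.partitions", "10")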
Hi, thanks for the explanation, it really helps. In the above example, let's say on the right stream we get impressionId=4 but don't get a matching event for id=4 on the left stream for a long time. Is it possible to get this record inside the foreachBatch() function before Spark drops it?
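A hedged sketch, not the video's exact code: in a left outer stream-stream join, right-side rows that never find a match are simply discarded and never reach foreachBatch(); a full outer join (supported for stream-stream joins since Spark 3.1) emits them with NULL left-side columns once the watermark passes. The rate streams and column names below are toy placeholders:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, expr

spark = SparkSession.builder.appName("join-sketch").getOrCreate()

# Two toy streams standing in for the left and right sides of the join
left = (spark.readStream.format("rate").option("rowsPerSecond", 1).load()
        .select(col("value").alias("leftId"), col("timestamp").alias("leftTime")))
right = (spark.readStream.format("rate").option("rowsPerSecond", 1).load()
         .select(col("value").alias("rightId"), col("timestamp").alias("rightTime")))

# fullOuter emits never-matched rows with NULLs after the watermark expires;
# leftOuter would silently drop unmatched right-side rows
joined = (left.withWatermark("leftTime", "1 minute")
          .join(right.withWatermark("rightTime", "1 minute"),
                expr("leftId = rightId AND "
                     "rightTime BETWEEN leftTime AND leftTime + interval 1 minute"),
                "fullOuter"))

(joined.writeStream
 .foreachBatch(lambda batch_df, epoch_id: batch_df.show(truncate=False))
 .option("checkpointLocation", "chk-point-dir")
 .start()
 .awaitTermination())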
Very well explained
That is an extraordinary explanation. Thank you.
Best video about these three abstractions
Thank you for explaining. I was looking for a starter example to understand what it is, but other videos explained it as if to experts. I followed your steps, but after running the code and the ncat command I'm getting errors; the first one mentions "chk-point-dir". Any help?
When I run kafka-console-producer.bat --topic test2 --broker-list localhost:9092 < ..\data\sample1.csv from C:\kafka\bin\windows, I get "The system cannot find the path specified." How do I fix this error?
Insightful explanation. Thanks for the video.