Learn more at www.scholarnest.com/
The best place to learn data engineering, Big Data, Apache Spark, Databricks, Apache Kafka, Confluent Cloud, AWS Cloud Computing, Azure Cloud, and Google Cloud - self-paced, instructor-led, and certification courses, plus practice tests.
SPARK
www.scholarnest.com/courses/spark-programming-in-python-for-beginners/
www.scholarnest.com/courses/spark-streaming-for-python-programmers/
www.scholarnest.com/courses/spark-programming-in-scala-for-beginners/
KAFKA
www.scholarnest.com/courses/apache-kafka-for-beginners/
www.scholarnest.com/courses/kafka-streams-master-class/
Find us on Udemy
Visit the link below for our Udemy courses
www.learningjournal.guru/courses/
Find us on O'Reilly
www.oreilly.com/library/view/apache-kafka-for/9781800202054/
www.oreilly.com/videos/apache-kafka/9781800209343/
www.oreilly.com/videos/kafka-streams-with/9781801811422/
Comments
What is ZooKeeper here?
Hadoop is dead
Hello sir, can you please explain how to connect to a DB2 database from PySpark? I have searched a lot of websites and other sources, but I couldn't connect to my company's IBM DB2 to access tables. Please help me with the supported jar files, or whatever I need to use to access it. Thanks.
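A minimal sketch for reading a DB2 table from PySpark over JDBC, assuming you have downloaded IBM's JDBC driver jar (e.g. db2jcc4.jar); the host, port, database, table, and credential values below are placeholders:

from pyspark.sql import SparkSession

# Point Spark at the IBM DB2 JDBC driver jar (path is a placeholder)
spark = (SparkSession.builder
         .appName("db2-read")
         .config("spark.jars", "/path/to/db2jcc4.jar")
         .getOrCreate())

# Read a table over JDBC; 50000 is DB2's default port
df = (spark.read.format("jdbc")
      .option("url", "jdbc:db2://<host>:50000/<database>")
      .option("driver", "com.ibm.db2.jcc.DB2Driver")
      .option("dbtable", "SCHEMA.TABLE_NAME")
      .option("user", "<user>")
      .option("password", "<password>")
      .load())

df.show(5)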
Very detailed... best ever explanation of a topic, Sir... This is amazing... Thank you, Sir....
Datanodes = 10, with 16 CPUs/node and 64 GB memory/node. Please tell me which cluster config we should choose?
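A rule-of-thumb sizing sketch for this config, assuming the conventions commonly used in such interview answers (reserve 1 core and 1 GB per node for the OS and Hadoop daemons, 5 cores per executor, roughly 7% of executor memory for off-heap overhead):

# Cluster sizing sketch for 10 nodes x 16 cores x 64 GB
nodes, cores_per_node, mem_per_node_gb = 10, 16, 64

usable_cores = cores_per_node - 1                 # 15 cores per node after OS reserve
executors_per_node = usable_cores // 5            # 3 executors per node at 5 cores each
total_executors = nodes * executors_per_node - 1  # 29, leaving one slot for the driver

mem_per_executor_gb = (mem_per_node_gb - 1) / executors_per_node  # ~21 GB raw
heap_per_executor_gb = mem_per_executor_gb / 1.07                 # ~19-20 GB after overhead

print(total_executors, round(heap_per_executor_gb))  # 29 executors, ~20 GB each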
Very good video, sir. My doubt is cleared now; very helpful 🙏
For the same 10 GB file, suppose we have the following resources: 38 GB worker memory with 10 cores, 8 GB driver memory with 2 cores, and shuffle partitions manually configured to 80. How will it behave?
How do I become a member to unlock the videos?
Nice explanation
Does creating a workspace incur charges?
Great explanation, to the point, no exaggeration. Thanks for the video.
best explanation ever!
Very informative video. Thanks
Easy explanation👍
Wonderful explanation. I was studying Data Cloud in Salesforce and they mentioned this data format multiple times. I was clueless, but I got clarity from your video. Thank you, sir.
Excellent video, thank you, Master
I have completed this course on Udemy and highly recommend it. It's very well explained and easy to understand.
Hello, is this page kept updated? Can we rely on it by becoming a member and stay current? If not, where are all your courses updated? I took your PySpark course on Udemy. Though the beginning was really good, the later part of the course did not have a continuous flow. How do I enroll in your batch course?
To watch all the videos in the Databricks course playlist, should we subscribe at 199/- or 399/-?
Such engaging content, you don't lose me for a second... amazing explanation... bless you, brother. In my language: "SAADAA KHUSHBU"
Great, but the follow-up question from the interviewer is: how do we arrive at 4x memory per executor?
Spark reserved memory is 300 MB, and executor memory should be at least 1.5x the reserved memory, i.e. 450 MB, which is why we take the executor memory per core as 4x the partition size; that works out to 512 MB per core per executor.
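A short sketch of that arithmetic, assuming the defaults discussed in the video (128 MB partition size, a 4x multiplier, 5 cores per executor):

# Memory-per-core reasoning sketch
partition_mb = 128                    # default spark.sql.files.maxPartitionBytes
mem_per_core_mb = 4 * partition_mb    # 512 MB per core, comfortably above the
                                      # 1.5 x 300 MB = 450 MB reserved-memory floor
cores_per_executor = 5
executor_mem_mb = cores_per_executor * mem_per_core_mb  # 2560 MB, ~2.5-3 GB per executor

print(executor_mem_mb)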
Is there still a coupon to get this course for free?
Please provide prerequisites
How do we get certified?
Awesome sir
Thanks man it worked.
Very simple and precise. Thank you
THANKS
Is this course suitable for Scala users, or do we need Python knowledge?
What if the cluster size is fixed? Also, shouldn't we take the per-node constraint into account? For example, what if the number of cores in a node is 4?
Very, very good and valuable course.
In the last step you did kinit, which pulled the TGT, and then the dev user could list the files. At what point did the client interact with the TGS using this TGT?
The course is very well organized
Thank you so much. Well explained about Root user.
incredible, thanks
In the last question, every value you took was the default (128 MB, 4x, 512 MB, 5 cores), so let's say the question is for 50 GB of data; would 3 GB still be the answer?
I get "ModuleNotFoundError: No module named 'pyspark.streaming.kafka'" when running the command spark-submit --packages org.apache.spark:spark-streaming-kafka-0-10_2.13:3.5.1 live_processing.py. Can you help, please?
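The DStream-based pyspark.streaming.kafka module was removed in Spark 3.x, and the spark-streaming-kafka-0-10 package has no Python API, so on Spark 3.5 the usual route is Structured Streaming with the spark-sql-kafka-0-10 package instead. A minimal sketch, with the topic name and servers as placeholders (match the package's Scala suffix to your Spark build, typically _2.12 for a pip-installed PySpark 3.5.x):

# Submit with:
#   spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.5.1 live_processing.py
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("live-processing").getOrCreate()

# Read from Kafka via Structured Streaming (replaces the removed DStream API)
df = (spark.readStream.format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "my-topic")          # placeholder topic
      .load())

# Kafka delivers key/value as binary; cast to strings and print to the console
(df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
 .writeStream.format("console")
 .option("checkpointLocation", "chk-point-dir")
 .start()
 .awaitTermination())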
If there are 5 cores per executor, at shuffle time 200 partitions are created by default. How are those 200 partitions created when the number of cores is smaller, given that 1 partition is processed on 1 core? Suppose my config is 2 executors, each with 5 cores. How will it create 200 partitions if I do a groupBy operation? There are 10 cores and 200 partitions to hold them, right? How is that possible?
You can set the number of partitions equal to the number of cores for maximum parallelism; of course, you cannot process 200 partitions at once in this case.
In your case, if 200 partitions are created, your degree of parallelism will be 10, which means 10 partitions are processed at a time; once those slots are free, the next 10 partitions are processed.
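In other words, shuffle partitions are tasks scheduled onto cores, not data pinned one-per-core. If you want the partition count to match the available parallelism, you can lower the setting; a small sketch, with the value purely illustrative:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("parallelism-demo").getOrCreate()

# Lower the default 200 shuffle partitions to match 10 available cores,
# so a groupBy produces 10 output partitions processed in a single wave
spark.conf.set("spark.sql.shuffle.partitions", "10")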
Hi, thanks for the explanation, it really helps. In the above example, let's say on the right stream we get impressionId=4 but don't get a matching event for id=4 on the left stream for a long time. Is it possible to get this record inside the foreachBatch() function before Spark drops it?
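A hedged sketch, not the video's exact code: in a left outer stream-stream join, right-side rows that never find a match are simply discarded and never reach foreachBatch(); a full outer join (supported for stream-stream joins since Spark 3.1) emits them with NULL left-side columns once the watermark passes. The rate streams and column names below are toy placeholders:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, expr

spark = SparkSession.builder.appName("join-sketch").getOrCreate()

# Two toy streams standing in for the left and right sides of the join
left = (spark.readStream.format("rate").option("rowsPerSecond", 1).load()
        .select(col("value").alias("leftId"), col("timestamp").alias("leftTime")))
right = (spark.readStream.format("rate").option("rowsPerSecond", 1).load()
         .select(col("value").alias("rightId"), col("timestamp").alias("rightTime")))

# fullOuter emits never-matched rows with NULLs after the watermark expires;
# leftOuter would silently drop unmatched right-side rows
joined = (left.withWatermark("leftTime", "1 minute")
          .join(right.withWatermark("rightTime", "1 minute"),
                expr("leftId = rightId AND "
                     "rightTime BETWEEN leftTime AND leftTime + interval 1 minute"),
                "fullOuter"))

(joined.writeStream
 .foreachBatch(lambda batch_df, epoch_id: batch_df.show(truncate=False))
 .option("checkpointLocation", "chk-point-dir")
 .start()
 .awaitTermination())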
Very well explained
That is an extraordinary explanation. Thank you.
Best video about these three abstractions
Thank you for explaining. I was looking for a starter example to understand what it is, but other videos explained it as if to experts. I followed your steps, but after running the code and the ncat command I'm getting errors; the first one mentions "chk-point-dir". Any help?
When I run kafka-console-producer.bat --topic test2 --broker-list localhost:9092 < ..\data\sample1.csv from C:\kafka\bin\windows, I get "The system cannot find the path specified." How do I fix this error?
Insightful explanation. Thanks for the video.