Learning Journal

Learn more at www.scholarnest.com/

The best place to learn Data Engineering, Big Data, Apache Spark, Databricks, Apache Kafka, Confluent Cloud, AWS Cloud Computing, Azure Cloud, and Google Cloud: self-paced, instructor-led, and certification courses, plus practice tests.

SPARK
www.scholarnest.com/courses/spark-programming-in-python-for-beginners/
www.scholarnest.com/courses/spark-streaming-for-python-programmers/
www.scholarnest.com/courses/spark-programming-in-scala-for-beginners/

KAFKA
www.scholarnest.com/courses/apache-kafka-for-beginners/
www.scholarnest.com/courses/kafka-streams-master-class/

Find us on Udemy
Visit the link below for our Udemy courses
www.learningjournal.guru/courses/

Find us on O'Reilly
www.oreilly.com/library/view/apache-kafka-for/9781800202054/
www.oreilly.com/videos/apache-kafka/9781800209343/
www.oreilly.com/videos/kafka-streams-with/9781801811422/

02 - Course Prerequisites

01 - About The Course

Setup and test your IDE

Comments

  • @rohanrustagi7857 (1 day ago)

    What is ZooKeeper here?

  • @1990mt (3 days ago)

    Hadoop is dead

  • @F-Zone1 (6 days ago)

    Hello sir, can you please explain how to connect to a DB2 database from PySpark? I have searched a lot of websites and other sources but couldn't connect to my company's IBM DB2 to access tables. Please help me with the supported jar files, or whatever I need to use to access it. Thanks.
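    A question like this usually comes down to Spark's JDBC data source. A minimal sketch, assuming IBM's `db2jcc4.jar` JDBC driver; the host, port, database, schema, table, jar path, and credentials below are all placeholders:

    ```python
    # Hedged sketch: read a DB2 table from PySpark over JDBC.
    # Requires IBM's DB2 JDBC driver jar (db2jcc4.jar) on the classpath.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("DB2Read")
             .config("spark.jars", "/path/to/db2jcc4.jar")  # placeholder path
             .getOrCreate())

    df = (spark.read.format("jdbc")
          .option("url", "jdbc:db2://db2-host:50000/MYDB")   # placeholder URL
          .option("driver", "com.ibm.db2.jcc.DB2Driver")
          .option("dbtable", "MYSCHEMA.MYTABLE")             # placeholder table
          .option("user", "db2user")
          .option("password", "secret")
          .load())

    df.show(5)
    ```

    The same jar can instead be passed at submit time with `spark-submit --jars /path/to/db2jcc4.jar`.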

  • @gurumoorthysivakolunthu9878 (7 days ago)

    Very detailed... best ever explanation of a topic, Sir... This is amazing... Thank you, Sir....

  • @Sauravsuman11005 (7 days ago)

    Datanodes = 10, 16 CPUs/node, 64 GB memory/node. Please tell me which cluster config we should choose.
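    One common rule-of-thumb answer for a cluster like this (10 nodes, 16 CPUs and 64 GB per node) can be sketched as arithmetic. The reservations below (1 core and 1 GB per node for the OS/daemons, 5 cores per executor, ~7% memory overhead, 1 slot for the application master) are the usual heuristics, not the only valid choice:

    ```python
    # Rule-of-thumb executor sizing for 10 nodes x 16 cores x 64 GB.
    nodes = 10
    cores_per_node = 16
    mem_per_node_gb = 64

    # Reserve 1 core and 1 GB per node for the OS / Hadoop daemons.
    usable_cores = cores_per_node - 1        # 15
    usable_mem_gb = mem_per_node_gb - 1      # 63

    cores_per_executor = 5                   # common HDFS-throughput heuristic
    executors_per_node = usable_cores // cores_per_executor   # 3
    total_executors = nodes * executors_per_node - 1          # minus 1 for the AM

    mem_per_executor_gb = usable_mem_gb // executors_per_node # 21
    # Leave ~7% of that for off-heap overhead (spark.executor.memoryOverhead).
    executor_memory_gb = int(mem_per_executor_gb * 0.93)

    print(total_executors, cores_per_executor, executor_memory_gb)  # 29 5 19
    ```

    So roughly 29 executors, 5 cores each, and about 19 GB of heap per executor.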

  • @shyamyadav-qk4zb (7 days ago)

    Very good video sir, my doubt is clear now, very helpful 🙏

  • @arnabghosh21 (10 days ago)

    For the same 10 GB file, suppose we have the following resources: 38 GB worker memory with 10 cores, 8 GB driver memory with 2 cores, and manually configured shuffle partitions = 80. How will it behave?
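    A hedged back-of-envelope for this scenario, assuming the default 128 MB split size (`spark.sql.files.maxPartitionBytes`) used elsewhere in the course:

    ```python
    # Back-of-envelope task/wave count for a 10 GB file on 10 cores.
    file_size_mb = 10 * 1024
    split_mb = 128
    input_partitions = file_size_mb // split_mb    # 80 read tasks

    total_cores = 10                               # worker cores in the question
    waves = -(-input_partitions // total_cores)    # ceiling division -> 8 waves

    shuffle_partitions = 80                        # manually configured
    shuffle_waves = -(-shuffle_partitions // total_cores)  # also 8 waves

    print(input_partitions, waves, shuffle_waves)  # 80 8 8
    ```

    So both the read stage and the shuffle stage would run 80 tasks, 10 at a time, in 8 waves; memory is ample since 38 GB comfortably holds 10 concurrent 128 MB partitions plus shuffle buffers.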

  • @ashutoshkandpal6792 (17 days ago)

    How do I become a member to unlock videos?

  • @soumikdas7709 (17 days ago)

    Nice explanation

  • @user-uo8jg2qm8q (19 days ago)

    Does creating a workspace charge you?

  • @SanjayKumar-rw2gj (20 days ago)

    Great explanation, to the point, no exaggeration. Thanks for the video.

  • @Wilmar-xz5io (21 days ago)

    best explanation ever!

  • @VivekKBangaru (21 days ago)

    Very informative video. Thanks

  • @ayeshakhan8726 (26 days ago)

    Easy explanation👍

  • @PANKAJKUMAR-fe8zn (1 month ago)

    Wonderful explanation. I was studying Data Cloud in Salesforce and they mentioned this data format multiple times. I was clueless, but I got clarity from your video. Thank you, sir.

  • @rjrmatias (1 month ago)

    Excellent video, thank you Master

  • @nisarirshad8366 (1 month ago)

    I have completed this course on Udemy and highly recommend it. It's very well explained and easy to understand.

  • @abhilashvasanth700 (1 month ago)

    Hello, is this page updated? Can we rely on it by becoming a member, and stay updated? If not, where are all your courses kept up to date? I took your PySpark course on Udemy. Though the beginning was really good, the later part of the course did not have a continuous flow. How do I enroll in your batch course?

  • @srikanthk-yp4wj (1 month ago)

    To watch all the videos in the Databricks course playlist, should we subscribe at 199/- or 399/-?

  • @AhsanTirmiziVlogs (1 month ago)

    Such engaging content, you don't lose me for a second... amazing explanation... bless you, brother. In my language, "SAADAA KHUSHBU".

  • @sonurohini6764 (1 month ago)

    Great. But the follow-up question to this from an interviewer is: how do we arrive at 4x memory per executor core?

  • @amlansharma5429 (1 month ago)

    Spark reserved memory is 300 MB in size, and executor memory should be at least 1.5x the reserved memory, i.e. 450 MB, which is why we take executor memory per core as 4x the partition size, which works out to 512 MB per executor core.
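    The arithmetic in the reply above can be sketched directly (assuming the default 128 MB partition size used in the course):

    ```python
    # Why the per-core heuristic is 4x the partition size.
    reserved_mb = 300                       # Spark's reserved memory
    min_executor_mb = reserved_mb * 1.5     # 450 MB minimum sensible heap

    partition_mb = 128                      # default partition size
    mem_per_core_mb = 4 * partition_mb      # 512 MB, clears the 450 MB floor

    print(min_executor_mb, mem_per_core_mb)  # 450.0 512
    ```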

  • @justinmurray8313 (1 month ago)

    Is there still a coupon to get this course for free?

  • @ABQ... (1 month ago)

    Please provide prerequisites

  • @AmitCodes (1 month ago)

    How do I get certified?

  • @saisivamadhav8338 (1 month ago)

    Awesome sir

  • @anuragjaiswal1399 (1 month ago)

    Thanks man it worked.

  • @ongn1611 (1 month ago)

    Very simple and precise. Thank you

  • @nwanebunkemjika7822 (1 month ago)

    THANKS

  • @deevjitsaha3168 (1 month ago)

    Is this course suitable for Scala users, or do we need to have Python knowledge?

  • @tridipdas9930 (1 month ago)

    What if the cluster size is fixed? Also, shouldn't we take into account the per-node constraint? For example, what if the number of cores in a node is 4?

  • @Lakshvedhi (1 month ago)

    very very good and valuable course.

  • @veerendrashukla (1 month ago)

    In the last step you did kinit, which pulled the TGT, and then the dev user could list the files. At what point did the client interact with the TGS using this TGT?

  • @user-dx9pj6bp3w (1 month ago)

    The course is very well organized

  • @vvsekhar1 (1 month ago)

    Thank you so much. The root user was well explained.

  • @federico325 (1 month ago)

    incredible, thanks

  • @vaibhavtyagi9885 (2 months ago)

    In the last question, every value you took was the default (128 MB, 4, 512 MB, 5 cores). So let's say the question is for 50 GB of data; would 3 GB still be the answer?
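    A hedged sketch of why the answer wouldn't change: under the course's per-core heuristic, executor memory depends only on the partition size and cores per executor, not on the total data size, so 50 GB changes the task count rather than the memory per executor:

    ```python
    # Per-executor memory under the course heuristic, vs. task count.
    partition_mb = 128
    cores_per_executor = 5
    mem_per_core_mb = 4 * partition_mb                       # 512 MB
    executor_heap_mb = cores_per_executor * mem_per_core_mb  # 2560 MB
    with_overhead_mb = executor_heap_mb * 1.10               # ~2816 MB, i.e. ~3 GB

    tasks_10gb = (10 * 1024) // partition_mb                 # 80 tasks
    tasks_50gb = (50 * 1024) // partition_mb                 # 400 tasks, same memory

    print(round(with_overhead_mb), tasks_10gb, tasks_50gb)
    ```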

  • @HIMANSHUMISHRA-yg8dc (2 months ago)

    ModuleNotFoundError: No module named 'pyspark.streaming.kafka', when running the command spark-submit --packages org.apache.spark:spark-streaming-kafka-0-10_2.13:3.5.1 live_processing.py. Can you help, please?
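    The `pyspark.streaming.kafka` module belonged to the old DStream API and was removed in Spark 3.x, so the import fails regardless of the `--packages` option (which here also names the Scala 2.13 build, while the default PySpark 3.5.1 distribution is built against Scala 2.12). A minimal Structured Streaming replacement, as a sketch; the topic name and bootstrap servers are placeholders:

    ```python
    # Hedged sketch: Structured Streaming Kafka source replacing the
    # removed pyspark.streaming.kafka DStream module.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("LiveProcessing")
             .getOrCreate())

    df = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "localhost:9092")  # placeholder
          .option("subscribe", "my-topic")                      # placeholder
          .load())

    query = (df.selectExpr("CAST(value AS STRING)")
             .writeStream
             .format("console")
             .start())
    query.awaitTermination()

    # Submit with the SQL Kafka connector matching your Scala build, e.g.:
    #   spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.5.1 live_processing.py
    ```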

  • @Amarjeet-fb3lk (2 months ago)

    If the number of cores is 5 per executor, at shuffle time Spark creates 200 partitions by default. How will those 200 partitions be created if the number of cores is smaller, given that 1 partition is processed on 1 core? Suppose my config is 2 executors, each with 5 cores. Now how will it create 200 partitions if I do a groupBy operation? There are 10 cores, and 200 partitions need to be processed on them, right? How is that possible?

  • @navdeepjha2739 (2 months ago)

    You can set the number of partitions equal to the number of cores for maximum parallelism. Of course, you cannot create 200 partitions in this case.

  • @DUFFERMEHUL (4 days ago)

    In your case, if 200 partitions are created, then your degree of parallelism will be 10, which means 10 partitions will be processed at a time, and once those slots are free, the next 10 partitions will be processed.
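    The scheduling described in the replies can be sketched as simple arithmetic: partitions beyond the core count just queue up and run in waves, and the shuffle-partition count is tunable:

    ```python
    # 200 default shuffle partitions on 2 executors x 5 cores run in waves.
    shuffle_partitions = 200              # spark.sql.shuffle.partitions default
    executors, cores_per_executor = 2, 5
    parallelism = executors * cores_per_executor      # 10 tasks at a time
    waves = -(-shuffle_partitions // parallelism)     # ceiling division -> 20 waves

    print(parallelism, waves)  # 10 20

    # Lowering the setting to match the cores, e.g.
    #   spark.conf.set("spark.sql.shuffle.partitions", 10)
    # would finish the shuffle stage in a single wave.
    ```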

  • @NandhaKumar1712 (2 months ago)

    Hi, thanks for the explanation, it really helps. In the above example, let's say on the right stream we get impressionId=4, and we don't get a matching event for id=4 on the left stream for a long time. Is it possible to get this record inside the foreachBatch() function before it gets dropped by Spark?

  • @prasannakumar7097 (2 months ago)

    Very well explained

  • @robertakid727 (2 months ago)

    That is an extraordinary explanation, thank you

  • @oleg20century (2 months ago)

    Best video about these three abstractions

  • @Mado44555 (2 months ago)

    Thank you for explaining. I was looking for a starter example to understand what this is, but other videos explained it as if to experts, so I figured it out by following your steps. After running the code and doing the ncat command, I am getting errors, and the first one mentions "chk-point-dir". Any help?

  • @CloudandTechie (2 months ago)

    C:\kafka\bin\windows>kafka-console-producer.bat --topic test2 --broker-list localhost:9092 < ..\data\sample1.csv
    "The system cannot find the path specified." How do I fix this error?

  • @rajibinus (2 months ago)

    Insightful explanation. Thanks for the video.