Clever Studies

The ‘Clever Studies’ KZread channel was formed by a group of experienced software professionals to help Big Data aspirants. We provide free content: software tutorials, mock interviews, study materials, resume-writing techniques, interview tips, knowledge sharing by working professionals, and more, all to help freshers, working professionals, and software aspirants get a job.

In addition, our subscribers also get the following benefits from ‘Clever Studies’:

  • Online software courses.
  • Internship opportunities.
  • Real-time projects.
  • Doubt-clearing sessions.

We try to post as many videos as possible on this channel to educate and help all software aspirants.

We generally upload our videos after 7:30 PM IST. If you are interested in this channel, make sure to subscribe and click the notification button so you never miss a video or post!

Contact us : [email protected]

Comments

  • @ABQ...
    6 days ago

    What file formats are used in this project?

  • @user-xb1wx3bt2i
    28 days ago

    Hi team, I am interested in the Databricks and PySpark project. How can I contact you?

  • @cleverstudies
    28 days ago

    Please visit www.cleverstudies.in

  • @user-xb1wx3bt2i
    28 days ago

    How can one enroll in upcoming real-time projects in PySpark and Databricks?

  • @tanushreenagar3116
    1 month ago

    Perfect video, sir.

  • @srikanthkaredla
    1 month ago

    Super explanation. Thanks so much.

  • @cleverstudies
    1 month ago

    You are welcome!

  • @dn9416
    1 month ago

    Thanks for the video

  • @Amarjeet-fb3lk
    1 month ago

    If there are 5 cores per executor, then at shuffle time Spark creates 200 partitions by default. How are those 200 partitions created when there are fewer cores, given that 1 partition is stored on 1 core? Suppose my config is 2 executors with 5 cores each. How will it create 200 partitions if I do a group-by operation? There are 10 cores, and 200 partitions need to be stored, right? How is that possible?
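A partition is a unit of work (one task), not something pinned permanently to a core, so more partitions than cores simply means the tasks run in several rounds. The arithmetic can be sketched as follows; this is a minimal illustration using the figures from the question, and `shuffle_waves` is a hypothetical helper, not a Spark API.

```python
import math

def shuffle_waves(num_partitions: int, executors: int, cores_per_executor: int) -> int:
    """Return how many sequential rounds of tasks are needed for all partitions."""
    # Each core is one task slot, so this many tasks can run at the same time.
    task_slots = executors * cores_per_executor
    # Remaining tasks wait in the queue until a slot frees up.
    return math.ceil(num_partitions / task_slots)

# Figures from the question: 2 executors x 5 cores, 200 default shuffle partitions.
waves = shuffle_waves(num_partitions=200, executors=2, cores_per_executor=5)
print(waves)  # 20
```

With 10 task slots, the 200 shuffle partitions are processed roughly 10 at a time. The default of 200 comes from `spark.sql.shuffle.partitions`, which can be lowered for small clusters.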

  • @Amarjeet-fb3lk
    1 month ago

    What is the use of giving each core 512 MB if the block size is 128 MB? Each block is processed on a single core, so if each block is 128 MB, why should we give 512 MB to each core? Won't that waste memory? Am I right? Please explain this. Thanks.

  • @kalyanreddy496
    2 months ago

    Requesting a video on off-heap memory (non-JVM heap memory).

  • @ManojKumar-yc1qy
    2 months ago

    Great explanation, but I have a doubt about the driver node: how does it get created in cluster mode? Does it contact the cluster manager to get a worker node and start the driver there, or how does it work?

  • @balakrishna61
    2 months ago

    The explanation is not clear. It does not explain what serialization and deserialization are.

  • @avinash7003
    2 months ago

    Spark context vs Spark session
    Difference between RDD, DataFrame and Dataset in Spark
    What is on-heap memory
    What is off-heap memory
    What is the garbage collector
    Explain Spark internal architecture
    Difference between Spark cluster mode and client mode
    How does Spark do memory management
    What is a driver out-of-memory exception and how to fix it
    What is an executor out-of-memory exception and how to fix it
    What are transformations and actions in Spark
    Difference between narrow and wide transformations
    What is fault tolerance in Spark
    What is lazy evaluation in Spark
    Can one Spark application have multiple Spark sessions
    What is the Spark directed acyclic graph (DAG)
    What are Spark application, job, stages and tasks
    How to calculate the number of CPU cores required to process data in Spark
    How to calculate the number of executors required to process data in Spark
    How much memory each executor requires to process data in Spark
    How to calculate the total memory required to process data in Spark
    How to set up Spark configuration for a cluster
    Managed tables vs external tables
    Temporary view vs global temporary view
    What is a materialized view
    Types of slowly changing dimensions
    How to create a DataFrame by reading different file formats (CSV, JSON, Parquet, etc.)
    How to create a DataFrame out of a Hive table
    How to write a DataFrame
    Explain the concept of lazy evaluation in Spark and its significance
    What is predicate pushdown in Spark
    What is a sort-merge join
    How can you perform a broadcast join
    What are partitioning and bucketing
    Cache vs persist
    Storage levels of persist
    Repartition vs coalesce
    How to create a new column in a table using PySpark
    How to remove duplicates in Spark
    How to fill null values in Spark
    How can you select specific columns from a Spark DataFrame
    How can you rename a column in a Spark DataFrame
    How do you perform a groupBy operation in Spark
    How can you join two Spark DataFrames
    Explain the use of the StructType and StructField classes in Spark with an example
    What is incremental load? How to implement it?
    Can you discuss the role of Structured Streaming in Spark
    What is Databricks Unity Catalog?
    What is the difference between with and without Unity Catalog?
    What is the difference with and without catalog
    What are RLS and CLS in Databricks
    What is role-based access control
    Why is Unity Catalog better than the Hive metastore
    What are the different roles in Unity Catalog
    What is medallion architecture
    What is Delta Lake
    What is a Delta table
    What are the features of Delta tables
    What is lakehouse architecture
    Data warehouse vs data lake vs data lakehouse
    What is OPTIMIZE in Databricks and what does it do?
    Explain the Z-order function
    What is VACUUM in Databricks
    What is Auto Loader in Databricks
    What is Delta Live Tables in Databricks
    Types of Databricks clusters and their uses?

  • @RohitSharma-ny1oq
    2 months ago

    Nice, man

  • @user-wg4bh3rv5i
    1 month ago

    Thanks to you man

  • @rakeshpanigrahi577
    28 days ago

    Cool

  • @shivamchandan50
    2 months ago

    Please create a video on PySpark debugging and unit testing in PySpark.

  • @shivamchandan50
    2 months ago

    Please upload a video on debugging in PySpark.

  • @shivamchandan50
    2 months ago

    Please make a video on unit testing in PySpark.

  • @dineshughade6741
    2 months ago

    Zuper

  • @kingoyster3246
    2 months ago

    What if we have limited resources? What configuration would you recommend to process 25 GB (16 cores and 32 GB)?

  • @paulinaadamski8233
    1 month ago

    You would have to choose between an increased partition size or lowered parallelism with an increased number of partitions.

  • @pallavikatoch7233
    2 months ago

    Can you share the link to the sessions that provide the above explanation (if they are not private/paid)?

  • @cleverstudies
    2 months ago

    See our 'Master in Databricks' course. Please visit www.cleverstudies.in for more details.

  • @ranjithg7598
    2 months ago

    What does the node manager do in this architecture?

  • @arindamnath1233
    2 months ago

    Wonderful Explanation.

  • @rockroll28
    2 months ago

    Hi, where can I find the explanation of Spark that you mentioned in the video? Is there a playlist on this channel, or are there private classes?

  • @cleverstudies
    2 months ago

    In our 'Master in Databricks' course. Please visit www.cleverstudies.in for more details.

  • @HemantKumar-su1qt
    2 months ago

    Hi sir, I hope you are doing well. I am an enthusiastic fresher data engineer. I want to create a data engineering project using a one-month free subscription on Azure Cloud and show that project on my resume. If my one-month free subscription expires and the resources are exhausted, will my data engineering project disappear, or will I no longer be able to see it? Can I still show the project on my resume, and can the company see it even after the free subscription expires? I have nothing to show on my resume to the company. Thank you so much.

  • @HemantKumar-su1qt
    2 months ago

    Hi sir, I hope you are doing well. I am an enthusiastic fresher data engineer. I want to create a data engineering project using a one-month free subscription on Azure Cloud and show that project on my resume. If my one-month free subscription expires and the resources are exhausted, will my data engineering project disappear, or will I no longer be able to see it? Can I still show the project on my resume, and can the company see it even after the free subscription expires? Thank you so much.

  • @priyankukashyap7650
    2 months ago

    There is a mistake in the right join here. Since we are doing a right join, IDs 108 and 109 will also appear. They won't be null.
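The point about right joins can be illustrated with a small plain-Python sketch. The tables below are hypothetical (the video's actual data is not shown in this thread); only the IDs 108 and 109 come from the comment. `right_join` is an illustrative helper, not a Spark function.

```python
def right_join(left, right, key):
    """Keep every row of `right`; fill left-only columns with None when unmatched."""
    # Index the left rows by join key for quick lookup.
    left_by_key = {}
    for row in left:
        left_by_key.setdefault(row[key], []).append(row)
    # Columns that exist only on the left side (besides the key itself).
    left_cols = [c for c in (left[0] if left else {}) if c != key]

    result = []
    for r in right:
        matches = left_by_key.get(r[key])
        if matches:
            for l in matches:
                result.append({**l, **r})
        else:
            # No match: left columns are null, but the key still comes from the right row.
            result.append({**{c: None for c in left_cols}, **r})
    return result

# Hypothetical sample data.
employees = [{"id": 107, "name": "Asha"}]
salaries = [{"id": 107, "salary": 50}, {"id": 108, "salary": 60}, {"id": 109, "salary": 70}]

for row in right_join(employees, salaries, "id"):
    print(row)
```

In this sketch, rows 108 and 109 appear with their IDs intact and only `name` set to `None`, which matches the commenter's expectation for a right join.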

  • @Arumugam-fo6vj
    2 months ago

    Can I attend this mock interview?

  • @user-dv1ry5cs7e
    3 months ago

    rdd.flatMap(lambda x: x.split(' ')).map(lambda x: (x,1)).groupByKey().mapValues(sum).collect()
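The one-liner above is a word count. As a rough sketch of what each stage does, the same steps can be written in plain Python without Spark; the input lines are hypothetical and `word_count` is an illustrative helper only.

```python
from collections import defaultdict

def word_count(lines):
    # flatMap: split every line into words and flatten into one list
    words = [w for line in lines for w in line.split(' ')]
    # map: pair each word with a count of 1
    pairs = [(w, 1) for w in words]
    # groupByKey: collect all the 1s that belong to the same word
    grouped = defaultdict(list)
    for word, one in pairs:
        grouped[word].append(one)
    # mapValues(sum): add up the collected 1s for each word
    return {word: sum(ones) for word, ones in grouped.items()}

sample = ["spark is fast", "spark is fun"]  # hypothetical input
print(word_count(sample))
```

In Spark itself, `reduceByKey(lambda a, b: a + b)` is usually preferred over `groupByKey().mapValues(sum)` for this kind of sum, because it combines values on each partition before the shuffle.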

  • @shibhamalik1274
    3 months ago

    Is it that each core would take 4 × the partition size in memory?

  • @shibhamalik1274
    3 months ago

    There are 200 cores in total. Each core will use one partition at a time, so it will use 128 MB. Each executor has 4 cores, so each executor requires 4 × 128 MB, which is 512 MB. Where does the extra 4× multiplier come from? 😊

  • @bhanuprakashtadepalli7248
    2 months ago

    By default, to process a file on one core, we need memory equal to 4 times the file size.
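The sizing arithmetic discussed in this thread can be sketched as below. It assumes the figures quoted by commenters (25 GB of data, 128 MB partitions, 4 cores per executor) and treats the "4 times" multiplier as a rule of thumb from the thread, not a Spark setting. `size_cluster` is a hypothetical helper.

```python
import math

def size_cluster(data_mb, partition_mb=128, cores_per_executor=4, memory_factor=4):
    """Rule-of-thumb sizing as discussed in the thread (not a Spark API)."""
    partitions = math.ceil(data_mb / partition_mb)            # one task per partition
    total_cores = partitions                                  # full parallelism: one core per partition
    executors = math.ceil(total_cores / cores_per_executor)
    memory_per_core_mb = memory_factor * partition_mb         # the "4x" rule of thumb
    executor_memory_mb = cores_per_executor * memory_per_core_mb
    return {
        "partitions": partitions,
        "total_cores": total_cores,
        "executors": executors,
        "memory_per_core_mb": memory_per_core_mb,
        "executor_memory_mb": executor_memory_mb,
    }

plan = size_cluster(data_mb=25 * 1024)  # 25 GB, the data size mentioned in the thread
print(plan)
```

Under these assumptions the result is 200 partitions, 200 cores, 50 executors, 512 MB per core, and 4 × 512 MB = 2048 MB per executor, which is where the second factor of 4 in the question comes from: one is the memory multiplier per core, the other is the number of cores per executor.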

  • @Delchursing
    3 months ago

    Awesome!

  • @kamatchiprabu
    3 months ago

    Sir, I want to join the Job Ready Program. How do I join? The link is not enabled. Please help.

  • @cleverstudies
    3 months ago

    Sorry, we are not conducting CSJRP sessions at present. Please check our website www.cleverstudies.in for more details.

  • @flosrv3194
    3 months ago

    The gdrive is empty!! Like any other guy on YouTube earning money by making people waste their time.

  • @Cristian-tn4tm
    3 months ago

    The best option to install Cloudera Manager. I have tried a lot of options, but I could only install Cloudera Manager with this video.

  • @Fresh-sh2gc
    3 months ago

    In my company, the number of CPUs per executor is 5 minimum and 8 maximum.

  • @cleverstudies
    3 months ago

    It depends on the use case and resource availability.

  • @Fresh-sh2gc
    3 months ago

    @cleverstudies It depends on the cluster. We have a state-of-the-art data center worth over $1B that can support a high number of CPUs per executor.

  • @aditya9c
    3 months ago

    If the number of partitions is 200, and so is the number of cores required, then the core size is 128 MB, right? Then how does the core size turn into 512 MB in the 3rd block, making the executor 4 × 512?

  • @user-de4hv5bp6k
    1 month ago

    The memory for each core should be at least 4 times the data it is going to process (128 MB), so it should be roughly 512 MB at minimum.

  • @user-dv1ry5cs7e
    3 months ago

    For example, if you assign 25 executors instead of 50, each executor will have 8 cores and 25 × 8 tasks will run in parallel. That should also take only 5 minutes to complete the job, so how does it become 10 minutes? Can you please explain this point once again?

  • @vamshi878
    3 months ago

    Each executor should have 2-5 cores, so he is saying he is going to take 4. This number is fixed whether the data size increases or decreases.

  • @shivamchandan50
    3 months ago

    Please make a video on PySpark unit testing.

  • @anubhavsingh2290
    3 months ago

    Simple explanation. Great, sir 🙌

  • @cleverstudies
    3 months ago

    Thank you

  • @user-tl7sh4tm8x
    3 months ago

    How can the cluster manager make any of the nearest workers the application master, when the configuration of the master can be different? Won't it create the master on the machine that is configured for the master role, with memory allocated to the master depending on the type of task?

  • @user-nv6ho7uk8b
    3 months ago

    Hi, does the same apply if we are working in Databricks?

  • @bhanuprakashtadepalli7248
    2 months ago

    Yes, it's the same logic.

  • @yadi4diamond
    3 months ago

    You are simply superb.

  • @cleverstudies
    3 months ago

    Thank you 🙏

  • @ViickyPatiil
    3 months ago

    When will the new project come?

  • @raghavendra4508
    3 months ago

    Man, I have simply seen 17:13 minutes of junk. Why did you upload this, man?

  • @pankajchikhalwale8769
    4 months ago

    Please make a short video on the relationship between stage, node, executor, DataFrame/Dataset/RDD, core, partition, and task. I want to know what consists of what, and what contains what.

  • @vivekmerugu6711
    4 months ago

    Hi Naresh, thank you so much, this helped me a lot. ❤ Naresh, I ran my Spark application in cluster mode (YARN) on an EMR cluster. It fails with an exception saying the application master container failed 2 times and exited with code 137. This exception occurs for only one dataset that I process with the Spark application; for other datasets it works fine. The failing dataset has a large input payload (one record with 25,000+ characters). I tried increasing the driver memory and executor memory, and now I get an exception during deserialization of the input payload. Any suggestions on how to resolve this issue would be helpful, please.

  • @PavanKumar-vi7hd
    4 months ago

    Hi Naresh, your way of explaining is excellent. This is the first time I have understood Spark architecture in cluster mode so easily.

  • @cleverstudies
    4 months ago

    Thank you Pavan.❤

  • @2412_Sujoy_Das
    4 months ago

    Needed this one badly... Thanks Naresh

  • @shyammaths5705
    4 months ago

    Hi Naresh, I am interested in the course. Can I buy it now?

  • @cleverstudies
    4 months ago

    Yes you can. www.cleverstudies.in

  • @Jayanta135
    4 months ago

    Does the course have lifetime access?

  • @cleverstudies
    4 months ago

    Yes

  • @avinash7003
    4 months ago

    @cleverstudies Sir, this is the introduction part.

  • @thepuldarshana9056
    4 months ago

    How much will be charged to the card for the subscription?