Hello Everyone,
My name is Manish Kumar and I am currently working as a Data Engineer @Jio.
If you want to connect with me, reach out at:
topmate.io/manish_kumar25
On this channel, I upload videos related to Data Engineering. I have uploaded a few podcasts too.
If you are looking for a Data Engineering roadmap, go to my video titled "How I bagged 12 offers". I have explained my strategy in that video.
I hope I am adding some value to your Data Engineering career through these videos.
Comments
Hello Manish, could you please share the practical video link, because I am not able to see it.
Can we use "as" with col() instead of alias()?
Manish bhai, the default storage level of cache() is MEMORY_ONLY; please check the Spark documentation once.
Thank you so much, bhaiya, for explaining so nicely. But I got scared hearing the last line, when you said, "Things will get so complex later that you will have to go back to the basics to learn."
How should we practice PySpark and Spark? Please share resources and some road plan for practicing questions.
Why didn't you complete the playlist?
@manishkumar bhaiya, you didn't answer some of the questions you mentioned at the start, like running a backdated job, or checking whether a df is empty or not!
Thanks, brother, for the course. You are doing a great job.
I hope you are fine.
from loguru import logger  # import added: the original snippet used logger without importing it

user_number = int(input("Enter the number to check if it is Even or Odd: "))
result = user_number % 2
if result == 0:
    logger.info(f"{user_number} is an Even number")
else:
    logger.info(f"{user_number} is an Odd number")
Could you please make a video on dev , test and prod environment for data engineering projects
from pyspark.sql.window import Window
from pyspark.sql.functions import col, round, sum  # imports added (note: these shadow Python's built-in round/sum)

window = Window.partitionBy("product_id")
product_sales_df = product_df.withColumn("total_sales_product_wise", sum(col("sales")).over(window))
product_sales_df.withColumn("percentage_of_sales", round((col("sales") / col("total_sales_product_wise")) * 100, 2)).show()
from loguru import logger

_1st_labour = "Mahesh"
_2nd_labour = "Mithilesh"
_3rd_labour = "Ramesh"
_4th_labour = "Sumesh"
_1st_labour_wage = 500
_2nd_labour_wage = 400
_3rd_labour_wage = 400
_4th_labour_wage = 300

# Q1
logger.info(f'labour names are: {_1st_labour} {_2nd_labour} {_3rd_labour} {_4th_labour}')

# Q2
logger.info(f'labour name and wages are 1st labour: {_1st_labour} {_1st_labour_wage} 2nd labour: {_2nd_labour} {_2nd_labour_wage} 3rd labour: {_3rd_labour} {_3rd_labour_wage} 4th labour: {_4th_labour} {_4th_labour_wage}')

# Q3
paragraph = "\"\"\" Programming aasan hai. We are going to learn this in depth. While learning we have to make sure that \
we are implementing all the logics by ourselves. The aim here is to build our \"4 BHK\" house with the \
help of 'Python programming'. We have total land of \\100 ft * 100ft /, to complete the house \
we have total 6 labours with 'different skill set like \"\\\\ building wall or building roof \\\\\". \
I have to print this paragraph as it is given here.\"\"\""
lines = paragraph.splitlines()
for i, line in enumerate(lines, start=1):
    print(f"Line {i}: {line}")

# Q4
# NameError: raised when we try to use a variable name which is not defined.

# Q5
# High level: a language which is easily understood by humans and is translated for the computer to run.

# Q6
# A compiled language converts all the code into machine code in one shot and then runs it (e.g. Java, C++).
# An interpreted language runs the code line by line (e.g. Python).

# Q7
print(f'{id(_1st_labour)} {id(_2nd_labour)} {id(_3rd_labour)} {id(_4th_labour)}')
print(f'{id(_1st_labour_wage)} {id(_2nd_labour_wage)} {id(_3rd_labour_wage)} {id(_4th_labour_wage)}')
# _2nd_labour_wage & _3rd_labour_wage have the same value, so they may be stored at the same memory location.
Sir, please start DSA.
What kind of projects have you worked on as a Jio data engineer? Please tell me.
Please help: what kind of projects are there at Jio?
Sir, should we first watch the theory videos and then the practical ones, or watch both side by side (theory + practical)?
After the first 5 theory videos, do both playlists in parallel.
What is meant by spilling to disk? Do you mean a storage device like an SSD or HDD, or is the spill kept in RAM itself?
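(For context: spilling means writing data that no longer fits in execution memory out to the executor machine's local disk, i.e. the directories set by `spark.local.dir`, not to RAM. A hedged sketch of the related settings; the values and the `app.py` name are placeholders, not recommendations:)

```shell
# Illustrative spark-submit flags related to memory and spill location.
# All values and the application file name are placeholders.
spark-submit \
  --conf spark.executor.memory=8g \
  --conf spark.memory.fraction=0.6 \
  --conf spark.local.dir=/mnt/scratch \
  app.py
```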
Great session, bhai. Keep it up!! Thanks for creating the video.
Solution for the LeetCode question:
Scenario 1: keeping the null values as they are:
lcode_df.select("Name").where(col("Ref_name") != 2).show()
Scenario 2: filling our own value in place of the nulls:
lcode_df.withColumn("Ref_name", when(col("Ref_name").isNull(), lit('3')).otherwise(col("Ref_name"))).filter(col("Ref_name") != 2).show()
Solution for the last question:
emp_df.groupby("emp_dept", "emp_country").agg(sum("emp_salary")).where((col("emp_country") == "india") & (col("emp_dept") == "IT")).display()
Great way to explain complex topics.. Keep it Up !! Thank you so much !!!
Sir, when will the new videos come?
Hi Manish sir, I'm getting an out-of-memory error (Java heap space) while submitting/writing a DataFrame to HDFS. I partitioned the data, then applied bucketing on it, then coalesced it as an optimization, and the error occurs while writing to HDFS. I have increased driver memory and memory overhead, but the problem is still the same.
The Spark code can be written in Scala itself, right? Will we need the Application Driver even if the code is written in Scala?
Thank you
Also, I have tried lots of steps, but it is still showing an error in my Jupyter notebook. Please suggest how to resolve it.
All your sessions are very interesting. I learnt PySpark from them and am now learning Python, thank you so much Manish sir... I have a doubt in this lecture: when I import the logger library it shows an error: from loguru import logger -> ModuleNotFoundError: No module named 'loguru'
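(For anyone hitting this: `ModuleNotFoundError` just means the package is not installed in the environment the notebook uses. Assuming a standard pip setup, installing it usually resolves the error:)

```shell
# Install loguru into the same environment your Jupyter kernel uses
pip install loguru
```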
What is configuration?
Also, please explain bucketing and partitioning.
What a series! You have set it on fire...
What if the join was a non-equi join? With df1 = big table and df2 = small table: df1.join(df2.filter(week == 16), "left") or a cross join? Secondly, what if the big table has filters?
Hi Manish, can you create a course on DSA in Python for data engineers?
Sir, when will you cover OOPs in Python?
Bhaiya, are both of your series enough for interviews?
Bhaiya, why are no videos coming? Please make a video regarding Databricks pipelines and dataflow.
How should one prepare for SQL?
How to prepare for the DSA round? Which are the important questions? On LeetCode there are thousands of questions; which ones should we do?
Excellent. Exactly what I needed, and I got it from your video. Thank you😊
where is the CSV file ?
Thank you, Manish Bhai. Waiting for architect-level and DB modeling question examples.
Congratulations Anna
You are an excellent teacher, you make lectures so interesting! With answers like these, we could teach the interviewer :D
Very nice explanation.
Traditional drivers and executors aren't available in the local environment because only a single JVM is present, and tasks are executed in parallel across threads within that JVM.
Hi Manish, please find below the code for the % sales per month for the last 6 months:
from pyspark.sql.window import Window
from pyspark.sql.functions import col, sum  # imports added (note: this shadows Python's built-in sum)

window = Window.partitionBy("product_id").orderBy("sales_date").rowsBetween(Window.unboundedPreceding, Window.unboundedFollowing)
last_month_df11 = product_df.withColumn("total_sales", sum(col("sales")).over(window))\
    .withColumn("percent_sales", (col("sales") / col("total_sales") * 100))\
    .show()
Hello Manish Sir, can you please help me with how to generate a key, IV, and salt? Also, how do I store a secret key and access key in encrypted form?
Driver Memory: 8 GB
Executor Memory: 16 GB
Number of Executors: 6 (assuming you have a cluster with sufficient resources)
Cores per Executor: 4