data engineer interview questions
In this video I talk about salting in Spark.
Directly connect with me on:- topmate.io/manish_kumar25
Discord channel:- / discord
Project details for resume :-
Successfully led a data engineering project in a retail environment using technologies such as Apache Spark, Python, SQL, and Amazon S3 to optimize data processing.
Implemented structured data models, including dimension and fact tables, to provide valuable context for point-of-sale data analysis.
Designed and executed an incentive program based on sales performance, enhancing motivation among sales teams by rewarding top performers.
Managed extensive daily data volumes of approximately 100GB, demonstrating the ability to handle large-scale data pipelines.
Employed Spark optimization techniques like caching and broadcast joins to improve data processing speed and efficiency.
Utilized Azure CI/CD pipelines for code deployment, and orchestrated workflows using Airflow and CRON jobs.
Detailed writeup to explain more during interview:-
As a Data Engineer on a project for a prominent offline grocery and kitchen supplies retailer, I applied my expertise in data engineering to drive critical improvements in their data processing and analysis operations.
The project primarily focused on processing and analyzing point-of-sale data, which was structured into dimension and fact tables to provide meaningful context for sales analysis. To further enhance employee motivation and performance, we designed and implemented an incentive program that rewarded salespeople with the highest sales volumes in each store.
Handling a substantial daily data volume of approximately 100GB, we leveraged Apache Spark and applied optimization techniques like data caching and broadcast joins to significantly accelerate data processing. This not only improved the speed of our data pipelines but also increased the efficiency of our data analysis.
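To make the broadcast-join idea easier to explain in an interview, here is a minimal plain-Python sketch (not Spark code). It assumes a small dimension table that fits in memory and a larger fact table processed partition by partition. The sample data and the names `broadcast_hash_join`, `store_dim`, and `sales_partitions` are hypothetical, made up only for illustration.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical sample data: a small dimension table (store_id -> store name)
# and a larger fact table split into partitions of (store_id, amount) rows.
store_dim: Dict[int, str] = {1: "Store A", 2: "Store B"}
sales_partitions: List[List[Tuple[int, float]]] = [
    [(1, 120.0), (2, 35.5)],
    [(1, 10.0), (3, 99.0)],  # store_id 3 has no match in the dimension table
]


def broadcast_hash_join(
    fact_partition: Iterable[Tuple[int, float]],
    small_table: Dict[int, str],
) -> Iterator[Tuple[int, str, float]]:
    """Inner-join one fact partition against an in-memory copy of the small table.

    Every partition sees the whole small table, so the large side is never
    shuffled by key; each row is matched with a local dictionary lookup.
    """
    for store_id, amount in fact_partition:
        name = small_table.get(store_id)
        if name is not None:  # inner join: drop rows with no matching key
            yield (store_id, name, amount)


# Each partition is joined independently, the way separate tasks would do it.
joined = [
    row
    for partition in sales_partitions
    for row in broadcast_hash_join(partition, store_dim)
]
print(joined)
```

The point of the sketch is the shape of the work: the small table is copied to wherever the big table's rows already are, so the join becomes a lookup instead of a shuffle. This only pays off while the small side comfortably fits in memory.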
We seamlessly integrated the code deployment process into the Azure CI/CD pipeline. As part of workflow automation, we orchestrated task scheduling using Airflow and CRON jobs.
One of the project's major achievements was the implementation of a customer engagement strategy that identified infrequent buyers and provided incentives in the form of coupons. This initiative not only boosted customer retention but also had a positive impact on the overall business growth.
For more queries, reach out to me on the social media handles below.
Follow me on LinkedIn:- / manish-kumar-373b86176
Follow Me On Instagram:- / competitive_gyan1
Follow me on Facebook:- / manish12340
My Second Channel -- / @competitivegyan1
Interview series Playlist:- • Interview Questions an...
My Gear:-
Rode Mic:-- amzn.to/3RekC7a
Boya M1 Mic-- amzn.to/3uW0nnn
Wireless Mic:-- amzn.to/3TqLRhE
Tripod1 -- amzn.to/4avjyF4
Tripod2:-- amzn.to/46Y3QPu
camera1:-- amzn.to/3GIQlsE
camera2:-- amzn.to/46X190P
Pentab (Medium size):-- amzn.to/3RgMszQ (Recommended)
Pentab (Small size):-- amzn.to/3RpmIS0
Mobile:-- amzn.to/47Y8oa4 (You should definitely not buy this)
Laptop -- amzn.to/3Ns5Okj
Mouse+keyboard combo -- amzn.to/3Ro6GYl
21 inch Monitor-- amzn.to/3TvCE7E
27 inch Monitor-- amzn.to/47QzXlA
iPad Pencil:-- amzn.to/4aiJxiG
iPad 9th Generation:-- amzn.to/470I11X
Boom Arm/Swing Arm:-- amzn.to/48eH2we
My PC Components:-
intel i7 Processor:-- amzn.to/47Svdfe
G.Skill RAM:-- amzn.to/47VFffI
Samsung SSD:-- amzn.to/3uVSE8W
WD blue HDD:-- amzn.to/47Y91QY
RTX 3060Ti Graphic card:- amzn.to/3tdLDjn
Gigabyte Motherboard:-- amzn.to/3RFUTGl
O11 Dynamic Cabinet:-- amzn.to/4avkgSK
Liquid cooler:-- amzn.to/472S8mS
Antec Prizm FAN:-- amzn.to/48ey4Pj
Comments: 114
@Manish Kumar. All of your videos are more than a gem, if anything like that exists. I have 4-5 YOE and never got to learn Spark in such depth, with such clarity and such concise answers and questions. It is useful for 10 YOE as well, I can vouch for it. I have ADHD, but your videos are so engaging that I can sit with them for a long time. I have become interested in learning. You must be an extraordinary guy. Having knowledge is one thing; presenting it and putting it in such a simple manner is what sets you apart. It is very difficult to be simple. Thanks once again.
Brother, this video made me a fan of yours. "People don't have experience, and companies want experience" 🔥 🔥
What gem content, sir 🥺 Thank you so much for the in-depth video!
Thanks a lot, Manish brother, you listen to even individual requests. Big thanks to you. Loving your content.
Hey Manish, I'm extremely thankful to you and all of your playlists. Especially this video is super problem solver one! No one teaches in so much depth as you do. Thanks for taking out time to teach us!!😇🤗
Thanks for the amazing content. The Spark playlist is amazing.
Sir, you're amazing. No one has created content on this until now. I wish to see more of this type of content. As freshers, we need a clear idea of how a project works, and we should know how to explain the project to an interviewer.
Wow... Manish brother, I really loved this content. I would encourage you to make more videos like this.
Thanks for the session!!
Brother, salute to you for speaking so directly, clearly, and truthfully. I will definitely connect with you on Topmate after getting my data engineering job, to thank you! Hopefully I won't need to connect before that.
This whole PlayList helped a lot.💡
bhai bhai 🙌...ultimate video ❤❤
Thank You manish bhaiya
thank you so much bhaiya for amazing content 💝
Great, Manish. You don't keep saying "fake experience" in your videos. You're doing a great job.
Great video, Manish. What problems have you faced while doing your projects, and how did you resolve them? Please answer this question as an experienced person.
THANKSS VROOO LOVE U YR CLEARED INTERVIEW
Hey Manish! I am following all your playlists and content. I have given more than 30 interviews but have not been selected yet because of schedulers. If you could cover one of them, or a pipeline including Airflow or any other scheduler, it would be very helpful. Without scheduler knowledge, preparation is incomplete, because they ask about it in each and every interview. You explain very well, so I would like an explanation with your depth of knowledge. Thanks.
Amazing...!!!
amazing content..
most exciting video
Thank you
Thanks too much
It's a very good video.
Thanks I just completed the playlist
thank you sir
Eagerly waiting for the 2nd part, related to the last project. Please upload that one next time.
@manish_kumar_1
9 months ago
I did not get you. Everything I covered was related to the project I walked you through. Perhaps you haven't watched the full video, or I haven't understood the question.
nice content Sir
universal truth of industry you explained
Brother, please make a practical, detailed video on CI/CD using Azure DevOps/Databricks. That will be a great help.
What a content sir ❤most needed
Why do we need layers in a data warehouse? Can we put a for-each loop inside another for-each loop?
Make a video on coding questions and scenario questions (e.g. what if the repartition size increases, how to handle out-of-memory issues, and other questions that are likely to come up).
How do we analyse the source data in our project to find where we have to perform cleaning operations?
Brother, please share 2-3 challenges related to the cluster and to incoming data.
Brother, please make a video on AWS MapReduce...
How to do on-premise to cloud migration?
Brother, how many columns and rows do the tables have, and of what types? For example, cust_id and refund: please tell us how many columns there can be and of which types.
Shall I add personal project section along with work experience section in Resume for 2 YOE in DE ??
😂😂i thoroughly learnt and enjoyed this video
What problems have you faced while doing your projects, and how did you resolve them? Please answer this question as an experienced person.
Thank you so much, sir, for the great explanation. It was the best series I have found in my life. I just have one request: can you please make a video on the cluster manager YARN?
@manish_kumar_1
2 months ago
Noted
@prashantmehta2832
2 months ago
@@manish_kumar_1 Thanks sir..
Bro, if we haven't used Airflow for scheduling jobs and they ask a question on it, what do we do?
Manish, going through all of your videos, I realized almost all of the optimization is based on the number of rows. Do we have any optimization for when data grows in terms of columns?
Manish, awesome videos. Can you make some videos on AWS Glue jobs?
@manish_kumar_1
9 months ago
I don't know Glue.
I was asked this kind of questions in interview
Brother, I couldn't find this spark-submit config anywhere in the cloud when creating a cluster in Databricks. Is spark-submit only for those who work on-prem?
Platform metric used ?
Can we use this project for 3-4 years of experience?
Hello sir, firstly, thank you for this amazing content. Truly grateful. I request you to please make a video on real Azure data engineer project questions, combined with Databricks, to prepare for interviews. Please.
@manish_kumar_1
8 months ago
I have no idea about Azure services.
@Wandering_words_of_INFJ
8 months ago
@@manish_kumar_1 Okay sir. By the way, there don't seem to be any positions for just a PySpark developer. In future, will you make a video on skill sets, covering which skills to mention on a resume and what the relevant positions in the industry are?
As a fresher, if I mention a project in my resume, can I say I completed it in 1 or 2 months?
Please make a video for freshers (0 years of experience).
Can you make content about Kafka?
great work. informative video. love it. I have a question about the data you receive. Do you receive 100 GB of new data every day?
@manish_kumar_1
6 months ago
Not in every project, but in my last project I had that opportunity.
I'm getting an error while using Spark, please help. The error is like 'remote rpc client issue', due to an executor-lost failure / heartbeat issue. Please help.
Hi Manish, I got a call in which they are asking for experience in AWS Glue and PySpark. Please tell me how to incorporate Glue with PySpark.
@poojajoshi871
9 months ago
I know Spark; Glue is an ETL tool. So how do I use Spark with Glue?
Q.) Data skew is one example for which you do Spark optimization; apart from data skew, what else have you performed optimization for? Q.) What kind of issues have you faced in your project while working? I mean, I wanted a proper, systematic approach to this question. I have an idea of the topics, but when I think about it, the points seem scattered.
Hey Manish, can you make a video on an end-to-end data engineering project? It would be very helpful for understanding a data engineering pipeline.
@manish_kumar_1
9 months ago
You neither watched the full video nor checked the link added in the i button. I have already walked through a project, and the link was given as well.
@anketsonawane6651
9 months ago
@@manish_kumar_1 Sure Manish... I regret the wrong comment. I will surely check it out, and thanks for this amazing content ❤️
What is delta cache?
Do we need to know DSA, or should we focus on something else?
DSA interview series for Data Engineer kzread.info/head/PLqGLh1jt697wQTamFvXx_Odlm-Wg3zbxq&si=suGxMRqt-uoYkprY
How do you analyse your source data before you start cleaning?
@rawat7203
9 months ago
We will first remove the non-CSV files.
Read the correct files into a DataFrame using Spark.
Check whether these files have the mandatory columns; if not, remove those files.
If some of the files have extra columns, add a column called "extra column" and put all those columns there.
Now we have a DataFrame with all the correct data. To this DataFrame we join the dimension table DataFrame and create a final DF.
On this final DF we do the Spark processing to get the desired calculation.
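The first three steps of that answer can be sketched roughly as below. This is plain Python with the standard `csv` module standing in for Spark DataFrame operations, and it leaves out the dimension join and the final calculation. The file names, the `MANDATORY_COLUMNS` list, and the way extra columns are packed into one string are all assumptions made up for the example.

```python
import csv
import io
from typing import Dict, List

# Hypothetical inputs: file name -> file contents, plus the expected schema.
MANDATORY_COLUMNS = ["cust_id", "store_id", "amount"]
incoming_files: Dict[str, str] = {
    "sales_1.csv": "cust_id,store_id,amount\n101,1,250\n",
    "sales_2.csv": "cust_id,store_id,amount,coupon\n102,2,80,NEW10\n",
    "sales_3.csv": "cust_id,amount\n103,40\n",  # missing store_id
    "notes.txt": "not a data file",
}


def load_valid_rows(files: Dict[str, str]) -> List[Dict[str, str]]:
    """Keep rows from CSV files that carry every mandatory column."""
    rows: List[Dict[str, str]] = []
    for name, content in files.items():
        # Step 1: skip anything that is not a CSV file.
        if not name.lower().endswith(".csv"):
            continue
        reader = csv.DictReader(io.StringIO(content))
        header = reader.fieldnames or []
        # Step 2: skip files that lack a mandatory column.
        if any(col not in header for col in MANDATORY_COLUMNS):
            continue
        extra_cols = [col for col in header if col not in MANDATORY_COLUMNS]
        for record in reader:
            row = {col: record[col] for col in MANDATORY_COLUMNS}
            # Step 3: pack any extra columns into a single "extra_column" field.
            row["extra_column"] = ";".join(f"{c}={record[c]}" for c in extra_cols)
            rows.append(row)
    return rows


valid_rows = load_valid_rows(incoming_files)
for row in valid_rows:
    print(row)
```

Packing the extras into one field keeps every surviving file on the same schema, so the later join against the dimension table does not have to care which file a row came from.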
Sir, how can we talk to you personally?
Why did you stop uploading videos sir, please keep sharing.
@manish_kumar_1
8 months ago
Started again
Can we say our source and sink are the same, like Hadoop HDFS?
@manish_kumar_1
9 months ago
Yes
Why did you stop uploading videos ??? eagerly waiting for new video
@manish_kumar_1
9 months ago
I was out of station due to job requirements
Sir, please also cover Python and Spark coding questions now...
@manish_kumar_1
9 months ago
I have already covered them, company-specific, in one of the playlists.
Manish brother, the keywords like BWAC and MHCDM that you used in your resume, are those all your roles?
@manish_kumar_1
8 months ago
No, they are project names.
How to unbroadcast the DataFrame?
@manish_kumar_1
9 months ago
Set the broadcast threshold configuration (spark.sql.autoBroadcastJoinThreshold) to -1.
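As a rough illustration of that answer, the sketch below builds a `spark-submit` command line that passes the setting through `--conf`. The property `spark.sql.autoBroadcastJoinThreshold` and the `--conf` flag are real Spark names; the helper `spark_submit_command` and the script name `job.py` are hypothetical. Nothing is executed here, the command is only assembled and printed.

```python
from typing import Dict, List, Union

# Spark SQL setting that controls automatic broadcast joins; -1 disables them.
AUTO_BROADCAST_KEY = "spark.sql.autoBroadcastJoinThreshold"


def spark_submit_command(app_path: str, conf: Dict[str, Union[str, int]]) -> List[str]:
    """Assemble a spark-submit argument list with one --conf pair per setting."""
    cmd = ["spark-submit"]
    for key, value in conf.items():
        cmd += ["--conf", f"{key}={value}"]
    cmd.append(app_path)  # hypothetical application script
    return cmd


command = spark_submit_command("job.py", {AUTO_BROADCAST_KEY: -1})
print(" ".join(command))
```

Inside a running session the same property can be set with `spark.conf.set("spark.sql.autoBroadcastJoinThreshold", -1)`. As far as I know, this only stops Spark from choosing a broadcast join automatically based on table size; an explicit broadcast hint in the code is a separate matter.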
Sir, how much maths is required for a data engineer profile? Please reply.
@manish_kumar_1
9 months ago
It is not required.
How many nodes do we use in our project?
@manish_kumar_1
2 months ago
Nodes are used in a cluster. When a job is scheduled, we don't specify the number of nodes; rather, we specify the number of executors, and more than one executor can start on the same node.
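A small back-of-the-envelope sketch of that point: you ask for executors, and how many land on one node follows from the executor size. The node and executor sizes below are made-up numbers, and the calculation is deliberately simplified (it ignores memory overhead and whatever the OS and cluster manager reserve). The flags `--num-executors`, `--executor-cores`, and `--executor-memory` are real `spark-submit` options; `job.py` is a hypothetical script name.

```python
import math


def executors_per_node(
    node_cores: int, node_memory_gb: int, executor_cores: int, executor_memory_gb: int
) -> int:
    """How many executors of a given size fit on one node (simplified)."""
    by_cores = node_cores // executor_cores
    by_memory = node_memory_gb // executor_memory_gb
    return min(by_cores, by_memory)


# Hypothetical cluster: 16-core, 64 GB nodes; executors of 5 cores and 16 GB each.
num_executors = 6
per_node = executors_per_node(16, 64, executor_cores=5, executor_memory_gb=16)
nodes_needed = math.ceil(num_executors / per_node)

# The job itself only states executor counts and sizes, never a node count.
command = [
    "spark-submit",
    "--num-executors", str(num_executors),
    "--executor-cores", "5",
    "--executor-memory", "16g",
    "job.py",
]
print(f"{per_node} executors fit per node, so {num_executors} executors need {nodes_needed} nodes")
print(" ".join(command))
```

With these numbers cores are the limiting factor, so three executors share each node. Changing the executor size changes the node count without the job ever mentioning nodes.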
Are there still calls for big data on AWS?
@prabhatgupta6415
9 months ago
Are you not getting any?
@avinash7003
9 months ago
@@prabhatgupta6415 What is the present market like for AWS?
@prabhatgupta6415
9 months ago
I'm an Azure guy, sir. @@avinash7003
Sir, how much Python do we need to know?
Manish brother, the spelling of "related" is wrong in the thumbnail!
@manish_kumar_1
8 months ago
Oh, thanks for pointing it out
Please talk in English so that everyone can understand. And please give answers to the questions.
Is it the complete playlist?
@manish_kumar_1
8 months ago
Yes
@ruchim3448
8 months ago
@@manish_kumar_1 thank you.
One Like and One Comment.
Can a fresher become a data engineer?
@shubhamchavan9438
9 months ago
If you keep asking this question all year, you won't become one; but if you ask once and then practice all year, you will.
@amangurjar9714
9 months ago
Should I buy a course for data engineering, or should I prepare from KZread only and build a project online?
@rakeshverma6867
9 months ago
@@shubhamchavan9438
Why don't you talk in English 😢 for non-Hindi speakers 😊
Sir, I got placed as an Azure data engineer. It's all because of you, really, thank you for everything 🥹🥹 I would like to talk with you.
@manish_kumar_1
9 months ago
Congratulations, brother. Please ping me on LinkedIn or Insta. You will find the links to my social media handles in the description.
@likhithurs8597
7 months ago
Hi, hearty congratulations on your success 🙌
@simizcodding4487
7 months ago
Hey, I'd like to contact you... please drop your LinkedIn ID.
@surajpoojari5182
4 months ago
Congratulations bro
@chinnasaiprathapmeesala8977
3 months ago
Bro, can you share your interview preparation questions for Azure data engineer?