2nd Data Engineering Interview | Apache Spark Interview | Live Big Data Interview

This video is part of the Spark Interview Questions Series.
Many subscribers have asked me to show what an actual Big Data interview looks like. In this video we cover what usually happens in a Big Data or Data Engineering interview.
There will be more videos covering different aspects of Data Engineering Interviews.
Here are a few Links useful for you
Git Repo: github.com/harjeet88/
Spark Interview Questions: • Spark Interview Questions
If you are interested to join our community. Please join the following groups
Telegram: t.me/bigdata_hkr
Whatsapp: chat.whatsapp.com/KKUmcOGNiix...
You can drop me an email for any queries at
aforalgo@gmail.com
#apachespark #sparktutorial #bigdata
#spark #hadoop #spark3 #bigdata #dataengineer

Comments: 63

  • @bhavaniv1721 · 3 years ago

    Thank you so much for sharing this kind of video; now I really understand how an interview happens 🙏

  • @johnsonrajendran6194 · 3 years ago

    I found this video really helpful, sir.... Please create more such videos 🙏

  • @rahulmaheshwari5582 · 3 years ago

    Very informative. Thanks for the video. 🙏

  • @vibhavaribellutagi9439 · 3 years ago

    Really helpful. thanks a lot for the video.

  • @ankurrunthala · 3 years ago

    I wish I had a senior like you so I could learn more ❤️ Nice questions ⁉️ ..... sir, you should also make the answers, especially for the problematic ones ❤️❤️❤️❤️

  • @priyankadhamija886 · 2 years ago

    I have seen so many videos, but yours are the best on all topics. Precise, and they cover almost all the interview questions.

  • @DataSavvy · 2 years ago

    Thanks Priyanka... I am happy that you like it

  • @yelururao1 · 3 years ago

    Hi sir.. Please do more videos like this..

  • @Chittaluri · 3 years ago

    Thanks a lot, team, and especially Harjeet; this video boosted my confidence for interviews. Please post more interview videos.

  • @DataSavvy · 3 years ago

    Thanks Sai... Yes I plan to create more videos

  • @prachigupta7688 · 3 years ago

    In the last question, to combine emp data on name and age and select a random location: if we use groupBy and collect_list, won't it create a list of all locations for each group of emp name and age? Wouldn't another function like max or first help in this scenario?
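The distinction this question raises can be sketched in plain Python (a stand-in for the Spark calls, not Spark itself; the sample rows are made up): collect_list keeps every location per (name, age) group, while a first-style aggregate picks exactly one.

```python
from collections import defaultdict

# Made-up sample rows standing in for the emp DataFrame
rows = [
    ("ansar", 30, "bang"),
    ("ansar", 30, "fsk"),
    ("ravi", 25, "pune"),
]

# Group by (name, age), like df.groupBy("name", "age")
groups = defaultdict(list)
for name, age, loc in rows:
    groups[(name, age)].append(loc)

# collect_list semantics: every location in the group survives
collected = dict(groups)

# first() semantics: one (here: first-seen) location per group
first_pick = {k: v[0] for k, v in groups.items()}

print(collected[("ansar", 30)])   # ['bang', 'fsk']
print(first_pick[("ansar", 30)])  # bang
```

So yes: collect_list yields the whole list per group, and an aggregate such as first or max is what collapses it to a single value.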

  • @nikhilv199138 · 3 years ago

    These types of videos are extremely helpful. If you could prepare a video about Scala interview questions, that would be a great help!!

  • @DataSavvy · 3 years ago

    Sure Nikhil... That is already in the plan... it is just difficult to find volunteers for mock interviews

  • @somasundaramvalliappan3851 · 3 years ago

    Can anyone please help me with sample resumes for Scala? It's very hard for me to find Scala resumes on the internet.

  • @Iamsatya_1 · 3 years ago

    Can we solve the sales problem using classification? i.e., we can train on our historical data with logistic regression and then predict the value of sales using an evaluate function on new data.

  • @mamamiakool · 3 years ago

    Hi Harjit, you are doing a great job for the community. Is there a way I can connect with you on LinkedIn or via email? Also, do you plan to conduct similar interviews about Spark Streaming/Kafka?

  • @anirbandatta2037 · 3 years ago

    Hi @Data Savvy, can you plan a senior-level interview, maybe with people who have 16-20 years of experience?

  • @user-fz4mz6ym6f · 1 year ago

    Some scenarios where you drop a schema but not the data: 1) reorganizing the database structure, 2) cleaning up unused schemas, 3) rebuilding the schema from scratch.

  • @graceindia3122 · 3 years ago

    Great video. But the candidate seems to have ETL/Informatica developer experience, not data engineer experience; he was not able to answer the major Spark questions. 😀 Still, a good initiative, Data Savvy; it helps me test my knowledge of Spark and big data, and I was able to answer many questions.

  • @raviyadav-dt1tb · 7 months ago

    Can you please provide AWS questions for data engineers? It will be helpful for us, thanks 🙏

  • @sachinchandanshiv7578 · 2 years ago

    Hi Sir, how important is it for a big data engineer to know Snowflake?

  • @usharani7125 · 2 years ago

    Harjit, please let me know if you are taking any training sessions.

  • @ghumredhanu6381 · 2 years ago

    Sir, can you make a Kafka community?

  • @anshusharaf2019 · 5 months ago

    For this scenario-based question, can we create an end-to-end pipeline using Kafka and a Power BI dashboard? e.g., connect to the database with a source connector, perform the business-level transformations in ksqlDB, store the results in a Kafka topic, and then connect Power BI for the dashboard. @dataSavvy or someone, can you check whether my thinking is right?

  • @user-fz4mz6ym6f · 1 year ago

    Based on the problem statement, my answer would be: first use Apache Kafka or Amazon Kinesis to handle the streaming data and dump it into AWS S3, since S3 acts as a data lake. Then do the essential processing with Apache Spark and ingest the data into AWS Redshift or another data warehouse, using AWS Glue for ETL.

  • @phanidbd7284 · 3 years ago

    Great, thanks... Can you please create a video with answers to these questions? It would really help... Or add your comments at the end of the video.

  • @DataSavvy · 3 years ago

    That's a good suggestion... Let me look into this

  • @yadlapallipriyanka9000 · 3 years ago

    Hi Harjeet... I am Priyanka and I would like to volunteer for a mock interview on Big Data.

  • @rashmidogra7792 · 3 years ago

    What is the SQL question he is asking? I could not understand it completely.

  • @rajasekharreddy7624 · 3 years ago

    Hi DataSavvy, please let me know when you are free to discuss the mock interview with me.

  • @atanu4321 · 3 years ago

    Good initiative, Data Savvy; this will be really helpful for those who are preparing for interviews. One suggestion: create a follow-up video where you explain which answers the candidate got right or wrong, and what the correct answers should be to gain more acceptance from the interviewer; a kind of analysis of this interview.

  • @DataSavvy · 3 years ago

    Thanks Atanu... I will plan for that... Your suggestion is very valuable

  • @nikhilv199138 · 3 years ago

    Exactly

  • @DataSavvy · 3 years ago

    Point noted... If any of you can volunteer, it will help me create these kinds of videos...

  • @omkarjoshi3750 · 3 years ago

    @@DataSavvy Hello sir, I am interested in volunteering. But I am a fresher (2020 pass-out). If that is OK then I can volunteer.

  • @manojkumar-oc1sp · 3 years ago

    @@DataSavvy I am interested in volunteering.

  • @nitinm1473 · 3 years ago

    What is the function for grouping by distinct values and selecting a random value from another column?

  • @ashutoshsamanta4244 · 3 years ago

    You can use first()

  • @prabhaker9031 · 3 years ago

    @@ashutoshsamanta4244 Ah, thanks man. I was trying to use a max filter and whatnot.
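The first()/max() idea from this thread can be tried in plain SQL. A minimal sketch using SQLite in place of Spark SQL (the data mirrors the interview's user_info example; GROUP BY aggregate semantics are the same here): MAX() collapses each (name, age) group to one deterministic location.

```python
import sqlite3

# In-memory DB standing in for the interview's user_info table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_info (user_name TEXT, user_age INT, user_loc TEXT)")
conn.executemany(
    "INSERT INTO user_info VALUES (?, ?, ?)",
    [("ansar", 30, "bang"), ("ansar", 30, "fsk"), ("ravi", 25, "pune")],
)

# One row per (user_name, user_age) group; MAX() picks a single location
rows = conn.execute(
    "SELECT user_name, user_age, MAX(user_loc) "
    "FROM user_info GROUP BY user_name, user_age "
    "ORDER BY user_name"
).fetchall()
print(rows)  # [('ansar', 30, 'fsk'), ('ravi', 25, 'pune')]
```

In Spark SQL the same shape works with first(user_loc) instead of MAX(user_loc) when any value from the group is acceptable rather than the largest.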

  • @ansarhayat6276 · 3 years ago

    1. What are you commonly working on at the moment?
    2. What type of problems have you faced in your current task?
    3. While loading data into the data lake, which challenges did you face?
    4. How do you handle incremental data: by batch or by stream? What size of data do you process daily?
    5. Scenario: sales data grouped by product category per hour; the report needs half historical plus half real-time data.
    6. Which tools could you use for the above scenario? Kafka, Event Hub?
    7. How do you do transformations in Kafka?
    ***Hive***
    8. How do Hive external and internal tables differ? Give a use case.
    9. When do you use static vs. dynamic partitioning in a Hive table?
    10. A daily transactional table has year and date columns and we can partition on either; is that static or dynamic partitioning? Solution: partition on the date column, which is dynamic; each day's data lands in that day's partition.
    ***Spark***
    11. Why and which language do you use with Spark? Describe its benefits.
    12. Do you use DataFrames and Datasets? Any runtime errors with them? Give an example.
    13. What are Spark encoders?
    14. To process 1 TB of data with Spark, how do you distribute memory across cores, the driver, and the executors?
    15. What is the difference between a Scala case class and a regular class?
    ***DB***
    16. Have you worked with any non-relational DB?
    17. Given a table with three columns, show one row per group using grouping:

    CREATE DATABASE big_data;
    USE big_data;
    CREATE TABLE user_info (user_name NVARCHAR(255), user_age INT, user_loc NVARCHAR(255));
    INSERT INTO user_info (user_name, user_age, user_loc) VALUES ('ansar', 30, 'bang'), ('ansar', 30, 'fsk');
    SELECT * FROM user_info;
    SELECT DISTINCT user_name, user_age FROM user_info;
    SELECT DISTINCT user_name, user_age, user_loc FROM user_info GROUP BY user_name, user_age;

  • @0yustas0 · 3 years ago

    Just for fun with Hive:

    SET hivevar:rnd = CAST(ROUND(RAND()) AS INT);
    SELECT user_name, user_age,
           collect_list(user_loc)[${hivevar:rnd}] AS c1,
           MAX(user_loc) AS c2
    FROM user_info
    GROUP BY user_name, user_age;

  • @ashokkodari5042 · 3 years ago

    What would be the use case for dropping the schema instead of truncating the complete table? Only to restore the data in the future, or is there another major reason?

  • @RakeshGupta23 · 3 years ago

    The major use case for dropping the schema, or creating an external table, is when you have a storage area outside your Hadoop cluster, e.g. the client wants data stored in S3 or in MongoDB.

  • @manojkumar-oc1sp · 3 years ago

    @@RakeshGupta23 Thanks bro.. One more question: what will happen if we delete the external table's file folder?

  • @DataSavvy · 3 years ago

    This is usually done when more than one team is consuming the same data, each using different tech to consume it

  • @RakeshGupta23 · 3 years ago

    @@manojkumar-oc1sp You mean you keep the external table schema but delete the folder and files from HDFS? In that case you won't be able to access the data: when you create a database or table, under the hood it is always a file or folder in HDFS.

  • @ashokkodari5042 · 3 years ago

    @@RakeshGupta23 Thanks, Rakesh, for your quick response
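The behavior discussed in this thread can be sketched in HiveQL (the S3 location is a hypothetical placeholder, not from the video): for an EXTERNAL table, DROP TABLE removes only the metastore entry, and the files under LOCATION stay put, so other teams and tools can keep reading them.

```sql
-- External table: Hive owns only the schema, not the files
CREATE EXTERNAL TABLE user_info (
  user_name STRING,
  user_age  INT,
  user_loc  STRING
)
STORED AS PARQUET
LOCATION 's3://my-bucket/user_info/';  -- hypothetical path

-- Removes the metastore entry only; files under LOCATION are untouched
DROP TABLE user_info;
```

By contrast, dropping a managed (internal) table deletes the underlying files as well, which is why the data-sharing scenario above calls for external tables. And as noted in the reply: if you instead delete the HDFS/S3 folder while keeping the schema, queries will find no data.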

  • @paul4367 · 3 years ago

    Can you call some data analysts for a mock interview too, please?

  • @DataSavvy · 3 years ago

    I am finding it difficult to get volunteers... Let me explore that

  • @ashleylemos3977 · 3 years ago

    @@DataSavvy I would love to give mock interviews for all of your data engineering questions, in case you are looking for candidates with 12+ years of experience in PySpark, AWS, Spark SQL, Jenkins CI/CD, Glue, Kafka, Python, Hive, Athena, Presto, Bash, Airflow, Nifi.

  • @nikhilmishra7572 · 3 years ago

    @30.02 what would be the solution? HAVING count(*) > 1 after GROUP BY?

  • @ashutoshsamanta4244 · 3 years ago

    Use first() on the column

  • @uditsethia7 · 2 years ago

    LAMBDA ARCHITECTURE

  • @awanishkumar6308 · 3 years ago

    Sir, the person in the red t-shirt is making the interview questions very uninteresting, even though Spark itself is very interesting in its concepts and working principles. Sorry, but you are ruining the interest in learning.
