Data Engineering Course for Beginners
Learn the essentials of data engineering in this course for beginners. You'll learn about Databases, Docker, and analytical engineering. You'll explore advanced topics like data pipeline building with Airflow, and engage in batch processing with Spark and streaming data with Kafka. The course culminates in a comprehensive project, putting your skills to the test in creating a full end-to-end pipeline.
✏️ Justin Chau created this course.
Course Resources: transparent-trout-f2f.notion....
Thanks to Airbyte for providing a grant to make this course possible.
⭐️ Contents ⭐️
⌨️ (0:00:00) Introduction
⌨️ (0:00:36) Why Data Engineering
⌨️ (0:03:14) Docker
⌨️ (0:30:38) SQL
⌨️ (1:04:32) Building a Data Pipeline from Scratch
⌨️ (1:31:03) dbt
⌨️ (2:04:11) CRON Job
⌨️ (2:07:54) Airflow
⌨️ (2:41:14) Airbyte
⌨️ (3:01:54) Outro
🎉 Thanks to our Champion and Sponsor supporters:
👾 davthecoder
👾 jedi-or-sith
👾 南宮千影
👾 Agustín Kussrow
👾 Nattira Maneerat
👾 Heather Wcislo
👾 Serhiy Kalinets
👾 Justin Hual
👾 Otis Morgan
👾 Oscar Rahnama
--
Learn to code for free and get a developer job: www.freecodecamp.org
Read hundreds of articles on programming: freecodecamp.org/news
Comments: 249
We need a 60 hrs course for DataEngineer.
@dijik123
6 months ago
U make one
@AndyTutify
6 months ago
Check out the data engineering zoomcamp.
@pranjalshukla8096
6 months ago
😂@@dijik123
@himanish2006
5 months ago
@@StarLord-571
@dijik123
5 months ago
@@StarLord-571 it's not about the quantity but the quality
Amazing! Looking forward to it. Also, it would be perfect to have a more comprehensive version of this course as well. Covering all batch and stream processing tools and methods.
Good job, you are creating visibility for airbyte in a great way, by providing an evolutionary view of the stack that gets one to eventually need it. Hope they continue to support you making content using this approach.
Yes! I’ve been waiting for this.
YES THANK YOU SO MUCH keep expanding this!!!!
Long waiting for this course
Please add more data engineering course like this. I really love it.
If you're running into an error with "exit code 1" @1:28:55, you need to update the Dockerfile @1:08:08. Go to the Dockerfile and set "image: postgres:9.2" for both "source_postgres" and "destination_postgres". I think this was a good idea, but many details went off. I wish he would be more specific about the versions of the software he's using next time.
@marceltorres5680
4 months ago
Thanks, I wasted many hours with that error. I would appreciate it if you could expand on the reasons why this error happens. Again, thanks for your contribution.
@wallyhormi106
4 months ago
God bless you mate !
@1001pepi
4 months ago
Thanks a lot for the point. I spent a bit of time trying to solve the issue.
@MystyBoy
4 months ago
bro, the execution of my code is still running, (after I edited the part of the code you mentioned), but I believe in you! Thank you so much wallah you are a savior! 🔥🔥
@gregtsado9926
3 months ago
thanks mate
Finally a data engineering course!
Thank you for this! Learned a lot today. 👍🏾
Please bring a bigger course , which covers all aspects from basics , like from scratch , MySQL , python/Java/Scala , Hadoop Spark , pyspark , something that covers a data engineering with one cloud
@kandoras.guzman6705
6 months ago
yes!!
@dantemartinez3609
6 months ago
Yes, please. This would be so helpful but thanks for this resource, a great start.
@twelve7103
6 months ago
YES Please
@biebersucks27
5 months ago
Yes
@nandank262
5 months ago
Yes please
Thanks for the course. To all those using VMs: make sure the files are located on the VM. I tried running the Docker exercise with Docker on the VM and the files on the host (Windows), and wasted a lot of time resolving errors. I finally moved all the files to the Ubuntu VM, where things ran smoothly.
The best DE course that I have ever seen. Most courses only stick to the theory and never show the practical part.
Keep up the good work!
More bi and DE courses like this plz
Thank you for this. I like the way you teach it step-by-step and I always got lost with JOINS. Haha anyways thanks for this! Keep it up
We would like to see more content for Data Engineering, possibly a full course.
Yes! Thanks!
My man...Justin! You're a top crossfitter and I miss working out under your guidance
I just started and love the style. You teach fluent and set focus on the important take aways. I saw so much bullshit that I really expected that I need to watch you 10 minutes installing docker and already started skipping but you didn't show that part which is nice. Makes perfect sense. Someone not being able to RTFM and install docker on his own shouldn't focus on DE at this point anyway imho.
Thank you for the amazing video. Could you please make a second video including Data Engineering projects?
ok, this is super helpful.
Awesome!
My boy, Chau!
best video on the internet
This is excellent content
Thanks for the video
Love it!
Finalllyyyy an data project
Hey , currently there is no set path to becoming a data engineer. So please create a proper certification with a clear roadmap of foundations and most used cloud tech in data engineering so that we can get some structure going for those interested in this career.
thank you very much
At 1:53:56 you may face an error due to the fact that the dbt service is launched before the completion of the elt_script service. To solve the issue, you have to add condition: service_completed_successfully under the depends_on clause of the dbt service to be sure that it will always be launched after the completion of the elt_script service.
@UnrealityDesignsTM
2 months ago
Thank you so much.
@wusswuzz5818
2 months ago
This did not work for me, and I can't progress any further. Frustrating.
@mumukshapant
2 months ago
Thanks man, I was struggling with this for a while. depends_on: elt_script: condition: service_completed_successfully
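Written out with its indentation, the fix described in this thread might look like the fragment below. It is only a sketch: it assumes the service names dbt and elt_script used in the comments above, and it omits everything else in the dbt service (image, volumes, command), which would stay as it is in your own docker-compose.yaml.

```yaml
# docker-compose.yaml (fragment, to be merged into the existing file)
# Assumption: the services are named "dbt" and "elt_script" as in the comments.
services:
  dbt:
    # ... existing dbt settings stay unchanged ...
    depends_on:
      elt_script:
        # Start dbt only after elt_script has exited with status 0.
        condition: service_completed_successfully
```

With this long form of depends_on, Compose does not start dbt if elt_script exits with an error, so when dbt never runs it is worth checking the elt_script logs first.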
Since when is Justin a data engineer? Well, I guess the constant learning is real.
@briabytes
6 months ago
I’ve been following him since I started coding (almost 4 years now). He’s always learning and growing.
@JREQuickPods
6 months ago
@@briabytes which channel ?
@kgahlisomkwanazi9149
6 months ago
@@briabytes Please share his YT channel.
@briabytes
6 months ago
@@JREQuickPods he codes and games on his twitch channel and he has a YouTube channel (you can search his name) where he shares his experiences. twitch.tv/justinbchau. I watched him make some of this course a few months back on twitch.
@briabytes
6 months ago
his twitch is in the other comment
I really need a longer video.
I like to see more of your content, good explanation skills
Why is dbt using the port 5434 to communicate with the destination database? Both containers are running in the same docker network so why do we need to use the exposed port number?
Here I am thinking about data engineering, then boom! YouTube shows me a data engineering course 😅
@ruirodrigues2938
5 months ago
You probably googled
A piece of advice: it doesn't make sense to read and retype the code from your top monitor. Save your time and the viewers' and just copy-paste what's there and explain it line by line.
If you get the "pg_dump: error: aborting because of server version mismatch" error: I found that running "apt-get update && apt-get install -y postgresql-client" actually installs version 15, while "postgres:latest" pulls version 16. To fix this, set your image to "postgres:15" for both source_postgres and destination_postgres in the docker-compose.yaml. I also changed the RUN command in my Dockerfile to "apt-get update && apt-get install -y postgresql-client-15" to be explicit.
@vohoang6693
2 months ago
Thank you. I spent 30 minutes on this error.
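As a rough illustration of the version pin described above, the two database services could be declared as follows. The service names come from the comment; the environment variables, ports and volumes of each service are left out here and are assumed to stay as they are in your own file.

```yaml
# docker-compose.yaml (fragment)
# Pin both databases to the same major version as the pg_dump client
# installed in the ELT image (version 15, per the comment above).
services:
  source_postgres:
    image: postgres:15
    # ... existing environment, ports and volumes stay unchanged ...
  destination_postgres:
    image: postgres:15
    # ... existing environment, ports and volumes stay unchanged ...
```

pg_dump refuses to dump from a server whose major version is newer than its own, which is why matching the client and server major versions resolves the error.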
Please make a detailed tutorial on data engineer, about 20-30 hours full end to end course please it's a request 🙌
Please make video on performance testing using jmeter
Excellent .
Real GEM
Right off the bat, would just like to make the comment of, please nix the background music or make it even quieter... Distracting.
Finally!
Finalllyyyy an data project. What are the learning pre-requisites for this course?.
Another one: I had issues with host.docker.internal, so I added extra_hosts: - "host.docker.internal:host-gateway" to the dbt service in docker compose.
@paolaprieto8111
22 days ago
Thank you! I ran into all the missing configuration/typos Justin had. But for this one, his github yaml file didn't have the line, so it was difficult to find the root cause. thank you for sharing.
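For anyone looking for where that line goes, a minimal sketch is shown below. It assumes the service is called dbt, as in this thread, and leaves out the rest of the service definition.

```yaml
# docker-compose.yaml (fragment)
# Assumption: the dbt service is named "dbt"; its other settings are omitted.
services:
  dbt:
    # ... existing dbt settings stay unchanged ...
    extra_hosts:
      # Map host.docker.internal to the Docker host's gateway address.
      - "host.docker.internal:host-gateway"
```

On Linux, host.docker.internal is not defined inside containers by default, and the special value host-gateway needs Docker Engine 20.10 or newer.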
1:59:58 The reason he encountered the error is that he wrote {% generate_ratings() %}. To avoid the error, write {% macro generate_ratings() %} instead.
I came across another issue while running the code for the data pipeline: it gave me a version mismatch error between the dump and PostgreSQL. I changed the version from latest to 15.5 to match the dump version. Hope this helps in case anyone faces issues in this module.
@aknnvr
5 months ago
Hero
Does the alpine container run the same as the ubuntu container? I presume it does, but I want to know if it changes the scope of the project.
Please make more video about Data Engineering
please come up with big data course
How does the "data_dump.sql" file transfer from source container to the destination container?
Do full MySQL course
Please make this a series! (Edit) Suggestion 1: I would make is to include an overview before each section of 1) the overarching pipeline 2) where in the pipeline we are for a given section. In other words, having a map or diagram of what is happening would help with conceptual understanding. Suggestion 2: Explain the code line-by-line conceptually. Time writing the code on the screen could be cut and replaced just by explanation. This would save time.
| 05:53:48 Encountered an error:
dbt-1 | Runtime Error
dbt-1 | fatal: Invalid --project-dir flag. Not a dbt project. Missing dbt_project.yml file
dbt-1 exited with code 2
I'm getting this issue. How do I solve this error?
Okay, major question. As you are adding in new technologies like Airflow in the docker file, where is your console log or terminal to let you know if there’s any syntax errors etc.? For example, with React, Django, Flutter, I always have the app running on local host and ALWAYS have that window open to see if there are any errors in the error log as I am updating files. How do you do that with this workflow to make sure you’re not making mistakes while you’re writing code?
The Airbyte section is complicated and badly explained. It looks to me like some part is missing from the recording, e.g. how was Airbyte started? After this video I don't see a significant advantage in using Airbyte over the elt_script from the video example.
Do you need to know maths for data engineering if so what types of
ChauCodes !
Sir please make the audio track available
I had an issue with a postgres version mismatch. I changed the version of the postgres server to match: services: source_postgres: image: postgres:15.5, and the same for destination_postgres. Hope this helps.
@allenlai4761
5 months ago
thank you man! Great help
What version of terminal is that?
can anyone tell me how he got his terminal looking pretty like that?
Can anyone tell me pre requisite for this video
In the "Building a Data Pipeline from Scratch" section, when I run the containers with docker compose up, I get the following error:
elt-elt_script-1 | pg_dump: error: aborting because of server version mismatch
elt-elt_script-1 | pg_dump: detail: server version: 16.1 (Debian 16.1-1.pgdg120+1); pg_dump version: 15.5 (Debian 15.5-0+deb12u1)
I have installed the latest version of postgres on my machine, i.e. 16.1. I have also removed the images and volumes and rerun the docker compose up command, but I still get the above error. Can someone please help? Thanks!
@oddlang687
5 months ago
I ran into this same problem. I fixed it by changing the docker-compose.yaml. Instead of image: postgres:latest under source_postgres and destination_postgres, I wrote image: postgres:15 in both places. This way, when the code is run, the same version of PostgreSQL is installed in both the source and the destination.
@sontran-tx7ng
4 months ago
@@oddlang687 It works, thank you very much. I spent two days trying to solve that problem.
@Victor-rs4ku
4 months ago
Thanks man!
@alanpaget10
1 month ago
@@oddlang687 Appreciate this comment, fixed for me. Thank you!
do you guys have an idea what vscode theme he is using?
Does anyone have issues with creating film_rating?
This channel is gold 🪙❤
Any Pre-requisites required ?
Cool
Hi! Thanks for the tutorial. Where can I find those yml files? They are not in the GitHub repo.
@findoc9282
5 months ago
I mean the part of dbt
The dbt section might have been easier to explain if it was installed to the base image; all of the config could have been completed in the docker file? As you are using VS Code, it might be a good shout to use the remote container extensions to interact with the running containers, too.
Does Justin Chau have a virtual address?
I ran into the below error trying to run my container, can you please advise me on how to resolve it? All my files are in one directory.
2024-05-03 15:34:42 Node.js v18.20.2
2024-05-03 15:34:45 node:internal/modules/cjs/loader:1143
2024-05-03 15:34:45 throw err;
2024-05-03 15:34:45 ^
2024-05-03 15:34:45
2024-05-03 15:34:45 Error: Cannot find module '/app/src/index.js'
2024-05-03 15:34:45 at Module._resolveFilename (node:internal/modules/cjs/loader:1140:15)
2024-05-03 15:34:45 at Module._load (node:internal/modules/cjs/loader:981:27)
2024-05-03 15:34:45 at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:128:12)
2024-05-03 15:34:45 at node:internal/main/run_main_module:28:49 {
2024-05-03 15:34:45 code: 'MODULE_NOT_FOUND',
2024-05-03 15:34:45 requireStack: []
2024-05-03 15:34:45 }
2024-05-03 15:34:45
2024-05-03 15:34:45 Node.js v18.20.2
Can you make data engineering boot camp.
🎉
Are Spark and Kafka actually covered here, as mentioned in the description?
@ryanthabet7452
4 months ago
nope
Might be great, but source code is not working at all (starts with main branch, and keeps going).
I am getting an issue with docker compose up 'dbt-1 | relation "public.actors" does not exist' seems like dbt runs first and then the elt_script, even though I have the same docker compose file as the video shows.
@josesalazar2384
5 months ago
fixed it by doing this on my dbt service depends_on: elt_script: condition: service_completed_successfully
@oddlang687
5 months ago
@@josesalazar2384 thank you for sharing this! I ran into the same problem and was looking forever for a solution. This worked great for me
@Kiritsu14498
4 months ago
thanks a lot
@mahumtofiq4930
3 months ago
@@josesalazar2384 I was having the same issue but with public.films does not exist. I went into the docker file and added the condition, but that did not solve the issue. I looked around, and it turns out that in sources.yml I had the table source for films set to "films.sql". We don't need to add .sql. Leaving this here in case someone has the same issue as me.
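To make the last point concrete, a dbt sources file along these lines lists tables by name only. This is a hedged sketch rather than the course's actual file: the source name destination_db is hypothetical, the public schema is taken from the error messages in this thread, and only the two tables mentioned here are listed.

```yaml
# models/example/sources.yml (sketch; the source name is hypothetical)
version: 2

sources:
  - name: destination_db   # hypothetical name, use the one from your project
    schema: public         # matches "public.films" / "public.actors" in the errors
    tables:
      - name: films        # table name only, not "films.sql"
      - name: actors
```

Each entry under tables must match the table name in the database, because dbt uses it to build the relation it queries (for example public.films).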
Hey, does anybody know where I can find the GitHub repo for the dbt project? I'm searching for the schema and can't find it.
@joshwigginton9881
1 month ago
same
Please complete j2ee
Love your tutorials. But can you make a c++ sfml with visual studio tutorial that would help me a lot. Thanks
Can we use docker volume as a database instead of sql?
@marcosmarx8236
5 months ago
In theory yes, but I wouldn't recommend it because you can easily lose the volume. A database means you can securely store your data in one place.
Errors: Database Error in model actors (models/example/actors.sql) relation "public.actors" does not exist relation "public.actors" does not exist ===> Solution: In 'docker-compose.yaml', add the below code to dbt: depends_on: elt_script: condition: service_completed_successfully
I'm curious what tools besides Airbyte can be used? (Airbyte is the presenter's employer. FYI, this is a marketing channel for Airbyte.)
@markk364
5 months ago
Not even on the Gartner Quadrant. And Airbyte is freemium, with only 14 days of free use. And as you might have guessed, it's greed-priced, errr, usage-priced. The lowest use (10gb and only 4 rows to replicate per month) is $160.00. That is hugely expensive. I think I'm going to skip this advertisement.
@marcosmarx8236
5 months ago
@@markk364 you can run airbyte for free using the OSS version.
2:07:08 you should have fed your cat 😂
The intro and description don't match the video - there is no Spark or Kafka anywhere in the video.
Sir, on the first run of docker compose up, here is the error I have been facing: elt_script-1 exited with code 1. Someone please help.
@muh.zakyfirdaus6100
2 months ago
did you solve this?
@jedadeyemi6566
2 months ago
@@muh.zakyfirdaus6100 not yet solved, please help
Here's your data model. People/Places/Pennies/Product involved in a Process.
At 2:52:34 how was airbyte started ??
@krishnaveersingh1851
3 months ago
try looking into course resources > airbyte
it was alright, I think adding a bit more discipline into the course would be an amazing upgrade
2:40:23 That was literally it 😂 but thanks for the tutorial ;)
What are the learning pre-requisites for this course?
@RM-xq7gf
6 months ago
brain
@AndrewHuange
5 months ago
this guy could barely code in JavaScript like a year ago so not too much i guess.
@muzammilquazi6398
5 months ago
@@AndrewHuange thanx for answering my query👍
2:32:29
Anyone know why at 1:59:55 defining and using that macro didn't work?
@Fuhrmaaj
3 months ago
I'm pretty sure it's because he forgot to type macro. ie. he wrote {% generate_ratings() %}, but it should be {% macro generate_ratings() %}
@vohoang6693
2 months ago
@@Fuhrmaaj thank you
returned non-zero exit status 1. i have an error like this, anyone can help?
@damolaolayinka-osho7038
1 month ago
same, on the elt_script right? tried everything, nothing works
2:29
I want to be Data Engineering Assistant 😀
two