I built my own Reddit API to beat inflation. Web scraping for data collection.

Science & Technology

The only way for us cash-strapped developers to make it in this economy!
In this video, I decide to create my own version of the Reddit API for as cheap as possible (whilst still remaining cloud hosted). We look at how I gathered the data, how I built a simple yet affordable data pipeline, and finally a usage-based API which costs me pennies rather than hundreds of dollars.
This video was sponsored by Bright Data. To sign up for Bright Data and get $15 of credit to build your own web scrapers, use the following link: brdta.com/dreamsofcode
You can find the source code for this project on GitHub at the link below
github.com/dreamsofcode-io/re...
Become a better developer in 4 minutes: bit.ly/45C7a29 👈
Join this channel to get access to perks:
/ @dreamsofcode
My socials:
Discord: / discord
Twitter: / dreamsofcode_io
00:00 Intro
01:42 Web Scraping
06:58 Message Queue
10:19 BrightData
13:23 Deploy to AWS Lambda
14:18 DynamoDB
15:32 API
18:05 Final Cost

Comments: 286

  • @dreamsofcode · 9 months ago

    To get $15 credit for use with Bright Data to scrape your own APIs, visit: brdta.com/dreamsofcode

  • @meinkanal13378 · 7 months ago

    Just an FYI: not working anymore, only $5

  • @dreamsofcode · 7 months ago

    @@meinkanal13378 inflation strikes again 😭 Let me reach out. Thank you for letting me know

  • @PaulSebastianM · 6 months ago

    Be careful, web scraping is illegal in some countries.

  • @sivuyilemagutywa5286 · 9 months ago

    The video was enjoyable, but it's important to acknowledge that sponsored content can introduce bias. One approach could be to make the entire video centered around the sponsor, or if you choose to feature the sponsor as you did, consider presenting alternative services similar to them. Your videos are consistently excellent, boasting high-quality production, a well-maintained pace, and crystal-clear explanations.

  • @aliengarden · 7 months ago

    that was my exact thought, thanks for pointing it out.

  • @seanthesheep · 7 months ago

    when ChatGPT focuses more on the sponsor of the video than the video itself

  • @jaumsilveira · 7 months ago

    Yeah, bro was talking about making everything as free as possible and then presents a service which is very expensive

  • @hqcart1 · 7 months ago

    what about captcha?????? he didn't mention whether his sponsor can get around it, and even his code did not handle captcha.

  • @TheMacWindows · 7 months ago

    @@hqcart1 Death by Captcha and related services exist for that

  • @foobars3816 · 7 months ago

    This was never a technical limitation, it was a legal one.

  • @jgould30 · 6 months ago

    uh, no. It's a financial one. The idea that companies are going to offer network and compute resources for the sheer amount of API calls made for free was always comical. It's sad that so many programmers and the general public think this stuff is just free or a charity. No matter what you do, eventually these costs will catch up to the business and HAVE to be charged to people, or else the service will just die.

  • @fizzcochito · 6 months ago

    @@jgould30 I am going to touch you without your consent

  • @Homiloko2 · 6 months ago

    @@jgould30 Yep. People pretend web scraping is 'free', but it still costs the companies. The companies are willing to bear the cost of regular users browsing through pages, but a scraper browsing through the entire catalog is even more expensive for the company than if they just used the API. Scraping is definitely malicious.

  • @tabbytobias2167 · 1 month ago

    @@jgould30 it costs a server less than a penny to serve 1000 requests.

  • @jameskim7565 · 1 month ago

    @@tabbytobias2167 yes, but for a service the size of reddit, it can lead to hundreds of thousands of dollars in losses, due to the sheer volume of those requests.

  • @shishsquared · 9 months ago

    Crowdsourcing idea for this to prevent IPs getting blocked: a browser that pays its users for using it. Developers write scripts to scrape data and pay to use the network of users. Users then get paid for using the web browser, which creates a private session, encrypted away from the user, runs the web scraping tasks, and sends the data back to the developer. Build it all on top of Chromium, and if done correctly, websites would have a very difficult time blocking based on IP addresses, activity, or fingerprinting, because it would be distributed across actual user IPs and actual user login times (the browser only runs when open). My only concern would be how to protect the users when malicious devs start doing illegal activities. You'd have to have very strong terms and conditions, have logging, and be able to trace requests back to devs. But then that opens a dev-privacy can of worms. Still, an interesting concept.

  • @phoneywheeze9959 · 9 months ago

    Botnet as a Service

  • @levifig · 9 months ago

    You just described 99% of the “VPN” apps available for your mobile device… ;)

  • @MuhsinunChowdhury · 8 months ago

    Wouldn't residential sneaker botting proxies be able to accomplish the same thing?

  • @mathisd · 8 months ago

    @@MuhsinunChowdhury These costs..

  • @ajnart_ · 8 months ago

    @@levifig ahahahah you're not wrong, especially the free ones

  • @shadez221 · 9 months ago

    For anyone planning to try this, use headless mode of Puppeteer so that it doesn't open multiple browser windows, to improve performance, and route it via a VPN set up on AWS to obfuscate. And be ready to have your IP blocked 😊
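    A minimal sketch of what this comment suggests: launching Puppeteer headless and pointing Chromium at a proxy. The proxy URL and the helper name are illustrative, not from the video's code.

    ```javascript
    // Build Puppeteer launch options with an optional proxy.
    // The proxy endpoint below is a placeholder, not a real server.
    function buildLaunchOptions(proxyUrl) {
      const args = ['--no-sandbox'];
      if (proxyUrl) args.push(`--proxy-server=${proxyUrl}`);
      return {
        headless: true, // no visible browser window
        args,
      };
    }

    // With puppeteer installed, usage would look roughly like:
    // const puppeteer = require('puppeteer');
    // const browser = await puppeteer.launch(buildLaunchOptions('socks5://my-vpn-host:1080'));
    // const page = await browser.newPage();
    // await page.goto('https://old.reddit.com/r/programming/new/');
    ```

    Even routed through a proxy, aggressive request rates will still trip rate limits, as the replies below note.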

  • @__sassan__ · 9 months ago

    Even when using the VPN?

  • @tacokoneko · 9 months ago

    @@__sassan__ VPNs also have an IP, so when doing this, if they block you, you need an endless revolving door of new VPNs or proxies

  • @tacokoneko · 9 months ago

    which is not that hard, because if you port scan the entire internet with some strategic guessing (downloading public datacenter IP ranges, scanning port 1080 for SOCKS5 proxies) you can find unsecured proxies for free, even some rare ones that work with SSL over SOCKS5

  • @tacokoneko · 9 months ago

    i asked someone if port scanning the internet to find proxies is illegal and they said no, so i think it's completely legal; they didn't put a password or any authentication, so they are allowing people to use it

  • @Dot_UwU · 9 months ago

    @@__sassan__ if you send a ton of requests with the same IP, you'll get rate limited. Also most VPN IPs are datacenter IPs, which are almost always blocked.

  • @wierdnes · 7 months ago

    Great video. I liked the step-by-step thought process of getting the scraper to gather data. One major flaw in the cost analysis you presented was the absence of any cost for Bright Data. Checking the pricing myself, it looks like 20€ per GB of data?

  • @forresthopkinsa · 9 months ago

    This is an interesting idea but a really impractical approach. New Reddit is an SPA and you can just use the XHR endpoints to fetch the data raw. Don't bother with browser emulation and HTML parsing. Besides, the closure of the APIs was never about restricting access to a user like you're circumventing here. As you've acknowledged, that wouldn't really make sense on the Web. The API pricing is about charging for data farming and large-scale user interception. You can't accomplish either of these use cases by scraping; you'll get rate-limited very quickly. The only way around this is using Bright Data's borderline-illegal botnet, which seems like a pretty shady way to do business.
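    The "fetch the data raw" idea above can be sketched without any browser at all: Reddit listings are also served as JSON, so a plain HTTP client can page through them with the `after` cursor. The field names below follow Reddit's public listing shape; treat this as an illustration rather than a stable contract.

    ```javascript
    // Build a listing URL for a subreddit's /new feed, with an optional cursor.
    function listingUrl(subreddit, after) {
      const url = new URL(`https://old.reddit.com/r/${subreddit}/new/.json`);
      url.searchParams.set('limit', '100');
      if (after) url.searchParams.set('after', after); // cursor from the previous page
      return url.toString();
    }

    // Pull out the fields a scraper would persist, plus the next-page cursor.
    function parseListing(listing) {
      const posts = listing.data.children.map(({ data }) => ({
        id: data.name,   // fullname like "t3_abc123", used as the cursor
        title: data.title,
        author: data.author,
        score: data.score,
      }));
      return { posts, after: listing.data.after };
    }

    // With Node 18+'s built-in fetch (network call, so commented out here):
    // const res = await fetch(listingUrl('programming'));
    // const { posts, after } = parseListing(await res.json());
    ```

    As the comment notes, this sidesteps HTML parsing entirely, but it does nothing about rate limiting.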

  • @tatianatub · 7 months ago

    it's called hostile interoperability and it's the consequence of fucking over developers; it's time we remind platform hosts why APIs were created in the first place

  • @mathgeniuszach · 7 months ago

    People will use their own embedded browsers, and similar scraping methods will occur locally. It's basically the same as an extension modification of the site. People just browsing normally don't need botnets and access to all of reddit, they just want a better stinking interface.

  • @ArizeOW · 7 months ago

    @@tatianatub It's time to remind you that Reddit doesn't belong to "us". It belongs to Reddit. And they can do whatever they want with it. If they don't want large applications like Apollo to scrape EVERY post, comment, upvote, downvote, user karma and such, there is nothing you can do about it. That's it. It's not that deep.

  • @DathCoco · 7 months ago

    also if using old reddit you can simply use jsdom to parse the data without needing to spin up a Chromium instance

  • @x--. · 7 months ago

    The internet is meant to be and should be open. That doesn't mean everything has to be free at scale, but fighting hostility to the _idea of an open internet_ is a good thing. You're free to put your content behind a paywall for everyone.

  • @conaticus · 9 months ago

    Really cool project idea! Loved it

  • @nocluebruh3792 · 9 months ago

    yooo

  • @aa898246 · 9 months ago

    It's the rust guy

  • @dreamsofcode · 9 months ago

    Yoo thank you! Love your videos as well.

  • @outofrange7156 · 8 months ago

    rusty boi

  • @IannoOfAlgodoo · 9 months ago

    Curious how much you spend on Bright Data, as their product is like $20/GB and $0.10/hour

  • @GoldenretriverYT · 7 months ago

    Yeah, it's expensive as heck. Also, I am wondering how they claim to have 72 million residential IPs? I can only imagine them having spread malware which then gave them a botnet to work with, or, less likely, they offer people money in exchange for running a proxy. Edit: I looked it up; apparently they have an SDK which app developers can integrate, giving users a choice between ads or allowing their connection to be used by Bright Data as a proxy. That's where they (at least claim to) get the proxies from.

  • @tardistrailers · 7 months ago

    @@GoldenretriverYT It'd be insane to run a resold proxy on your personal IP just to see no ads somewhere. Worst case, you get your home raided by law enforcement because someone did something highly illegal with it. But I wouldn't be surprised if less educated people still do this.

  • @OrangeYTT · 7 months ago

    @@GoldenretriverYT 99% of "residential proxies" are just computers under a botnet. Hola (that free VPN) got in trouble a while back for making people who used their VPN join their botnet for this very reason!

  • @nigerianprince5389 · 5 months ago

    1st off, thanks for this buddy, you're a godsend. It does feel a bit over-engineered, but I guess you've gone this route because you want to build your own Reddit API. For folks like me who have only been coding every day for 1 month using GPT, knowing how to pull the data from reddit and store it in a database is the main thing I need (I think for most people as well, but I could be wrong). Keep up the good work and thank you again!

  • @FunctionGermany · 9 months ago

    new reddit probably uses an internal API you can pull from by fetching from the browser window. also note another user's comment about old reddit + cheerio (no browser needed).
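    The "no browser needed" route can be sketched with nothing but an HTTP fetch and some string handling. A real HTML parser (cheerio, jsdom) is far more robust than a regex; the `<a class="title" ...>` pattern below is only an approximation of old Reddit's markup, used here to keep the sketch dependency-free.

    ```javascript
    // Extract post titles from old-Reddit-style HTML. Fragile by design:
    // a proper parser should be used for anything beyond a quick experiment.
    function extractTitles(html) {
      const titles = [];
      const re = /<a[^>]*class="[^"]*\btitle\b[^"]*"[^>]*>([^<]+)<\/a>/g;
      let m;
      while ((m = re.exec(html)) !== null) titles.push(m[1].trim());
      return titles;
    }

    // Paired with Node 18+'s built-in fetch (network call, so commented out):
    // const html = await (await fetch('https://old.reddit.com/r/programming/new/')).text();
    // console.log(extractTitles(html));
    ```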

  • @eoussama · 3 months ago

    He probably used Playwright just to have an excuse to shove the Bright Data sponsorship in the video, which I understand.

  • @Jana-se4kv · 8 months ago

    THANK YOU! Very helpful!

  • @WarlordEnthusiast · 7 months ago

    I actually did something similar: we needed financial data for a project we were working on, and the APIs we found were very limiting and some were very expensive. We tried using one of the cheaper ones and it straight up did not work; it had downtime of sometimes hours, and when we contacted the company they basically told us it wasn't their problem. So I built a web scraper, hosted it on my server at home, and scraped all the forex data I needed from their website for free.

  • @the_cobfather · 7 months ago

    Why use an SQS queue to abstract the db writing interface? The solution that immediately comes to mind is to just make an abstract class. The point of SQS is to be able to handle crazy amounts of throughput (like, up to 30,000 messages per second), which isn't really what you're doing.
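    The alternative this comment suggests can be sketched as a small persistence interface with swappable implementations. Class and method names here are illustrative, not from the video's repo.

    ```javascript
    // Abstract interface the scraper depends on; concrete stores implement it.
    class PostStore {
      async save(post) { throw new Error('not implemented'); }
    }

    // In-memory implementation; a DynamoDB- or Postgres-backed class would
    // implement the same `save` contract without the scraper changing.
    class MemoryPostStore extends PostStore {
      constructor() {
        super();
        this.posts = [];
      }
      async save(post) {
        this.posts.push(post);
        return post;
      }
    }

    // The scraper only sees the abstract interface:
    async function persistScraped(store, posts) {
      for (const post of posts) await store.save(post);
      return posts.length;
    }
    ```

    This keeps the database swappable without introducing a second piece of infrastructure to operate.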

  • @sumirandahal76 · 9 months ago

    Quality project ❤ content worth watching, keeps you hooked the whole time. 🎉

  • @-Siknakaliux-II · 7 months ago

    So this vid popped up in my recs. Unrelated off-topic comment, but I remember getting into a programming phase in grades 6-7. I pretty much obsessed over the thought of doing something great with it. Got myself to do a few courses but never really stuck with it, as I've moved on to finance. Now I kinda wanna get into it again as I did in the past...

  • @takennmc · 9 months ago

    8 cents for 3 weeks; damn, this really makes reddit look unreasonable

  • @rockshankar · 9 months ago

    That does come with significant management overhead. The project is a simple way to get it working; once you dig deeper there are lots of problems. Lambda and DynamoDB are cheaper depending on the amount of requests. If you post your API endpoint in public, 1 million requests will be gone in seconds, and then using Lambda will make it more expensive than running your own server. If it were cheaper, someone else would have done it already.

  • @poggybitz513 · 7 months ago

    I did the same thing for my app using selenium bindings in rust and used vagrant to manage instances. You can use docker if you want. Please mark this video as an ad, because no one in their right mind would do it this way. I am so tired of people shoving ads down my throat and claiming it's good education.

  • @scaffus · 9 months ago

    Great vid! Love your work

  • @teamredstudio7012 · 7 months ago

    I would do this in a different way. I would simply write a script in whatever language that has get and post functions, so you can call the main page first and then parse the data. Websites often already use APIs to fetch their content; use Fiddler Classic or some other proxy server to inspect what API the website uses. When the website loads more content after scrolling, it needs to fetch that data from somewhere. Simply reproduce this API by copying the authentication tokens from the headers, providing the required headers in the requests, then parsing the response body and adding it to some database. I would make it store everything, so if something needs to be fetched repeatedly it simply comes from the offline copy instead of wasting resources fetching and parsing again. I never automate browsers: if your browser can fetch the data, you can fetch it too without a front end. You can also get the URL for loading more content from the raw main page, because the browser needs to know where to fetch this anyway, so it's definitely defined somewhere. It's super simple to scrape websites; you only need to know how to make requests and parse JSON and XML in your preferred language! Don't automate browsers, just fetch it directly!
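    The approach described above can be sketched as a small request builder: once you've found the endpoint the page itself calls (via the browser's network tab or a proxy like Fiddler), reproduce the request by copying the headers it sends. The header values and the endpoint are placeholders for whatever you observe in your own session.

    ```javascript
    // Build fetch options that mimic what the page's own XHR sends.
    // The token is whatever you copied from your browser session.
    function buildApiRequest(token) {
      return {
        method: 'GET',
        headers: {
          // Sites often reject requests missing headers the real client sends.
          'Authorization': `Bearer ${token}`,
          'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64)',
          'Accept': 'application/json',
        },
      };
    }

    // With Node 18+'s built-in fetch (endpoint is hypothetical, so commented out):
    // const res = await fetch('https://example.com/internal/api/posts', buildApiRequest(observedToken));
    // const body = await res.json();
    ```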

  • @unforgettable31 · 7 months ago

    I come from a cracking background, and back in the day this is exactly what we would do. We would write GET/POST requests with token-grabbing methods and get the job done. We'd launch hundreds of threads, all connected to different proxies, instead of a single web browser. Sometimes it was challenging for particular platforms because of cookies, but at the end of the day it was doable.

  • @rossimac · 7 months ago

    Websites that use reCAPTCHA v2 are the ones I've found I need a browser to interact with. For ones that don't, then yes, totally: inspect the network traffic, understand how your browser is creating the requests, and then replicate them.

  • @S0L4RE · 7 months ago

    +1, it's such a massive pet peeve of mine seeing people use selenium when it could just be achieved with requests.

  • @cheemzboi · 7 months ago

    @@unforgettable31 what about captchas then

  • @unforgettable31 · 7 months ago

    @@cheemzboi Most platforms use captchas when they detect ongoing suspicious activity, which is mitigated when using proxies.

  • @kale_bhai · 7 months ago

    Learned about the queuing system utilization, but that's pretty much the only thing new to me.

  • @dancinglazer1628 · 7 months ago

    Honestly, I think this infrastructure is too complicated for what it is doing. I don't really care about the sponsored bit, but I think it would have been better to simply create a lambda that directly writes to a database (assume a cacheFactory -> RedisCache | MongoCache | JsonCache) along with a "freshness" param. Due to the relative simplicity of the data, I think redis would be a good candidate; then all you would need to do in the API is simply fetch the data based on the query param, something which can probably be achieved in a single file.
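    A rough sketch of the simpler pipeline proposed here, using the JsonCache variant: one process fetches and writes a JSON file, and reads honor a "freshness" window instead of going through a message queue. The file path and the 10-minute window are arbitrary choices for illustration.

    ```javascript
    const fs = require('node:fs');

    const MAX_AGE_MS = 10 * 60 * 1000; // treat data older than 10 minutes as stale

    // Write the scraped posts along with a fetch timestamp.
    function writeCache(path, posts) {
      fs.writeFileSync(path, JSON.stringify({ fetchedAt: Date.now(), posts }));
    }

    // Return the cached posts if fresh, or null to signal a re-fetch is needed.
    function readCache(path, maxAgeMs = MAX_AGE_MS) {
      if (!fs.existsSync(path)) return null;
      const { fetchedAt, posts } = JSON.parse(fs.readFileSync(path, 'utf8'));
      return Date.now() - fetchedAt <= maxAgeMs ? posts : null;
    }
    ```

    The API handler then becomes a single read: serve `readCache(...)` if it returns data, otherwise trigger a fresh scrape.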

  • @jp46614 · 7 months ago

    Yeah, I feel it's been quite overengineered with all this message queue and database/service stuff. This could realistically be done fully locally, and at not much of a bigger cost, since nowadays OSS databases and caching solutions are really efficient

  • @hqcart1 · 7 months ago

    he will need a 2-4GB RAM VM to do that. AWS is expensive

  • @dancinglazer1628 · 7 months ago

    @@hqcart1 he is deferring the scraping to the sponsored service anyway, but I think we can just fetch the HTML instead of running a headless browser

  • @dancinglazer1628 · 7 months ago

    @@jp46614 This could be a single service on a docker image: run a cron scheduler that fetches and writes to a JSON file, and have a server running that uses the JSON as a database

  • @hqcart1 · 7 months ago

    @@dancinglazer1628 even if he uses a sponsored service, at some point you will get a captcha, and my point was that his code does not handle that. And about fetching HTML: no, it does not work for complex sites where the HTML or classes get rewritten by JS. I tried that and failed, and ended up using a headless browser.

  • @jondoe79 · 9 months ago

    Great content; real examples of use cases for different tools in a simple but useful project.

  • @DodaGarcia · 7 months ago

    Decoupling the data persistence from the business logic is always a good idea, but using a queue service for that is bonkers. It removes none of the existing complexity, since you still eventually have to map the message payload to the database schema, and then introduces more complexity because you now have to keep track of one more service, the publishing code, the consuming code and the asynchronicity itself. Just use the repository pattern with an adapter for the chosen database, or an ORM like Prisma if you really don't expect the app to scale much.

  • @goofynose2520 · 7 months ago

    Agreed. I swear 90% of queues I encounter are needless overcomplications

  • @ShaneZarechian · 3 months ago

    Someone fork this and make it non-ridiculous

  • @socks5proxy · 7 months ago

    absolutely brilliant video. so very well done.

  • @dreamsofcode · 7 months ago

    Thank you! I'm glad you enjoyed it!

  • @grif5307 · 9 months ago

    One of my favourite videos in a while, great job!!!!

  • @EarlZMoade · 9 months ago

    Unrelated to this video - would you show how you version your dotfiles (if you do)? It would make for a good video.

  • @cooperqmarshall · 9 months ago

    The quality of this project is supreme. Love the detail and consideration for the infrastructure

  • @glitchy_weasel · 7 months ago

    Fantastic! Very informative, always nice to stick it to big tech lol

  • @jasontruter7239 · 7 months ago

    Good job; one improvement would be to go with a single-table design with DynamoDB

  • @veshal.s3690 · 8 months ago

    Would love a post on your powerlevel10k config and your terminal config

  • @dandandev · 9 months ago

    Heya! I'd recommend Railway to host your apps, it's usage-based and pretty cheap!

  • @shadyworld1 · 7 months ago

    If you could use RSS to pull the data and store it in a proper format to be used for the API, you'd save at least 40% of your current approach's time and effort!

  • @antonjoacir · 9 months ago

    Man, could you make a video about the configuration of your terminal?

  • @primo_geniture · 9 months ago

    I'm curious as to what the total time for the project was.

  • @5criptcom · 7 months ago

    Good one sir!

  • @ltecheroffical · 1 month ago

    You can remove the browser part by using a web scraping framework that works without a browser instance.

  • @TheHotMrDuck · 7 months ago

    i hope this doesn't kill old reddit; if they remove it, i'm gone

  • @stylrart · 7 months ago

    Nice, you are using JB Mono, like me. What theme are you using? The colors are handsome ;)

  • @sworatex1683 · 7 months ago

    Why didn't you use curl? It would be way more lightweight than using a browser. Most programming languages will let you manage DOM objects with built-in libraries

  • @xXtim128Xx · 7 months ago

    Using a full web browser when a simple HTTP request and HTML parser would suffice...

  • @dreamsofcode · 7 months ago

    You're correct. It would have. However, a browser is a more versatile option for other use cases.

  • @zack_beard · 6 months ago

    Great content! Quick question: did you do this after logging in to Reddit with your user ID/password, or without? IIRC Reddit does not show new content if you are not logged in. Thanks!

  • @dreamsofcode · 6 months ago

    Thank you! Logged out, which causes it to fall under publicly accessible. Reddit still shows content on the old reddit website under /new when you're not logged in.

  • @JoshIbbotson- · 7 months ago

    How long have you been programming? Loved this video btw!

  • @dreamsofcode · 7 months ago

    Thank you! I've been writing code since 2008.

  • @jerryaugusto95 · 9 months ago

    Is it just me, or are the icons for the Go files different? How do you change these icons, please?

  • @pelic9608 · 7 months ago

    Every modern website has an API. Most just aren't documented. 🤷‍♂️ Copy their own website's auth flow and use those tokens to drive your app. What are they gonna do? Paywall their entire site? (Ok, ok; SSR is a thing, but there's still almost always some pure-data endpoint around)

  • @louishuort7969 · 8 months ago

    What about the cost of Bright Data?

  • @k98killer · 8 months ago

    Would it have cost more without the Bright Data sponsorship?

  • @louishuort7969 · 8 months ago

    Ohh yes, a lot; Bright Data is very expensive

  • @heckerhecker8246 · 7 months ago

    How to get four hitmen at your door:

  • @chofmann · 9 months ago

    you are aware of the JSON API that things like RIF use? basically, for every link there is also a JSON file you can just access
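    The JSON mirror this comment refers to: appending `.json` to most Reddit URLs returns the same page as structured JSON. A small sketch of the URL rewrite (purely string handling; no request is made here):

    ```javascript
    // Rewrite a Reddit page URL into its JSON equivalent.
    function toJsonUrl(redditUrl) {
      const url = new URL(redditUrl);
      // Drop query/hash, ensure a trailing slash, then append the .json suffix.
      const path = url.pathname.endsWith('/') ? url.pathname : url.pathname + '/';
      return `${url.origin}${path}.json`;
    }

    // e.g. fetch(toJsonUrl('https://old.reddit.com/r/programming/comments/abc123/some_post'))
    ```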

  • @mx338 · 7 months ago

    DynamoDB isn't really low cost, so I would definitely look into switching to ScyllaDB, which offers a DynamoDB-compatible API.

  • @christianjedro6206 · 7 months ago

    How do you avoid vendor/database lock-in by using AWS SQS?!

  • @rando521 · 9 months ago

    hi dreams, i love your vids on vim and tried it on my own because of them. while trying c++, i want to know if there is a better option than cmake? i come from python, so i plan on rpc-ing the python part and moving to mostly c++ or golang. any ideas on how to do this?

  • @FaZekiller-qe3uf · 9 months ago

    The better option is to use a language with good tooling. Zig, Rust, Go, etc. cmake L, Make L.

  • @jacksonsmith4648 · 9 months ago

    Meson! It's basically CMake, but with syntax similar to python, and a lot fewer stupid design decisions. Definitely worth a look.

  • @S0L4RE · 7 months ago

    @@jacksonsmith4648 why are we hating on cmake?

  • @creeperlolthetrouble · 7 months ago

    xD I've seen this coming for months, but why not keep AWS and tunnel the requests through a proxy

  • @EarlZMoade · 9 months ago

    Are there any issues with legality when using the data you extract? I.e. could you use the data for commercial purposes, or research?

  • @ristekostadinov2820 · 9 months ago

    Microsoft, I think, has taken someone to court for web scraping and won. I think it was a company that was scraping LinkedIn users' public data and building their own app for recruiting people, and Microsoft argued that the users didn't consent to that (which is true, but then again the data is public). So it's a very tricky problem, and it's best to read a website's terms of service.

  • @user-nr1qk6oi7g · 7 months ago

    if you used python you could easily bypass IP blocking with torpy

  • @jakestrouse12 · 7 months ago

    You can also reverse engineer their private API by looking at the browser network requests. The scraping will be much faster

  • @S0L4RE · 7 months ago

    Although Cloudflare IUAM makes it an immense pain in the ass

  • @batmanatkinson1188 · 7 months ago

    And keep in mind that private APIs are susceptible to change, so today it's gonna work, tomorrow you have to start over

  • @unaif.2171 · 7 months ago

    @@batmanatkinson1188 less often than the HTML

  • @TheSaintsVEVO · 7 months ago

    @@S0L4RE what's that? Does Reddit use it?

  • @S0L4RE · 7 months ago

    @@TheSaintsVEVO I'm not sure if Reddit uses it, but IUAM detects very low-level characteristics of the request (e.g. cipher mode, SSL configuration) to determine whether it looks automated.

  • @siniarskimar · 9 months ago

    How about developing a browser extension for "enhancing" reddit that would additionally scrape any post that the user sees 🤔

  • @filiprandom · 3 months ago

    I watched this video for 4 hours because it was on repeat and I fell asleep

  • @TrueDetectivePikachu · 7 months ago

    Genuine question: why use puppeteer, which relies on an active browser, and not something like cheerio?

  • @dreamsofcode · 7 months ago

    It's a great question. Cheerio would work really well in this case, as there was little to no JavaScript on the old version of reddit. Initially I wanted to go with the new reddit, so I had scoped out using an active browser (which I think has more application beyond reddit). Cheerio is always preferable in a case with no JavaScript, but it's not as broadly applicable as puppeteer is. TL;DR: I wanted to showcase active browser scraping in the video.

  • @sheldonsays9922 · 6 months ago

    How long did it actually take for you to complete this project?

  • @pchris · 7 months ago

    Would something like this work for third-party applications like Reddit Apollo?

  • @CrazyWinner357 · 7 months ago

    It can work... until you get a captcha

  • @navaneeth6157 · 9 months ago

    chromedp for golang is also an option

  • @houstonbova3136 · 8 months ago

    DataStore and FireStore work roughly the same as Dynamo, no?

  • @iamrafiqulislam · 7 months ago

    What is the font you are using for Nvim and the tmux status bar, please?

  • @dreamsofcode · 7 months ago

    I am using JetBrainsMono Nerd Font! I have a video on both my Nvim and tmux configs on my channel :)

  • @metalspoon69 · 9 months ago

    "Just build your own API" *builds own API* "NOO NOT LIKE THAT!!!!"

  • @juanmacias5922 · 9 months ago

    Bahahaha...

  • @techwithjoe8636 · 7 months ago

    Which editor is he using? Vim?

  • @robinbinder8658 · 7 months ago

    boi do i smell a cease and desist

  • @Puwunda · 7 months ago

    Intercontinental Lawsuit Inbound!!!

  • @ahwx · 9 months ago

    I see you're using a Mac now; what terminal is that? How are your rounded window corners so much less rounded than mine? Have you changed anything?

  • @Meleeman011 · 5 months ago

    why do you use playwright and not just puppeteer?

  • @_Mackan · 8 months ago

    virgin api consumer vs chad scraper

  • @dimagass7801 · 7 months ago

    I have no clue how to use APIs, and I still don't completely understand, but data is the new oil 😅

  • @betapacket · 7 months ago

    2:02 isn't Playwright yet another ECM and not a web scraper?

  • @Shudshudu · 7 months ago

    Sir, I am learning C and am new to programming. Currently I am learning control structures, but when I look into real-world projects I don't understand anything. Why?

  • @user-hy6cp6xp9f · 7 months ago

    It takes time! Also, C is a VERY different level of abstraction than the JavaScript / Go he used here.

  • @vekoze9872 · 7 months ago

    what is the tmux font?

  • @edanbigw · 8 months ago

    sorry, off topic: did you use a Mac, sir?

  • @qCJLbggG4IWAY9nTH6o · 8 months ago

    why not use their RSS feed?

  • @ultimatetoast2739 · 7 months ago

    Apicels be seething over scrapechads

  • @flor.7797 · 6 months ago

    There's no AI without API

  • @willmil1199 · 5 months ago

    How do we use your API then?

  • @guillemgarcia3630 · 9 months ago

    jesus, there's more terraform configuration than code

  • @reihanboo · 7 months ago

    didn't understand anything but great video

  • @mayar2047 · 9 months ago

    I'm thinking of just scraping reddit directly from a mobile device, and maybe saving the data to the device for caching. I don't need to pay for anything

  • @mr.togrul--9383 · 9 months ago

    Great video btw! In the future I also want to make my own web scraper project, and this just simplified everything I need to do. Is there any reason why you didn't just use Golang for the whole thing, for the scraper as well? Just curious, since, as you said, writing it in Golang would be faster than Node.js

  • @JeanHirtz-ms3bf · 9 months ago

    Curious about Golang - any repos / vids?

  • @_soundwave_ · 6 months ago

    A very interesting comment section.

  • @hemant_san · 7 months ago

    how to bypass captcha?

  • @VRGamerBoi · 7 months ago

    ChatGPT told me about this

  • @mikaay4269 · 7 months ago

    Application Paying Interface

  • @bieggerm · 8 months ago

    This video shows the only way an arms race should be visualized

  • @makeshift27015 · 7 months ago

    Oh god, I hope this doesn't give them even more reason to kill old reddit; it's the only way I can bear using reddit now. As an aside, would it be possible to decompile/packet sniff their mobile app and emulate the requests it makes for a pseudo-API? I haven't decompiled android apps in a hot minute, but I imagine it uses some sort of API rather than downloading a massive HTML payload that requires parsing

  • @earu_arcana · 6 months ago

    Nice video, but your setup is a lot more complex than it needs to be IMO.

  • @lowlevell0ser25 · 9 months ago

    They will block things like this with Web Environment Integrity

  • @TheArchimede2000 · 7 months ago

    he never disappoints

  • @xybersurfer · 9 months ago

    i was with you until you started putting things in a database and the cloud. was it because your video was sponsored by a cloud provider? (i really can't tell) it would be more interesting to see you justify decisions. seeing all the code is really not that interesting. the overall idea of creating your own reddit API is interesting though, so i will give this a like

  • @pixel690 · 7 months ago

    $20 per GB is something different, jesus

  • @Dev-Siri · 9 months ago

    tip: bun 1.0 was released just yesterday, and you can use it as a drop-in replacement for node. it executes js much faster without breaking anything, so it can magically make your api faster. for deployment, you need to use a docker image because it's still very early and not supported by any platforms (yet)

  • @ac130kz · 9 months ago

    it just gets stuck if I try to run puppeteer with whatsapp-web.js. yeah, fast and cool, but too early

  • @lollermann · 6 months ago

    Don't let pyrocynical see this video, he'll become a web dev

  • @hqcart1 · 7 months ago

    what about captcha ??????????????????????

  • @iliabeliaev2260 · 7 months ago

    Old reddit is the only version I use...

  • @DaMu24 · 5 months ago

    Ok, give it to me
