Industrial-scale Web Scraping with AI & Proxy Networks

Learn advanced web scraping techniques with Puppeteer and BrightData's scraping browser. We collect e-commerce data from sites like Amazon, then analyze that data with ChatGPT.
#javascript #datascience #chatgpt
Get a $10 credit for BrightData: get.brightdata.com/fireship
Puppeteer docs: pptr.dev

Comments: 614

  • @beyondfireship · 1 year ago

    Use this link to get a $10 credit, enough cash to scrape thousands of pages: get.brightdata.com/fireship

    ↳ @DeanDavisMarketing · 1 year ago

    ↳ @Reddblue · 1 year ago

    This man selling wood and iron to shovel makers

    ↳ @anze · 1 year ago

    @beyondfireship ad link doesn't work

    ↳ @NoahKalson · 1 year ago

    @@anze worked for me. Try now.

    ↳ @tamasmajer · 1 year ago

    The pricing page says $20/GB. I checked how big the pricing page was: it loaded 4 MB, so it costs $20 for 250 pages? That seems very expensive. Or how should I calculate the price?

  • @rvft · 1 year ago

    I like how he didn't use "cheap" once during the entire video, because, my god, the pricing on the advertised product is absolute madness

    ↳ @brunopanizzi · 1 year ago

    Industrial scale!!!

    ↳ @koba2160 · 1 year ago

    scraping ain't cheap, but there are many ways to make it much cheaper

    ↳ @mrgyani · 1 year ago

    @@arteuspw what do you mean by 1 GB/$1? You mean browsing 1 GB of data for a dollar with a single proxy? How many proxies do you get for $1?

    ↳ @user-kj2kt8jt4n · 1 year ago

    @@arteuspw Please tell me where to buy them at this price.

    ↳ @mantas9827 · 1 year ago

    Is $20 per GB considered expensive? I wonder how much you could scrape from a site like Amazon with that GB... surely a lot?
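A back-of-the-envelope sketch of the per-page arithmetic discussed in this thread, assuming bandwidth is billed in decimal units (1 GB = 1000 MB) and that every page load weighs about the same. At $20/GB and 4 MB per page that is 250 pages per GB, or $0.08 per page, so page weight is the main cost lever. The 0.5 MB figure in the second call is a hypothetical stripped-down page, not a measurement.

```javascript
// Estimate bandwidth cost for a pay-per-GB proxy or scraping browser.
// Assumption: 1 GB = 1000 MB (decimal), and every page weighs about the same.
function estimateCost({ pricePerGB, mbPerPage, pages }) {
  const pagesPerGB = 1000 / mbPerPage; // page loads that fit in 1 GB
  const costPerPage = pricePerGB / pagesPerGB;
  return { pagesPerGB, costPerPage, total: costPerPage * pages };
}

// Full page load (the 4 MB measured above) vs. a hypothetical 0.5 MB load with assets blocked
console.log(estimateCost({ pricePerGB: 20, mbPerPage: 4, pages: 1000 }));
console.log(estimateCost({ pricePerGB: 20, mbPerPage: 0.5, pages: 1000 }));
```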

  • @meansnada · 1 year ago

    I love how there are legit businesses to bypass captchas and mess with data :)

    ↳ @dislike__button · 1 year ago

    Scraping isn't illegal

    ↳ @Tylersmodding · 1 year ago

    and individuals

    ↳ @aresakmalcus6578 · 1 year ago

    @@dislike__button if it's against the Terms of Service of the given site, it is

    ↳ @Bruceylancer · 1 year ago

    @@aresakmalcus6578 I'm not a lawyer, but how can it possibly be illegal? It can be against ToS, sure, then the website owners can surely act accordingly, i.e. ban your account on the said website, ban your IP address, and so on. But illegal? Are there any laws out there that prohibit collecting public data? Are there any cases of people getting sued for scraping? I haven't heard of such, maybe you can provide some examples. Also, there are 8-figure businesses built on scraping, like Ahrefs or Semrush.

    ↳ @Bruceylancer · 1 year ago

    @@Andrew-zy7jz Exactly! Very good example.

  • @albiceleste101 · 1 year ago

    As a freelance dev I get contacted all the time for scraping; it's definitely one of the most requested jobs, along with WordPress (which I also don't work with)

    ↳ @cymaked · 1 year ago

    interesting - 8 years of freelancing and never had one such request 😮

    ↳ @dinoscheidt · 1 year ago

    And with a freelancer, the business has the advantage that YOU break the terms and conditions of the companies you scrape (and are legally liable and can be sued), not the business 😊 So a cheap code monkey and legal scapegoat all in one 💪

    ↳ @mrgyani · 1 year ago

    Where do you get these projects from?

    ↳ @VividCoding · 1 year ago

    @@dinoscheidt Wait can they really do that? They are the ones who wanted to scrape the data in the first place.

    ↳ @dabbopabblo · 1 year ago

    I'm not even a freelancer, and I can't count on two hands the number of times I've been asked to make someone a website. They think that because I'm a web developer, I'm just some guy who goes around making websites willy-nilly. And the few times I have actually gone through with helping someone out, they want everything Wix or WordPress provides and have the audacity to suggest I shouldn't be asking for so much pay when a drag-and-drop builder would suffice... THEN USE THE BUILDER, GOD DAMMIT. My knowledge is wasted on front-end work anyway.

  • @alexcasillas2488 · 1 year ago

    This reminds me of when I solved 100 captchas manually so that I could download some data files from a website for an ai. I got a sever message temporarily banning me from the website saying that I must be a bot. I learned my lesson and stuck to only solving 99 captchas each day from then on until I had enough data files

  • @Autoscraping · 5 months ago

    An extraordinary piece of video material that has proven highly useful for our new team members. Your generosity is immensely appreciated!

  • @Maneki-Nico · 1 year ago

    Your videos are somehow exactly relevant to the code I am writing every week - interesting for sure!

  • @yashkhd1100 · 1 year ago

    To be frank, out of all YouTubers, Fireship has the most interesting and to-the-point videos and gives the most value for the time spent. I'm just wondering how he keeps track of all these varied topics and manages to make the most of them.

    ↳ @julienwickramatunga7338 · 1 year ago

    He already has five prototypes of Neuralink chips plugged into his brain, linked to the Web via 5G, and he is using digital clones of himself (coded in JS of course) to make more video content (with the help of ChatGPT). That makes him the most powerful being on the planet. Praise the Cyber-Jeff! 👾

    ↳ @RobinhoodCFO · 4 months ago

    With ChatGPT of course

    ↳ @AdamBechtol · 3 months ago

    Mmm

  • @xanderbarkhatov · 1 year ago

    If I'm not mistaken, page.waitForSelector(selector) already returns the element handle, so you don't need to use page.$(selector) after that. Anyway, great video, as always. Thank you! ❤
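The point above about `page.waitForSelector()` can be sketched as follows. In Puppeteer it resolves to the matching `ElementHandle` (or `null`, e.g. when waiting with `hidden: true`), so the handle can be used directly without a second `page.$()` lookup. `readText` is a hypothetical helper, and `page` is assumed to be a Puppeteer `Page`.

```javascript
// Wait for an element and read its text in one step.
// page.waitForSelector() already returns the ElementHandle, so no page.$() is needed.
async function readText(page, selector, timeout = 2 * 60 * 1000) {
  const handle = await page.waitForSelector(selector, { timeout });
  if (!handle) return null; // possible when waiting for an element to be hidden
  try {
    // evaluate() runs in the browser, with the element passed as the first argument
    return await handle.evaluate((el) => el.textContent.trim());
  } finally {
    await handle.dispose(); // release the handle once we're done with it
  }
}
```

Usage: `const title = await readText(page, "#productTitle");` where the selector is a placeholder.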

    ↳ @yvanguemkam4739 · 1 year ago

    You're right, I wanted to say that... But I don't have money to spend on the browser. Is there an alternative?

    ↳ @cyberzjeh · 1 year ago

    @@yvanguemkam4739 you can host Puppeteer yourself and pay for a proxy service if you need it; it might come out cheaper (but it's more work, obviously)

  • @YuriG03042 · 1 year ago

    toward the end of the video, Jeff suggests that you can grab all the links and then make requests to those links. it gave me flashbacks of another video on the main channel where a company did this and ended up with a 70k+ GCP bill after one night of web scraping, because their computing instance was forever recursing and was scalable up to 1000 instances lmao
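Following every link is exactly where a crawler can run away. A minimal sketch of the usual guard rails, assuming a caller-supplied `getLinks(url)` function (hypothetical, e.g. one that loads the page with Puppeteer and returns its `href` values): a visited set so cycles are not re-crawled, a same-origin filter, and hard caps on depth and total pages.

```javascript
// Breadth-first crawl with a visited set and hard limits on depth and page count.
async function crawl(startUrl, getLinks, { maxPages = 100, maxDepth = 2, sameOrigin = true } = {}) {
  const start = new URL(startUrl);
  start.hash = "";
  const visited = new Set();
  const queue = [{ url: start.href, depth: 0 }]; // FIFO queue = breadth-first order

  while (queue.length > 0 && visited.size < maxPages) {
    const { url, depth } = queue.shift();
    if (visited.has(url)) continue; // the same URL can be queued twice before it is visited
    visited.add(url);
    if (depth >= maxDepth) continue; // keep the page but don't follow its links

    for (const href of await getLinks(url)) {
      let next;
      try {
        next = new URL(href, url); // resolve relative links against the current page
      } catch {
        continue; // skip malformed hrefs
      }
      next.hash = ""; // "#section" links point at the same page
      if (sameOrigin && next.origin !== start.origin) continue;
      if (!visited.has(next.href)) queue.push({ url: next.href, depth: depth + 1 });
    }
  }
  return [...visited];
}
```

Because the loop stops at `maxPages` no matter what the site links to, the worst-case cost of a run is bounded up front.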

  • @EliteGamerpk · 1 year ago

    As a web scraping tool developer, one thing to note about the ChatGPT code for extracting product names etc. is that it's not going to work in all cases. What I mean by that is we can see there are some random class names like '._cDEzb', and these classes can vary from page to page. So your code for one listing page might not work for another. The way I do this is by using some advanced query selectors that don't rely on unreliable classes. Can go into more detail if required.

    ↳ @CrackedPlayz · 1 year ago

    Please do!

    ↳ @RiChYFanatics · 1 year ago

    Don't be shy :p

    ↳ @myhitltd5826 · 1 year ago

    So that's why I copy the full selector of the element and work with it in Puppeteer.

    ↳ @MrNsaysHi · 1 year ago

    AFAIK Puppeteer doesn't support finding elements by XPath, so what do you guys use?

    ↳ @thrand · 1 year ago

    @@MrNsaysHi well, real men write their own html parser and query language. But peasants like myself use css selectors with document.querySelectorAll.
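On the XPath question in this thread: current Puppeteer versions accept XPath through the `xpath/` selector prefix (newer releases also offer a `::-p-xpath(...)` pseudo-selector), and older releases had `page.$x()`, which has since been deprecated; check the docs for the version you use. A small sketch with a hypothetical helper, assuming `page` is a Puppeteer `Page`:

```javascript
// Collect the trimmed text of every node matching an XPath expression.
// The "xpath/" prefix tells Puppeteer's query methods to treat the rest as XPath.
async function textsByXPath(page, expression) {
  const handles = await page.$$(`xpath/${expression}`);
  const texts = [];
  for (const handle of handles) {
    texts.push(await handle.evaluate((node) => node.textContent.trim()));
    await handle.dispose(); // free each handle once its text has been read
  }
  return texts;
}

// Usage (the expression is a placeholder):
//   const titles = await textsByXPath(page, "//h2[contains(@class, 'title')]");
```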

  • @wtfdoiputhere · 1 year ago

    Web scraping is still my favourite type of project; it's so fun and "meaningful" to me, and with the help of AI I can see it becoming much, much easier

    ↳ @0187 · 1 year ago

    same, gives me a shitton of satisfaction

    ↳ @GeekProdigyGuy · 1 year ago

    thanks Jesus

    ↳ @alejandroarango8227 · 1 year ago

    Unfortunately GPT4 is still too expensive to use in projects and gpt3.5 is still too stupid.

    ↳ @wtfdoiputhere · 1 year ago

    @@alejandroarango8227 it's stupid enough that you still do much of the work yourself, because in the end it's just a tool to help, and personally it helps me enough

    ↳ @wtfdoiputhere · 1 year ago

    @@0187 exactly what I feel

  • @Jeanseb23 · 1 year ago

    You've foiled my plan 5 years in the making. At least now I have a free $10 credit for Brightdata to catch up. Thanks Fireship!

    ↳ @gatonegro187 · 3 months ago

    how much did u end up spending

  • @prabhavkhera4959 · 1 year ago

    Thanks Jeff. I was planning on building a project that uses web scraping and this video absolutely dropped at the perfect time. Appreciate it. I love your videos and hope for more such content in the future :)

  • @Ruf4eg · 1 year ago

    Man, you are reading my thoughts! this video came at the right time when I wanted to scrape some websites!!!!

  • @BharadwajGiridhar · 1 year ago

    One thing, Jeff: these websites change CSS class names on every refresh, so it's better to write code with selectors that don't change, like id or aria-label.
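The advice to prefer ids, aria-label, and other attributes over generated class names can be turned into a small heuristic. This is a sketch under stated assumptions: the list of "stable" attributes and the `looksGenerated` rule (leading underscore, `css-`/`sc-` prefixes, or a mix of digits and capitals, as in `_cDEzb`) are guesses to tune per site, and the element is described as a plain object (for example one built inside `page.evaluate`).

```javascript
// Attributes that usually survive redeploys, in order of preference (assumption, adjust per site)
const STABLE_ATTRS = ["data-testid", "data-asin", "aria-label", "name"];

// Heuristic guess at machine-generated class names such as "_cDEzb" or "css-1x2y3z"
function looksGenerated(name) {
  return name.startsWith("_") || /^(css|sc)-/.test(name) || (/\d/.test(name) && /[A-Z]/.test(name));
}

// Escape backslashes and double quotes for use inside a quoted CSS attribute value
const quote = (value) => `"${String(value).replace(/["\\]/g, "\\$&")}"`;

// Build a selector from { tag, id, attrs, classes }, preferring stable hooks
function stableSelector({ tag = "*", id, attrs = {}, classes = [] }) {
  if (id && !looksGenerated(id)) return `${tag}[id=${quote(id)}]`;
  for (const name of STABLE_ATTRS) {
    if (attrs[name]) return `${tag}[${name}=${quote(attrs[name])}]`;
  }
  // Fall back to class names that don't look generated (these may still be fragile)
  const kept = classes.filter((c) => !looksGenerated(c));
  return tag + kept.map((c) => `.${c.replace(/[^\w-]/g, "\\$&")}`).join("");
}
```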

  • @Loubensdoriscar · 5 months ago

    Zeus Proxy's specific emphasis on session management is a key factor that resonates with my goal of executing data retrieval tasks with a focus on mimicking genuine user behaviors.

  • @shawnvirdree8593 · 1 year ago

    Wow, you’re on the cutting edge of technology 🤯

  • @DanielLavedoniodeLima_DLL · 1 year ago

    I remember that web scraping was a nightmare to deal with, especially doing this proxy rotation ourselves. This tool is not cheap, though, so at least here in Brazil (and in similar emerging countries), companies will still be doing it the old way. When I worked at a company that mined this kind of data a few years ago, captcha solving was actually done by real people, but I guess this can be automated with GPT-4 tools now
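For context on what doing proxy rotation yourself involves, here is a minimal sketch of a round-robin rotator that benches a proxy for a cooldown period after it gets blocked (say, after an HTTP 403 or 429). The class, the proxy list, and the cooldown length are illustrative placeholders; real setups also track sessions, geolocation, and failure counts per proxy.

```javascript
// Round-robin proxy rotation with a cooldown for proxies that just got blocked.
class ProxyRotator {
  constructor(proxies, cooldownMs = 10 * 60 * 1000, now = () => Date.now()) {
    if (proxies.length === 0) throw new Error("need at least one proxy");
    this.proxies = proxies;
    this.cooldownMs = cooldownMs;
    this.now = now; // injectable clock, handy for testing
    this.next = 0; // index to try first on the next pick()
    this.blockedUntil = new Map(); // proxy -> time when it may be used again
  }

  // Return the next proxy that isn't cooling down, or null if all of them are.
  pick() {
    for (let i = 0; i < this.proxies.length; i++) {
      const index = (this.next + i) % this.proxies.length;
      const proxy = this.proxies[index];
      if ((this.blockedUntil.get(proxy) ?? 0) <= this.now()) {
        this.next = (index + 1) % this.proxies.length;
        return proxy;
      }
    }
    return null;
  }

  // Call when the target answers with a block page or a 403/429.
  markBlocked(proxy) {
    this.blockedUntil.set(proxy, this.now() + this.cooldownMs);
  }
}
```

Each request would call `pick()`, route through the returned proxy, and call `markBlocked(proxy)` when the response looks like a block.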

  • @bossdaily5575 · 1 year ago

    Virgin API users vs Chad Web scrapers

  • @ikedacripps · 1 year ago

    When I first saw Puppeteer while learning Node.js, this is exactly the kind of use case I wanted to apply it to. Specifically, I wanted to scrape CSV files and have some AI learn from them and make some sense of them. I think it's now more than possible

    ↳ @DemPilafian · 1 year ago

    Downloading CSV files would typically not be considered _"scraping"._ You don't have to scrape the data out of a CSV file -- it's already data.

    ↳ @ikedacripps · 1 year ago

    @@DemPilafian you just wanna disprove my statement, but scraping for CSV files is as valid as scraping for PDF files. I specifically wanted to scrape soccer analytics websites for those CSV files. Hope that puts it into perspective for you.

  • @abishekbaiju1705 · 10 months ago

    Thanks for making this video. I am actually working on a project where users can add Amazon products, track price changes, and get notified when prices change. My objective was to learn web scraping.

  • @user-bp9dx1ir7w · 11 months ago

    Thank you for teaching me Puppeteer and Bright Data; this beats all the other content on the internet

  • @kinglane8634 · 1 year ago

    Thanks for always helping us devs keep our workflow clean and simple!!! If you plan on starting a subscription service I'd love to see what you're offering.

    ↳ @trickster6254 · 1 year ago

    He has a website offering courses. I bought the Angular one myself and it was really good.

  • @abz4852 · 1 year ago

    fireship you are uploading videos faster than new javascript frameworks get released

  • @VaibhavShewale · 1 year ago

    damn, that was really amazing. I was actually thinking of taking a snippet of the page, extracting the data, then deleting that page and repeating

  • @desertislanddivs · 1 year ago

    This is a great spell for Hogwarts AI Academy, Thanks Professor Fireship ^^

  • @beefykenny · 1 year ago

    This video has a lot of value.

  • @KabbalahredemptionBlogspot · 10 months ago

    OK that was way cooler than I thought

  • @danieldosen5260 · 1 year ago

    I never thought of returning data as JSON... that's obvious and brilliant...
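Asking the model for JSON only helps if the reply is actually parseable. A defensive sketch, assuming the reply may be wrapped in a triple-backtick code fence or padded with prose; `parseModelJson` is a hypothetical helper, not part of any SDK, and the parsed result should still be validated afterwards.

```javascript
// Matches the body of a triple-backtick code fence, with or without a "json" tag
const FENCE = /`{3}(?:json)?\s*([\s\S]*?)`{3}/;

function parseModelJson(reply) {
  const fenced = reply.match(FENCE);
  const text = fenced ? fenced[1] : reply;
  // Take the span from the first "{" or "[" to the last "}" or "]"
  const start = text.search(/[{[]/);
  const end = Math.max(text.lastIndexOf("}"), text.lastIndexOf("]"));
  if (start === -1 || end < start) throw new Error("no JSON found in model reply");
  return JSON.parse(text.slice(start, end + 1)); // still throws if the JSON is malformed
}
```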

  • @d3layd · 1 year ago

    Thank you for this! I used ChatGPT to write a puppeteer script for me the other day and it was fucking slick

  • @CODE_YOUR_TYPE · 6 months ago

    I love you, man. I was trying for so long and you are the only one who gave the solution. Thank you so much

  • @Jason-nv6ku · 1 year ago

    You're amazing! Many thanks!

  • @pythoneatssquirrel · 9 months ago

    I have built hundreds of scrapers in both VBA and Python using Selenium. Everything can be done; this video is just an ad for one of the hundreds of service providers of this kind.

  • @rstar899 · 1 year ago

    Amazing video as always 🎉

  • @blaizeW · 1 year ago

    Another gold gem for daddy fireship 🤑🔥

  • @wandenreich770 · 1 year ago

    Very insightful

  • @selimachour · 1 year ago

    I usually block the fetching of images, css, fonts (and javascript if the website can run without) which speeds up the page load by a lot!
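The resource-blocking trick above, sketched with Puppeteer's request interception: `page.setRequestInterception(true)` plus a `"request"` listener that calls `abort()` or `continue()` based on `request.resourceType()`. Which types are safe to drop is site-specific, so the set below is an assumption; blocking `"script"` as well only works on pages that render server-side. On pay-per-GB services this also cuts the bandwidth you are billed for.

```javascript
// Resource types to skip (assumption; adjust per site)
const BLOCKED_TYPES = new Set(["image", "stylesheet", "font", "media"]);

async function blockHeavyResources(page, blocked = BLOCKED_TYPES) {
  await page.setRequestInterception(true);
  page.on("request", (request) => {
    // With interception on, every request must be either aborted or continued
    if (blocked.has(request.resourceType())) {
      request.abort();
    } else {
      request.continue();
    }
  });
}
```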

  • @kasparsc · 1 year ago

    Sir, you are a legend 🔥🔥🔥

  • @classmanOfficial · 1 year ago

    Selenium has a headless mode :) if you guys want to try it out, works well enough for multithreading

  • @AbuBakar-pc2fp · 1 year ago

    Awesome Explanation

  • @Victor4X · 1 year ago

    Stuff isn't censored properly at 3:00, but I assume those creds are temporary anyway
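On credentials showing up in code (or on screen): a common way to avoid that is to read them from environment variables and build the connection URL at runtime. A sketch assuming hypothetical variable names `SCRAPER_USER`, `SCRAPER_PASS`, and `SCRAPER_HOST` and a placeholder host; take the real endpoint format from your provider. The resulting URL is what Puppeteer's `puppeteer.connect({ browserWSEndpoint })` expects for a remote browser.

```javascript
// Build a remote-browser WebSocket URL from environment variables instead of
// hard-coding credentials. Variable names and the default host are placeholders.
function browserEndpoint(env = process.env) {
  const { SCRAPER_USER, SCRAPER_PASS, SCRAPER_HOST = "browser.example.com:9222" } = env;
  if (!SCRAPER_USER || !SCRAPER_PASS) {
    throw new Error("Set SCRAPER_USER and SCRAPER_PASS in the environment");
  }
  // Percent-encode so characters like ":" or "@" in a password can't break the URL
  const auth = `${encodeURIComponent(SCRAPER_USER)}:${encodeURIComponent(SCRAPER_PASS)}`;
  return `wss://${auth}@${SCRAPER_HOST}`;
}

// Usage with Puppeteer (not executed here):
//   const browser = await puppeteer.connect({ browserWSEndpoint: browserEndpoint() });
```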

    ↳ @cymaked · 1 year ago

    there are many videos on Fireship where he jokes about living dangerously and letting the creds be seen 😂 obv temp stuff

    ↳ @thie9781 · 1 year ago

    @@cymaked or just F12 to let somebody waste their time

  • @danvilela · 1 year ago

    Brooo, this is awesome!

  • @felixmildon690 · 1 year ago

    Best video yet, thanks Fireship. This will introduce me to Puppeteer and the services BrightData offers (BrightData's prices are a concern, based on the comments section)

  • @NathanDodson · 1 year ago

    See. This is why I watch all your videos, Jeff. I'm a super shit JS coder, but I'm pretty decent with Python. This gives me an idea for my own eBay business, and scouring those tool docs for Python SDKs to do the same thing. Honestly, it's been your videos that have kept me in the coding space. You always have these creative "concept/idea" videos and a good majority of them have me opening up VSC to do some tinkering. Thanks for all your content brother.

    ↳ @priapulida · 1 year ago

    there's Pyppeteer

    ↳ @maskettaman1488 · 1 year ago

    @BeBop No, it's Pyppeteer

    ↳ @minhuang8848 · 1 year ago

    @@bebop355 *pyppeteer tho

    ↳ @JGBreton · 1 year ago

    did this materialize?

    ↳ @tonymudau3005 · 1 year ago

    @@JGBreton lmao 😂 asking myself the same thing

  • @robertwitzke6134 · 1 year ago

    great video!

  • @Kevgas · 1 year ago

    You should create a course on how to do this, I'd pay for that!

  • @exploringcrypto6609 · 1 year ago

    Jeff how can you process data so fast?

  • @maxivy · 1 year ago

    Awesome video - I will have to rewrite it in Python though ;) because I am a human bean

    ↳ @NicolaiWeitkemper · 1 year ago

    BeautifulSoup is better anyways :P

    ↳ @priapulida · 1 year ago

    @@danielsan901998 or Pyppeteer

    ↳ @NicolaiWeitkemper · 1 year ago

    @@danielsan901998 Correct, that's not an even comparison. However: BeautifulSoup >> Cheerio

  • @aseluxestays · 1 year ago

    I'm here because I need to hire someone who can provide this service for me. Great video!

    ↳ @TheHassoun9 · 6 months ago

    Hi, I'm willing to help! I'm a dev looking for commissions

  • @KhaledAlMola · 11 months ago

    That is a cool website to use. I'll try it one day

  • @daniamaya · 1 year ago

    Gold. Just pure gold.

  • @nichtolarchotolok · 1 year ago

    Been using puppeteer for a few yrs for freelance web scraping. Puppeteer and Playwright have been a saving grace in many circumstances.

    ↳ @donirahmatiana8675 · 11 months ago

    Could you give some tips on not getting IP banned?

    ↳ @nichtolarchotolok · 11 months ago

    @@donirahmatiana8675 the puppeteer-extra library and its stealth plugin (puppeteer-extra-plugin-stealth). If that doesn't work, you'd need a rotating proxy like Bright Data's, as mentioned in the video.

    ↳ @jacekpaczos3012 · 6 months ago

    @@nichtolarchotolok are you not using scrapy? I always thought of scrapy as the most convenient solution.

    ↳ @nichtolarchotolok · 6 months ago

    @@jacekpaczos3012 I started off on the Node.js route and haven't had the need to try the Python way of doing this. I do remember trying Scrapy in my early days, but for some reason Puppeteer felt more intuitive to me. That is probably because I felt more comfortable writing JavaScript code.

    ↳ @kellymcdonald7095 · 2 days ago

    I just saw a comment above saying clients request web scraping tools, but if it's not legal to scrape the website, how do you take on freelance web scraping? What if the client uses the data and the company you're scraping finds out about it? Won't you be in trouble, or how does this work?

  • @chaseclingman · 1 year ago

    I liked how you showed the timeout as 2 * 60 * 1000 so beginner friendly haha

    ↳ @mrgalaxy396 · 1 year ago

    I mean that's way more readable than 120000; this is a pretty common practice
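Writing durations as products keeps the units visible: 2 min × 60 s/min × 1000 ms/s = 120,000 ms. A tiny sketch with named constants; the constant names are illustrative, while `page.setDefaultNavigationTimeout()` and the `timeout` option of `page.goto()` are the Puppeteer settings such a value typically feeds.

```javascript
// Named unit constants make timeouts self-documenting
const SECOND_MS = 1000;
const MINUTE_MS = 60 * SECOND_MS;
const NAV_TIMEOUT_MS = 2 * MINUTE_MS; // two minutes

console.log(NAV_TIMEOUT_MS);
// e.g. page.setDefaultNavigationTimeout(NAV_TIMEOUT_MS);
//      await page.goto(url, { timeout: NAV_TIMEOUT_MS });
```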

  • @kevinbraga9526 · 1 year ago

    Great video. I have a question for you: how do you know that this is the industry standard for modern web scraping? How can you find out this kind of information?

  • @estebancordoba555 · 1 year ago

    In my country, some products are more expensive than on Amazon. I built a scraper to get the products and prices, with parameters such as brand or name, but Amazon blocked me a couple of times. This is a really nice solution!

  • @calmgee · 9 months ago

    This was gold

  • @forbiddenera · 1 year ago

    Puppeteer is the source of non-stop memory leak nightmares for me. Fortunately I got it down to under like 30 MB a day, but originally it was like 30 MB per leak and like 250+ MB leaked a day (and it was mostly only loading 2 pages back and forth)

    ↳ @alejandroarango8227 · 1 year ago

    I avoid using it as much as possible; it is a waste of server resources.

    ↳ @andy12379 · 1 year ago

    You could just close the browser and open a new one every time you use it to avoid memory leaks
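The restart-the-browser workaround can be made systematic: recycle the browser after a fixed number of pages so leaked memory is returned when the browser process exits. A sketch assuming sequential use (one task at a time) and a caller-supplied `launch` function such as `() => puppeteer.launch()`; `RecyclingBrowser` is a hypothetical wrapper, and the limit of 50 pages is an arbitrary placeholder.

```javascript
// Wrap a browser so it is closed and relaunched after maxPagesPerBrowser pages.
class RecyclingBrowser {
  constructor(launch, maxPagesPerBrowser = 50) {
    this.launch = launch;
    this.maxPages = maxPagesPerBrowser;
    this.browser = null;
    this.used = 0; // pages opened on the current browser instance
  }

  // Run task(page) in a fresh page, restarting the browser once its budget is spent.
  async withPage(task) {
    if (!this.browser || this.used >= this.maxPages) {
      await this.close();
      this.browser = await this.launch();
      this.used = 0;
    }
    this.used++;
    const page = await this.browser.newPage();
    try {
      return await task(page);
    } finally {
      await page.close(); // always close pages, even when the task throws
    }
  }

  async close() {
    if (this.browser) await this.browser.close();
    this.browser = null;
  }
}
```

Running tasks concurrently through the same wrapper would need extra bookkeeping, since a restart would close pages still in use.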

  • @TPAKTOPsp · 1 year ago

    Any reason why you used Puppeteer over Playwright? I see Bright Data supports both.

  • @kalelsoffspring · 1 year ago

    Presumably this can be used to DDoS as well. Do you know if there are any protections in place, or how blame is handled if someone does cause something like that? Like, Amazon will start giving 403s; does it automatically get a fresh, clean IP? Those aren't infinite, so I'm curious if you'd be charged for going through too many IPs at a particular service

    ↳ @xetera · 1 year ago

    bright data is insanely expensive so that's the protection against DDoS lol. You'll run out of money before you even have the chance to send enough traffic to cause a problem

  • @katykarry2495 · 1 year ago

    Can you share the code in the description so we can test it and adapt it to our own needs? Loving your videos!

  • @rid9 · 1 year ago

    This feels like the kind of programming work a Ferengi would be involved with.

  • @SpencerDwight · 1 year ago

    Would it be possible to scrape base file types from a website to access their assets? For example, there's a T-shirt image that I want to save, but I can only save it as an .avif file. Ideally, I'd be able to access the underlying file type (png/jpg) and save it in full resolution. If anyone has any feedback on whether advanced web scraping can extract this, please lmk.

  • @manfredcomplex366 · 1 year ago

    Freaking Money Glitch. Love you man❤

  • @kevinbatdorf · 1 year ago

    some of those query selectors look like they'd break in a week. Maybe you need to add OpenAI to the workflow more directly

    ↳ @RichardHarlos · 1 year ago

    It's a proof of concept/tutorial, not an explicit recommendation for bulletproof boilerplate. Context, eh? :)

    ↳ @yellowboat8773 · 1 year ago

    Maybe send the HTML to OpenAI every time, have it pick the query selector, then insert that into the script. You do have to be very specific with your prompt, because it often replies with: "The query selector is: a.carousel"

  • @ehsanpo · 1 year ago

    web scraping with ruby and rails is one of the best ways

  • @hamza-325 · 1 year ago

    I worked for a digital shelf company that scraped data from Amazon and other websites. They used many proxy services, but one of the most expensive was BrightData, so the more experienced workers always told us not to use BrightData unless it was really necessary.

    ↳ @sciencenerd8326 · 1 year ago

    what are the others that are better?

    ↳ @hamza-325 · 1 year ago

    @@sciencenerd8326 the company made some cheap proxies using AWS machines, for example (they don't have many IPs, but they do the job for many websites). And I think there are cheaper services like ProxyRack.

    ↳ @fhnvcghj1587 · 8 months ago

    @@hamza-325 I have a Selenium bot task: I have 1000 accounts but need 1 IP for each account to make requests to the website and do the work. Any idea or paid service for that?

  • @UmanPC · 1 year ago

    Great!!!

  • @rallysahil · 4 months ago

    Awesome !

  • @kusztelson2947 · 1 year ago

    One problem you may face is that the class names used for selectors change over time, as they are generated every time the website is deployed, breaking your code.

    ↳ @assmonkey9202 · 1 year ago

    Reverse engineer algo for generating class names

  • @garywaddell6309 · 1 year ago

    Brilliant

  • @gregheth · 1 year ago

    Wow. Thanks

  • @progamer1196 · 1 year ago

    as soon as I saw the thumbnail I knew this was an ad for brightdata

  • @Xld3beats · 1 year ago

    Guess it's time to write a program that applies to every job on the internet

  • @rithickchowdhury7116 · 1 year ago

    I built a web scraper a while back using Puppeteer. I didn't know anything about automation and IP rotation, which is why my application would occasionally fail to access data from Amazon, as they would block my IP for a certain amount of time. Would like to know how to scale up that application with the help of AI.

  • @hermanplatou · 1 year ago

    Doesn't Amazon rotate the classes and ids, effectively breaking your selectors? Not sure how the most advanced RPA bots work, but I'm hoping that some of them offer an AI that grabs screenshots and parses them instead. A follow-up would be interesting!

    ↳ @makkusu3866 · 1 year ago

    Yeah, I think the classes are probably autogenerated, at least after every deployment if not on every request. A quick and dirty solution would be to use the OpenAI SDK to prompt ChatGPT to generate the document query code and eval it

    ↳ @trappedcat3615 · 1 year ago

    @@makkusu3866 You can select elements based on attributes or lack of attributes, or you can use pseudo-classes such as :nth-of-type. There are dozens of them.

    ↳ @iljazero · 1 year ago

    @@trappedcat3615 yeah, that is how I wrote a scraper for another website: I targeted div elements with style X, which often... doesn't change, cuz... why ;D

    ↳ @arthurchazal3064 · 1 year ago

    Most websites with random id/class names still have a common and repetitive structure. Axios + regex and you'll process ~10 times as many pages as Puppeteer, with minimal bandwidth by default and simpler code. Just validate the output with a strict schema (as you always should) and you'll maybe have to update it once a year at most. Puppeteer's only real advantage is its TLS fingerprint
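The Axios-plus-regex approach described above, with the strict output check it recommends. This is a sketch only: the HTML pattern is a made-up example rather than any real site's markup, and in practice the HTML string would come from an HTTP client such as Axios or `fetch`. The point of `validateProducts` is that a redesign makes the scraper fail loudly instead of silently returning junk.

```javascript
// Hypothetical markup: a product title in <h2> followed by <span class="price">$1,299.00</span>
const ITEM_RE = /<h2[^>]*>\s*([^<]+?)\s*<\/h2>[\s\S]*?<span class="price">\$([\d,]+\.\d{2})<\/span>/g;

function extractProducts(html) {
  const items = [];
  for (const [, name, price] of html.matchAll(ITEM_RE)) {
    items.push({ name, price: Number(price.replace(/,/g, "")) }); // "1,299.00" -> 1299
  }
  return items;
}

// Strict validation: exact keys, correct types, plausible values
function validateProducts(items) {
  if (!Array.isArray(items) || items.length === 0) throw new Error("no products extracted");
  for (const item of items) {
    const keys = Object.keys(item).sort().join(",");
    if (keys !== "name,price") throw new Error(`unexpected keys: ${keys}`);
    if (typeof item.name !== "string" || item.name.length === 0) throw new Error("bad name");
    if (!Number.isFinite(item.price) || item.price <= 0) throw new Error(`bad price for ${item.name}`);
  }
  return items;
}
```

Usage: `const products = validateProducts(extractProducts(html));`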

    ↳ @cyberzjeh · 1 year ago

    @@arthurchazal3064 you can also use something like Cheerio as a middle ground between an entire headless browser and parsing HTML with fukken regex (chad move tho ngl)

  • @felixmildon690 · 1 year ago

    Tutorial starts at 2:15

    ↳ @AnshTiwari-fx2yq · 5 months ago

    May god bless you

  • @adityag6022 · 1 year ago

    Thank you sir

  • @JustBR0 · 1 year ago

    Bright data is throwing their money!!

  • @oblivion_2852 · 1 year ago

    Could we have a vid on the difference between Selenium and Puppeteer?

  • @sebastianacostamolina9593 · 10 months ago

    really cool

  • @TheLime1 · 1 year ago

    Good money making right there

  • @Dev-Siri · 1 year ago

    just as I thought the ai videos ended

  • @3rawkz · 1 year ago

    Scrapy all day baby!

  • @kairee1093 · 1 year ago

    thanks

  • @ozten · 1 year ago

    Those css selectors look super fragile.

    ↳ @RichardHarlos · 1 year ago

    It's a proof of concept/tutorial, not an explicit recommendation for bulletproof boilerplate. Context, eh? :)

  • @TheMalcolm_X · 1 year ago

    This video felt like one giant sponsored ad.

  • @wrathofainz · 1 year ago

    Puppeteer is cool and all, but what do we do about the websites that become unresponsive when you use a webdriver or open the dev tools? I've run into a few sites that will literally navigate you away if you open the dev tools.

  • @panther_puneeth · 1 year ago

    This went over my head, it was so fast

  • @Dominic-bj3ls · 1 year ago

    What's the difference between using Selenium, Beautiful Soup, and configuring your own IP rotation scripts? Cuz I would love to know.

  • @daniel_q40 · 1 year ago

    Data is the new gold

  • @hugodsa89 · 11 months ago

    this would be a great use for the new "using" keyword, am I right?

  • @Because_Reasons · 1 year ago

    Question. Probably a dumb question, but can something similar be done for YouTube? i.e. scraping the top channels in a given niche and their metrics?

    ↳ @thatsalot3577 · 1 year ago

    I think it would be much easier, plus there are a lot of APIs for YouTube

    ↳ @ZeonLP · 1 year ago

    Sure, why not? The only problem is that scraping JS-heavy websites requires some browser emulation which makes it MUCH slower than scraping static sites. Using their API is pretty much required, unless you want to worry about optimizing selenium/puppeteer, bypassing rate-limiting, proxies, running multiple scrapers in parallel etc. Much more effort programmatically and also financially

  • @forbiddenera · 1 year ago

    ..while Puppeteer can run headless, you don't have to run it headless. It may still seem headless from what most might consider that term to mean but headless or not is a config option for Puppeteer, running with headless disabled can help beat bot detection sometimes.

  • @wlockuz4467 · 1 year ago

    Remote browser as a service is actually a genius idea. Often times when you want to scrape at scale the most painful thing to do is hosting and using effective proxies. But with this you can literally leave the scraper running on your machine and let brightdata take care of the proxies. You don't even need good specs because the browser runs on a different server.

    ↳ @quickkcare605 · 1 year ago

    Well thought!

    ↳ @klapaucius515 · 1 year ago

    smells like ad

    ↳ @wlockuz4467 · 1 year ago

    @@klapaucius515 Do you mean that for my comment or the video?

    ↳ @arrvee7249 · 11 months ago

    ikr, then you can just pay brightdata $10,000 and go on to make $52 for the data you've scraped.

  • @petrlaskevic1948 · 1 year ago

    Do all search engines do it like this? I don't think that a website for searching furniture from my country bothered to talk to each one of the sellers and make an arrangement with them. Or did they?

  • @MrKrzysiek9991 · 11 months ago

    The Microbots AI Chrome extension helps with building prompts that include HTML code. Check it out if you want to write automation code faster.

  • @SkySesshomaru · 1 year ago

    o.o that's some impressive shit right there

  • @luxurycondobbmg · 1 year ago

    I remember my first time scraping a website - except back then, we didn't have ChatGPT proompts to do it for us. We had to physically read the documentation and actually understand the code we wrote

  • @lotfikamel5947 · 1 year ago

    The one thing you did not mention is that the hashed CSS classes used to select the price, etc. are going to be changed frequently by Amazon, so the next time you run the script it will not work

    ↳ @Frickencreepy · 1 year ago

    I went through the tut. It worked the first time; afterward, exactly that happened. I think this is really important to mention. Any suggestions?