This script I threw together saves me hours.

Science and Technology

Finding the best way to scrape data from a site is time-consuming. This script uses selenium-wire to inspect the network requests a site makes and gives you back a list of URLs and JSON responses.
Proxies: nodemaven.com/?a_aid=JohnWats...
Patreon: / johnwatsonrooney (NEW free tier)
Scraper API: www.scrapingbee.com/?fpr=jhnwr
Donations: www.paypal.com/donate/?hosted...
Hosting: Digital Ocean: m.do.co/c/c7c90f161ff6
Gear I use: www.amazon.co.uk/shop/johnwat...
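
The approach in the description can be sketched roughly as follows. This is a minimal sketch and not the script from the video: the names `stdlib_decode`, `json_responses` and `main` are hypothetical, and `main` assumes selenium-wire and a Chrome driver are installed. The filtering is kept separate from the browser code so it can be tried without launching a browser.

```python
import gzip
import json
import zlib


def stdlib_decode(body: bytes, encoding: str) -> bytes:
    """Decode gzip/deflate bodies with the standard library (hypothetical helper)."""
    if encoding == "gzip":
        return gzip.decompress(body)
    if encoding == "deflate":
        return zlib.decompress(body)
    return body  # "identity" or anything unrecognised is returned unchanged


def json_responses(requests, decode=stdlib_decode):
    """Return (url, parsed_json) pairs for captured requests whose response parses as JSON."""
    found = []
    for request in requests:
        response = request.response
        if response is None:  # the request never received a response
            continue
        encoding = response.headers.get("Content-Encoding", "identity")
        try:
            data = decode(response.body, encoding)
            found.append((request.url, json.loads(data.decode("utf-8"))))
        except (ValueError, OSError, EOFError, zlib.error):
            # Not JSON, not UTF-8, or not decodable: skip this response.
            continue
    return found


def main(url: str) -> None:
    # Imported here so the helpers above work without selenium-wire installed.
    from seleniumwire import webdriver
    from seleniumwire.utils import decode

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        for request_url, payload in json_responses(driver.requests, decode=decode):
            print(request_url)
            print(json.dumps(payload, indent=2)[:500])  # preview only
    finally:
        driver.quit()
```

Calling `main()` with a target URL opens the page and prints every URL that returned JSON, with a short preview of each payload. Passing selenium-wire's own `decode` also covers encodings the standard-library helper does not handle.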

Comments: 69

  • @jessejames3169
    10 months ago

    Love your thought process behind writing this! It makes it easy to follow why you do a certain step, and whether it’s necessary for others! Great vids, keep it up!

  • @JohnWatsonRooney
    10 months ago

    Glad it was helpful!

  • @liketheduck
    3 months ago

    Fantastic “apprentice” content. This assumes a basic understanding but also pushes the novice forward. I really appreciate it!

  • @DerekMurawsky
    2 months ago

    This is really great, and a great foundation, too. I can see this being extended to support so many things, too.

  • @Extrey
    10 months ago

    I didn't even know that selenium can be used like this, thank you very much, great work as always))

  • @sandunwijethunga6787
    10 months ago

    great video. thank you john❤

  • @TimoTalksTech
    10 months ago

    Amazing, just what I was looking for. Need to look further into whether I could fetch all the IPs too.

  • @kite759
    10 months ago

    that's very useful, thank you

  • @jagdish1o1
    10 months ago

    I used seleniumwire to create a scraping bot. It’s a very good package for grabbing the backend requests. What I did was log in using selenium, then grab the cookies and the backend API ;) Then I simply closed the browser and used the Python requests lib to make the requests, to make things a little bit faster. Eventually I dockerized everything, pushed the container image to AWS ECR and ran it in parallel on AWS ECS. Pretty amazing.

  • @datacleaningchallenge2029
    9 months ago

    Impressive. What's your email? I need to ask you a question related to your code.
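
The hand-off described in the comment above (log in with a browser, then reuse its cookies for plain HTTP calls) could look roughly like this. It is a sketch under assumptions: the commenter used the `requests` library, whereas this version swaps in the standard library's `urllib` to stay self-contained, and the helper names `cookie_header` and `build_request` are hypothetical. The input is a list of cookie dicts in the shape Selenium's `driver.get_cookies()` returns.

```python
import urllib.request


def cookie_header(cookies):
    """Join Selenium-style cookie dicts into a single Cookie header value."""
    return "; ".join(f"{c['name']}={c['value']}" for c in cookies)


def build_request(url, cookies, extra_headers=None):
    """Build (but do not send) a urllib request that reuses the browser's cookies."""
    headers = {"Cookie": cookie_header(cookies)}
    headers.update(extra_headers or {})
    return urllib.request.Request(url, headers=headers)
```

After logging in, the cookies can be read once with `driver.get_cookies()`, the browser closed, and each later call sent with `urllib.request.urlopen(build_request(url, cookies))`.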

  • @kocahmet1
    10 months ago

    golden content here

  • @tizianonakamader8177
    10 months ago

    Amazing content thank you

  • @JohnWatsonRooney
    10 months ago

    Very welcome

  • @pldvs
    10 months ago

    "Because. I. Don't. Care..." 😂😂

  • @JohnWatsonRooney
    10 months ago

    haha

  • @darylhunt9070
    10 months ago

    Good video. Do you capture API keys in selenium-wire as well? Some APIs use session keys.

  • @JohnWatsonRooney
    10 months ago

    you can grab any headers and cookies yeah
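
As a rough illustration of the reply above, the replayable parts of a captured request can be copied into a plain dict. This is a sketch, not code from the video: the function name `snapshot` is hypothetical, and it assumes the captured request object exposes `method`, `url`, `headers` (with an `items()` method) and `body`, as selenium-wire request objects do.

```python
def snapshot(request):
    """Copy method, URL, headers, cookie string and body of a captured request."""
    headers = dict(request.headers.items())
    # Header names may arrive in any case (HTTP/2 lowercases them), so match loosely.
    cookie = next((v for k, v in headers.items() if k.lower() == "cookie"), None)
    return {
        "method": request.method,
        "url": request.url,
        "headers": headers,
        "cookie": cookie,
        "body": request.body,
    }
```

Running `[snapshot(r) for r in driver.requests]` would then give one dict per request, including any session keys sent as headers or cookies.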

  • @ivanowdenis
    10 months ago

    Hello John, could you make a video on how to scrape data which a server sends through a websocket connection in live mode?

  • @StonedApe420
    10 months ago

    Can it make a complete copy of requests, with URL, headers and payload?

  • @zakariaboulouarde4591
    1 month ago

    Hello, thank you for the amazing video. I want to ask, please: how can I bypass a 403 Forbidden from Cloudflare when I am requesting an API? Thank you for all your efforts 🙏🏽

  • @user-nj2om2vt8u
    10 months ago

    Are you using the JetBrains Mono font? If yes, how does it look so thin?

  • @JohnWatsonRooney
    10 months ago

    It is, yeah. I don't know, I didn't do anything other than select that font, sorry.

  • @satyajeetkumar3993
    10 months ago

    Hi John!! I really appreciate this new content. I have a query to ask. I was using selenium webdriver in chrome to fetch data from a website. The script is working just fine but after certain iterations, the driver is not working properly or the way it should. I am getting a NoneType error. I tried clearing the cookie and starting a new session and then continue from where I left off but it is still not working. Any suggestions on this?? I really appreciate it!! Thanks!!

  • @JohnWatsonRooney
    10 months ago

    hard to say but when i get problems like this i always check to see what the direct output from loading the page is, you could be hitting a captcha

  • @satyajeetkumar3993
    10 months ago

    Actually that new page is loading properly. I didn't check for terminal output but the page is loading. After that when I am looking for an element on the same page which I know is available there, I am getting an error.

  • @AleksT28
    10 months ago

    I was working with selenium / selenium-wire until I hit an issue I could not debug: when dockerised, selenium-wire was not listening on the right port where selenium was running.

  • @JohnWatsonRooney
    10 months ago

    that's interesting, i haven't tried dockerising it but i will keep an eye open for issues

  • @maloukemallouke9735
    10 months ago

    Thank you. I am wondering whether you make money with these tools?

  • @mitvpankaj2454
    10 months ago

    Great work bro!! I also have one question: whenever I try to scrape Walmart, the "robot or human" pop-up comes up. Can you please guide me on how to bypass this type of bot detection system? Thanks, and love your content; because of you I learned Python!! 👍

  • @JohnWatsonRooney
    10 months ago

    Check out undetected chrome driver - there’s some good information for it that might help

  • @mitvpankaj2454
    10 months ago

    I tried, bro, but it's still showing the same issue. If you have any reference or video, can you please suggest it? It'll be very helpful for me and others also :)

  • @AllifIzzuddin
    10 months ago

    So this is kinda like playwright network events right?

  • @JohnWatsonRooney
    10 months ago

    Yes same thing but I found it better to use

  • @iamshiva003
    10 months ago

    What is the vscode theme and the font used in this video?

  • @JohnWatsonRooney
    10 months ago

    github dark theme and jet brains mono!

  • @iamshiva003
    10 months ago

    @JohnWatsonRooney thank you

  • @linuxkerem
    10 months ago

    Are you using arch linux sir ? And thanks for the content ! 🥰

  • @JohnWatsonRooney
    10 months ago

    thanks! it's actually just ubuntu + i3

  • @linuxkerem
    10 months ago

    @JohnWatsonRooney Wow, I guess my mind went straight to arch when I saw a Hyprland-style window manager 😁

  • @TheCulpritgamer
    3 months ago

    can you please share the script that you created for my future reference ??

  • @AhmedThahir2002
    10 months ago

    Hi John! Love your work. Could you share the code from your videos?

  • @markbennett5626
    10 months ago

    Maybe John has the code available to Patreon members ;)

  • @AhmedThahir2002
    10 months ago

    @markbennett5626 Ohhhhh okay no issues hehe :)

  • @satwikawasthi2002
    10 months ago

    What if the API is only called when a user action occurs?

  • @JohnWatsonRooney
    10 months ago

    the next step to upgrade this would be to run the same but insert clicks on various page links first and check each one

  • @satwikawasthi2002
    10 months ago

    @JohnWatsonRooney thanks for the reply🙏 Also, the most important thing: a POST API that accepts custom keys in its headers or payload will not give the expected response. Please make a video on how to handle this.

  • @throwyourmindat
    10 months ago

    Hi. Are you aware of self-healing Selenium scripts? Can you explain the concept of self-healing and how it is even possible? We find an element on a web page using a locator, and if that element isn't found we get an error. How can self-healing find that locator? For example, if an element located by //input[@name=email] is not found, the self-healing approach can automatically guess that the element was updated in the next build to //input[@name=mailing-address]. It would be great if you could help us understand that.

  • @valoclips2896
    10 months ago

    Nice idea. But I will still prefer to log the requests via the Network tab or Burp Suite. The chromedriver detection will also kick in for some sites.

  • @JohnWatsonRooney
    10 months ago

    fair enough, it does have some uses but also limitations as you say.

  • @user-qi2kt8ow5r
    10 months ago

    Can I bypass hqq.tv devtool blocking using this?

  • @Niuroteya
    10 months ago

    I don't really get it... I mean, you can filter the Network tab by link or the word "api" too if you want to. Plus this solution will not work for everything, but the Network tab will. Other than filtering only the needed requests, this solution doesn't seem to do anything. And yeah, you can do a bit more advanced filtering here, but... does this really save a lot of time for some kind of task? It's just hard for me to see how. Did I miss something? I've been making AJAX scripts dealing with forms for the past year+, and for me it would be absolutely useless.

  • @JohnWatsonRooney
    10 months ago

    I use it when I am given a URL and want to do some quick checks - saving any JSON output so I can search inside all from my terminal. I chose to semi automate something I was doing regularly is all.

  • @markbennett5626
    10 months ago

    Maybe not for everyone but once scripted including user prompt for url, it'll be quicker than using network tab and much nicer response, plus can see adding the ability for the additional steps of recording session keys and further calls.. Thanks John
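
The workflow John describes in this thread, saving any JSON output so it can be searched from the terminal, might be sketched like this. The function name `save_payloads` and the file naming scheme are assumptions for illustration, not taken from the video.

```python
import json
from pathlib import Path


def save_payloads(payloads, out_dir):
    """Write each (url, data) pair to its own pretty-printed JSON file and return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, (url, data) in enumerate(payloads):
        path = out / f"response_{index:03d}.json"
        # Keep the source URL next to the data so a search hit can be traced back.
        path.write_text(json.dumps({"url": url, "data": data}, indent=2), encoding="utf-8")
        paths.append(path)
    return paths
```

With the files on disk, something like `grep -ril price` on the output directory is enough to find which endpoint carries the field of interest.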

  • @spab87
    5 months ago

    Hi, thanks a lot, this was very helpful to learn. I use contextlib.suppress; it's actually faster than try/except and it looks better, I think. Your function would look like this:

        import contextlib

        for request in driver.requests:
            with contextlib.suppress(Exception):
                data = decodesw(
                    request.response.body,
                    request.response.headers.get("Content-Encoding", "identity"),
                )
                resp = json.loads(data.decode("UTF-16"))
                resps.append(resp)
        return resps
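
The `contextlib.suppress` pattern from the comment above can be tried without a browser using a small stand-alone version. This is a sketch under assumptions: the function name `parse_json_bodies` is hypothetical, it decodes as UTF-8 rather than the UTF-16 used in the comment, and it suppresses only `ValueError` instead of every `Exception` so that unrelated bugs still surface.

```python
import contextlib
import json


def parse_json_bodies(bodies):
    """Return the parsed value of every body that is valid UTF-8 JSON, skipping the rest."""
    parsed = []
    for body in bodies:
        # ValueError covers both json.JSONDecodeError and UnicodeDecodeError.
        with contextlib.suppress(ValueError):
            parsed.append(json.loads(body.decode("utf-8")))
    return parsed
```

Bodies that fail to decode or parse are skipped silently, which matches the behaviour of a `try`/`except ValueError: pass` block in fewer lines.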

  • @AndyTutify
    10 months ago

    Are you no longer using neovim?

  • @JohnWatsonRooney
    10 months ago

    I still use neovim, i decided to use VS Code for video demos as i thought it would include more people

  • @user-tk5ir1hg7l
    10 months ago

    Is this better than Puppeteer network events?

  • @JohnWatsonRooney
    10 months ago

    I have limited experience with Puppeteer; I expect it to be the same, although I prefer selenium-wire to Playwright for network events.

  • @user-tk5ir1hg7l
    10 months ago

    @JohnWatsonRooney ok, how about Playwright network events? Does it have similar functionality, or would you still recommend going with seleniumwire?

  • @bakasenpaidesu
    9 months ago

    .

  • @Septumsempra8818
    10 months ago

    Anyone else update chrome on their pc and had all their scrapers break?😅

  • @abdelrahmankhaled8239
    2 months ago

    Complete noob here, just started web scraping. For some reason the seleniumwire import is giving me this error: import blinker._saferef ModuleNotFoundError: No module named 'blinker._saferef'. I've been searching online for help for hours and changed Python versions (currently using the same one you're using in the video), but nothing seems to work. Please help, thank you in advance.

  • @DudethatGross
    1 month ago

    pip install blinker ?

  • @twelfth4927
    3 months ago

    Guys, I'm watching with passion, but what would this be helpful for? What do web scrapers actually do?

  • @DudethatGross
    1 month ago

    Gathering data that would otherwise be difficult to get without a proper API

  • @MasoomNini
    9 months ago

    Hi John, big fan. Thanks for the tutorials ❤ I need to contact you on any social media; I need help scraping one site, kindly.
