GPT-4 Vision API + Puppeteer = Easy Web Scraping

Ғылым және технология

In today's video I do some experimentation with the new GPT-4 Vision API and try to scrape information from web pages using it.
GitHub: github.com/unconv/gpt4v-browsing
Support: buymeacoffee.com/unconv
Consultations: www.buymeacoffee.com/unconv/e...
Memberships: www.buymeacoffee.com/unconv/m...
00:00 Intro
01:04 Basic usage of GPT-4 Vision API
05:50 Test GPT-4 Vision with image from Unsplash
07:23 Taking a screenshot with Puppeteer
12:35 Test GPT-4 Vision with Wikipedia screenshot
18:14 Test GPT-4 Vision with Google weather info
19:29 Automating URL generation + screenshot taking
33:24 Handling timeouts and retries and making it conversational
44:30 Summarizing BBC news
45:33 Fixing slow loading pages
49:18 Asking for weather information
50:24 Tweaking system message
54:03 Asking for Tesla stock price
56:00 Outro

Пікірлер: 155

@unconv7 ай бұрын
Almost 20K views 😳 Part 2: kzread.info/dash/bejne/goGAyZiLopvMk7g.html
@surya0202
7 ай бұрын
please upload part 2 sir
@artemisfauls
2 ай бұрын
I believe this method can be used to automate certain routine processes, but only if the price of gpt4v is reasonable. For example, you need to send 10,000 screenshots with a resolution of 1920x1080 pixels to gpt4v in 1 day - how much will it cost?🤔🤓
@Lewis647 ай бұрын
Didn’t expect a coding video to be this entertaining. Love the frank display of your thought process.
@arc0life7 ай бұрын
Your tutorial helps with the excitement and anxiety as a fellow dev. I knew I could do this myself but keep procrastinating and eventually some tasks end up as a mental block in WFH mode. Just forcing myself to watch a fella do something like this really helps, thank you!
@dustinsoodak89547 ай бұрын
I love how much of the process of programming he includes in the demo
@Autoscraping6 ай бұрын
A fabulous video that has been of great help in orienting our new collaborators. Your generosity is highly valued!
@mooktakim7 ай бұрын
This guy has superpowers. He can talk and code at the same time!
@PostMeridianLyf7 ай бұрын
Its interesting that this is exactly what I was looking for. Llast night i spent a few hours asking copilot how to implement the same libraries. Thanks for the tutorial
@Salfie0077 ай бұрын
I just wanted to tell you that you are doing great and I really like your format.
@unconv
7 ай бұрын
Thank you very much!
@fuba447 ай бұрын
This was super cool! Don't mind the long format at all. Would love to see you evolve this concept in another video.
@unconv
7 ай бұрын
I've already filmed the next one. It'll definitely be long form 😅
@amritbanerjee
7 ай бұрын
With full page screenshots. Maybe create an assistant which looks at my bookmarks and the tags in there based on my question and tries to get me the info from the page.
@thecount257 ай бұрын
Use the retry library and set a low timeout; you can use a simple decorator. If the timeout needs to be high and this isn't very pleasant, consider running multiple requests concurrently and waiting only on the first result.
@reuna4c37 ай бұрын
This is so cool and nerdy! Maybe the best site to follow and learn more and more on OpenAI API. Difficult but entertaining to follow.
@gmichael55065 ай бұрын
Really appreciate your information and style. Learning much!
@unconv
5 ай бұрын
Thanks for watching!
@robbennett60535 ай бұрын
Seriously impressive. I'm a NodeJS API engineer and you're writing that JS code faster than me!
@unconv
5 ай бұрын
Thanks! Fast doesn't equal good, though 😅
@cutecute91897 ай бұрын
This is awesome. I love your videos. Please keep these videos going specially this one. I learned so much
@unconv
7 ай бұрын
Thank you! More to come :)
@marcoaerlic2576Ай бұрын
Thanks for the video. Great work.
@grant_vine7 ай бұрын
So for cookies you just need to know what cookie is being set, in many cases it’s likely just a matter of causing the same effect in puppeteer, one way is to add to the cookie store directly (I’m sure puppeteer has a way to do this), and an alternative is specifying a “user directory” for puppeteer so you can actually agree to things like cookies, in many ways consent popups are easy to “locate” using standard html locators simply because it is often set to a priority load event and is often a div/container with a name/id containing the word consent or cookie etc, so regex can be used to find these reasonably easy. Use puppeteer to locate the “Ok” button and click it and then having that reusable user directory means you only check for any site if you have or haven’t accepted consent, if not click it if so just scrape it
@albertwang59744 ай бұрын
very interesting, thanks for sharing!
@ScootLogix7 ай бұрын
Great video dude. Im gonna rewatch later. I got a project this might help on.
@edoardogribaldo28705 ай бұрын
Crazy good content! Thank you!
@gaming_for_sanity7 ай бұрын
Legend has it, he’s still trying to find out what the weather is like in Alaska…
@digitalcivilulydighed7 ай бұрын
I'd like to see a video from you about navigating websites with Puppeteer. Now that you ask, I'd like a tutorial on how it follows links, fills out data, crawls four or more links deep into a website, how to handle session cookies, automate and run loops, etc. :-)
@mysticminds11267 ай бұрын
I appreciate your efforts mannn...
@mt4u83216 күн бұрын
Very clever. Congratulation
@Laowater5 ай бұрын
a Master in the Arts of coding!
@pourkin7 ай бұрын
Excellent Job
@gianmarcoferrara33977 ай бұрын
You should try the JSON response mode. You can request to return a response like that in the system promp: {data: ExpectedDataInterface, error: ErrorInterface | null}. Good luck!
@dreamphoenix7 ай бұрын
Thank you.
@nathanl65986 ай бұрын
No typescript and no copilot? This was a more wholesome time.
@guitaripod7 ай бұрын
Cool video
@billybofh23637 ай бұрын
A little speed up might be to use the python requests package to try and fetch the url first before running puppeteer - then short-circuit invalid domains, 404's etc? Also, when doing a completion you can pass `request_timeout=10` or whatever and it'll kill the call. Sometimes even works.... ;-)
@unconv
7 ай бұрын
Thanks, I'll try that. Yeah, you can set the request_timeout, but you still have to handle the error my having some recursive function that retries the request if it fails. And I don't have time to implement that. It would take like a minute, lol
@billybofh2363
7 ай бұрын
I replied to this and youtube removed it (I think!) - but the python package 'tenacity' (or the original retry) is worth a look (I'll skip the url as I think that's what made youtube remove/hide my comment)
@chromashift5 ай бұрын
BWAHAHAHAHA! the struggle (programming: errors = WTF!!!!) is real. day in the life of code building...Awesome video!
@EduardsRuzga7 ай бұрын
Great video! Interesting experiments with the GPT Vision API and Puppeteer. I have a couple of questions and a suggestion: 1. Could you share some insights on the cost aspect of using the GPT Vision API for this project? I'm curious about the pricing and whether it's feasible. Also, have you considered combining classical web scraping methods with the Vision API in a synergistic way? Specifically, using traditional scraping to gather initial data and then employing the Vision API to verify or correct this data where needed. I think this could potentially address some of the limitations of both methods. What are your thoughts on this approach?Looking forward to hearing your thoughts!
@unconv
7 ай бұрын
Thanks! On the day I filmed the video, my API costs were $0.58. The next day I maxed out the limit of 100 messages of the gpt-4-vision-preview while testing and the total cost for that day was $2.15. These costs include some other API calls as well, though. Combining classical web scraping and Vision API seems like a good idea. I'll have to look into that when I run into an issue scraping something.
@louisbertson5 ай бұрын
A good way is to include in user role message a timestamp. It will help him calculate the age of SAM Altaman easily!
@unconv
5 ай бұрын
Yes, but only because he knows his birthday already (even without the Wikipedia screenshot)
@Y3llowMustang5 ай бұрын
I wouldn't call this easy web scraping, but this was very hilarious with all the bugs
@iceshoqer5 ай бұрын
Chain of thought is actually meant to be used for mostly information accuracy, not for fixing what you could do in a proper single prompt.
@yoyartube7 ай бұрын
Great Video! Can these libraries handle auth like azure oauth flow in order to browse to the page?
@alon71107 ай бұрын
Thank you for this helpfull video! can you please try the same task with the functions tool? Thanks!
@AlfonsoMenkel7 ай бұрын
Grate video, at last I see on YT someone that struggles with the API as I do… I know the topic of the video is to use the vision api, but you cold get better results using a terminal web browser like lynx , piping the result to a Tex file and asking ChatGPT with that text as context. Just an idea. 😉
@unconv
7 ай бұрын
I was gonna dismiss your suggestion by saying one does not simply use Lynx in 2023 since it doesn't support JavaScript, which many websites require nowadays. But testing it out just now, all the examples I showed in this video could have worked with Lynx (based on its output). I don't know how I would extract links and input fields with Lynx, though, to make it crawl subpages. Perhaps all those pages were server side rendered, so I might as well have used Curl.
@ntgCleaner5 ай бұрын
I'm only up to 15:00 but the issue you had up at this point is that it CAN read sam altman's birthdate, but it doesn't know what the date is today. You can feed it the date in your response generated with `date()` or whatever.
@zeta_meow_meow5 ай бұрын
just kisses for you , so freaakin loved how you explained and debugged along us
@sandermol93697 ай бұрын
For getting Sam Altman's age, would it help if you stated that the screenshot is taken today? ChatGPT may be hesitant to assume this.
@silva82157 ай бұрын
Couldn't you use backoff to handle the error when the API is stuck?
@andrejuntermanns76607 ай бұрын
I dont get the plus in funcionality compared to google in this demo. Help me out.
@TarasKim6 ай бұрын
If you add something like "Strictly based on the information from screeshot" you get information based on the information he gets from screenshot.
@mohamedbasueny94767 ай бұрын
i was wondering how this is different from the web-search capapblilty of chatgpt-plus right now . in other words , if i asked gpt to look for an answer on the web will it struggle to do so ? , is this a hack way to use a better websearch via an api like method because it's not enabled yet in the openai dev tools . any way i really like the video , can we use selenuim to do so also ?
@xsploit7 ай бұрын
also i think this better suuited for assistants api. i made a private investigator that uses functions. one is serper api and if it finds a linkedin page crawls and de html it and send to get summzairzed with the link snippets ,then the other function is getting details on a image url you asked it to veiw using gpt 4 vision and i could make those functions paralell
@bogdanbogdan5276
7 ай бұрын
Could you share more details, I'm trying to build similar functionality
@bkentffichter7 ай бұрын
I made a drinking game out of the word Alaska. I died.
@evanlovett35535 ай бұрын
What is the weather like in Alaska?
@Tyfeen6 ай бұрын
what is the weather like in alaska?
@RonivaldoPassosSampaio7 ай бұрын
Nice content, but you should just copy paste the code, we know you can code well behind the scenes, don't worry. Keep doing great!
@murch50547 ай бұрын
I see that 0420 there... in 00:31:50 : )
@LearnCode_withAI7 ай бұрын
In package.json yku can set type : module
@alexeygrom18347 ай бұрын
"In Alaska's land, where coders seek the weather's tale, They type and query, 'neath the aurora's bright veil. With every line of code, they ask the sky's mood, Hoping for sunshine, but prepared for the cold and brood."
@PDragonLabs4 ай бұрын
👍
@AuditorsUnited5 ай бұрын
im not going to watch an hour of you coding but i will share that you can get a image of each element and selenium would problem be a good choice to use in this
@virdvird7 ай бұрын
Make screenshot (do not close puppeteer session) and ask chatGPT is page looks loaded or not instead of relying on networkidle0, timeout, etc
@waneyvin7 ай бұрын
is it possible to use selenium? at least it is python, you don't need to switch between 2 language.
@unconv
7 ай бұрын
Yes, it should work too. I just have more experience with Puppeteer (never tried Selenium)
@8COOL66 ай бұрын
but the token authorization for use gpt-4 preview where is ?
@HolyG2k67 ай бұрын
"Hopefully this is not a Malware" :D :D
@eyoo3697 ай бұрын
The Vision API downsamples the image.. thats why it cannot recognise small fonts.
@TonyS17 ай бұрын
What is the current weather in the world?
@terenceundbud6 ай бұрын
so you need to use gpt 3.5 turbo to get exact answers ijnstead of gpt-4? weird.
@erikaszvicevicius91917 ай бұрын
First thank You. And question - how much token used this scraping method?
@unconv
7 ай бұрын
I haven't checked exactly but it seems to be around $0.017 per scrape based on my API usage during building this
@TheChrisSoria5 ай бұрын
I still don’t have access to the vision API : (
@OBRosewell5 ай бұрын
hey man! I would need something like this posted onto a server of some sort, like AWS or Heroku. is that possible if i build this & deploy it? i need it to scale up for 1000 requests daily
@unconv
5 ай бұрын
A lot of websites will block requests from AWS servers, so you would probably need some sort of proxy server in between.
@uncleJuancho6 ай бұрын
Great video! However, I noticed a few instances where you mentioned not having prior experience with certain tasks, but then you later showcased projects where the code was already complete. For example, at 9:29 in the video. This seems a bit contradictory and might confuse some viewers
@unconv
6 ай бұрын
Thanks! Which tasks did I say I didn't have prior experience with?
@uncleJuancho
6 ай бұрын
@unconv, this is my first time viewing a video on your channel. I observed that you started by looking through the documentation as if it was new to you, despite already having the answer in another file. This struck me as unusual, but I understand it might have been part of your process. When the documentation didn't seem to help, you referred to your existing project. I don't mean this in a negative way; it's just my personal observation from watching this video for the first time
@unconv
6 ай бұрын
I've used Puppeteer multiple times in the past, but I never remember the boilerplate stuff. I didn't want to jump directly to my own previously written code, because I want to do things from scratch in my videos, not leaving out any steps. And I want to show how I go about researching stuff. But I get that it might have been confusing - although I suspect even more confusing if I directly copy pasted my old code.
@joebazooks7 ай бұрын
I believe it’s not telling you his age because it is trying to provide you with a precise age i.e. his current age, given his birth date. Don’t ask what his age is, but what age the page or author of the page says he is
@PolinomPolynets7 ай бұрын
Is there a reason you don't use copilot?
@unconv
7 ай бұрын
It often guides me to directions I don't want to go. Also, I'm still learning Python so I'd rather practice my memorization
@TeleV77_media17 күн бұрын
also i had checkout a patreon chat ( paid ). but now i am just unable to find it? it is gone?+
@unconv
17 күн бұрын
I'm not on Patreon but I'm on BuyMeACoffee and you can find a link in the description
@TeleV77_media
17 күн бұрын
@@unconv thankyou for the good job. i am improving and using it. there are some pieces that doens work up to today and fixed them
@ddsmax7 ай бұрын
You're already in javascript for puppeteer. Why do the gymnastics of writing your main logic in python?
@unconv
7 ай бұрын
That's a good point and in the next video I in fact switch to JavaScript only. I prefer Python, though
@TheBeefiestable7 ай бұрын
If humans didnt all re-invent the wheel every hour, there would be a huge database of every query : response : list of problems : links to solutions if they ever figured it out , that would save humanity unlimited man hours... but probably put openai out of business
@thr0w4075 ай бұрын
The llm was wrong about what the light on the motorcycle means, since the headlight is ALWAYS on. A simple but important mistake.
@xsploit7 ай бұрын
i would just using the scraping way and dehtml it. ive never seen seen someone with so much problems calling api
@avi72787 ай бұрын
great video, just one suggestion, the repetition of what you're typing literally every time is a bit much.
@unconv
7 ай бұрын
Thanks! I'll try to avoid that in the future (and mistakes leading to repetition in general)
@chameeragamage15266 ай бұрын
Want more
@kamalkamals6 ай бұрын
14:50 i don't think that's a good idea because u will lose a lot of tokens (input, output), so it s better to use scrapping urls with vector store
@yolamontalvan95027 ай бұрын
What is 4 Vision API?
@theoriginalrecycler7 ай бұрын
Remove the word Like, ask what is the weather in Alaska. The question you ask leads to an answer such as “colder than a commercial freezer”.
@unconv
7 ай бұрын
Good point 😂
@markw76097 ай бұрын
Can this work for Instagram scraping ?
@denzelcanvas5223
7 ай бұрын
no. instagram doesnt allow repetitive actions.
@MrCaovang7 ай бұрын
Thank you for the Video. But the way you re-typing the question (instead of copy and paste it) make me frustrated 😖
@unconv
7 ай бұрын
Sorry about that 😄
@Alternativetips7 ай бұрын
gpt4 vision api limits ?
@unconv
7 ай бұрын
100 requests per day
@la61887 ай бұрын
Why not use everything in js? So confusing
@sniegu847 ай бұрын
to have productive programming ai has to return what you want in 100% cases. it has to be better than human in deduction.
@User_17955 ай бұрын
This could be the best kodi addon ever
@cafeta7 ай бұрын
Why aren't you using the AI to help you code?🤔🤷
@Bartskol
7 ай бұрын
I think that he wants to explain the code to us by writing. I use ChatGPT to write code as I'm not a programmer myself, but I find myself learning to code anyway because I still need to understand what I actually need. It's also tiring to pass every small error to chat; it's easier to make adjustments yourself. However, to do that, you need to understand the code at some level.
@unconv
7 ай бұрын
I actually have Copilot but usually I disable it because it often guides me to directions I don't want to go. Especially when making videos, if Copilot suggests a different way than I was going to go, I get distracted. And I'm still learning Python, so I want to actually learn it. If I always use Copilot, I can get the job done but I probably won't memorize the syntax.
@-Jason-L
7 ай бұрын
@@unconvI think he meant let chatgpt generate the entire code, not copilot.
@yungjerky
5 ай бұрын
Because fully AI generated code is unusable
@AIPulse118
4 ай бұрын
@@yungjerkynot anymore it isn't. Never used Grimoire?
@aviralpatel24435 ай бұрын
bro sounds like an AI. Good video tho
@HaseebHeaven6 ай бұрын
Why you mixed Python + JS i dont see an requirement you could single programming language, Java script, or Python, and simply executed the same task with the single project
@Flameandfireclan7 ай бұрын
Instant fork, all your code belong to us
@nitestrykerx017 ай бұрын
Seems very inefficient to do it that way, yes it’s and interesting concept but you can do it all in Python and your logic can be simplified to get results
@MrDouglax7 ай бұрын
it's an AI speaking?
@mnageh-bo1mm7 ай бұрын
aaaaa this was frustrating as hell
@_nom_7 ай бұрын
It's not hard to make a scraper. In fact you probably only need to use a http request, not a full on instance of chrome.
@user-uw7st6vn1z5 ай бұрын
you would bankrupt if you use gp4 vision api scrap web.... just link your credit card and start scraping
@qasurfer7 ай бұрын
coding 😛
@hidroman19937 ай бұрын
Using the seed as if it was a hyperparameter shows how little you know about the stuff you're talking about, congrats!
@unconv
7 ай бұрын
I mean, if you know more about it than me, you could maybe explain further or link to some more information about the subject
@ZweiBein
7 ай бұрын
@hidroman1993 What a stupid reply, guide him at least if you know better...
@mibaatwork7 ай бұрын
It is intolerable how badly you prepared for the video. You can't teach people like that.
@unconv
7 ай бұрын
This isn't Unconventional Teaching
@alqods80
7 ай бұрын
It is a more natural way as a developer, it is much better that way, learnt debugging
@noahgottesla3439
7 ай бұрын
This is definitely the practical way to watch and learn. I like your style. You are showing the humanity of future coding
@itheenigma
7 ай бұрын
I love this approach - similar to how good developers actually code. Keep it up unconv
@JT-Works
7 ай бұрын
Meh, he is teaching how to troubleshoot. If you want direct directions just read the API documentation.
@foxdog93327 ай бұрын
where do you put the openai key? I can't find anywhere to put it tried searching. Getting a billing not active error.
@unconv
7 ай бұрын
It grabs it from the OPENAI_API_KEY environment variable. You can set it on Linux by running "export OPENAI_API_KEY=YOUR_API_KEY" and if you're on Windows, I believe you can use "setx" or "set" instead of "export"