GPT-4 Vision API + Puppeteer = Easy Web Scraping
Science and Technology
In today's video I do some experimentation with the new GPT-4 Vision API and try to scrape information from web pages using it.
GitHub: github.com/unconv/gpt4v-browsing
Support: buymeacoffee.com/unconv
Consultations: www.buymeacoffee.com/unconv/e...
Memberships: www.buymeacoffee.com/unconv/m...
00:00 Intro
01:04 Basic usage of GPT-4 Vision API
05:50 Test GPT-4 Vision with image from Unsplash
07:23 Taking a screenshot with Puppeteer
12:35 Test GPT-4 Vision with Wikipedia screenshot
18:14 Test GPT-4 Vision with Google weather info
19:29 Automating URL generation + screenshot taking
33:24 Handling timeouts and retries and making it conversational
44:30 Summarizing BBC news
45:33 Fixing slow loading pages
49:18 Asking for weather information
50:24 Tweaking system message
54:03 Asking for Tesla stock price
56:00 Outro
Comments: 155
Almost 20K views 😳 Part 2: kzread.info/dash/bejne/goGAyZiLopvMk7g.html
@surya0202
7 months ago
please upload part 2 sir
@artemisfauls
2 months ago
I believe this method can be used to automate certain routine processes, but only if the price of gpt4v is reasonable. For example, you need to send 10,000 screenshots with a resolution of 1920x1080 pixels to gpt4v in 1 day - how much will it cost?🤔🤓
Didn’t expect a coding video to be this entertaining. Love the frank display of your thought process.
Your tutorial helps with the excitement and anxiety as a fellow dev. I knew I could do this myself but keep procrastinating and eventually some tasks end up as a mental block in WFH mode. Just forcing myself to watch a fella do something like this really helps, thank you!
I love how much of the process of programming he includes in the demo
A fabulous video that has been of great help in orienting our new collaborators. Your generosity is highly valued!
This guy has superpowers. He can talk and code at the same time!
It's interesting that this is exactly what I was looking for. Last night I spent a few hours asking Copilot how to implement the same libraries. Thanks for the tutorial.
I just wanted to tell you that you are doing great and I really like your format.
@unconv
7 months ago
Thank you very much!
This was super cool! Don't mind the long format at all. Would love to see you evolve this concept in another video.
@unconv
7 months ago
I've already filmed the next one. It'll definitely be long form 😅
@amritbanerjee
7 months ago
With full page screenshots. Maybe create an assistant which looks at my bookmarks and the tags in there based on my question and tries to get me the info from the page.
Use the retry library and set a low timeout; you can use a simple decorator. If the timeout needs to be high and this isn't very pleasant, consider running multiple requests concurrently and waiting only on the first result.
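The "run multiple requests concurrently and wait only on the first result" idea can be sketched with Python's standard library. This is only an illustration: `slow_call` is a hypothetical stand-in for the API request, not code from the video, and `cancel_futures` assumes Python 3.9 or newer.

```python
import concurrent.futures
import time


def slow_call(delay, label):
    # Hypothetical stand-in for an API request that may hang.
    time.sleep(delay)
    return label


def first_result(calls, timeout):
    """Run every (func, args) pair concurrently and return the first to finish.

    If the first call to finish raised, its exception is re-raised.
    Raises TimeoutError if nothing finishes within `timeout` seconds.
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=len(calls))
    futures = [pool.submit(func, *args) for func, args in calls]
    try:
        done, _ = concurrent.futures.wait(
            futures,
            timeout=timeout,
            return_when=concurrent.futures.FIRST_COMPLETED,
        )
        if not done:
            raise TimeoutError("no request finished in time")
        return next(iter(done)).result()
    finally:
        # Do not block on the slower calls. Calls that are already running
        # cannot be interrupted; they finish in the background.
        pool.shutdown(wait=False, cancel_futures=True)
```

Note that every concurrent attempt is still a billed request, so this trades cost for latency.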
This is so cool and nerdy! Maybe the best site to follow and learn more and more on OpenAI API. Difficult but entertaining to follow.
Really appreciate your information and style. Learning much!
@unconv
5 months ago
Thanks for watching!
Seriously impressive. I'm a NodeJS API engineer and you're writing that JS code faster than me!
@unconv
5 months ago
Thanks! Fast doesn't equal good, though 😅
This is awesome. I love your videos. Please keep these videos going, especially this one. I learned so much.
@unconv
7 months ago
Thank you! More to come :)
Thanks for the video. Great work.
For cookies, you just need to know which cookie is being set. In many cases it's a matter of causing the same effect in Puppeteer. One way is to add the cookie to the cookie store directly (I'm sure Puppeteer has a way to do this). An alternative is to specify a "user directory" for Puppeteer so you can actually agree to things like cookies. Consent popups are usually easy to locate with standard HTML locators, because the popup is often set to a priority load event and is often a div/container with a name/id containing the word "consent" or "cookie", so a regex can find these reasonably easily. Use Puppeteer to locate the "Ok" button and click it. With a reusable user directory, you only need to check, for any site, whether you have already accepted consent: if not, click it; if so, just scrape.
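The matching step described above (elements whose id or class mentions "consent" or "cookie") might look roughly like this. It is a sketch in Python using only the standard library; the HTML would have to come from the browser session, and the actual click would still happen in Puppeteer. The class and function names are made up for illustration.

```python
import re
from html.parser import HTMLParser

CONSENT_PATTERN = re.compile(r"consent|cookie", re.IGNORECASE)


class ConsentFinder(HTMLParser):
    """Collect (tag, id) pairs for elements whose id or class matches."""

    def __init__(self):
        super().__init__()
        self.matches = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # Join id and class into one string so a single regex search covers both.
        haystack = " ".join(attributes.get(name) or "" for name in ("id", "class"))
        if CONSENT_PATTERN.search(haystack):
            self.matches.append((tag, attributes.get("id")))


def find_consent_elements(html):
    finder = ConsentFinder()
    finder.feed(html)
    return finder.matches
```

A match only says where a consent widget probably is; which button inside it means "accept" differs per site.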
very interesting, thanks for sharing!
Great video dude. Im gonna rewatch later. I got a project this might help on.
Crazy good content! Thank you!
Legend has it, he’s still trying to find out what the weather is like in Alaska…
I'd like to see a video from you about navigating websites with Puppeteer. Now that you ask, I'd like a tutorial on how it follows links, fills out data, crawls four or more links deep into a website, how to handle session cookies, automate and run loops, etc. :-)
I appreciate your efforts mannn...
Very clever. Congratulations
a Master in the Arts of coding!
Excellent Job
You should try the JSON response mode. You can request a response like this in the system prompt: {data: ExpectedDataInterface, error: ErrorInterface | null}. Good luck!
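On the receiving side, a reply in that `{data, error}` shape could be checked along these lines. This is a minimal sketch that only parses a string; it does not call the API, and `parse_envelope` is a hypothetical helper name.

```python
import json


def parse_envelope(raw):
    """Parse a reply that should look like {"data": ..., "error": ... or null}.

    Returns (data, error). A reply that is not valid JSON, or that lacks
    either key, is reported as an error instead of raising.
    """
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, f"invalid JSON: {exc.msg}"
    if not isinstance(payload, dict) or "data" not in payload or "error" not in payload:
        return None, "missing 'data' or 'error' key"
    return payload["data"], payload["error"]
```

Treating a malformed reply as an ordinary error keeps the calling loop simple: it can retry or report without a separate exception path.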
Thank you.
No typescript and no copilot? This was a more wholesome time.
Cool video
A little speed up might be to use the python requests package to try and fetch the url first before running puppeteer - then short-circuit invalid domains, 404's etc? Also, when doing a completion you can pass `request_timeout=10` or whatever and it'll kill the call. Sometimes even works.... ;-)
@unconv
7 months ago
Thanks, I'll try that. Yeah, you can set the request_timeout, but you still have to handle the error by having some recursive function that retries the request if it fails. And I don't have time to implement that. It would take like a minute, lol
@billybofh2363
7 months ago
I replied to this and youtube removed it (I think!) - but the python package 'tenacity' (or the original retry) is worth a look (I'll skip the url as I think that's what made youtube remove/hide my comment)
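The pre-check suggested at the top of this thread (try the URL cheaply before launching a browser) could be sketched as follows. The comment mentions the `requests` package; this version swaps in the standard library's `urllib` so it has no dependencies, and `is_reachable` is a hypothetical name.

```python
import http.client
import urllib.error
import urllib.request


def is_reachable(url, timeout=5.0):
    """Return True if a HEAD request to `url` succeeds with a non-error status."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (urllib.error.URLError, http.client.HTTPException, ValueError, OSError):
        # Covers malformed URLs, DNS failures, timeouts, and 4xx/5xx responses
        # (HTTPError is a subclass of URLError).
        return False
```

Some servers reject HEAD requests, so a False here is a hint rather than proof that the page cannot be loaded in a browser.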
BWAHAHAHAHA! the struggle (programming: errors = WTF!!!!) is real. day in the life of code building...Awesome video!
Great video! Interesting experiments with the GPT Vision API and Puppeteer. I have a couple of questions and a suggestion. Could you share some insights on the cost aspect of using the GPT Vision API for this project? I'm curious about the pricing and whether it's feasible. Also, have you considered combining classical web scraping methods with the Vision API in a synergistic way? Specifically, using traditional scraping to gather initial data and then employing the Vision API to verify or correct this data where needed. I think this could potentially address some of the limitations of both methods. What are your thoughts on this approach? Looking forward to hearing your thoughts!
@unconv
7 months ago
Thanks! On the day I filmed the video, my API costs were $0.58. The next day I maxed out the limit of 100 messages of the gpt-4-vision-preview while testing and the total cost for that day was $2.15. These costs include some other API calls as well, though. Combining classical web scraping and Vision API seems like a good idea. I'll have to look into that when I run into an issue scraping something.
A good approach is to include a timestamp in the user role message. It will help him calculate the age of Sam Altman easily!
@unconv
5 months ago
Yes, but only because he knows his birthday already (even without the Wikipedia screenshot)
I wouldn't call this easy web scraping, but this was very hilarious with all the bugs
Chain of thought is actually meant to be used for mostly information accuracy, not for fixing what you could do in a proper single prompt.
Great Video! Can these libraries handle auth like azure oauth flow in order to browse to the page?
Thank you for this helpful video! Can you please try the same task with the functions tool? Thanks!
Great video, at last I see someone on YT who struggles with the API as I do… I know the topic of the video is the Vision API, but you could get better results using a terminal web browser like lynx, piping the result to a text file and asking ChatGPT with that text as context. Just an idea. 😉
@unconv
7 months ago
I was gonna dismiss your suggestion by saying one does not simply use Lynx in 2023 since it doesn't support JavaScript, which many websites require nowadays. But testing it out just now, all the examples I showed in this video could have worked with Lynx (based on its output). I don't know how I would extract links and input fields with Lynx, though, to make it crawl subpages. Perhaps all those pages were server side rendered, so I might as well have used Curl.
I'm only up to 15:00 but the issue you had up at this point is that it CAN read sam altman's birthdate, but it doesn't know what the date is today. You can feed it the date in your response generated with `date()` or whatever.
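Feeding the date into the prompt could be as simple as the sketch below. The function names and the exact wording of the prefix are assumptions for illustration; `age_on` shows the arithmetic the model would then be able to do.

```python
from datetime import date


def with_today(question, today=None):
    """Prefix the user's question with the current date in ISO format."""
    today = today or date.today()
    return f"Today's date is {today.isoformat()}. {question}"


def age_on(birth_date, today):
    """Whole years elapsed between birth_date and today."""
    had_birthday = (today.month, today.day) >= (birth_date.month, birth_date.day)
    return today.year - birth_date.year - (0 if had_birthday else 1)
```

Computing the age in code and passing it along is another option, since it does not depend on the model's arithmetic at all.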
just kisses for you , so freaakin loved how you explained and debugged along us
For getting Sam Altman's age, would it help if you stated that the screenshot is taken today? ChatGPT may be hesitant to assume this.
Couldn't you use backoff to handle the error when the API is stuck?
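Libraries such as backoff, tenacity and retry package this up, but the core idea fits in a small decorator. The following is a hand-rolled sketch using only the standard library, not any of those libraries' APIs; it uses a loop rather than a recursive function.

```python
import functools
import time


def retry(attempts=3, base_delay=1.0, exceptions=(Exception,)):
    """Retry the wrapped function, doubling the delay after each failure."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(attempts):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == attempts - 1:
                        raise  # Out of attempts: surface the last error.
                    time.sleep(base_delay * 2 ** attempt)

        return wrapper

    return decorator
```

Combined with a request timeout, a stuck call fails fast and the decorator tries again instead of hanging the whole script.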
I don't get the added functionality compared to Google in this demo. Help me out.
If you add something like "Strictly based on the information from the screenshot", you get answers based on the information it gets from the screenshot.
I was wondering how this is different from the web-search capability of ChatGPT Plus right now. In other words, if I asked GPT to look for an answer on the web, would it struggle to do so? Is this a hacky way to get better web search via an API-like method because it's not enabled yet in the OpenAI dev tools? Anyway, I really like the video. Can we use Selenium to do this as well?
Also, I think this is better suited for the Assistants API. I made a private investigator that uses functions. One is the Serper API: if it finds a LinkedIn page, it crawls it, strips the HTML, and sends it to get summarized with the link snippets. The other function gets details on an image URL you asked it to view, using GPT-4 Vision. I could make those functions run in parallel.
@bogdanbogdan5276
7 months ago
Could you share more details, I'm trying to build similar functionality
I made a drinking game out of the word Alaska. I died.
What is the weather like in Alaska?
what is the weather like in alaska?
Nice content, but you should just copy paste the code, we know you can code well behind the scenes, don't worry. Keep doing great!
I see that 0420 there... in 00:31:50 : )
In package.json you can set "type": "module"
"In Alaska's land, where coders seek the weather's tale, They type and query, 'neath the aurora's bright veil. With every line of code, they ask the sky's mood, Hoping for sunshine, but prepared for the cold and brood."
👍
I'm not going to watch an hour of you coding, but I will share that you can get an image of each element, and Selenium would probably be a good choice for this.
Take a screenshot (without closing the Puppeteer session) and ask ChatGPT whether the page looks loaded, instead of relying on networkidle0, timeouts, etc.
is it possible to use selenium? at least it is python, you don't need to switch between 2 language.
@unconv
7 months ago
Yes, it should work too. I just have more experience with Puppeteer (never tried Selenium)
But where is the token authorization for using the GPT-4 preview?
"Hopefully this is not a Malware" :D :D
The Vision API downsamples the image. That's why it cannot recognise small fonts.
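For a sense of how much downsampling is involved, here is a sketch of the tile-based token formula as I understand it from OpenAI's published notes for high-detail images at the time. The constants (2048, 768, 512, 170, 85) and the no-upscaling behaviour are assumptions to check against the current documentation.

```python
import math


def high_detail_tokens(width, height):
    """Estimate image tokens for a high-detail image under the tile formula."""
    # Step 1: shrink to fit inside a 2048 x 2048 square, keeping aspect ratio.
    scale = min(1.0, 2048 / max(width, height))
    width, height = int(width * scale), int(height * scale)
    # Step 2: shrink so the shortest side is at most 768 px.
    scale = min(1.0, 768 / min(width, height))
    width, height = int(width * scale), int(height * scale)
    # Step 3: 170 tokens per 512 px tile, plus a flat 85.
    tiles = math.ceil(width / 512) * math.ceil(height / 512)
    return 170 * tiles + 85
```

Under these assumptions a 1920x1080 screenshot is reduced to roughly 1365x768 before the model sees it, which fits the observation about small fonts.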
What is the current weather in the world?
So you need to use GPT-3.5 Turbo to get exact answers instead of GPT-4? Weird.
First, thank you. And a question: how many tokens does this scraping method use?
@unconv
7 months ago
I haven't checked exactly but it seems to be around $0.017 per scrape based on my API usage during building this
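Taking that rough per-scrape figure at face value, a back-of-the-envelope estimate for larger volumes looks like this. It is a sketch that assumes cost scales linearly and that the $0.017 figure holds; both are approximations.

```python
def estimated_cost(scrapes, cost_per_scrape=0.017):
    """Linear cost estimate in US dollars; 0.017 is the rough figure quoted above."""
    return scrapes * cost_per_scrape
```

With these assumptions, 1,000 scrapes come to about $17 and 10,000 scrapes to about $170.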
I still don’t have access to the vision API : (
hey man! I would need something like this posted onto a server of some sort, like AWS or Heroku. is that possible if i build this & deploy it? i need it to scale up for 1000 requests daily
@unconv
5 months ago
A lot of websites will block requests from AWS servers, so you would probably need some sort of proxy server in between.
Great video! However, I noticed a few instances where you mentioned not having prior experience with certain tasks, but then you later showcased projects where the code was already complete. For example, at 9:29 in the video. This seems a bit contradictory and might confuse some viewers
@unconv
6 months ago
Thanks! Which tasks did I say I didn't have prior experience with?
@uncleJuancho
6 months ago
@unconv, this is my first time viewing a video on your channel. I observed that you started by looking through the documentation as if it was new to you, despite already having the answer in another file. This struck me as unusual, but I understand it might have been part of your process. When the documentation didn't seem to help, you referred to your existing project. I don't mean this in a negative way; it's just my personal observation from watching this video for the first time
@unconv
6 months ago
I've used Puppeteer multiple times in the past, but I never remember the boilerplate stuff. I didn't want to jump directly to my own previously written code, because I want to do things from scratch in my videos, not leaving out any steps. And I want to show how I go about researching stuff. But I get that it might have been confusing - although I suspect even more confusing if I directly copy pasted my old code.
I believe it’s not telling you his age because it is trying to provide you with a precise age i.e. his current age, given his birth date. Don’t ask what his age is, but what age the page or author of the page says he is
Is there a reason you don't use copilot?
@unconv
7 months ago
It often guides me to directions I don't want to go. Also, I'm still learning Python so I'd rather practice my memorization
Also, I had checked out a Patreon chat (paid), but now I am unable to find it. Is it gone?
@unconv
17 days ago
I'm not on Patreon but I'm on BuyMeACoffee and you can find a link in the description
@TeleV77_media
17 days ago
@unconv Thank you for the good job. I am improving on it and using it. Some pieces don't work as of today, and I fixed them.
You're already in javascript for puppeteer. Why do the gymnastics of writing your main logic in python?
@unconv
7 months ago
That's a good point and in the next video I in fact switch to JavaScript only. I prefer Python, though
If humans didn't all re-invent the wheel every hour, there would be a huge database of every query : response : list of problems : links to solutions if they ever figured it out, which would save humanity unlimited man-hours... but probably put OpenAI out of business
The llm was wrong about what the light on the motorcycle means, since the headlight is ALWAYS on. A simple but important mistake.
I would just use the scraping approach and strip the HTML. I've never seen someone with so many problems calling an API.
great video, just one suggestion, the repetition of what you're typing literally every time is a bit much.
@unconv
7 months ago
Thanks! I'll try to avoid that in the future (and mistakes leading to repetition in general)
Want more
14:50 I don't think that's a good idea because you will use up a lot of tokens (input, output), so it's better to scrape URLs with a vector store
What is 4 Vision API?
Remove the word Like, ask what is the weather in Alaska. The question you ask leads to an answer such as “colder than a commercial freezer”.
@unconv
7 months ago
Good point 😂
Can this work for Instagram scraping ?
@denzelcanvas5223
7 months ago
no. instagram doesnt allow repetitive actions.
Thank you for the video. But the way you re-type the question (instead of copying and pasting it) makes me frustrated 😖
@unconv
7 months ago
Sorry about that 😄
gpt4 vision api limits ?
@unconv
7 months ago
100 requests per day
Why not use everything in js? So confusing
to have productive programming ai has to return what you want in 100% cases. it has to be better than human in deduction.
This could be the best kodi addon ever
Why aren't you using the AI to help you code?🤔🤷
@Bartskol
7 months ago
I think that he wants to explain the code to us by writing. I use ChatGPT to write code as I'm not a programmer myself, but I find myself learning to code anyway because I still need to understand what I actually need. It's also tiring to pass every small error to chat; it's easier to make adjustments yourself. However, to do that, you need to understand the code at some level.
@unconv
7 months ago
I actually have Copilot but usually I disable it because it often guides me to directions I don't want to go. Especially when making videos, if Copilot suggests a different way than I was going to go, I get distracted. And I'm still learning Python, so I want to actually learn it. If I always use Copilot, I can get the job done but I probably won't memorize the syntax.
@-Jason-L
7 months ago
@unconv I think he meant letting ChatGPT generate the entire code, not Copilot.
@yungjerky
5 months ago
Because fully AI generated code is unusable
@AIPulse118
4 months ago
@yungjerky Not anymore, it isn't. Never used Grimoire?
bro sounds like an AI. Good video tho
Why did you mix Python + JS? I don't see a requirement for it. You could have used a single programming language, JavaScript or Python, and executed the same task in a single project
Instant fork, all your code belong to us
Seems very inefficient to do it that way. Yes, it's an interesting concept, but you can do it all in Python, and your logic can be simplified to get results
it's an AI speaking?
aaaaa this was frustrating as hell
It's not hard to make a scraper. In fact you probably only need to use a http request, not a full on instance of chrome.
You would go bankrupt if you used the GPT-4 Vision API to scrape the web... just link your credit card and start scraping
coding 😛
Using the seed as if it was a hyperparameter shows how little you know about the stuff you're talking about, congrats!
@unconv
7 months ago
I mean, if you know more about it than me, you could maybe explain further or link to some more information about the subject
@ZweiBein
7 months ago
@hidroman1993 What a stupid reply, guide him at least if you know better...
It is intolerable how badly you prepared for the video. You can't teach people like that.
@unconv
7 months ago
This isn't Unconventional Teaching
@alqods80
7 months ago
It is a more natural way as a developer, it is much better that way, learnt debugging
@noahgottesla3439
7 months ago
This is definitely the practical way to watch and learn. I like your style. You are showing the humanity of future coding
@itheenigma
7 months ago
I love this approach - similar to how good developers actually code. Keep it up unconv
@JT-Works
7 months ago
Meh, he is teaching how to troubleshoot. If you want direct directions just read the API documentation.
where do you put the openai key? I can't find anywhere to put it tried searching. Getting a billing not active error.
@unconv
7 months ago
It grabs it from the OPENAI_API_KEY environment variable. You can set it on Linux by running "export OPENAI_API_KEY=YOUR_API_KEY" and if you're on Windows, I believe you can use "setx" or "set" instead of "export"
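On the Python side, reading that variable with an explicit error message might look like the sketch below. `load_api_key` is a hypothetical helper, not something from the repository; the point is to fail early with a readable message when the variable is missing.

```python
import os


def load_api_key(variable="OPENAI_API_KEY"):
    """Return the API key from the environment, or raise a clear error."""
    key = os.environ.get(variable, "").strip()
    if not key:
        raise RuntimeError(f"{variable} is not set; export it before running")
    return key
```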