Web Scraping AI AGENT, that absolutely works 😍

Science & Technology

ScrapeGraphAI is a web scraping Python library that uses LLMs and direct graph logic to create scraping pipelines for websites, documents, and XML files. Just say which information you want to extract and the library will do it for you!
🔗 Links 🔗
Scrape Graph AI
github.com/VinciGit00/Scrapeg...
Code used in the video - github.com/amrrs/scrapegraph-...
❤️ If you want to support the channel ❤️
Support here:
Patreon - / 1littlecoder
Ko-Fi - ko-fi.com/1littlecoder
🧭 Follow me on 🧭
Twitter - / 1littlecoder
Linkedin - / amrrs

Comments: 98

@unclemike2008 2 months ago
"poor" Love you brother! Right there with you. Great video. Been trying and failing to get a scraper with java support. Cheers!
@1littlecoder
2 months ago
Someone noticed it :D
@marcoaerlic2576 1 month ago
Really great video, thank you. I would be interested in seeing more videos about ScrapeGraphAI.
@ayyanarjayabalan 2 months ago
Awesome, we need more practical sessions with code like this.
@alx8439 2 months ago
Next time it will also need a vision model to solve CAPTCHAs, because website administrators will be protecting their precious content from scraping :)
@1littlecoder
2 months ago
Haha
@bastabey2652 1 month ago
This ScrapeGraphAI tool is the most interesting scraping tool I've tested so far
@dakotaep1
14 days ago
I am not having success with it. It only gives me urls, titles, related posts. No content that I ask for.
@Balajik7-qh1pq 2 months ago
I like all your videos, keep rocking bro
@user-ew8ld1cy4d 1 month ago
Great video! Thank you!
@liamlarsen9286 2 months ago
Thanks for the heads-up at 6:00. It worked only when using that version.
@HeberLopez 2 months ago
I find this live example pretty useful for general purposes. I can think of multiple ways I could use this for one-off PoCs.
@1littlecoder
2 months ago
Glad it was helpful!
@Raphy_Afk 2 months ago
Amazing! If my PSU wasn’t dead I wouldn’t be sleeping for days
@manojy1015 2 months ago
We need more tutorials with practical live examples of LLMs, especially RAG and fine-tuning
@patrickwasp 2 months ago
It’s a spider, not an octopus. Spiders crawl on webs.
@opusdei1151
2 months ago
What is an octopus, then? One that crawls APIs or does data mining?
@jbo8540 2 months ago
If your LLM gives you an article you can't find, my first assumption is that it made it up. While this is an interesting use case, it's likely going to take very precise prompt engineering to avoid hallucinated outputs.
@1littlecoder
2 months ago
No, it's my bad. After the video I reviewed the web page. In fact, I added the screenshot in the video. It was inside the carousel
@alqods80 2 months ago
There is a Playwright function that skips irrelevant resources, so the scraping becomes faster
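The Playwright feature mentioned here is presumably request interception through `page.route`, which lets a handler abort requests by resource type. A minimal sketch, assuming Playwright for Python is installed; the set of blocked types and the helper names `should_block` and `fetch_html` are illustrative choices, not something from the video or from ScrapeGraphAI:

```python
# Resource types that rarely matter when only the page text is needed.
# This particular set is an assumption; tune it per site.
BLOCKED_RESOURCE_TYPES = {"image", "media", "font", "stylesheet"}


def should_block(resource_type: str) -> bool:
    """Return True for requests that can be skipped while scraping text."""
    return resource_type in BLOCKED_RESOURCE_TYPES


def fetch_html(url: str) -> str:
    """Load a page with Playwright while aborting the blocked resource types."""
    # Imported lazily so should_block() works without Playwright installed.
    from playwright.sync_api import sync_playwright

    def handle(route):
        if should_block(route.request.resource_type):
            route.abort()
        else:
            route.continue_()

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.route("**/*", handle)  # intercept every request on this page
        page.goto(url)
        html = page.content()
        browser.close()
    return html
```

Blocking stylesheets can change what a script-heavy page renders, so it is worth comparing the output with and without the filter.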
@edgarl.mardal8256 1 month ago
You are the best Indian YouTuber I have seen to this date.
@kalilinux8682 2 months ago
Could you please do more videos on this? Like trying to use it on more educational content with equations rendered using MathJax and KaTeX
@jmirodg7094 2 months ago
thanks! 👍
@honneon 2 months ago
i luv it❤
@EobardUchihaThawne 2 months ago
Ok, now that's a good usage of an AI model
@madhudson1 1 month ago
It depends on the LLM used and the questions you pose it. It can often fail to generate JSON, and the library isn't best suited for iterating through a collection of sites
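One way to cope with a model that does not always return clean JSON is to parse defensively. A minimal sketch using only the standard library; `parse_llm_json` is a hypothetical helper, not part of ScrapeGraphAI, and it only handles the common case where a JSON object is wrapped in extra chatter:

```python
import json


def parse_llm_json(text):
    """Return parsed JSON from an LLM reply, or None if nothing parses."""
    # First try the reply as-is.
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    # Fall back to the outermost {...} span, which covers replies that
    # surround the object with prose or a code fence.
    start, end = text.find("{"), text.rfind("}")
    if start != -1 and end > start:
        try:
            return json.loads(text[start:end + 1])
        except json.JSONDecodeError:
            pass
    return None
```

A `None` result is a signal to retry the prompt or log the raw reply rather than pass it downstream.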
@tauquirahmed1879 2 months ago
Great video...
@1littlecoder
2 months ago
Glad you enjoyed it
@ngoduyvu 2 months ago
Thanks for the tutorial. Please make more tutorials on ScrapeGraphAI. Can you make one on scraping websites that have anti-bot protection or require credentials (login)?
@inplainview1 2 months ago
Watching this before YouTube gets upset again. 😉
@1littlecoder
2 months ago
Honestly, I was actually scared before uploading this, but let's see!
@inplainview1
2 months ago
@1littlecoder Hopefully all is well.
@monuaimat5228 2 months ago
RAG: Ritual Augmented Generation 😂
@J3R3MI6
2 months ago
🕯️🕷️🕯️
@shobhanaayodya7024 2 months ago
That logo is a spider 🕸️🕷️
@CM-zl2jw 2 months ago
🤣 I enjoy your sense of humor. Thank you. You are RICH in kindness and intelligence. That’s almost as good as money…. Money only buys limited amounts of happiness. Your videos are very helpful and informative. I’ll pay you to help me figure a couple things out. What’s your contact?
@1littlecoder
2 months ago
Thank you! 1littlecoder@gmail.com is my email.
@Macorelppa 2 months ago
🥇
@LeeBrenton 2 months ago
Scrape Facebook please! I need to do the most boring thing for work. I tried to program a scraper but FB makes it very hard; I was only partially successful (especially grabbing the post date). This method looks very exciting :)
@webhosting7062
2 months ago
What were your requirements?
@LeeBrenton
2 months ago
@webhosting7062 I write a daily report based on the new posts in various FB groups, but FB doesn't put posts in the correct order (also, pinned posts at the top will be old posts), so I need to check the date. But FB obfuscates the date like a MF, and I wasn't able to figure it out with Selenium. So the requirement is: get the latest posts (less than ~24 hours old) from a FB group.
@BiXmaTube 2 months ago
Need a proper PDF-parsing AI that I can run on a cloud server without a GPU: extracting text, tables and images, and arranging it in a DB based on a prompt that puts each piece of data in the right table. It would be amazing if you could find something like that.
@user-vm8lr2hr7d 2 months ago
"Only" is own-lee, not one-lee. Btw, great video
@1littlecoder
2 months ago
😭 will try to fix it!
@meetscreationz5591 18 days ago
Hi, could you please elaborate on setting the base_url port number? Also, where did you check the Ollama information? Kindly guide. TIA
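On the `base_url` question: Ollama listens on port 11434 by default, and its `/api/tags` endpoint lists the models pulled locally, which gives a quick way to confirm the address before putting it in the scraper config. A minimal sketch with the standard library; the helper names are illustrative, and the port needs changing if Ollama was started on a different one:

```python
import json
import urllib.request

# Ollama's default address; adjust if your server runs elsewhere.
OLLAMA_BASE_URL = "http://localhost:11434"


def tags_url(base_url=OLLAMA_BASE_URL):
    """Build the URL of Ollama's model-listing endpoint."""
    return base_url.rstrip("/") + "/api/tags"


def list_local_models(base_url=OLLAMA_BASE_URL):
    """Return the names of models available on a running Ollama server."""
    with urllib.request.urlopen(tags_url(base_url), timeout=5) as resp:
        payload = json.load(resp)
    return [model["name"] for model in payload.get("models", [])]
```

If `list_local_models()` raises a connection error, the server is not reachable at that address, and the same `base_url` will not work in the scraper config either.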
@moonwhisperer4804 1 month ago
If only this tool had a way to automatically go through paginated pages and into each detail page to extract data
@sandrallancherosg 2 months ago
BTW, that's a spider in the logo. It's a spider that lives in the World Wide Web 😅
@1littlecoder
2 months ago
How did I not even think about it?😭😭😭
@sandrallancherosg
2 months ago
@@1littlecoder :)
@planplay5921 2 months ago
It still has the risk of being blocked; it's just a way of parsing
@einekleineente1 1 month ago
It would have been nice if you had shown how to install Ollama locally first.
@1littlecoder
1 month ago
I'm sorry, I had done it a few times before so I didn't repeat it: kzread.info/dash/bejne/dWR7z6Omqcu8qLA.html
@einekleineente1
1 month ago
@@1littlecoder cool. Thank you 👍🏻
@NaveenChouhan-mm5gz 1 month ago
I tried to install scrapegraphai but I'm getting stuck on the yahoo search dependency, which breaks the execution and returns an attribute error.
@Ashort12345
1 month ago
Is it the same error as this one? I'm very much a beginner, so if someone knows how to fix mine, please leave a comment.
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[25], line 17
      3 graph_config = {
      4     "llm": {
      5         "model": "ollama/mistral",
   (...)
     13     }
     14 }
     16 # Instantiate the SmartScraperGraph class
---> 17 smart_scraper_graph = SmartScraperGraph(
     18     prompt="List me all the articles",
     19     source="news.ycombinator.com",
     20     config=graph_config
     21 )
     23 # Run the smart scraper graph
     24 result = smart_scraper_graph.run()

File ~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\scrapegraphai\graphs\smart_scraper_graph.py:47, in SmartScraperGraph.__init__(self, prompt, source, config)
     46 def __init__(self, prompt: str, source: str, config: dict):
---> 47     super().__init__(prompt, config, source)
     49     self.input_key = "url" if source.startswith("http") else "local_dir"

File ~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\scrapegraphai\graphs\abstract_graph.py:49, in AbstractGraph.__init__(self, prompt, config, source)
     47 self.config = config
...
--> 227 params = self.llm_model._lc_kwargs
    228 # remove streaming and temperature
    229 params.pop("streaming", None)

AttributeError: 'Ollama' object has no attribute '_lc_kwargs'
Output is truncated. View as a scrollable element or open in a text editor. Adjust cell output settings...
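The traceback above shows two separate things. The `_lc_kwargs` error is raised inside the library and looks like a version mismatch between packages (another commenter reports it only worked with the version mentioned at 6:00), so the snippet below does not fix it. Line 49 of the traceback, though, shows that a `source` not starting with `http` is treated as a local directory, so `"news.ycombinator.com"` would need a scheme. A minimal sketch: `normalize_source`, `build_graph_config` and `scrape` are hypothetical helpers, and the config keys other than `"llm"` and `"model"` are assumptions taken from the project's README of that period, so they may differ in your installed version:

```python
def normalize_source(source: str) -> str:
    """Prefix a bare host name with https:// so it is treated as a URL."""
    if source.startswith(("http://", "https://")):
        return source
    return "https://" + source


def build_graph_config(base_url="http://localhost:11434"):
    """Build a config for a local Ollama model (key names are assumptions)."""
    return {
        "llm": {
            "model": "ollama/mistral",  # model name as shown in the traceback
            "temperature": 0,
            "format": "json",
            "base_url": base_url,
        },
        "embeddings": {
            "model": "ollama/nomic-embed-text",
            "base_url": base_url,
        },
    }


def scrape(prompt: str, source: str, config: dict):
    """Run SmartScraperGraph with the call shape shown in the traceback."""
    # Imported lazily so the helpers above work without the library installed.
    from scrapegraphai.graphs import SmartScraperGraph

    graph = SmartScraperGraph(
        prompt=prompt,
        source=normalize_source(source),
        config=config,
    )
    return graph.run()
```

A call would look like `scrape("List me all the articles", "news.ycombinator.com", build_graph_config())`, with Ollama running and the models pulled beforehand.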
@DhruvPatel-vl1tj 14 days ago
There is a problem I am encountering: for many websites I get an empty response from the library. I have tried many of the solutions listed in their official documentation, like proxy rotation, using different models, etc. Also, the output for any website takes a minimum of 2-3 minutes. Please help me solve the problem
@jarad4621 2 months ago
Is the LLM there to convert the raw HTML to structured data? Then it saves to RAG and you can query the data with another LLM to analyse? I need to scrape homepages from 10k sites into structured data in a RAG DB to ask the sites questions. Can it be set up to do many sites like an automated agent, or can it be used as a tool or function call in an agent framework like CrewAI? That video would be cool
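For running over many sites, one simple approach is a plain loop around whatever single-site scraper is in use, keeping failures separate so one bad site does not stop the batch. A minimal sketch; `scrape_many` and the `scrape_one` callable are hypothetical, and nothing here is specific to ScrapeGraphAI or to any agent framework:

```python
def scrape_many(urls, scrape_one):
    """Run scrape_one(url) for each URL, collecting results and errors separately."""
    results, errors = {}, {}
    for url in urls:
        try:
            results[url] = scrape_one(url)
        except Exception as exc:
            # Record the failure and keep going with the remaining sites.
            errors[url] = repr(exc)
    return results, errors
```

At 10k sites, and with several minutes per page as other commenters report, the loop would also need rate limiting and some form of parallelism, which this sketch leaves out.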
@Kevinsmithns 13 days ago
Have you used Vapi to automatically do cold calls?
@IdPreferNot1 2 months ago
What am I missing... an error running the async cell?
@mihirprakash6009 18 days ago
Hi, can it scrape from the web in general? Like not a particular website
@darkreader01 16 days ago
If we want to scrape websites that need authentication, how can we do that? Is there any way to log in first, or any option to use cookies?
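Whether ScrapeGraphAI itself exposes a login or cookie option is not answered in this thread. Outside the library, Playwright can load a page with cookies set in advance through `context.add_cookies`. A minimal sketch, assuming Playwright for Python is installed; the helper names are illustrative, and the cookie would be one copied from an already logged-in browser session:

```python
def make_cookie(name: str, value: str, url: str) -> dict:
    """Build a cookie dict in the shape Playwright's add_cookies expects."""
    return {"name": name, "value": value, "url": url}


def fetch_with_cookies(url: str, cookies: list) -> str:
    """Open a page in a browser context that already carries the given cookies."""
    # Imported lazily so make_cookie() works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        context = browser.new_context()
        context.add_cookies(cookies)  # applied before any navigation
        page = context.new_page()
        page.goto(url)
        html = page.content()
        browser.close()
    return html
```

The returned HTML could then be saved locally and passed to the scraper as a local source instead of a URL.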
@morease 2 months ago
I fail to see why RAG is needed when the library can simply be asked to identify the HTML path/element that contains the content, and then extract the HTML from that with cheerio
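The alternative described here, locating the element once and then extracting it with an ordinary parser, can be sketched in Python with the standard library in place of cheerio. The class name `headline` is a made-up example; in practice it would be whatever selector the model or a manual inspection identifies:

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TextByClass(HTMLParser):
    """Collect the text inside every element carrying a given class name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.depth = 0      # greater than 0 while inside a matching element
        self.chunks = []
        self.results = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth:
            self.depth += 1
        elif self.class_name in classes:
            self.depth = 1
            self.chunks = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            self.results.append("".join(self.chunks).strip())

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def extract_text(html: str, class_name: str) -> list:
    """Return the text of each element whose class list contains class_name."""
    parser = TextByClass(class_name)
    parser.feed(html)
    return parser.results
```

This keeps the LLM out of the extraction step entirely, at the cost of breaking whenever the site changes its markup.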
@ramanaraj7 28 days ago
Can we use the Gemini API to do the same?
@prasannaprakash892 2 months ago
This is great, thanks for sharing. Can you share your Python version? I am getting an error when running the same code
@1littlecoder
2 months ago
I guess mine is 3.9ish
@Ari_Alur 2 months ago
Would it be possible to explain the whole thing to someone who has nothing to do with programming? I was able to install everything but I can't do anything with the code from github... Would be great :) Thanks for the video! Very interesting but unfortunately not feasible for me. (I'm on Linux)
@1littlecoder
2 months ago
Do you want me to show how to run the code from GitHub? Would that be helpful?
@Ari_Alur
2 months ago
Yeah! At least in a way that's easier to understand. I don't know anything about code, so I need things to be clear and simple.
@Ari_Alur
2 months ago
Thanks!:)
@oliverli9630 2 months ago
wondering when somebody will integrate `undetected-chrome` to it.
@adriangpuiu 2 months ago
Another question: what if we only want to scrape and not embed anything?
@1littlecoder
2 months ago
I think in those cases you can probably use a conventional library, but that's a good question. There are different classes within this library that might let you do it.
@adriangpuiu
2 months ago
@1littlecoder
from scrapegraphai.graphs import BaseGraph
from scrapegraphai.nodes import FetchNode, ParseNode, generate_answer_node

graph = BaseGraph(
    nodes={
        fetch_node,
        parse_node,
    },
    edges={
        (fetch_node, parse_node),
        (parse_node, generate_answer_node),
    },
    entry_point=fetch_node
)
.. I don't have time to try it now cause I'm at work :))
@user-nm2wc1tt9u 1 month ago
Does it work on Google Colab?
@user-zt2lp6hq7l 1 month ago
Reddit being called the front page of the internet is like... no please
@AI-Wire 2 months ago
So, this is impossible to run in Colab? I like to automate many of my tasks using GitHub Actions.
@1littlecoder
2 months ago
You can run it on Colab. But you'd need OpenAI keys
@kushagrakapoor9181 12 days ago
Hey man, I'm getting a not implemented error
@yashsrivastava677 2 months ago
Will it work to scrape LinkedIn jobs?
@DM-py7pj 2 months ago
Looks something like spider (scrape/crawl) + bone (GET/fetch) + document | parse (HTML)???
@1littlecoder
2 months ago
Plus RAG, yes!
@rahuldinesh2840 2 months ago
I think Chrome extensions are best.
@viddeshk8020 2 months ago
I don't understand why, for web scraping, I have to install so many other dependencies like Ollama etc. I mean, it is just simple web scraping, so why make things complex? And for a complex task, a complex prompt still needs to be given.
@liamlarsen9286
2 months ago
Ollama is just a framework to run LLMs locally, so it downloads the model instead of using an API and connecting to a server
@madhudson1
1 month ago
If you just want scraping, don't bother with this. However, if you want scraping + RAG, with LLM integration, then use this. But it's not without its issues
@adriangpuiu 2 months ago
Can it do heavy JavaScript sites? :))
@1littlecoder
2 months ago
I've not tried it! It'd be a good opportunity to try that, especially given it uses Playwright!
@adriangpuiu
2 months ago
@1littlecoder I'll tell ya, I tried and it fails miserably :)), if you have better luck let us know man
@1littlecoder
2 months ago
@adriangpuiu Ah, that's bad. Which website was it?
@adriangpuiu
2 months ago
@1littlecoder The user replies are encapsulated in a JS response from what I noticed; maybe they have an API or something. I was just unable to figure it out. YET ...
@adriangpuiu
2 months ago
@1littlecoder It's the Appian discussion forum
@webhosting7062 2 months ago
What about sites built with jQuery? Does it work for those too?
@1littlecoder
2 months ago
I have not tried it. Someone else in the comments said it might not be very good.
@Balajik7-qh1pq 2 months ago
I like all your videos, keep rocking bro
@1littlecoder
2 months ago
Thank you so much 😀