I Made a FAST Search Engine

Science & Technology

Get $15 free credits with BrightData: brdta.com/conaticus1
BrightData KZread Channel: @BrightData
TF-IDF Blog Post: janav.wordpress.com/2013/10/2...
Lemmatization Word Lists: github.com/michmech/lemmatiza...
Crawler Repository: github.com/conaticus/search-e...
API Repository: github.com/conaticus/search-e...
Client Repository: github.com/conaticus/search-e...
Discord: / discord
Github: github.com/conaticus
Twitter: / conaticus
0:00 Intro
0:20 BrightData
2:10 Term Frequency, Inverse Document Frequency & Indexing
6:41 Page Ranking & Lemmatization
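Lemmatization maps inflected forms such as "running" or "ran" to a shared lemma such as "run", so that they hit the same index entries. Below is a minimal Python sketch. It assumes a word list shaped like the michmech lemmatization lists linked above, read here as one tab-separated `lemma`/`form` pair per line; check the actual file's column order before relying on this. `load_lemma_map`, `tokenize`, and `lemmatize` are hypothetical names, not functions from the video's repositories.

```python
import re

def load_lemma_map(lines):
    """Build an inflected-form -> lemma dict from 'lemma<TAB>form' lines (assumed format)."""
    lemma_of = {}
    for line in lines:
        parts = line.strip().split("\t")
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        lemma, form = parts
        lemma_of[form.lower()] = lemma.lower()
    return lemma_of

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def lemmatize(text, lemma_of):
    """Replace each token with its lemma; words missing from the list pass through unchanged."""
    return [lemma_of.get(tok, tok) for tok in tokenize(text)]

# Tiny stand-in for a real word list file.
sample = ["run\truns", "run\tran", "run\trunning", "page\tpages"]
lemma_of = load_lemma_map(sample)
```

With this sample, `lemmatize("Running pages ran", lemma_of)` gives `['run', 'page', 'run']`. The same function would run over page text at index time and over the query at search time, so both sides agree on the lemmas.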

Comments: 185

  • @conaticus
    3 months ago

    Start building awesome projects with $15 free credits using BrightData today: brdta.com/conaticus1

      • @AWIRE_onpc
        3 months ago

        no

      • @xulaxwtf
        3 months ago

        no

      • @aryanszone4963
        3 months ago

        no

      • @noviui
        2 months ago

        no thanks

      • @user-uv3nv2bc6v
        1 month ago

        no

  • @lifeofme702
    3 months ago

    I don't know what this guy said, and I was still mind-blown by all the effort this guy puts in

      • @conaticus
        3 months ago

        Thanks so much 🙏 It would not be possible without your support

  • @jaymarksum6542
    3 months ago

    I’m impressed, can’t wait to see you build a multithreaded web server in assembly

      • @da40au40
        3 months ago

        Why do I find it super funny 😅😅😅.

      • @ArthursHD
        3 months ago

        @@da40au40 Me too :D

      • @DanskeCrimeRiderTV
        3 months ago

        it's not impressive. Of course querying a few hundred or even a hundred thousand web pages isn't as complicated or slow a task as querying trillions of web pages.

      • @KibitoAkuya
        3 months ago

        @@DanskeCrimeRiderTV google also wastes time deciding whether or not you are allowed to see certain sites

      • @DanskeCrimeRiderTV
        3 months ago

        @@KibitoAkuya what does that have to do with anything? Google is still faster at querying trillions of results than this.

  • @asm_x86
    3 months ago

    That's really impressive, I can't even figure out how to run it.

      • @ZuperPotato
        3 months ago

        Nice username

      • @conaticus
        3 months ago

        Just added some instructions to the READMEs if you're interested :)

      • @asm_x86
        3 months ago

        @@conaticus thanks, I'll do that

  • @coderx8634
    3 months ago

    Love your content. You and your quality have really improved. Keep it up ❤

      • @conaticus
        3 months ago

        Thanks so much, your support means a lot ♥

  • @coderan5029
    3 months ago

    This is basically what we learned in my big data class, but we used map-reduce to do the TF-IDF calculations, so it's impressive you figured this out on your own

  • @rafaelpereiracoias1047
    3 months ago

    Nice video and nice code, keep up the good work!

  • @ExpandedCuber
    3 months ago

    Let's go, another conaticus video

  • @ccost
    3 months ago

    7:40 flashing those questionable websites in a sponsored video is quite the move

      • @twitchizle
        3 months ago

        You scared of porn?

  • @greensporevalley
    3 months ago

    SERBIA MENTIONED 🎉🎉🎉

      • @RealMephres
        3 months ago

        @europa_the_last_battle >goes to comments >sees meme comment >looks at replies >only a LARPer replied lol

      • @MAXHASS-ph5ib
        3 months ago

        @@RealMephres this aint 4chan nga

      • @jawadmansoor6064
        3 months ago

        that name rings a bell, maybe from some kind of Serbian movie?

      • @RealMephres
        3 months ago

        @@MAXHASS-ph5ib tell that to the LARPer dawg

      • @slimeyar
        3 months ago

        @@RealMephres tell that to yourself 😊

  • @MySachincool
    2 months ago

    Subscribed & notifications on :) you deserve more recognition bruh

  • @foqsi_
    3 months ago

    Love this dude and his video projects

      • @conaticus
        3 months ago

        🙏

  • @polyshrub
    3 months ago

    This is very impressive, what was the size of the database when indexing is finished? Seems like it would be quite big

  • @turb0004
    3 months ago

    Please fully finish your file explorer in Rust, because the idea of it is awesome. Love your videos, content is very engaging 🎉

  • @iritesh
    3 months ago

    Awesome effort ✨

  • @aryakvn6051
    1 month ago

    You could calculate and cache TF values on the fly so you don’t fill up your ram as quickly but still get a decent response time.
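Below is a minimal sketch of the suggestion above. It assumes documents live in a hypothetical in-memory `DOCS` dict. Each document's term-frequency table is computed the first time a query touches it and then kept in a bounded LRU cache (`functools.lru_cache`). Memory stays capped while repeated lookups stay fast. This illustrates the commenter's idea; it is not how the linked API repository works.

```python
from collections import Counter
from functools import lru_cache

# Hypothetical in-memory corpus: document id -> raw text.
DOCS = {
    "a": "rust search engine in rust",
    "b": "javascript crawler for a search engine",
}

@lru_cache(maxsize=1024)  # bounded: least recently used tables are evicted first
def term_frequencies(doc_id):
    """Compute normalized term frequencies for one document on first use, then cache them."""
    words = DOCS[doc_id].split()
    counts = Counter(words)
    total = len(words)
    return {term: n / total for term, n in counts.items()}

def tf(term, doc_id):
    """Look up one term's frequency, building the document's table lazily if needed."""
    return term_frequencies(doc_id).get(term, 0.0)
```

Raising or lowering `maxsize` trades memory for hit rate, and `term_frequencies.cache_info()` reports hits and misses. The cache returns the same dict object on every hit, so callers should treat it as read-only.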

  • @6IGNITION9
    3 months ago

    Filter out JS for another 10x bandwidth savings, or alternatively use an adblocker. (Can Puppeteer do that? It's just Chromium, right?)

  • @devinlauderdale9635
    3 months ago

    The problem is this approach is susceptible to SEO spamming/invisible SEO keywords

      • @conaticus
        3 months ago

        Yeah for sure, realistically it should be moderated based on user interaction as well

  • @R_Y_Z_E_N
    3 months ago

    Google also does the same but with distributed computing to reduce the overall time. Just scale the database horizontally and mimic Google's approach

  • @Nerdimo
    3 months ago

    Impressive, seriously!

  • @madalenaferreira3018
    3 months ago

    great video, gave me ptsd from my information retrieval class though

  • @GermanTimecrafter
    3 months ago

    such a cool video! i love the way you explain what you are doing :) random question but what is your editor font?

      • @conaticus
        3 months ago

        Appreciate it :) I'm using JetBrains Mono, it's free to download

  • @a6gitti
    3 months ago

    Supa dope. I would like to use this search engine of yours

  • @dreamsofcode
    3 months ago

    🔥🔥🔥

  • @stayhappy-forever
    3 months ago

    that's insane, how's this only at 12k views

  • @allenfpascua
    3 months ago

    Super good editing 🫡🫡🫡🫡

      • @conaticus
        3 months ago

        Would not be possible without your breathtaking animations 😄

  • @jugurtha292
    3 months ago

    very nice, built something similar for my info retrieval class. we have to use the Okapi BM25 formula for the ranking but overall very similar. scrape, tokenize, parse, inverted index, rank
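For comparison with TF-IDF, here is a minimal Okapi BM25 scorer over pre-tokenized documents. It is a sketch, not the commenter's code or the video's code. It assumes `k1 = 1.5` and `b = 0.75`, which fall within the commonly cited ranges. It also assumes the non-negative IDF variant ln(1 + (N - n + 0.5) / (n + 0.5)), which is one of several IDF forms in use.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each tokenized document against the query terms with Okapi BM25."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs  # average document length
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        score = 0.0
        for term in query_terms:
            f = counts[term]
            if f == 0:
                continue  # term absent from this document contributes nothing
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # k1 saturates repeated occurrences; b normalizes for document length.
            denom = f + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * f * (k1 + 1) / denom
        scores.append(score)
    return scores
```

Unlike raw TF, the `k1` term caps how much repeated occurrences can add. This also blunts some of the keyword stuffing raised elsewhere in the thread.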

  • @gaimnbro9337
    3 months ago

    Nice job :D

  • @errplane_
    3 months ago

    oh my fuck i saw this on your github last night

  • @yorailevi6747
    3 months ago

    how much did you pay for the web scraping service in total?

  • @SG-kn2jl
    3 months ago

    Why did you choose TF-IDF instead of word2vec or any context aware model?

      • @skorp5677
        3 months ago

        +1 Would like to know

  • @user-xl2om2up2x
    3 months ago

    W ad plug, it's 100% relevant and actually necessary to fulfill the premise of this vid.

  • @80sVectorz
    3 months ago

    3:07 Best pronunciation of Euclidean I have ever heard :P

      • @CrazyDiamondo
        3 months ago

        Where?

      • @80sVectorz
        3 months ago

        @@CrazyDiamondo I added a timestamp

  • @thekwoka4707
    3 months ago

    How much did the scraping cost if it wasn't free?

  • @jsalsman
    3 months ago

    I believe it's "inverted indexing", as inverse indexing is something else.
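An inverted index maps each term to the documents that contain it, which is called its postings list. A query then only touches the documents listed under its terms instead of scanning every page. Below is a minimal sketch with hypothetical helpers `build_inverted_index` and `search_all`, using whitespace tokenization and boolean AND matching. A ranked engine would then score the matching documents, for example with TF-IDF.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the sorted ids of the documents containing it (its postings list)."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}

def search_all(index, query):
    """Return ids of documents that contain every query term (boolean AND)."""
    terms = query.lower().split()
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))  # intersect postings lists
    return sorted(result)
```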

  • @maksymilianglowacki1409
    3 months ago

    Is this engine online, or would it be able to go online so other users could also enjoy it? Or was it just a peek, or something you made because you were bored or something?

  • @carlitosdummy
    3 months ago

    i love this channel

  • @animeworld4775
    3 months ago

    what things should I know or learn to create projects like these?

      • @GONDWANA-de4od
        3 months ago

        HTML for website creation, CSS for page design, JavaScript for making websites dynamic and for the backend, SQL for indexing, Rust for fast backend services

  • @miro5182
    1 month ago

    You can use a Chrome-like TLS config to avoid getting blocked by Cloudflare in a lot of cases; using a browser for scraping isn't viable when talking about scanning the internet.

  • @MortonMcCastle
    3 months ago

    Good! The world needs a new Google Search, one that's more like how it was in the 2000s.

  • @ethanstewart1011
    3 months ago

    How did you manage to get a node.js memory leak??

  • @larry_berry
    3 months ago

    Lol. Got notif after clicking the video.

  • @synapsenova299-fp7tf
    3 months ago

    >goes to youtube homepage >finds this video >yipeee >oh >lets try it

  • @gopallohar5534
    2 months ago

    ain't see rust there!

  • @alexmoses3215
    2 months ago

    Programming 🤝 martincitopants…match made in heaven

  • @TheRealMangoDev
    3 months ago

    good vid

  • @callowaysutton
    3 months ago

    Next time use the Common Crawl dataset ;)

  • @daemonkisure2952
    3 months ago

    how can i install this search engine?

      • @conaticus
        3 months ago

        Instructions are on the Github repos :)

  • @lazarusNoob
    3 months ago

    You should host it

  • @igrb
    3 months ago

    nice

  • @HyperCodec
    3 months ago

    Bro managed to memleak in js

  • @gamedirection_us
    3 months ago

    🍎 👀 .. Apple being like "when will it be ready?".

  • @playtatus1758
    3 months ago

    how do you edit your vids

      • @conaticus
        3 months ago

        Allen uses Adobe After Effects for the amazing animations - I just use DaVinci to cut things up 😁

      • @playtatus1758
        3 months ago

        @@conaticus ok thx

  • @Macellaio94
    3 months ago

    Liked and subbed

  • @etherbeans
    3 months ago

    da goat

  • @binpersonal
    3 months ago

    "some fucking genius" lmao

  • @Tech_Code127-76
    3 months ago

    Good

  • @fangg194
    3 months ago

    you seem ok

  • @lonelybookworm
    3 months ago

    Well of course it is very fast, it only has like 200 websites

  • @mahrezjanati3426
    3 months ago

    first time watching a vid of yours ... i have one question : why are you vibrating ??

      • @-rate6326
        3 months ago

        Cause he's a vibrator

      • @InioluwaFalade-Tolulope
        1 month ago

        don't know either

  • @schoolbreakyay
    21 days ago

    Can I not use BrightData?

  • @SlimyFrog123
    3 months ago

    Now make your own email system to go along with it. 😉

  • @c猫t
    3 months ago

    at a desert

  • @humanontheinternet6510
    2 months ago

    Auto solve captcha you say🧐

  • @deepfan14
    1 month ago

    Bro make a compiler programming language

  • @Raven-fu1zz
    3 months ago

    Remember, never return an over 18 site without an over 18 word in the search request

  • @ALTERRAa8
    3 months ago

    6:08 nahhhhhhhhhhh whats bro even searching 💀💀💀💀

  • @_DarkLiquid
    3 months ago

    discord clone when

  • @iCrimzon
    1 month ago

    Cant wait for you to rewrite JS in binary 🎉🎉

  • @Xanmattauri
    3 months ago

    @google acquire this man

  • @a224kkk
    3 months ago

    Nice, you re-invented the Lucene library

  • @Horn7xBG
    3 months ago

    hub 🎉🎉

  • @Serhii_Volchetskyi
    3 months ago

    🔥🔥🔥 I was looking for that algorithm and didn't know its name.

  • @v037_
    3 months ago

    I found a worthy opponent

  • @Faeest
    3 months ago

    why do disallow and user-agent matter? can't you just scrape everything?

      • @skorp5677
        3 months ago

        You can but it might be illegal

  • @joenutt1232
    3 months ago

    Create your own database engine for shits and giggles

      • @conaticus
        3 months ago

        B+Trees 💀

  • @J0Y22
    3 months ago

    shockedd

  • @AquaQuokka
    3 months ago

    Rewrite your genetic code in Rust.

      • @pyyrr
        3 months ago

        i would rather be bug free so i will pass

  • @AttaaH
    1 month ago

    0:33 🤨

  • @monotonedevelopment
    3 months ago

    If only windows file explorer could do the same

      • @SandWire
        3 months ago

        For this we have a thing named Everything :)

  • @gammongaming9081
    3 months ago

    yk what would be funny? making the slowest search engine possible without like halting the program for a set time, just with maths

  • @Ayymoss
    3 months ago

    MAKE LONGER VIDEOS

  • @monkshee
    3 months ago

    damn

  • @danielisop3182
    3 months ago

    What did u mean by the websites u shouldn’t have searched

  • @Miluum
    3 months ago

    1:06 automatically solve captchas? i knew these things exist just to waste our time and energy

  • @--bountyhunter--
    1 month ago

    bro thought he could scrape my web and get away with it.

  • @juniordevmedia
    3 months ago

    what TF is IDF ?!!

      • @neofox2526
        3 months ago

        idk man but watching it makes me feel smart

      • @jamesbarret4240
        3 months ago

        Term frequency (the number of times a given word or so shows up in total) - inverse document frequency (the number of times it shows up in a specific document). The wikipedia article is pretty good: en.wikipedia.org/wiki/Tf-idf
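The definitions in the reply above differ from the usual formulation. Term frequency is normally measured within a single document, often divided by that document's length. Inverse document frequency is log(N / df), where N is the number of documents and df is how many of them contain the term. A word found in every document therefore gets weight 0. Below is a minimal Python sketch of that standard form over pre-tokenized documents. `tf_idf` is a hypothetical helper, and the linked blog post and API repository may use a different weighting or smoothing.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    tf  = occurrences of the term in the document / number of terms in the document
    idf = log(N / df), where df = number of documents that contain the term
    """
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights
```

With `[["the", "rust", "api"], ["the", "js", "crawler"]]`, "the" scores 0 in both documents because it appears everywhere. "rust" scores log(2) / 3 in the first document.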

  • @neologicalgamer3437
    3 months ago

    Bro sounds like WilburSoot

  • @sleepybraincells
    3 months ago

    Why is there Rust in the thumbnail? This was written in Javascript

      • @conaticus
        3 months ago

        Used Rust for the API and TF-IDF matching - decided not to keep in much of the footage for that as it was already explained in the animations

  • @user-fj5ts6sz1f
    3 months ago

    rust is a real badass❤❤

  • @chiroyce
    3 months ago

    What are the consequences of scraping sites you aren't allowed to?

      • @conaticus
        3 months ago

        Probably not much on its own as long as you're not violating copyright - however it is courteous not to scrape sites forbidden by the robots.txt

      • @trollinqu
        3 months ago

        wastes their resources and yours
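A crawler can respect robots.txt, as discussed in this thread, by using `urllib.robotparser.RobotFileParser` from Python's standard library. Below is a minimal sketch that assumes the robots.txt text has already been downloaded. The rules, the `example.com` URLs, and the `MyCrawler`/`BadBot` agent names are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt: everyone is kept out of /private/, BadBot is kept out entirely.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: BadBot
Disallow: /
"""

def make_parser(robots_txt):
    """Build a parser from robots.txt content that was fetched separately."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser

parser = make_parser(ROBOTS_TXT)
```

With these rules, `parser.can_fetch("MyCrawler", "https://example.com/blog/post")` is `True`, and `parser.can_fetch("MyCrawler", "https://example.com/private/page")` is `False`. A live crawler would instead call `set_url("https://example.com/robots.txt")` and `read()` before checking URLs on that site.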

  • @AhmedMahmoud-ec4kz
    2 months ago

    Great video 😊 FYI: bright data is an Israeli company 😮

  • @latrapa918
    3 months ago

    105

  • @_sohom
    3 months ago

    Make a better version of VSCode.

  • @ph03n1x_dev
    3 months ago

    You made a search engine for porn?! Thats disgusting... is it on GitHub?! 👀

      • @conaticus
        3 months ago

        All open source and ready to play around with 😂

  • @susannerudolph8469
    3 months ago

    then brightdata makes captchas useless

      • @educacionespecialchannel3756
        3 months ago

        Captcha's effectiveness has been in question for quite some time now.

      • @brettmiddleton5013
        1 month ago

        protects against amateurs but keeps it simple enough that an expert won’t breach/destroy their data to get what they want.

  • @vrljk
    3 months ago

    SRBIJAAAAAA

  • @kavinbharathi
    3 months ago

    Not to be the 🤓☝️ guy, but "Jana Vembunarayanan" is pronounced 'Ja' as in 'Jarvis' and 'na' as usual. Just fyi

      • @conaticus
        3 months ago

        Thank you, I'll do this if I ever pronounce it again 😂

  • @konstantinsotov6251
    3 months ago

    we had a hackathon where we basically had to implement TF/IDF - also a search engine of a sort, but for files. we did the interface in python and all mathematics processing in C++. It would have been a fun experience if not for the time limit. we struggled really hard; on test data our solution worked faster by an order of magnitude or two than most other participants, but... we somehow failed on the exam data. we failed fucking IO. and won nothing. I've fucking hated hackathons since then. fuck IDF. also maybe this happened because i had written 75% of the code, while 4 other members did almost nothing. It was (their) responsibility to handle IO, and mine to handle mathematics and processing. I hate working in teams. I know no one cares but i might as well just burst out all of the rage I have towards that experience. once again, fuck team work, fuck hackathons, fuck my teammates, fuck everything and everyone

      • @skorp5677
        3 months ago

        skill issue

      • @konstantinsotov6251
        3 months ago

        @@skorp5677 exactly

  • @lukamajcenic1172
    3 months ago

    This is just an ad for BrightData. Compared to previous videos very low effort.