The creators of TikTok caused my website to shut down

Games

and i thought charli d'amelio was the worst thing bytedance had done to me
▶Music by DDRKirby(ISQ) used with permission: ddrkirbyisq.bandcamp.com/
"I Can Feel it Coming" Kevin MacLeod (incompetech.com)
Licensed under Creative Commons: By Attribution 4.0 License
creativecommons.org/licenses/b...

Comments: 1,000

  • @christianwolff497
    @christianwolff497 9 months ago

    the biggest crime here is naming it ByteSpider and not SpiderByte

  • @Your_Average_Stickman_WasTaken

    @Your_Average_Stickman_WasTaken

    9 months ago

    I hate bluey

  • @ErikKoev

    @ErikKoev

    9 months ago

    no, the biggest crime is actually naming it ByteSpider and not SpiderDance

  • @pausebreakreviews

    @pausebreakreviews

    9 months ago

    God forbid it bite ya. Don't let 'em bitecha. That SpiderByte! Hurt! Hurt SpiderByte! That SpiderByte HURT!

  • @codyryan9789

    @codyryan9789

    9 months ago

    @@pausebreakreviews that spider bit me where the good lord split me

  • @KOMEOyt

    @KOMEOyt

    9 months ago

    SpyderByte

  • @philo23
    @philo23 9 months ago

    You probably want to block them in Cloudflare rather than on your server. Currently they’re still wasting your bandwidth (just in a much, much more reduced form). If you block them in Cloudflare they shouldn’t end up wasting any of your bandwidth at all, because they’ll never even touch your server. A simple page rule should do the trick, and even on the free tier you should get 3 page rules.

  • @lxpe5269

    @lxpe5269

    9 months ago

    Cloudflare also gives 5 WAF rules for free. With these, you could create a rule to block the user agent and add any other user agents, IPs, ASNs, etc in the future within a single rule.

  • @JustAWalter

    @JustAWalter

    9 months ago

    It says in the video blocking doesn't help

  • @philo23

    @philo23

    9 months ago

    @@JustAWalter in the video he's talking about Cloudflare's automatic bot detection, which is going to let legitimate web crawlers like ByteSpider through. I'm talking about a custom rule to specifically block that user agent at the Cloudflare level

  • @porglezomp7235

    @porglezomp7235

    9 months ago

    No, it says in the video that cloudflare’s automated DDOS protection doesn’t help. Explicit traffic rules would help.

  • @gggkiller

    @gggkiller

    9 months ago

    Since the bot follows links, if the homepage returns a 403, it won't spam the other pages as it has no links to follow I assume, but yeah, blocking in CF would still be ideal as it'd mean even avoiding that initial single 403 request's bandwidth.
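The server-side block discussed in this thread can be sketched as a small WSGI wrapper. This is only an illustration, assuming a Python WSGI app sits behind the web server; the names `BLOCKED_UA_SUBSTRINGS`, `is_blocked` and `block_user_agents` are made up for the example, and it is not the configuration used in the video. As noted above, a rule at the Cloudflare level would stop the request before it reaches the server at all.

```python
# Hypothetical blocklist; "bytespider" is the crawler name discussed in the thread.
BLOCKED_UA_SUBSTRINGS = ("bytespider",)


def is_blocked(user_agent):
    """Return True if the User-Agent contains a blocked substring (case-insensitive)."""
    ua = (user_agent or "").lower()
    return any(fragment in ua for fragment in BLOCKED_UA_SUBSTRINGS)


def block_user_agents(app):
    """Wrap a WSGI app so blocked crawlers get a tiny 403 instead of a full page."""
    def middleware(environ, start_response):
        if is_blocked(environ.get("HTTP_USER_AGENT")):
            body = b"Forbidden\n"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        return app(environ, start_response)
    return middleware
```

The 403 body is a few bytes, so a blocked crawler still costs a request but almost no bandwidth, and it gets no links to follow.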

  • @GooveG
    @GooveG 9 months ago

    ByteSpider checking for updates on the Lego Island series every 100 milliseconds

  • @underscore.

    @underscore.

    9 months ago

    0.1 milliseconds*

  • @Yazan_Majdalawi

    @Yazan_Majdalawi

    9 months ago

    ​@@underscore. 0.1 seconds

  • @UnitSe7en

    @UnitSe7en

    9 months ago

    Chinese hackers still have not managed to fix the framerate. Spy on the West.

  • @tomtrublu

    @tomtrublu

    9 months ago

    0.1 nanoseconds.

  • @leonkernan

    @leonkernan

    9 months ago

    Can’t fault them for wanting an update

  • @PoignantPirate
    @PoignantPirate 9 months ago

    I definitely appreciate the PSA, you literally just saved me from having to diagnose the same issue on one of my servers.

  • @erikkonstas

    @erikkonstas

    9 months ago

    Wait really? The coincidence!

  • @NaoPb

    @NaoPb

    9 months ago

    @@erikkonstas you mean coincibytedance.

  • @adamkuster

    @adamkuster

    9 months ago

    @@erikkonstas You mean CoinciDance.

  • @FlameSoulis

    @FlameSoulis

    9 months ago

    Can confirm. I've been bitten by the stupid spider now that I reviewed my logs. If it isn't Russia trying to access a nonexistent cPanel, it's now this.

  • @glossymouse7712

    @glossymouse7712

    8 months ago

    @@erikkonstas It might not be a coincidence as they are probably launching a huge data gathering campaign for a possible AI.

  • @asriel09
    @asriel09 9 months ago

    Looks to me like they're downloading any and all images they can find. Could be for training an AI model. Looks like you have a forum, so that's why there's tonnes of requests coming your way.

  • @TsoLIt

    @TsoLIt

    9 months ago

    I've seen this before on my company's website. We host a lot of blog posts for business communications systems. Our site traffic tripled in the span of a week, and I'm pretty sure it was one of these crawlers for AI

  • @mr.whimsic6902

    @mr.whimsic6902

    9 months ago

    Imagine a timeline where tiktok makes an ai of mattkc

  • @TuriGamer

    @TuriGamer

    9 months ago

    "Could be training for ai" No

  • @bonkwonkelchip7569

    @bonkwonkelchip7569

    9 months ago

    @@TuriGamer yes

  • @Survivalist_Redo

    @Survivalist_Redo

    9 months ago

    @@TuriGamer yes it's likely not training for an AI, could very easily be dataset gathering to then later train an AI though

  • @Gunbudder
    @Gunbudder 9 months ago

    oddly enough, a chinese crawler completely tanked my college professor's homework submission website. it was extremely persistent too! i remember the entire system was down for a few days while they worked out exactly how to block it. from what i remember, they eventually just blocked every IP not from the USA lol

  • @IdentifiantE.S

    @IdentifiantE.S

    9 months ago

    You’re just too strong 😂

  • @Kyrmana

    @Kyrmana

    9 months ago

    Very sus

  • @steve_1507

    @steve_1507

    9 months ago

    Digital racism

  • @GavinFromWeb

    @GavinFromWeb

    9 months ago

    @@steve_1507 no, not really. It’s a uni in the US. Unless they let you study online in other countries it shouldn’t be a problem.

  • @erikkonstas

    @erikkonstas

    9 months ago

    Um, sorry to be a killjoy, but "every IP not from the USA" is not an objective statement... an IP address by itself does not contain information regarding its origin on the planet, the job is usually done by ISPs (of all levels) who hand these addresses out to customers while reporting back to geo-IP database hosts at the same time; if one ISP of a high enough level goes rogue, you're toast...

  • @rudolfpast9243
    @rudolfpast9243 9 months ago

    i dont know if its still a thing, but back in the day i implemented a spidertrap to all of my websites. easy thing. you need a 1x1 pixel transparent image on every site linked to your trap-script and in your robots.txt you declare the script as disallowed. so good spiders wont go there and bad ones will be blocked...
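The spider trap described above could look roughly like this. It is a minimal sketch, assuming the trap lives at a made-up path `/trap/` that is linked only from the invisible 1x1 image and is disallowed in robots.txt; `SpiderTrap` and its in-memory ban set are hypothetical, and a real setup would persist the bans or push them into a firewall.

```python
TRAP_PATH = "/trap/"  # hypothetical path, linked only from the hidden 1x1 image

# Well-behaved crawlers read this and never request the trap path.
ROBOTS_TXT = "User-agent: *\nDisallow: " + TRAP_PATH + "\n"


class SpiderTrap:
    """Ban any client that requests the disallowed trap path."""

    def __init__(self):
        self.banned = set()

    def handle(self, client_ip, path):
        """Return the HTTP status code to send for this request."""
        if client_ip in self.banned:
            return 403
        if path.startswith(TRAP_PATH):
            # Only a crawler that ignored robots.txt ends up here.
            self.banned.add(client_ip)
            return 403
        return 200
```

One caveat: banning by IP works best against a crawler that reuses addresses. A crawler spread over many cloud IPs, as several commenters describe here, would trip the trap once per address.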

  • @TheGreatSteve
    @TheGreatSteve 9 months ago

    You'd think non-malicious inadvertent DDoS would be the easiest thing for Cloudflare to spot and block? Maybe it's whitelisted?

  • @capsey_

    @capsey_

    9 months ago

    I mean, is it though? I am no expert, but I think gradually exceeding a bandwidth limit is harder to spot than an active DDoS attack. Why do you think it is easier?

  • @semaja2

    @semaja2

    9 months ago

    CF can block this traffic, you could deploy rules to block the user agent, or pay for their bot features, but this isn’t a DDoS. Alternatively, adjusting the server code to be more cache friendly would also help

  • @FurriousFox

    @FurriousFox

    9 months ago

    the caching part would in fact not work, the bytespiders will only scan all urls once, not multiple times, so caching wouldn't solve anything

  • @monad_tcp

    @monad_tcp

    9 months ago

    probably

  • @x_x_w_

    @x_x_w_

    9 months ago

    Increase the cloudflare query caching level

  • @Nik.leonard
    @Nik.leonard 9 months ago

    At least they had the “decency” of using a proper UA. They could (and I hope they won't) just use the Chrome UA or, worse, weighted random UAs

  • @JessicaFEREM

    @JessicaFEREM

    9 months ago

    may be worth it to block the entire country if it's bad enough.

  • @FlamesRunner

    @FlamesRunner

    9 months ago

    @@JessicaFEREM Blocking countries is more of a last resort, and shouldn't be considered so long as other options are available. Cloudflare, for instance, offers the ability to selectively block user agents, which would do the trick here.

  • @saiv46

    @saiv46

    9 months ago

    @@JessicaFEREM That's why many websites just outright block China (and now Russia, but for other reasons)

  • @undefinedchannel9916

    @undefinedchannel9916

    9 months ago

    @@JessicaFEREM Apparently they use different hosts like AWS so the country may show up as the US for some requests.

  • @HappyGick

    @HappyGick

    7 months ago

    ​@@undefinedchannel9916 Enough requests appeared with Singapore/China as location, so he could block those countries and he would be fine.

  • @ondrejpavlik4210
    @ondrejpavlik4210 9 months ago

    I'd recommend you set up a simple email notification that the server would send to you if an arbitrary bandwidth threshold you'd consider too high was exceeded. This way you could resolve the issue before any downtime occurs.
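A rough sketch of the threshold check behind such a notification, assuming the server writes access logs in the common "combined" format (as nginx and Apache do by default). The function names and the 80% warning fraction are invented for the example; actually sending the email (for instance with Python's `smtplib`) is left out.

```python
def bytes_sent(log_lines):
    """Sum the response-size field of combined-log-format lines."""
    total = 0
    for line in log_lines:
        parts = line.split('"')
        if len(parts) < 3:
            continue  # not a combined-format line
        # parts[2] is the text after the quoted request line: ' status size '
        fields = parts[2].split()
        if len(fields) >= 2 and fields[1].isdigit():
            total += int(fields[1])
    return total


def over_threshold(log_lines, monthly_cap_bytes, warn_fraction=0.8):
    """Return True once traffic reaches the chosen fraction of the monthly cap."""
    return bytes_sent(log_lines) >= monthly_cap_bytes * warn_fraction
```

Run from a scheduled job over the current month's log, this would give a warning well before the cap is hit, which is the point of the suggestion above.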

  • @arjix8738

    @arjix8738

    9 months ago

    or just implement a cooldown that returns 429 when the same IP makes too many requests in under a specific amount of time
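The cooldown idea can be sketched as a per-IP sliding-window limiter that answers 429 with a `Retry-After` header, the standard way to tell a client how long to wait. The class name and the limits are assumptions for illustration only. A per-IP limit like this may never trigger against a crawler that spreads slow requests over many addresses.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` per client within `window_seconds`."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window = window_seconds
        self.hits = defaultdict(deque)  # client IP -> timestamps of recent requests

    def check(self, client_ip, now):
        """Return (status, headers) for a request arriving at time `now` (seconds)."""
        recent = self.hits[client_ip]
        # Forget requests that have fallen out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.max_requests:
            # Round up so the client does not retry a moment too early.
            retry_after = int(self.window - (now - recent[0])) + 1
            return 429, {"Retry-After": str(retry_after)}
        recent.append(now)
        return 200, {}
```

In a real server `now` would come from `time.monotonic()`; it is passed in here so the behaviour is easy to check.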

  • @jacksoncremean1664

    @jacksoncremean1664

    9 months ago

    @@arjix8738 from what he's shown in the access log that will be tricky to pull off since they are crawling very slowly

  • @jacksoncremean1664

    @jacksoncremean1664

    9 months ago

    a better idea would be to just set Cloudflare security level to IUAM (I'm Under Attack Mode)

  • @randomblock1_

    @randomblock1_

    9 months ago

    That's what his first email was about. The second was the notification that it ran out

  • @JordanPlayz158

    @JordanPlayz158

    9 months ago

    @@arjix8738 true, otherwise the crawler has no reason to assume there is a rate limit (perhaps there are even standard crawler headers to dictate how often they should scrape?)

  • @SlavTiger
    @SlavTiger 9 months ago

    I'm just sick of us being expected to foot the bill for something a large corporation does without your consent. These days our data makes us look like little more than dollar signs instead of people to a lot of those tech company execs.

  • @EpicLPer
    @EpicLPer 9 months ago

    OH MY GOD ARE YOU KIDDING ME... so THIS was the reason my site went down too??? I suddenly couldn't reach my website at around July 11th or something too, and a few minutes later my provider sent me a mail saying they temporarily disabled my site till they figure out what's going on, it also looked like a DDoS in the logs and everything... wow... Now that mystery is solved, thanks! :)

  • @erikkonstas

    @erikkonstas

    9 months ago

    I'd say check if this was *really* it tho (like "ByteSpider" and everything).

  • @GeorgeSukFuk

    @GeorgeSukFuk

    9 months ago

    It's the squinty-eyed commies!

  • @johnbucki5567
    @johnbucki5567 9 months ago

    When exceeding bandwidth, VPS's should not be suspended. I believe they should just shut off all network access, so the KVM console would still be accessible for troubleshooting. Also, if the reason is a DDOS attack, it will stop reaching the server and you can check where the traffic is coming from.

  • @shishsquared

    @shishsquared

    8 months ago

    Yeah it's crazy that there's not an out of band console

  • @WackoMcGoose
    @WackoMcGoose 9 months ago

    Why do I get the feeling that Bytedance paid Cloudflare to look the other way and ignore their aggressive crawler shenanigans...

  • @jcfawerd

    @jcfawerd

    2 months ago

    Not surprised, since cloudflare is allowed to operate in china, coincidence? I don’t think so

  • @ZeroRiskAppetite
    @ZeroRiskAppetite 9 months ago

    Maybe the crawler gets into an infinite loop. Might be the classic 'detecting cycles in an undirected graph' problem.

  • @erikkonstas

    @erikkonstas

    9 months ago

    Pretty sure that would make more of an exponential curve though, the one I saw in the video was a bit too linear...

  • @n3ishere

    @n3ishere

    9 months ago

    @@erikkonstas not necessarily, if it got stuck on some pages that in some way link to each other in a loop, it could be linear like that as more spiders go there and get stuck in a repeating loop (source: ive made web spiders before and this was a problem i had to fix with it)

  • @some1and297

    @some1and297

    9 months ago

    Yeah, I mean in this case I can't imagine bytedance designing a production webcrawler so terrible it can't cache URLs. It might have more to do with unique get request parameters being generated from page links.

  • @n3ishere

    @n3ishere

    9 months ago

    @@some1and297 unless the loop has enough pages that the cache gets cleared beforehand

  • @erikkonstas

    @erikkonstas

    9 months ago

    @@some1and297 I don't think crawlers should take into account whatever follows a question mark in the URL... like yes, there might be that one rare case where it doesn't mean what we think it means, but come on, it's just a spider...
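For readers wondering how a crawler normally avoids the loop this thread speculates about, here is a minimal sketch: keep a set of already-seen URLs and normalise each link before checking it. Nothing here is known about how ByteSpider actually works; `canonical`, `crawl` and the `get_links` helper are hypothetical.

```python
from collections import deque
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def canonical(url):
    """Normalise a URL so trivially different spellings compare equal."""
    parts = urlsplit(url)
    # Sort query parameters and drop the fragment, which never reaches the server.
    query = urlencode(sorted(parse_qsl(parts.query)))
    path = parts.path or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, query, ""))


def crawl(start, get_links):
    """Breadth-first crawl; `get_links(url)` is a hypothetical fetch-and-parse helper."""
    first = canonical(start)
    seen = {first}
    queue = deque([first])
    order = []
    while queue:
        url = queue.popleft()
        order.append(url)
        for link in get_links(url):
            target = canonical(link)
            if target not in seen:  # pages that link back to each other stop here
                seen.add(target)
                queue.append(target)
    return order
```

A seen-set handles pages that link to each other in a loop. It does not help when every link carries a freshly generated query parameter, since each of those URLs really is distinct; that is the scenario one commenter raises above.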

  • @notniko6914
    @notniko6914 9 months ago

    Sue them for the 10$

  • @Howtheheckarehandleswit

    @Howtheheckarehandleswit

    9 months ago

    It's ByteDance, unless they do something bad enough to spark an international incident, the CCP will protect them from the consequences of their actions

  • @new_simsons

    @new_simsons

    9 months ago

    Bruh

  • @adorable_yangire

    @adorable_yangire

    9 months ago

    @@new_simsons Bruh translate to English

  • @new_simsons

    @new_simsons

    9 months ago

    @@adorable_yangire wtf?

  • @Roach18

    @Roach18

    9 months ago

    @@adorable_yangire Too bad, I translate to Polish

  • @OROO111
    @OROO111 9 months ago

    Thank you, I have exactly the same problem with my website. I don't even host anything on that website besides the default WordPress site, but I had a huge amount of "users" accessing my site

  • @Zowiezo101

    @Zowiezo101

    9 months ago

    Yeah, I'm very glad to know about this as I have my own website as well and now I am prepared if this would happen to me!

  • @erikkonstas

    @erikkonstas

    9 months ago

    Were they all "ByteSpider"?
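One way to answer that question is to tally the access log by user agent. A small sketch, assuming combined-format logs where the user agent is the third quoted field; `user_agent_counts` is a made-up helper name.

```python
from collections import Counter


def user_agent_counts(log_lines):
    """Count requests per User-Agent in combined-log-format lines."""
    counts = Counter()
    for line in log_lines:
        parts = line.split('"')
        # Quoted fields are: request line, referer, user agent.
        if len(parts) >= 6:
            counts[parts[5]] += 1
    return counts
```

Calling `user_agent_counts(open("access.log")).most_common(10)` would list the ten busiest clients, which should make a single dominant crawler obvious. The file name is only an example.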

  • @rkvkydqf
    @rkvkydqf 9 months ago

    Since auto-regressive language models are so trendy these days, and there might be fears of export bans for using already collected corpus like CommonCrawl, they might be trying to build their own. Maybe some Snapchat-esque annoying "friend" for lonely teens.

  • @piemadd

    @piemadd

    9 months ago

    Bytespider has been active for years now (4ish last I checked) so this isn't anything new.

  • @blikthepro972
    @blikthepro972 9 months ago

    knowing how tiktok spies on and tracks phones like crazy, their web crawlers being extremely overkill just to scrape every last bit of data makes sense

  • @haileymccurry3756

    @haileymccurry3756

    9 months ago

    google et al spy on and track phones like crazy and yet their crawlers are doing fine

  • @blikthepro972

    @blikthepro972

    9 months ago

    @@haileymccurry3756 true, but google's tracking is still not as bad as tiktok's. how "not as bad" it is i don't know, but that's the vibe i have gotten over the years

  • @internet_userr

    @internet_userr

    9 months ago

    Bing Chilling

  • @RAFMnBgaming

    @RAFMnBgaming

    9 months ago

    @@haileymccurry3756 google is certainly more practiced at "keeping their heads down", insofar as that's possible for one of the biggest companies around.

  • @nicepotato5755

    @nicepotato5755

    9 months ago

    the tiktok thing is mostly propaganda, all major tech companies do this.

  • @XeZrunner
    @XeZrunner 9 months ago

    5:46 Have you tried contacting their email address from the UA string? In case it is a legitimate issue, they might want to hear about it.

  • @U20E0

    @U20E0

    9 months ago

    if they are actually just making a search engine, i doubt they want to waste their own resources like this

  • @foreskin

    @foreskin

    9 months ago

    I dont mean to be that guy but they probably legitimately dont care since its already been brought up multiple times before matt

  • @U20E0

    @U20E0

    9 months ago

    @@foreskin probably.

  • @FirstLast-gw5mg

    @FirstLast-gw5mg

    9 months ago

    If they ignore robots.txt I don't think it's likely that they care much about complaining emails.

  • @XeZrunner

    @XeZrunner

    9 months ago

    @@foreskin In that case, I agree with blocking them in this scenario.

  • @chaosmagican
    @chaosmagican 9 months ago

    10 bucks for 100GB? Jeez, I'm paying 1€ per TB over here, that is just straight up robbery

  • @sesad5035

    @sesad5035

    9 months ago

    10 aussie dollars.

  • @thewhitefalcon8539

    @thewhitefalcon8539

    4 months ago

    Sounds like cloud. I get 1€ per TB too.

  • @sosman64

    @sosman64

    2 months ago

    @@sesad5035 then it's even more robbery

  • @alex13902

    @alex13902

    1 month ago

    @@thewhitefalcon8539 for bandwidth? Or storage? The two are very, very different

  • @MagicalPhi
    @MagicalPhi 9 months ago

    Now to find out if it was Tik or Tok who was responsible for this.

  • @Junimeek

    @Junimeek

    9 months ago

    or their lost cousin Tak

  • @SomeRandomPiggo

    @SomeRandomPiggo

    9 months ago

    Definitely Tek

  • @TheRedOwl

    @TheRedOwl

    9 months ago

    I'm pretty sure it was Tuk

  • @havesomerespectandspoilthe5880

    @havesomerespectandspoilthe5880

    9 months ago

    Or Tyk, if they ever come out of exile

  • @Zowiezo101

    @Zowiezo101

    9 months ago

    Don't forget Tyk and Tøk

  • @CoolJosh3k
    @CoolJosh3k 9 months ago

    Would be nice if certain user agents, like web crawlers, had a default limit on how often they could access the site. While obviously malicious crawlers could get around this, reputable ones that wish to stay whitelisted by default would obey.

  • @notalostnumber8660

    @notalostnumber8660

    9 months ago

    You can make PHP scripts to rate limit bot crawlers based on user agent. In fact, you can try to Denial-of-Service them by using a GZip bomb or PNG/GIF/WebP bomb, since those can look legitimate, but end up causing havoc for a short while

  • @erikkonstas

    @erikkonstas

    9 months ago

    "reputable ones that wish to stay whitelisted by default would obey" nah, malicious would just become the new reputable.

  • @Zettymaster

    @Zettymaster

    9 months ago

    UAs are super easy to spoof (since they are supplied by the software that SENDS the request) so that would only force them to crawl using spoofed UAs, which they allegedly already do.

  • @CoolJosh3k

    @CoolJosh3k

    9 months ago

    @@Zettymaster Oh. Then I stand corrected.

  • @someguy4915

    @someguy4915

    9 months ago

    This is in part what robots.txt is for but as the video shows, ByteSpider does not obey robots.txt... Used to be, for crawlers to not get blocked by everyone, they had to obey robots.txt, seems like ByteDance didn't get the memo...
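To make the robots.txt point concrete, this is roughly what a crawler that does obey the file would check before each request, using Python's standard `urllib.robotparser`. The robots.txt content below is an invented example, not the file from the video. `Crawl-delay` is a widely seen but non-standard directive, so even polite crawlers may ignore it.

```python
from urllib.robotparser import RobotFileParser

# Invented example file: shut one crawler out, slow everyone else down.
ROBOTS_TXT = """\
User-agent: Bytespider
Disallow: /

User-agent: *
Crawl-delay: 10
Disallow: /private/
"""


def load_rules(text):
    """Parse robots.txt text into a rule object a crawler can query."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)
blocked = rules.can_fetch("Bytespider", "https://example.com/forum/")      # False
allowed = rules.can_fetch("SomeOtherBot", "https://example.com/forum/")    # True
delay = rules.crawl_delay("SomeOtherBot")                                  # 10
```

All of this is voluntary on the crawler's side, which is why the thread keeps coming back to blocking at the server or at Cloudflare.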

  • @shadowtheimpure
    @shadowtheimpure 9 months ago

    The old adage applies here: Never attribute to malice what can be easily attributed to incompetence.

  • @GreyMaria

    @GreyMaria

    9 months ago

    Found the ByteDance employee

  • @shadowtheimpure

    @shadowtheimpure

    9 months ago

    @@GreyMaria What? I'm literally calling them stupid rather than malicious. Their web crawlers are not malicious, just very poorly coded.

  • @itsTyrion

    @itsTyrion

    9 months ago

    @@GreyMaria it's literally just "Hanlon's razor"

  • @MrTriple3D

    @MrTriple3D

    9 months ago

    evil people make it look like incompetence when it really is malice.

  • @erwannthietart3602

    @erwannthietart3602

    9 months ago

    @@MrTriple3D the problem is, if we apply this idea every time incompetence looks evil, you may unjustly treat something actually incompetent, which can be just as useful to the "evil people" as hiding behind a veil of incompetence

  • @MyHandleIsAplaceholder
    @MyHandleIsAplaceholder 9 months ago

    I believe Bytedance wants to create a new Chinese web browser to compete with the blocked ones

  • @zyxwv

    @zyxwv

    9 months ago

    Another? What about TouTiao?

  • @IdentifiantE.S

    @IdentifiantE.S

    9 months ago

    @@zyxwv What is Tatiao?

  • @zyxwv

    @zyxwv

    9 months ago

    @@IdentifiantE.S A Chinese Web Browser by Bytedance.

  • @hi12167pies

    @hi12167pies

    9 months ago

    make a browser to compete with browsers chinese people can't even access 💀

  • @f3rny_66

    @f3rny_66

    9 months ago

    not a browser, but a search engine, I had the same bot and also PetalBot, from the huawei people and their search engine petal crawling client servers. But it can be filtered tho, just needs configuration. bytespyder is banned by default in AWS iirc

  • @CoolJosh3k
    @CoolJosh3k 9 months ago

    Assuming it really was ByteDance, I expect this was not intended behaviour. It would cost them bandwidth too, though maybe so little in comparison that it just looks like regular background noise. An earlier alert would have been very useful here.

  • @Thesnugglebottom

    @Thesnugglebottom

    9 months ago

    They would be doing this to thousands if not millions of sites though, so the bandwidth on their side would be gigantic

  • @f3rny_66

    @f3rny_66

    9 months ago

    it's the cost of business, just like google crawls the web. the issue with bytespyder and other chinese bots is that they ignore robots.txt and do other shady stuff

  • @spykillergames8402

    @spykillergames8402

    9 months ago

    it probably was... as i reckon they are using images from his site to train an AI model... modified webcrawlers can do that

  • @CoolJosh3k

    @CoolJosh3k

    9 months ago

    @@Thesnugglebottom I figure maybe it is so rare that only very few sites would have the issue.

  • @JordanPlayz158

    @JordanPlayz158

    9 months ago

    @@Thesnugglebottom but if they know they are using a ton, they may opt for servers with no bandwidth limit

  • @Mark_Rober
    @Mark_Rober 9 months ago

    "and i thought charli d'amelio was the worst thing bytedance had done to me" The description is the best part of this video XD

  • @mrscrewu1199
    @mrscrewu1199 9 months ago

    Feel like Cloudflare should detect this sort of activity and automatically block the user agent, at least temporarily to see if it stops or continues, instead of suspending the client.

  • @diegopescia9602
    @diegopescia9602 9 months ago

    Luckily your site has a static limit with a fixed price. Imagine the costs if it were an uncapped pay-as-you-go service like most cloud services

  • @kur0kiba
    @kur0kiba 9 months ago

    i thought for sure that it would be the same thing a friend of mine had about 10 or more years ago. he kept a travel log website where he uploaded photos because he was a nerd who liked to travel. he eventually visited the original Starbucks and uploaded a picture of the original logo. a more popular website used the photo, but they didn't download and host the photo themselves. instead they just linked to it, so when you loaded the more popular website it would give your browser a link to where the photo was located on my buddy's website so it could display it. his traffic skyrocketed. i believe there's a name for when people do this but i don't know it. he did find a fix for it where any website linking to any photo on his website like that would then be blocked.

  • @erikkonstas

    @erikkonstas

    9 months ago

    I've seen the term "hotlinking" for that, and yep, that's exactly why it's frowned upon.

  • @DarkGob

    @DarkGob

    9 months ago

    It's called hotlinking, and has been a discouraged practice for decades.

  • @HappyTinfoilCat

    @HappyTinfoilCat

    9 months ago

    That's when you swap out the photo for something like goatse

  • @thewhitefalcon8539

    @thewhitefalcon8539

    4 months ago

    It's called hotlinking and it's traditional to change the picture to pron. That could be illegal in some countries though.
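The fix mentioned in the original comment (refusing images requested from other sites' pages) usually comes down to checking the `Referer` header. A minimal sketch, assuming the site's own hostnames are known; `ALLOWED_HOSTS` and `allow_image_request` are invented names, and in practice this is more often done with a web server rule than in application code.

```python
from urllib.parse import urlsplit

ALLOWED_HOSTS = {"example.com", "www.example.com"}  # hypothetical: the site's own hosts


def allow_image_request(referer):
    """Serve images for direct visits and same-site pages, refuse other referrers."""
    if not referer:
        # Many browsers and privacy tools omit the header, so don't punish that.
        return True
    host = urlsplit(referer).hostname
    return host in ALLOWED_HOSTS
```

The header is supplied by the client and easy to fake, so this only stops ordinary hotlinking by other websites, not a determined scraper.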

  • @thebunsenburner
    @thebunsenburner 9 months ago

    That's a wild ride for sure.

  • @SeraphimKnight
    @SeraphimKnight 9 months ago

    Good thing this is happening in the age of DDOS-prevention. Imagine getting fucked by a spiderbot back in the days when you'd host your website on your home network and your ISP charged you by data usage.

  • @gluttonousmaximus9048
    @gluttonousmaximus9048 9 months ago

    ...And several years ago here I simply failed to see several of the classic cartoon blogs simply because I was detected using VPN. Tom Scott has warned us well. The internet is a cesspool of patchwork offense and defense, shady strategies and clumsy turf war.

  • @DorAntCr
    @DorAntCr 9 months ago

    It's always a good day when Matt uploads a new video. And rants about a random company as well.

  • @robyc9545
    @robyc9545 9 months ago

    Kinda irony that your sub is 404k now. Stay safe out there

  • @airnith
    @airnith 9 months ago

    this is very useful information. I been thinking about putting together a website for some friends, so now I know that I might need to look out for this.

  • @jonmayer
    @jonmayer 9 months ago

    I'm interested if you could get a response or not by emailing the support. Probably not, but it would be funny to see their reply.

  • @TravellingTARDIS
    @TravellingTARDIS 9 months ago

    funny you use the spider-man 3 in that clip about bytespider because im fairly certain the font from the bytespider logo is that same one from the sam raimi spider-man films lmao

  • @nj5374
    @nj5374 9 months ago

    Surely as this becomes more common cloudflare may begin to implement a catch for similar overzealous crawlers?

  • @imaxvi
    @imaxvi 9 months ago

    “im not that popular” hits hard 😭

  • @csbauder
    @csbauder 9 months ago

    Really interesting stuff. I've considered making a website before, but I wasn't aware of stuff like this. Thanks for the heads-up!

  • @cptpotatoface386
    @cptpotatoface386 9 months ago

    This reminds me of when i had a minecraft server running for me and my friends. Woke up one day and went to check on it to see that the server command window was full of disconnected messages. Did some stuff like editing the hosts file to make it redirect the IP back to itself or similar (prob did nothing) but eventually just went with running malwarebytes since it blocks suspicious requests

  • @ToadyEN
    @ToadyEN 9 months ago

    Worth noting that Twitter / X and lots of other sites have stopped bots from crawling them now, something to do with them training their AI with content from their sites.

  • @erikkonstas

    @erikkonstas

    9 months ago

    Except that I believe Twitter's case has become common knowledge to a wider audience, because, well, it did hit actual people with rate limits often too.

  • @Sammysapphira

    @Sammysapphira

    9 months ago

    Facebook and YouTube are obviously rate limiting. I get the same posts nonstop on Facebook for literal weeks no matter how many times I refresh or even if I open it on a different device. A lot of people are getting the same behavior. Twitter was just the only one that was public about it.

  • @official-obama

    @official-obama

    9 months ago

    @@Sammysapphira it would go "oh no! something went wrong and we can't tell you" instead of doing that. it might be caching or nobody's posting anything

  • @zyxwv
    @zyxwv 9 months ago

    I would find it strange for them to be making a new SE. I believe TouTiao would not really need a remake, seeing as it already has over 100 million users daily

  • @rkvkydqf

    @rkvkydqf

    9 months ago

    Since auto-regressive language models are so trendy these days, and there might be fears of export bans for using already collected corpus like CommonCrawl, they might be trying to build their own. Maybe some Snapchat-esque annoying "friend" for lonely teens.

  • @zyxwv

    @zyxwv

    9 months ago

    @@rkvkydqf That does make a lot of sense. However, googling the issue in the video (ByteSpider) shows that this has been going on for a long time. I saw a Stack Overflow post from 2019.

  • @v1mja
    @v1mja 8 months ago

    I work at a cloud provider. We have a wide range of customers and I'm afraid to say that we have seen all sorts of issues with search engine bots. Not just from fringe ones either. Even the large ones can cause weird issues. The problems we have observed include big spikes in PHP-FPM processes, tens of gigabytes of cache being generated by weird access patterns and even extremely high database loads... Funny how that goes sometimes.

  • @kennethbeal
    @kennethbeal 9 months ago

    Thank you, excellent analysis!

  • @General12th
    @General12th 9 months ago

    Hi Matt! I love storytime with Matt! You're really fun to listen to.

  • @8ullfrog
    @8ullfrog 9 months ago

    It's a shame you can't invoice them for the bytef**king they did.

  • @LilacMonarch

    @LilacMonarch

    7 months ago

    I mean, you can still try. Just send an invoice and see if they'll pay it lol

  • @___aZa___
    @___aZa___ 9 months ago

    always happy to see you upload :)

  • @Jergling
    @Jergling 9 months ago

    The fact that the requests were coming from seemingly random Singapore IPs still suggests a botnet. I wonder if there's a Bytedance app doing ill-conceived distributed computing in the background. You wouldn't need any kind of app permissions to browse the web, nor would any one user notice it the way crypto leech apps tend to be noticed.

  • @d9zirable

    @d9zirable

    9 months ago

    nah singapore is just a colony of china

  • @zwz.zdenek

    @zwz.zdenek

    2 months ago

    They are not random at all, they are ranges owned by cloud services.

  • @niepytajdl
    @niepytajdl 9 months ago

    truly a chinese moment

  • @JohnLasseter-ct5in
    @JohnLasseter-ct5in 9 months ago

    Old man yells at cloud

  • @matthewforan6397
    @matthewforan6397 9 months ago

    I've also noticed a ton of traffic from Singapore recently, and my domain just has the default parking page!

  • @Serverfrog
    @Serverfrog 9 months ago

    fail2ban with a BadBots rule should also do the job ;) It would then block the IP address temporarily in iptables (or whatever firewall fail2ban is configured to use), which further reduces the traffic they produce

  • @TheFinnishTechie
    @TheFinnishTechie 9 months ago

    You KNOW it’s going to be a good day when MattKC posts a video. Keep up the good work man

  • @wchorski
    @wchorski 9 months ago

    Please more content like this. I host websites and services and this helps me keep up on new threats and how to deal with them

  • @wesleyfournier6278
    @wesleyfournier6278 9 months ago

    cheers on the psa, the more people that share knowledge like this in unbiased ways like this the safer we can all be on the interwebs :)

  • @RetroJack
    @RetroJack 9 months ago

    Handy to know - thanks for the heads-up!

  • @JulianR2JG
    @JulianR2JG 9 months ago

    New video from Mr. LEGO Island

  • @pcislocked
    @pcislocked 9 months ago

    ur uncached traffic ratio is really low tbh, maybe also take a look at that to take more load from your webserver.

  • @realcrashie
    @realcrashie 9 months ago

    Not the type of MattKC video we expected, but the one we deserved. Always happy to see you have uploaded, no matter the content ❤

  • @LethalBubbles
    @LethalBubbles 9 months ago

    gotta love their use of the spider-man movie font

  • @ruairim2283
    @ruairim2283 9 months ago

    Openly showing this is the best thing you can do. Even if you can't prove this is malicious, you're still providing info for the Internet. Maybe more OCD users will get to it. Who knows?

  • @JTCF
    @JTCF 9 months ago

    That was a nice reminder to check my home server nginx access logs. Thank god I set it up correctly before opening up to the world.

  • @mandarina1367

    @mandarina1367

    9 months ago

    set it up in a way to avoid this from happening?

  • @grubdotwebsite
    @grubdotwebsite 9 months ago

    ByteSpider's logo using the Raimi Spider-Man font is incredibly silly

  • @donutsndcoffee
    @donutsndcoffee 9 months ago

    Friggin fascinating mate

  • @Boxuga
    @Boxuga 9 months ago

    All the crazy tech corporations have been in the news recently: LTT, now ByteDance again. It's crazy. Also, keep up the good work, MattKC.

  • @nunyabiznesse6917

    @nunyabiznesse6917

    9 months ago

    They always have been on the news though

  • @Tigermoto

    @Tigermoto

    9 months ago

    All two? Have I missed something?

  • @Pesthuf
    @Pesthuf 9 months ago

    I hope they won't stop using that user agent string, or else you've got an issue. It's weird how they give you that string, but do everything else in their power to stop you from blocking their crawler.

  • @Kyrmana
    @Kyrmana 9 months ago

    Happy 404k subs! 😄

  • @Napert
    @Napert 8 months ago

    Giving them the benefit of the doubt is like giving a serial killer the benefit of the doubt. It's just moronic.

  • @autiboy08
    @autiboy08 9 months ago

    Hi Matt, love the content you make! Looking forward to this watch!

  • @jps915

    @jps915

    9 months ago

    nn

  • @grass6317
    @grass6317 9 months ago

    3:11 who tf uses android 5.0

  • @rockpie

    @rockpie

    5 months ago

    People who don't want to upgrade

  • @slipperynickels
    @slipperynickels 6 months ago

    wow. being unable to view my own access logs because my website’s bandwidth allocation has run out would be a MASSIVE dealbreaker for me. that is ridiculous.

  • @supernenechi
    @supernenechi 9 months ago

    My mail server logs were an absolute mess before. I then selected a bunch of countries that I don't care about and blocked them in my router. And suddenly? Silence. Peace and tranquility

  • @JustPyroYT
    @JustPyroYT 9 months ago

    How's the Lego island decompilation doing?

  • @johnsmith34
    @johnsmith34 9 months ago

    Another thing to note is that your site doesn't have a robots.txt file. I can't say if it matters though.

  • @CaptainGibbons

    @CaptainGibbons

    9 months ago

    The screenshot he showed said they didn't respect it anyways.
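
For reference, this is roughly what a robots.txt rule aimed at this crawler would express. The Python sketch below checks it with the standard library's `urllib.robotparser`. The file contents and the `example.com` URLs are assumptions made for illustration, and robots.txt is only advisory: as the reply above notes, a crawler that ignores the file is not stopped by it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out Bytespider, allow everyone else.
ROBOTS_TXT = """\
User-agent: Bytespider
Disallow: /

User-agent: *
Allow: /
"""


def build_parser(text):
    """Parse robots.txt content that is already in memory."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# What a well-behaved crawler would conclude before fetching a page.
bytespider_allowed = parser.can_fetch("Bytespider", "https://example.com/forum/")
other_bot_allowed = parser.can_fetch("SomeOtherBot", "https://example.com/forum/")

if __name__ == "__main__":
    print("Bytespider allowed:", bytespider_allowed)
    print("Other bot allowed:", other_bot_allowed)
```

Because compliance is voluntary, a rule like this is best treated as a polite first request, with server-side or CDN-level blocking as the actual enforcement.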

  • @yukimoe
    @yukimoe 9 months ago

    I remember it happened to me years ago with another one of these Chinese crawlers, I think it was Yandex or something. Those guys never learn.

  • @mos6581com
    @mos6581com 9 months ago

    These guys are a pain in the ass to block. You can't even just blackhole the entire ByteDance IP range, because the gits operate the crawler from other AS numbers. They're constantly in my home server's logs.
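
To make the range-blocking idea concrete, here is a small Python sketch that tests whether an address falls inside a list of blocked CIDR ranges. The ranges below are documentation-only placeholders, not real ByteDance or cloud-provider ranges, and `is_blocked` is a hypothetical helper. As the comment above points out, a list like this only helps if it covers every network the crawler actually uses.

```python
import ipaddress

# Placeholder ranges reserved for documentation, standing in for a real blocklist.
BLOCKED_NETWORKS = [
    ipaddress.ip_network(cidr) for cidr in ("192.0.2.0/24", "198.51.100.0/25")
]


def is_blocked(address, networks=BLOCKED_NETWORKS):
    """Return True if the address lies inside any blocked network."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in networks)


if __name__ == "__main__":
    for candidate in ("192.0.2.55", "198.51.100.200", "203.0.113.9"):
        print(candidate, is_blocked(candidate))
```

In practice this check belongs in the firewall or at the CDN rather than in application code, so blocked requests never reach the web server.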

  • @Geomedge
    @Geomedge 9 months ago

    New Matt KC video 🎉

  • @pdlbackup
    @pdlbackup 9 months ago

    I love this shorter type of informational video! It took me a bit to notice the lack of music, which might be why something felt off to me. Also not seeing you in the video as much as usual felt different. Totally not against you experimenting with it though, cause I can imagine that it would save some time on making the video and for me I don't think the video really suffered too much from it.

  • @Space_Reptile
    @Space_Reptile 9 months ago

    It seems to be grabbing every single image file on your forum. Block that thing ASAP.

  • @MirrorsEdgeGamer01
    @MirrorsEdgeGamer01 9 months ago

    I just saw a webpage about them launching a search engine in China.

  • @Justinjaro
    @Justinjaro 9 months ago

    I got hit by the same thing the past two weeks. My servers were getting every port and route scanned and pinged every half second... like damn. Spending at least 20-30 mins a day adding IPs to the block list.

  • @justaneric

    @justaneric

    9 months ago

    Interesting how there can be that many (IP addresses/computers) scanning your website, TikTok really needs to stop this and fix their crawler.

  • @chrisakaschulbus4903
    @chrisakaschulbus4903 9 months ago

    I was a webmaster once. Then i realised that i want to get older than 30 and stopped.

  • @brycem8161

    @brycem8161

    9 months ago

    That bad?

  • @ThePenisMan
    @ThePenisMan 9 months ago

    So a tech company with way too many resources incompetently played with tech and now everyone else has to pay for it

  • @Daniel-hz6pt
    @Daniel-hz6pt 4 months ago

    They’re almost certainly collecting mass training data for a new AI model

  • @7isAnOddNumber
    @7isAnOddNumber 9 months ago

    Oh hey it’s the Lego island guy

  • @Rainmotorsports
    @Rainmotorsports 9 months ago

    Didn't see the email contents on mobile, but if your provider wouldn't spin the VPS up with the external IP blocked so you could access it through a virtual console, I'd probably ditch them lol.

  • @burp2019

    @burp2019

    9 months ago

    the VPS provider likely wouldn't know what was going on and he only got to it after they locked it

  • @erikkonstas

    @erikkonstas

    9 months ago

    That could very well open it up to abuse tho... no, the client wouldn't earn anything from the abuse, but if the client is evil-minded and delusional they can wreak havoc like that.

  • @Rainmotorsports

    @Rainmotorsports

    9 months ago

    @@burp2019 You aren't saying anything against this though. Spinning the server up with no connection to the outside world allows the customer to access their logs. Virtual console is a method to replace the crash cart you would use if you were inside the data center.

  • @Rainmotorsports

    @Rainmotorsports

    9 months ago

    @@erikkonstas How? All you are allowing a customer to do is see their logs and make config changes before deciding what to do. Selling them more bandwidth first is in poor faith and might not last long enough to solve the issue. With absolutely no connection to the outside world except a virtualized KB/VGA (which, by the way, is so much worse than using an SSH client), there isn't much you can do. You won't be able to install software that's not on the machine, and you won't be able to back up and retrieve your files. You can enter text and take screenshots, that's about it.

  • @erikkonstas

    @erikkonstas

    9 months ago

    @@Rainmotorsports Is that actually very common...??? I was thinking the SSH or similar way, where you could just have another VPS with your credentials stuck to it, but which is open to the whole world, and totally not what is intended to be allowed.

  • @RhodderzX
    @RhodderzX 9 months ago

    Had a few run-ins with this as well. It even ignored robots.txt the majority of the time, so I added a few rules on CF to just drop it.

  • @muffinking3149
    @muffinking3149 9 months ago

    i call this a dub

  • @bosch5303
    @bosch5303 9 months ago

    Yoo new mattkc viv

  • @Sharan25
    @Sharan25 9 months ago

    Matt KC is back fr

  • @donatj
    @donatj 9 months ago

    Have you sent an email to the feedback email address in the user agent string?

  • @JoshuaPeisach
    @JoshuaPeisach 9 months ago

    Once again, screw TikTok

  • @Ampd-647

    @Ampd-647

    9 months ago

    yt shorts is shit

  • @DrakkarCalethiel

    @DrakkarCalethiel

    9 months ago

    Would love a global ban of that CCP spyware that degenerates humanity. But we all know that this will never happen...

  • @Jayenkai
    @Jayenkai 9 months ago

    Yep, I had to block them last week. It's an evil little runt.

  • @SoLemerald
    @SoLemerald 9 months ago

    It's funny that he made the Spider-Man reference, because it uses the Tobey Maguire Spider-Man font.

  • @MarcoGPUtuber
    @MarcoGPUtuber 9 months ago

    Never watched Tiktok. Never will. Definitely will not now.

  • @Zair_Ahmed_1313

    @Zair_Ahmed_1313

    9 months ago

    Same

  • @griffonboi

    @griffonboi

    9 months ago

    Your loss, I guess. I watch lots of tech-related content on there.

  • @MarcoGPUtuber

    @MarcoGPUtuber

    9 months ago

    ​@@griffonboi Strange. I don't feel any loss.

  • @Iaotle

    @Iaotle

    9 months ago

    not sure how the company having a crawler makes you more confident about not watching an unrelated shorts platform

  • @joshfromsmosh3352d

    @joshfromsmosh3352d

    9 months ago

    @@Iaotle Why do they even need one in the first place, then?

  • @paulinet68
    @paulinet68 9 months ago

    not people commenting on the video before even watching it assuming this is about people who publish content on tiktok and immediately jumping the gun, noo, that could never happen to a video that's actually about a web crawler

  • @Junimeek

    @Junimeek

    9 months ago

    considering that tiktok creators are infamous for committing intentionally malicious acts completely unrestricted, i think that makes perfect sense personally

  • @monkeypox21
    @monkeypox21 9 months ago

    NEW MATTKC FINALLY

  • @bananapl0
    @bananapl0 9 months ago

    The reveal on stream was epic.
