I Made This Open-Source Project

Science and Technology

After MONTHS, I finally made another open-source project. This one was a ton of fun to build and I hope to turn this into an API we can all benefit from with any user-generated data on our web apps.
-- links
website: www.profanity.dev/
github (leave a ⭐ pls thx): github.com/joschan21/profanit...
I'll post a complete build on this API on my second channel (linked below) soon!
-- my links
second channel (in depth videos): / @joshtriedupstash
newsletter: www.joshtriedcoding.com/
discord: / discord
github: github.com/joschan21

Comments: 221

  • @phsopher
    23 days ago

    Disappointed. I thought it was gonna be an API that serves profanity.

  • @ShadowOcto

    23 days ago

    fr 😢

  • @wlockuz4467

    23 days ago

    Ferb, I know what we're building today!

  • @unbiasedperson1155

    23 days ago

    Okay, let's build an open-source profanity maker that bypasses this API's check. 😺

  • @anhdunghisinh

    23 days ago

    @@unbiasedperson1155 that's a great idea

  • @akam9919

    23 days ago

    @@anhdunghisinh YEAH! F PROFANITY FILTERS!

  • @ChristianKolbow
    23 days ago

    funny but ...
    "You son of a mother" - profanity
    "fucking awesome" - profanity
    "damn, that's great" - profanity

  • @rxn7

    12 days ago

    well, "fucking awesome" is in fact profane

  • @visu7135

    10 days ago

    "see you" is profanity :) the API sucks tbh

  • @albert_ac1045

    10 days ago

    that is why he implemented the score system, I think... but it is open source; if you want, you can modify it or see how he built it... btw, "fucking awesome" makes sense, "damn" also, and depending on the context, "you son of a mother" too... XD

  • @CornerKingsReal

    8 days ago

    those are profanities though

  • @smithrockford-dv1nb

    8 days ago

    @@visu7135 It's too short to be accurate...

  • @luckysolanki9440
    23 days ago

    Google's content moderation API is the best, as it gives a separate score for each field (insult, toxicity, etc.) accurately, doesn't take much time, and is also free

  • @gregthomas5887
    23 days ago

    I typed "Son of a mother" and it responded with profanity detected

  • @viriv

    23 days ago

    lmaoo

  • @_the_mohamed

    23 days ago

    I tried "No need to waste more oxygen, just do it

  • @elvis_gastelum

    23 days ago

    That’s the beauty of open source: now more people can contribute to fix these edge cases, in theory, right?

  • @nirajkhatiwada6696

    23 days ago

    I typed "daughter of a father" and it says "Crispy clean input, no profanities" . LMAO!

  • @elry-tyrogames

    23 days ago

    @@elvis_gastelum Why work on a half assed not working project tho ?

  • @oskarsmusic865
    23 days ago

    I typed "I fucking love pizza" and it responded "OH GOD, VERY BIG PROFANITY DETECTED!!! "

  • @ValipPowa

    12 days ago

    fucking is profanity

  • @thatonecoder737
    23 days ago

    🚨🚨😱😱 OH GOD, VERY BIG PROFANITY DETECTED!! 🚨🚨😱😱 score (higher is worse): 1.000 and I typed "mosquitos suck blood" lol

  • @pastori2672

    23 days ago

    acoustic model

  • @yichenchong7728

    22 days ago

    "suck" is a banned word if you look at his training data

  • @ilonachan

    5 days ago

    ​@@yichenchong7728 except it's also a normal word that's fine to use in official conversation when the concept comes up. So putting it in a blacklist is objectively incorrect. But hey, it's the best one can do with a system that can't understand context, which is why it's not worth trying to make such a system.

  • @gabrielesilinic
    23 days ago

    Btw, consider choosing a license. Technically this is not really open source yet; you just uploaded the code on the web and hoped for the best. In case you want to keep it simple, there is the BSD license or the MIT license, which is very short. In case you want something more solid, you may want to choose the Apache license, which is not that different from MIT but has a bunch of legalese to protect your ass from patent trolls and contributors with malicious intent. Then there are also copyleft open source licenses like the GPL, though I am not a fan of those; it is not my idea of freedom.

  • @chrislgr23

    22 days ago

    chill out harvey specter

  • @ativerc

    22 days ago

    Is there a website for me to quickly read about and select Licenses?

  • @gabrielesilinic

    22 days ago

    @@ativerc so, YouTube is very big brain, so it removed my comment where I was trying to help you because it was a URL. Anyway, there is choosealicense, a website made by GitHub. Also, whenever you add a file from the GitHub UI and its name contains the word "license", GitHub will offer you a license picker. For more complex commercial scenarios, in case you are a business, there is also a specific source-available license that lets your software convert to open source after a set amount of time from publication: the Functional Source License. But most people get by with open source licenses. Generally, if you are unsure, just make coffee and read them.

  • @gnsf

    22 days ago

    @@ativerc from GitHub there is "choose a license" which you may search up

  • @davepeace603

    19 days ago

    oh damn.. really? isnt it open source if like you said he just uploaded the code on the internet?

  • @yichenchong7728
    22 days ago

    the type 1 error (false positives) on this tool makes it kind of unusable. my favorite perfectly normal prompts that get detected as profanity:
    - "double slit experiment"
    - "single pen" / "pen test"
    - "toxic person"
    - "Abbie Lee" (possible person name)
    - "garden hoe"
    - "what a jerk" (i suppose some people might think this is profane)
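
The false-positive complaint above can be made measurable. A minimal TypeScript sketch, assuming a hypothetical `Classifier` function type and a deliberately naive substring classifier as a stand-in (neither is the project's real code): it computes the share of known-clean phrases that get flagged.

```typescript
// Hypothetical classifier signature: returns true when text is flagged.
type Classifier = (text: string) => boolean;

// Share of known-clean samples that the classifier wrongly flags.
function falsePositiveRate(cleanSamples: string[], classify: Classifier): number {
  if (cleanSamples.length === 0) {
    return 0;
  }
  const flagged = cleanSamples.filter((sample) => classify(sample)).length;
  return flagged / cleanSamples.length;
}

// Clean phrases taken from the comment above.
const cleanSamples: string[] = [
  "double slit experiment",
  "pen test",
  "toxic person",
  "garden hoe",
];

// Naive stand-in: flags any text containing the substring "hoe".
const naiveClassifier: Classifier = (text) =>
  text.toLowerCase().indexOf("hoe") !== -1;

const rate = falsePositiveRate(cleanSamples, naiveClassifier);
console.log(rate); // 0.25 (only "garden hoe" is flagged)
```

Swapping the stand-in for a call to a real checker would turn the list of reported failures into a repeatable regression check.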

  • @devinlauderdale9635
    23 days ago

    Josh, can you make a video about how to train a tensor model?

  • @lee.g.v

    23 days ago

    This

  • @Totomenu

    23 days ago

    yes please

  • @IvyCreamMathieu
    23 days ago

    A fucking great project

  • @ashishsharma__

    23 days ago

    Profanity DETECTED (score 99999) 😂😂

  • @Fullflexno
    22 days ago

    Supercool project, Cheers from Norway!

  • @NithinJune
    12 days ago

    using vector embeddings is actually so creative i love it

  • @xav_624
    23 days ago

    It would be awesome to see some content on how you trained your model (costs, services..etc.). I'm looking for that kind of content.

  • @shubhankartrivedi
    23 days ago

    Holy moly bro, I needed this very badly!

  • @roberth8737
    23 days ago

    Interesting concept - similar to Semantic router. A combination approach that filters for single-word profanities and vector similarity for longer sentences that pass the single-word filter would absolutely be a "good enough approach" for most profanity detection use cases.
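
A minimal TypeScript sketch of the combination described above, under stated assumptions: `SimilarityScorer` is a hypothetical stand-in for whatever vector-similarity lookup the API performs, and `checkText`, the word list, and the threshold value are illustrative, not the project's code.

```typescript
// Hypothetical stand-in for an embedding-similarity lookup (0 = clean, 1 = profane).
type SimilarityScorer = (text: string) => number;

interface CheckResult {
  isProfane: boolean;
  stage: "word-list" | "similarity" | "none";
  score: number;
}

function checkText(
  text: string,
  blockedWords: string[],
  scorer: SimilarityScorer,
  threshold: number
): CheckResult {
  // Stage 1: cheap, deterministic single-word lookup.
  const tokens = text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((token) => token.length > 0);
  const hasBlockedWord = tokens.some(
    (token) => blockedWords.indexOf(token) !== -1
  );
  if (hasBlockedWord) {
    return { isProfane: true, stage: "word-list", score: 1 };
  }

  // Stage 2: only text that passed the word filter reaches the scorer.
  const score = scorer(text);
  if (score >= threshold) {
    return { isProfane: true, stage: "similarity", score: score };
  }
  return { isProfane: false, stage: "none", score: score };
}

// Usage with a stub scorer; "badword" is a placeholder entry.
const stubScorer: SimilarityScorer = () => 0.5;
const result = checkText("what a badword day", ["badword"], stubScorer, 0.86);
console.log(result.stage); // "word-list"
```

Injecting the scorer as a function keeps the word-list stage testable without a vector database.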

  • @prajwalaradhya4379
    23 days ago

    It would be useful to know which words are profane: the API response could give a list of words, or the start and end index of each word, so that client-side apps can replace them with * or something similar.
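
A minimal TypeScript sketch of what such a response could enable, assuming a hypothetical `Match` shape with `start` and `end` offsets (the real API is not known to return this) and a placeholder word list:

```typescript
// Hypothetical response entry: a flagged word and its [start, end) offsets.
interface Match {
  word: string;
  start: number;
  end: number;
}

// Finds whole words from the block list and records their positions.
function findMatches(text: string, blockedWords: string[]): Match[] {
  const matches: Match[] = [];
  const wordPattern = /[A-Za-z0-9']+/g;
  let found: RegExpExecArray | null = wordPattern.exec(text);
  while (found !== null) {
    const word = found[0];
    if (blockedWords.indexOf(word.toLowerCase()) !== -1) {
      matches.push({ word: word, start: found.index, end: found.index + word.length });
    }
    found = wordPattern.exec(text);
  }
  return matches;
}

// Replaces each matched span with asterisks of the same length,
// so the remaining offsets stay valid while masking.
function maskMatches(text: string, matches: Match[]): string {
  let output = text;
  matches.forEach((match) => {
    const stars = new Array(match.end - match.start + 1).join("*");
    output = output.slice(0, match.start) + stars + output.slice(match.end);
  });
  return output;
}

const sample = "this darn thing";
const sampleMatches = findMatches(sample, ["darn"]);
console.log(maskMatches(sample, sampleMatches)); // "this **** thing"
```

Matching whole words rather than substrings avoids masking clean words that merely contain a blocked one.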

  • @nro337
    19 days ago

    congrats on the launch!

  • @blockwhisperers8352
    3 days ago

    I think if you combined the ml model with a word list approach you could improve the accuracy. Basically give the ML output but then look in the blacklist and whitelist to see if that changes the outcome. Best of both worlds. This will also solve the single word issues you had.

  • @SiddharthSharma-ei8os
    23 days ago

    Great Project

  • @bkschatzki
    23 days ago

    Worth looking at how other languages would be handled as well. Saw a PR adding some words from Spanish and I had planned to add some Chinese and Thai, but I saw an issue open about the potential of adding a langs parameter so that clean words and phrases in one language don't trigger the filter in another.

  • @godofwar8262
    23 days ago

    Make a video on the minimum standards an open-source project should have for better reach and scalability

  • @adiswa123
    23 days ago

    Curious why you chose to use Upstash Vector db vs Cloudflare's Vectorize? Especially since you're using cloudflare's stack for hosting

  • @v1d300
    23 days ago

    I am working on a similar problem of finding similarity between two sentences; they need not use the exact same words, just similar ones. And I was baffled that there is such a simple solution to this. Thanks for this, I will now look into vector databases.

  • @asmet2701
    20 days ago

    Hi, I wanna add an e-commerce store app to my portfolio. I wonder which React stack is solid for it in 2024. Can someone suggest something? As a backend I would prefer Firebase, and for styling SCSS + MUI, but I need recommendations about a state manager and other technologies and tools. Thanks!

  • @parkerrex
    22 days ago

    Fantastic video Josh

  • @m4rt_
    9 days ago

    Does it filter out ones from other languages? Does it filter out ones with typos? How many normal messages will be considered profanity and will be filtered? Why did you write it in JavaScript/TypeScript? it will be way faster and less error prone if you switch over to a statically compiled language.

  • @Manofthebean
    14 days ago

    im working on a review website right now and i could use this to flag reviews and put a mature rating on it or something. this is amazing. great job

  • @PrismFave

    10 days ago

    doesnt work so well, easily bypassable
    what i type: "you are so SHlT lol"
    Crispy clean input, no profanities :)) 👍👍
    score (higher is worse): 0.801

  • @PrismFave

    10 days ago

    "this review website is so A55"
    Crispy clean input, no profanities :)) 👍👍
    score (higher is worse): 0.784

  • @Manofthebean

    10 days ago

    @@PrismFave dam, I haven't tested it out yet so i dont know, but looking at the git, yeah, im gonna wait until it gets better

  • @kaustubhpatange
    23 days ago

    Could've used the text-embedding-large model, which could've packed more information into your embeddings due to its larger dimension and would've improved your accuracy even on large numbers of tokens.

  • @xMrAfonso
    13 days ago

    I wonder if there is some type of list of tests people have made with fails? Would love to see the edge cases.

  • @gosnooky
    23 days ago

    There should be some internationalization context added. One of the biggest coffee shops in Vietnam (where I spend time) is Phúc Long. Testing with the string "my favorite coffee shop is phuc long" raises a score of 1.000! Also curious as to why the range is so small - seems it starts at 0.8?

  • @user-he3io6lo9t
    23 days ago

    Exciting! What about different languages? Auto-detect the language? Explicitly set it? One model for all, or a separate model for each language? So many questions 🤣

  • @SpektRProduction
    23 days ago

    The value of the resource is not very clear, since I can’t paste the whole article (the text is too big) and I can’t understand where exactly the profanity is located

  • @prasanthpedaprolu2261
    23 days ago

    maybe training on Twitter tweets could make this model perform well

  • @Thomas777m1
    11 days ago

    For the very short texts why don't you just pad out the input text with neutral words?

  • @practicaluseof
    23 days ago

    Very nice, what software are you using to make your videos? To share your screen and show your face at the same time?

  • @TellTobler
    22 days ago

    Would be awesome if you could make a tutorial why you use Hono over Express :) for your api

  • @davidsiewert8649
    21 days ago

    @joshtriedcoding why do you still use yarn in 2024? Either pnpm or bun is better in every category

  • @blockshift758

    11 days ago

    New doesn't equal better.

  • @mjddev
    12 days ago

    Important to note that although the source is viewable on GitHub, this is not currently classed as "Open Source" software, as it lacks a license. See issue #6 on the GitHub repo.

  • @paullouppe9947
    23 days ago

    Does it work only for English? Would you be interested in opening it up to other languages?

  • @MateuszWierzejski

    3 days ago

    It seems to only work for English; swear words in other languages (like Polish) weren’t flagged as profanity

  • @enic-ma
    23 days ago

    Everybody is scared of YouTube demonetization! Just chill and keep crushing it!

  • @herrkatzegaming
    11 days ago

    it doesnt detect profanity in german

  • @Axorax
    21 days ago

    Cool project 👍

  • @anasouardini
    23 days ago

    Let's goooo!

  • @haryormedayjoshua281
    23 days ago

    Does anyone know what app he's using to switch apps in the left sidebar? I think Theo also uses it

  • @petersusan215

    23 days ago

    Arc Web browser

  • @igmtink
    23 days ago

    sir josh can you make a tutorial how to use rpc of hono with next

  • @joshuarodriguez2219
    23 days ago

    Ey, what framework did you use to design the website? I love it

  • @joshuarodriguez2219

    23 days ago

    follow up what do you use to record your videos?

  • @arshgemrie4621
    12 days ago

    A question what is your browser

  • @bed_destroyed
    12 days ago

    i got pretty sure this is profanity on: THIS IS VERY PROFANE

  • @BrightCode
    23 days ago

    Can we do one for images too?

  • @lilrow4206
    4 days ago

    "This doesn't use AI, just a machine learning model"

  • @NiklasZiermann
    23 days ago

    Insert 'YouTube would like to connect to your API' jokes here

  • @lel7531
    16 days ago

    Basically the score goes from 0.810 to 0.880, so it seems like there's not a lot of margin for error given "clean input" is 0.840, and limiting the content size drastically reduces its usefulness. After a bit of testing, it seems your product is definitely not ready; you should update your landing page, as it is not reliable at all.

  • @armandmalci495
    23 days ago

    Does anyone know what is the app he is using to draw the schemas (min 1:00)?

  • @Shorts4D

    23 days ago

    tldraw

  • @koudy008

    23 days ago

    It's Excalidraw

  • @zakariazain8790
    23 days ago

    Thank you

  • @cablesalty
    10 days ago

    Great now I will make a version that creates profanity

  • @_purple_44_
    4 days ago

    Can it be made to respond which word is profane as well? So that i can just *** it

  • @Michael-Martell
    23 days ago

    Cool man!

  • @ovna
    23 days ago

    👍 Useful

  • @LRSKWTKWSK
    23 days ago

    Love it

  • @AmodeusR
    23 days ago

    That profanity score is very weird. Why is the score always around 0.8? Why not use the range from 0 to 1?

  • @BoxEnjoyer
    12 days ago

    Holy moly gets 0.912 🚨😱 BIG PROFANITY DETECTED!! 🚨😱

  • @sal00
    12 days ago

    wow wow - 🚨 PROFANITY DETECTED!! 🚨

  • @scarlatum
    21 days ago

    Well, it drops when the message is larger than ~750 chars due to the execution time limit. Tokenization makes BOOM

  • @ronitgurjar5747
    23 days ago

    great work Josha🔥🔥🫡

  • @ihsanmohamad521
    23 days ago

    f@#k!ng great project!

  • @wenelol
    11 days ago

    Typed meow meow and the rating was: 😱 PRETTY SURE THIS IS A PROFANITY 😱 score (higher is worse): 0.865

  • @670839245
    5 days ago

    "what the hell" (0.966) or "what the heck" (0.912) both return profanity. Even if we use the totally safe version of this phrase, "what in the world", it's still profanity (0.859). Then how are we supposed to express that idea? On the other hand, "I hate this [blank] taco" returns clean for "flipping", "frigging" and "freaking", all of which are lesser versions of the F bomb.

  • @Renner4k
    10 days ago

    Cool idea but it's super impractical and easy to bypass. Needs some more work because simply chaining 2 swear words together without a space can usually bypass it.

  • @snatvb
    23 days ago

    this is a really good project. actually you can use it not only for profanity: you could detect ads, spam, scams, etc., couldn't you?

  • @evan_ry
    23 days ago

    tensor model < bunch of ifs

  • @kapa9436
    23 days ago

    It's like semantic search

  • @user-vk6cb1zu7p
    23 days ago

    I typed "you are very sexy" and it responded with: Crispy clean input, no profanities :))

  • @user-vk6cb1zu7p

    23 days ago

    it's insane!!

  • @Erik-pk8rw
    14 days ago

    Maybe add something to convert Unicode look-alikes, because those won't get detected
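
A minimal TypeScript sketch of that idea, assuming a small hand-written map of Cyrillic letters that resemble Latin ones. The map and the name `foldLookalikes` are illustrative only; a production list would be far larger, for example one derived from the Unicode confusables data.

```typescript
// Illustrative subset: Cyrillic code points that resemble Latin letters.
const LOOKALIKES: Record<string, string> = {
  "\u0430": "a", // Cyrillic small a
  "\u0435": "e", // Cyrillic small ie
  "\u043e": "o", // Cyrillic small o
  "\u0440": "p", // Cyrillic small er
  "\u0441": "c", // Cyrillic small es
  "\u0445": "x", // Cyrillic small ha
  "\u0456": "i", // Cyrillic small Byelorussian-Ukrainian i
};

// Replaces each mapped look-alike with its Latin counterpart and
// leaves every other character untouched.
function foldLookalikes(input: string): string {
  let output = "";
  for (let i = 0; i < input.length; i++) {
    const ch = input.charAt(i);
    const isMapped = Object.prototype.hasOwnProperty.call(LOOKALIKES, ch);
    output += isMapped ? LOOKALIKES[ch] : ch;
  }
  return output;
}

// "h\u0435ll\u043e" renders like "hello" but contains two Cyrillic letters.
console.log(foldLookalikes("h\u0435ll\u043e") === "hello"); // true
```

Running this fold before any word-list or similarity check means the checker sees the Latin spelling rather than the disguised one.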

  • @DS-ow2ge
    23 days ago

    Josh, by design this system is fastest when there is profanity, and slowest when there is none. Is it even possible to design one with the opposite? fastest when no profanity, and slowest when there is?

  • @rorymax

    23 days ago

    Well if you think about it, to declare something as profane you need to find only 1 profanity. However to declare something as clean you need to make sure there are no profanities at all. So in one case you stop when you find a profanity, but in the other case you have to check the whole thing
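
The asymmetry in this reply can be shown with a minimal TypeScript sketch. It assumes a simple word-by-word scan against a placeholder block list, which is an illustration of the reasoning and not how the API itself works.

```typescript
interface ScanResult {
  profane: boolean;
  wordsChecked: number;
}

// Scans words in order and stops at the first blocked word.
function scanWords(words: string[], blocked: string[]): ScanResult {
  let checked = 0;
  for (let i = 0; i < words.length; i++) {
    checked++;
    if (blocked.indexOf(words[i]) !== -1) {
      // One hit is enough to call the input profane.
      return { profane: true, wordsChecked: checked };
    }
  }
  // Only after checking every word can the input be called clean.
  return { profane: false, wordsChecked: checked };
}

const blockedList = ["badword"]; // placeholder entry
console.log(scanWords(["badword", "a", "b", "c"], blockedList).wordsChecked); // 1
console.log(scanWords(["a", "b", "c", "d"], blockedList).wordsChecked); // 4
```

The clean input always costs a full pass, which is why the "fast when clean" design asked about above is hard to get from a scan like this.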

  • @_ultraviolet
    23 days ago

    Why is it so strict? "dumb person" is apparently extremely profane

  • @depralexcrimson

    23 days ago

    because this is not production ready, it's at best a Proof of Concept. it obviously cannot detect or understand any context, it can just maybe detect bad words, that's it, it doesn't care about context at all.

  • @purpshell
    13 days ago

    heard of Akismet?

  • @PrismFave
    10 days ago

    my prompt: "you are so S.HIT at this game"
    Crispy clean input, no profanities :)) 👍👍
    score (higher is worse): 0.822

    my prompt: "you are so SHlT lol"
    Crispy clean input, no profanities :)) 👍👍
    score (higher is worse): 0.801

  • @blaizeW
    23 days ago

    cool, but what do “zip in the wire” and “zipperhead” mean? 😭

  • @spinxooo
    7 days ago

    "ì I" 😱 PRETTY SURE THIS IS A PROFANITY 😱 score (higher is worse): 0.857 LMAOO

  • @mikaay4269
    23 days ago

    One issue is internationalisation: "Ich geh nach Fucking", is a German sentence without any profanity, because "Fucking" is an actual town.

  • @coopener
    20 days ago

    The website does not work anymore, since the website uses HSTS.

  • @mrkostya008
    23 days ago

    Im sorry but why go such an extra mile if OpenAI's Moderation API is free and quite fast at that.

  • @leonardodoujinshi

    23 days ago

    I thought you could only use their API for outputs from their own model and they disallow other usage

  • @j0hnr3x
    23 days ago

    The phrases "I love doing it with my sister"(0.802) and "I want to end your life"(0.783) have lower scores than your examples of clean input. I think this needs a lot of work, only obvious profanity gets detected.

  • @TheDragonDesigns
    9 days ago

    my pen is broken - 😱 PRETTY SURE THIS IS A PROFANITY 😱
    you what - 😱 PRETTY SURE THIS IS A PROFANITY 😱
    How much have you been drinking - 😱 PRETTY SURE THIS IS A PROFANITY 😱

  • @chiroyce
    11 days ago

    dumb dumb: 🚨😱 BIG PROFANITY DETECTED!! 🚨😱 - 0.937

  • @theminecraft690
    12 days ago

    The problem is it's English only. As a German myself, I tested the famous German swear word "hu rr ensohn" and it said it's not a swear word

  • @ErrorINAOfficial
    4 days ago

    “I can’t say this word because YouTube may demonetize the *hell* out of me.”

  • @kushaagr
    23 days ago

    TL;DW It's basically AI... Heck the use of vector database puts it closer to LLM technology.

  • @centdemeern1
    6 days ago

    Isn’t it more effective to use a word list and actual logic? Isn’t that

  • @cheapbucks9590
    22 days ago

    The model needs more data

  • @xastralmars
    23 days ago

    what web browser do you use

  • @techworld3255

    23 days ago

    That's Arc Browser

  • @tedspens
    22 days ago

    So I guess Theo and The Primeagan have been demonetized a long time ago. 🤣🤣

  • @wlockuz4467
    23 days ago

    I don't really understand the scoring: there is only a 0.08 difference between something being profane and not profane? Intuitively I would think 0.0 is something normal and 1.0 is definitively profane. The threshold of 0.86 seems arbitrary. This seems like a useful tool, but I think most real projects would want something context-aware.
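
If the raw scores really do cluster in a narrow band, as several comments report, one option is to stretch that band onto 0 to 1 on the client side. A minimal TypeScript sketch, assuming hypothetical observed bounds of 0.8 and 1.0 (illustrative values, not taken from the project):

```typescript
// Min-max rescaling: maps [observedMin, observedMax] onto [0, 1] and
// clamps anything outside the observed band.
function rescaleScore(
  raw: number,
  observedMin: number,
  observedMax: number
): number {
  if (observedMax <= observedMin) {
    throw new Error("observedMax must be greater than observedMin");
  }
  const scaled = (raw - observedMin) / (observedMax - observedMin);
  return Math.min(1, Math.max(0, scaled));
}

// Assumed band for illustration only.
const ASSUMED_MIN = 0.8;
const ASSUMED_MAX = 1.0;

console.log(rescaleScore(0.7, ASSUMED_MIN, ASSUMED_MAX)); // 0 (clamped)
console.log(rescaleScore(1.0, ASSUMED_MIN, ASSUMED_MAX)); // 1
```

Rescaling only changes how the number reads; it does not make the underlying model any better at separating clean from profane input.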

  • @theaviationbee
    11 days ago

    i typed "gfasgda asfga" into the checker and it said it was profanity. might want to fine tune the model a little more. it also said "i got a new diamond hoe in minecraft, it has a lot of durability" was profanity. also might want to add context reading.

  • @nicoluvas8486
    23 days ago

    Lorem fucking ipsum

  • @TheIpicon
    23 days ago

    upstash really profit from you working there😂😂

  • @jiM3op
    23 days ago

    starred!

  • @marcuss.abildskov7175
    22 days ago

    Why would I want an API for this? There are tons of libraries that solve this.

  • @NaraSherko
    14 days ago

    Swear! Swear! Swear! gives you 😱 PRETTY SURE THIS IS A PROFANITY 😱

  • @mason8335

    5 days ago

    "Profanity is bad" = PRETTY SURE THIS IS A PROFANITY

  • @GratuityMedia
    23 days ago

    Good fucking video
