I Made This Open-Source Project

Science and technology

After MONTHS, I finally made another open-source project. This one was a ton of fun to build and I hope to turn this into an API we can all benefit from with any user-generated data on our web apps.
-- links
website: www.profanity.dev/
github (leave a ⭐ pls thx): github.com/joschan21/profanit...
I'll post a complete build on this API on my second channel (linked below) soon!
-- my links
second channel (in depth videos): / @joshtriedupstash
newsletter: www.joshtriedcoding.com/
discord: / discord
github: github.com/joschan21

Comments: 221

@phsopher 23 days ago
Disappointed. I thought it was gonna be an API that serves profanity.
@ShadowOcto
23 days ago
fr 😢
@wlockuz4467
23 days ago
Ferb, I know what we're building today!
@unbiasedperson1155
23 days ago
Okay, let's build an open-source profanity maker that bypasses this API's check. 😺
@anhdunghisinh
23 days ago
@@unbiasedperson1155 that's a great idea
@akam9919
23 days ago
@@anhdunghisinh YEAH! F PROFANITY FILTERS!
@ChristianKolbow 23 days ago
funny but ...
"You son of a mother" - profanity
"fucking awesome" - profanity
"damn, that's great" - profanity
@rxn7
12 days ago
well, "fucking awesome" is in fact profane
@visu7135
10 days ago
"see you" is profanity :) the API sucks tbh
@albert_ac1045
10 days ago
That is why he implemented the score system, I think... but it is open source; if you want, you can modify it or see how he built it... btw, "fucking awesome" makes sense, "damn" also, and depending on the context, "you son of a mother" too... XD
@CornerKingsReal
8 days ago
those are profanities though
@smithrockford-dv1nb
8 days ago
@@visu7135 It's too short to be accurate...
@luckysolanki9440 23 days ago
Google's content moderation API is the best, as it gives a separate score for each field (insulting, toxicity, etc.) accurately, doesn't take much time, and is also free.
@gregthomas5887 23 days ago
I typed "Son of a mother" and it responded with profanity detected
@viriv
23 days ago
lmaoo
@_the_mohamed
23 days ago
I tried "No need to waste more oxygen, just do it
@elvis_gastelum
23 days ago
That's the beauty of open source: now more people can contribute to fix these edge cases, in theory, right?
@nirajkhatiwada6696
23 days ago
I typed "daughter of a father" and it says "Crispy clean input, no profanities" . LMAO!
@elry-tyrogames
23 days ago
@@elvis_gastelum Why work on a half assed not working project tho ?
@oskarsmusic865 23 days ago
I typed "I fucking love pizza" and it responded "OH GOD, VERY BIG PROFANITY DETECTED!!! "
@ValipPowa
12 days ago
fucking is profanity
@thatonecoder737 23 days ago
🚨🚨😱😱 OH GOD, VERY BIG PROFANITY DETECTED!! 🚨🚨😱😱 score (higher is worse): 1.000 and I typed "mosquitos suck blood" lol
@pastori2672
23 days ago
acoustic model
@yichenchong7728
22 days ago
"suck" is a banned word if you look at his training data
@ilonachan
5 days ago
@@yichenchong7728 except it's also a normal word that's fine to use in official conversation when the concept comes up. So putting it in a blacklist is objectively incorrect. But hey, it's the best one can do with a system that can't understand context, which is why it's not worth trying to make such a system.
@gabrielesilinic 23 days ago
Btw, consider choosing a license. Technically this is not really open source yet; you just uploaded the code on the web and hoped for the best. In case you want to keep it simple, there is the BSD license or the MIT license, which is very short, but in case you want something more solid you may want to choose the Apache license, which is not that different from MIT but has a bunch of legalese to protect your ass from patent trolls and contributors with malicious intent. Then there are also copyleft open source licenses like the GPL, though I am not a fan of those; it is not my idea of freedom.
@chrislgr23
22 days ago
chill out harvey specter
@ativerc
22 days ago
Is there a website for me to quickly read about and select Licenses?
@gabrielesilinic
22 days ago
@@ativerc so, YouTube is very big brain, so it removed my comment where I was trying to help you because it was a URL. Anyway, there is choosealicense, a website made by GitHub. Also, whenever you add a file from the GitHub UI and its name contains the word "license", GitHub will offer you a license picker. For more complex commercial scenarios, in case you are a business, there is also a specific source-available license that lets your software convert to open source after a set amount of time from publication: the Functional Source License. But most people get by with open source licenses. Generally, if you are unsure, just make coffee and read them.
@gnsf
22 days ago
@@ativerc from GitHub there is "choose a license", which you may search up
@davepeace603
19 days ago
oh damn.. really? isnt it open source if like you said he just uploaded the code on the internet?
@yichenchong7728 22 days ago
The type 1 error on this tool makes it kind of unusable. My favorite perfectly normal prompts that get detected as profanity:
- "double slit experiment"
- "single pen" / "pen test"
- "toxic person"
- "Abbie Lee" (possible person name)
- "garden hoe"
- "what a jerk" (I suppose some people might think this is profane)
@devinlauderdale9635 23 days ago
Josh, can you make a video about how to train a tensor model?
@lee.g.v
23 days ago
This
@Totomenu
23 days ago
yes please
@IvyCreamMathieu 23 days ago
A fucking great project
@ashishsharma__
23 days ago
Profanity DETECTED (score 99999) 😂😂
@Fullflexno 22 days ago
Supercool project, Cheers from Norway!
@NithinJune 12 days ago
using vector embeddings is actually so creative i love it
@xav_624 23 days ago
It would be awesome to see some content on how you trained your model (costs, services..etc.). I'm looking for that kind of content.
@shubhankartrivedi 23 days ago
Holy moly bro, I needed this very badly!
@roberth8737 23 days ago
Interesting concept - similar to Semantic router. A combination approach that filters for single-word profanities and vector similarity for longer sentences that pass the single-word filter would absolutely be a "good enough approach" for most profanity detection use cases.
@prajwalaradhya4379 23 days ago
It would be useful to know which words are profane: the API response could give a list of words, or the start and end index of each word, so that client-side apps can replace them with * or something similar.
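The suggestion above could be sketched roughly as follows. This is a minimal illustration, not the project's API: `FLAGGED_WORDS`, `find_flagged_spans` and `mask` are hypothetical names, and it assumes a plain word list rather than the embedding-based scoring the project uses.

```python
import re
from typing import List, Tuple

# Hypothetical word list for illustration only; not the project's data.
FLAGGED_WORDS = {"damn", "hell"}

def find_flagged_spans(text: str) -> List[Tuple[int, int, str]]:
    """Return (start, end, word) for each flagged whole word; end is exclusive."""
    spans = []
    for match in re.finditer(r"[A-Za-z']+", text):
        if match.group().lower() in FLAGGED_WORDS:
            spans.append((match.start(), match.end(), match.group()))
    return spans

def mask(text: str, spans: List[Tuple[int, int, str]]) -> str:
    """Replace every reported span with asterisks of the same length."""
    chars = list(text)
    for start, end, _ in spans:
        chars[start:end] = "*" * (end - start)
    return "".join(chars)

text = "damn, that's great"
spans = find_flagged_spans(text)
masked = mask(text, spans)
```

Returning offsets instead of a pre-masked string leaves the choice of replacement character to the client.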
@nro337 19 days ago
congrats on the launch!
@blockwhisperers835 23 days ago
I think if you combined the ml model with a word list approach you could improve the accuracy. Basically give the ML output but then look in the blacklist and whitelist to see if that changes the outcome. Best of both worlds. This will also solve the single word issues you had.
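A rough sketch of that combination, under stated assumptions: `BLOCKLIST`, `ALLOWLIST` and `is_profane` are hypothetical names, the model score is simply passed in as a number, and the 0.86 default is only the threshold value mentioned elsewhere in these comments.

```python
import re

# Hypothetical lists for illustration; not taken from the project's data.
BLOCKLIST = {"damn"}
ALLOWLIST = {"garden hoe", "pen test"}

def is_profane(text: str, model_score: float, threshold: float = 0.86) -> bool:
    """Let explicit lists override the model score; otherwise apply the threshold."""
    normalized = " ".join(re.findall(r"[a-z']+", text.lower()))
    if normalized in ALLOWLIST:
        return False  # known-clean phrase wins over the model
    if any(word in BLOCKLIST for word in normalized.split()):
        return True   # known-bad word wins over the model
    return model_score >= threshold
```

The allowlist is checked first here so that known false positives stay clean even when the model score is high; a real deployment might order the checks differently.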
@SiddharthSharma-ei8os 23 days ago
Great Project
@bkschatzki 23 days ago
Worth looking at how other languages would be handled as well. Saw a PR adding some words from Spanish and I had planned to add some Chinese and Thai, but I saw an issue open about the potential of adding a langs parameter so that clean words and phrases in one language don't trigger the filter in another.
@godofwar8262 23 days ago
Make a video on the minimum standards an open-source project should have for better reach and scalability
@adiswa123 23 days ago
Curious why you chose to use Upstash Vector db vs Cloudflare's Vectorize? Especially since you're using cloudflare's stack for hosting
@v1d300 23 days ago
I am working on a similar problem of finding similarity between two sentences; they need not be exact, but use similar words. And I was baffled that there is such a simple solution to this. Thanks for this, I will now look into vector databases.
@asmet2701 20 days ago
Hi I wanna add an e-commerce store app for my portfolio. I wonder which react stack is solid for it in 2024. Can someone suggest something? As a back I would prefer Firebase, also for styling scss+mui but need recommendations about state manager and other technologies and tools. Thanks!
@parkerrex 22 days ago
Fantastic video Josh
@m4rt_ 9 days ago
Does it filter out ones from other languages? Does it filter out ones with typos? How many normal messages will be considered profanity and will be filtered? Why did you write it in JavaScript/TypeScript? it will be way faster and less error prone if you switch over to a statically compiled language.
@Manofthebean 14 days ago
im working on a review website right now and i could use this to flag reviews and put a mature rating on it or something. this is amazing. great job
@PrismFave
10 days ago
doesnt work so well, easily bypassable. what i type: "you are so SHlT lol"
Crispy clean input, no profanities :)) 👍👍 score (higher is worse): 0.801
@PrismFave
10 days ago
this review website is so A55
Crispy clean input, no profanities :)) 👍👍 score (higher is worse): 0.784
@Manofthebean
10 days ago
@@PrismFave dam I haven't tested it out yet so i dont know but looking on the git yeah im gonna wait until it gets better
@kaustubhpatange 23 days ago
Could've used the text-embedding-large model, which could've packed more information into your embeddings due to its larger dimension, which would've improved your accuracy even on a large number of tokens.
@xMrAfonso 13 days ago
I wonder if there is some type of list of tests people have made with fails? Would love to see the edge cases.
@gosnooky 23 days ago
There should be some internationalization context added. One of the biggest coffee shops in Vietnam (where I spend time) is Phúc Long. Testing with the string "my favorite coffee shop is phuc long" raises a score of 1.000! Also curious as to why the range is so small - seems it starts at 0.8?
@user-he3io6lo9t 23 days ago
Exciting! What about different languages? Auto-detect the language? Explicitly set it? One model for all, or a separate model for each language? So many questions🤣
@SpektRProduction 23 days ago
The value of the resource is not very clear, since I can’t paste the whole article (the text is too big) and I can’t understand where exactly the profanity is located
@prasanthpedaprolu2261 23 days ago
Maybe training on Twitter tweets can make this model perform well
@Thomas777m1 11 days ago
For the very short texts why don't you just pad out the input text with neutral words?
@practicaluseof 23 days ago
Very nice, what softwares are you using to make your videos? Share screen and show your face at the same time?
@TellTobler 22 days ago
Would be awesome if you could make a tutorial why you use Hono over Express :) for your api
@davidsiewert8649 21 days ago
@joshtriedcoding why do you still use yarn in 2024? Either pnpm or bun is better in every category
@blockshift758
11 days ago
New doesn't equal better.
@mjddev 12 days ago
Important to note that although the source is viewable on GitHub, this is not currently classed as "Open Source" software, as it lacks a license. See issue #6 on the GitHub repo.
@paullouppe9947 23 days ago
Does it work only for english ? would you be interested to open it to other languages ?
@MateuszWierzejski
3 days ago
It seems so to only work for English as foreign languages (like polish) didn’t flag these swear words as profanity
@enic-ma 23 days ago
Everybody is scared of YouTube demonetization! Just chill and keep crushing it!
@herrkatzegaming 11 days ago
it doesnt detect profanity in german
@Axorax 21 days ago
Cool project 👍
@anasouardini 23 days ago
Let's goooo!
@haryormedayjoshua281 23 days ago
Does anyone know what APP he's using to switch app on the left sidebar? I think Theo also use it
@petersusan215
23 days ago
Arc Web browser
@igmtink 23 days ago
sir josh can you make a tutorial how to use rpc of hono with next
@joshuarodriguez2219 23 days ago
Ey, what framework did you use to design the website? I love it
@joshuarodriguez2219
23 days ago
follow up what do you use to record your videos?
@arshgemrie4621 12 days ago
A question what is your browser
@bed_destroyed 12 days ago
i got pretty sure this is profanity on: THIS IS VERY PROFANE
@BrightCode 23 days ago
Can we do one for images too?
@lilrow4206 4 days ago
"This doesn't use AI, just a machine learning model"
@NiklasZiermann 23 days ago
Insert 'YouTube would like to connect to your API' jokes here
@lel7531 16 days ago
Basically the score goes from 0.810 to 0.880. It seems like there's not a lot of margin for error, given that "clean input" is 0.840, and limiting the content size drastically reduces its usefulness. After a bit of testing, it seems your product is definitely not ready; you should update your landing page, as it is not reliable at all.
@armandmalci495 23 days ago
Does anyone know what is the app he is using to draw the schemas (min 1:00)?
@Shorts4D
23 days ago
tldraw
@koudy008
23 days ago
It's Excalidraw
@zakariazain8790 23 days ago
Thank you
@cablesalty 10 days ago
Great now I will make a version that creates profanity
@_purple_44_ 4 days ago
Can it be made to respond which word is profane as well? So that i can just *** it
@Michael-Martell 23 days ago
Cool man!
@ovna 23 days ago
👍 Useful
@LRSKWTKWSK 23 days ago
Love it
@AmodeusR 23 days ago
That profanity score is very weird. Why the score is always around .8? Why not use the range from 0 to 1?
@BoxEnjoyer 12 days ago
Holy moly gets 0.912 🚨😱 BIG PROFANITY DETECTED!! 🚨😱
@sal00 12 days ago
wow wow - 🚨 PROFANITY DETECTED!! 🚨
@scarlatum 21 days ago
Well, it drops when the message is larger than ~750 chars due to the execution time limit. Tokenization makes BOOM
@ronitgurjar5747 23 days ago
great work Josha🔥🔥🫡
@ihsanmohamad521 23 days ago
f@#k!ng great project!
@wenelol 11 days ago
Typed meow meow and the rating was: 😱 PRETTY SURE THIS IS A PROFANITY 😱 score (higher is worse): 0.865
@670839245 5 days ago
"what the hell" (0.966) or "what the heck" (0.912) both return profanity. Even if we use the totally safe version of this phrase, "what in the world", it's still profanity (0.859). Then how are we supposed to express that idea?
On the other hand, "I hate this [blank] taco" returns clean for "flipping", "frigging" and "freaking", all of which are lesser versions of the F bomb.
@Renner4k 10 days ago
Cool idea but it's super impractical and easy to bypass. Needs some more work because simply chaining 2 swear words together without a space can usually bypass it.
@snatvb 23 days ago
This is a really good project. Actually, you can use it not only for profanity; you can detect ads, spam, scams, etc., can't you?
@evan_ry 23 days ago
tensor model < bunch of ifs
@kapa9436 23 days ago
It's like semantic search
@user-vk6cb1zu7p 23 days ago
I typed "you are very sexy" and it responded with: Crispy clean input, no profanities :))
@user-vk6cb1zu7p
23 days ago
it's insane!!
@Erik-pk8rw 14 days ago
Maybe add something to convert Unicode look-alikes, because those won't get detected
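One way to approach look-alike conversion, as a hedged sketch: apply Unicode NFKC normalization, case-fold, then map a small table of confusable characters to plain letters before scoring. The `LOOKALIKES` table and `normalize` function below are hypothetical and deliberately tiny; a real table would be much larger.

```python
import unicodedata

# Hypothetical, deliberately small mapping; a production table would be far larger.
LOOKALIKES = str.maketrans({
    "\u0430": "a",  # Cyrillic small a
    "\u0435": "e",  # Cyrillic small ie
    "\u043e": "o",  # Cyrillic small o
    "0": "o", "1": "i", "3": "e", "5": "s", "@": "a", "$": "s",
})

def normalize(text: str) -> str:
    """Fold compatibility forms (e.g. fullwidth letters) and map look-alikes."""
    folded = unicodedata.normalize("NFKC", text).casefold()
    return folded.translate(LOOKALIKES)

examples = [normalize("A55"), normalize("d\u0430mn"), normalize("\uff53\uff48\uff49\uff54")]
```

Mapping digits to letters also rewrites legitimate numbers, so the normalized form is probably best used only for scoring, never shown to users. Ambiguous swaps such as a lowercase "l" standing in for "I" are not handled here and would need context.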
@DS-ow2ge 23 days ago
Josh, by design this system is fastest when there is profanity, and slowest when there is none. Is it even possible to design one with the opposite? fastest when no profanity, and slowest when there is?
@rorymax
23 days ago
Well if you think about it, to declare something as profane you need to find only 1 profanity. However to declare something as clean you need to make sure there are no profanities at all. So in one case you stop when you find a profanity, but in the other case you have to check the whole thing
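The asymmetry described in this reply can be illustrated with a simple word scan. This is only a sketch with a hypothetical `contains_flagged` helper; it counts how many words are examined before a verdict and does not reflect how the project itself evaluates input.

```python
from typing import Iterable, Set, Tuple

def contains_flagged(words: Iterable[str], flagged: Set[str]) -> Tuple[bool, int]:
    """Return (verdict, number of words examined); stops at the first hit."""
    checked = 0
    for word in words:
        checked += 1
        if word in flagged:
            return True, checked  # one hit is enough to declare the text profane
    return False, checked         # a clean verdict requires examining every word

hit = contains_flagged(["damn", "that", "is", "great"], {"damn"})
clean = contains_flagged(["that", "is", "great"], {"damn"})
```

In the worst case both verdicts cost a full pass; only the profane case can finish early.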
@_ultraviolet 23 days ago
Why is it so strict? "dumb person" is apparently extremely profane
@depralexcrimson
23 days ago
because this is not production ready, it's at best a Proof of Concept. it obviously cannot detect or understand any context, it can just maybe detect bad words, that's it, it doesn't care about context at all.
@purpshell 13 days ago
heard of Akismet?
@PrismFave 10 days ago
my prompt: "you are so S.HIT at this game"
Crispy clean input, no profanities :)) 👍👍 score (higher is worse): 0.822
-----
my prompt: "you are so SHlT lol"
Crispy clean input, no profanities :)) 👍👍 score (higher is worse): 0.801
@blaizeW 23 days ago
cool, but what do "zip in the wire" and "zipperhead" mean? 😭
@spinxooo 7 days ago
"ì I" 😱 PRETTY SURE THIS IS A PROFANITY 😱 score (higher is worse): 0.857 LMAOO
@mikaay4269 23 days ago
One issue is internationalisation: "Ich geh nach Fucking", is a German sentence without any profanity, because "Fucking" is an actual town.
@coopener 20 days ago
The website does not work anymore, since the website uses HSTS.
@mrkostya008 23 days ago
Im sorry but why go such an extra mile if OpenAI's Moderation API is free and quite fast at that.
@leonardodoujinshi
23 days ago
I thought you could only use their API for outputs from their own model and they disallow other usage
@j0hnr3x 23 days ago
The phrases "I love doing it with my sister"(0.802) and "I want to end your life"(0.783) have lower scores than your examples of clean input. I think this needs a lot of work, only obvious profanity gets detected.
@TheDragonDesigns 9 days ago
my pen is broken - 😱 PRETTY SURE THIS IS A PROFANITY 😱
you what - 😱 PRETTY SURE THIS IS A PROFANITY 😱
How much have you been drinking - 😱 PRETTY SURE THIS IS A PROFANITY 😱
@chiroyce 11 days ago
dumb dumb: 🚨😱 BIG PROFANITY DETECTED!! 🚨😱 - 0.937
@theminecraft690 12 days ago
The problem is that it's English only. As a German myself, I tested the famous German swear word "hu rr ensohn" and it said it's not a swear word.
@ErrorINAOfficial 4 days ago
“I can’t say this word because YouTube may demonetize the *hell* out of me.”
@kushaagr 23 days ago
TL;DW It's basically AI... Heck the use of vector database puts it closer to LLM technology.
@centdemeern 16 days ago
Isn’t it more effective to use a word list and actual logic? Isn’t that
@cheapbucks9590 22 days ago
The model needs more data
@xastralmars 23 days ago
what web browser do you use
@techworld3255
23 days ago
That's Arc Browser
@tedspens 22 days ago
So I guess Theo and The Primeagen have been demonetized a long time ago. 🤣🤣
@wlockuz4467 23 days ago
I don't really understand the scoring, there is only 0.08 percent difference between something being profane and not profane? Intuitively I would think 0.0 is something normal and 1.0 is definitively profane. The threshold of 0.86 seems arbitrary. This seems like a useful tool but I think most real projects would want something context aware.
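If the raw scores really do cluster in a narrow band, one option is a simple min-max rescale onto 0 to 1. The sketch below assumes a lower bound of 0.78 (roughly the lowest score quoted in these comments) and an upper bound of 1.0; both bounds and the `rescale` name are assumptions, not values from the project.

```python
def rescale(score: float, low: float = 0.78, high: float = 1.0) -> float:
    """Min-max rescale a raw score onto [0, 1], clamping values outside the band."""
    clamped = min(max(score, low), high)
    return (clamped - low) / (high - low)

midpoint = rescale(0.89)
```

Rescaling only makes the number easier to read. It does not widen the actual gap between clean and profane inputs, so the choice of threshold still needs care.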
@theaviationbee 11 days ago
i typed "gfasgda asfga" into the checker and it said it was profanity. might want to fine tune the model a little more.
it also said "i got a new diamond hoe in minecraft, it has a lot of durability" was profanity. also might want to add context reading.
@nicoluvas8486 23 days ago
Lorem fucking ipsum
@TheIpicon 23 days ago
upstash really profit from you working there😂😂
@jiM3op 23 days ago
starred!
@marcuss.abildskov7175 22 days ago
Why would I want an API for this? There's tons of libraries that solves this.
@NaraSherko 14 days ago
Swear! Swear! Swear! gives you 😱 PRETTY SURE THIS IS A PROFANITY 😱
@mason8335
5 days ago
"Profanity is bad" = PRETTY SURE THIS IS A PROFANITY
@GratuityMedia 23 days ago
Good fucking video