ClippyGPT - How I Built Supabase’s OpenAI Doc Search (Embeddings)

Science & Technology

Supabase hired me to build ClippyGPT - their next generation doc search. You can ask our old friend Clippy anything you want about Supabase, and it will answer in natural language. Powered by OpenAI + prompt engineering.
In this video I will be showing you exactly how I did this, and how you can do the same in your projects. We'll be covering:
- Prompt engineering and best practices
- Working with a custom knowledge base via context injection + OpenAI embeddings
- How to store embeddings in Postgres using pgvector
Supabase blog post:
supabase.com/blog/chatgpt-sup...
pgvector extension:
github.com/pgvector/pgvector
Generate embeddings implementation:
github.com/supabase/supabase/...
Clippy edge function implementation:
github.com/supabase/supabase/...
Clippy frontend implementation:
github.com/supabase/supabase/...
Prompt engineering:
prmpts.ai/blog/what-is-prompt...
00:00 Why?
01:40 Let's get started
03:15 Custom knowledge base
04:49 Context injection
06:13 Pre-process MDX files
13:40 Embeddings
15:40 Storing in Postgres + pgvector
22:21 API endpoint (edge function)
23:44 Calculating similarity in pgvector
27:55 Prompt engineering
33:15 Prompt best practices
38:37 Demo time!
41:32 Thanks for watching!

Comments: 333

  • @AngelEduardoLopezZambrano
    @AngelEduardoLopezZambrano 10 months ago

    This channel is awesome! Love the rabbit holes you take us on! Keep them coming please!

  • @SatvikAgnihotri
    @SatvikAgnihotri 8 months ago

    The clarity of this video while maintaining detailed granularity of the subject is very impressive and very appreciated. Thank you for making this video.

  • @JimmySting
    @JimmySting 1 year ago

    This content is really top notch. I appreciate the clear and detailed explanations of everything!

  • @RabbitHoleSyndrome

    @RabbitHoleSyndrome

    1 year ago

    Glad you found it helpful!

  • @trimonmusic
    @trimonmusic 1 year ago

    Found your channel while learning React Three Fiber, subbed with notifications immediately. Today I get a notification for a well-explained ChatGPT tutorial, right as I embark on building a similar thing. Fantastic continued work, thank you very much!

  • @RabbitHoleSyndrome

    @RabbitHoleSyndrome

    1 year ago

    Awesome! Thank you!

  • @LV-md6lb
    @LV-md6lb 1 year ago

    This was so valuable to just settle the thoughts into a clear action plan as to how to implement in production. Thank you!

  • @RabbitHoleSyndrome

    @RabbitHoleSyndrome

    1 year ago

    Glad it helped!

  • @Jonathan-rm6kt
    @Jonathan-rm6kt 1 year ago

    I am blown away at how much information is densely packed into this. You got yourself a new subscriber, sir. It’s staggering to think about how these technologies will shape the landscape for data and analytics. This is just the beginning.

  • @RabbitHoleSyndrome

    @RabbitHoleSyndrome

    1 year ago

    Thanks for the sub! I am continuously blown away by the possibilities of large language models 😀

  • @zaynjarvis9443
    @zaynjarvis9443 1 year ago

    Fantastic content, didn't expect it to be so informative in just 40 mins. Looking forward to the next one!

  • @RabbitHoleSyndrome

    @RabbitHoleSyndrome

    1 year ago

    Thanks for watching 😀

  • @rayaffas857
    @rayaffas857 1 year ago

    First time on this channel. The way you structured the video, the pace and the explanations are all on point. Keep up the good work. +1 subscriber.

  • @RabbitHoleSyndrome

    @RabbitHoleSyndrome

    1 year ago

    Glad to hear that, thanks for the sub!

  • @ooogabooga5111
    @ooogabooga5111 1 year ago

    Insane, I love how you are able to do so many things. My laptop atm is unable to power every wish of mine (getting into 3D) but I hope I will soon be able to do so.

  • @Candyapplebone
    @Candyapplebone 1 year ago

    Damn, I spent like a week researching this shit on my own, and have been working on almost exactly the same thing. Processing MDX files into embeddings etc. It’s really cool to see somebody doing almost the same exact thing. Makes me think I am really on the right track!

  • @RabbitHoleSyndrome

    @RabbitHoleSyndrome

    1 year ago

    Nice! Glad to help give some validation 😄 are you also building for docs?

  • @automioai

    @automioai

    11 months ago

    Hey! How did you turn your PDF files into a proper MDX format? Thanks

  • @RabbitHoleSyndrome

    @RabbitHoleSyndrome

    11 months ago

    @@automioai In this project no PDF files were used - all documentation had been written directly in MDX. You'll have to do some research on ways to extract text from PDF files. Once you have that, I wouldn't bother with MDX at all - just generate embeddings directly on that content.

  • @haunebuiii4103
    @haunebuiii4103 1 year ago

    This is amazing, you’ve created Clippy just as enthusiastic and helpful as you are! Thanks a lot

  • @RabbitHoleSyndrome

    @RabbitHoleSyndrome

    1 year ago

    Thanks for watching 😄

  • @maertscisum7243

    @maertscisum7243

    1 year ago

    @@RabbitHoleSyndrome how did you generate the MDX files?

  • @RabbitHoleSyndrome

    @RabbitHoleSyndrome

    1 year ago

    The MDX files weren’t generated - the Supabase team wrote them as you would any markdown file.

  • @milovangudelj
    @milovangudelj 1 year ago

    Wow, this is incredible... I can see a future where every docs site does the same thing. This is truly powerful stuff! Well done 👏

  • @RabbitHoleSyndrome

    @RabbitHoleSyndrome

    1 year ago

    I am both excited and blown away by the possibilities. Thanks for watching!

  • @franciscofredviana743
    @franciscofredviana743 1 year ago

    What a fantastic video and content. I’ve gone through multiple videos trying to better understand embedding and how to work with ChatGPT in the best way for querying large amount of content and producing an analyzed response. I’m not a developer, have a background in computer science but I’m a software sales person that is curious about technology and I was able to completely understand your video and content. Subscribed, liked and will be watching more of your videos. Thank you!

  • @RabbitHoleSyndrome

    @RabbitHoleSyndrome

    1 year ago

    Glad it was helpful!

  • @aldousd666
    @aldousd666 1 year ago

    This is a glorious illustration. Thank you very much! I've been trying to find an example of doing this, and yours has put it all together for me! Subscribed!

  • @RabbitHoleSyndrome

    @RabbitHoleSyndrome

    1 year ago

    Glad it helped, thanks for the sub!

  • @fotoflo
    @fotoflo 1 year ago

    I've been looking for this information for months. Such an excellent tutorial and I love that Supabase's code is all open source so I can actually clone it and read how it works in detail later. Thank you so much for the walkthrough. Super talented dude too - love the Blender stuff.

  • @RabbitHoleSyndrome

    @RabbitHoleSyndrome

    1 year ago

    Glad to hear it helped! Agreed - open source is amazing! Let me know if you hit any road blocks along the way, happy to help 😃

  • @fraternitas5117

    @fraternitas5117

    1 year ago

    @@RabbitHoleSyndrome why is the generate embeddings file so different in the video than what is in the repo now? I can't find anything you talk about in minutes 10-13.

  • @RabbitHoleSyndrome

    @RabbitHoleSyndrome

    11 months ago

    Hey @@fraternitas5117! Supabase moves pretty quick - the code I referenced has since been refactored to support multiple knowledge sources (i.e. more than just markdown). You can find the markdown-specific code here: github.com/supabase/supabase/blob/1b2361c099c2573afa1fe59d3187343bb8f1bcab/apps/docs/scripts/search/sources/markdown.ts

  • @benmak5326
    @benmak5326 7 months ago

    This really is/was an epic video clearly and well laid out!

  • @ryanyoung1925
    @ryanyoung1925 10 months ago

    You do such a great job bro! I love what you have built and your video, keep building great things bro.

  • @amardeep.sahota
    @amardeep.sahota 10 months ago

    Amazed by your content. Fantastic work here.

  • @RabbitHoleSyndrome

    @RabbitHoleSyndrome

    10 months ago

    Thanks, glad it helps!

  • @javiasilis
    @javiasilis 1 year ago

    Wow. I'm so thrilled to know that you were one of the ones behind that great feature. I've been using Supabase for 6 months, and have been pretty happy with it. Except for the docs and the transition to 2.0. I was blown away when I saw that it generated the code for me when I started writing its documentation

  • @RabbitHoleSyndrome

    @RabbitHoleSyndrome

    1 year ago

    Glad to hear it has helped you! Any feedback on the docs that you think should improve?

  • @javiasilis

    @javiasilis

    1 year ago

    @@RabbitHoleSyndrome So far so good! I think one challenge is to know how we can check if a user's email address exists. (Or other specific user's metadata) I couldn't find it in the docs. There was a GitHub issue which said to store the user's data in a separate table as the auth table was private. I ended up doing that and haven't had any problems. Btw, thanks again for all the awesomeness!

  • @flying-kite-spectre
    @flying-kite-spectre 11 months ago

    Wonderful primer on prompt engineering.

  • @NickLambourne
    @NickLambourne 1 year ago

    GREAT content. It explained everything you need to know about creating 'chat docs' or similar in one run, and all open source. Kudos! And subscribed.

  • @RabbitHoleSyndrome

    @RabbitHoleSyndrome

    1 year ago

    Thanks for the sub! Great to hear 😃

  • @Jandodev
    @Jandodev 1 year ago

    I'm doing something even cooler with the vector embeds, I can't wait till I can share!

  • @swyxTV
    @swyxTV 1 year ago

    incredible end to end tutorial, nicely done

  • @RabbitHoleSyndrome

    @RabbitHoleSyndrome

    1 year ago

    Thank you! 😃 Have you explored embeddings or pgvector?

  • @imranaalam
    @imranaalam 1 year ago

    excellent . you are hands-on & practical

  • @ApplicableProgramming
    @ApplicableProgramming 1 year ago

    Thanks Greg for a great explanation. I like your presentational style!

  • @RabbitHoleSyndrome

    @RabbitHoleSyndrome

    1 year ago

    Glad it was helpful!

  • @CraigShieldsAOTG
    @CraigShieldsAOTG 1 year ago

    This video was fantastic, lots of information given in an easily digestible way. Subscribing!

  • @RabbitHoleSyndrome

    @RabbitHoleSyndrome

    1 year ago

    Glad it was helpful, thanks for the sub!

  • @antonodman5709
    @antonodman5709 1 year ago

    Absolutely amazing quality here, glad I found your video. Subbed!

  • @RabbitHoleSyndrome

    @RabbitHoleSyndrome

    1 year ago

    Thank you!

  • @ThiagoVictorino
    @ThiagoVictorino 11 months ago

    You are awesome! Now I can understand how these things work together.

  • @RabbitHoleSyndrome

    @RabbitHoleSyndrome

    11 months ago

    Happy to hear it! 😀 thanks for supporting the channel!

  • @anonymousXYZ659
    @anonymousXYZ659 6 months ago

    Seems to have found one great coding channel among the noise of today; back to good old days when coding was boring & nerdy. Great job!

  • @user-nt2fs7qp6c
    @user-nt2fs7qp6c 7 months ago

    incredible value in this video

  • @jaiderariza8441
    @jaiderariza8441 1 year ago

    I am grateful for this video. It opened my eyes to embeddings.

  • @RabbitHoleSyndrome

    @RabbitHoleSyndrome

    1 year ago

    Glad it helped 😃

  • @9uifranco
    @9uifranco 1 year ago

    So you're the one who made this beautiful thing. Pretty nice.

  • @caliwolf7150
    @caliwolf7150 1 year ago

    This is truly valuable and useful content, thanks a lot.

  • @RabbitHoleSyndrome

    @RabbitHoleSyndrome

    1 year ago

    You bet! Glad it was helpful

  • @Kichaka_Ranch
    @Kichaka_Ranch 11 months ago

    Thank you for sharing. I was struggling with how to get started. This was very well presented. From this side of the world, asante sana!

  • @RabbitHoleSyndrome

    @RabbitHoleSyndrome

    11 months ago

    Glad it was helpful 😃

  • @FacadeMan
    @FacadeMan 1 year ago

    Prediction: This is gonna get a million views. Just saw fireship video about vector databases and wanted to understand embeddings. Before I could even search, this video was in the page. Though I wasn’t interested in a 40 min video (had a feeling I’ll just stop after 5 mins like I usually do) I ended up watching it all. The rabbit hole 🐇 🕳️ format is so naturally elegant. Clear end to end use case. I secretly don’t want to share it with anyone but I am forced to fulfill my prediction.

  • @RabbitHoleSyndrome

    @RabbitHoleSyndrome

    1 year ago

    Thanks for the comment! Glad to hear the format is working 😃

  • @jonasqiao8834
    @jonasqiao8834 1 year ago

    That's great, you did a great job! It helps me a lot with this.

  • @RabbitHoleSyndrome

    @RabbitHoleSyndrome

    1 year ago

    Glad it helped!

  • @naibafYT
    @naibafYT 1 year ago

    Just what I was looking for - thank you very much🥳

  • @RabbitHoleSyndrome

    @RabbitHoleSyndrome

    1 year ago

    Great 😃 Thanks for watching!

  • @martinfilteau8668
    @martinfilteau8668 1 year ago

    Amazing video! I'm going to put this in practice right now!

  • @RabbitHoleSyndrome

    @RabbitHoleSyndrome

    1 year ago

    Awesome! Feel free to share as you make progress!

  • @gr8tbigtreehugger
    @gr8tbigtreehugger 1 year ago

    Many thanks for this insightful and helpful video!

  • @RabbitHoleSyndrome

    @RabbitHoleSyndrome

    1 year ago

    Happy to help, thanks for watching!

  • @BernhardSchlegel
    @BernhardSchlegel 1 year ago

    First time here. This is so well done. Subscribed. Your viewer number will explode! I like how you approached the topic in a very calm way without jumping on the "LLMs will take over the world" train :) You don't happen to have the clippy blender asset somewhere?

  • @jtjt8777
    @jtjt8777 1 year ago

    thx for saving me hours if not days. I wanted to add openai to my supabase app and found the exact tutorial.

  • @RabbitHoleSyndrome

    @RabbitHoleSyndrome

    1 year ago

    Glad it helped!

  • @morrisy0x
    @morrisy0x 1 year ago

    This 40-minute video looks like 10 minutes. I have been researching related engineering topics recently and they have been very inspiring to me.

  • @RabbitHoleSyndrome

    @RabbitHoleSyndrome

    1 year ago

    The potentials with LLMs seem to be endless 🤯

  • @vince2nd
    @vince2nd 1 year ago

    Fireship guy? Either way, it's been pretty useful in terms of learning APIs and how to connect them to my no-code builder. Spent hours trying to get things working and the Assistant basically told me what I was doing wrong and how to fix it. So well done with the implementation.

  • @RabbitHoleSyndrome

    @RabbitHoleSyndrome

    1 year ago

    Glad it helped!

  • @tohafi
    @tohafi 1 year ago

    Amazing video! Great and detailed information!

  • @RabbitHoleSyndrome

    @RabbitHoleSyndrome

    1 year ago

    Glad it helped!

  • @vedantnn7
    @vedantnn7 1 year ago

    This is really helpful and valuable, thanks a ton!!!

  • @RabbitHoleSyndrome

    @RabbitHoleSyndrome

    1 year ago

    You bet, thanks for watching!

  • @RikLogtenberg-uv4gx
    @RikLogtenberg-uv4gx 7 months ago

    Such a great video.

  • @fiftygrapes
    @fiftygrapes 1 year ago

    Cool video. I was thinking about doing something similar for some reference pdfs. Thanks for the video

  • @RabbitHoleSyndrome

    @RabbitHoleSyndrome

    1 year ago

    Thanks for watching & best of luck!

  • @ssahillppatell
    @ssahillppatell 1 year ago

    Thanks for explaining everything so clearly :))

  • @RabbitHoleSyndrome

    @RabbitHoleSyndrome

    1 year ago

    You bet! Thanks for watching 😃

  • @benrobo8
    @benrobo8 1 year ago

    Great job 👍, this was really helpful

  • @RabbitHoleSyndrome

    @RabbitHoleSyndrome

    1 year ago

    Glad it helped!

  • @keepitdialed
    @keepitdialed 1 year ago

    Thanks for the knowledge share my friend. 💪🙏

  • @RabbitHoleSyndrome

    @RabbitHoleSyndrome

    1 year ago

    You bet!

  • @user-mu8gs9mk9z
    @user-mu8gs9mk9z 7 months ago

    Subscribed, amazing content.

  • @sensvitae
    @sensvitae 9 months ago

    Thanks for the share !

  • @koigxiritb7ttgyuv
    @koigxiritb7ttgyuv 10 months ago

    I'm at 1:42 and this video already is 10/10

  • @casualcycling8738
    @casualcycling8738 1 year ago

    🔥🔥🔥 This is amazing!

  • @roxforgegames4548
    @roxforgegames4548 1 year ago

    The most amazing thing is that I basically made a chatbot app in less than a week with only the help of GPT4. I had no knowledge of AWS services, PostgreSQL or Python. Everything you said in the video is what GPT4 told me. All of the servers and the database are set up; it has memory, STT, TTS and Cognito login/register.

  • @RabbitHoleSyndrome

    @RabbitHoleSyndrome

    1 year ago

    It is quite amazing - and I’m sure it will only get better!

  • @BradleyKieser
    @BradleyKieser 1 year ago

    Damn that's a great, helpful video!

  • @RabbitHoleSyndrome

    @RabbitHoleSyndrome

    1 year ago

    Glad to hear it was helpful, thanks for watching!

  • @field-officer
    @field-officer 7 months ago

    I'm reading the comments, and I'm like.. Yeah WTF 🗿🔥

  • @olboone
    @olboone 1 year ago

    Really good video, nice job! 🎉

  • @RabbitHoleSyndrome

    @RabbitHoleSyndrome

    1 year ago

    Cheers! 😀

  • @MT222100
    @MT222100 1 year ago

    Amazing Content..

  • @sekarmaui8524
    @sekarmaui8524 11 months ago

    Hi, I'm a lil bit confused. Do we need lemmatization or remove stopwords before we do embeddings? And do we need chunking after embedding?

  • @user-ow5mn6dn7n
    @user-ow5mn6dn7n 8 months ago

    Dude, that's exactly what I was looking for. No more bullshit articles with clickbait titles, just DIY in the essence. Is there a way to support you through patreon or smth. ?

  • @RabbitHoleSyndrome

    @RabbitHoleSyndrome

    7 months ago

    Glad to hear it! You support by watching 🙌

  • @thehouse2620
    @thehouse2620 1 year ago

    excellent info, great presentation

  • @RabbitHoleSyndrome

    @RabbitHoleSyndrome

    1 year ago

    Glad it was helpful!

  • @jamelljones122
    @jamelljones122 1 year ago

    This is awesome, thank you!

  • @RabbitHoleSyndrome

    @RabbitHoleSyndrome

    1 year ago

    Glad it was helpful!

  • @joaorodriguesjr
    @joaorodriguesjr 1 year ago

    This is really interesting! I'm looking to build something similar.

  • @RabbitHoleSyndrome

    @RabbitHoleSyndrome

    1 year ago

    Best of luck!

  • @fflv_irn
    @fflv_irn 1 year ago

    super helpful. thanks. love supabase

  • @RabbitHoleSyndrome

    @RabbitHoleSyndrome

    1 year ago

    Happy to help!

  • @JuanUys
    @JuanUys 1 year ago

    28:37 Whenever I see examples of decoder (GPT) prompts starting with "You are a helpful finance advisor" or "You are an enthusiastic support rep", I can almost see the AI clearing its throat and sitting up straight and saying "right, ok". Gimme that can-do attitude, GPT!

  • @RabbitHoleSyndrome

    @RabbitHoleSyndrome

    1 year ago

    🤣

  • @mohali4338
    @mohali4338 9 months ago

    Good job! :)

  • @hellofahmid2331
    @hellofahmid2331 1 year ago

    Brilliant.

  • @DarrenTarmey
    @DarrenTarmey 1 year ago

    What database are you using to store the data, and would it be good to then use that data to fine-tune the model?

  • @nattyzaddy6555
    @nattyzaddy6555 1 year ago

    The documentation you stored in the database may be a couple of megabytes, so it's able to find the relevant chunks pretty quickly. What if you wanted to fill a database with gigabytes of text, would it be much slower? Is that a case where retraining the whole model would unfortunately be the best route to go?

  • @hallowatcher
    @hallowatcher 1 year ago

    So if I understood correctly: The embeddings were only used to check for similarity between the user's input and the doc's content, in order to provide the prompt with relevant (text) context, right? Is there a way to provide the GPT model with the embeddings instead?

  • @RabbitHoleSyndrome

    @RabbitHoleSyndrome

    1 year ago

    That’s correct. You could have used an alternate search method, but embeddings have a nice alignment with LLMs since they also use a language model themselves. Unfortunately, no, there is currently no way to inject embeddings directly into GPT. Maybe this will change in the future or become available in open source models like LLaMA in the same way we’ve seen it happen with Stable Diffusion.

  • @dataray
    @dataray 1 year ago

    I have not seen such a great video in a while! How wonderfully you have explained the whole process 👍👍! Could you explain a bit more about how you chose 0.78 as the threshold for embedding comparison? Have you measured whether the most relevant sections can be found with it?

  • @RabbitHoleSyndrome

    @RabbitHoleSyndrome

    1 year ago

    Glad you liked it! 0.78 was a first-stab threshold that worked best based on a limited sample of test queries. I wouldn’t claim that this number is universal - almost certainly this could change by domain.
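
The reply above does not say how the sample of test queries was scored, so the following is only a sketch of one possible way to compare candidate thresholds. It assumes you have hand-labelled some (similarity, is_relevant) pairs yourself; the `samples` list and the `precision_recall` helper are made up for illustration and do not come from the Supabase code.

```python
def precision_recall(samples, threshold):
    """Score one similarity threshold against hand-labelled samples.

    samples: list of (similarity, is_relevant) pairs (hypothetical test data).
    Returns (precision, recall) for sections with similarity >= threshold.
    """
    selected = [relevant for similarity, relevant in samples if similarity >= threshold]
    true_positives = sum(selected)
    total_relevant = sum(relevant for _, relevant in samples)
    precision = true_positives / len(selected) if selected else 0.0
    recall = true_positives / total_relevant if total_relevant else 0.0
    return precision, recall


# Made-up labelled matches, NOT real measurements from the Supabase docs.
samples = [
    (0.91, True), (0.84, True), (0.80, True),
    (0.79, False), (0.76, False), (0.70, False),
]

for threshold in (0.70, 0.78, 0.85):
    precision, recall = precision_recall(samples, threshold)
    print(f"threshold={threshold:.2f} precision={precision:.2f} recall={recall:.2f}")
```

A lower threshold lets more sections through (higher recall, more noise in the prompt), a higher one does the opposite, which is why the right number is likely to differ by domain.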

  • @georgebarlowr
    @georgebarlowr 1 year ago

    I've got an internal api that returns train times for given stations. Could I use user prompts to ask the GPT LLM to find train times for a particular station and then fetch using an api get request an array of trains that are set to depart and then get GPT to format that back to the user in a tailored natural language type of way?

  • @RabbitHoleSyndrome

    @RabbitHoleSyndrome

    1 year ago

    Definitely possible, will just take some work on the prompt engineering side. Your best bet today is probably LangChain - I recommend you check out their documentation on creating custom tools: python.langchain.com/en/latest/modules/agents/tools.html

  • @orebimbo-salami4107
    @orebimbo-salami4107 1 year ago

    This is Awesome

  • @muhammedkoroglu6544
    @muhammedkoroglu6544 10 months ago

    is it possible to use the semantic text search with the embeddings without the database? I don't know much about running databases (newbie here) and would like to skip that step if possible. Can i for example do this with a pandas dataframe in python while saving the data (both the source text chunks themselves and the corresponding embeddings) as a csv file? Then just like in the video, I could calculate the dot-product between my query and all other embeddings and take the best k matches. Is there a downside to this?

  • @thatboi1465
    @thatboi1465 1 year ago

    Yo, thank you for putting out your knowledge :). But I have a question regarding the data search for the context. So basically you're creating a graph database with the GPT models and then you input the user's question into the database to find the relevant articles/information?

  • @Jonathan-rm6kt
    @Jonathan-rm6kt 1 year ago

    16:10 would love an explanation on how embedding work with a document structure? I.e query is “summarize chapter 3”. The embedding sans retrieval don’t seem to capture the structure of the chunks that are contained in title chunk “chapter 3 “. All explanations on embedding I’ve seen all rely on the text content within a chunk.

  • @eRiicBelleT
    @eRiicBelleT 1 year ago

    About preprocessing the data, do you know about these Generative Pseudo-Labeling techniques (T5 model, Negative Mining)? Or any chunking techniques? Like, overlapping chunks. I'm pretty interested in all the preprocessing techniques, but I just read those ones.

  • @onemanops
    @onemanops 1 year ago

    Thank you

  • @ar9maker
    @ar9maker 1 year ago

    You are so cool! I love you!

  • @rogerganga
    @rogerganga 1 year ago

    This is a fantastic video! Thank you very much for sharing :D Quick question - Currently if the info is not in the documentation it responds "Sorry I don't know how to help with that". But how can we make it respond like this: "Sorry I don't have relevant info in the documentation but you can do something like this". For e.g. "I don't have any info about how to make banana pancakes in the documentation, but here is how you can make one...." Idea here is to make it act like chatgpt on top of the information provided. Keen to know more on this and thank you so much for making this video :D

  • @williamsalazar2624
    @williamsalazar2624 5 months ago

    Thanks a lot. How to use supabase openAPI schema inside GPT4?

  • @haisai4159
    @haisai4159 1 year ago

    amazing! how does clippy update the vector db and process the text as embedding when NEW documentation is added? is it automatic? maybe i missed it in the video

  • @RabbitHoleSyndrome

    @RabbitHoleSyndrome

    1 year ago

    Great question! The `generate-embeddings` script was designed to be diff-based. So next time you run it, it will pull in only the documents that have changed and re-create embeddings on just those. It currently works using checksums:
    1. Generate a checksum for the content and store it in the DB.
    2. Next time the script runs, compare the checksums. If they don't match, the content has changed and embeddings should be re-generated.
    The script runs on CI, so anytime documents change a GitHub Action will trigger the script. See this PR for details: github.com/supabase/supabase/pull/13936
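
The checksum comparison described above can be sketched roughly as follows. This is a minimal illustration, not the actual `generate-embeddings` script: the choice of SHA-256, the `pages_to_refresh` helper, and the in-memory `stored` dict standing in for the database are all assumptions.

```python
import hashlib


def checksum(content: str) -> str:
    """Return a hex SHA-256 digest of a page's content (hash choice is an assumption)."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def pages_to_refresh(pages: dict[str, str], stored: dict[str, str]) -> list[str]:
    """Return paths that are new or whose checksum differs from the stored one.

    pages:  path -> current content read from disk
    stored: path -> checksum saved on the previous run (stand-in for a DB column)
    """
    return [
        path
        for path, content in pages.items()
        if stored.get(path) != checksum(content)
    ]


# Hypothetical state: one page indexed earlier, then edited; one brand-new page.
stored = {"guides/auth.mdx": checksum("# Auth\nold text")}
pages = {
    "guides/auth.mdx": "# Auth\nnew text",
    "guides/db.mdx": "# Database\nintro",
}

changed = pages_to_refresh(pages, stored)
print(changed)
```

Only the paths returned here would need fresh embeddings; unchanged pages keep the vectors already stored.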

  • @concuben
    @concuben 1 year ago

    When breaking up a document into smaller chunks to generate the embeddings, is proximity of the sections (same document) taken into consideration when generating the similarity scores? What if it so happens some key information is in a section that is separate from key words located in another section?

  • @RabbitHoleSyndrome

    @RabbitHoleSyndrome

    1 year ago

    Great point! Remember though that embeddings are not matching keywords - they’re matching meaning as understood by the underlying embedding model. But I agree that proximity should almost certainly be accounted for in the ranking since some context could be missed otherwise.

  • @concuben

    @concuben

    1 year ago

    @@RabbitHoleSyndrome thanks for your reply... Do you have a suggestion for how to add this context? I didn't see from your tutorial how it was being accounted for.

  • @talhahasan6470
    @talhahasan6470 8 months ago

    Great video! How closely do the .mdx files have to match this structure before they can be processed into embeddings? Do they need to export the meta const, for example?

  • @RabbitHoleSyndrome

    @RabbitHoleSyndrome

    7 months ago

    The meta const is optional! You’re also free to tweak the pre processing logic to fit whichever format you need to work with

  • @Cygx
    @Cygx 1 year ago

    Is there a version of this that can apply to any type of content? Not just specifically made for Supabase documentation, but documentation of any type, or insert a book or textbook, and the search outputs relevant GPT responses?

  • @RabbitHoleSyndrome

    @RabbitHoleSyndrome

    1 year ago

    Hey! I don’t have a pre-built suggestion off the top of my head, but solutions to this seem to be popping up everywhere (just hang out on Twitter for a few hours 😅). If you don’t mind coding, it should be relatively straightforward to swap out the MDX docs with really any kind of content source, and the remaining steps should be identical.

  • @GeyzsonKristoffer
    @GeyzsonKristoffer 1 year ago

    Watched because of the content, subscribed because of the dog. 👍🏻

  • @RabbitHoleSyndrome

    @RabbitHoleSyndrome

    1 year ago

    Thanks for the sub! 🐶

  • @Chris-se3nc
    @Chris-se3nc 1 year ago

    I'm curious how hot my API key to OpenAI would get in practice. How much would this cost on average based on the size of the doc base. I like free tiers, but I do not think they exist here

  • @neociber24
    @neociber24 1 year ago

    Cool project, would be cool to create a tool like this that you can embed in any documentation, reading directly the markdown or scraping the website.

  • @RabbitHoleSyndrome

    @RabbitHoleSyndrome

    1 year ago

    Definitely!

  • @_thehunter_
    @_thehunter_ 1 year ago

    you are great!

  • @RabbitHoleSyndrome

    @RabbitHoleSyndrome

    1 year ago

    Thanks for watching!

  • @pranayaryal
    @pranayaryal 11 months ago

    What kind of vectors did you generate from chatGPT. Are they word vectors? You passed one whole section of the mdx so they are not word vectors but paragraph vectors?

  • @RabbitHoleSyndrome

    @RabbitHoleSyndrome

    11 months ago

    Yeah you got it - the community mostly calls these "sentence embeddings". Check out SBERT/sentence transformers for some good info

  • @mrc580
    @mrc580 1 year ago

    Amazing video! This is exactly what I have been looking for for a long time. You basically explain everything I wanted to know about how to create a search engine using OpenAI. But I have a few questions: How much did you spend on the OpenAI embedding API building this? How much does Supabase spend monthly on searches using the OpenAI API? Is it possible to use an open source embedding API instead of calling the OpenAI API? Wouldn't it be less expensive than the approach you took?

  • @RabbitHoleSyndrome

    @RabbitHoleSyndrome

    1 year ago

    Glad it was useful! As for costs, you may be surprised how inexpensive OpenAI embeddings are (at least I was). To put it in perspective, for the Supabase guides we currently have around 1500 page sections which total just over 220000 tokens. At OpenAI's current embedding price ($0.0004/1k tokens), that brought us to just less than $0.10 for the entire guide knowledge base (~one-time pre-processing). After that the average query is likely
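
The one-time pre-processing figure quoted above can be checked with a few lines of arithmetic. The token count (about 220000) and the price ($0.0004 per 1k tokens) are taken from the reply and were current at the time it was written; the `embedding_cost` helper is just an illustrative name.

```python
PRICE_PER_1K_TOKENS = 0.0004  # USD per 1k tokens, as quoted in the reply
TOTAL_TOKENS = 220_000        # approximate token count for the Supabase guides


def embedding_cost(tokens: int, price_per_1k: float = PRICE_PER_1K_TOKENS) -> float:
    """Return the cost in USD of embedding `tokens` tokens."""
    return tokens / 1000 * price_per_1k


cost = embedding_cost(TOTAL_TOKENS)
print(f"${cost:.3f}")
```

That works out to roughly $0.088, consistent with the "just less than $0.10" quoted for the whole guide knowledge base.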

  • @phemartin
    @phemartin 1 year ago

    I'd love to learn how to also incorporate user-feedback (thumbs up/down)

  • @RabbitHoleSyndrome

    @RabbitHoleSyndrome

    1 year ago

    This will likely make it into the next iterations. Will be a good challenge!

  • @phemartin

    @phemartin

    1 year ago

    @@RabbitHoleSyndrome That's awesome! Can't wait

  • @LV-md6lb

    @LV-md6lb

    1 year ago

    @@RabbitHoleSyndrome amazing video! Please let us know if you get to it:)

  • @forbiddenera

    @forbiddenera

    1 year ago

    @@RabbitHoleSyndrome any progress?

  • @spirobel
    @spirobel 1 year ago

    Are there alternatives to OpenAI to create these vectors? Don't really feel comfortable building something around a closed source API that is controlled by one vendor.

  • @RabbitHoleSyndrome

    @RabbitHoleSyndrome

    1 year ago

    Really great question. You’ll want to look into sentence embeddings. There has been a lot of work on the OSS side with Sentence-BERT (SBERT) you can check out. You might also want to look into Universal Sentence Encoder (USE) and InferSent.

  • @RabbitHoleSyndrome

    @RabbitHoleSyndrome

    1 year ago

    LlamaIndex actually uses OpenAI (text-embedding-ada-002) by default for embeddings today. They're more of a toolkit layer to assist with the workflow. There are many other alternatives though (which LlamaIndex supports via LangChain) that are worth checking out: langchain.readthedocs.io/en/latest/reference/modules/embeddings.html

  • @trejohnson7677

    @trejohnson7677

    1 year ago

    LOL what computer arch r u on

  • @berndeckenfels
    @berndeckenfels 4 months ago

    Can you make the completion also tell which chunks have been used and link to them or get “read more” links or would you do that by just listing “top 5” matches from the context?

  • @marspark6351
    @marspark6351 1 year ago

    One thing that might help is if the question result shows the links to the documents that it acquired the information from. Since you are currently fetching which document to run chatgpt on based on similarity of features, maybe you can change the prompt so that it also returns the link of the document that was deemed as a "similar document"

  • @RabbitHoleSyndrome

    @RabbitHoleSyndrome

    1 year ago

    Absolutely! This should definitely be the next progression.

  • @paulgonzalez8121
    @paulgonzalez8121 1 year ago

    Good stuff! Curious how you are handling performance of the responses? We set up a similar pattern and have found that GPT 3.5 obviously returns faster, but GPT-4 returns with much better quality. Have you experienced similar results? In this context does the speed of 3.5 outweigh the GPT-4 quality - for us we are seeing some GPT-4 responses take >30 seconds, which is pretty terrible from a UX perspective. Also, curious if you are still using Algolia for the other "basic" search experience? We were thinking about using Algolia for search, and then sending the top few most relevant results to OpenAI for summarization. Not sure if the quality of Algolia's search results (even with all of their AI synonyms, dynamic re-ranking, etc.) was meaningfully different than creating your own vector database? I appreciate you taking the time to make this video, it's validating some stuff I'm working on for sure!

  • @LiiittleBigPlanet
    @LiiittleBigPlanet 10 months ago

    Isn't it super expensive to calculate the similarity twice (~27min), in the select and where?

  • @WilbertoCasillas
    @WilbertoCasillas 1 year ago

    Loved the video, it validated a lot of the decisions we are making at work. I have a question, however, on the section about context injection. You mention that you search for relevant information to inject into the prompt. How do you accomplish the search part? Is it using an index or a SQL query across all columns?

  • @RabbitHoleSyndrome

    @RabbitHoleSyndrome

    1 year ago

    Glad it was helpful! The search is done through embeddings - we perform a similarity search between the embeddings generated from user's query and the pre-generated embeddings on the knowledge base (stored in a column using pgvector).

  • @WilbertoCasillas

    @WilbertoCasillas

    1 year ago

    @@RabbitHoleSyndrome ahh so: 1) call OpenAi embedding api for the query 2) use cos sim to compare the query embedding against the stored embeddings 3) utilize the top results to inject into a prompt that we compile to send to OpenAi completion api ?

  • @RabbitHoleSyndrome

    @RabbitHoleSyndrome

    1 year ago

    You got it 👍
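
The three steps confirmed in this thread can be sketched in plain Python. This is only an in-memory illustration under stated assumptions: the tiny 3-dimensional vectors stand in for real embeddings (step 1, the call to the embedding API, is not shown), the similarity search runs in Python rather than in pgvector, and the `top_matches` and `build_prompt` helpers and the prompt wording are made up. The 0.78 threshold is the value mentioned elsewhere in these comments.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_matches(query_embedding, sections, threshold=0.78, limit=3):
    """Step 2: rank stored sections by similarity and keep the best ones."""
    scored = [
        (cosine_similarity(query_embedding, embedding), text)
        for text, embedding in sections
    ]
    scored = [pair for pair in scored if pair[0] >= threshold]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:limit]]


def build_prompt(question, context_sections):
    """Step 3: inject the matched sections into the completion prompt."""
    context = "\n---\n".join(context_sections)
    return f"Context sections:\n{context}\n\nQuestion: {question}\nAnswer:"


# Toy knowledge base: (section text, pre-computed embedding). Values are made up.
sections = [
    ("Enable pgvector with the vector extension.", [0.9, 0.1, 0.0]),
    ("Auth supports email and OAuth logins.", [0.0, 1.0, 0.0]),
    ("Store embeddings in a vector column.", [0.8, 0.2, 0.1]),
]

# Stand-in for step 1: in practice this vector comes from the embedding API.
query_embedding = [1.0, 0.0, 0.0]

matches = top_matches(query_embedding, sections)
prompt = build_prompt("How do I store embeddings?", matches)
print(prompt)
```

With these toy vectors the auth section falls below the threshold and is left out of the prompt, which is the whole point of the similarity step.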

  • @joshmadrid5253
    @joshmadrid5253 1 year ago

    this would be amazing for obsidian note taking application

  • @RabbitHoleSyndrome

    @RabbitHoleSyndrome

    1 year ago

    Great idea!
