How to build a real-time AI assistant (with voice and vision)

Science and Technology

This is a new version of my AI assistant, this time built with LiveKit (livekit.io). This is the same platform OpenAI used to build their ChatGPT assistant.
The source code of my example is here: github.com/svpino/livekit-ass....
I teach a live, interactive program that'll help you build production-ready Machine Learning systems from the ground up. Check it out here:
www.ml.school
To keep up with my content:
• Twitter/X: / svpino
• LinkedIn: / svpino
🔔 Subscribe for more stories: / @underfitted

Comments: 47

@toddroloff93 13 days ago
Incredible video. You're taking your content to the next level. Keep up the good work, and thank you for all you do.
@sumitdevraye9725 13 days ago
Great video. Keep these coming.
@davieslacker 10 days ago
Really cool stuff... I def plan to recreate some of these things along with you when I have a bit more time at my computer. Just a thought: adding screen capture to this would be pretty cool too, to get help with whatever application you're in. I would imagine you could include both camera and screenshot images in the same context and it should be able to distinguish which one you're asking about, or you could build a different tool that it can call as a function for that. Can't wait until we get some slightly more expressive voices as an option, like OpenAI teased us with.
@riemannderakhshan1037 13 days ago
You've taken your videos to the next level, which is pretty amazing. If possible, could you show us how to use open-source models in these apps? Thank you in advance.
@moacirosa 7 days ago
Amazing content with solid explanation. Thanks very much 👏
@underfitted
5 days ago
Glad you liked it!
@jimmywang6177 13 days ago
very interesting! thank you!
@jameszhang2832 2 days ago
Fantastic, thank you very much. How would you adapt your code if you have multiple participants?
@insitegd7483 8 days ago
Thank you, it is very interesting.
@7BlackJack8 13 days ago
Can this be used with Google Flash? Thanks for the super content! ❤
@user-yh2uz6fd7l 3 days ago
Great info, thx! Is there a way to use a local LLM (like Ollama, local AI, etc.) on this platform instead of OpenAI?
@sr.modanez 10 days ago
Thank you, fantastic video 👏👏👏👏👏👏👏👏👏
@edgarl.mardal8256 11 days ago
Hi, I am working on creating a closed LAN using peer-to-peer connections, and I will add a live AI agent, stored locally, that gets its knowledge from an LLM. Is it possible to run this kind of system without using the internet?
@rithikkumar7683 13 days ago
I hope we can use Gemini 1.5 Pro? I will try to make these changes in the old code.
@rxWar 14 days ago
Nice, man, thanks
@andriusem 8 days ago
Hi, great video! How can I change the source code so that it captures my screen (desktop)? Thanks.
@danieladama8105 14 days ago
This is great!
@jock21341 10 days ago
Sir, can you help me? My assistant isn't talking back and nothing is happening, but it's recognising what I'm saying in the chat.
@aaronwenniger7966 7 days ago
Now I keep running into trouble when using this code. I would love to be able to discuss this so I can get it fixed; I want to implement some features to see if it can work for something else too.
@AI_by_AI_007
7 days ago
Yes the API keys do not pass -- what are you experiencing?
@aaronwenniger7966
7 days ago
@AI_by_AI_007 Hi, yes. I had to rework the code a little bit to get everything working again. Now it's working great, except that the AI's voice is not working and I cannot give voice commands anymore.
@jarvisperaudon
6 days ago
@aaronwenniger7966 What did you do for the LiveKit API key?
@jarvisperaudon
6 days ago
What about the LiveKit API key?
@aaronwenniger7966
6 days ago
@jarvisperaudon ?
@dmitrypehovski 11 days ago
Hi, I started testing by following all your steps and got stuck: the text and audio from the OpenAI API are not transferred to LiveKit. All requests go through in the terminal. I tried many solutions, but nothing works.
@densonsmith2
11 days ago
I think I may have a similar issue on Windows; there is some problem with the ffmpeg library.
@lets-makeiteasy 10 days ago
I cannot code, so can you make a tutorial using Phi-3, which is free and has vision, and also use Whisper to convert text to speech, plus other free tools, to bring the cost down to zero? I am a student trying out this stuff and I don't want to pay, or don't have the money to pay, for the API or other things, so please make a tutorial using only free and open-source tools.
@boooosh2007 12 days ago
Is this functionally any different than your previous video?
@underfitted
12 days ago
While they work the same for the demo, my previous code is very brittle. This one is much better because I’m using an entire existing infrastructure to support it.
@reynoldoramas3138 9 days ago
Hi Santiago, greetings from Cuba. I just saw on your GitHub profile that you are a fellow countryman. Your content is very valuable; I'm an AI engineer trying to get ahead in this world. I would love to get in touch with you and help you with a project.
@jeff_holmes 8 days ago
Curious about the latency. I noticed that you cut the video after each question (after 19:55), so I am assuming it was a few seconds?
@underfitted
8 days ago
It wasn’t bad, but GPT-4o is not as fast as it could be, so you definitely have to wait a second or so for an answer.
@vesalaasanen2158
1 day ago
@underfitted It would be nice to include at least one answer in real time so we get a more realistic picture of it.
@densonsmith2 10 days ago
Has anyone gotten this to work on Windows?
@jarvisperaudon 8 days ago
Hey, I have an issue with the LiveKit API key: it's giving me an error saying it's invalid.
@AI_by_AI_007
7 days ago
Me as well -- are you on Windows or Mac when you try this?
@jarvisperaudon
7 days ago
@AI_by_AI_007 Windows
@jarvisperaudon
7 days ago
@AI_by_AI_007 Windows
@sharplcdtv198 2 days ago
Your code generally doesn't run in VS Code on Windows... some things seem platform-dependent, unfortunately.
@underfitted
2 days ago
I don’t think it’s a problem with my code… it’s a problem with Windows. Try WSL.
@aidanthompson5053 11 days ago
2:38