Hi, my name is Sam Witteveen. I have worked with Deep Learning for 9 years and with Transformers and LLMs for 5+ years. I was appointed a Google Developer Expert for Machine Learning in 2017, and I currently work on LLMs and, since early 2023, on Autonomous Agents.
Comments
Can you make a video about LangChain v0.2?
Does this work well with threaded bg process?
We can't use this outside of Google Colab, like public links from Gradio
You can self-host it using the GitHub repo
@@RuturajZadbuke That is the problem: I want to use it from Google Colab
I'm totally new to LangChain and I'm getting error 429: {'message': 'You exceeded your current quota, please check your plan and billing details', 'type': 'insufficient_quota', 'param': None, 'code': 'insufficient_quota'}. Please help me resolve this error.
Nice video. Streamlit and Gradio already have a lot of components. Curious about the decision to use Flask and not FastAPI?
I agree, this is something I thought too. FastAPI would give you API endpoints and Swagger for free, etc.
Streamlit also offers free hosting... would be good to point that out
Can I do this on Colab?
Why are people sleeping on Chainlit?
ChainLit is great, but my sense is that these are different tools for different use cases.
Compared to Gradio, is it simpler? From your video it looks similar...
Is there any way to host it for free and make a small app available to the public at a small scale?
Gradio has that. Basically the app runs locally, but Gradio provides tunnelling to a publicly accessible URL. You need to pass the share=True parameter when launching the app.
Any thoughts on how to use prompting to generate my UI on the fly for my user? I want to have a dynamic UI that is driven by the prompts.
Can you explain a bit more? What do you want to change or update, etc.?
Hi Sam, can you also do file upload, or does it keep stored memory of uploaded files in the backend that the FE can just query? Thanks
I am not sure if they have an upload feature; I will look into it
Awesome!
Creator of Mesop here. Thanks for creating this video! Big fan of your KZread channel so it was awesome to see this 😊
I know a lot of people have wondered why we made another Python UI framework. One of the reasons I didn't mention in the blog post is that it's very difficult to use most open-source projects, especially FE ones, within Google, due to requirements around web security and build integration
@@WillMesop 🔥🛠💎
I don't know if you or Sam are humans or bots. But you two are a treasure. Please keep going. Filthy casual here, just trying to level up and/or avoid obsolescence.
@WillMesop Awesome!! As soon as I saw it I knew I wanted to make a video to help it get some more attention. Very cool to see you chime in here. Thanks for the great work.
Nice bro
Thanks for this, good to see, but there was nothing here really to take me away from Streamlit, despite its occasional frustrations.
Thank you :)
“Often when you are making it for yourself, when you start to use it, perhaps some of the assumptions were totally wrong.” I laughed out loud so hard 😂. That hit home.
So grateful for these carefully crafted walk-throughs, the accompanying notebook, the detailed but concise narratives. Simply fantastic, Sam!
2:26 just got trippy real quick
Great overview, as usual 😎. I agree, the most interesting part is releasing the reward model. Although there is one thing that has been on my mind: did you notice its HumanEval score is low? It's around 73.2, compared to models like Llama 3 at 81 or Qwen 2 at 86. I asked on X (Twitter), and they said the Qwen score is 5-shot while theirs is 0-shot. I checked the Qwen tech report, and it's also 0-shot. I followed up with them about this but got no reply. This is crucial because HumanEval involves coding tasks, and if the instruct model has issues, the reward model might too.
so glad you simplified the presentation format. Much better without the stock video rolls.
I'm getting an error stating "TypeError: Chat.load_models() got an unexpected keyword argument 'compile'". Help
Free LLM to generate synthetic data? It's like a picks and shovels seller giving out free maps so that prospectors will do more mining and buy more picks and shovels.
😀 Well said.
Asinine. To make a good instruction-following dataset you'd need so much more: combined instructions, chain-of-thought, step-back questions, alphabetized lists and god knows what other tricks, breaking conversations into more turns, handling subject changes, and on and on
Welp. I was debating whether or not to get a server or a consumer platform given the upcoming wave of hardware but this tips me pretty heavily to a server platform, lol. I've always found it easier to work with models locally.
Can these models be adapted to a specific dataset for, say, the financial domain? Is there a possibility that they train their models on domain data?
Hey, can you make a tutorial on how to integrate a vectorstore into SalesGPT for e-commerce purposes?
3 A100 for int8 inference
The AI is advancing so fast. Is it open for everyone? What is the hardware requirement for this?
2 nodes with 8xH100 or 8xA100 80 GB. Or 1 node with 8x H200.
Did they also release software for the synthetic dataset generation pipeline? I saw that this will be a part of their NIM, with an explanation in their tech report, but I wasn't sure if they were going to release anything besides that.
Are we back to the -tron naming?
I'm coming from text-generation-webui; how can I use that model folder with Ollama?
Sam, your videos are great! To the point, easy to listen to, and no nonsense. I've noticed that most LLMs start their reply with a repetition of the question, or with words like "Sure, I can answer that...". Is there a way to make them suppress this output? I use Llama 3 (via Ollama) in my home automation, and I generate some text for TTS; it bothers me that it repeats instructions or puts those phrases before the actual text. Any help (also from the chat) is appreciated. :)
Generally this can be achieved via the alignment etc. With the bigger models you can do it via prompting and in-context examples. The challenge is that many orgs are actually doing instruction tuning to get the models to do exactly what you don't want. What model are you using?
@@samwitteveenai I use llama3:8b-instruct-q8_0. My current prompt is "You are a helpful assistant. Please, be brief and concise, the user doesn't like chatty LLMs. Try to be as precise as possible. Always use a step-by-step approach and ponder about the result before replying. No repetition or references to these instructions or apologies either, just the plain reply, please. If you do not have an answer, say so, don't make up wrong answers. Don't use any formatting, you may use UTF-8 smilies when appropriate."
I need to add that sometimes it works as expected, other times it doesn't.
@@PestOnYT My current prompt for Llama 3 is: "You are an AI assistant. You fulfill the user's requests in a neutral and informative manner. Do not be a sycophant to the user. Do not compliment the user. Do not thank the user." I find it gets rid of most of the things I personally do not care for.
@@ringpolitiet Nice. It worked fine for the couple of tests I just did. Thank you!
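For anyone wiring prompts like these into Ollama programmatically, here is a minimal sketch of a call to Ollama's /api/chat endpoint using only the standard library. The model name and prompt wording below are just examples, not the exact ones from the thread:

```python
import json
import urllib.request

# Example system prompt in the spirit of the ones above; adjust to taste.
SYSTEM = (
    "You are an AI assistant. Answer directly, without repeating the "
    "question, without preambles like 'Sure, I can help with that', "
    "and without thanking or complimenting the user."
)

def build_chat_payload(model: str, user_msg: str) -> dict:
    # Request body shape for Ollama's /api/chat endpoint.
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": user_msg},
        ],
        "stream": False,
    }

def chat(user_msg: str, model: str = "llama3:8b-instruct-q8_0") -> str:
    # Assumes a local Ollama server on the default port.
    req = urllib.request.Request(
        "http://localhost:11434/api/chat",
        data=json.dumps(build_chat_payload(model, user_msg)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["message"]["content"]
```

Sending the instructions as a system message (rather than prepending them to every user turn) tends to make smaller instruct models follow them more consistently, though as noted above it is still not 100% reliable.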
Why do they have Sonnet and not Opus in the comparison? Does Opus beat it?
Probably. Though I did notice after recording the video that Nvidia were doing some zero-shot comparisons against few-shot numbers from other companies. I do think it's a decent model, and it's great to have the ability to generate datasets without any legal issues.
In my opinion, Opus beats it, especially for math and logic questions. However, it performs really well for summarization and RAG, which makes sense since Nvidia is focusing a lot on company internal local RAG deployments and has even previously published some Llama finetunes for that.
I have always wondered why a company would go through all the hassle of creating a new model just for the sake of creating it, ending up losing money and time, only for the model to turn out to suck and be way behind similar-sized ones. Then it struck me like lightning: this is NVIDIA, they don't give a damn, they want you to buy/rent GPUs to try their models!
This is Nvidia. They gave you tools to make new and better models. People who want to run the models you make will also need graphics cards (then there are people like me who run 8B models on their smartphones, lol).
Nvidia can certainly make money from people needing lots of GPUs for this. I think they also made it as a way to sell DGX machines which it is made to conveniently fit on. People who are buying a DGX usually want a model that they can run and FT locally for privacy reasons etc.
Releasing open models also makes perfect business sense when it comes to attracting and retaining the top research talent. As Meta knows, top researchers don’t just want a lot of money (which they get) they also want to see their results be used! (That results aren’t always strong is just the nature of research.)
@@samwitteveenai Another reason Nvidia is releasing this, I suspect, is that getting new, accurate training data, whether for fine-tuning or for creating a new model, is becoming more difficult. The large AI companies are creating their own synthetic data, but open-source users are not licensed to use most models to create synthetic data. To keep Nvidia's gravy train running there needs to be more data. An H100 costs as much as a new car, and a DGX with 8x H100, well, that's as much as a small train 🤣
@@jondo7680 The Llama 3 70B model, which is comparable to this model, will actually run on a well-specced MacBook Pro (you can't use it to make synthetic data, though). Maybe Nvidia is a little worried.
Sam, I like your content. How can I contact you about a project related to Vision?
Hey, thanks. Just ping me on LinkedIn. Easier to chat there.
thank you for the update:)
Interesting model
Informative video
So what's actually better about this compared to Whisper?
different kind of model, Whisper is for Speech to Text and this is Text to Speech (TTS)
@@samwitteveenai Oh, I often confuse the abbreviations. So it makes voice sounds.
Yes exactly.
This is really important stuff
It sounds terrible. And clearly they are lying about "10 million hours". Just two Chinese guys trying to rip you off.
It doesn't. Compared to what? For what price?
OMG your point is right on the spot! That's exactly the problem I had to deal with in my project
What version of Chroma DB were you using back then?
Not sure, I think 1 or 2. That was about a year ago.
Thank you, this was very instructive. Can you recommend the best libraries for: 1) sectioning a document based on topic changes, 2) summarizing each section while maintaining contextual continuity and coherence, and 3) combining the summaries into a cohesive final summary? I'm thinking something like transformers (Hugging Face), spaCy, Gensim, pandas?
I have a request. Can you please explain the Customer Support Bot that is an example in the LangGraph documentation? Or could you simplify some of the stuff from that tutorial so LangChain beginners who know agents and tools can follow it? I find the official LangGraph tutorial video on YT extremely lacking.
let me take a look into it.
@@samwitteveenai Thank you so much!
The fact that it allows you to get a random speaker sample and then hold it for later use is very intriguing. It's something I wished for the first time I encountered ElevenLabs and similar platforms like Suno. Additionally, training the model with those extra tokens is another interesting feature that's challenging to achieve in ElevenLabs. Now, I'm curious about the format of the data when you get that random speaker. Is it a tensor? If so, can you perform arithmetic on it? For example, if you have a sample representing a happy speaker, could you add or subtract it to/from other sounds? This could lead to some fascinating applications, similar to how you can manipulate word embeddings (e.g., "king" - "male" + "female" = "queen"). I'll definitely take a closer look at this. Thanks for sharing!
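On the arithmetic question: whether ChatTTS speaker embeddings are as linearly well-behaved as word embeddings is untested here, but if the sampled speaker does come back as a plain vector or tensor, the operations themselves are just element-wise vector maths. A plain-Python sketch (function names hypothetical, not part of any ChatTTS API):

```python
def blend_speakers(spk_a, spk_b, alpha=0.5):
    """Linearly interpolate two speaker-embedding vectors.

    alpha=0.0 returns spk_a, alpha=1.0 returns spk_b; values in
    between mix the two voices, if the embedding space is that linear.
    """
    return [(1 - alpha) * a + alpha * b for a, b in zip(spk_a, spk_b)]

def add_direction(spk, direction, scale=1.0):
    # Push an embedding along a "style" direction, analogous to the
    # king - male + female = queen trick in word-embedding space.
    return [s + scale * d for s, d in zip(spk, direction)]
```

A "happy direction" could in principle be estimated as the mean difference between embeddings of happy-sounding and neutral samples, but that is speculation about the embedding space, not something the model documents.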
I'm using MBROLA for my TTS in my home automation. Though it is outdated, the quality is still the best of the tools I've seen so far. This ChatTTS looks very promising. Which version did you use? Currently it is at 0.0.5, and that doesn't work the way you described it, not even with the code sample shown on HF. The keyword "compile" is not in chat.load_models, and chat.sample_random_speaker doesn't exist either. I've used it with Python 3.11. BTW: would be nice if it could understand Speech Synthesis Markup Language (SSML). If anybody knows a similar TTS that does, drop me a hint please. :)
Any recommendations for Arabic embeddings?
Check out the Cohere multilingual ones, or BGE-M3, which is an open-source multilingual embedding model
Thank you! 😃
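For anyone who wants to try BGE-M3 quickly: the model ID "BAAI/bge-m3" is the one on Hugging Face, and it loads via sentence-transformers; treat the commented usage as an untested sketch. The cosine-similarity helper is plain Python:

```python
import math

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Untested usage sketch (requires sentence-transformers and a model download):
# from sentence_transformers import SentenceTransformer
# model = SentenceTransformer("BAAI/bge-m3")
# embs = model.encode(["مرحبا بالعالم", "Hello world"])
# print(cosine(embs[0], embs[1]))  # Arabic/English pairs should score high
```

Because BGE-M3 is trained multilingually, semantically similar Arabic and English sentences should land close together, which is what makes it usable for Arabic retrieval without a dedicated Arabic-only model.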