Welcome to NLP, now.
We’re kickstarting a new chapter in machine learning by giving developers and businesses access to NLP powered by the latest generation of large language models, now.
Our platform can be used to generate or analyze text to do things like write copy, moderate content, classify data and extract information, all at a massive scale.
Comments
Great webinar! I found that a good audience for it is people who already know how to build a RAG-powered tool but want some deeper insights: challenges to look out for, and what works vs. what doesn't. Definitely useful, as this kind of info is sparse out there.
100%
This is not an informative video; do not waste your time watching it.
That's odd. I found it very informative.
this was nice.
I've never heard VAE described with such depth and clarity. Not going to pretend I understood all of the maths, but I nodded along anyhow.
Thank you, Dr. Saquib, for the great talk.
Hi, thank you for the presentation. Can this approach be adapted to cluster sentences or paragraphs/ short ‘documents’? The notebook examples were of word embeddings.
Sure, you can cluster or visualize the sentence embeddings - any vectorized representation, for that matter.
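A minimal sketch of the answer above - clustering sentence-level vectors with off-the-shelf scikit-learn tools. TF-IDF is used here only as a simple stand-in for a learned sentence-embedding model; as the reply notes, any vectorized representation works:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

sentences = [
    "The cat sat on the mat.",
    "A kitten was sleeping on the rug.",
    "Stock markets fell sharply today.",
    "Markets dropped amid inflation fears.",
]

# Any vectorized representation works; TF-IDF is a simple stand-in
# for vectors from a trained sentence-encoder model.
vectors = TfidfVectorizer().fit_transform(sentences)

# Cluster the sentence vectors into two groups.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)
print(labels)
```

With a real sentence-embedding model you would simply swap the TF-IDF step for the model's encode call and cluster the resulting vectors the same way.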
Great job, Surya Krishna and team! I am happy to see the contributions you are making to the Telugu community in terms of high end LLMs, which are popularly dominated by the English datasets. Proud of you!! 💐🙏👏
I can understand word embedding as the process of passing the numerical form of a word through the embedding network to get the word embedding. But is a sentence embedding a combination of word embeddings?
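One simple baseline that matches the question above is exactly that: mean pooling, i.e. averaging the word embeddings of a sentence (trained sentence encoders usually do better, since they account for word order and context). A toy sketch with made-up 3-dimensional word vectors:

```python
import numpy as np

# Toy 3-dimensional word embeddings, made up purely for illustration.
word_vecs = {
    "the": np.array([0.1, 0.0, 0.2]),
    "cat": np.array([0.9, 0.3, 0.1]),
    "sleeps": np.array([0.2, 0.8, 0.4]),
}

def mean_pool(sentence):
    """One simple sentence embedding: the average of the word embeddings."""
    vecs = [word_vecs[w] for w in sentence.lower().split()]
    return np.mean(vecs, axis=0)

emb = mean_pool("The cat sleeps")
print(emb)  # element-wise average of the three word vectors
```

Modern sentence encoders (e.g. transformer-based models) instead produce the sentence vector directly, so the combination is learned rather than a fixed average.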
How do I sign up? I get errors like 'Failed to fetch' when I try.
Happy to watch a video that recalls that a vector is an element of a vector space, love maths :)
Kite is a type of bird ;)
How does one sign up for LLM University?
nice talk, thanks!
This complex topic is explained beautifully; I recently read an article on this that intrigued me. This is the link to the article: ask.wiki/2024/01/09/semantic-search-an-intelligent-way-for-browsing/
I feel honored to have been a part of this project. It made me believe in myself more. Thank you :)
MaashaaAllah MaashaaAllah
Great video!
@cohere, @jpvalois: Excellent video; however, there are some inaccuracies:
- 6:00: instead of "series of transformer blocks", should it be "series of transformers" only (or: attention block and feedforward block)? The description here says "three transformer layers"; the description also says "attention blocks".
- 9:00: the attention and feedforward blocks should run left to right, and the arrows too - this reflects the flow of the data better.
- 9:20: should it be "feedforward layer" instead of only "layer"?
- 9:40: is the first layer not an attention layer? And could an attention layer and a feedforward layer be combined into a transformer layer? (see "transformer blocks" at 6:00)
Good Job Marzieh! Keep up the great work!
Where can I learn all of this BERTopic as a mathematical procedure, not a computational one?
No idea if you're interested in an AI Climate Change Foundation Model, but here goes...

Action required for an AI Climate Change Foundation Model: include ALL DATA from the attached Wildfires paper, one of thousands of relevant papers. We need to interrogate an unbiased database of all relevant data going back at least 10,000 years. We need to know where the water has been and where it's going. We need to know whether 65% of Russia is permafrost, the mathematics of the melt rate (IT'S EXPONENTIAL, NOT LINEAR, BECAUSE IT'S BIOLOGICAL, GOING FROM FLAT TO VERTICAL - and we're next), and where the Russian permafrost meltwater is going. Separate email on Canada's emergency lack of preparedness and palpable incompetence.

Nota bene: the hardware indicated for an AI Climate Change Foundation Model is global in scale - everything from undersea fiber optics to remote weather stations to satellites. Exempli gratia: the permafrost microbiome is biological and multiplies like rabbits... a 2 billion, 4, 8, 16, 32 kind of graph. It gives rise to sudden, catastrophic (biblical) events. 😩 Pairs well with books on bison, papers on ash layers, and near-extinction events in North America. Sadly, our experts lack the requisite expertise and skills; the AI community is indicated. The energy cost of the foundation model and AI hardware is attached. So, to keep costs down, restrict to 10,000 years of relevant climate change data. The Reuters CEO has billions for an AI project, so that's another source of funding.

Nota bene: of hundreds of trillions in investments, much is cash-equivalent and can and does shift in milliseconds away from endangered investments. Kindly verify or rebut everything in this.
Good to see you here Swabha! Long time :)
Does BERTopic need preprocessing like lemmatization, tokenization, and stopword removal?
Let's say two movies have values for Action and Comedy of 0 and 1; their dot product will be 1. Now say another two movies each have values of 0 and 2; their dot product is 4. So we'd conclude the second pair is more similar than the first, but that's not the case in reality. Could you please explain this?
Great stuff as always from Jay
Multilingual AI LLMs trained responsibly enable greater creativity, learning, and collaboration. Scientists can research across borders and cultures, enabling new discoveries. Isolated communities can gain empathy for previously distant cultures. Kudos to the Cohere AI team and all the worldwide collaborators on this human interest project.
this is the best thing that came out of 2023!
First, congratulations on the launch! Second, how well does it perform in translation tasks between Spanish and English, for example?
Thanks Luis! Your explanation hits just the right notes for me: no fluff, not too complex, well structured, logical, good rhythm. Excellent overall. I'll be checking out your other material. Merci beaucoup!
Thank you very much.
Thanks for sending me here Jay
Very wonderful and informative interview. Thanks, Jay and Omar!
you're such a great educator
Scale AI is way better than this
Excellent Video
Thanks - great talk. The problem with MI is its complexity. When each token has a thousand-dimensional vector, and linear transforms act on all of them, it's too complex for our normal human brains.

I would like to use a cut-down language - say just subject, verb, object, in present tense, with no adverbs, conjunctions, or even periods. Then limit the embeddings to, say, 5 dimensions per token instead of hundreds or thousands. I bet the human brain only uses 4 or 5 for each word. Then train the toy model on very, very simple statements to "understand", with as few layers as possible. Then collect neuron data whilst changing the activation function, the softmax function, and more. It's better when the model can only "say" ten or twenty things where it understands the syntactic logic rather than just memorising stuff. THEN observe what the neurons are doing - smaller and less complex. These LLMs are ridiculously complex for observing neurons.

I am trying to make a very, very small language syntax but need some tips. There is probably a simple way of doing it, but I haven't found it yet. Maybe just simple additions - or what else is possible?
Thank you!
There's a limit to overacting, Arjun. Brother, no one's going to give you anything - how much will you keep waving your hands around?
Silly question here, Maarten: can we use BERTopic in R? Any workaround or emulation would be most welcome. TIA.
Host needs to fix the mic at least, if not the whole "vlog" setup.
Thanks to Rosanne Liu. Your story inspires me a lot.
Proud of you, our daughter. May the Almighty bless you in your every step.
Very precise and nicely demonstrated with easy examples. I appreciate such a wonderful explanation!
Hi Jay, thanks for the video. I have been doing something similar, but I have faced a few issues. The first is that KMeans doesn't seem to work so well for this use case, as it requires a predefined number of clusters, and I was struggling to find the optimal number. So I used HDBSCAN, which can produce a dynamic number of clusters, but it doesn't do well on high-dimensional vectors. I ended up first doing dimensionality reduction via UMAP and then running HDBSCAN. It gives somewhat good results, but I also have to play with the hyperparameters to find the best result. Since that video, do you have any learnings around using other clustering algorithms for embeddings?
I wrote that before listening to the Q&A part of the video 😅. The third option you mentioned works quite well: use UMAP to reduce to 15-20 dimensions, run HDBSCAN for the clustering, and finally re-run UMAP to plot. That really unblocked me and led to better results.
I admire Vered Shwartz's work, and she is an inspiration for my pursuit and exploration of this field. Thank you!
Hi Arjun and Sonam.
✨✨✨✨
Does Cohere have developers that business owners can hire or work with to implement an embedding database?