Weights & Biases helps AI developers build better models faster. Quickly track experiments, version and iterate on datasets, evaluate model performance, reproduce models, and manage your ML workflows end-to-end.
Comments
❤
As mentioned in a previous comment, a significant challenge in applying machine learning to drug discovery projects lies in the scarcity of robust and well-structured data. For instance, a major factor contributing to the failure of drug discovery endeavours is the suboptimal ADMET (absorption, distribution, metabolism, excretion, and toxicity) properties. The landscape could be transformed if we could develop models capable of predicting the outcomes of in vitro assays, allowing us to streamline the selection of well-optimized candidates for pre-clinical trials. However, the publicly available ADMET data is notably deficient in both quality and quantity, leading to the development of models that lack robustness.
Summary: Safety and size. The end.
Also, 22:08, commenting on Lukas' question. Data in the biological world differ from NLP or CV data in various ways; to name a few:

1. In biology, experimental data is only an estimate of the physical ground truth and is often inconsistent, whereas in many other domains the test corpus used for model training basically matches the real world. The intrinsic noise caps how well a model can be evaluated: since the data is not ground truth, there is a gap between model output and reality even if the model is perfect on the test data.

2. The lack of data is real, partly because bio data is expensive. In CV an annotator can label dozens or even a hundred pictures per hour for less than $100, but in the bio world a single row of data can cost $100-$1,000 on average, even over $10k or more for things like protein structure, and takes days or weeks to generate. It also requires a high level of expertise to conduct these experiments, and repeats often need to be done to analyze the intrinsic variance of the data.

3. The formats of bio data are extremely diverse. For an LLM, text is all you need; add voice and moving pictures and you can train SORA. But in biology there are hundreds of tasks — structure, affinity, stability, toxicity... — and each task has many different experiment types.

If you are interested in more about this, my twitter is also NachuanShan. I work at BioMap as a data product manager, building protein language models.
20:27 cyberattackers watching this: "wtf I love ChatGPT now"
Great conversation. Love this topic.
What will be the SQLite of LLMs, with capability for local use? Llama?
Very insightful and informative
How silly is it to red-team a model for bioweapons capabilities when you control the training data? How stupid would you have to be? Isn't it easier to run a search on the data 😂😅
How much did it cost to build, including hardware and engineering costs?
vin diesel!
There's a universe where Joseph Spisak is Mark Zuckerberg's brother. Oh, and nice presentation. Wonderful work they are doing at Meta AI.
My favorite fact from this is that the smarter the model, the more it violates rules. Just like us :)
Very true! People who are way smarter about tax law are the ones who violate it the most, while innocent people pay more than they are supposed to, etc. Same goes with many other laws.
Or, the rules it uses instead of the rules we assumed are different.
Congratulations!
Can we use it for requirement classification?
I’m glad they saw how useless they made codellama 😂, it was waaaay overly aligned
Thanks for this W&B
Our pleasure!
So all those supervisor/safeguard models are only utilized during training? I mean, once the weights of Llama 3 are out, there is no safeguard network between the user and the inference engine, right?
I'm sure they have a safety model that tries to review every request and catch some negative responses.
a few hours go by...llama 3 no longer SOTA
That's why they open-source it. They let the community figure things out and iterate. For Meta the LLM is just a tool and not a product in itself.
Wait what is sota now?
@@SkepticButOptimist "state of the art"
Which model is sota?
I think it's supposed to be either Phi or SenseNova, neither of which are released @@JeiShian
I really enjoyed this. Thanks
Glad you enjoyed it!
I think he could have said "ridiculous" a bit more often
When I run detect.py --source "url" --weights best.pt I get this error: File "C:\Users\derp\AppData\Local\Programs\Python\Python39\lib\pathlib.py", line 1084, in __new__, raise NotImplementedError("cannot instantiate %r on your system") — NotImplementedError: cannot instantiate 'PosixPath' on your system
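A common workaround for this error (a sketch, assuming the checkpoint was saved on Linux and is being loaded on Windows, so unpickling it produces PosixPath objects that Windows pathlib refuses to instantiate) is to alias PosixPath to WindowsPath before the weights are loaded:

```python
import pathlib
import platform

# Hedged workaround, not an official fix: on Windows, a checkpoint pickled on
# Linux can contain PosixPath objects, which raise NotImplementedError when
# instantiated. Aliasing PosixPath to WindowsPath before loading the weights
# (e.g. before torch.load() / running detect.py) usually sidesteps this.
if platform.system() == "Windows":
    pathlib.PosixPath = pathlib.WindowsPath
```

Apply the patch at the top of the script (or in a wrapper) before any weights are deserialized; on non-Windows systems it is a no-op.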
I think you might have said it the other way around. If I want to apply transformation A to the input and then transformation B, the composition should be A x B. But in your video you say "apply transformation A to a vector after applying transformation B". kzread.info/dash/bejne/dJ-mzM-rn8q2Z5M.html

Let's see an example: an input matrix X with shape [4,3], a matrix A with shape [3,2], and a matrix B with shape [2,3]. If I apply X x A and get output C, then C x B has final shape [4,3]. To achieve this with matrix composition you have to do A x B, which gives you shape [3,3]. Maybe your intention is the same as above, but you said "apply transformation A after applying B" when you meant "apply transformation B after applying transformation A" and represent it as A x B.

Update: I think I understand how things are done in machine learning. When someone says multiply an input I by a matrix M, it's actually M x I, not I x M. That is why everything is multiplied right to left and we transpose the row vector into a column vector. From what I found on the web, the reason we use a column vector rather than a row vector is historical — similar to why array indexes start from 0 rather than 1. But it's definitely not intuitive. Rather than transposing a row vector into a column vector and doing M x I, it is easier to just use row vectors and do I x M.
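The shape argument in the comment above can be checked directly. A small NumPy sketch (using the commenter's assumed shapes X:[4,3], A:[3,2], B:[2,3], row-vector convention): applying A and then B step by step equals applying the single composed map A @ B, because matrix multiplication is associative:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))   # batch of 4 row vectors
A = rng.normal(size=(3, 2))   # first transformation
B = rng.normal(size=(2, 3))   # second transformation

step_by_step = (X @ A) @ B    # apply A, then apply B to the result
composed = X @ (A @ B)        # A @ B has shape (3, 3): the composed map

# Associativity: both orders of evaluation give the same (4, 3) result.
assert composed.shape == (4, 3)
assert np.allclose(step_by_step, composed)
```

With the column-vector convention the same composition reads B @ A @ x, which is why "apply A, then B" is written right to left.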
nice and informative video
Make the chat bubbles for each person talking different colors to mimic a text conversation- it’s subtle but easier to follow
Yes, that's how most of bio brains work
Polly, you should stop crackling your voice. It is very annoying and seriously detracts from the otherwise interesting content.
That helps you understand the same book from different perspectives.
Solid! Can see and feel your passion through the screen, bro. Excited to go through this playlist. I just got hired as a junior data scientist but struggle with the math portion of machine learning, especially linear algebra and calculus.
Fantastic!
👀
raise NotImplementedError("cannot instantiate %r on your system" NotImplementedError: cannot instantiate 'PosixPath' on your system
Great tutorial, but I want to find multi-agent independent PPO implemented on a custom-made scenario. Do you know where I can learn that?
Is bluesky a vector database? Also Grok AI has strange JAX vector weights? When Twitter is a graph database?
Exciting!!
ERROR: Could not install packages due to an OSError: [Errno 2] No such file or directory. I went into the python/Scripts folder and launched cmd from there.
Why do you need to modify Adam’s epsilon? What does that do?
My NN is not learning even though I have the optimize step in my def train(model, config). Does someone have the same problem?
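A frequent cause of "not learning" despite an optimizer step is a missing loss.backward() or optimizer.zero_grad(). A minimal PyTorch sketch of the full step order (the model, data, and hyperparameters here are illustrative, not from the commenter's code):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

x, y = torch.randn(32, 10), torch.randn(32, 1)
initial_loss = loss_fn(model(x), y).item()

for _ in range(100):
    optimizer.zero_grad()            # clear gradients from the previous step
    loss = loss_fn(model(x), y)      # forward pass
    loss.backward()                  # compute gradients
    optimizer.step()                 # update weights using those gradients

assert loss.item() < initial_loss   # the loss should decrease
```

If optimizer.step() runs but backward() never does, the gradients stay None/zero and the weights never move, which looks exactly like a model that "is not learning".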
I understand now
*recurrent
I'd like to detect the boundaries of a race track instead of objects — how do I do that? My goal is to remote-control a racing video game by looking at the video feed of a camera, or PS Remote Play in a PC window. Surely it could detect other cars and avoid them, but it will go straight off track at turn 1 if it doesn't know the layout or can't see the boundaries. Can you help?
Good book, but the author really phoned it in on page 800, totally unrealistic
Really impressed to hear the thoughts
How do you guys make sure that there are no hidden big risks within the models?
Well, here's my experience listening to this: I gave up after 30 minutes. I learned nothing. It's frustratingly vague. Saying "we use ML" on an ML podcast is useless! The audience is here for technical details. If you can't share, then why are you here? (Sorry, I'm mad because I'm frustrated at the answers I'm hearing.)
Excellent interview. A feet-on-the-ground view of making AI really work in a healthcare setting. Love how Mayo takes the time to explore something like GenAI and define where it can best be of value. This approach strikes the right balance between being visionary and being cautious. Good to see Mayo's collaborations expand worldwide — that doesn't just bring more data together, but hopefully replicates and scales that mindset too.
Looking at healthcare, we should be aware that at some point in our lives we will be patients. If AI augments doctors' decisions, I want to be treated by doctor + AI teams.