ML Collective (MLC) is an independent nonprofit organization with a mission to make research opportunities accessible and free by supporting open collaboration in machine learning (ML) research.
Subscribe to our channel to support our efforts and follow our events, especially the "Request for Plot" events, where people pitch their project ideas and recruit collaborators!
More on our website: mlcollective.org/
Banner Image: "A Violet and Light Pink Tapestry representing the Collective Researcher Brain. Tessellation by M.C. Escher", generated by Nicholas Bardy.
Comments
Love this and want to get involved, but it looks like the Discord link is broken!
Loved the guy's explanation of DNA
Quick note: @3:10 when I discuss the step size stability threshold, I mistakenly say that the maximum stable step size is 2/η. I meant to say 2/sharpness! Equivalently, if the step size is fixed at η then the stability requirement is sharpness <= 2/η.
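To make the corrected claim concrete, here is a minimal sketch (a hypothetical 1-D quadratic example, not from the talk): for a loss f(w) = (s/2)w², the sharpness (second derivative) is s, and gradient descent with step size η is stable exactly when η < 2/s.

```python
# Gradient descent on f(w) = (s / 2) * w**2, whose sharpness (second
# derivative of the loss) is s. The update w <- w - eta * s * w
# multiplies w by (1 - eta * s), whose magnitude is below 1 exactly
# when eta < 2 / sharpness -- the stability threshold from the talk.

def run_gd(eta, sharpness, steps=100, w0=1.0):
    w = w0
    for _ in range(steps):
        w -= eta * sharpness * w  # gradient of f at w is sharpness * w
    return abs(w)

sharpness = 10.0
threshold = 2.0 / sharpness  # maximum stable step size: 2 / sharpness

print(run_gd(eta=0.9 * threshold, sharpness=sharpness) < 1e-3)  # True: converges
print(run_gd(eta=1.1 * threshold, sharpness=sharpness) > 1e3)   # True: diverges
```

Just below the threshold the iterates oscillate but shrink; just above it they oscillate and blow up, matching the sharpness <= 2/η requirement.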
Discovering this channel is a source of joy for me as I delve into the fundamentals and connect with a supportive community that will offer insights for my projects. It feels like a dream come true!
Please share the slides.
You should provide a GitHub link for this work.
How can I join these meetups?
This video is really helpful. I'm looking forward to more videos.
I want to know about Aditya's educational background. What did he study, and where?
Undergrad at NYU (CAS/Courant School)
@@dfmrrd Source? I didn't find any of his social media except Instagram, and he's not even on LinkedIn, I guess.
At a startup, would a generalist have greater value?
These are great insights.
Thank you for sharing this! One's personal schedule can often make opportunities like this slip out of reach. Having it made available as a recording is most appreciated. I would suggest the viewer also check out Keerthana's website link above. A person to watch and someone who will go very far indeed! :)
Thanks for posting these as videos
Very frank and insightful talk. I wish all top industry performers analyzed themselves in public like this. Thank you!
Interesting work, and helpful for democratizing large RL models!
Great talk! One point: the argument for why lambda appears to be 0.5 doesn't seem right. Because these cases are chosen with random seeds, all you can expect is that the distribution of lambda peaks at 0.5 (over lots and lots of seeds); it doesn't follow by symmetry that it would be exactly 0.5. That seems to warrant an explanation.
This was a great talk! I missed the live talk. Thanks for recording this one.
Excellent
Thank you
Great video!
47:00
Amazing discussion
It was wonderful to present our work in this workshop. Keep up the great work!
Is the book available for free?
It is not. We had limited-time access to drafts for the purposes of the reading group. The link to preorder is here: twitter.com/chipro/status/1526049559540944897?s=20&t=MC7VnVXF0evyvIwDdK0kbA
Support it! I believe your work is meaningful.
Nice to see ML Collective has a YouTube channel. Didn't watch the whole vid, but I know Rosanne is top notch from Twitter :)
Followup on the "Overfitting a Single Batch" discussion from 31:49 -- I did some experiments to follow up on my claim about Transformers not being able to overfit single batches, and I actually want to weaken it a lot. I spent some time with HF Transformers and have been able to get them to consistently overfit single batches on simple tasks like sequence classification. The other transformer problem I was working on had a more difficult task -- image-to-text -- and the implementation was not as well tested.

Results are here: wandb.ai/cfrye59/hf-transformers-overfit-glue-mrpc/sweeps/soi1gyw5?workspace=user-cfrye59

Code is here: colab.research.google.com/drive/1pAWd6MsY4yJrjoqknIbPGxW0usiTTAOJ?usp=sharing

The issues with the initialization, normalization, and gradient stability of the Transformer architectures are real. I've seen them in real-world models, e.g. from BigScience @ HF (huggingface.co/bigscience/tr11-176B-ml-logs/tensorboard) and in DALL-E mini from Boris Dayma (twitter.com/charles_irl/status/1506487785783365633?s=20&t=qcNiNoQ9OF6uJmmqFx20SQ). They may still be related to the failure of the other model+task combo, but they're not as bad as I thought.
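For readers unfamiliar with the "overfit a single batch" sanity check discussed here: a correctly wired model and training loop should be able to drive the training loss on one fixed batch to near zero. A minimal sketch of the idea follows, using a toy linear model fit by hand-derived gradient descent (this is an illustrative stand-in, not the HF Transformers setup from the links above):

```python
# "Overfit a single batch" sanity check: a correctly implemented model
# and optimizer should drive the loss on one fixed batch to ~0.
# Toy setup: fit y = 3x + 1 on a single batch with gradient descent.

batch_x = [0.0, 1.0, 2.0, 3.0]
batch_y = [1.0, 4.0, 7.0, 10.0]  # exactly 3x + 1, so zero loss is reachable

w, b, lr = 0.0, 0.0, 0.05
n = len(batch_x)

def mse(w, b):
    return sum((w * x + b - y) ** 2 for x, y in zip(batch_x, batch_y)) / n

for step in range(2000):
    # Analytic gradients of the mean-squared error on this one batch.
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(batch_x, batch_y)) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(batch_x, batch_y)) / n
    w -= lr * grad_w
    b -= lr * grad_b

print(mse(w, b) < 1e-6)  # True: the single batch has been overfit
```

If a loop like this cannot push the loss toward zero, the bug is almost certainly in the model, loss, or optimizer wiring rather than in the data or the task.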
Will Session 4 be uploaded? Or do you leave that chapter to the participants :)
Actually, Session 4 covers Chapter 5! The book is still being edited, and the numbering of the chapters changed mid-stream. So the next session is this one: kzread.info/dash/bejne/k5iTqtyIldjdoNI.html
@@charles_irl Thank you Charles, you're my best teacher ever in ML. 🔥
Your glasses remind me of adversarial attacks on images. But they're really colorful and nice, @Charles.
Bummed I missed this one. I’ll have to come do a quick share on progress
The area of the circular cross section perpendicular to the white-pole/black-pole axis shrinks as you get closer to the poles. This means you have fewer shades to choose from. Isn't this invalid? Shouldn't the number of shades remain the same?
Good initiative, keep up the great work.
This is one genuine talk.
Regarding a minor point around the 8:45 mark -- I don't think conference paper decisions are *that* correlated. Sure, strong papers get in and terrible papers get rejected. But for mid-tier papers, re-submitting to different conferences is an action based on the belief that the reviewing processes are more independent (in a probabilistic sense) than correlated. Otherwise, if the reviewing processes were extremely correlated, a rejection from one conference would be enough evidence that you shouldn't submit anywhere else.
Being open about personal experiences and vulnerabilities is still much too rare in tech. Thank you, Rosanne.
Hearing one of the ML community's rockstars share such an honest perspective on the struggles we likely all recognize is refreshing and motivating. Thank you for sharing this!!
i’m new to her work and need a bit of context - what are you referencing when saying she is a rockstar? (ie what must we know about her?)
With due respect, I don't buy the generalist argument for hiring. Aren't there already so many people who know a little about everything (like RL, vision, gradient descent, conv nets, etc.)? Even a fresh graduate who has worked on ML should know a bit about these. Isn't it that, as a research community, we want to understand why deep learning works at a fundamental level rather than treating it as a black box, and that is where we need depth more than ever?
I think she meant being a jack of all trades, master of one, BUT with your 'jack' being equivalent to others' 'master'. Also, I agree with your point on the interpretability of AI!
Realistic, open, and brave! Thanks a lot for this brilliant talk.
Nice video, thanks :)
Simply fabulous presentation! I love the thematic connection between the career advice of changing approach to alter outcomes, and the clever tweaking of the model to significantly change its output!
Here fully watching from Jamaica 🇯🇲👍
Incredibly brave and intelligent points to make. I hope it starts a lasting conversation, thanks for starting it.
Fantastic!!! Quite relatable, inspiring, and very helpful. Thanks a lot, Rosanne :)
It is narrow when ... all of them are trying to hire the same kind of people, with the same rigid rubric. Couldn't agree more on this; we call this "involution" (内卷) in Chinese.
I’m glad that you are an extremely petty person because I am just the same. Thanks for bringing up this topic.
Great talk! Your story almost brings tears to my eyes. You must succeed! (一定要成功呀!)
great topic.
Amazing work, everyone!
This is an important message
Thanks Rosanne and Jason!