Data Scientist | PhD, Physics
--
I got my PhD from UT Dallas, where I applied data science techniques to human health and performance research.
During grad school, I began freelancing with my data science and research skills on Upwork and beyond.
After graduating, I worked as a data scientist for 1 year at Toyota Financial Services.
In July 2023, I left my DS role to pursue entrepreneurship full-time.
And now I am here.
Comments
In the Mapper algorithm, I am not sure what projecting, covering, and clustering the pre-image have to do with each other. Basically, I'm wondering: couldn't the very first step be some clustering algorithm, so that you can immediately make a graph from that? I'm just not sure what projecting onto lower dimensions, covering, etc. have to do with the eventual clustering step.
When someone says "high dimensional data", like the "500 dimensional data set" you gave as an example, what do they mean exactly? Does it mean that each data point has 500 components or features?
Thank you for the valuable content - clear, concise
thank you
you really deserve a subscribe
Tysm!
Bro got a degree in Bell curve
😂😂
Is it feasible to train 7b qlora model on cpu only?
While I haven’t tried this, you can definitely do it. The key is to have at least 16GB of RAM (and some patience). Not sure how long it would take. This post might be helpful: www.reddit.com/r/LocalLLaMA/s/uqTTi6CLnS
you are amazing man !!!
I recommend acquiring data engineering and business analyst skills while you are learning data science. Usually, you need all of these roles to deploy the model, deliver a presentation for business stakeholders, and maintain the model in production.
There’s definitely overlap with the data engineering skillset. Also, communication is critical!!
How does one even become one?
This might be helpful :) kzread.info/dash/bejne/jGp3z9ePdcydgbw.html
@ShawhinTalebi TY. Haven't watched it yet, but how many hours would it take to be decent, and I'd still need certification, right? I'm in my late 30s and still don't know what to do. I dropped out of a CS course and job-hopped for a while before just giving up. I'm actually even contemplating learning a trade, specifically welding. 😅
What about PowerBI?
I’m sure there are data scientists out there who use it, but the vast majority of data science work is done using Python (or R)
Is this the same as a data analyst?
There are probably a lot of similarities, but in general data scientists work with more cutting-edge questions and datasets, and use machine learning to produce statistics and predictive models. Analysts work in business, generating business information reports. A data scientist requires a much deeper knowledge of coding, AI, modeling, and science.
You can improve your skill set to become a data scientist, starting as a data analyst. I used to be an academic scientist focused on statistics, economics, and programming. I still needed to improve my skill set to become a data scientist.
While job descriptions can vary from company to company, in my experience a data analyst typically differs in the following key ways: 1) they may not need to use Python or R, as most of their analysis is done via tools like Excel, Tableau, and PowerBI; 2) they typically don't build machine learning models. Hope that helps!
Hey man. Love the content. The post mortem was super informative and engaging. Your subscribers are on this journey with you. Need to chat again soon.
Tony Stark's part hahahahhahahah
Hey Shaw, thanks for this wonderful series. I have completed it and learned so many new things, but the code is very high level, and it feels like I have to memorize most of it when practicing with those Hugging Face models. Do you have any suggestions for that?
I think the best way to solidify your understanding is to apply it to real-world use cases.
Very informative video. Thanks for sharing
I want to ask: if someone has never worked in the data science field before, can they add other jobs that are not related to data science to the portfolio they are creating?
Good question! It depends on the work history. I'd be hesitant to put much irrelevant work on a portfolio for the same reason you wouldn't include it on a resume. I talk more about that here: kzread.info/dash/bejne/maRmm8GJY8LSl9I.html
VERY HELPFUL THANK YOU SO MUCH
@Shaw Talebi - Thanks for creating such an amazing video. It helped so much. I have a question: do you provide AI or LLM training? Perhaps a boot camp?
Glad it was helpful! While I'm currently only doing trainings for orgs, I'm putting one together for individuals. I can keep you posted if you are interested. Feel free to email me here: www.shawhintalebi.com/contact
Excellent! @Shaw Talebi - Done! I will see you soon. Thanks :)
Thanks!!!
Loving the info! Do you have a video on self-supervised training? I want to train an LLM to write in my style.
Not yet! But I think I'll do this for my next video because of your comment :)
@ShawhinTalebi AWESOME!
super helpful
Fascinating video on applying persistent homology to market analysis! I noticed two potential data leakage issues that could affect the results: In the initial data preparation (around line 29), the log-returns are calculated using future prices: r = np.log(np.divide(P[1:],P[:len(P)-1])). This means each return uses the next day's price, which wouldn't be known at the time. In the Wasserstein distance calculation loop (around line 46), the second time window r[i+w+1:i+(2*w)+1] uses future data that wouldn't be available at the prediction time. To fix these, you could calculate returns using only past data and adjust the second window to r[i+1:i+w+1]. These changes would ensure the analysis only uses information available at each point in time. Great work exploring these advanced techniques! Looking forward to seeing more.
Thanks for the notes! I'll need to revisit this work to confirm there are no leaks as well as apply it to other time series.
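For anyone checking the indexing discussed above, here is a minimal sketch that makes explicit which prices each slice of returns depends on. It assumes only the two expressions quoted in the comment; the toy prices, the helper `last_price_index`, and the values of `w` and `i` are hypothetical and are not taken from the video's code.

```python
import math

# Hypothetical toy price series (not data from the video).
P = [100.0, 101.0, 99.0, 102.0, 103.0, 101.0, 104.0, 105.0]

# Same values as np.log(np.divide(P[1:], P[:len(P)-1])):
# r[k] = log(P[k+1] / P[k]), so r has one fewer element than P.
r = [math.log(P[k + 1] / P[k]) for k in range(len(P) - 1)]


def last_price_index(start, stop):
    """Largest index into P needed to compute the slice r[start:stop]."""
    # The last element of the slice is r[stop-1], which needs P[stop-1] and P[stop].
    return stop


w, i = 2, 0  # assumed window length and loop position, for illustration only
quoted_window = (i + w + 1, i + 2 * w + 1)   # r[i+w+1 : i+(2*w)+1], as quoted
proposed_window = (i + 1, i + w + 1)         # r[i+1 : i+w+1], as proposed

print("returns:", len(r))
print("quoted window needs prices up to index", last_price_index(*quoted_window))
print("proposed window needs prices up to index", last_price_index(*proposed_window))
```

Whether a given window counts as leakage depends on which price index is treated as "now" at step `i`, so the printout is meant as a checking aid rather than a verdict.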
super interesting
you can make more money as a Data Analyst or Data Engineer than a Data Scientist
Based on convos with other creator/consultants, I think you might be right 😅
Thanks Shaw! I always look forward to your videos
This video’s title is misleading
I'm sorry. The moral of the story was building an LLM from scratch is not feasible or desirable for most people. However, that's not to say one can't build a smaller language model from scratch (<100M parameters). This of course would likely be an academic exercise given most use cases can be solved via readily available models.
Hey Shaw, appreciate your videos!
Where can we viewers get the slides for the lecture?
Slides are available on the GitHub: github.com/ShawhinT/KZread-Blog/tree/main/LLMs/_slides
Thank you for your tutorial!
Wow, this is all Greek to me
Hi Shaw! I heard that you may need to use code/html to ensure that email signatures are compatible when sending to different email platforms (gmail/outlook/etc.). Any tips for this? Have you had any issues with people viewing your email signatures after you created your design?
I haven't run into any issues like that, but I'm sure using HTML would lead to better formatting across platforms.
Wow your explanations are so tight. Not too dense and not fluffed at all. I followed along and was engaged throughout. Wonderful teaching style and I'm glad I found your channel ♥
Glad it was helpful!
Great content. Fascinating.
Don't fall for these stupid ideas. This is good for a YouTube video, but you will never be able to train it to a level where someone will pay "MONEY" for your trained model.
Model as a service (MAAS) is a really interesting business model these days. I think of it like current data vendors, where the opportunity isn't necessarily sophisticated tech, but having the right data for the right use case. One I heard of last year was a small real estate data vendor doing about 1M/month with a 10 person team. I think a similar thing can be possible for MAAS.
Super helpful video, thanks Shawhin! Have you ever tried to build out this portfolio with multiple pages? I was thinking about creating a separate link for each of my projects, as I have a lot of pictures and things to say, and I'm curious if you've had experience with it.
I haven't, but you can add additional markdown files to the repo to add pages. See an example here: github.com/pages-themes/minimal
I used RAG for my life-coach YouTuber AI character on Telegram. I prepared a lot of AI-processed data about him, split into fine-grained segments. The prompt itself is also crafted according to the latest Claude prompt discovery. The most amazing AI experience for me yet - JulienHimselfBot
Hey YouTube algorithm, I loved this video. Suggest more like it to me.
Here's the series playlist: kzread.info/head/PLz-ep5RbHosU2hnz5ejezwaYpdMutMVB0
I loved the video
Thank you so much for the video! Just one question: did you use free Colab or Colab Pro, or did you pay for the GPU? Thank you so much!
I have paid, but I used the T4 GPU, which is part of the free version. The only difference is that the free version is subject to access restrictions during peak usage times.
@ShawhinTalebi Thank you so much!!
That Bayes probability at 10:26 (sum over z): shouldn't P(Z) on the right-hand side be a conditional probability as well?
There we are essentially pulling Z out of the joint PDF by summing over it. The math is broken down further on page 14 of reference 1: www.degruyter.com/document/doi/10.2202/1557-4679.1203/html
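For readers following this exchange, the two identities that are easy to confuse can be written side by side. Which of them appears at 10:26 is an assumption here, and the symbols x, y, z are generic placeholders rather than the video's notation.

```latex
% Law of total probability (purely observational conditioning):
P(y \mid x) = \sum_{z} P(y \mid x, z)\, P(z \mid x)

% Backdoor adjustment (intervening on X; valid when Z satisfies the backdoor criterion):
P(y \mid do(x)) = \sum_{z} P(y \mid x, z)\, P(z)
```

In the second identity Z is weighted by its marginal P(z) rather than P(z | x), because intervening on X removes the dependence of X on Z.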
Young people who want to strike out independently should be thinking about products and platforms, not consulting services. After 10+ years consulting, I'd say contacts are everything. You need to work AS a contractor for years, delivering well, and moving regularly to build a network of other good contractors. The consulting model is sell the service, staff it, making somebody you trust the engagement lead and keep an eye on progress, but also move on to finding the next client. The big firms make a lot of their profit from the margin they get "bodyshopping" talented younger staff not yet with a CV that would let them go solo. They pay them a third of what they charge for them. To get to that scale, you've got to have the cash flow to recruit permanent staff and pay them when they are "benched". It's a big step up, which is why the market is quite polarised between "boutique" firms and global corps.
If only I had seen this comment 12 months ago 😅 Thanks for sharing your insight, great perspective!
I needed to know how parameter efficient finetuning works to finetune a voice encoder for emotion detection task. This video helped me a lot. I used LoRA for it. Thanks ❤
Glad it was helpful!
Like and Subscribe!!! Comment every video!!!
Amazing work! Thanks mate :)
Amazing video!!! I have one question: when I use your code to fine-tune the model on my own dataset, the dataset is so large that reading it causes a memory error (RAM, not GPU memory). What should I do to avoid this issue? Can I read and fine-tune in small batches?
You can try reducing the batch size. Also happy to help troubleshoot via office hours: calendly.com/shawhintalebi/office-hours
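Since the error described above occurs while reading the dataset, one generic option is to read records lazily and hand them on in small batches. The sketch below is not the video's code: `iter_batches` and `fake_records` are hypothetical names, and the generator stands in for a large file read one record at a time.

```python
from itertools import islice


def iter_batches(rows, batch_size):
    """Yield lists of at most batch_size items from any iterable, lazily."""
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def fake_records(n):
    """Stand-in for a large file: produces one record at a time."""
    for k in range(n):
        yield {"id": k, "text": f"example {k}"}


# Only one batch of records is held in memory at any moment.
batch_sizes = [len(batch) for batch in iter_batches(fake_records(10), batch_size=4)]
print(batch_sizes)
```

The Hugging Face `datasets` library also has a streaming mode (`load_dataset(..., streaming=True)`) that may be worth checking for the same purpose.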
Perfectly done! You certainly know better what content is most interesting for your audience, but I would be happy to see more of your videos on narrower niches than LLMs, such as TDA, RL, and so on. You are in the top search results for TDA and Persistent Homology, and I hope to see your videos at the top for subjects such as RL algorithms for complex systems and games with imperfect information. Have a good day!
Great suggestion! I've got RL on my list :)
I think that, given the volume of content pitched to a relatively niche audience, the number of ad blockers, and the variability in ad rates, this is still a reasonable number. Is it something you could live off? Nope. Is it a fair reimbursement for the amount of time invested? Almost definitely not. But it is an expected number. Most channels see slow, steady growth, with the odd blip if something suddenly goes viral, which is less likely to have such a huge impact in a highly technical niche. Even when a video does go viral, sometimes that can be a very delayed hit, with the video "popping off" many months after it was posted. Everything I have read or watched on the topic of YouTube for niche audiences suggests that the real way to get value from videos is to use them as an acquisition tool, funneling people into other, more profitable routes, and as any marketer knows, customer acquisition costs money, or in this case, more specifically, time. The goal is to monetize your audience (without ramming it down their throats) rather than relying on ad revenue. I like your content, and I recommended you to some friends, who may recommend you to others. There is your slow and steady organic growth. Keep it up. The road is long, the road is hard, but worthwhile efforts are rarely easy.
Very helpful! What is the package for the truncated factorization formula? Is there any sample code, like in your other videos? Thanks for sharing all of this!
Check out DoWhy: github.com/py-why/dowhy
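Until you look at that library, the formula itself can be tried by hand on a tiny discrete example. The sketch below does not use DoWhy's API: the network (Z -> X, Z -> Y, X -> Y), all probabilities, and the function names are made up for illustration. Truncated factorization drops the factor P(x | z) for the intervened variable and keeps the rest.

```python
# Hypothetical binary network: Z -> X, Z -> Y, X -> Y (all numbers are made up).
p_z = {0: 0.6, 1: 0.4}                                      # P(Z=z)
p_x_given_z = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}    # P(X=x | Z=z) as [z][x]
p_y1_given_xz = {(0, 0): 0.1, (0, 1): 0.5,
                 (1, 0): 0.4, (1, 1): 0.9}                  # P(Y=1 | X=x, Z=z) as [(x, z)]


def p_y1_do_x(x):
    """P(Y=1 | do(X=x)): omit the factor P(x | z), then sum out z."""
    return sum(p_y1_given_xz[(x, z)] * p_z[z] for z in p_z)


def p_y1_given_x(x):
    """Observational P(Y=1 | X=x), shown for comparison."""
    joint = sum(p_y1_given_xz[(x, z)] * p_x_given_z[z][x] * p_z[z] for z in p_z)
    marginal = sum(p_x_given_z[z][x] * p_z[z] for z in p_z)
    return joint / marginal


print("interventional:", p_y1_do_x(1))
print("observational: ", p_y1_given_x(1))
```

The two numbers differ because Z influences both X and Y in this toy network, which is the situation the formula is meant to handle.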