Shaw Talebi

3 months ago

Text Embeddings, Classification, and Semantic Search (w/ Python Code)

Comments

@ClumpypooCP 18 hours ago

In the Mapper algorithm, I am not sure what projecting, covering, and clustering the pre-image have to do with each other. Basically, I'm wondering: couldn't the very first step be some clustering algorithm, so that you can immediately make a graph from that? I'm just not sure what projecting onto lower dimensions, covering, etc. have to do with the eventual clustering step.

@ClumpypooCP 18 hours ago

When someone says "high-dimensional data", like the "500-dimensional data set" you gave as an example, what do they mean exactly? Does it mean that each data point has 500 components or features?
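A minimal sketch of the usual meaning, assuming the common convention that a dataset is a table with one row per data point and one column per feature. The sizes and random values below are made up purely for illustration:

```python
import random

# Hypothetical dataset: 1,000 data points, each described by 500 features.
n_points, n_features = 1000, 500
random.seed(0)
data = [[random.random() for _ in range(n_features)] for _ in range(n_points)]

# The "dimension" of the data is the number of components per data point,
# not the number of data points.
point = data[0]
print(len(data))   # number of data points (rows)
print(len(point))  # number of features per point (dimensions)
```

Under this convention, "500-dimensional" refers to the length of each row, so each data point can be read as a single point in a 500-dimensional space.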

@AlusineBarrie 23 hours ago

Thank you for the valuable content - clear, concise

@moitreegraph9 2 days ago

thank you

@zihadedu6328 2 days ago

you really deserve a subscribe

@2goofyskaters 2 days ago

Tysm!

@plant3341 3 days ago

Bro got a degree in Bell curve

@ShawhinTalebi 3 days ago

😂😂

@eduardmart1237 3 days ago

Is it feasible to train a 7B QLoRA model on CPU only?

@ShawhinTalebi 2 days ago

While I haven’t tried this, you can definitely do it. The key is to have at least 16GB of RAM (and some patience). Not sure how long it would take. This post might be helpful: www.reddit.com/r/LocalLLaMA/s/uqTTi6CLnS

@habibtahardjebbar2056 3 days ago

you are amazing man !!!

@michaelprinc 3 days ago

I recommend acquiring the skills of a data engineer and a business analyst while you are learning data science. Usually, you need all of these roles to deploy the model, deliver a presentation to business stakeholders, and maintain the model in production.

@ShawhinTalebi 3 days ago

There’s definitely overlap with the data engineering skillset. Also, communication is critical!!

@TaLeng2023 3 days ago

How does one even become one?

@ShawhinTalebi 3 days ago

This might be helpful :) kzread.info/dash/bejne/jGp3z9ePdcydgbw.html

@TaLeng2023 3 days ago

@@ShawhinTalebi TY. Haven't watched it yet, but how many hours would it take to be decent, and I'd still need certification, right? I'm in my late 30s and still don't know what to do. I dropped out of a CS course and job hopped for a while before just giving up. Am actually even contemplating learning a trade, specifically welding. 😅

@DekarNL 4 days ago

What about PowerBI?

@ShawhinTalebi 3 days ago

I’m sure there are data scientists out there who use it, but the vast majority of data science work is done using Python (or R)

@rainydaybats56 4 days ago

Is this the same as a data analyst?

@DekarNL 4 days ago

There are probably a lot of similarities, but in general data scientists work with more cutting-edge questions and datasets and use machine learning to produce statistics and predictive models. Analysts work in business, generating business information reports. A data scientist requires a much deeper knowledge of coding, AI, modeling, and science.

@michaelprinc 3 days ago

You can improve your skill set to become a data scientist starting as a data analyst. I used to be an academic scientist focused on statistics, economics, and programming. I still needed to improve my skill set to become a data scientist.

@ShawhinTalebi 3 days ago

While job descriptions can vary from company to company, in my experience a data analyst typically has the following key differences: 1) they may not need to use Python or R, as most of their analysis is done via tools like Excel, Tableau, and PowerBI, and 2) they typically don’t build machine learning models. Hope that helps!

@logicwerx 5 days ago

Hey man. Love the content. The post mortem was super informative and engaging. Your subscribers are on this journey with you. Need to chat again soon.

@user-vv7om6uq5w 6 days ago

Tony Stark's part hahahahhahahah

@uzairmalik7084 6 days ago

Hey Shaw, thanks for this wonderful series. I have completed it and learned so many new things, but one thing I felt is that the code is very high level, and it feels like I have to remember most of the things while coding when practicing with those Hugging Face models. Do you have any suggestions for that?

@ShawhinTalebi 6 days ago

I think the best way to solidify your understanding is to apply it to real-world use cases.

@ajeethsuryash5123 7 days ago

Very informative video. Thanks for sharing

@ama19128 7 days ago

I want to ask: if one has never worked in the data science field before, can I add other jobs that are not related to data science to the portfolio I will be creating?

@ShawhinTalebi 6 days ago

Good question! It depends on the work history. I'd be hesitant to put much irrelevant experience on a portfolio for the same reason you wouldn't include it on a resume. I talk more about that here: kzread.info/dash/bejne/maRmm8GJY8LSl9I.html

@johnsantos2360 7 days ago

VERY HELPFUL THANK YOU SO MUCH

@hardcoder-y6w 8 days ago

@Shaw Talebi - Thanks for creating such an amazing video. It helped so much. I have a question: do you provide AI or LLM training? Perhaps a boot camp?

@ShawhinTalebi 6 days ago

Glad it was helpful! While I'm currently only doing trainings for orgs, I'm putting one together for individuals. I can keep you posted if you are interested. Feel free to email me here: www.shawhintalebi.com/contact

@hardcoder-y6w 6 days ago

Excellent! @Shaw Talebi - Done! I will see you soon. Thanks :)

@user-vv7om6uq5w 8 days ago

Thanks!!!

@goinsgroove 8 days ago

Loving the info! Do you have a video on self-supervised training? I want to train an LLM to write in my style.

@ShawhinTalebi 6 days ago

Not yet! But I think I'll do this for my next video because of your comment :)

@goinsgroove 6 days ago

@@ShawhinTalebi AWESOME!

@hugopristauz538 8 days ago

super helpful

@aberobwohl 8 days ago

Fascinating video on applying persistent homology to market analysis! I noticed two potential data leakage issues that could affect the results:
1) In the initial data preparation (around line 29), the log-returns are calculated using future prices: r = np.log(np.divide(P[1:],P[:len(P)-1])). This means each return uses the next day's price, which wouldn't be known at the time.
2) In the Wasserstein distance calculation loop (around line 46), the second time window r[i+w+1:i+(2*w)+1] uses future data that wouldn't be available at the prediction time.
To fix these, you could calculate returns using only past data and adjust the second window to r[i+1:i+w+1]. These changes would ensure the analysis only uses information available at each point in time. Great work exploring these advanced techniques! Looking forward to seeing more.
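To make the indexing in this comment easier to check, here is a small sketch of which prices each return and each window touches. It is not the video's code: the price list is made up, the helper name `second_window_bounds` is hypothetical, and plain Python is used in place of NumPy so the snippet runs on its own. It only shows the index alignment and takes no position on whether the original analysis leaks.

```python
import math

# Hypothetical daily closing prices, used only to illustrate the indexing.
P = [100.0, 101.0, 99.0, 102.0, 103.0, 101.5, 104.0, 105.0]

# Pure-Python equivalent of np.log(np.divide(P[1:], P[:len(P)-1])).
r = [math.log(p_next / p_prev) for p_prev, p_next in zip(P[:-1], P[1:])]

# r[k] pairs P[k] with P[k+1], so it is only known once P[k+1] is observed.
k = 2
assert math.isclose(r[k], math.log(P[k + 1] / P[k]))


def second_window_bounds(i, w):
    """Return [start, stop) for the slice r[i+w+1 : i+(2*w)+1]."""
    return i + w + 1, i + 2 * w + 1


i, w = 0, 3
start, stop = second_window_bounds(i, w)
# The last return in the window is r[stop-1], which depends on P[stop].
latest_price_index = stop
print(start, stop, latest_price_index)
```

Whether a given slice counts as "future" data depends on the time index at which a prediction is assumed to be made, so printing these bounds for the actual `i` and `w` is one way to verify the alignment.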

@ShawhinTalebi 6 days ago

Thanks for the notes! I'll need to revisit this work to confirm there are no leaks as well as apply it to other time series.

@hugopristauz538 9 days ago

super interesting

@DRAI-ow1nq 9 days ago

you can make more money as a Data Analyst or Data Engineer than a Data Scientist

@ShawhinTalebi 6 days ago

Based on convos with other creator/consultants, I think you might be right 😅

@DRAI-ow1nq 9 days ago

Thanks Shaw! I always look forward to your videos

@danielbrockerttravel 10 days ago

This video’s title is misleading

@ShawhinTalebi 6 days ago

I'm sorry. The moral of the story was building an LLM from scratch is not feasible or desirable for most people. However, that's not to say one can't build a smaller language model from scratch (<100M parameters). This of course would likely be an academic exercise given most use cases can be solved via readily available models.

@caileymitchell5325 10 days ago

Hey Shaw, appreciate your videos!

@NiteshKumar-up8ib 10 days ago

Where can we viewers get the slides for the lecture?

@ShawhinTalebi 6 days ago

Slides are available on the GitHub: github.com/ShawhinT/KZread-Blog/tree/main/LLMs/_slides

@adrian-laurentiuboicu9009 12 days ago

Thank you for your tutorial!

@chequer-q3z 12 days ago

Wow, this is all Greek to me.

@taylormorgan2822 12 days ago

Hi Shaw! I heard that you may need to use code/html to ensure that email signatures are compatible when sending to different email platforms (gmail/outlook/etc.). Any tips for this? Have you had any issues with people viewing your email signatures after you created your design?

@ShawhinTalebi 12 days ago

I haven't run into any issues like that, but I'm sure using HTML would lead to better formatting across platforms.

@PrecedingPie 12 days ago

Wow your explanations are so tight. Not too dense and not fluffed at all. I followed along and was engaged throughout. Wonderful teaching style and I'm glad I found your channel ♥

@ShawhinTalebi 12 days ago

Glad it was helpful!

@user-wr4yl7tx3w 13 days ago

Great content. Fascinating.

@FirstNameLastName-fv4eu 13 days ago

Don't fall for these stupid ideas - this is good for a YouTube video - you will never be able to train it to a level where someone will pay "MONEY" for your trained model.

@ShawhinTalebi 12 days ago

Model as a service (MAAS) is a really interesting business model these days. I think of it like current data vendors, where the opportunity isn't necessarily sophisticated tech, but having the right data for the right use case. One I heard of last year was a small real estate data vendor doing about 1M/month with a 10 person team. I think a similar thing can be possible for MAAS.

@eliasromero7564 14 days ago

Super helpful video, thanks Shawhin! Have you ever tried to build out this portfolio with multiple pages? I was thinking about creating a separate link for each of my projects, as I have a lot of pictures and things to say, and I'm curious if you've had experience with it.

@ShawhinTalebi 12 days ago

I haven't but you can add additional markdown files to the repo to add pages. See example here: github.com/pages-themes/minimal

@MudroZvon 15 days ago

I used RAG in my life coach YouTuber AI character in Telegram. I prepared a lot of AI-processed data about him, with fine-grained segments. The prompt itself is also crafted according to the latest Claude prompt discovery. The most amazing AI experience for me yet - JulienHimselfBot

@balubalaji9956 15 days ago

Hey YouTube algorithm, I loved this video. Suggest me more of them.

@ShawhinTalebi 12 days ago

Here's the series playlist: kzread.info/head/PLz-ep5RbHosU2hnz5ejezwaYpdMutMVB0

@balubalaji9956 15 days ago

I loved the video

@ZixuanLiu-ld9qm 16 days ago

Thank you so much for the video! Just one question: did you use free Colab or Colab Pro, or did you pay for the GPU? Thank you so much!

@ShawhinTalebi 12 days ago

I have the paid version, but I used the T4 GPU, which is part of the free version. The only difference is that the free version is subject to access restrictions during peak usage times.

@ZixuanLiu-ld9qm 10 days ago

@@ShawhinTalebi Thank you so much!!

@baluga-s6f 16 days ago

In that Bayes probability at 10:26 (sum over z), shouldn't P(Z) on the right-hand side be a conditional probability as well?

@ShawhinTalebi 12 days ago

There we are essentially pulling Z out of the joint PDF by summing over it. The math is broken down further on page 14 of reference 1: www.degruyter.com/document/doi/10.2202/1557-4679.1203/html
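For reference, the general identity behind "summing Z out of a joint distribution" is marginalization. The line below is the textbook rule for a discrete Z under purely observational conditioning; it is not a reproduction of the exact expression shown at 10:26, which is not quoted here:

```latex
P(y \mid x) \;=\; \sum_{z} P(y, z \mid x) \;=\; \sum_{z} P(y \mid x, z)\, P(z \mid x)
```

In this observational form the weight is P(z | x), which reduces to P(z) only when Z is independent of X. By contrast, the standard adjustment formula for an intervention, P(y | do(x)) = Σ_z P(y | x, z) P(z), weights by the unconditional P(z). Assuming the video's formula is of that second kind, an unconditional P(z) would be expected; the linked reference is the place to confirm which case applies.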

@robw7676 16 days ago

Young people who want to strike out independently should be thinking about products and platforms, not consulting services. After 10+ years consulting, I'd say contacts are everything. You need to work AS a contractor for years, delivering well, and moving regularly to build a network of other good contractors. The consulting model is to sell the service, staff it, make somebody you trust the engagement lead, and keep an eye on progress, but also move on to finding the next client. The big firms make a lot of their profit from the margin they get "bodyshopping" talented younger staff who don't yet have a CV that would let them go solo. They pay them a third of what they charge for them. To get to that scale, you've got to have the cash flow to recruit permanent staff and pay them when they are "benched". It's a big step up, which is why the market is quite polarised between "boutique" firms and global corps.

@ShawhinTalebi 12 days ago

If only I had seen this comment 12 months ago 😅 Thanks for sharing your insight, great perspective!

@rma1563 16 days ago

I needed to know how parameter-efficient fine-tuning works in order to fine-tune a voice encoder for an emotion detection task. This video helped me a lot. I used LoRA for it. Thanks ❤

@ShawhinTalebi 12 days ago

Glad it was helpful!

@bithigh8301 16 days ago

Like and Subscribe!!! Comment every video!!!

@telmorubioetxabe4638 17 days ago

Amazing work! Thanks mate :)

@liaoyixu6882 17 days ago

Very amazing video!!! I have one question: when I use your code to fine-tune the model with my own dataset, the dataset is so large that reading it leads to a memory error (not GPU memory). What should I do to avoid this issue? Can I read and fine-tune in small batches?

@ShawhinTalebi 12 days ago

You can try reducing the batch size. Also happy to help troubleshoot via office hours: calendly.com/shawhintalebi/office-hours
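On the host-memory side of the question (the error while reading the dataset), one general Python pattern is to read the data lazily and hand it over in small batches rather than loading everything at once. The sketch below is not the video's fine-tuning code: it assumes a plain text file with one example per line, and the names `iter_batches` and `read_examples` are hypothetical helpers introduced only for illustration.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def iter_batches(items: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Yield lists of at most batch_size items without materializing all items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def read_examples(path: str) -> Iterator[str]:
    """Lazily yield one stripped, non-empty line at a time from a text file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield line


# Usage with an in-memory stand-in for a large file (assumed data):
fake_dataset = (f"example {i}" for i in range(10))
batches = list(iter_batches(fake_dataset, batch_size=4))
print([len(b) for b in batches])
```

With a real file, `iter_batches(read_examples(path), batch_size)` keeps only one batch of lines in memory at a time; how those batches are then tokenized and fed to a trainer depends on the training setup and is left out here.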

@Free-pp8mr 17 days ago

Perfectly done! You certainly know best what content is most interesting to your audience, but I would be happy to see more of your videos on narrower niches than LLMs alone, such as TDA, RL, and so on. You are at the top of search results for TDA and persistent homology, and I hope to see your videos at the top for subjects such as RL algorithms for complex systems and games with imperfect information. Have a good day!

@ShawhinTalebi 12 days ago

Great suggestion! I've got RL on my list :)

@Tenebrisuk 17 days ago

So I think that, given the volume of content pitched to a relatively niche audience, and considering the number of ad blockers and the variability in ad rates, this is still a reasonable number. Is it something you could live off? Nope. Is it fair reimbursement for the amount of time invested? Almost definitely not. But it is an expected number. Most channels see slow, steady growth, with the odd blip if something suddenly goes viral, which is less likely to have such a huge impact in a highly technical niche. Even when a video does go viral, sometimes that can be a very delayed hit, with the video "popping off" many months after it was posted. Everything I have read or watched on the topic of YouTube for niche audiences suggests that the real way to get value from videos is to use them to funnel people into other, more profitable routes: to use them as an acquisition tool. As any marketer knows, customer acquisition costs money, or in this case, more specifically, time. That means monetizing your audience (without ramming it down their throats) rather than relying on ad revenue. I like your content, and I recommended you to some friends, who may recommend you to others - there is your slow and steady organic growth. Keep it up. The road is long, the road is hard, but worthwhile efforts are rarely easy.

@afroozansaripour487 17 days ago

Very helpful! What is the package for the truncated factorization formula? Is there any sample code like in your other videos? Thanks for sharing all of this!

@ShawhinTalebi 12 days ago

Check out DoWhy: github.com/py-why/dowhy

Shaw Talebi

I Was Wrong About YouTube (what I learned)

3 Reasons Businesses Should NOT Use AI

The #1 Skill That Holds (Most) Data Scientists Back

What Nature Can Teach Us About Business

Automating Data Pipelines with Python & GitHub Actions [Code Walkthrough]

How to Deploy ML Solutions with FastAPI, Docker, & AWS

How to Build ML Solutions (w/ Python Code Walkthrough)

How to Build Data Pipelines for ML Projects (w/ Python Code)

How to Manage Data Science Projects

4 Skills You Need to Be a Full-Stack Data Scientist

How I'd Learn Data Science (if I started over)

I Was Wrong About AI Consulting (what I learned)

Text Embeddings, Classification, and Semantic Search (w/ Python Code)

How to Improve LLMs with RAG (Overview + Python Code)

QLoRA-How to Fine-tune an LLM on a Single GPU (w/ Python Code)

3 Ways to Make a Custom AI Assistant | RAG, Tools, & Fine-tuning

AI for Business: A (non-technical) introduction

5 Questions Every Data Scientist Should Hardcode into Their Brain

How Much YouTube Paid Me in My First 6 Months of Monetization (as a Data Science Creator)

4 Ways to Measure Fat Tails with Python (+ Example Code)

Detecting Power Laws in Real-world Data | w/ Python Code

Pareto, Power Laws, and Fat Tails-what they don’t teach you in STAT 101

I Spent $716.46 Talking to Data Scientists on Upwork-Here’s what I learned.

I Have 90 Days to Make $10k/mo-Here's my plan

How to Build an LLM from Scratch | An Overview

Fine-tuning Large Language Models (LLMs) | w/ Example Code

Prompt Engineering: How to Trick AI into Solving Your Problems

Why I Quit My $150,000 Data Science Job

The Hugging Face Transformers Library | Example Code + Chatbot UI with Gradio
