Sebastian Raschka

I love creating educational content around machine learning and deep learning, and I created this channel to put things out there that you hopefully find useful.

I'm also the author of "Build a Large Language Model From Scratch" (mng.bz/amjo), Machine Learning Q and AI (nostarch.com/machine-learning-q-and-ai), and other books (sebastianraschka.com/books/).

During the day, I work as a Staff Research Engineer at Lightning AI, where I focus on the intersection of AI research, software development, and large language models (LLMs).


Comments

  • @akashkn6940 · 11 hours ago

    Starting Today 🫡!

  • @chineduezeofor2481 · 17 hours ago

    Thank you Sebastian for your awesome contributions. You're a big inspiration.

  • @amrelsayeh4446 · 1 day ago

    @sebastian At 13:20, why does the solution between the global minimum and the penalty minimum lie somewhere where one of the weights is zero? In other words, why should it lie at the corner of the penalty function and not just somewhere on the line between the global minimum and the penalty minimum?

  • @blogger.powerpoint_expert · 6 days ago

    How can I cite this material if I want to reference it in my work?

  • @studyselection2881 · 7 days ago

    At 25:08: When you said we are making 30 predictions, we are making these for each base classifier, right? So in total 30 * T predictions for each k-iteration.

  • @gabrielange-ux1kz · 9 days ago

    I wanted to ask a question: for classifier comparison, do we need to perform a normality test before deciding to go with a parametric or a non-parametric test (such as McNemar's)?

  • @ArbaazBeg · 12 days ago

    Should we give a prompt to the LLM when fine-tuning for classification with a modified last layer, or directly pass the input to the LLM like in DeBERTa?

  • @SebastianRaschka · 12 days ago

    Thanks for the comment, could you explain a bit more what you mean by passing the input directly?

  • @ArbaazBeg · 5 days ago

    @@SebastianRaschka Hey, sorry for the unclear wording. I meant: should chat formats like Alpaca etc. be applied, or do we give the text as-is to the LLM for classification?

  • @SebastianRaschka · 5 days ago

    @@ArbaazBeg Oh, I see now. And yes, you can. I've been meaning to add an example and performance comparison for that to the GitHub repo (github.com/rasbt/LLMs-from-scratch) at some point. For that, I'd want to first instruction-finetune the model on a few more spam classification instructions and examples, though.

  • @ArbaazBeg · 5 days ago

    @@SebastianRaschka Can I help in this?

  • @phanindrareddy4885 · 14 days ago

    done

  • @emsif · 14 days ago

    Thank you for this great lecture. In your other lecture about sequential feature selection, you showed that backward selection (SBS) is superior to forward selection (SFS) according to a study. How does recursive feature elimination compare to SBS and SFS?

  • @tashfeenahmed3526 · 18 days ago

    That's great, Dr. Raschka. Hope you are doing well. I wish I could download your recently published deep learning book. If there is an open-access link to download it, please mention it in the comments. Thanks and regards, a researcher at Texas

  • @anshumansinha5874 · 18 days ago

    How many times do you say ‘yah’ in a day? Great content though :)

  • @anshumansinha5874 · 18 days ago

    Hmm, yah

  • @118bone · 19 days ago

    This was a great help, I was just trying to determine the best FS method for my dataset. I've now subscribed and I'm looking forward to checking out all the videos on the playlist, thank you!

  • @KumR · 21 days ago

    Great video. Now that LLMs are so powerful, will regular machine learning & deep learning slowly vanish?

  • @SebastianRaschka · 21 days ago

    Great question. I do think that special-purpose ML solutions still have, and will continue to have, their place, in the same way that ML didn't make certain more traditional statistics-based models obsolete. Regarding deep learning ... I'd say an LLM is a deep learning model itself. But yeah, almost everything in deep learning nowadays is either a diffusion model, a transformer-based model (vision transformers and most LLMs), or a state space model.

  • @kenchang3456 · 22 days ago

    Thanks for sharing, especially about Lit-GPT (I'm always interested in more tutorials as my journey with fine-tuning and LLMs needs all the help it can get). Thanks again.

  • @SebastianRaschka · 21 days ago

    Glad you are liking LitGPT!

  • @alihajikaram8004 · 22 days ago

    Would you make videos about time series and transformers?

  • @mushinart · 23 days ago

    I'm sold, I'm buying your book. Would love to chat with you sometime if possible.

  • @SebastianRaschka · 7 days ago

    Thanks, hope you are liking it! Are you going to SciPy in July by chance, or maybe NeurIPS at the end of the year?

  • @mushinart · 7 days ago

    @@SebastianRaschka Unfortunately not, but I'd like to have a Zoom/Google Meet chat with you if possible.

  • @haqiufreedeal · 24 days ago

    Oh, my lord, my favourite machine learning author is a Liverpool fan.😎

  • @SebastianRaschka · 23 days ago

    Haha, nice that people make it that far into the video 😊

  • @ananthvankipuram4012 · 18 days ago

    @@SebastianRaschka You'll never walk alone 🙂

  • @ramprasadchauhan7 · 24 days ago

    Hello sir, please also make this with JavaScript.

  • @jayp123 · 24 days ago

    Great explanation

  • @mulderbm · 25 days ago

    I recently listened to your latest videos. And now this one was recommended by Perplexity for my specific use case ;-) Coincidence?

  • @SebastianRaschka · 24 days ago

    Haha, looks like LLMs are coming full circle here :D

  • @RobinSunCruiser · 26 days ago

    Hi, nice videos! One question for my understanding: when talking about embedding dimensions such as 1280 in "gpt2-large", do you mean the size of the number vector encoding the context of a single token, or the number of input tokens? When comparing gpt2-large and Llama 2, the number is the same for the ".. embeddings with 1280 tokens".

  • @SebastianRaschka · 20 days ago

    Good question. The term is often used very broadly and may refer to the input embeddings or the hidden layer sizes in the MLP layers. Here, I meant the size of the vector that each token is embedded into.
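
To illustrate that second meaning, here is a minimal, hypothetical PyTorch sketch (not code from the video): the embedding dimension is the length of the vector each token ID is mapped to, independent of how many tokens are in the input sequence. GPT-2's vocabulary size and the 1280-dimensional embedding size of gpt2-large are used for illustration.

```python
import torch
import torch.nn as nn

vocab_size = 50257  # GPT-2's vocabulary size
emb_dim = 1280      # embedding size used by gpt2-large

token_embedding = nn.Embedding(vocab_size, emb_dim)

# A batch with one sequence of 3 (made-up) token IDs
token_ids = torch.tensor([[464, 3290, 318]])
embedded = token_embedding(token_ids)

# Each of the 3 tokens becomes a 1280-dimensional vector; the sequence
# length (3) is independent of the embedding dimension (1280).
print(embedded.shape)  # torch.Size([1, 3, 1280])
```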

  • @joisco4394 · 26 days ago

    I've heard about instruct learning, and it sounds similar to how you define preference learning. I have also heard about transfer learning. How would you compare/define those?

  • @SebastianRaschka · 26 days ago

    Transfer learning is basically involved in everything you do when you start out with a pretrained model; we don't really name it or call it out explicitly anymore because it's so common. Instruction finetuning differs from preference tuning mainly in the loss function: instruction finetuning trains the model to answer queries, and preference finetuning is more about the nuance of how those queries get answered. All preference tuning methods used today (DPO, RLHF+PPO, KTO, etc.) expect you to have done instruction finetuning on your model before you preference finetune.
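
To make the point above about the differing loss functions concrete, here is a minimal, hypothetical sketch of the DPO loss (one of the preference tuning methods mentioned), not the implementation from the video or book. It assumes we already have the summed log-probabilities of the preferred (chosen) and dispreferred (rejected) responses under the policy being trained and under a frozen reference model.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    # Log-ratios of the trainable policy vs. the frozen reference model
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    # DPO increases the margin between preferred and dispreferred responses
    margin = beta * (chosen_logratio - rejected_logratio)
    return -F.logsigmoid(margin).mean()

# Toy usage with made-up summed log-probabilities for two preference pairs
policy_chosen = torch.tensor([-12.3, -15.1])
policy_rejected = torch.tensor([-14.8, -15.0])
ref_chosen = torch.tensor([-13.0, -15.5])
ref_rejected = torch.tensor([-13.5, -14.9])
print(dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected))
```

In contrast, instruction finetuning typically uses a plain cross-entropy (next-token prediction) loss over the response tokens.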

  • @joisco4394 · 22 days ago

    @@SebastianRaschka Thanks for explaining it. I need to do a lot more research :p

  • @hopelesssuprem1867 · 26 days ago

    thanks for a great explanation

  • @codewithwaseem6489 · 26 days ago

    Boring..

  • @ThanhPham-xz2yo · 26 days ago

    Thanks

  • @tomhense6866 · 27 days ago

    Very nice video, I liked it so much that I preordered your new book directly after watching it (to be fair I have read your blog for some time now).

  • @SebastianRaschka · 27 days ago

    Thanks! I hope you are going to like the book, too!

  • @tusharganguli · 27 days ago

    Your articles and videos have been extremely helpful in understanding how LLMs are built. Building LLM from Scratch and Q and AI are resources that I am presently reading and they provide a hands-on discourse on the conceptual understanding of LLMs. You, Andrej Karpathy and Jay Alammar are shining examples of how learning should be enabled. Thank you!

  • @SebastianRaschka · 27 days ago

    Thanks for the kind comment!

  • @namratathakur1000 · 28 days ago

    Thank you so much for uploading the lectures.

  • @timothywcrane · 28 days ago

    I'm interested in SLM RAG with Knowledge graph traversal/search for RAG dataset collection and vector-JIT semantic match for hybrid search. Any repos you think I would be interested in?

  • @timothywcrane · 28 days ago

    bookmarked, clear and concise.

  • @SebastianRaschka · 26 days ago

    Unfortunately I don't have a good recommendation here. I have only implemented standard RAGs without knowledge graph traversal.

  • @DataChiller · 28 days ago

    the greatest Liverpool fan ever! ⚽

  • @SebastianRaschka · 28 days ago

    Haha nice, at least one person watched it until that part :D

  • @rachadlakis1 · 29 days ago

    Thanks for the great knowledge you are sharing <3

  • @MadnessAI8X · 29 days ago

    What we are seeking is not only fuzzing code.

  • @SebastianRaschka · 26 days ago

    Glad that's useful

  • @redthunder6183 · 29 days ago

    Easier said than done unless you've got a GPU supercomputer lying around lol

  • @SebastianRaschka · 26 days ago

    Ha, I should mention that all chapters in my book run on laptops, too. It was a personal goal of mine that everything should work even without a GPU. The instruction finetuning takes about 30 min on a CPU to get reasonable results (granted, the same code takes 1.24 min on an A100).

  • @kartiksaini5847 · 29 days ago

    Big fan ❤

  • @box-mt3xv · 29 days ago

    The hero of open source

  • @SebastianRaschka · 26 days ago

    Haha, thanks! I've learned so much thanks to all the amazing people in open source, and I'm very flattered by your comment to potentially be counted as one of them :)

  • @sahilsharma3267 · 29 days ago

    When is your whole book coming out? Eagerly waiting 😅

  • @SebastianRaschka · 29 days ago

    Thanks for your interest in this! It's already available for preorder (both on the publisher's website and Amazon), and if the production stage goes smoothly, it should be out by the end of August.

  • @muthukamalan.m6316 · 1 month ago

    great content! love it ❤

  • @bashamsk1288 · 1 month ago

    In instruction finetuning, do we propagate the loss only on the output text tokens, or for all tokens from start to EOS?

  • @SebastianRaschka · 1 month ago

    That's a good question. You can do both. By default all tokens, but more commonly you'd mask the tokens. In my book, I include the token masking as a reader exercise (it's super easy to do). There was also a new research paper a few weeks ago that I discussed in my monthly research write-ups here: magazine.sebastianraschka.com/p/llm-research-insights-instruction

  • @bashamsk1288 · 29 days ago

    @@SebastianRaschka Thanks for the reply. I just have a general question: do we use masking in practice? For example, was masking used during the instruction fine-tuning of Llama 3, Mistral, or any open-source LLMs? Also, does your book include any chapters on the parallelization of training large language models?

  • @SebastianRaschka · 29 days ago

    @@bashamsk1288 Masking is commonly used, yes. We implement it as the default strategy in LitGPT. In my book we do both. I can't speak about Llama 3 and Mistral regarding masking, because while these are open-weight models they are not open source. So there's no training code we can look at. My book explains DDP training in the PyTorch appendix, but it's not used in the main chapters because as a requirement all chapters should also work on a laptop to make them accessible to most readers.
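
For readers following this thread, here is a minimal, hypothetical sketch of the masking idea discussed above (not the exact implementation from the book or LitGPT): the instruction/prompt positions in the target tensor are set to -100, which PyTorch's cross-entropy loss ignores, so the loss is computed only over the response tokens.

```python
import torch
import torch.nn.functional as F

IGNORE_INDEX = -100  # targets with this value are excluded from the loss

def mask_prompt_tokens(input_ids, prompt_length):
    # Copy the token IDs and mask out the instruction/prompt portion
    targets = input_ids.clone()
    targets[:prompt_length] = IGNORE_INDEX
    return targets

# Toy example: 5 prompt tokens followed by 3 response tokens (made-up IDs)
input_ids = torch.tensor([11, 12, 13, 14, 15, 101, 102, 103])
targets = mask_prompt_tokens(input_ids, prompt_length=5)

# Stand-in for model outputs with shape (sequence_length, vocab_size).
# (For simplicity, the usual shift-by-one for next-token prediction is omitted.)
logits = torch.randn(len(input_ids), 200)

# Only the last 3 (response) positions contribute to the loss
loss = F.cross_entropy(logits, targets, ignore_index=IGNORE_INDEX)
print(targets)
print(loss.item())
```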

  • @kumarutsav5161 · 1 month ago

    🤌

  • @SebastianRaschka · 26 days ago

    I take that as a compliment!? 😅😊

  • @kumarutsav5161 · 26 days ago

    @@SebastianRaschka Yes, yes! It was supposed to be a compliment. You are doing great work with your teaching materials :).

  • @goneforfishing · 1 month ago

    The explanation is so cool! :)

  • @susdoge3767 · 1 month ago

    and then a tweet from elon musk in 2024: "we dont use CNNs much these days TBH"

  • @Rictoo · 1 month ago

    I have a couple of questions: Regarding the variance, is this calculated across different parameter estimates given the same functional form of the model? Also, these parameter estimates depend on the optimization algorithm used, right, i.e., implying the model predictions are 'empirically derived models' vs. some sort of theoretically optimal parameter combinations, given a particular functional form? If so, would this mean that, _technically speaking_, there is an additional source of error in the loss calculation, which could be something like 'implementation variance' due to our model likely not having the most optimal parameters compared to some theoretical optimum? Hope this makes sense, I'm not a mathematician. Thanks!

  • @algorithmo134 · 1 month ago

    How do we create the word embeddings? Also, what is x_i at 12:38?

  • @algorithmo134 · 1 month ago

    Is using a double for-loop like the one mentioned in the original GAN paper by Goodfellow easier to implement in practice? For example, we freeze the generator while we train the discriminator and vice versa.

  • @jayp123 · 1 month ago

    I don’t understand why you can’t multiply ‘E’ the expectation by ‘y’ the constant

  • @dilaracoban7467 · 1 month ago

    thanks a lot!

  • @reyhanmf50 · 1 month ago

    Hello Professor Raschka. Although this video was uploaded 3 years ago, I hope it's not too late to comment. I've been searching for a good introductory course on machine learning for complete beginners. I've tried watching 2-3 videos on YouTube, but I stopped after 30 minutes to 1 hour. Someone on Reddit recommended your course as the best free introduction to ML and shared the link in a comment. Since this course is quite long (around a semester's worth of material), I have a few questions: 1.) Do you include hands-on coding examples, such as for the linear regression algorithm, to supplement your explanations? 2.) If so, is there a link where I can access your code examples? 3.) Are there any exercises or assignments available that I can use to test my knowledge after watching your videos? (As the saying goes, 'practice makes perfect.') Perhaps someone who has completed this playlist can also provide some insight. Thank you in advance.

  • @Garrick645 · 1 month ago

    My friend referred me to your book.

  • @SUCRAM7627 · 1 month ago

    It makes no sense that YouTube is only recommending Sebastian's channel to me today (2024). I've been using Sebastian's "Python Machine Learning" for at least 5 years. I'm glad that I found it!