CodeEmporium

Everything new and interesting in Machine Learning, Deep Learning, Data Science, & Artificial Intelligence. Hoping to build a community of data science geeks and talk about future tech! Projects, demos, and more! Subscribe for awesome videos :)

Embeddings - EXPLAINED!

Q-learning - Explained!

ChatGPT: Zero to Hero

Llama - EXPLAINED!

Comments

  • @Akshaylive
    @Akshaylive 15 hours ago

    @4:38 are you sure d_q is the number of total time steps? I think it's supposed to be the dimension of the query & key.
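
    For reference, a minimal NumPy sketch of where d_q shows up in scaled dot-product attention (the names T and d_q and all values are illustrative, not taken from the video): the scaling factor uses the query/key dimension, while the number of time steps only determines how many score rows and columns there are.

        import numpy as np

        T, d_q = 10, 64                   # T: number of time steps, d_q: query/key dimension
        Q = np.random.randn(T, d_q)       # one query vector per time step
        K = np.random.randn(T, d_q)       # one key vector per time step

        scores = Q @ K.T / np.sqrt(d_q)   # (T, T) attention scores, scaled by sqrt(d_q)
        print(scores.shape)               # (10, 10)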

  • @scott7948
    @scott7948 16 hours ago

    In the final video, are you going to show an example where you feed data into the model and then interpret the output? It would be good to see any preprocessing of the data needed to get it into the right format to feed into the model. I'm keen to use this model for a time series forecasting exercise, 8 timesteps ahead.

  • @user-qd2oc6xq8n
    @user-qd2oc6xq8n 1 day ago

    Can you suggest an interactive AI neural network model for a school project? Your videos are nice and I understand them easily. Please tell.

  • @MadMax-ph1rl
    @MadMax-ph1rl 1 day ago

    All these Hindu Indian scientists were really selfish to keep their studies and research within a certain group. And when someone from the West discovered the exact same thing, some 200 or 300 years later, they started saying "no, we discovered it hundreds of years ago". So why didn't you spread that knowledge? Because of people like this, our scientific knowledge and this so-called modern world are 100 years behind.

  • @anirudh514
    @anirudh514 1 day ago

    Very well explained

  • @ashishanand9642
    @ashishanand9642 2 days ago

    Why is this so underrated? This should be on everyone's playlist for linear regression. Hats off, man :)

  • @user-oj2wg8og9e
    @user-oj2wg8og9e 2 days ago

    wonderful explanation!!!

  • @ArielOmerez
    @ArielOmerez 2 days ago

    C

  • @ArielOmerez
    @ArielOmerez 2 days ago

    B

  • @ArielOmerez
    @ArielOmerez 2 days ago

    D

  • @bartekdurczak4085
    @bartekdurczak4085 2 days ago

    Good explanation, but the noises are a little bit annoying. Thank you though, bro <3

  • @youtubeaccount8613
    @youtubeaccount8613 2 days ago

    appreciate this! thank you so much!

  • @nirorit
    @nirorit 4 days ago

    Based on what we've studied in class (information theory and machine learning), bigger batches are more accurate, as they minimize the mean squared error (MSE) of the cost function, iirc. So if I were to trust my memory/understanding and my professor, then, unlike what you said: smaller batches are better because they are faster to compute than a full data batch, and because they introduce more randomness (a larger error) during training, which can help escape high local minima (see the sketch below).
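
    As a rough illustration of that trade-off (all names and numbers below are made up, not from the video), here is a minimal NumPy sketch showing that the variance of the mini-batch gradient estimate shrinks as the batch size grows, while each small-batch estimate stays cheap and noisy:

        import numpy as np

        rng = np.random.default_rng(0)
        X = rng.normal(size=10_000)
        y = 3.0 * X + rng.normal(scale=0.5, size=10_000)
        w = 0.0                                   # current parameter of a 1-D linear model y = w * x

        def mse_grad(idx, w):
            # gradient of the mean squared error over the selected batch
            return np.mean(2.0 * (w * X[idx] - y[idx]) * X[idx])

        for batch_size in (8, 64, 1024):
            grads = [mse_grad(rng.choice(X.size, batch_size, replace=False), w)
                     for _ in range(500)]
            print(batch_size, np.var(grads))      # variance of the estimate drops as batches grow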

  • @psiddartha7115
    @psiddartha7115 6 days ago

    I am a non-engineer; how should I prepare?

  • @eeera-op8vw
    @eeera-op8vw 6 days ago

    good explanation for a beginner

  • @LNJP13579
    @LNJP13579 7 days ago

    Brother, you have summarized really well in such a short video. Every second was GOLD 🙂

  • @sudlow3860
    @sudlow3860 7 days ago

    With regard to the quiz I think it is B D B. Not sure how this is going to launch a discussion though. You present things very well.

  • @CodeEmporium
    @CodeEmporium 6 days ago

    Ding ding ding! Good work on the quiz! While this may or may not spark a discussion, just wanted to say thanks for participating :)

  • @wowcat4426
    @wowcat4426 7 days ago

    Cringe

  • @rpraver1
    @rpraver1 7 days ago

    Also, as always, great video. Hoping that in the future you deal with encoder-only and decoder-only transformers...

  • @CodeEmporium
    @CodeEmporium 6 days ago

    Yep! For sure. Thank you so much!

  • @theindianrover2007
    @theindianrover2007 7 days ago

    cool!

  • @CodeEmporium
    @CodeEmporium 6 days ago

    Thank you 🙏

  • @rpraver1
    @rpraver1 7 days ago

    Not sure if it's just me, but starting at about 4:50 your graphics are so dark... maybe go to a white background or light gray, like your original PNG...

  • @CodeEmporium
    @CodeEmporium 6 days ago

    Yea. Let me try brightening them up for future videos if I can. Thanks for the heads up

  • @LeoLan-vv1nq
    @LeoLan-vv1nq 7 days ago

    Amazing work, can't wait for the next episode!

  • @-beee-
    @-beee- 7 days ago

    I would love it if the quizzes eventually had answers in the comments. I know this is a fresh video, but I want to check my work, not just have a discussion 😅

  • @dumbol8126
    @dumbol8126 7 days ago

    Is this the same as what timesfm uses?

  • @neetpride5919
    @neetpride5919 7 days ago

    Why aren't the padding tokens appended during data preprocessing, before the inputs are turned by the feedforward layer into the key, query, and value vectors?

  • @slayer_dan
    @slayer_dan 7 days ago

    Adding padding before forming the K, Q, and V vectors would insert extra tokens into the input sequences, altering their lengths and potentially distorting the underlying data structure. The subsequent computation of K, Q, and V would then incorporate these padding tokens, affecting the model's ability to accurately represent the original data. During the attention calculation, those padding tokens would influence the attention scores, diluting the focus on the actual content of the input sequences, which could lead to less effective attention patterns and hinder the model's ability to learn meaningful representations. Applying padding after forming K, Q, and V also allows masking to be used efficiently to exclude padding tokens from the attention mechanism: by setting the attention scores at padding positions to negative infinity before the softmax, the model effectively ignores those tokens during the attention calculation (see the sketch at the end of this thread). This preserves the integrity of the input sequences, keeps the attention computation accurate, and maintains the model's focus on the relevant information in the data. P.S. I used ChatGPT to format my answer because it can do this better.

  • @neetpride5919
    @neetpride5919 6 days ago

    @@slayer_dan How could it possibly save computing power to pad the matrices with multiple 512-element vectors, rather than simply appending <PAD> tokens to the initial sequence of tokens?
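
    To make the mask-before-softmax idea from this thread concrete, here is a minimal PyTorch sketch (the shapes, names, and pad pattern are illustrative, not taken from the video). Padded key positions get a score of negative infinity, so after the softmax they receive zero attention weight:

        import math
        import torch
        import torch.nn.functional as F

        T, d_k = 5, 64                                          # sequence length (with padding) and key/query dim
        q = torch.randn(T, d_k)
        k = torch.randn(T, d_k)
        pad = torch.tensor([False, False, False, True, True])   # last two positions are <PAD>

        scores = q @ k.T / math.sqrt(d_k)                             # (T, T) raw attention scores
        scores = scores.masked_fill(pad.unsqueeze(0), float("-inf"))  # mask padded key positions
        weights = F.softmax(scores, dim=-1)                           # each row sums to 1 over real tokens only
        print(weights[:, 3:].sum())                                   # ~0: padding receives no attention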

  • @eadweard.
    @eadweard. 7 days ago

    In answer to your question, I can either: A) mono-task or B) screw up several things at once

  • @Ishaheennabi
    @Ishaheennabi 7 days ago

    Love from Kashmir, India, bro! ❤❤❤

  • @algorithmo134
    @algorithmo134 8 days ago

    Can't wait for more in-depth deep learning coding and tutorials! Would love to see deep learning for time series :D

  • @CodeEmporium
    @CodeEmporium 7 days ago

    Nice! Currently making this playlist for the Informer architecture. You can check out a few videos on this in the playlist “informer from scratch”

  • @user-mr3se3jk1r
    @user-mr3se3jk1r 8 days ago

    You have missed the concept of teacher forcing during training

  • @rasikannanl3476
    @rasikannanl3476 8 days ago

    great .. so many thanks ... need more explanation

  • @aakarshrai5833
    @aakarshrai5833 8 days ago

    Bro, could you please label your equations? It'll be helpful.

  • @algorithmo134
    @algorithmo134 8 days ago

    Hi @CodeEmporium, do you have the solution to quiz 2 at 8:46?

  • @joeybasile1572
    @joeybasile1572 8 days ago

    Nice man

  • @himanshusingh2980
    @himanshusingh2980 8 days ago

    Really want to hear the Indian accent of this guy 😅😂

  • @katerinaneprasova2939
    @katerinaneprasova2939 9 days ago

    Are the right answers to the quiz posted somewhere? It would be helpful to put them in the description.

  • @katnip1917
    @katnip1917 9 days ago

    Great Video!! Thank you for the explanation. My question is, why not use the current state in the target network, instead of the next state?

  • @rajeshve7211
    @rajeshve7211 9 days ago

    Best ever explanation of BERT! Finally understood how it works :)

  • @kenesufernandez1281
    @kenesufernandez1281 9 days ago

    ✨💖

  • @jonfat4371
    @jonfat4371 9 days ago

    Really great explanation, but for god's sake, stop the irritating noises. I'm losing it, man... what would happen if you just continued normally?!

  • @abinav92
    @abinav92 9 days ago

    Good video! Well explained. In real life, though, a particular time series will correlate with itself and depend on other time series. Is there any way to take this into account to improve predictions?

  • @burakkurt1907
    @burakkurt1907 10 days ago

    May God bless you (Allah razı olsun).

  • @lazarus8011
    @lazarus8011 10 days ago

    Good video, here's a comment for the algorithm.

  • @yaminevire7854
    @yaminevire7854 10 days ago

    I am from Bangladesh ❤❤

  • @rpraver1
    @rpraver1 10 days ago

    As always, great video, looking forward to next video on the code...

  • @StraightToTheAve
    @StraightToTheAve 10 days ago

    My brain can’t comprehend how some things were created

  • @davefaulkner6302
    @davefaulkner6302 11 days ago

    Thanks for your efforts to explain a complicated subject. A couple of questions: did you intentionally skip the layer normalization, or did I miss something? Also, the final linear layer in the attention block has dimensions 512 x 512 (input size, output size). Does this mean that each token (logit?) output from the attention layer is passed token-by-token through the linear layer to create a new set of tokens, that set being of size token-sequence-length? This connection between the attention output and the linear layer is baffling me. The output of the attention layer is (sequence length x transformed embedding length), or (4 x 512), ignoring the batch dimension of the tensor. Yet the linear layer accepts a (1 x 512) input and yields a (1 x 512) output. So is each (1 x 512) token in the attention layer's output sequence passed one at a time through the linear layer? And does this imply that the same linear layer is used for all tokens in the sequence? (See the sketch below.)
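
    On the linear-layer question above: in typical Transformer implementations (a general pattern, not something confirmed from this specific video), a single 512 x 512 linear layer is applied position-wise, so the same weights map every (1 x 512) token vector and a (4 x 512) attention output comes back out as (4 x 512). A minimal PyTorch sketch with illustrative values:

        import torch
        import torch.nn as nn

        seq_len, d_model = 4, 512
        attention_out = torch.randn(seq_len, d_model)    # (4, 512) attention output, batch dim omitted

        proj = nn.Linear(d_model, d_model)               # one 512 x 512 weight matrix (+ bias)
        out = proj(attention_out)                        # applied to each row (token) independently
        print(out.shape)                                 # torch.Size([4, 512])

        # Equivalent to passing tokens through the same layer one at a time:
        row_by_row = torch.stack([proj(tok) for tok in attention_out])
        print(torch.allclose(out, row_by_row))           # True (up to floating point)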

  • @jorgesanabria6484
    @jorgesanabria6484 11 days ago

    Would historical nutritional data count?

  • @hackie321
    @hackie321 12 days ago

    Can you please blow up the Llama/Llama 2 architecture and code for us? Eagerly waiting for your LLM videos.

  • @CodeEmporium
    @CodeEmporium 12 days ago

    Yep! That’s definitely a future playlist idea

  • @hackie321
    @hackie321 12 days ago

    @@CodeEmporium Awesome. Thanks

  • @tripathi26
    @tripathi26 12 days ago

    This is interesting. Eagerly looking forward to next episodes ❤

  • @yolemmein
    @yolemmein 12 days ago

    Very useful and great explanation! Thank you so much!