Informer Attention Architecture - FROM SCRATCH!

Here is the architecture of ProbSparse attention for time series transformers.
ABOUT ME
⭕ Subscribe: kzread.info...
📚 Medium Blog: / dataemporium
💻 Github: github.com/ajhalthor
👔 LinkedIn: / ajay-halthor-477974bb
RESOURCES
[1] Main paper that introduced the Informer: arxiv.org/pdf/2012.07436
PLAYLISTS FROM MY CHANNEL
⭕ Deep Learning 101: • Deep Learning 101
⭕ Natural Language Processing 101: • Natural Language Proce...
⭕ Reinforcement Learning 101: • Reinforcement Learning...
⭕ Transformers from Scratch: • Natural Language Proce...
⭕ ChatGPT Playlist: • ChatGPT
MATH COURSES (7 day free trial)
📕 Mathematics for Machine Learning: imp.i384100.net/MathML
📕 Calculus: imp.i384100.net/Calculus
📕 Statistics for Data Science: imp.i384100.net/AdvancedStati...
📕 Bayesian Statistics: imp.i384100.net/BayesianStati...
📕 Linear Algebra: imp.i384100.net/LinearAlgebra
📕 Probability: imp.i384100.net/Probability
OTHER RELATED COURSES (7 day free trial)
📕 ⭐ Deep Learning Specialization: imp.i384100.net/Deep-Learning
📕 Python for Everybody: imp.i384100.net/python
📕 MLOps Course: imp.i384100.net/MLOps
📕 Natural Language Processing (NLP): imp.i384100.net/NLP
📕 Machine Learning in Production: imp.i384100.net/MLProduction
📕 Data Science Specialization: imp.i384100.net/DataScience
📕 Tensorflow: imp.i384100.net/Tensorflow

Comments: 19

  • @LeoLan-vv1nq • a month ago

    Amazing work, can't wait for the next episode!

  • @neetpride5919 • a month ago

    Why aren't the padding tokens appended during data preprocessing, before the inputs are turned by the feedforward layer into the key, query, and value vectors?

  • @slayer_dan • a month ago

    Adding padding before forming the K, Q, and V vectors would insert extra tokens into the input sequences, changing their lengths and potentially distorting the underlying data structure. The K, Q, and V vectors would then incorporate these padding tokens, and during the attention calculation the padding would influence the attention scores, diluting the focus on the actual content of the sequences and making it harder for the model to learn meaningful representations.

    Applying padding after forming the K, Q, and V vectors instead lets you use masking to exclude the padding from the attention mechanism: by setting the attention scores at padding positions to negative infinity before the softmax, the model effectively ignores those tokens. This preserves the integrity of the input sequences, keeps the attention computation accurate, and keeps the model focused on the relevant information.

    P.S. I used ChatGPT to format my answer because it can do this better.
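
    For context, here is a minimal PyTorch sketch of that masking step (an illustrative helper, not the video's or the Informer's actual code; the function name and tensor shapes are assumptions):

        import torch
        import torch.nn.functional as F

        def masked_attention(q, k, v, pad_mask):
            # q, k, v: (batch, seq_len, d_k); pad_mask: (batch, seq_len), True at padded positions
            d_k = q.size(-1)
            scores = q @ k.transpose(-2, -1) / d_k ** 0.5  # (batch, seq_len, seq_len)
            # -inf scores become (effectively) zero weights after the softmax
            scores = scores.masked_fill(pad_mask.unsqueeze(1), float("-inf"))
            weights = F.softmax(scores, dim=-1)
            return weights @ v

    The softmax then assigns near-zero weight to every padded key, so the padding cannot dilute attention over the real tokens.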

  • @neetpride5919 • a month ago

    @slayer_dan How could it possibly save computing power to pad the matrices with multiple 512-element vectors, rather than simply appending tokens to the initial sequence of tokens?

  • @deltamico • 19 days ago

    Take it with a grain of salt, but I think if you hardcode the mask so the [pad] positions are not attended to, you don't need to learn that extra behavior for the [pad] token, so it's more stable.

  • @adelAKAdude • 10 days ago

    Great video, thanks. Question ... in the third question ... how do you sample the subset of keys and queries "depending on importance"?
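
    For reference, the selection idea in the Informer paper [1] is roughly: score each query with a max-minus-mean measure computed against a random sample of the keys, then run full attention only for the top-u queries. A simplified sketch (the function name, shapes, and sampling details are assumptions, not the paper's exact implementation):

        import torch

        def probsparse_query_selection(q, k, u, sample_k):
            # q: (L_q, d), k: (L_k, d); u and sample_k scale like c * ln(L) in the paper
            L_k, d = k.shape
            idx = torch.randint(0, L_k, (sample_k,))  # random subset of keys
            scores_sample = q @ k[idx].T / d ** 0.5   # (L_q, sample_k)
            # "important" queries have a peaked score distribution: large max relative to the mean
            sparsity = scores_sample.max(dim=-1).values - scores_sample.mean(dim=-1)
            return sparsity.topk(u).indices           # indices of the top-u queries

    The queries that are not selected skip full attention; the paper fills their outputs with a simple aggregate of V (the mean, in the encoder's self-attention), which is what brings the cost down from O(L^2) to roughly O(L log L).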

  • @user-qd2oc6xq8n • 25 days ago

    Can you suggest an interactive AI neural network model for a school project? Your videos are nice and I understand them easily. Please tell.

  • @rpraver1 • a month ago

    Also, as always, great video. Hoping in the future you deal with encoder-only and decoder-only transformers...

  • @CodeEmporium • a month ago

    Yep! For sure. Thank you so much!

  • @sudlow3860 • a month ago

    With regard to the quiz, I think it is B D B. Not sure how this is going to launch a discussion, though. You present things very well.

  • @CodeEmporium • a month ago

    Ding ding ding! Good work on the quiz! While this may or may not spark a discussion, just wanted to say thanks for participating :)

  • @dumbol8126 • a month ago

    Is this the same as what TimesFM uses?

  • @Ishaheennabi • a month ago

    Love from Kashmir, India, bro! ❤❤❤

  • @theindianrover2007 • a month ago

    cool!

  • @CodeEmporium • a month ago

    Thank you 🙏

  • @-beee- • a month ago

    I would love it if the quizzes had answers in the comments eventually. I know this is a fresh video, but I want to check my work, not just have a discussion 😅

  • @eadweard. • a month ago

    In answer to your question, I can either: A) mono-task or B) screw up several things at once

  • @rpraver1 • a month ago

    Not sure if it's just me, but starting at about 4:50 your graphics are so dark... maybe go to a white background or light gray, like your original PNG...

  • @CodeEmporium • a month ago

    Yea. Let me try brightening them up for future videos if I can. Thanks for the heads up