Karpamambathy 002

Science & Technology

Like 👍. Comment 💬. Subscribe 🟥.
🏘 Discord: / discord
github.com/hu-po/docs
github.com/state-spaces/mamba
github.com/karpathy/build-nan...

Comments: 5

  • @AM-yk5yd · 9 days ago

    Checked the Mamba source code more: the Mamba2 norm is used for out_proj. At least the package assumes that the input will be normalized: create_block puts either Mamba or Mamba2 into a Block that performs the input normalization and the residual connection itself (a sketch follows below).
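
    A minimal sketch of what this comment describes, assuming the current layout of mamba_ssm/models/mixer_seq_simple.py; argument names such as `d_intermediate` and the `ssm_cfg["layer"]` switch vary between package versions, so treat the call below as an assumption rather than the exact API:

    ```python
    # Sketch: create_block wraps the Mamba/Mamba2 mixer in a Block that owns
    # the input normalization and the residual connection. The ssm_cfg
    # "layer" key selecting Mamba2, and the d_intermediate argument, are
    # assumptions based on recent versions of the package.
    import torch
    from mamba_ssm.models.mixer_seq_simple import create_block

    block = create_block(
        d_model=256,
        d_intermediate=0,             # 0 = no MLP between mixer layers
        ssm_cfg={"layer": "Mamba2"},  # "Mamba1" would pick the original mixer
        layer_idx=0,
        device="cuda",
        dtype=torch.float32,
    )

    x = torch.randn(2, 64, 256, device="cuda")
    # Block.forward normalizes the input, runs the mixer, and returns the
    # hidden states together with the running residual stream.
    hidden_states, residual = block(x)
    ```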

  • @wolpumba4099 · 9 days ago

    *Summary*

    *Main Focus:* Continuing work on integrating Mamba2 blocks into Andrej Karpathy's NanoGPT training script, specifically for the ARC challenge.

    *Key Points:*

    * *Mamba2 Bug Fix (8:00):* Solved a cryptic Mamba2 error by ensuring certain dimension settings were multiples of 8, as suggested in a GitHub issue (see the dimension-check sketch after this comment).
    * *Weights & Biases Integration (11:20):*
      * Added Weights & Biases (WandB) for experiment tracking and plotting (see the logging sketch below).
      * Compared GPT-4's and GPT-4o's ability to analyze loss curves from WandB, finding GPT-4 slightly better.
    * *Hyperparameter Sweep Setup (46:39):*
      * Identified several hyperparameters from the Mamba2 paper for sweeping, including:
        * `mamba_d_state` (state size)
        * `n_layer` (number of layers)
        * `att_n_embd` (attention embedding size)
        * `warmup_frac` (warmup fraction)
        * `max_lr` (maximum learning rate)
        * `max_steps` (number of training steps)
        * `weight_decay`
        * `grad_norm_clip` (gradient norm clipping)
      * Created a `sweep.py` script using Hyperopt to automatically run training with various hyperparameter combinations (see the sweep sketch below).
      * Configured the script to run each training instance within a Docker container for clean environment separation.
    * *Data Augmentation (1:13:32):*
      * Discussed augmenting the ARC challenge data by flipping examples vertically and horizontally.
      * Implemented a vertical flip augmentation in the data loader, though its effectiveness is uncertain (see the flip sketch below).
    * *Padding Token Fix (2:17:30):*
      * Identified an issue where the padding token (0) conflicted with a valid token in the ARC challenge.
      * Added a separate padding token (10) and a separator token (11) to the vocabulary (see the encoding sketch below).
    * *Future Ideas:*
      * SSH into another computer to run the sweep in parallel for faster exploration.
      * Explore distilling a pre-trained language model (e.g., Phi-3 Mamba) to improve performance on the ARC challenge (2:19:59).

    *General Discussion:*

    * Compared different hardware for AI training and inference, including Nvidia, Groq, SambaNova Systems, and Xtropy (35:11).
    * Briefly discussed career advice for getting into the AI/ML field and standing out during interviews (37:31, 58:06).

    *Overall:* The stream focused on debugging, refining the training setup, and preparing for a large-scale hyperparameter sweep. The goal is to find an optimal configuration for the hybrid Mamba2-Transformer model and potentially improve performance on the ARC challenge.

    I used Gemini 1.5 Pro to summarize the transcript.
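
    A minimal sketch of the multiple-of-8 workaround from the 8:00 fix. The exact constraint Mamba2's Triton kernels impose may differ by version; the helper below is hypothetical, not from the stream, and just encodes the rule of thumb that the head count `d_model * expand / headdim` should be divisible by 8:

    ```python
    # Hypothetical helper encoding the multiple-of-8 workaround; the real
    # constraint lives inside Mamba2's fused Triton kernels and may change.
    def check_mamba2_dims(d_model: int, expand: int = 2, headdim: int = 64) -> None:
        d_inner = d_model * expand
        assert d_inner % headdim == 0, "d_inner must split evenly into heads"
        nheads = d_inner // headdim
        assert nheads % 8 == 0, f"nheads={nheads} should be a multiple of 8"

    check_mamba2_dims(d_model=256)   # passes: 256 * 2 / 64 = 8 heads
    # check_mamba2_dims(d_model=96)  # fails: 96 * 2 / 64 = 3 heads
    ```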
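
    The WandB hookup from 11:20, sketched against a NanoGPT-style loop. The project name, config keys, and the dummy `train_step` are placeholders, not the values used on stream:

    ```python
    import math
    import wandb

    def train_step(step: int) -> float:
        # Stand-in for one optimizer step; returns a fake decaying loss.
        return 4.0 * math.exp(-step / 200) + 0.1

    # One run per training job; config values show up in the WandB UI
    # and can be compared across sweep trials.
    wandb.init(project="karpamambathy", config={"max_lr": 3e-4, "n_layer": 12})
    for step in range(1000):
        wandb.log({"train/loss": train_step(step), "step": step})
    wandb.finish()
    ```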
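
    A sketch of the `sweep.py` idea from 46:39: Hyperopt samples hyperparameters and each trial launches training inside a Docker container for a clean environment. The image name, the `train.py` flags, and the stdout-parsing at the end are assumptions for illustration:

    ```python
    import subprocess
    from hyperopt import Trials, fmin, hp, tpe

    # Search space over a few of the hyperparameters named in the summary.
    space = {
        "mamba_d_state": hp.choice("mamba_d_state", [64, 128, 256]),
        "n_layer": hp.choice("n_layer", [6, 12, 24]),
        "max_lr": hp.loguniform("max_lr", -9, -5),  # roughly 1e-4 .. 7e-3
        "weight_decay": hp.loguniform("weight_decay", -7, -2),
    }

    def objective(params):
        # Each trial runs training in its own container (placeholder image).
        cmd = [
            "docker", "run", "--rm", "--gpus", "all",
            "karpamambathy:latest",
            "python", "train.py",
            f"--mamba_d_state={params['mamba_d_state']}",
            f"--n_layer={params['n_layer']}",
            f"--max_lr={params['max_lr']:.2e}",
            f"--weight_decay={params['weight_decay']:.2e}",
        ]
        out = subprocess.run(cmd, capture_output=True, text=True)
        # Assumes train.py prints the final validation loss on its last line.
        return float(out.stdout.strip().splitlines()[-1])

    best = fmin(objective, space, algo=tpe.suggest, max_evals=50, trials=Trials())
    print("best hyperparameters:", best)
    ```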
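
    The vertical-flip augmentation from 1:13:32, assuming ARC pairs are stored as 2D integer grids. Flipping the input and output grids together keeps each pair internally consistent, which is why the augmentation is at least well-formed even if its usefulness is uncertain:

    ```python
    import numpy as np

    def flip_pair_vertically(pair: dict) -> dict:
        # np.flipud reverses row order, i.e. a top-to-bottom flip.
        return {
            "input": np.flipud(np.asarray(pair["input"])).tolist(),
            "output": np.flipud(np.asarray(pair["output"])).tolist(),
        }

    pair = {"input": [[1, 2], [3, 4]], "output": [[4, 3], [2, 1]]}
    print(flip_pair_vertically(pair))
    # {'input': [[3, 4], [1, 2]], 'output': [[2, 1], [4, 3]]}
    ```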
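
    The padding-token fix from 2:17:30 in miniature: ARC cell values already occupy 0-9, so padding with 0 collided with a real color. A sketch with the new tokens (the `encode_grid` helper is hypothetical):

    ```python
    PAD_TOKEN = 10   # added so padding no longer collides with color 0
    SEP_TOKEN = 11   # separates grids within a sequence
    VOCAB_SIZE = 12  # colors 0..9 plus the two special tokens

    def encode_grid(grid, max_len):
        # Flatten the grid row by row, append a separator, then pad.
        tokens = [cell for row in grid for cell in row] + [SEP_TOKEN]
        assert len(tokens) <= max_len, "grid too large for the context window"
        return tokens + [PAD_TOKEN] * (max_len - len(tokens))

    print(encode_grid([[0, 1], [2, 3]], max_len=8))
    # [0, 1, 2, 3, 11, 10, 10, 10]
    ```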

  • @thivuxhale · 7 days ago

    Hi, might I ask where you got the transcript?

  • @wolpumba4099 · 7 days ago

    @thivuxhale You can find a link in the video description for most YouTube videos.

  • @thivuxhale · 7 days ago

    Thanks bro
