Meet the first OS for AI.
Use Lightning to build high-performance PyTorch models without the boilerplate. Scale them with Lightning Apps (end-to-end ML systems), which can be anything from production-ready, multi-cloud ML systems to simple research demos.
Want to know more? Visit our website - lightning.ai/
And don't forget to subscribe!
Comments
Nice intro to Thunder and DL compilers in general
Thank you for this 🤗🤗
Great introductory video for such a complex topic. Looking forward to one about distributed training.
Benefits of using cosine annealing learning rate scheduler
Cool. I hope you'll continue doing these livestreams.
Thank you! Yes we will, see you next Friday!
Is there a template for comfyui?
Yes! We do have templates using comfyui and more templates being added regularly.
@@PyTorchLightning can you please link one in this post? It would be really helpful.
@@Lily-wr1nw Visit Lightning.ai to browse the studio templates available! Here's a link to one to get you started: lightning.ai/mpilosov/studios/stable-diffusion-with-comfyui
I’ve been trying to understand the stable diffusion unet in detail for a while. This video added a few pieces of information I was missing from other material. Thanks!
I hope you solve this problem in PyTorch Lightning: RuntimeWarning: os.fork() was called. os.fork() is incompatible with multithreaded code, and JAX is multithreaded, so this will likely lead to a deadlock. self.pid = os.fork()
We are always working to alleviate problems people have while training. Join our Discord to connect with a wide variety of experts in all things ML: lnkd.in/g63PCKBN
My dream: I wish I had the 8-GPU mode unlocked so I could use a script to recover my password, but I'm poor.
I want to train a model to detect text similarity between 2 questions, scored between 0 and 1. My dataset is unlabeled; how should I proceed? Can you guide me?
Good question! Join our discord and get advice from a wide variety of experts in all things ML, including a special channel dedicated to this course. lnkd.in/g63PCKBN
if overfit_batches uses the same batches for training and validation, shouldn't the validation loss == the training loss ?? I see the training loss getting reduced but the validation loss is increasing !! 😳
I have a guess, but I'd appreciate some confirmation: overfit_batches doesn't use the same batches in training and validation, BUT the same batch count! So if the DataModule provides val_dataloader and train_dataloader, both are called, and the same number of batches is sampled from each.
@@osamansr5281 The answer you arrived at is correct. :) Join the Lightning AI Discord for continued discussion with the ML community: discord.gg/zYcT6Yk9kw
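To make the semantics discussed in this thread concrete, here's a minimal pure-Python sketch (this is not Lightning's actual implementation; the loaders and batch contents are made up) showing that the same batch *count*, not the same batches, is taken from each loader:

```python
# Rough sketch of the overfit_batches semantics: the SAME batch count is
# taken from each loader, but the batches themselves differ.
def take_batches(loader, n):
    """Take the first n batches from an iterable of batches."""
    return [batch for i, batch in enumerate(loader) if i < n]

train_loader = [[0, 1], [2, 3], [4, 5], [6, 7]]   # hypothetical train batches
val_loader = [[10, 11], [12, 13], [14, 15]]       # hypothetical val batches

overfit_batches = 2
train_used = take_batches(train_loader, overfit_batches)
val_used = take_batches(val_loader, overfit_batches)

print(len(train_used) == len(val_used))  # True: same count...
print(train_used != val_used)            # True: ...but different batches
```

This is also consistent with the observation earlier in the thread that the training and validation losses can diverge even with overfit_batches set.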
omg this is horrible
did I misunderstand something, or is the graph presented in the overfitting section of the video from [0:22] to [1:00] mislabeled 🧐 overfitting occurs when the train accuracy *RED* increases while the test accuracy *BLUE* decreases, correct? 🤔 aren't the colors swapped? btw, thanks for the amazing tutorials and special thanks for updating them <3
Good question. I think your question arises because this graph shows the training and test accuracy in a slightly different context: here, we are looking at the performance for different portions of the dataset. The overall idea is still true: the larger the gap, the bigger the degree of overfitting. But the reason you see the training accuracy go down is that with more data, it becomes harder to memorize (because there's simply more data to memorize). And if there is more data (and it's harder to memorize), it becomes easier to generalize, hence the test accuracy goes up.
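A tiny numeric illustration of the point in the reply above, with made-up accuracy numbers: as the training set grows, train accuracy drops (harder to memorize) while test accuracy rises (easier to generalize), so the train/test gap, which measures the degree of overfitting, shrinks.

```python
# Hypothetical accuracies for increasing training-set sizes (made-up numbers).
dataset_sizes = [100, 1000, 10000]
train_acc = [1.00, 0.95, 0.90]  # harder to memorize as the dataset grows
test_acc = [0.60, 0.80, 0.85]   # easier to generalize with more data

# The train/test gap is the degree of overfitting; it shrinks with more data.
gaps = [round(tr - te, 2) for tr, te in zip(train_acc, test_acc)]
print(gaps)  # → [0.4, 0.15, 0.05]
```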
Love it thanks a lot Linus //
This was very clear and informative
my plot_loss_and_acc():

def plot_loss_and_acc(log_dir) -> None:
    import pandas as pd
    import matplotlib.pyplot as plt

    metrics = pd.read_csv(f"{log_dir}/metrics.csv")
    # Group metrics by epoch and calculate mean for each metric
    df_metrics = metrics.groupby("epoch").mean()
    # Add epoch as a column (the index is the grouping key)
    df_metrics["epoch"] = df_metrics.index
    print(df_metrics.head(10))
    df_metrics[["train_loss", "val_loss"]].plot(
        grid=True, legend=True, xlabel="Epoch", ylabel="Loss", title="Loss Curve"
    )
    df_metrics[["train_acc_epoch", "val_acc_epoch"]].plot(
        grid=True, legend=True, xlabel="Epoch", ylabel="ACC", title="Accuracy"
    )
    plt.show()

plot_loss_and_acc(trainer.logger.log_dir)
Couldn't you use 8-bit precision during training by using double weights, hence enabling more error tolerance and more speed-up options?
You just lost $510 because I'm not waiting 2 to 3 days to have my email "verified".
Thanks
why are you blinking like that are you ok
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.15, random_state=1, stratify=y)
X_train, X_val, y_train, y_val = train_test_split(
    X_train, y_train, test_size=0.1, random_state=1, stratify=y_train)

For this code I'm getting this error: ValueError: The least populated class in y has only 1 member, which is too few. The minimum number of groups for any class cannot be less than 2. Is there anything I can do to fix this?
Perhaps there is an issue in the data not getting loaded correctly and so there's a truncated dataset, which could cause that issue. If you open an issue on our course GitHub, we could help you debug and get to the bottom of it: github.com/Lightning-AI/dl-fundamentals/issues
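As a quick sanity check before opening an issue, here's a small stdlib-only sketch (the labels are made up for illustration) for finding which classes trigger that error: stratified splitting requires at least 2 examples per class.

```python
from collections import Counter

# Hypothetical labels; "bird" appears only once and would break stratify=y.
y = ["cat", "dog", "dog", "bird", "cat"]

counts = Counter(y)
too_rare = [label for label, n in counts.items() if n < 2]
print(too_rare)  # → ['bird']  (drop, merge, or collect more of these classes)
```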
I think we need more tutorial videos on Lightning Studio.
Thanks for the feedback! Check out Lightning's founder William Falcon's youtube channel for more videos featuring Lightning Studio: www.youtube.com/@WilliamAFalcon
That is quite a unique and nice functionality! I faced the issues of OOM at higher batch sizes, and I think this is a good solution to it! Keep the good work going 😁
Isn't MLE used in Logistic regression and not Gradient descent?
Hi there, so in this example, we perform maximum likelihood estimation (MLE) using gradient descent
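To illustrate that reply, here's a minimal pure-Python sketch (not the course code; the toy data and learning rate are made up) of maximum likelihood estimation for logistic regression via gradient descent: minimizing the negative log-likelihood with gradient steps is the same as maximizing the likelihood.

```python
import math

# Toy 1-D binary classification data (made up for illustration).
X = [0.0, 1.0, 2.0, 3.0]
y = [0, 0, 1, 1]

w, b, lr = 0.0, 0.0, 0.1
for _ in range(2000):
    grad_w = grad_b = 0.0
    for xi, yi in zip(X, y):
        p = 1.0 / (1.0 + math.exp(-(w * xi + b)))  # sigmoid(w*x + b)
        # Gradient of the negative log-likelihood (logistic loss):
        grad_w += (p - yi) * xi
        grad_b += (p - yi)
    w -= lr * grad_w
    b -= lr * grad_b

preds = [1.0 / (1.0 + math.exp(-(w * x + b))) for x in X]
print([round(p) for p in preds])  # should recover the labels [0, 0, 1, 1]
```

So MLE is the objective and gradient descent is the optimizer that pursues it; they are not competing alternatives.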
Sebastian, I have recently started to watch your videos on AI. I find the material relatively easy to follow and very interesting. I do have a question related to section 3.6. In the code we are looping over the minibatches with 'for batch_idx, (features, class_labels) in enumerate(train_loader):'. At first I thought I understood this, but when I inserted a line in the code to print out the class_labels, I expected the output on every second minibatch to be the same. However, they are not. Does this mean that every time we run the line - for batch_idx, (features, class_labels) in enumerate(train_loader): - the data is being shuffled?? Ivar
Hi there. Yes, the data is being shuffled via the data loader. This is usually recommended -- I did experiments many years ago with and without shuffling, and neural networks learn better if they see the data in a different order in each epoch. You can turn off the shuffling via `shuffle=False` in the data loader if you want (in the code here it's set to shuffle=True).
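A tiny pure-Python illustration of the behavior described above (this is not a real DataLoader; the data and seed are made up): with shuffling on, each epoch visits the same samples in a fresh random order, which is why the printed class_labels differ between epochs.

```python
import random

data = list(range(20))  # stand-in for dataset indices

def epoch_order(data, shuffle, rng):
    """Return the order in which one epoch visits the samples."""
    order = data[:]
    if shuffle:
        rng.shuffle(order)
    return order

rng = random.Random(123)  # arbitrary seed for reproducibility
epoch1 = epoch_order(data, shuffle=True, rng=rng)
epoch2 = epoch_order(data, shuffle=True, rng=rng)

print(sorted(epoch1) == sorted(epoch2))  # True: same samples each epoch...
print(epoch1 != epoch2)                  # True: ...in a different order
```

With shuffle=False, both epochs would produce the identical order.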
HATE PYTORCH LIGHTNING
missing unit 6.5
It's true! Unit 6.5 was great while it lasted, but in order to not share outdated material, we retired that subsection.
French accent is classy but also unfortunately hard to understand
Passing just `overfit_batches` to the trainer also outputs validation metrics even if `limit_val_batches=0`. Any ideas?
Good question! These are not meant to be used together. Overfit batches is probably overwriting limit val batches, but if you feel like there's a bug please open an issue on GitHub.
Great Ideas. Thank u @thesephist!
Why not leaky ReLU with a relatively steep slope (say 0.5)? It seems like all these activation functions tend toward almost no slope before 0 (which slows training). There must be a reason?
Great video. Also, the speed at which you do things is just right, so I can follow and write the code at the same time. (I'm only 12 min into the video, but so far it's great.)
I think a great video, if you don't have one already, would be on interfacing with React to reproduce Jupyter-Notebook-like embeds or standalone webviews.
Shouldn't you have a Sigmoid activation for it to be a true Logistic Regression?
can i change my weights ?
I am confused: how can I set reload_dataloaders_every_epoch to True? The lines you said to change belong to which class or function? You gave only 3 lines, so how do I know where they go?
this video is underrated!
I like the run-through we get in every video, but is there a GitHub/Colab file for the code we are using in each video? I would like to test the code myself for better understanding.
Good question! You can check out the course site at lightning.ai/ai-education/ for relevant links to code files for each unit in Deep Learning Fundamentals.
Do I have to be expert in programming to build my LLM?
Check out Lightning Studios at Lightning.ai to find Studio Templates that can help jumpstart your LLM building and become an expert as you do it. It's more accessible than ever.
When using `tuner.lr_find` in Lightning 2.2.0 with PyTorch 2.2.0, I get the warning below:

...\Lib\site-packages\torch\optim\lr_scheduler.py:143: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`. Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. warnings.warn("Detected call of `lr_scheduler.step()` before `optimizer.step()`. "

Is that an issue with Lightning?
`The batch size 65536 is greater or equal than the length of your dataset. Finished batch size finder, will continue with full run using batch size 65536` It's just a small test dataset, but still, does that necessarily mean this is the optimal batch size?
That would be based on the assumption that larger batch sizes are always better, which has to be taken with a grain of salt. In this case, the dataset is so small that it fits memory-wise, but it is probably not ideal, because you would then run full-batch gradient descent instead of minibatch gradient descent. I would reduce it.
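A quick arithmetic sketch of why a batch size at or above the dataset length amounts to full-batch gradient descent (the dataset length here is hypothetical): the number of parameter updates per epoch collapses to 1, so you lose the extra updates and gradient noise that minibatch training provides.

```python
import math

dataset_len = 60000  # hypothetical small dataset

# Updates per epoch = ceil(dataset size / batch size).
for batch_size in (64, 1024, 65536):
    updates_per_epoch = math.ceil(dataset_len / batch_size)
    print(batch_size, updates_per_epoch)  # 65536 -> just 1 update per epoch
```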
Sorry, but your sound is horrible.
For some reason I can't get the trainer to run the fit without error, I always get `RuntimeError: DataLoader worker (pid(s) 42256, 36008, 33348, 43200) exited unexpectedly` with different pids at every run.
Setting `num_workers=0` in the `DataLoader` fixed the issue. Though, it would be cool to have multiprocessing enabled without crashes.
@@SaschaRobitzki I sometimes get the same issue (it's a PyTorch issue, not a PyTorch Lightning one, as it appears in either case, whether I use plain PyTorch or PyTorch Lightning). Interestingly, I only observe something like this when I work on small teaching code using MNIST or small text files. I suspect it has something to do with the DataLoader opening and closing too many files too quickly, because the files are so small in this case. I am not 100% certain, but that's my best guess since, like you said, changing to num_workers=0 usually works.
`datasets` is pretty picky when it comes to the `fsspec` version. I could get datasets 2.16.1 only to work with fsspec 2023.5.0, even though newer versions up to 2023.10.0 are supposed to be compatible.
Thanks for the comment. Arg, yeah, with PyTorch in general I also use a Python version that is 1-2 versions behind the most recent release. PyTorch is a pretty complex code base, so it usually takes a bit of time to fully support the next Python version.
In Unit 7.4 and 7.5 I sometimes get the "RuntimeError: Detected more unique values in `preds` than `num_classes`. Expected only 10 but found 11 in `preds`." Any idea how to fix that?
Arg sorry to hear, that sounds like a frustrating one. I must say that I never encountered this issue and thus can't say much about the root cause. I am suspecting there's maybe some parsing issue in the PyTorch Dataset class. Maybe it's operating system depending. I wish I could tell you more here.