Lightning AI

Meet the first OS for AI.
Use Lightning to build high-performance PyTorch models without the boilerplate. Scale those models with Lightning Apps: end-to-end ML systems that range from simple research demos to production-ready, multi-cloud systems.

Want to know more? Visit our website - lightning.ai/
And don't forget to subscribe!

The Thunder Sessions | Session 2

Comments

  • @vmguerra · 12 days ago

    Nice intro to Thunder and DL compilers in general

  • @King_Muktar · 13 days ago

    Thank you For This 🤗🤗

  • @andrei_aksionau · 21 days ago

    Great introductory video for such a complex topic. Looking forward to one about distributed training.

  • @EngineerXYZ. · 25 days ago

    Benefits of using cosine annealing learning rate scheduler

  • @YokoSakh · 29 days ago

    Cool. I hope you’ll continue doing these live sessions.

  • @lucaantiga3941 · 28 days ago

    Thank you! Yes we will, see you next Friday!

  • @Lily-wr1nw · 29 days ago

    Is there a template for comfyui?

  • @PyTorchLightning · 29 days ago

    Yes! We do have templates using ComfyUI, and more templates are being added regularly.

  • @Lily-wr1nw · 29 days ago

    @@PyTorchLightning Can you please link one in this post? It would be really helpful.

  • @PyTorchLightning · 29 days ago

    @@Lily-wr1nw Visit Lightning.ai to browse the studio templates available! Here's a link to one to get you started: lightning.ai/mpilosov/studios/stable-diffusion-with-comfyui

  • @pedrogorilla483 · 1 month ago

    I’ve been trying to understand the stable diffusion unet in detail for a while. This video added a few pieces of information I was missing from other material. Thanks!

  • @kimomoh5439 · 1 month ago

    I hope you solve this problem in PyTorch Lightning:

        RuntimeWarning: os.fork() was called. os.fork() is incompatible with multithreaded code, and JAX is multithreaded, so this will likely lead to a deadlock.
          self.pid = os.fork()

  • @PyTorchLightning · 1 month ago

    We are always working to alleviate problems people have while training. Join our Discord to take part in the discussion and connect with a wide variety of experts in all things ML: lnkd.in/g63PCKBN

  • @clippadas · 1 month ago

    My dream: I wish I had the 8-GPU mode unlocked so I could use a script to recover my password, but I'm poor.

  • @samarbhosale8310 · 1 month ago

    I want to train a model to detect text similarity between 2 questions, scored between 0 and 1. My dataset is unlabelled; how should I proceed? Can you guide me?

  • @PyTorchLightning · 1 month ago

    Good question! Join our discord and get advice from a wide variety of experts in all things ML, including a special channel dedicated to this course. lnkd.in/g63PCKBN

  • @osamansr5281 · 1 month ago

    If overfit_batches uses the same batches for training and validation, shouldn't the validation loss == the training loss?? I see the training loss getting reduced but the validation loss is increasing!! 😳

  • @osamansr5281 · 1 month ago

    I have a guess, but I'd appreciate some confirmation: overfit_batches doesn't use the same batches in training and validation BUT the same batch count! So if the DataModule provides val_dataloader and train_dataloader, both are called and the same number of batches is sampled from each.

  • @PyTorchLightning · 1 month ago

    @@osamansr5281 The answer you arrived at is correct. :) Join the Lightning AI Discord for continued discussion with the ML community: discord.gg/zYcT6Yk9kw
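
A minimal sketch of the flag being discussed, assuming the Lightning 2.x Trainer API; `model` and `dm` are placeholder names, and the validation behavior is as described in the reply above (the same number of batches is drawn from the train and val dataloaders):

    import lightning as L

    # Repeatedly fit a small, fixed number of batches to sanity-check that the
    # model can overfit them; the validation loop uses the same batch count.
    trainer = L.Trainer(overfit_batches=4, max_epochs=100)
    trainer.fit(model, datamodule=dm)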

  • @not_a_human_being · 1 month ago

    omg this is horrible

  • @osamansr5281 · 1 month ago

    Did I misunderstand something, or is the graph presented in the overfitting section of the video from [0:22] to [1:00] mislabeled? 🧐 Overfitting occurs when the train accuracy *RED* increases while the test accuracy *BLUE* decreases, correct? 🤔 Aren't the colors swapped? Btw, thanks for the amazing tutorials and special thanks for updating them <3

  • @SebastianRaschka · 1 month ago

    Good question. I think your question arises because this shows the training and test accuracy in a slightly different context. Here, we are looking at the performance for different portions of the dataset. The overall idea is still true: the larger the gap, the bigger the degree of overfitting. But the reason you are seeing the training accuracy go down is that with more data, it becomes harder to memorize (because there's simply more data to memorize). And if there is more data (and it's harder to memorize), it becomes easier to generalize (hence the test accuracy goes up).
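
A small sketch of the idea in this reply, using scikit-learn's learning_curve on made-up data (the dataset and classifier are illustrative assumptions, not the course's code): as the training portion grows, training accuracy tends to fall while test accuracy rises, so the gap shrinks.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import learning_curve

    # Toy data: labels loosely tied to the first feature plus noise.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 20))
    y = (X[:, 0] + 0.5 * rng.normal(size=1000) > 0).astype(int)

    # Train/test accuracy for increasing portions of the training set.
    sizes, train_scores, test_scores = learning_curve(
        LogisticRegression(max_iter=1000), X, y,
        train_sizes=np.linspace(0.1, 1.0, 5), cv=5)

    for n, tr, te in zip(sizes, train_scores.mean(axis=1), test_scores.mean(axis=1)):
        print(f"train size {n}: train acc {tr:.3f}, test acc {te:.3f}")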

  • @saikatnextd · 2 months ago

    Love it thanks a lot Linus //

  • @MrPaPaYa86 · 2 months ago

    This was very clear and informative

  • @benc7910 · 2 months ago

    My plot_loss_and_acc():

        def plot_loss_and_acc(log_dir) -> None:
            import pandas as pd
            import matplotlib.pyplot as plt

            metrics = pd.read_csv(f"{log_dir}/metrics.csv")

            # Group metrics by epoch and calculate the mean for each metric
            df_metrics = metrics.groupby("epoch").mean()

            # Add epoch as a column (the index is the grouping key, i.e. epoch)
            df_metrics["epoch"] = df_metrics.index
            print(df_metrics.head(10))

            df_metrics[["train_loss", "val_loss"]].plot(
                grid=True, legend=True, xlabel="Epoch", ylabel="Loss", title="Loss Curve"
            )
            df_metrics[["train_acc_epoch", "val_acc_epoch"]].plot(
                grid=True, legend=True, xlabel="Epoch", ylabel="ACC", title="Accuracy"
            )
            plt.show()

        plot_loss_and_acc(trainer.logger.log_dir)

  • @user-il9vr9oe7b · 2 months ago

    Couldn't you use 8-bit precision during training by keeping duplicate weights, hence enabling more error tolerance and more speed-up options?

  • @River-xd8sk · 2 months ago

    You just lost $510 because I'm not waiting 2 to 3 days to have my email "verified".

  • @NeoZondix · 3 months ago

    Thanks

  • @kevinsasso1405 · 3 months ago

    why are you blinking like that are you ok

  • @AbhishekBade1310 · 3 months ago

    For these lines of code:

        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.15, random_state=1, stratify=y)
        X_train, X_val, y_train, y_val = train_test_split(
            X_train, y_train, test_size=0.1, random_state=1, stratify=y_train)

    I'm getting this error: "ValueError: The least populated class in y has only 1 member, which is too few. The minimum number of groups for any class cannot be less than 2." Is there anything I can do to fix this?

  • @PyTorchLightning · 3 months ago

    Perhaps the data isn't getting loaded correctly, leaving a truncated dataset, which could cause that error. If you open an issue on our course GitHub, we can help you debug and get to the bottom of it: github.com/Lightning-AI/dl-fundamentals/issues
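
That ValueError means at least one class in `y` has fewer than two samples, which stratified splitting cannot handle. A quick hedged check, assuming `X` and `y` are NumPy arrays (dropping the rare classes is just one possible workaround, not necessarily the course's fix):

    import numpy as np
    from collections import Counter

    # Stratified splitting needs at least 2 samples per class; list any offenders.
    counts = Counter(y)
    rare = [label for label, n in counts.items() if n < 2]
    print("Classes with fewer than 2 samples:", rare)

    # One possible workaround: drop those classes before calling train_test_split.
    keep = ~np.isin(y, rare)
    X, y = X[keep], y[keep]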

  • @user-wr4yl7tx3w · 3 months ago

    I think we need more tutorial videos on Lightning Studio.

  • @PyTorchLightning · 3 months ago

    Thanks for the feedback! Check out Lightning founder William Falcon's YouTube channel for more videos featuring Lightning Studio: www.youtube.com/@WilliamAFalcon

  • @prathameshdinkar2966 · 3 months ago

    That is quite a unique and nice piece of functionality! I faced OOM issues at higher batch sizes, and I think this is a good solution to them! Keep the good work going 😁

  • @pal999 · 3 months ago

    Isn't MLE used in logistic regression rather than gradient descent?

  • @SebastianRaschka · 3 months ago

    Hi there, so in this example we perform maximum likelihood estimation (MLE) using gradient descent.
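
A minimal sketch of the point in this reply, on made-up data: minimizing the negative log-likelihood (binary cross-entropy) with gradient descent is maximum likelihood estimation for logistic regression.

    import torch

    # Toy data, for illustration only.
    X = torch.randn(100, 3)
    y = torch.randint(0, 2, (100,)).float()

    w = torch.zeros(3, requires_grad=True)
    b = torch.zeros(1, requires_grad=True)
    optimizer = torch.optim.SGD([w, b], lr=0.1)

    for _ in range(200):
        logits = X @ w + b
        # BCE is the average negative log-likelihood of the labels, so each
        # gradient step increases the likelihood of the data under the model.
        loss = torch.nn.functional.binary_cross_entropy_with_logits(logits, y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()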

  • @user-ih8kn3ji7k · 3 months ago

    Sebastian, I have recently started to watch your videos on AI. I find the material relatively easy to follow and very interesting. I do have a question related to section 3.6. In the code we are looping over the minibatches with 'for batch_idx, (features, class_labels) in enumerate(train_loader):'. At first I thought I understood this, but when I inserted a line in the code to print out the class_labels, I expected the output to be the same on a second pass over the minibatches. However, it is not. Does this mean that every time we run the line 'for batch_idx, (features, class_labels) in enumerate(train_loader):' the data is being shuffled?? Ivar

  • @SebastianRaschka · 3 months ago

    Hi there. Yes, the data is being shuffled via the data loader. This is usually recommended -- I did experiments many years ago with and without shuffling, and neural networks learn better if they see the data in a different order in each epoch. You can turn off the shuffling via `shuffle=False` in the data loader if you want (in the code here it's set to shuffle=True).
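
For reference, a minimal sketch of the setting being discussed; `train_dataset` and the batch size are placeholders:

    from torch.utils.data import DataLoader

    # shuffle=True reshuffles the data at the start of every epoch, so each pass
    # over the loader yields the batches (and their labels) in a different order.
    train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)

    # shuffle=False keeps the batch order identical across epochs.
    debug_loader = DataLoader(train_dataset, batch_size=64, shuffle=False)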

  • @Prithviization · 4 months ago

    HATE PYTORCH LIGHTNING

  • @jiahao2709 · 4 months ago

    missing unit 6.5

  • @PyTorchLightning · 4 months ago

    It's true! Unit 6.5 was great while it lasted, but in order not to share outdated material, we retired that subsection.

  • @astudent8885 · 4 months ago

    French accent is classy but also unfortunately hard to understand

  • @adosar7261 · 5 months ago

    Passing just `overfit_batches` to the trainer also outputs validation metrics even if `limit_val_batches=0`. Any ideas?

  • @PyTorchLightning · 4 months ago

    Good question! These two flags are not meant to be used together. `overfit_batches` is probably overriding `limit_val_batches`, but if you feel like there's a bug, please open an issue on GitHub.

  • @stuxyz · 5 months ago

    Great Ideas. Thank u @thesephist!

  • @JohnSmith-he5xg · 5 months ago

    Why not leaky ReLU with a relatively steep slope (say 0.5)? It seems like all these activation functions tend towards almost no slope below 0 (which slows training). There must be a reason?

  • @isaz2425 · 5 months ago

    Great video. Also, the speed at which you do things is just right, so I can follow and write the code at the same time. (I'm only 12 min into the video, but so far it's great.)

  • @cognitive-carpenter · 5 months ago

    I think a great video, if you don't have it already, would be on interfacing with React to reproduce Jupyter-Notebook-like embeddings or standalone webviews.

  • @JohnSmith-he5xg · 5 months ago

    Shouldn't you have a Sigmoid activation for it to be a true Logistic Regression?

  • @user-jl9oy7nw4k · 5 months ago

    Can I change my weights?

  • @user-jl9oy7nw4k · 5 months ago

    I am confused: how can I set reload_dataloaders_every_epoch to True? Which class or function do the lines you've shown belong to? You gave only 3 lines; how do I know where these lines go?

  • @adrianstaniec · 5 months ago

    this video is underrated!

  • @Deeznuts-wd2yu · 5 months ago

    I like the run-through we get in every video, but is there a GitHub/Colab file for the code we are using in each video? I would like to test the code myself for better understanding.

  • @PyTorchLightning · 5 months ago

    Good question! You can check out the course site at lightning.ai/ai-education/ for relevant links to code files for each unit in Deep Learning Fundamentals.

  • @uoiuserresusaregnatsuj · 5 months ago

    Do I have to be an expert in programming to build my own LLM?

  • @PyTorchLightning · 5 months ago

    Check out Lightning Studios at Lightning.ai to find Studio Templates that can help jumpstart your LLM building and become an expert as you do it. It's more accessible than ever.

  • @SaschaRobitzki · 5 months ago

    When using `tuner.lr_find` in Lightning 2.2.0 with PyTorch 2.2.0, I get the warning below:

        ...\Lib\site-packages\torch\optim\lr_scheduler.py:143: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`. Failure to do this will result in PyTorch skipping the first value of the learning rate schedule.
          warnings.warn("Detected call of `lr_scheduler.step()` before `optimizer.step()`. "

    Is that an issue with Lightning?

  • @SaschaRobitzki · 5 months ago

    `The batch size 65536 is greater or equal than the length of your dataset. Finished batch size finder, will continue with full run using batch size 65536` It's just a small test dataset, but still, does that necessarily mean this is the optimal batch size?

  • @SebastianRaschka · 5 months ago

    This would be based on the assumption that larger batch sizes are always better, which has to be taken with a grain of salt. In this case the dataset is so small that it would fit memory-wise, but it is probably not ideal, because you would then be running full-batch gradient descent instead of minibatch gradient descent. I would reduce it.
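
A hedged sketch of running the batch size finder and then capping its suggestion manually, per the advice above (assuming the Lightning 2.x Tuner API; `model` and `dm` are placeholders, with `batch_size` as an attribute of the datamodule):

    import lightning as L
    from lightning.pytorch.tuner import Tuner

    trainer = L.Trainer(max_epochs=10)
    tuner = Tuner(trainer)

    # The finder grows the batch size until memory (or the dataset) runs out.
    found = tuner.scale_batch_size(model, datamodule=dm, mode="power")
    print("Suggested batch size:", found)

    # On a tiny dataset the suggestion can equal the full dataset, i.e. full-batch
    # gradient descent, so it is reasonable to cap it at a smaller value.
    dm.batch_size = min(found, 256)  # 256 is an arbitrary cap for illustration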

  • @belamipro7073 · 6 months ago

    Sorry, but your sound is horrible.

  • @SaschaRobitzki · 6 months ago

    For some reason I can't get the trainer to run the fit without error, I always get `RuntimeError: DataLoader worker (pid(s) 42256, 36008, 33348, 43200) exited unexpectedly` with different pids at every run.

  • @SaschaRobitzki · 6 months ago

    Setting `num_workers=0` in the `DataLoader` fixed the issue. Though, it would be cool to have multiprocessing enabled without crashes.

  • @SebastianRaschka · 5 months ago

    @@SaschaRobitzki I sometimes get the same issue (it's a PyTorch problem, not a PyTorch Lightning one, as it appears in either case, whether I use plain PyTorch or PyTorch Lightning). Interestingly, I only observe something like this when I work on small teaching code using MNIST or small text files. I suspect it has something to do with the DataLoader opening and closing too many files too quickly, because the files are so small in this case. I am not 100% certain, but that's my best guess, since, like you said, changing to num_workers=0 usually works.
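
The workaround from this thread, plus one commonly tried alternative (persistent_workers is an assumption here, not a confirmed fix for this particular crash); `train_dataset` is a placeholder:

    from torch.utils.data import DataLoader

    # Workaround used above: load data in the main process, no worker subprocesses.
    train_loader = DataLoader(train_dataset, batch_size=64, num_workers=0)

    # Alternative that keeps multiprocessing: reuse worker processes across epochs
    # instead of respawning them, which avoids some startup/teardown failures.
    train_loader = DataLoader(train_dataset, batch_size=64, num_workers=4,
                              persistent_workers=True)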

  • @SaschaRobitzki · 6 months ago

    `datasets` is pretty picky when it comes to the `fsspec` version. I could only get datasets 2.16.1 to work with fsspec 2023.5.0, even though newer versions up to 2023.10.0 are supposed to be compatible.

  • @SebastianRaschka · 5 months ago

    Thanks for the comment. Arg, yeah, with PyTorch in general, I also use a Python version that is 1-2 versions behind the recent Python release. PyTorch is a pretty complex code base so it usually takes a bit of time to 100% support the next Python version.

  • @SaschaRobitzki · 6 months ago

    In Unit 7.4 and 7.5 I sometimes get the "RuntimeError: Detected more unique values in `preds` than `num_classes`. Expected only 10 but found 11 in `preds`." Any idea how to fix that?

  • @SebastianRaschka · 6 months ago

    Argh, sorry to hear that; it sounds like a frustrating one. I must say I never encountered this issue and thus can't say much about the root cause. I suspect there's maybe some parsing issue in the PyTorch Dataset class. Maybe it's operating-system dependent. I wish I could tell you more here.