I take complex topics in programming, Machine Learning, and Reinforcement Learning and make them simpler for you ;D
I'm a Walking Dead fan (back when it was good), as you can guess from my profile pic, which my wife drew. If my stuff helped you, consider buying me a coffee (or brainz) :D
Comments
When you put query commands from one table into another table, it will show the same data as the first table if you have different columns (like months) with different data but in the same order as the first table. Solution: if you have multiple sheets, change the data source number to 0, 1, 2, 3, 4, and so on, according to the number of sheets or workbooks available.
Oh my god, this helped me a lot
I am doing it in VS Code. I have installed Streamlit, but when running, it gives a "module not found" error.
Which module?
Do you have a Discord or something where we can ask questions? I am getting some very strange outputs for my average reward graph. Additionally, my new best reward is struggling to pass 200. Also, would you be able to post your source code somewhere so we can compare? I just want to see where our code deviates (if at all). Thanks! Great series so far :D Edit: I tweaked my hyperparameters and now I am getting a much better result! I am new to machine learning, so I am getting used to having to play with things like hyperparameters to get better results.
I don't have a Discord. You are welcome to drop your questions here. Glad you are able to make some progress. There is a lot of randomness involved in Reinforcement Learning, which makes it extra difficult. Here is the final code: github.com/johnnycode8/dqn_pytorch
Johnny, you are amazing, seriously. I am so happy I found your channel, it is a gold mine.
thank you, man, really
On what basis do we assign these values to the hyperparameters?
Based on trial and error, or a process called hyperparameter tuning.
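For example, a brute-force grid search is the simplest form of tuning. Here's a minimal sketch; train_and_evaluate is a hypothetical stand-in for your own training loop, which should return the average reward:

```python
from itertools import product
import random

def train_and_evaluate(learning_rate, discount_factor):
    # Hypothetical stand-in: train the agent with these hyperparameters
    # and return its average reward. random.random() is just a placeholder.
    return random.random()

learning_rates = [1e-2, 1e-3, 1e-4]
discount_factors = [0.9, 0.95, 0.99]

best_score, best_params = float("-inf"), None
for lr, gamma in product(learning_rates, discount_factors):
    score = train_and_evaluate(lr, gamma)
    if score > best_score:
        best_score, best_params = score, (lr, gamma)

print(f"Best: lr={best_params[0]}, gamma={best_params[1]}, reward={best_score:.2f}")
```

In practice, libraries like Optuna automate this kind of search more efficiently than a plain grid.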
I am running in a WSL conda env and I am getting this error when I run the same code as you:
```
X Error of failed request: BadValue (integer parameter out of range for operation)
Major opcode of failed request: 148 (GLX)
Minor opcode of failed request: 3 (X_GLXCreateContext)
Value in failed request: 0x0
Serial number of failed request: 146
Current serial number in output stream: 147
```
Try my other video that installs Gymnasium on WSL: kzread.info/dash/bejne/q6yHl7l-os2_qMY.html
Hey do you still use your lenovo for doing these tasks?
Yes, my Lenovo (Slim 7 Pro X) is still solid.
You explained better than my university lecturer!!!
How do I start learning reinforcement learning? I know pandas, NumPy, Matplotlib, and basic ML algorithms.
The best teachers are those who teach a difficult lesson simply. Thank you!
What should I do to see the Q-table?
The Q-table is a regular Python array, so you can just use a loop to print the values. In my other video, you can visually see the values on the map: kzread.info/dash/bejne/Y4uTrrF7XZOvdbw.html
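For example, a quick sketch assuming the table is a NumPy array named q (names and sizes are placeholders for your own setup):

```python
import numpy as np

# e.g., FrozenLake 4x4: 16 states x 4 actions (left, down, right, up)
q = np.zeros((16, 4))

# Print one row of Q-values per state.
for state in range(q.shape[0]):
    print(f"state {state:2d}: {q[state]}")
```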
Hello! I am struggling to implement a continuous action space agent with a different algorithm than DQN (TD3, using the continuous mountain car environment from Gymnasium). I appreciated how you went through this series in simple and easy to understand lessons, so I wanted to ask if you have any suggestions for further resources on a similar implementation to what you have here, or even some more of the general concepts behind TD3. Thank you for what you do and any time you spent reading this!
Thanks for the feedback and the super thanks! If the dqn tutorial series gets popular enough, I may continue this format on other algorithms.
@@johnnycode I honestly would be super interested in that too. I thought your DQN videos were super clear! Keep it up!
Appreciate your feedback!
How can we solve the problem using a single DQN instead of 2?
You can use the same policy network in place of the target network. Training results may be worse than using 2 networks.
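Here's a toy PyTorch sketch of the single-network variant, just to show where the target network would normally be used (the network size and tensors are made up):

```python
import torch
import torch.nn as nn

# Toy policy network; in the 2-network setup, a separate target_dqn
# (a slowly-updated copy of this network) would compute the target below.
policy_dqn = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 2))
gamma = 0.99

state = torch.rand(4)
new_state = torch.rand(4)
action, reward, terminated = 1, 1.0, 0.0

with torch.no_grad():
    # Single-network variant: bootstrap from the policy network itself.
    target_q = reward + (1 - terminated) * gamma * policy_dqn(new_state).max()

current_q = policy_dqn(state)[action]
loss = nn.functional.mse_loss(current_q, target_q)
loss.backward()
```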
not working
Is there a possibility that, due to the slippery flag, the agent chooses the best action (knowing the best path) but still falls in the hole?
Yes, the agent can fall into the hole when slippery is on even if it knows the best path. Think of "best path" as the path with the highest chance of success.
What a great series. I always watch your videos with my ad blocker disabled :)
Thanks for the feedback! And enduring some KZread ads😁
Thank you so so much for this. I'm not sure why these instructions aren't on the main page or on GitHub.
Great video, thanks!
Perfect, all I was interested in was persisting the database. Thanks for adding that at the end.
Thanks bro, good content
Hey Johnny, I was wondering if you know how to make the algorithm learn some already-known states? I have a challenge that requires making a DQN learn and start with already-known states stored in a CSV file, and I am struggling because I have no idea how to do that. Is it possible?
I'm guessing if you know those states, then you would know what action to take or not take in relation to those states. For example, a pawn on a chess board can't go backwards, since you know that state is impossible. If my interpretation of your question is correct, then you might want to look into "action masking", which prevents the agent from taking illegal actions. You can start with this SB3 reference, but the concept is not limited to PPO: sb3-contrib.readthedocs.io/en/master/modules/ppo_mask.html
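A rough sketch of how the mask plugs into MaskablePPO; the mask_fn below allows every action, so you'd replace it with rules derived from the known states in your CSV file:

```python
import numpy as np
import gymnasium as gym
from sb3_contrib import MaskablePPO
from sb3_contrib.common.wrappers import ActionMasker

def mask_fn(env) -> np.ndarray:
    # Return a boolean array: True means the action is allowed in the
    # current state. Everything is allowed here; plug in your own rules.
    return np.ones(env.action_space.n, dtype=bool)

env = gym.make("CartPole-v1")
env = ActionMasker(env, mask_fn)

model = MaskablePPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=10_000)
```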
Awesome!! Is it better to divide data into chunks before embedding?
It depends on the problem that you're trying to solve. The semantic meaning of a line of text vs a paragraph of text vs a page of text vs the whole document are different. Which level to divide into chunks depends on what your needs are when querying the database.
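As a sketch, paragraph-level chunking before adding to the database could look like this (the collection name and document text are just placeholders):

```python
import chromadb

client = chromadb.Client()
collection = client.create_collection(name="docs")

document = "First paragraph about cats.\n\nSecond paragraph about dogs."
# One chunk per paragraph; each chunk gets its own embedding.
chunks = [c.strip() for c in document.split("\n\n") if c.strip()]

collection.add(
    documents=chunks,
    ids=[f"chunk-{i}" for i in range(len(chunks))],
)
```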
this tutorial series is awesome! looking forward to actor critic series!
Hi, your videos are great and helped me a lot since you were using the latest version of Stable Baselines3. But I am facing an issue: the verbose values are not getting printed in the output. I have set verbose=1 and even tried verbose=2, but I'm not getting the desired outputs (like rewards, loss, iterations, ep_len_mean, etc.) the way they were printed in your videos. Can you please help me? Is this due to the custom environment I am using, or something else? Also, the tensorboard logs are not working...
You should try creating a new conda environment and then install SB3 again. In my SB3 introduction video, I just ran pip install stable-baselines3[extra] and didn't do anything else special: kzread.info/dash/bejne/gaWquqqij7TahJM.html
@@johnnycode Hi, I will try this one again...Thanks a lot for the reply and your time! Might need your help again...
Hi @johnnycode, I tried reinstalling stable-baselines3[extra], but I am not getting the monitor data, and the tensorboard logs are also not getting displayed... Is there some issue with the new version of stable-baselines3[extra]? Can you please give me the version you installed when making the video?
stable-baselines3 2.0.0 and tensorboard 2.13.0
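For reference, verbose and tensorboard_log are just passed to the model constructor. A minimal sketch (the log directory name is a placeholder); note that SB3 should wrap the environment in a Monitor automatically, which is what produces ep_len_mean and ep_rew_mean:

```python
import gymnasium as gym
from stable_baselines3 import PPO

env = gym.make("CartPole-v1")
model = PPO("MlpPolicy", env, verbose=1, tensorboard_log="./tb_logs")
model.learn(total_timesteps=10_000)
# View the logs with: tensorboard --logdir ./tb_logs
```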
Really good video. I have a big issue with understanding how to delete and add files to ChromaDB. I have also done it with a Docker container, and I have trouble getting the file names of the documents stored in the DB. I am also using the metadata fields to reference files for specific chunks, etc. If there is any explanation of those concepts, that would be awesome!
I use the metadata fields in these videos and also use the Where filter to query, let me know if these solve your problem: kzread.info/dash/bejne/g4eLlK5xlM7His4.html kzread.info/dash/bejne/p5OCk9Zpc6XboaQ.html
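A quick sketch of the Where filter idea; the "source" field name and values are placeholders for your own reference-file metadata:

```python
import chromadb

client = chromadb.Client()
collection = client.create_collection(name="docs")

collection.add(
    documents=["chunk one", "chunk two"],
    metadatas=[{"source": "report.pdf"}, {"source": "notes.txt"}],
    ids=["c1", "c2"],
)

# Query only chunks that came from a specific file.
results = collection.query(
    query_texts=["what does the report say?"],
    n_results=1,
    where={"source": "report.pdf"},
)
print(results["documents"])

# collection.get() with a where filter lists stored entries,
# e.g. to recover the file names of the stored documents.
print(collection.get(where={"source": "notes.txt"})["metadatas"])
```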
Fantastic
Now I can put ChromaDB on my resume. Thank you for creating such a crisp and straightforward tutorial.
nice
Waiting for the next video
Working on it :D
Could you do a double inverted pendulum balancing example?
Great video Johnny :) Thank you so much for the series
I do not understand dude, how do you always make the exact video I need? Are you reading my mind!?!?
HAHA I wish!
Excellent!
Thanks!!!
Thank you so much. Can you please implement an environment with DQN that shows the forgetting problem?
I can't believe this is free
Great video, I like how you graphically show us the network, nodes, policy, etc. Good job!
Thanks for the feedback!
Very helpful! Thank you
Thank you!!
Awesome vid. Can you please do a video/tutorial on reading driver's licenses from different states?
Great video ❤
Great tutorial! Can you upload more videos on custom environments for multi-robot navigation?
Hi, good job. Do you know if it's possible with Ollama Embeddings? It seems to use only one process in my code.
Sorry, I have not worked with that so I don’t know.
@@johnnycode It works with Ollama, thank you. I just can't exceed a batch size of 166.
That is great! Thank you for the coffee!
Hey Johnny! Thanks so much for these videos! I have a question, is it possible to apply this algorithm to a continuous action space? For example, select a number in a range between [0, 120] as an action, or should I investigate other algorithms?
Hi, DQN only works on discrete actions. Try a policy gradient type algorithm. My other video talks about choosing an algorithm: kzread.info/dash/bejne/ZHV6zo-ih6q3qsY.html
Thanks!
I want to know how to use the GPU when creating and querying a collection. Is it possible?
If you enable device=cuda for the sentence transformer like I did in the video, then run a query, it might use the GPU. You'll probably need a very large database before you can see much activity on the GPU though.
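A minimal sketch, assuming your chromadb version's SentenceTransformerEmbeddingFunction accepts a device argument (the model name is just an example):

```python
import chromadb
from chromadb.utils import embedding_functions

ef = embedding_functions.SentenceTransformerEmbeddingFunction(
    model_name="all-MiniLM-L6-v2",
    device="cuda",  # embeddings are computed on the GPU
)

client = chromadb.Client()
collection = client.create_collection(name="docs", embedding_function=ef)
```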
Do you have any problems with the laptop's hinge? I heard that the Yoga line-up has bad hinges.
I have not had any hinge problems with my 10-year-old Yoga.
Great video! Just a suggestion: Can you maybe do a small Quickstart tutorial on Unity ML-Agents? I feel like that's the future after Gym for most of us RL developers LOL. Just a suggestion, don't work on it if you don't want to, no hard feelings.