vLLM on Kubernetes in Production

Science & Technology

vLLM is a fast and easy-to-use library for LLM inference and serving. In this video, we go through the basics of vLLM, how to run it locally, and then how to run it on Kubernetes in production on GPU-attached nodes via a DaemonSet, with a hands-on demo of deploying vLLM in production.
Blog post: opensauced.pizza/blog/how-we-...
John McBride (@JohnCodes)
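
The video walks through deploying vLLM on GPU-attached nodes via a DaemonSet. As a rough, minimal sketch (not the exact manifest shown in the video or blog post), the example below assumes a gpu: "true" node label, the official vllm/vllm-openai image, and a placeholder Hugging Face model ID; adjust all three for your cluster.

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: vllm
spec:
  selector:
    matchLabels:
      app: vllm
  template:
    metadata:
      labels:
        app: vllm
    spec:
      nodeSelector:
        gpu: "true"                # assumed label on the GPU node pool
      containers:
        - name: vllm
          image: vllm/vllm-openai:latest
          args:
            - "--model"
            - "mistralai/Mistral-7B-Instruct-v0.2"  # example model, swap for your own
          ports:
            - containerPort: 8000  # default port of the OpenAI-compatible server
          resources:
            limits:
              nvidia.com/gpu: 1    # requires the NVIDIA device plugin on GPU nodes

Because a DaemonSet runs one pod per matching node, capacity scales with the number of GPU nodes, and a Service in front of the pods gives clients a single OpenAI-compatible endpoint. For local experiments, the same server can be started with pip install vllm and python -m vllm.entrypoints.openai.api_server --model <model>.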
►►►Connect with me ►►►
► Kubesimplify: kubesimplify.com/newsletter
► Newsletter: saiyampathak.com/newsletter
► Discord: saiyampathak.com/discord
► Twitch: saiyampathak.com/twitch
► YouTube: saiyampathak.com/youtube.com
► GitHub: github.com/saiyam1814
► LinkedIn: / saiyampathak
► Website: / saiyampathak
► Instagram: / saiyampathak

Comments: 4

  • @JohnCodes · a month ago

    Thanks for having me on, Saiyam!! It was a lot of fun to show you how we use vLLM at OpenSauced!! Happy to answer any questions people might have here!

  • @divyamchandel8734 · 3 days ago

    Hi John / Saiyam. In the last part you mentioned that "in a lot of cases it could be cheaper". What are the cases where hosting it locally is cheaper versus using OpenAI? Is it just dependent on the load we will have (RPD and max RPM)?

  • @umeshjaiswal5298 · a month ago

    Thanks for this tutorial, Saiyam.

  • @kubesimplify · a month ago

    Glad it's useful. Are you building something with LLMs?
