Maximize ChromaDB Embedding Speed w CUDA & Multiprocessing

Science and Technology

How to vectorize embeddings into ChromaDB as fast as possible by leveraging your NVIDIA CUDA GPU along with Python's multiprocessing. We'll use multiprocessing to 1) launch a producer process on the CPU that reads and transforms the data and 2) launch a consumer process that vectorizes the data into embeddings on the GPU.
Get the code: github.com/johnnycode8/chroma...
Buy Me a Coffee: www.buymeacoffee.com/johnnycode
ChromaDB Playlist: • ChromaDB Vector Databa...
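
The producer/consumer split described above can be sketched roughly as follows. This is a minimal illustration, not the code from the video's repository: `make_batches`, `producer`, `consumer`, `run_pipeline`, and the `None` sentinel are hypothetical names, the transform is a placeholder `strip()`, and `add_batch` stands in for whatever call actually embeds and stores a batch (for example a collection's `add` method).

```python
import multiprocessing as mp

SENTINEL = None  # hypothetical end-of-stream marker


def make_batches(records, batch_size):
    """Yield lists of at most batch_size records."""
    batch = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


def producer(records, queue, batch_size):
    """CPU side: transform records and hand them over in batches."""
    for batch in make_batches(records, batch_size):
        # Placeholder transform; real code would read and clean the data here.
        queue.put([text.strip() for text in batch])
    queue.put(SENTINEL)


def consumer(queue, add_batch):
    """GPU side: pull batches until the sentinel arrives; return the count."""
    total = 0
    while True:
        batch = queue.get()
        if batch is SENTINEL:
            break
        add_batch(batch)  # assumed to embed and store one batch
        total += len(batch)
    return total


def run_pipeline(records, add_batch, batch_size=1000, max_pending=4):
    """Run producer and consumer in separate processes and wait for both."""
    # Bounded queue: the producer blocks instead of racing far ahead.
    queue = mp.Queue(maxsize=max_pending)
    workers = [
        mp.Process(target=producer, args=(records, queue, batch_size)),
        mp.Process(target=consumer, args=(queue, add_batch)),
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
```

Call `run_pipeline(...)` from under an `if __name__ == "__main__":` guard, and pass a top-level function as `add_batch` so it can be pickled for the child process. In a real setup the consumer would more likely create its own database client inside the process rather than receive a callable.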

Comments: 31

  • @kenchang3456, 4 months ago

    YAY!!! That was it! For me, using the Python venv rather than Conda (although I see in the video you're using Conda with no issues) made a difference, and your coming out with this video really helped me implement using CUDA. For 11K documents, I went from 48 mins using CPU to 10 mins using GPU. Thanks again, I really appreciate your sharing. Best regards.

  • @johnnycode, 4 months ago

    Thanks for the multiple coffees, Ken :)

  • @kenchang3456, 4 months ago

    Thanks @johnnycode I'll give this a try. After wrestling with trying to get my NVIDIA GPU involved with embedding with ChromaDB, using Conda, the Conda virtual environment seemed to get corrupted after about three kernel restarts. I just switched to Python venv and went back to a previous working version based on your prior example. It is stable now, but when I added device="cuda" when creating the embedding function it threw an error, probably because I need to install PyTorch. I can embed about 1K rows of my data in 3 mins, and once I get through my complete embeddings, which is 11K, I'll try the PyTorch way. Thanks for putting this out as it is very timely. PS: I have no idea why my Conda virtual environment kept corrupting, but at least with Python venv I have a way forward.

  • @josef58149, 4 months ago

    Man, thanks for this video u are amazing

  • @user-oz4oh2wm7p, 2 months ago

    I watched your 2 previous videos; they were very interesting, thank you very much.

  • @user-oz4oh2wm7p, 2 months ago

    I'm watching this video and trying it on my NVIDIA Jetson Nano.

  • @johnnycode, 2 months ago

    What are you planning to do with your Jetson Nano?

  • @user-oz4oh2wm7p, 2 months ago

    @johnnycode I plan to use it because it has an NVIDIA CUDA GPU; my HP 845 G10 laptop doesn't have an NVIDIA card =)))

  • @user-oz4oh2wm7p, 2 months ago

    Oh, it seems the Jetson Nano doesn't meet the requirements to install ChromaDB.

  • @johnnycode, 2 months ago

    @user-oz4oh2wm7p Really? I thought 2 GB of memory was the only system requirement.

  • @Tommy31416, 4 months ago

    This was brilliant, thank you! Could you do a video on taking the text, images and tables from a pdf using Unstructured please? It would be great to see if it is possible to vectorise and embed all 3 types in a ChromaDB for langchain multimodal retrieval afterwards. To be able to query a document and return the relevant text plus any charts and tables would be the holy grail of RAG deployment. Love your channel, finally someone showing us how to use ChromaDB properly

  • @johnnycode, 4 months ago

    Great suggestion! Text with chart images and tables is particularly difficult; I'll see if Unstructured provides a good way to deal with them.

  • @Tommy31416, 4 months ago

    @johnnycode thank you so much!!

  • @AbhishekKumar-jb4ky, 2 months ago

    When I'm using CUDA, my GPU usage doesn't go above 25%, but it is getting the work done pretty quickly. Any idea how to utilize its full power? I'm not using the batch technique that you are using.

  • @johnnycode, 2 months ago

    The collection.add(your_batch) function is the one that utilizes your GPU. You don't necessarily need to implement multiprocessing like I did; just try the collection.add function with different batch sizes.
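
One way to try different batch sizes, as suggested in the reply above, is a small sweep like the sketch below. It is only an illustration: `chunks`, `time_batch_size`, and `sweep` are hypothetical helpers, and `add_fn` is assumed to accept `ids=` and `documents=` keyword arguments the way a ChromaDB collection's `add` method does. Because adding the same ids twice would skew a second run, `sweep` takes a factory (`make_add_fn`, also hypothetical) that should return the `add` method of a fresh collection for each size.

```python
import time


def chunks(ids, documents, batch_size):
    """Yield matching (ids, documents) slices of at most batch_size items."""
    for start in range(0, len(documents), batch_size):
        end = start + batch_size
        yield ids[start:end], documents[start:end]


def time_batch_size(add_fn, ids, documents, batch_size):
    """Feed everything through add_fn in batches; return elapsed seconds."""
    started = time.perf_counter()
    for batch_ids, batch_docs in chunks(ids, documents, batch_size):
        add_fn(ids=batch_ids, documents=batch_docs)
    return time.perf_counter() - started


def sweep(make_add_fn, ids, documents, sizes):
    """Time each batch size against a fresh add function."""
    results = {}
    for size in sizes:
        add_fn = make_add_fn()  # assumed: add method of a new, empty collection
        results[size] = time_batch_size(add_fn, ids, documents, size)
    return results
```

Watching GPU utilization while the sweep runs should show which batch size keeps the card busiest on a given machine.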

  • @ddoq1345j, 1 month ago

    I'd like to know how to use the GPU when creating and querying a collection. Is it possible?

  • @johnnycode, 1 month ago

    If you enable device="cuda" for the sentence transformer like I did in the video, then run a query, it might use the GPU. You'll probably need a very large database before you can see much activity on the GPU though.
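
As a rough sketch of the answer above: the embedding function is created with a device, attached to the collection, and then reused whenever query text has to be embedded. The path, collection name, and model name below are assumptions chosen for illustration, and `pick_device`, `open_collection`, and `search` are hypothetical helpers. The `chromadb` and `torch` imports are kept inside the functions so the snippet still loads on a machine where they are not installed. As far as I know, only the embedding step runs on the GPU; the index search itself stays on the CPU.

```python
def pick_device():
    """Return "cuda" when PyTorch can see a GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch missing: fall back instead of raising
    return "cuda" if torch.cuda.is_available() else "cpu"


def open_collection(path, name, model_name="all-MiniLM-L6-v2"):
    """Open (or create) a collection whose embeddings run on pick_device()."""
    import chromadb
    from chromadb.utils import embedding_functions

    embed = embedding_functions.SentenceTransformerEmbeddingFunction(
        model_name=model_name,  # assumed model; use whichever one you embed with
        device=pick_device(),
    )
    client = chromadb.PersistentClient(path=path)
    return client.get_or_create_collection(name=name, embedding_function=embed)


def search(collection, text, n_results=3):
    """Embed the query text with the collection's function and search."""
    return collection.query(query_texts=[text], n_results=n_results)


print("embedding device:", pick_device())
```

Passing the same embedding function both when the collection is created and when it is reopened keeps stored vectors and query vectors comparable.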

  • @davidtindell950, 4 months ago

    Hooray! Your PyTorch Chroma code worked well on the very first run: "Elapsed seconds: 213 Record count: 33198"! Please try to find additional ways to maximize speed and improve local persistent storage! P.S. This test was on an OLD Dell G7 with a SLOW NVIDIA GTX 1060!!!!

  • @johnnycode, 4 months ago

    Your 1060 is still pretty solid!

  • @truthwillout2371, 4 months ago

    Isn't ChromaDB for image vectors? I know you can use it but is it optimal?

  • @johnnycode, 4 months ago

    Actually, ChromaDB started out with text vectors, then added image vectors in the last few months.

  • @KevKevKev74yes, 1 month ago

    Hi, good job. Do you know if it's possible with Ollama embeddings? It seems to use only one process in my code.

  • @johnnycode, 1 month ago

    Sorry, I have not worked with that so I don’t know.

  • @KevKevKev74yes, 1 month ago

    @johnnycode It works with Ollama, thank you. I just can't go above a batch size of 166.

  • @johnnycode, 1 month ago

    That is great! Thank you for the coffee!

  • @AbhishekKumar-jb4ky, 2 months ago

    I don't know why, but without using the batch size I get a much lower time.

  • @johnnycode, 2 months ago

    Maybe you can use Python's time module to measure the execution time of the various lines of code that you're running.
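
A small context manager is one convenient way to do the kind of timing suggested above. The sketch below is an assumption about how one might set it up, not something shown in the video: `timed` is a hypothetical helper, and the two timed sections are stand-ins for real loading and embedding steps.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Add the wall-clock seconds spent inside the block to results[label]."""
    started = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - started
        results[label] = results.get(label, 0.0) + elapsed


results = {}

with timed("load", results):
    rows = [f"document {i}" for i in range(1000)]  # stand-in for reading data

with timed("embed", results):
    total_chars = sum(len(row) for row in rows)  # stand-in for collection.add

for label, seconds in results.items():
    print(f"{label}: {seconds:.4f}s")
```

Wrapping the batched and unbatched code paths in separately labeled blocks makes it easier to see where the difference in run time actually comes from.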
