Improve ChromaDB Vector Embeddings w Gemini Pro (FREE)

Science and Technology

Improve your semantic searches with vector embeddings from one of the best LLMs out there. We'll swap the ChromaDB out-of-the-box local model with the Gemini Pro embedding model with just a few code changes.
Get the code: github.com/johnnycode8/chroma...
Buy Me a Coffee: www.buymeacoffee.com/johnnycode
ChromaDB Playlist: • ChromaDB Vector Databa...
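
The swap described above might look roughly like the following. This is a sketch, not the video's actual code: it assumes a chromadb version that ships `GoogleGenerativeAiEmbeddingFunction` in `chromadb.utils.embedding_functions`, that the `google-generativeai` package is installed, and that the API key lives in an environment variable named `GOOGLE_API_KEY` (a hypothetical name). The collection name is a placeholder too.

```python
import os


def build_gemini_collection(name="my_collection", api_key=None):
    """Create (or open) a collection whose embeddings come from Gemini."""
    # Imported lazily so this module loads even where chromadb is missing.
    import chromadb
    from chromadb.utils import embedding_functions

    # GOOGLE_API_KEY is an assumed variable name, not something from the video.
    api_key = api_key or os.environ["GOOGLE_API_KEY"]

    # Replaces the default local model with Google's hosted embedding model.
    gemini_ef = embedding_functions.GoogleGenerativeAiEmbeddingFunction(
        api_key=api_key
    )

    client = chromadb.Client()  # in-memory client; data is not persisted
    return client.get_or_create_collection(
        name=name, embedding_function=gemini_ef
    )
```

Because the embedding function is attached to the collection, later `add` and `query` calls on that collection should use Gemini embeddings without further changes.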

Comments: 8

  • @capitaoTigelinha
    4 months ago

    Can this array of documents be any kind of file type, e.g. PDFs, .txt, Word files, etc.?

  • @johnnycode
    4 months ago

    ChromaDB's naming of "documents" is misleading; a document is just a string of text. You first have to read the file into memory using other Python packages that can read PDFs, etc.
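
As a rough illustration of that reply, the sketch below reads files into plain strings and hands them to a collection. The helper names `read_as_text` and `add_files` are hypothetical, and `pypdf` is just one assumed choice of PDF reader; any package that returns text would do.

```python
from pathlib import Path


def read_as_text(path):
    """Return the contents of a .pdf or plain-text file as one string."""
    path = Path(path)
    if path.suffix.lower() == ".pdf":
        # pypdf is an assumed third-party dependency (pip install pypdf).
        from pypdf import PdfReader

        reader = PdfReader(str(path))
        # extract_text() can return None or "" for image-only pages.
        return "\n".join(page.extract_text() or "" for page in reader.pages)
    return path.read_text(encoding="utf-8")


def add_files(collection, paths):
    """Read each file and add its text to a ChromaDB collection."""
    paths = [Path(p) for p in paths]
    collection.add(
        documents=[read_as_text(p) for p in paths],  # plain strings
        ids=[str(p) for p in paths],                 # ids must be unique
        metadatas=[{"source": p.name} for p in paths],
    )
```

Long files usually retrieve better when split into smaller chunks before being added, but that step is left out here.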

  • @kenchang3456
    4 months ago

    Hi, thanks for including Gemini Pro. Do you have a video on using an Nvidia GPU on Windows 11 with ChromaDB? I could use some GPU acceleration, as 1K documents took 3.6 hours to add to my collection using the all-mpnet-base-v2 model. Or do you think Nomic embeddings would be worth a try?

  • @johnnycode
    4 months ago

    Did you try passing in device="cuda", like this?

    sentence_transformer_ef = embedding_functions.SentenceTransformerEmbeddingFunction(model_name="all-mpnet-base-v2", device="cuda")

    Run your embedding code and then check Task Manager to see if your GPU is kicking in.
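
A slightly fuller version of that one-liner, with the import it needs and a fallback to the CPU when CUDA is not available, could look like this. The helper names `pick_device` and `make_embedding_function` are hypothetical, and the sketch assumes `chromadb`, `sentence-transformers`, and a CUDA-enabled build of PyTorch are installed.

```python
def pick_device():
    """Return "cuda" when PyTorch can see a CUDA GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def make_embedding_function(model_name="all-mpnet-base-v2"):
    """Build a SentenceTransformer embedding function on the best device."""
    # Imported lazily so pick_device() still works without chromadb.
    from chromadb.utils import embedding_functions

    return embedding_functions.SentenceTransformerEmbeddingFunction(
        model_name=model_name,
        device=pick_device(),
    )
```

If `pick_device()` returns "cpu" on a machine that has an Nvidia card, the installed PyTorch build is probably CPU-only.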

  • @kenchang3456
    4 months ago

    @johnnycode That was the answer! I had tried embedding ~11K documents in batches: the first batch of 5K took 3 hours and the second 5K batch didn't finish. Using "cuda" as suggested did the trick: my first batch of 5K ran for 19 mins, the second batch took 26 mins, and the third batch of 1K took 6 mins. I am running on a 3-year-old Surface Book 3 with an Nvidia GeForce GTX video card, on Windows 11 with the Nvidia drivers installed. Looking at the Task Manager Performance tab, I don't see any activity on the Nvidia GPU and only very slight activity on the built-in Intel GPU (~4%). But hey, I'll take it :-) Thank you very much for your help. I hope you enjoy the coffee, and may your subscriptions skyrocket.
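
The batching mentioned in this comment can be sketched as below. The commenter's own code is not shown, so the function name `add_in_batches` and the default batch size are assumptions; only `collection.add(documents=..., ids=...)` is the ChromaDB call.

```python
def add_in_batches(collection, documents, ids, batch_size=1000):
    """Add documents to a collection in fixed-size slices.

    Returns the number of batches sent.
    """
    if len(documents) != len(ids):
        raise ValueError("documents and ids must have the same length")

    batches = 0
    for start in range(0, len(documents), batch_size):
        end = start + batch_size
        # Each call embeds and stores one slice, so a failure partway
        # through only loses the current batch.
        collection.add(documents=documents[start:end], ids=ids[start:end])
        batches += 1
    return batches
```

Smaller batches also make it easier to time each step and spot where a long run slows down.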

  • @johnnycode
    4 months ago

    Awesome news! Maybe your bottleneck is at the hard drive speed now. Thanks for the coffee :D

  • @kenchang3456
    4 months ago

    @johnnycode Sorry to bug you, but have you run into the conda virtual environment getting corrupted every couple of days? I believe that is what is happening to me, and my solution is to remove the environment, then re-create it and re-install the packages I need for my app. Just thought I'd ask.

  • @johnnycode
    4 months ago

    That is strange, I haven't seen that. Maybe create a snapshot of the good state, i.e. "conda env export > environment.yml", then create another snapshot when the environment goes bad, and compare the two yml files?
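
One way to do that comparison is with Python's standard `difflib`, as in the sketch below. The function name and the file names `good.yml` and `bad.yml` are hypothetical; each file would come from running `conda env export > <file>` at the relevant moment.

```python
import difflib
from pathlib import Path


def diff_env_exports(good_path, bad_path):
    """Return only the lines that differ between two conda env exports.

    Lines prefixed with "- " appear only in the good snapshot;
    lines prefixed with "+ " appear only in the bad one.
    """
    good = Path(good_path).read_text(encoding="utf-8").splitlines()
    bad = Path(bad_path).read_text(encoding="utf-8").splitlines()
    return [
        line
        for line in difflib.ndiff(good, bad)
        if line.startswith(("- ", "+ "))  # drop unchanged and "? " hint lines
    ]


if __name__ == "__main__":
    # Hypothetical file names; run only when both snapshots exist.
    if Path("good.yml").exists() and Path("bad.yml").exists():
        for line in diff_env_exports("good.yml", "bad.yml"):
            print(line)
```

A package whose version or build string changed between the two snapshots is a reasonable first suspect.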
