OCR Using Microsoft's Phi-3 Vision Model on Free Google Colab

Science & Technology

In this video, I demonstrate how to implement Microsoft's recently released Phi-3-Vision-128K-Instruct model on a free Google Colab workspace using a T4 GPU. I use Optical Character Recognition (OCR) as the primary use case to showcase the model's capabilities.
You'll learn:
1. An introduction to the Phi-3-Vision-128K-Instruct model
2. Setting up a Google Colab environment with a T4 GPU
3. Loading and configuring the Phi-3-Vision-128K-Instruct model
4. Implementing an OCR task with this advanced model
5. Evaluating the performance and results of OCR using Phi-3-Vision-128K-Instruct
Code Link - colab.research.google.com/dri...
Phi-3 Vision Model - huggingface.co/microsoft/Phi-...
#phi3 #vision #multimodal #multimodalai #llm #microsoftai #googlecolab #ocr #machinelearning #ai #tutorial #freeresources #phi3vision128kinstruct #attention
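The load-and-prompt steps listed above can be sketched in a few lines. The snippet below only assembles the chat-style OCR prompt that the model card's sample code uses (`<|image_1|>` is the model's image placeholder token); the actual model load and `generate(...)` call are shown as comments because they require a GPU and a large download. The instruction wording and the `build_ocr_messages` helper are illustrative assumptions, not taken from the video.

```python
# Sketch: assembling a Phi-3-Vision OCR prompt.
# <|image_1|> is the image placeholder token from the model card's chat
# format; build_ocr_messages and the instruction text are hypothetical.

def build_ocr_messages(instruction: str = "Extract all the text in this image."):
    # A single user turn: image placeholder first, then the OCR instruction.
    return [{"role": "user", "content": f"<|image_1|>\n{instruction}"}]

messages = build_ocr_messages()
print(messages[0]["content"])

# With the model and processor loaded (needs a GPU, e.g. Colab's T4):
#   processor = AutoProcessor.from_pretrained(
#       "microsoft/Phi-3-vision-128k-instruct", trust_remote_code=True)
#   prompt = processor.tokenizer.apply_chat_template(
#       messages, tokenize=False, add_generation_prompt=True)
#   inputs = processor(prompt, [image], return_tensors="pt").to("cuda")
#   out = model.generate(**inputs, max_new_tokens=500)
```

The same `messages` list works for any vision task on this model; only the instruction string changes.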

Comments: 7

  • @theailearner1857
    18 days ago

    There is an update on Phi-3 Vision's Hugging Face page. Now you no longer need to comment out lines in the code files to run the model without flash attention; you just need to load the model in eager mode (huggingface.co/microsoft/Phi-3-vision-128k-instruct#sample-inference-code):
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        device_map="cuda",
        trust_remote_code=True,
        torch_dtype="auto",
        _attn_implementation='eager'  # use _attn_implementation='eager' to disable flash attention
    )

  • @ai_enthusiastic_
    1 day ago

    I just tried this model on my CPU. It appears that the model loads successfully, but it stays in a running state without producing any output so far. My system's RAM capacity is 8 GB. Could this limitation be the reason it isn't working?

  • @arunbhyashaswi1515
    21 days ago

    Quite an enriching video. I will try it and let you know my experience.

  • @d.d.z.
    21 days ago

    Hey man, thank you!

  • @Cloudvenus666
    26 days ago

    Awesome video, but this model is unreliable. It extracts text on some pages; other times it just stops midway or returns blank output. I assumed it was the low GPU power of the T4, so I tried it directly on Azure, and it produced the same outcome.

  • @theailearner1857
    26 days ago

    Try changing the prompt and test it out. If it still doesn't work, you might need to fine-tune this model on domain-specific documents.

  • @gabrielesilinic
    5 days ago

    I mean, cool. But if you really can't run it locally, you likely have bigger issues. The Phi-3 model is small enough that it can run almost anywhere.
