OCR Using Microsoft's Phi-3 Vision Model on Free Google Colab

Science & Technology

In this video, I demonstrate how to implement Microsoft's recently released Phi-3-Vision-128K-Instruct model on a free Google Colab workspace using a T4 GPU. I use Optical Character Recognition (OCR) as the primary use case to showcase the model's capabilities.
You'll learn:
1. An introduction to the Phi-3-Vision-128K-Instruct model
2. Setting up a Google Colab environment with a T4 GPU
3. Loading and configuring the Phi-3-Vision-128K-Instruct model
4. Implementing an OCR task with this advanced model
5. Evaluating the performance and results of OCR using Phi-3-Vision-128K-Instruct
Code Link - colab.research.google.com/dri...
Phi-3 Vision Model - huggingface.co/microsoft/Phi-...
#phi3 #vision #multimodal #multimodalai #llm #microsoftai #googlecolab #ocr #machinelearning #ai #tutorial #freeresources #phi3vision128kinstruct #attention
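The load-and-prompt steps listed above can be sketched in a few lines. The snippet below only assembles the chat-style OCR prompt that the model card's sample code uses (`<|image_1|>` is the model's image placeholder token); the actual model load and `generate(...)` call are shown as comments because they require a GPU and a large download. The instruction wording and the `build_ocr_messages` helper are illustrative assumptions, not taken from the video.

```python
# Sketch: assembling a Phi-3-Vision OCR prompt.
# <|image_1|> is the image placeholder token from the model card's chat
# format; build_ocr_messages and the instruction text are hypothetical.

def build_ocr_messages(instruction: str = "Extract all the text in this image."):
    # A single user turn: image placeholder first, then the OCR instruction.
    return [{"role": "user", "content": f"<|image_1|>\n{instruction}"}]

messages = build_ocr_messages()
print(messages[0]["content"])

# With the model and processor loaded (needs a GPU, e.g. Colab's T4):
#   processor = AutoProcessor.from_pretrained(
#       "microsoft/Phi-3-vision-128k-instruct", trust_remote_code=True)
#   prompt = processor.tokenizer.apply_chat_template(
#       messages, tokenize=False, add_generation_prompt=True)
#   inputs = processor(prompt, [image], return_tensors="pt").to("cuda")
#   out = model.generate(**inputs, max_new_tokens=500)
```

The same `messages` list works for any vision task on this model; only the instruction string changes.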

Comments: 7

  • @theailearner1857
    18 days ago

    There is an update on Phi-3 Vision's Hugging Face page. Now you no longer need to comment out lines in the code files to run the model without flash attention; you just need to load the model in eager mode (huggingface.co/microsoft/Phi-3-vision-128k-instruct#sample-inference-code):
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        device_map="cuda",
        trust_remote_code=True,
        torch_dtype="auto",
        _attn_implementation='eager'  # use _attn_implementation='eager' to disable flash attention
    )

  • @ai_enthusiastic_
    1 day ago

    I just tried this model on my CPU. It appears that the model loads successfully, but it stays in a running state without producing any output so far. My system's RAM capacity is 8 GB. Could this limitation be the reason it isn't working?

  • @arunbhyashaswi1515
    21 days ago

    Quite an enriching video. I will try it and let you know my experience.

  • @d.d.z.
    21 days ago

    Hey man, thank you!

  • @Cloudvenus666
    26 days ago

    Awesome video, but this model is unreliable. It extracts text on some pages; other times it just stops midway or returns blank output. I assumed it was the low GPU power of the T4, so I tried it directly on Azure, and it produced the same outcome.

  • @theailearner1857
    26 days ago

    Try changing the prompt and test it out. If it still doesn't work, you might need to fine-tune this model on domain-specific documents.

  • @gabrielesilinic
    5 days ago

    I mean, cool. But if you really can't run it locally, you likely have bigger issues. The Phi-3 model is small enough that it can run almost anywhere.
