OCR Using Microsoft's Phi-3 Vision Model on Free Google Colab
Science & Technology
In this video, I demonstrate how to implement Microsoft's recently released Phi-3-Vision-128K-Instruct model on a free Google Colab workspace using a T4 GPU. I use Optical Character Recognition (OCR) as the primary use case to showcase the model's capabilities.
You'll learn:
1. An introduction to the Phi-3-Vision-128K-Instruct model
2. Setting up a Google Colab environment with a T4 GPU
3. Loading and configuring the Phi-3-Vision-128K-Instruct model
4. Implementing an OCR task with this advanced model
5. Evaluating the performance and results of OCR using Phi-3-Vision-128K-Instruct
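The loading-and-OCR workflow covered in the video can be sketched roughly as follows. This is a minimal outline based on the model's published sample inference code; the image path, prompt wording, and function names are my own illustrative assumptions, and it assumes a CUDA GPU (such as the Colab T4) is available:

```python
# Minimal sketch: load Phi-3-Vision and run OCR on one image.
# Assumes a CUDA GPU; image path and prompt text are illustrative.
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

MODEL_ID = "microsoft/Phi-3-vision-128k-instruct"

def load_model():
    """Load Phi-3-Vision in eager attention mode (flash attention
    is not supported on the T4 GPU)."""
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        device_map="cuda",
        trust_remote_code=True,
        torch_dtype="auto",
        _attn_implementation="eager",
    )
    processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
    return model, processor

def ocr_image(model, processor, image_path, max_new_tokens=500):
    """Ask the model to transcribe all text visible in an image."""
    messages = [{
        "role": "user",
        "content": "<|image_1|>\nExtract all the text visible in this image.",
    }]
    prompt = processor.tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True)
    image = Image.open(image_path)
    inputs = processor(prompt, [image], return_tensors="pt").to("cuda")
    out = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        eos_token_id=processor.tokenizer.eos_token_id,
    )
    out = out[:, inputs["input_ids"].shape[1]:]  # drop the echoed prompt tokens
    return processor.batch_decode(
        out, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]

# Usage (hypothetical image file):
#   model, processor = load_model()
#   print(ocr_image(model, processor, "sample_receipt.png"))
```

The `<|image_1|>` placeholder in the user message is how Phi-3-Vision's chat template binds the attached image to the prompt.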
Code Link - colab.research.google.com/dri...
Phi-3 Vision Model - huggingface.co/microsoft/Phi-...
#phi3 #vision #multimodal #multimodalai #llm #microsoftai #googlecolab #ocr #machinelearning #ai #tutorial #freeresources #phi3vision128kinstruct #attention
Comments: 7
There is an update on Phi-3 Vision's Hugging Face page. You no longer need to comment out lines in the code files to run the model without flash attention; you just need to load the model in eager mode (huggingface.co/microsoft/Phi-3-vision-128k-instruct#sample-inference-code):

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="cuda",
    trust_remote_code=True,
    torch_dtype="auto",
    _attn_implementation="eager",  # use 'eager' to disable flash attention
)
@ai_enthusiastic_
A day ago
I just tried this model on my CPU. The model appears to load successfully, but it stays in a running state without producing any output so far. My system has 8 GB of RAM. Could that limitation be the reason it isn't working?
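A rough back-of-the-envelope estimate (my own numbers, not from the video) suggests 8 GB of RAM is indeed the likely bottleneck: Phi-3-Vision has roughly 4.2B parameters, so the weights alone nearly fill 8 GB at half precision, before counting activations or OS overhead:

```python
# Rough memory estimate for the weights of a ~4.2B-parameter model
# at different precisions. The parameter count is approximate.
PARAMS = 4.2e9

def weights_gb(bytes_per_param):
    """Gigabytes needed to hold the raw weights at a given precision."""
    return PARAMS * bytes_per_param / 1024**3

for dtype, nbytes in [("fp32", 4), ("fp16/bf16", 2), ("int4", 0.5)]:
    print(f"{dtype}: ~{weights_gb(nbytes):.1f} GB just for the weights")
```

At fp16 the weights come to about 7.8 GB, so on an 8 GB machine the model may load but then thrash or stall during inference, which matches the behavior described above.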
Quite enriching video. I will be trying it and letting you know my experience.
Hey man, thank you!
Awesome video, but this model is unreliable. It extracts text on some pages; other times it just stops midway or returns a blank output. I assumed it was the T4's low GPU power, so I tried it directly on Azure, and it produced the same outcome.
@theailearner1857
26 days ago
Try changing the prompt and testing it again. If it still doesn't work, you might need to fine-tune this model on domain-specific documents.
I mean, cool, but if you really can't run it locally you likely have bigger issues. The Phi-3 model is small enough that it can run almost anywhere.