Accelerate Transformer inference on CPU with Optimum and ONNX

Science & Technology

In this video, I show you how to accelerate Transformer inference on CPU with ONNX and Optimum, an open source library by Hugging Face.
I start from a DistilBERT model fine-tuned for text classification, export it to ONNX format, optimize it, and finally quantize it. In benchmarks on an AWS c6i instance (Intel Ice Lake architecture), the final model runs more than 2.5x faster than the original and is half its size, with just a few lines of simple Python code and without any accuracy drop!
- Optimum: github.com/huggingface/optimum
- Optimum docs: huggingface.co/docs/optimum/o...
- ONNX: onnx.ai/
- Original model: huggingface.co/juliensimon/di...
- Code: gitlab.com/juliensimon/huggin...
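
The export, optimize, quantize workflow described above can be sketched roughly as follows. This is a minimal sketch based on the current Optimum ONNX Runtime API, not the code from the video: the function names, the output directory layout, the optimization level, the dynamic AVX512-VNNI quantization settings, and the default `model_optimized.onnx` file name are all assumptions. Pass the ID of any text-classification model on the Hugging Face Hub as `model_id`.

```python
import time
from pathlib import Path
from statistics import mean


def export_optimize_quantize(model_id: str, out_dir: str) -> dict:
    """Export a text-classification model to ONNX, then optimize and quantize it.

    Hypothetical helper: requires `pip install optimum[onnxruntime]`.
    """
    # Imported lazily so the timing helper below works without Optimum installed.
    from optimum.onnxruntime import (
        ORTModelForSequenceClassification,
        ORTOptimizer,
        ORTQuantizer,
    )
    from optimum.onnxruntime.configuration import (
        AutoQuantizationConfig,
        OptimizationConfig,
    )

    out = Path(out_dir)

    # 1. Export the PyTorch checkpoint to ONNX.
    model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)
    model.save_pretrained(str(out / "onnx"))

    # 2. Apply ONNX Runtime graph optimizations (99 = all available; an assumption).
    optimizer = ORTOptimizer.from_pretrained(model)
    optimizer.optimize(
        save_dir=str(out / "optimized"),
        optimization_config=OptimizationConfig(optimization_level=99),
    )

    # 3. Dynamically quantize the optimized graph for AVX512-VNNI CPUs
    #    (Ice Lake supports VNNI; the exact settings are an assumption).
    quantizer = ORTQuantizer.from_pretrained(
        str(out / "optimized"), file_name="model_optimized.onnx"
    )
    quantizer.quantize(
        save_dir=str(out / "quantized"),
        quantization_config=AutoQuantizationConfig.avx512_vnni(
            is_static=False, per_channel=False
        ),
    )
    return {name: out / name for name in ("onnx", "optimized", "quantized")}


def mean_latency_ms(predict, runs: int = 100, warmup: int = 10) -> float:
    """Average wall-clock latency of `predict()` in milliseconds, after a warm-up."""
    for _ in range(warmup):
        predict()
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        predict()
        timings.append((time.perf_counter() - start) * 1000)
    return mean(timings)
```

To compare variants, wrap each model's prediction call in a zero-argument function and pass it to `mean_latency_ms`; the ratio of two latencies gives the speedup. Results depend on the hardware, sequence length, and library versions.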

Comments: 14

  • @anabildea9274 (1 year ago)

    Thank you for sharing! Great content!

  • @geekyprogrammer4831 (1 year ago)

    Thanks a lot for creating this video. I saved a month by watching this video!

  • @juliensimonfr (1 year ago)

    Great to hear, thank you.

  • @youssefbenhachem993 (1 year ago)

    To the point! Great explanation, thanks 😀

  • @juliensimonfr (1 year ago)

    Glad it was helpful!

  • @TheBontenbal (4 months ago)

    I am trying to follow along. There have been many updates to the code, so I get many errors, unfortunately.

  • @juliensimonfr (4 months ago)

    Docs and examples here: huggingface.co/docs/optimum/onnxruntime/overview

  • @Gerald-iz7mv (2 months ago)

    How do you export to ONNX using CUDA? It seems Optimum doesn't support it. Is there an alternative?

  • @juliensimonfr (2 months ago)

    huggingface.co/docs/optimum/onnxruntime/usage_guides/gpu

  • @ahlamhusni6258 (1 year ago)

    Are there any optimization methods that apply to the word2vec 2.0 model? Can I apply these methods to word2vec 2.0?

  • @juliensimonfr (1 year ago)

    Hi, Word2Vec isn't based on the Transformer architecture. You should take a look at Sentence Transformers, they're a good way to get started with Transformer embeddings: huggingface.co/blog/getting-started-with-embeddings

  • @ibrahimamin474 (7 months ago)

    @juliensimonfr I think he meant wav2vec 2.0

  • @Gerald-xg3rq (2 months ago)

    What's the difference between setfit.exporters.onnx and optimum.onnxruntime (optimizer = ORTModelFromFeatureExtraction.from_pretrained(...), optimizer.optimize(), etc.)?

  • @juliensimonfr (2 months ago)

    Probably the same :)
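
On the GPU question raised in the comments: the Optimum GPU guide linked in the reply describes loading an ONNX Runtime model with a CUDA execution provider. A minimal sketch of that idea, assuming `onnxruntime-gpu` is installed and that the `provider` argument behaves as documented there, could look like this. The helper names `pick_provider` and `load_onnx_classifier` are hypothetical.

```python
from typing import List


def pick_provider(available: List[str]) -> str:
    """Prefer CUDA when ONNX Runtime reports it, otherwise fall back to CPU."""
    if "CUDAExecutionProvider" in available:
        return "CUDAExecutionProvider"
    return "CPUExecutionProvider"


def load_onnx_classifier(model_id: str):
    """Export `model_id` to ONNX and load it on the best available provider."""
    # Imported lazily so pick_provider can be used without these packages.
    import onnxruntime
    from optimum.onnxruntime import ORTModelForSequenceClassification

    provider = pick_provider(onnxruntime.get_available_providers())
    return ORTModelForSequenceClassification.from_pretrained(
        model_id, export=True, provider=provider
    )
```

The export itself produces the same ONNX graph regardless of device; the provider only decides where inference runs.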
