Building an Audio Transcription App with OpenAI Whisper and Streamlit

Science and Technology

In this video, I will show you how to build a simple and yet powerful audio transcription app using the recently released Whisper model from OpenAI and Streamlit.
If you liked this video don't forget to like and subscribe! :)
Here are a few affiliate links for the best gadgets for programmers:
Bose Noise Cancelling headphones: amzn.to/3Um2qIR
Logitech MX Master 3 Advanced Wireless Mouse: amzn.to/3DVffUZ
Corsair K55 RGB Keyboard: amzn.to/3zFucIs
- Subscribe!: / @automatalearninglab
- Follow me on Medium: / lucas-soares
- Join Medium: / membership
- Tiktok: www.tiktok.com/@enkrateialucc...
- Twitter: / lucasenkrateia
- LinkedIn: / lucas-soares-969044167
Music "Before Chill" by Yomoti on Epidemic Sound
www.epidemicsound.com/track/v...

Comments: 130

  • @abroniewski
    @abroniewski 1 year ago

    Thank you for NOT editing out any mixups in your coding. It is REALLY helpful to watch others struggle through and figure things out instead of making everything look perfect from the first go. SUBSCRIBED!

  • @automatalearninglab

    @automatalearninglab

    1 year ago

    Check out the code without THAT MANY mixups here 😂 github.com/EnkrateiaLucca/openai_whisper Thanks for subscribing though! 😊🎉

  • @gacevedobastias
    @gacevedobastias 1 year ago

    Nice work!!! it was all I've been looking for working as a court reporter!!! :) Thank you so much

  • @automatalearninglab

    @automatalearninglab

    1 year ago

    Oh Nice man! glad I was able to help! :) Cheers!

  • @marcoaerlic2576
    @marcoaerlic2576 1 month ago

    Thanks for the video. Also, good on you for having the courage to upload an unedited video.

  • @automatalearninglab

    @automatalearninglab

    1 month ago

    Yeah I mean I did some edits, but overall I found people appreciate if you publish your process, which is what I was trying to do here.

  • @tgard007
    @tgard007 5 months ago

    Love your videos thank you man - try working through code first and bringing up pain points for us along the way, it’s a little disengaging to watch that happen the first time

  • @automatalearninglab

    @automatalearninglab

    5 months ago

    Ok, got it. Some people like this format, some prefer it the way you're describing. I've been trying different approaches, but I guess working through the major pain points first should be a no-brainer! Appreciate the feedback. Watch my next video coming out next Sunday and tell me what you think! :) Cheers!

  • @stoufa
    @stoufa 7 months ago

    Thanks for sharing! ^_^

  • @marcelogarcia6981
    @marcelogarcia6981 1 year ago

    Thank you! 👏

  • @automatalearninglab

    @automatalearninglab

    1 year ago

    Thank you!

  • @hamzahbaagil9828
    @hamzahbaagil9828 1 month ago

    Is it possible to also do speaker recognition? Do you have a video for it?

  • @user-xi8cq9zr4c
    @user-xi8cq9zr4c 1 year ago

    Nice video my friend! Good job and nice relaxing music :) I want to ask if you have any idea how we can create a real-time speech recognition app with Whisper.

  • @automatalearninglab

    @automatalearninglab

    1 year ago

    Thanks! Great question! :) I'm not sure how to reduce the latency of these models to make it work in real time, but Hugging Face seems to have a working demo here: huggingface.co/spaces/anzorq/openai_whisper_stt

  • @user-xi8cq9zr4c

    @user-xi8cq9zr4c

    1 year ago

    @@automatalearninglab Thanks for the answer! I will check it out

  • @mikiallen7733
    @mikiallen7733 1 year ago

    Thanks, sir, but what extra elements should be added to transcribe audio from one language into another? Basically, somebody provides me with an audio file, and I want the app to take that in and, without any editing, produce the result as audio as well as text, but in another language, let's say French to German. Your input is highly appreciated.

  • @automatalearninglab

    @automatalearninglab

    1 year ago

    I'm pretty sure Whisper can accept multiple languages. Look up the different models in the Whisper documentation from OpenAI.

  • @swelanauguste6176
    @swelanauguste6176 9 months ago

    I have been trying to find a solution for larger audio files. Can you integrate Celery with Streamlit and have it run in the background?

  • @automatalearninglab

    @automatalearninglab

    9 months ago

    A good solution I found is using pydub, if I am not mistaken, to break the large audio files into chunks, transcribe each chunk, and then concatenate the results!
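
The chunking idea above can be sketched roughly as follows. This is a minimal sketch, not the author's code: `chunk_ranges` and `transcribe_in_chunks` are hypothetical helper names, the 30-second chunk length is an assumption, and `model` is assumed to be an object returned by `whisper.load_model(...)`. pydub is a third-party package and relies on ffmpeg to read formats such as mp3.

```python
import os
import tempfile


def chunk_ranges(total_ms: int, chunk_ms: int = 30_000) -> list[tuple[int, int]]:
    """Return (start, end) millisecond ranges that cover the whole recording."""
    if chunk_ms <= 0:
        raise ValueError("chunk_ms must be positive")
    return [
        (start, min(start + chunk_ms, total_ms))
        for start in range(0, total_ms, chunk_ms)
    ]


def transcribe_in_chunks(path: str, model, chunk_ms: int = 30_000) -> str:
    """Split the file with pydub, transcribe each piece, and join the text."""
    # Imported lazily so chunk_ranges can be used without pydub installed.
    from pydub import AudioSegment

    audio = AudioSegment.from_file(path)  # len(audio) is the duration in ms
    pieces = []
    for start, end in chunk_ranges(len(audio), chunk_ms):
        with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
            chunk_path = tmp.name
        try:
            # pydub slices by milliseconds; export each slice as a WAV file.
            audio[start:end].export(chunk_path, format="wav")
            pieces.append(model.transcribe(chunk_path)["text"].strip())
        finally:
            os.remove(chunk_path)
    return " ".join(pieces)
```

Cutting at fixed offsets can split a word in half, so in practice it may be worth cutting on silence or overlapping the chunks slightly.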

  • @uen1857
    @uen1857 5 months ago

    Interesting, and a good explanation, thanks. I wish you had also made it for real-time transcription using the microphone, if that's possible.

  • @automatalearninglab

    @automatalearninglab

    5 months ago

    It's been on my mind actually. I did something with whisper.cpp a while back. Will probably take a crack at this real-time audio transcription stuff soon! :) Thanks for watching.

  • @eduardogamboa7209
    @eduardogamboa7209 1 year ago

    Very nice video, thanks. It looks like a really cool project😊. Do you think that, once we have the text, we could pass it to ChatGPT to get some good class notes?

  • @automatalearninglab

    @automatalearninglab

    1 year ago

    Yeah of course!

  • @eduardogamboa7209

    @eduardogamboa7209

    1 year ago

    @@automatalearninglab Sorry, I was trying to follow your video and just installed VS Code, and I can't even get started. Could you maybe upload a guide or show me how to set up VS Code in order to run the code? I don't get why I can't. Sorry, I know this comment might be frustrating, but I'm just starting to code outside of Replit or Google Colab :(

  • @automatalearninglab

    @automatalearninglab

    1 year ago

    @@eduardogamboa7209 Check out this article about how to set up VS Code for machine learning: copyassignment.com/machine-learning-in-visual-studio-code/

  • @awa8766
    @awa8766 1 year ago

    This was awesome! Quick question: is there any way that Streamlit could receive voice input, rather than uploading an audio file? The workflow I'm thinking of is 1) user presses "record audio" on Streamlit, 2) once finished, the generated audio output is passed to Whisper, 3) Whisper transcribes. I've been researching how to incorporate audio input into Streamlit for a while, to no avail.

  • @automatalearninglab

    @automatalearninglab

    1 year ago

    I am not sure if Streamlit takes in direct audio input (GPT-4 says no...LoL) but apparently gradio can! I wrote some boilerplate code for you to test:

        import gradio as gr
        import openai
        import numpy as np

        # Set OpenAI key
        openai.api_key = 'your-openai-api-key'

        def transcribe_audio(audio):
            """
            This function transcribes audio using OpenAI's Whisper API
            """
            # You might need to convert the audio into a suitable format for Whisper
            # Convert to suitable audio format (like .wav)
            # Make the API request
            # Here we assume whisper_asr is a hypothetical function that performs the transcription.
            transcription = whisper_asr(audio)
            # Return the transcription
            return transcription

        iface = gr.Interface(fn=transcribe_audio, inputs=gr.inputs.Audio(source="microphone"), outputs="text")
        iface.launch()

    Let me know if it works! (replace the whisper_asr stuff with the proper call to whisper) Cheers!! Thanks for watching!

  • @awa8766

    @awa8766

    1 year ago

    @@automatalearninglab Thanks! I tried implementing the following code, and I got a few errors I would greatly appreciate your input on! Code:
    ```
    def transcribe(audio):
        file = open(audio, "rb")
        if file is None:
            return ""
        with file as f:
            t_text = openai.Audio.transcribe(
                model="whisper-1", file=f, api_key=OPENAI_API_KEY
            )
        return t_text["text"]

    gr.Interface(
        title='Medical Scriber',
        fn=transcribe,
        inputs=[gr.Audio(source="microphone", type="filepath")],
        outputs=["text"]
    ).launch()
    ```
    I get the following errors:
    1) When I record audio, then pass it to the Whisper API, I get the following message in the terminal, though the transcription works: "UserWarning: Trying to convert audio automatically from int32 to 16-bit int format."
    2) The Whisper API has a 25 MB limit on the file size. I recorded a 5-minute audio snippet (2.5 MB), and I got a "size limit exceeded" error. More info: github.com/openai/whisper/discussions/1385
    Any suggestions?

  • @automatalearninglab

    @automatalearninglab

    1 year ago

    @@awa8766 Hey! I think you’ll find everything you need in this more recent video where I did a slight update on this project here: kzread.info/dash/bejne/emenl8ixZ6bZiso.html Thanks for watching! :) Cheers!

  • @naturallydope247
    @naturallydope247 1 year ago

    do you have a github repo for this or somewhere where the code is that you used to build this?

  • @automatalearninglab

    @automatalearninglab

    1 year ago

    Yep, it's here github.com/EnkrateiaLucca/openai_whisper

  • @jakubfronczyk2496
    @jakubfronczyk2496 1 year ago

    Nice work, can you do something like that with whisper-jax?

  • @automatalearninglab

    @automatalearninglab

    1 year ago

    I haven't looked into it, but I'll take a look! Thanks! :)

  • @rushilpatel702
    @rushilpatel702 9 months ago

    Is there any way we can make it so that the text will dynamically highlight each word as it is played through st.audio?

  • @automatalearninglab

    @automatalearninglab

    9 months ago

    Not sure! I haven't tried highlighting each word like that before. Sorry could not help more! :( Thanks for watching! :)

  • @Nursultan_karazhigit
    @Nursultan_karazhigit 7 months ago

    Is it possible to perform actions after transcribing, like Siri does?

  • @automatalearninglab

    @automatalearninglab

    7 months ago

    Yeah, of course, you just have to add that to the workflow in the script.

  • @ravindumihiranga6165
    @ravindumihiranga6165 8 months ago

    Hey bro, if I need to detect the language of the voice, how should I do it? I mean, what modification should I make to the code?

  • @automatalearninglab

    @automatalearninglab

    8 months ago

    When you're loading the base model, make sure to add the initials of your target language.
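
For detecting the spoken language itself, a minimal sketch modeled on the language-identification example in the Whisper README might look like this. `most_likely_language` and `detect_language` are hypothetical helper names, and the default `"base"` model is only an assumption taken from the app in this video.

```python
def most_likely_language(probs: dict[str, float]) -> str:
    """Pick the language code with the highest probability."""
    return max(probs, key=probs.get)


def detect_language(path: str, model_name: str = "base") -> str:
    """Detect the spoken language from the first 30 seconds of an audio file."""
    # Imported lazily: openai-whisper is a third-party package.
    import whisper

    model = whisper.load_model(model_name)
    # Load the audio and pad or trim it to the 30-second window Whisper expects.
    audio = whisper.pad_or_trim(whisper.load_audio(path))
    mel = whisper.log_mel_spectrogram(audio, n_mels=model.dims.n_mels).to(model.device)
    _, probs = model.detect_language(mel)
    return most_likely_language(probs)
```

To skip detection and force English, passing `language="en"` to `model.transcribe(...)` or loading an English-only checkpoint such as `base.en` should work.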

  • @blackhat965
    @blackhat965 1 year ago

    This is a great step-by-step tutorial. Where do I find the Whisper documentation to know what language and syntax it uses with Python? I want to be able to add features and functionality. I saw you had a openai_whisper_tutorial.ipynb - is that the official documentation to building whisper apps in Python?

  • @automatalearninglab

    @automatalearninglab

    1 year ago

    Check out the OpenAI documentation, which has the official docs! :)

  • @blackhat965

    @blackhat965

    1 year ago

    @@automatalearninglab Hey I'm really sorry to ask - I'm not being lazy here. I'm new to coding so a lot of this isn't obvious to me. The Github OpenAI whisper documentation only has a few scrappy examples of the language under the "Python Usage". It's far less robust than proper documentation that showcases how to leverage all of the whisper functionality. Is there anywhere else I could see the Python translation for all the functionality? For example, using whisper on terminal produces srt/txt/vtt etc. files, but there's no standard script to show how to create .srt files in a .py script. I had to look at how other people created .srt files and they didn't reference documentation either. Sorry for the long question.
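
One way to produce an .srt file from a Python script is to format the segments that `model.transcribe(...)` returns. The sketch below assumes each segment is a dict with `start`, `end` (in seconds) and `text` keys, which is how the `segments` list in the transcription result is laid out; `srt_timestamp` and `segments_to_srt` are hypothetical helper names, not part of the Whisper package.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments) -> str:
    """Turn Whisper-style segments into the text of an .srt subtitle file."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # A blank line separates consecutive subtitle blocks.
    return "\n".join(blocks)
```

With a loaded model, usage would be along the lines of `result = model.transcribe("audio.mp3")` followed by writing `segments_to_srt(result["segments"])` to a file, where `audio.mp3` is a placeholder path.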

  • @automatalearninglab

    @automatalearninglab

    1 year ago

    @@blackhat965 Not sure, check out the GitHub repo for Whisper!

  • @incredibleG007
    @incredibleG007 1 year ago

    Great! Thank you. Where can we find the source code to try it?

  • @automatalearninglab

    @automatalearninglab

    1 year ago

    Here github.com/EnkrateiaLucca/openai_whisper After a lot of people asking for this! I finally created a proper repo with the code! enjoy!

  • @aldya1532
    @aldya1532 6 months ago

    Thanks for the great tutorial. The problem is that on my laptop it works fine only with the tiny and base models. The larger models run into memory problems, and none of the fixes from the documentation work.

  • @automatalearninglab

    @automatalearninglab

    6 months ago

    Yes, in this case you can host the app in a cloud with a better machine with more memory, or try to make your use case acceptable with the tiny model!
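
As a rough way to choose a model that fits the machine, the sketch below picks the largest model whose approximate memory requirement is within a given budget. The figures are the approximate VRAM requirements listed in the Whisper README as best I recall them, so treat them as assumptions and check the current README; `largest_model_that_fits` is a hypothetical helper name.

```python
# Approximate memory needed per model size, in GB (assumed from the Whisper README).
APPROX_VRAM_GB = {"tiny": 1, "base": 1, "small": 2, "medium": 5, "large": 10}


def largest_model_that_fits(available_gb: float) -> str:
    """Return the largest model name whose approximate requirement fits the budget."""
    fitting = [name for name, need in APPROX_VRAM_GB.items() if need <= available_gb]
    if not fitting:
        raise ValueError("not enough memory for even the smallest model")
    # The dict is ordered from smallest to largest, so the last match is the largest.
    return fitting[-1]
```

The returned name can then be passed to `whisper.load_model(...)`.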

  • @TalkingWithBots
    @TalkingWithBots 5 months ago

    Do you maybe know some alternatives to Streamlit? I am curious which ones you have already used :)

  • @automatalearninglab

    @automatalearninglab

    5 months ago

    Probably Gradio would be the first to come to mind, also there are many other options coming up now but I haven't been looking into it that much, I usually use the terminal for most things. :)

  • @TalkingWithBots

    @TalkingWithBots

    5 months ago

    @@automatalearninglab I can relate :) I know Gradio and Streamlit are good frameworks for machine learning apps. Recently I was using Colab paired with Anvil to create something. It was also nice and easy.

  • @SportyxChannel
    @SportyxChannel 1 year ago

    Hi! Can Whisper transcribe MP3 greater than 30 seconds? If yes, can you share the code? Thanks!

  • @automatalearninglab

    @automatalearninglab

    1 year ago

    I'll work on something like that soon, stay tuned! :)

  • @tomasdemarcos570
    @tomasdemarcos570 1 year ago

    Is there a size or length limit on the audio? Are you using an API key?

  • @automatalearninglab

    @automatalearninglab

    1 year ago

    Yeah, right now it just supports file sizes of up to 25 MB and audio clips of up to 30 s.

  • @meditatepositivity5111

    @meditatepositivity5111

    11 months ago

    How to overcome these limitations? @@automatalearninglab

  • @joelmartinez7628
    @joelmartinez7628 1 year ago

    If x number of individuals are speaking, can it identify them? Speaker 1 up to n?

  • @automatalearninglab

    @automatalearninglab

    1 year ago

    I don't think so. I think it will only transcribe everything as the same voice in a stream.

  • @JoEl-jx7dm

    @JoEl-jx7dm

    4 months ago

    Same name, same doubt. That process is called speaker diarization, and yes, it is possible with a custom classification model integrated into this workflow!

  • @rubibeats
    @rubibeats 1 year ago

    can we get real time transcription? say up to certain lengths in time?

  • @automatalearninglab

    @automatalearninglab

    1 year ago

    I haven't played with real-time applications yet, so I could not say right now.

  • @PenguLuna
    @PenguLuna 1 year ago

    How can we add an option to translate the transcribed text? The AI has that capability

  • @automatalearninglab

    @automatalearninglab

    1 year ago

    No, not this one. This is just for transcription, but you can use an OpenAI GPT-3-based translation model for the rest (almost 100% sure, but check it!) Cheers :)
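
For what it's worth, the open-source Whisper package also exposes a `task="translate"` option on `model.transcribe(...)`, which, as far as I know, translates the speech into English only; any other target language (French to German, say) would still need a separate translation step. A minimal sketch, where `whisper_options` and `run_whisper` are hypothetical helper names:

```python
def whisper_options(translate_to_english: bool = False, source_language=None) -> dict:
    """Build keyword arguments for model.transcribe()."""
    options = {"task": "translate" if translate_to_english else "transcribe"}
    if source_language is not None:
        # Language code of the audio, e.g. "fr"; omit it to let Whisper detect it.
        options["language"] = source_language
    return options


def run_whisper(path: str, model_name: str = "base", **kwargs) -> str:
    """Transcribe (or translate into English) the audio file at path."""
    # Imported lazily: openai-whisper is a third-party package.
    import whisper

    model = whisper.load_model(model_name)
    return model.transcribe(path, **whisper_options(**kwargs))["text"]
```

In the Streamlit app, a checkbox could toggle `translate_to_english` before the call to `model.transcribe`.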

  • @batosato
    @batosato 1 year ago

    Any recommendation for hosting website where I can deploy this app?

  • @automatalearninglab

    @automatalearninglab

    1 year ago

    Not really, I haven't hosted it so I couldn't tell yah.

  • @batosato

    @batosato

    1 year ago

    @@automatalearninglab Thanks. I do get the following error when I load an audio file. Any suggestion on how to fix this? AttributeError: module 'ffmpeg' has no attribute 'Error'

  • @automatalearninglab

    @automatalearninglab

    1 year ago

    @@batosato Sup man, yeah, you should install the ffmpeg module! For Windows: phoenixnap.com/kb/ffmpeg-windows For Linux: phoenixnap.com/kb/install-ffmpeg-ubuntu Cheers man!

  • @fredsakay994
    @fredsakay994 3 months ago

    It would be nice to have it as free ready-to-use web online. Not everyone is a programmer.

  • @automatalearninglab

    @automatalearninglab

    3 months ago

    Right but that involves some work. I do want to have something like that running soon!

  • @udaykumarbilla6436
    @udaykumarbilla6436 1 year ago

    Does the code work on Streamlit Cloud?

  • @automatalearninglab

    @automatalearninglab

    1 year ago

    I have no idea!

  • @JoEl-jx7dm
    @JoEl-jx7dm 4 months ago

    how about diarization of multiple speakers to classify them?

  • @automatalearninglab

    @automatalearninglab

    4 months ago

    This one does not have it. I was playing around with that using just the GPT-4 API and it works quite well.

  • @JoEl-jx7dm

    @JoEl-jx7dm

    4 months ago

    @@automatalearninglab one more thingy, found any solutions to process real time voice rather than basic audio file input?

  • @ravindrakarande59
    @ravindrakarande59 11 months ago

    I am still getting this error, please help: FileNotFoundError: [WinError 2] The system cannot find the file specified

  • @automatalearninglab

    @automatalearninglab

    11 months ago

    Did you point to the path of the file?

  • @chinnibngrm272

    @chinnibngrm272

    3 months ago

    Hi, did you solve the error? Please help me solve it.

  • @hedinaouara5625
    @hedinaouara5625 1 year ago

    First, thanks for this tutorial, but I got this error: "FileNotFoundError: [WinError 2] The system cannot find the file specified"

  • @automatalearninglab

    @automatalearninglab

    1 year ago

    You're welcome, check out the discussion below where we talked about this! Cheers :)

  • @hedinaouara5625

    @hedinaouara5625

    1 year ago

    @@automatalearninglab Thanks for your response, I solved it by running VS Code as administrator.

  • @automatalearninglab

    @automatalearninglab

    1 year ago

    @@hedinaouara5625 nice!

  • @jaybhuva5531
    @jaybhuva5531 4 months ago

    I want it to recognise only the English language. Is that possible?

  • @automatalearninglab

    @automatalearninglab

    4 months ago

    Yep

  • @chinnibngrm272
    @chinnibngrm272 3 months ago

    I'm facing this error: FileNotFoundError: [WinError 2] The system cannot find the file specified. Can anyone please help me solve it?

  • @automatalearninglab

    @automatalearninglab

    3 months ago

    Feed the right path (the file) to the app. I think that's the issue.

  • @bingolio
    @bingolio 1 year ago

    Any particular reason there's no GitHub repo? Don't wanna share the code?

  • @automatalearninglab

    @automatalearninglab

    1 year ago

    No reason at all, the code is here: github.com/EnkrateiaLucca/openai_whisper I was just lazy before LoL

  • @bingolio

    @bingolio

    1 year ago

    @@automatalearninglab Thx Good work btw!

  • @satisfyingartwork6839
    @satisfyingartwork6839 1 year ago

    Why don't you give an app link? I would use it for transcription.

  • @automatalearninglab

    @automatalearninglab

    1 year ago

    There is no app link, just the code for you to run it yourself. Sorry :(, thanks for watching! Cheers! :)

  • @satisfyingartwork6839

    @satisfyingartwork6839

    1 year ago

    @@automatalearninglab OK

  • @spider279
    @spider279 7 months ago

    can you add timestamp and diarization to your app ?

  • @automatalearninglab

    @automatalearninglab

    7 months ago

    Timestamp yes, diarization not sure. There is a GitHub repo called openwhisper timestamp that's pretty good.

  • @spider279

    @spider279

    7 months ago

    Try to combine it with the pyannote lib for diarization, I will be very happy if you do so 😄
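
Combining the two usually comes down to matching each transcribed segment with the speaker turn it overlaps most. The sketch below shows only that matching step and does not call pyannote: `turns` is assumed to be a list of `(start, end, speaker)` tuples that you have already extracted from a diarization tool, `segments` is assumed to look like the `segments` list in a Whisper result, and `overlap` and `label_segments` are hypothetical helper names.

```python
def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def label_segments(segments, turns):
    """Attach to each segment the speaker whose turn overlaps it the most."""
    labelled = []
    for seg in segments:
        best = max(
            turns,
            key=lambda t: overlap(seg["start"], seg["end"], t[0], t[1]),
            default=None,
        )
        # Fall back to "unknown" when no turn overlaps the segment at all.
        if best is not None and overlap(seg["start"], seg["end"], best[0], best[1]) > 0:
            speaker = best[2]
        else:
            speaker = "unknown"
        labelled.append({**seg, "speaker": speaker})
    return labelled
```

Segments that straddle a speaker change get the majority speaker here; splitting them at word level would need word timestamps.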

  • @automatalearninglab

    @automatalearninglab

    7 months ago

    @@spider279 nice!

  • @spider279

    @spider279

    7 months ago

    @@automatalearninglab Do you know whisper jax diarization ? if yes have you ever tested it ?

  • @rajkumarsingh8862
    @rajkumarsingh8862 1 year ago

    I'm still getting a FileNotFound error

  • @automatalearninglab

    @automatalearninglab

    1 year ago

    That's weird, did you upload a file present in your machine?

  • @rajkumarsingh8862

    @rajkumarsingh8862

    1 year ago

    @@automatalearninglab Of course, and I'm in the same directory where my code and file are. I have tried many things but it doesn't work. Can you provide me your code please? I need to create this project ASAP 🙂🙂

  • @automatalearninglab

    @automatalearninglab

    1 year ago

    @@rajkumarsingh8862 Sure! here it is:

        import streamlit as st
        import whisper

        st.title("Whisper App")

        # upload audio file with streamlit
        audio_file = st.file_uploader("Upload Audio", type=["wav", "mp3", "m4a"])

        model = whisper.load_model("base")
        st.text("Whisper Model Loaded")

        if st.sidebar.button("Transcribe Audio"):
            if audio_file is not None:
                st.sidebar.success("Transcribing Audio")
                transcription = model.transcribe(audio_file.name)
                st.sidebar.success("Transcription Complete")
                st.markdown(transcription["text"])
            else:
                st.sidebar.error("Please upload an audio file")

        st.sidebar.header("Play Original Audio File")
        st.sidebar.audio(audio_file)
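
Two things commonly sit behind the FileNotFoundError reports in this thread, and the sketch below guards against both; it is a hedged sketch, not a fix confirmed by the author. First, `model.transcribe(audio_file.name)` only works if a file with that name exists in the directory the app is launched from, so writing the uploaded bytes to a temporary file and passing that path is more robust. Second, Whisper shells out to the `ffmpeg` executable to decode audio, and on Windows a missing `ffmpeg` on the PATH typically surfaces as `[WinError 2]`. `ffmpeg_available` and `save_upload` are hypothetical helper names.

```python
import os
import shutil
import tempfile


def ffmpeg_available() -> bool:
    """True if an ffmpeg executable can be found on the PATH."""
    return shutil.which("ffmpeg") is not None


def save_upload(data: bytes, original_name: str) -> str:
    """Write uploaded bytes to a temporary file and return its path."""
    # Keep the original extension so ffmpeg can guess the container format.
    suffix = os.path.splitext(original_name)[1]
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
        tmp.write(data)
        return tmp.name
```

In the app above, this would mean checking `ffmpeg_available()` before transcribing and replacing `model.transcribe(audio_file.name)` with `model.transcribe(save_upload(audio_file.getvalue(), audio_file.name))`, removing the temporary file afterwards.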

  • @rajkumarsingh8862

    @rajkumarsingh8862

    1 year ago

    @@automatalearninglab Brother, it's giving me the error "the system cannot find the file specified", with Winapi.createprocess errors. Please help. Can we chat somewhere?

  • @rajkumarsingh8862

    @rajkumarsingh8862

    1 year ago

    @@automatalearninglab I just want to ask you how I can host or deploy this web app. Please tell me 🙂❤️

  • @adesigne
    @adesigne 11 months ago

    I got the error. Tell me how to solve pls))
    2023-09-04 18:13:53.734 Uncaught app exception
    Traceback (most recent call last):
      File "/home/adminuser/venv/lib/python3.9/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 552, in _run_script
        exec(code, module.__dict__)
      File "/mount/src/ai/app.py", line 2, in <module>
        import whisper
    ModuleNotFoundError: No module named 'whisper'

  • @automatalearninglab

    @automatalearninglab

    11 months ago

    pip install openai-whisper

  • @forexhunter2040

    @forexhunter2040

    8 months ago

    @@automatalearninglab Will we need an API key from OpenAI?

  • @automatalearninglab

    @automatalearninglab

    8 months ago

    @@forexhunter2040 yep

  • @forexhunter2040

    @forexhunter2040

    8 months ago

    @@automatalearninglab I see, that is the reason I have been facing issues with running the app successfully. Thanks for the quick reply.

  • @zocio_
    @zocio_ 1 year ago

    sub number 541
