Thorsten-Voice
1 day ago
23,675
1

Using high quality local Text to Speech in Python with Coqui TTS API

Science & Technology

Tutorial showing you how to set up high-quality local text to speech in a Python script using the Coqui TTS API.
Please subscribe to my channel 😊.
kzread.info...
00:00 Intro
00:50 Preparations
02:00 Create TTS HelloWorld script
05:00 Testing the script
06:30 More info on codebase
07:18 Outro
#texttospeech #python #api #privacy
---
- www.Thorsten-Voice.de
- github.com/thorstenMueller/Th...

Comments: 71

@Talaxianer 1 year ago
I can really recommend activating GPU computing. On my system it led to a 6x speedup! tts = TTS(model_name="tts_models/de/thorsten/tacotron2-DDC", gpu=True)
@ThorstenMueller
1 year ago
True, true. Thanks for putting this important information into the spotlight 👍.
@pfuhad3760
1 year ago
Is it possible to create real-time text to speech using a GPU?
@ThorstenMueller
1 year ago
@@pfuhad3760 You mean the RTF (real-time factor)? Yes, that's possible, obviously depending on your hardware setup. I've made an audio comparison including some RTF values. This might give you an overview of the performance. kzread.info/dash/bejne/eqOe17imh5iyhaw.html
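To make the RTF mentioned above concrete: it is commonly computed as the synthesis time divided by the duration of the generated audio, so values below 1.0 mean faster than real time. The following is a minimal sketch using only the standard library. `synthesize` is a hypothetical stand-in for whatever call writes the WAV file, and `fake_synthesize` only writes one second of silence so the sketch runs without a TTS engine installed.

```python
import os
import tempfile
import time
import wave


def wav_duration_seconds(path):
    """Return the length of a WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def real_time_factor(synthesize, text, out_path):
    """Time one synthesis call and divide by the audio length it produced.

    `synthesize(text, out_path)` is a hypothetical callable that must
    write a WAV file to `out_path`.
    """
    start = time.perf_counter()
    synthesize(text, out_path)
    elapsed = time.perf_counter() - start
    return elapsed / wav_duration_seconds(out_path)


def fake_synthesize(text, out_path, sample_rate=22050):
    """Stand-in for a real TTS call: writes one second of silence."""
    with wave.open(out_path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * sample_rate)


if __name__ == "__main__":
    out = os.path.join(tempfile.mkdtemp(), "hello.wav")
    rtf = real_time_factor(fake_synthesize, "Hello world", out)
    print(f"RTF: {rtf:.4f} (below 1.0 means faster than real time)")
```

To measure a real model, one would pass a function that wraps the actual synthesis call in place of `fake_synthesize` and run it once with and once without the GPU.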
@toykotokyoto 1 year ago
Very practical, thanks for sharing!
@ThorstenMueller
1 year ago
Thanks for your nice feedback, Josh 😊.
@christopherwoods3339 1 year ago
Hello - thank you very much for this work and this video. I was working on a personal project and ran into some issues with another TTS package, and I've been feeling pretty bummed. But now I'm thinking this might work for me, so I'm gonna give it a go and watch more of your videos. Thank you!
@ThorstenMueller
1 year ago
Thanks for your feedback, and I really hope you succeed with your personal project 😊.
@zerthura8500 1 year ago
Beautiful way of doing it! Great! Thank you so much.
@ThorstenMueller
1 year ago
Thanks a lot for your kind feedback 😊.
@johnpaulvela6816 1 year ago
Finally something better than espeak 🙏 Waaay better
@ThorstenMueller
1 year ago
Glad my video helped you find a "waaay" better service 😁
@AiEdgar 1 year ago
Good video. It would be nice to see what other extra stuff we can do with the API, or does it only do synthesis without other options? For example, Tortoise can do stuff like changing the mood of the speaker if you say "I am happy" in the prompt. I wonder if Coqui has modulation options.
@ThorstenMueller
1 year ago
Thanks for your comment. At the moment the API can only synthesize audio from a pretrained model as it is. I think what you are looking for is "Speech Synthesis Markup Language" (SSML), which is not supported by Coqui at the moment.
@PooperScooperTrooper 1 year ago
This was very helpful... seems that ChatGPT doesn't know about Coqui. It's quite incredible when you compare this to 'say' on the Amiga or the talking program on the Atari ST. Well, it's quite incredible full stop, in all honesty.
@ThorstenMueller
1 year ago
Thanks for your feedback, and I agree: it's impressive how fast technology has moved (since the days of the Atari ST) and is still moving forward 😊.
@johnwhite7218 25 days ago
Wonderful. Thank you.
@ThorstenMueller
23 days ago
You're very welcome 😊.
@florishol7552 4 months ago
Amazing content @Thorsten-Voice. However, the processing time is 75 seconds for creating a wav file from a two-sentence input with the cloned voice created from a reference wav file. Is there anything I can do to make it run faster? Otherwise it is not usable in a web application, for instance. I've seen your pinned gpu=True comment, but are there other ways as well? Or is one of the models significantly faster?
@ThorstenMueller
4 months ago
Thanks for your nice feedback on my content 😊. Some models might be faster than others, but if you are looking for good quality and performance, maybe take a look at Piper TTS. Do you know my videos about Piper? kzread.info/dash/bejne/ooCGl6OskqazeNY.html
@florishol7552
4 months ago
@@ThorstenMueller Thanks, will definitely check it out! How much faster is Piper compared to Coqui TTS? Is it like 10x faster, or should I lower my expectations?
@raphaelbird1423 1 year ago
I have another issue. I get this error: raise Exception(" [!] No espeak backend found. Install espeak-ng or espeak to your system.") Exception: [!] No espeak backend found. Install espeak-ng or espeak to your system. I did install espeakng 1.0.2, but the error remained.
@ThorstenMueller
1 year ago
This is weird 🤔. Maybe you can ask in the Coqui TTS community whether someone has encountered a similar problem.
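One thing worth ruling out with this error is whether an espeak binary is actually visible on the PATH of the process that runs Python. The sketch below only performs a generic PATH lookup with the standard library. That a PATH problem is the cause here is an assumption, and Coqui's own backend detection may work differently.

```python
import shutil


def find_espeak(candidates=("espeak-ng", "espeak")):
    """Return the full path of the first candidate found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    return None


if __name__ == "__main__":
    found = find_espeak()
    if found:
        print(f"espeak binary found at: {found}")
    else:
        print("No espeak binary on PATH; check the installation or PATH.")
```

If this prints no path even though espeak is installed, the install directory is probably missing from PATH for that shell or IDE.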
@MaximBordyug. 3 months ago
I would kill for a 'Windows version tutorial for someone who doesn't know Python :)'
@ThorstenMueller
3 months ago
No need to do so 😉. I've added your idea to my growing TODO list.
@MaximBordyug.
3 months ago
@@ThorstenMueller I have a five-minute KZread video in English that I want to automatically translate and voice into Russian. While I can easily translate the subtitles to Russian, voicing it as well might be overwhelming for me :)
@entl_4538 1 year ago
Is it possible to change the vocoder in this code example? Also, is a male voice available for English in TTS? Thanks for the answer.
@ThorstenMueller
1 year ago
IMHO the best vocoder is automatically associated with each TTS model. You can see a list of all supported models (for each language) and vocoders here: github.com/coqui-ai/TTS/blob/dev/TTS/.models.json There will soon be a possibility to use the API with your local TTS models, too. This brings more flexibility, including different vocoders.
@ThorstenMueller
1 year ago
Btw, do you know my audio sample comparison videos? Maybe you can check which English model you like most. * kzread.info/dash/bejne/eqOe17imh5iyhaw.html * kzread.info/dash/bejne/iKKe2JSFY5TLqbQ.html
@Rocketman1105Gaming
1 month ago
Did you ever find a way to change the vocoder? I would like to experiment with them, but I can't seem to figure it out either.
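For browsing which TTS and vocoder models are listed, the model names in this thread suggest a type/language/dataset/model naming scheme. The sketch below flattens a nested mapping of that shape into such names. That `.models.json` is nested in exactly this way is an assumption to verify against the real file, and the `SAMPLE` data is illustrative only (the vocoder entry is a made-up placeholder).

```python
def list_model_names(models):
    """Flatten a nested type -> language -> dataset -> model mapping
    into "type/language/dataset/model" strings."""
    names = []
    for model_type, languages in models.items():
        for language, datasets in languages.items():
            for dataset, entries in datasets.items():
                for model in entries:
                    names.append(f"{model_type}/{language}/{dataset}/{model}")
    return sorted(names)


# Illustrative sample only; the real file has many more entries and fields.
SAMPLE = {
    "tts_models": {
        "de": {"thorsten": {"tacotron2-DDC": {}}},
        "en": {"ljspeech": {"tacotron2-DCA": {}}},
    },
    "vocoder_models": {
        # Hypothetical placeholder entry, not a real model name.
        "xx": {"example": {"example-vocoder": {}}},
    },
}

if __name__ == "__main__":
    for name in list_model_names(SAMPLE):
        print(name)
```

With the real file, one would load it via `json.load` and pass the resulting dictionary instead of `SAMPLE`. This only lists names; it does not switch the vocoder used by a model.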
@ywueeee 1 year ago
New subscriber here, great videos! I was wondering if you know whether Coqui can do speech-to-speech. For example, instead of narrating this video in your own voice, could you do it in Obama's voice? If not, are you aware of any other open-source tool that's able to do that? A video on this topic would be appreciated :)
@ThorstenMueller
1 year ago
Thanks for joining my voice tech journey 😊. Coqui offers general STT support, but not in that specific "Obama" based context 😉. github.com/coqui-ai/stt
@ywueeee
1 year ago
@@ThorstenMueller Thanks for the link. I was looking for speech-to-speech directly, not speech-to-text.
@ThorstenMueller
1 year ago
@@ywueeee Ah sorry, I did not get this right. IMHO speech2speech is not supported by Coqui at the moment. As I haven't played around in that field, I cannot name any open source tools/communities working on that.
@weebprogrammer2979 1 year ago
I have trained a custom model. How do I load it in the API?
@ThorstenMueller
1 year ago
IMHO this is not supported yet. According to this code, github.com/coqui-ai/TTS/blob/dev/TTS/api.py#L78, files are downloaded from published TTS models. It might be worth asking in the Coqui community. You can, however, run your custom model with "tts-server" and access it, though this might only be a workaround.
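For the "tts-server" workaround, a Python script would talk to the locally running server over HTTP. The sketch below builds such a request with the standard library. The port (5002), the `/api/tts` path and the `text` parameter are assumptions here and should be checked against what the server prints when it starts.

```python
import urllib.parse
import urllib.request


def build_tts_url(text, host="localhost", port=5002):
    """Build the request URL for a locally running tts-server.

    Port 5002 and the /api/tts path are assumptions; verify them
    against the startup output of your tts-server.
    """
    query = urllib.parse.urlencode({"text": text})
    return f"http://{host}:{port}/api/tts?{query}"


def fetch_wav(text, out_path, **kwargs):
    """Request synthesized audio and save the response body to a file."""
    with urllib.request.urlopen(build_tts_url(text, **kwargs)) as response:
        audio = response.read()
    with open(out_path, "wb") as handle:
        handle.write(audio)
    return out_path


if __name__ == "__main__":
    # Only prints the URL; call fetch_wav() once the server is running.
    print(build_tts_url("Hello world"))
```

Keeping URL construction separate from the network call makes it easy to check the request without a running server.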
@AdrianFlores-dc3vu 1 year ago
How can I change the tonality or modify aspects of the VOICE from Python?
@ThorstenMueller
1 year ago
IMHO the Coqui TTS models currently do not have the option to modify these aspects. But maybe try asking in the Coqui community for any updates on that.
@mir_intizam 1 year ago
Is it possible to run the TTS model we developed in Google Colab using Tacotron 2 with this?
@ThorstenMueller
1 year ago
Not yet. At the moment the API can only handle public models. But there's a change in progress at Coqui to support this. github.com/coqui-ai/TTS/pull/2303/files
@mir_intizam
1 year ago
@@ThorstenMueller Which synthesizer code can I use to run the speech model developed with Tacotron 2 in Google Colab, using Python on Windows?
@user-lo6lz1oj7e 8 months ago
Hello, your video is really well made and everything worked great. I have a question: is there a way to use a voice I created myself (you managed to use your own voice, after all)? So basically, to use my own tts_model. If so, how do you create such a model in the first place? I'm sure you can help. Thanks for the video and perhaps for an answer, too.
@ThorstenMueller
8 months ago
Hello, and thank you very much for your great feedback 😊. Yes, you can clone your own voice with Coqui TTS and then use it with the local API. Do you know my German-language video series? kzread.info/dash/bejne/nI6m1dyYY82XZsY.html Or alternatively my English-language video on it? kzread.info/dash/bejne/Zo2ImrmThMLeZJs.htmlsi=eBJX4oo5UpWaGISd
@user-lo6lz1oj7e
8 months ago
@@ThorstenMueller Thanks for linking the videos. With those I'll surely manage it.
@raphaelbird1423 1 year ago
Does this not work with newer versions of Python? I'm using 3.11.3; it looks like you need 3.9.
@ThorstenMueller
1 year ago
Python 3.11 isn't supported by Coqui TTS yet (github.com/coqui-ai/TTS/). You need 3.7 up to 3.10.
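A script can check the interpreter version up front instead of failing during installation or import. This is a minimal sketch that hard-codes the 3.7 to 3.10 range from the reply above. That range reflects the state at the time of the comment, so it is an assumption for any later Coqui TTS release.

```python
import sys

# Range taken from the reply above (3.7 up to 3.10); newer Coqui TTS
# releases may support other versions.
MIN_VERSION = (3, 7)
MAX_VERSION = (3, 10)


def is_supported(version=None):
    """Return True if (major, minor) lies within the supported range."""
    if version is None:
        version = sys.version_info
    major_minor = tuple(version[:2])
    return MIN_VERSION <= major_minor <= MAX_VERSION


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "within" if is_supported() else "outside"
    print(f"Python {current} is {status} the 3.7 to 3.10 range.")
```

On an unsupported interpreter, a common way out is a separate virtual environment created with one of the supported Python versions.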
@shailendrarathore445 1 year ago
How do I clone a specific voice for the Hindi language with a Tortoise or Coqui model?
@ThorstenMueller
1 year ago
Do you know my step-by-step tutorial on voice cloning using Coqui TTS? kzread.info/dash/bejne/Zo2ImrmThMLeZJs.html This should work for Hindi, too.
@shailendrarathore445
1 year ago
@@ThorstenMueller It's such a time-consuming method. I have seen nanomomad's video; he made a tool, but I need a tutorial for the Hindi language as I am a newbie. Thanks for the tutorial, I will watch it and see if it works. But please take a look at the nanomomad channel and make a step-by-step video tutorial for Hindi; he only explained it in a theoretical way.
@deprome999 1 year ago
Can I use my own dataset? If so, how do I create and connect it?
@ThorstenMueller
1 year ago
Sure, but you have to create/train your own TTS model first. I've made two step-by-step tutorials on that. - kzread.info/dash/bejne/Zo2ImrmThMLeZJs.html (Complete walkthrough) - kzread.info/dash/bejne/lH6e3LWoj8m1g5s.html (Special tutorial for Windows users)
@TheAiConqueror 1 year ago
Does that mean you can run Coqui locally on your own machine without an API? I keep seeing more and more services like ElevenLAB etc. It's a real shame there aren't also local options for cloning voices and so on. I mean something anyone can use 😅
@ThorstenMueller
1 year ago
Thanks for your comment, and yes, Coqui TTS runs completely locally. The API runs on your local PC and doesn't need any cloud service. Is that what you meant with your question?
@TheAiConqueror
1 year ago
@@ThorstenMueller Hey Thorsten, by API I understand a key that is used to access certain functions over the internet via a program, command line, etc. Many of these voice-cloning services provide an API key, after all. I understand that they don't make public certain techniques for how the voices are replicated. 💸 There is a project by the Frenchman CorentinJ, I believe. I also bought a Udemy course to use the tool properly. And you are almost the only one who provides a TTS dataset in the German language 😅, which is really impressive! Could I perhaps write to you by email?
@ThorstenMueller
1 year ago
@@TheAiConqueror An API does not necessarily mean a "key" or "cloud". The Coqui TTS API runs offline on your PC and needs no key or authentication whatsoever. Feel free to write to me. www.thorsten-voice.de/kontakt/
@Global_Info 10 months ago
Hello. Can I get the source files on GitHub? If not, is there a deployed exe file of these source files?
@ThorstenMueller
10 months ago
Hello, you mean the script from here? kzread.info/dash/bejne/f42GybmwmdaXes4.html It is mainly based on this: tts.readthedocs.io/en/latest/inference.html Does this help you? If not, I might upload it to my GitHub repo and add a link to the description.
@PlayGameToday 2 months ago
How can I get 44.1 kHz audio output? I get bad-quality audio: at only 24 kHz it sounds like walkie-talkie radio speech.
@ThorstenMueller
2 months ago
Coqui TTS models are trained on a voice dataset with a specific sample rate and will generate output at the same sample rate. Do you train your own model or use an existing model? Maybe use tools like ffmpeg to adjust the sample rate after generation.
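The ffmpeg suggestion can be scripted from Python. The sketch below builds an ffmpeg command using its `-ar` (audio sample rate) option and runs it only if ffmpeg is found on PATH. The file names are placeholders. Note that resampling changes the sample rate of the file but cannot restore frequencies the model never generated, so it is unlikely to fix a muffled sound on its own.

```python
import shutil
import subprocess


def build_resample_command(src, dst, sample_rate=44100):
    """Return the ffmpeg argument list that resamples src into dst."""
    return ["ffmpeg", "-y", "-i", src, "-ar", str(sample_rate), dst]


def resample(src, dst, sample_rate=44100):
    """Run ffmpeg; raises if ffmpeg is missing or exits with an error."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    subprocess.run(build_resample_command(src, dst, sample_rate), check=True)


if __name__ == "__main__":
    # File names are placeholders; this only prints the command.
    print(" ".join(build_resample_command("output.wav", "output_44k.wav")))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.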
@martinparidon9056 10 months ago
Can you go into how umlauts work and whether, and if so how, inline English works? Thanks!
@ThorstenMueller
10 months ago
In principle, German umlauts work. However, I sometimes had problems with eSpeak-ng on Windows and German umlauts. With eSpeak, i.e. without the -ng, it then worked. I've documented a suggested solution here: www.thorsten-voice.de/einfach-loslegen#umlaut
@martinparidon9056
10 months ago
@@ThorstenMueller Thanks for the hint.
@gammingtoch259 11 months ago
How can I convert a book from a txt file? I can't use it for long texts. Can you help me or share some code, please?
@ThorstenMueller
11 months ago
That's a really important point, but Coqui TTS doesn't support text file input by default. Someone is working on a script for that, but I haven't played around with it. github.com/coqui-ai/TTS/discussions/1622
@gammingtoch259
11 months ago
@@ThorstenMueller Hey bro, I've tested it and also opened an issue; the script doesn't work for me.
@ThorstenMueller
10 months ago
@@gammingtoch259 I've put long text input, with or without this script, on my TODO list. So hopefully I can come back to this topic in the near(er) future.
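Until there is built-in support, one common approach for long texts is to split the text into sentence-sized chunks and synthesize each chunk separately. The sketch below shows only the splitting step with the standard library. It is not the script from the linked discussion, and the 200-character limit is an arbitrary assumption, not a documented Coqui TTS limit.

```python
import re


def split_into_chunks(text, max_chars=200):
    """Split text at sentence ends and pack sentences into chunks of at
    most max_chars characters (a single longer sentence stays whole)."""
    normalized = " ".join(text.split())
    sentences = re.split(r"(?<=[.!?])\s+", normalized)
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    book = "First sentence. Second one! A third? " * 3
    for index, chunk in enumerate(split_into_chunks(book, max_chars=40)):
        # A real script would synthesize each chunk to its own WAV file here.
        print(index, chunk)
```

For a book, one would read the txt file into a string, pass it to `split_into_chunks`, synthesize each chunk to a numbered WAV file, and join the files afterwards.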
@KarlBretz-sp5ni 7 months ago
Text to speech and no ads offline this app is a total disaster
@KarlBretz-sp5ni 7 months ago
Bs
@vahedmamghaderi1267 4 months ago
I downloaded the model tts_models/en/ljspeech/tacotron2-DCA. How can I use both British and American pronunciation?
@ThorstenMueller
4 months ago
Are you looking for a way to dynamically change the accents?