Voice Cloning For Any Language | Fine-Tuning Tortoise-TTS | Part 1

Science and Technology

In this video I show you how to fine-tune the Tortoise-TTS model to generate speech in any language. If you want to explore text-to-speech models beyond English, this video is for you: it walks step by step through adapting Tortoise-TTS to your native language so that you can create high-quality speech samples in it. Everything is covered, from acquiring or creating a suitable dataset to adjusting the fine-tuning code. Plus, don't miss the chance to win an NVIDIA RTX 3080 Ti GPU! I hope you enjoy the video and that it helps you generate speech in your language.
Register for GTC 2024 and win an NVIDIA RTX 3080 Ti (Deadline March 22nd):
nvda.ws/3Sastoe
Send Your Proof of Attendance:
forms.gle/hmXuJhvmFBp4hzrYA
GTC sessions mentioned in the video:
What’s Next in Generative AI
www.nvidia.com/gtc/session-ca...
The Fastest Stable Diffusion in the World
www.nvidia.com/gtc/session-ca...
Human-Like AI Voices: Exploring the Evolution of Voice Technology
www.nvidia.com/gtc/session-ca...
Code Used in This Video
colab.research.google.com/dri...
My Medium Article for This Video:
/ 32dbcbc34e8c
My Workstation
GPU: NVIDIA RTX 6000 Ada nvda.ws/47U7wmA
CPU: Intel Core i9-13900K amzn.to/47qDQgp
RAM: Corsair Vengeance 64 GB amzn.to/47o4S8e
Motherboard: ASRock Z790M PG amzn.to/3SxvtLS
Storage: Samsung 980 PRO 2 TB amzn.to/3u8X23Y
PSU: Corsair RM 850x amzn.to/3uhTNXS
Case: Fractal Design Meshify 2 Mini www.fractal-design.com/produc...
CPU Cooler: Noctua NH-U12A amzn.to/3Qpv4IM
Case Fan: Noctua NF-A12x25 amzn.to/3srf1lE
00:00:00 Intro
00:00:21 Promo
00:01:27 Prepare Your Dataset
00:06:00 Adjust Fine-Tuning Code
00:13:04 Clone and Install Adjusted Repository
00:14:15 Train Tokenizer For Your Language
00:18:39 Adjust Sampling Rate
00:19:32 Fine-Tuning
Stay in Touch
Medium
/ martin-thissen
LinkedIn
/ mthissen135
KZread
Of course, feel free to subscribe to my channel! :-)
Of course, financial support is completely voluntary, but I was asked for it:
/ martinthissen
ko-fi.com/martinthissen

Comments: 24

  • @carlosedubarreto
    @carlosedubarreto 4 months ago

    Wow, incredible. I was trying to get Tortoise TTS to work in Portuguese and was lost; now I have a way to do that. Thanks for sharing this info. Now I just have to wait for the other parts and find free time to do that. That is an amazing effort on your side, since it's a very complex topic 👏👏 Congrats!

  • @martin-thissen

    @martin-thissen

    4 months ago

    Thanks a lot, really appreciate your nice words! :-) The next part will come soon!

  • @olcaybuyan
    @olcaybuyan 4 months ago

    Great video. Looking forward to the custom dataset video.

  • @martin-thissen

    @martin-thissen

    4 months ago

    Glad you enjoyed the video! Awesome! :-)

  • @shashwatrajput6714

    @shashwatrajput6714

    8 days ago

    @martin-thissen Hey, I am still waiting on that video, hahaha. I want to clone Elon Musk's voice, and I have 3 hours of recorded audio of him that I gathered from podcasts. Need your help.

  • @bouchrasaidi1174
    @bouchrasaidi1174 4 months ago

    Hello, thank you for the great tutorial. When will part 2 be released, please?

  • @martin-thissen

    @martin-thissen

    4 months ago

    I will upload part 2 probably this weekend! :-)

  • @albertigle
    @albertigle 14 days ago

    Nice video Martin! How long did it take you to train the new language?

  • @shovonjamali7854
    @shovonjamali7854 4 months ago

    Another great one, really useful but I have a question though. The dataset you used, has different speakers (like maybe even male or female too), right? So, for training the model, we can put all the wavs from different speakers under a single wavs folder, we don't need to create/manage different ones for different speakers, is my understanding correct?

  • @martin-thissen

    @martin-thissen

    4 months ago

    Thanks a lot! Yes, your understanding is correct! If you wanted, you could however keep the wavs in separate folders for each speaker. You just need to make sure that the paths stated in the train.txt and val.txt files are correct for all files.

  • @shovonjamali7854

    @shovonjamali7854

    4 months ago

    @martin-thissen Ahh! Yes, got it. Thanks for the clarification! 😍
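
    The path check described in the reply above can be sketched in a few lines of Python. This is a minimal sketch, not code from the video: the function name `find_missing_audio` is hypothetical, and the assumed line format `<relative wav path>|<transcription>` is an assumption that should be adjusted to whatever your train.txt and val.txt actually contain.

```python
from pathlib import Path


def find_missing_audio(list_file, dataset_root):
    """Return (line_number, path) pairs whose audio file does not exist.

    Assumption (not confirmed by the video): each non-empty line looks like
    "<relative wav path>|<transcription>", with the path relative to
    dataset_root.
    """
    root = Path(dataset_root)
    missing = []
    with open(list_file, encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # skip blank lines
            # Everything before the first "|" is treated as the audio path.
            rel_path = line.split("|", 1)[0]
            if not (root / rel_path).is_file():
                missing.append((line_number, rel_path))
    return missing
```

    Running it once for train.txt and once for val.txt, and confirming that both calls return an empty list, works the same way whether the wavs sit in a single folder or in one folder per speaker.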

  • @awnyfaris9326
    @awnyfaris9326 3 months ago

    Hi Martin, Is it possible to train a voice in the Arabic language and then use that voice to read English text ?

  • @bouchrasaidi1174
    @bouchrasaidi1174 3 months ago

    Hello, can I fine-tune Tortoise for English speech?

  • @ashuu9257
    @ashuu9257 4 months ago

    Hey, did you implement this without any GPU?

  • @Athelstanovsky
    @Athelstanovsky 1 month ago

    Hi, great video. Does the TTS work with an RTX 2060?

  • @martin-thissen

    @martin-thissen

    25 days ago

    Unfortunately, 6GB VRAM is probably not enough. :/ You can run it using a free Colab notebook though.

  • @BoskaPalma
    @BoskaPalma 5 hours ago

    My transcription txt file is around 1 GB. I have been running the tokenizer for about 30 minutes now and don't see any progress 🤔 Running locally on an M1 Max Mac Studio.

  • @BoskaPalma

    @BoskaPalma

    2 hours ago

    yup, it's stuck
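
    One thing that could be tried with a transcription file this large is training the tokenizer on a random subset of lines instead of the full file. The sketch below uses reservoir sampling so the file is read once and never loaded into memory as a whole. It is only a sketch under stated assumptions: the function name `sample_lines` is hypothetical, it is not part of the repository used in the video, and whether a subset is enough for a good tokenizer in your language is an assumption to verify. It also does not diagnose why the run stalls.

```python
import random


def sample_lines(src_path, dst_path, k, seed=0):
    """Write k uniformly sampled lines from src_path to dst_path.

    Uses reservoir sampling (Algorithm R): one pass over the file, with
    memory proportional to k rather than to the file size.
    Returns the number of lines written (fewer than k for short files).
    """
    rng = random.Random(seed)
    reservoir = []
    with open(src_path, encoding="utf-8") as src:
        for i, line in enumerate(src):
            # Normalise line endings so a final line without "\n" is safe.
            line = line.rstrip("\n") + "\n"
            if i < k:
                reservoir.append(line)
            else:
                # Keep the new line with probability k / (i + 1).
                j = rng.randint(0, i)
                if j < k:
                    reservoir[j] = line
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.writelines(reservoir)
    return len(reservoir)
```

    The smaller output file would then be passed to the tokenizer training step in place of the full transcription file.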

  • @bobsmithy3103
    @bobsmithy3103 4 months ago

    Who's the lucky new owner for the 3080Ti?

  • @MightyMindsDev
    @MightyMindsDev 2 months ago

    Hello. I would like to hear how to create a dataset for your language

  • @FAITHseek
    @FAITHseek 4 months ago

    Please make a Fine Tune guide for MetaVoice 1B TTS

  • @martin-thissen

    @martin-thissen

    4 months ago

    Will look into it! Thanks for the recommendation! :-)

  • @DM-dy6vn
    @DM-dy6vn 3 months ago

    Quite a few subsets in that German language data are of peculiar quality. Anastasia Solokha gave me shivers ))

  • @timothymaggenti717
    @timothymaggenti717 5 days ago

    WHY do you always use someone else's computer, what is the point of this....
