How to Program Speech Synthesis in an Animatronic Mouth Using Python and Arduino

Science and Technology

Here's a closer look at the programming behind my animatronic mouth. Using Arduino, Python, and a few open-source libraries, I take a typed sentence and convert it into an animation sequence.
Support me on Patreon! / nilheimmechatronics
Contact: enquiries@willcogley.com
Discord Server: / discord
Open source animatronic mouth design: www.nilheim.co.uk/latest-proje...
Instructable: www.instructables.com/id/Simp...
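The text-to-animation step described above can be sketched roughly as follows. This is a minimal illustration, not the author's code: the tiny `PRONUNCIATIONS` table, the viseme names, and the `PHONEME_TO_VISEME` grouping are all assumptions made for the example. A real project would look words up in a full ARPAbet dictionary such as the CMU Pronouncing Dictionary.

```python
# Hypothetical pronunciation table in ARPAbet notation (illustrative only).
PRONUNCIATIONS = {
    "hello": ["HH", "AH0", "L", "OW1"],
    "my": ["M", "AY1"],
    "bob": ["B", "AA1", "B"],
}

# Hypothetical grouping of phonemes into mouth shapes (visemes).
PHONEME_TO_VISEME = {
    "HH": "open", "AH": "open", "AA": "open", "AY": "open",
    "L": "tongue_up",
    "OW": "round",
    "M": "closed", "B": "closed", "P": "closed",
}


def strip_stress(phoneme):
    """Remove the ARPAbet stress digit, e.g. 'AH0' -> 'AH'."""
    return phoneme.rstrip("012")


def sentence_to_visemes(sentence):
    """Convert a typed sentence into a flat list of viseme names.

    Words missing from the table contribute no phonemes; every word is
    followed by a 'rest' pose so the mouth relaxes between words.
    """
    visemes = []
    for word in sentence.lower().split():
        word = word.strip(".,!?")
        for phoneme in PRONUNCIATIONS.get(word, []):
            visemes.append(PHONEME_TO_VISEME.get(strip_stress(phoneme), "rest"))
        visemes.append("rest")
    return visemes


print(sentence_to_visemes("Hello Bob!"))
```

Each viseme name would then be translated into servo positions and sent to the Arduino; that part is outside this sketch.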

Comments: 47

@scottduede8134 4 years ago
As a linguist, I can say that this is awesome sauce.
@hypodyne1 4 years ago
I did the same thing with a talking head program. Used the same dictionary and mapped the visemes to the phonemes. You could have a conversation in real time with my app (called Ayako). Awesome that you took it further with the robotic mouth. Well done.
@stevecoxiscool 4 years ago
Nice work !!!
@PhG1961 4 years ago
Excellent work ! Awesome !!
@Nono-hk3is 4 years ago
Good work!
@TheRainHarvester 4 years ago
Make your mouth the narrator in the bottom right of all your videos! How loud are the servos in real life?
@tecnicotec1 4 years ago
Very good video and an even better job. It's really amazing and interesting. Thanks for your work.
@MattHollands 4 years ago
Are you planning to put a skin on the mouth? It seems like there are bits around the mouth to deform the lips etc., but it looks a bit odd without a skin.
@kiltmaster7041
2 years ago
It did strike me as odd that he was setting servo positions for certain expressions when he doesn't even know what those expressions will look like on a completed face. Surely it would make more sense to finish the head before setting something like that? But what do I know?
@tankart3645 4 years ago
Looks awesome I got to say
@drudtube 3 years ago
That looks great! I'm working on the same kind of controller, but I am using a stereo track that is analyzed by a Teensy Audio Board. The right track contains the speech and controls the jaw. The left track contains tones that correspond to the mouth positions of the other servos, for example 200 Hz for A, E, I and 250 Hz for B, M, P. So the actor who's going to play with this mouth only has to make an audio track with the right tones in the right positions on the left track.
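The tone-to-mouth-position idea in the comment above could be sketched like this in Python. Only the two tone assignments (200 Hz and 250 Hz) come from the comment; the tolerance value and the function name are assumptions for illustration, and the actual Teensy implementation is not shown in the thread.

```python
# Tone assignments taken from the comment; a real table would have more rows.
TONE_TO_VISEME = {
    200.0: "A/E/I",
    250.0: "B/M/P",
}

# Assumed tolerance: how far a detected tone may drift and still count.
TOLERANCE_HZ = 15.0


def viseme_for_tone(freq_hz):
    """Return the viseme group for a detected tone, or None if no tone matches."""
    nearest = min(TONE_TO_VISEME, key=lambda tone: abs(tone - freq_hz))
    if abs(nearest - freq_hz) <= TOLERANCE_HZ:
        return TONE_TO_VISEME[nearest]
    return None


print(viseme_for_tone(203.0))  # slightly off-pitch 200 Hz tone
print(viseme_for_tone(400.0))  # no tone assigned near 400 Hz
```

Spacing the control tones further apart than twice the tolerance keeps a drifting tone from being matched to the wrong mouth position.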
@Skillseboy1 4 years ago
Such a cool video
@satyakidas7144 4 years ago
Beautiful video awesome robot
@jonathangriffin3486 4 years ago
For emulating audio you would probably need to look at how the frequency domain representation of the signal is changing over time.
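One way to look at how the frequency content changes over time is to cut the signal into short frames and find the strongest frequency in each. The sketch below uses a plain discrete Fourier transform written with the standard library so it runs anywhere; the function names and frame size are assumptions, and a real project would more likely use an FFT from a numerical library.

```python
import cmath
import math


def dft_magnitudes(frame):
    """Magnitudes of the first half of the DFT bins for one frame."""
    n = len(frame)
    magnitudes = []
    for k in range(n // 2):
        total = sum(
            frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        magnitudes.append(abs(total))
    return magnitudes


def dominant_frequencies(samples, sample_rate, frame_size):
    """Strongest non-DC frequency (Hz) in each consecutive, non-overlapping frame."""
    result = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        mags = dft_magnitudes(samples[start:start + frame_size])
        peak_bin = max(range(1, len(mags)), key=lambda k: mags[k])
        result.append(peak_bin * sample_rate / frame_size)
    return result


# Example: one frame of a 200 Hz tone followed by one frame of a 250 Hz tone.
RATE, FRAME = 800, 64
signal = [math.sin(2 * math.pi * 200 * t / RATE) for t in range(FRAME)]
signal += [math.sin(2 * math.pi * 250 * t / RATE) for t in range(FRAME)]
print(dominant_frequencies(signal, RATE, FRAME))
```

The frequency resolution is `sample_rate / frame_size`, so shorter frames track changes faster but distinguish fewer frequencies.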
@cdoebler 4 years ago
Excellent work. My upcycled Teddy Ruxpin uses a much simpler set of visemes.
@robertwesterfield3454 7 months ago
Wow thanks!
@FirstLast-wr9mh 4 years ago
Fantastic
@TheMrJuoji 4 years ago
Instead of mapping each servo position for each viseme in an array, you could try an inverse kinematics model and then just have each end position for the mouth mapped as a set of positions. You could even map that to motion capture.
@twobob 4 years ago
I seem to remember using a regular expression to make the UK version of the Arpabet for my speech projects in the past. I think that's right. This looks like a fun project. I used a C# library under Unity for this last time I faffed around. Good job overall. It's a tricky subject :)
@twobob
4 years ago
Also the iphonex does a decent job these days. apps.apple.com/us/app/face-cap/id1373155478 might be worth an eyeball for example. I did cmusphinx.github.io/2013/03/speech-recognition-on-kindle-touch-with-cmusphinx/ donkeys years ago showing that one can indeed get a decent extract of words from those tools you mentioned. You need to faff a bit though. Good luck, this one looks like fun,
@Skyliner_369 4 years ago
I'm sure that if I wanted to, I could probably write a blender extension that avoids all this phoneme stuff and instead sends direct pose data from animated frame data. That way, the mouth is animated like how someone might animate a character.
@DustinWatts 4 years ago
Great work Will! I was thinking, what about using an ESP32 instead of Arduino? ESP32 can run MicroPython. Therefore eliminating the need for two microcontrollers and a serial connection.
@KineticWasEpicVideos
4 years ago
NLTK will not run on an ESP32. Raspi + arduino combo would be ideal for a contained system.
@Robots-and-androids 1 year ago
You might be able to use speech-to-text software first and then convert the text that it gives you into phonemes. Both Microsoft and Bing offer "free" speech recognition for Python. I use both. I am planning on doing something similar with a human figure.
@mariotoys6173 4 years ago
Respect
@AltMarc 4 years ago
For local ASR try DeepSpeech, on a RPI4 DeepSpeech Lite works in real time. Local Speech recognition is still tricky, works better on full sentences than single words.
@Robots-and-androids 1 year ago
where did you get that amazing servo tester????? I NEED one of those! --Thomas
@HeathLedgersChemist 4 years ago
Could you approximate the mouth positions for Leeds by just leaving the mouth open all of the time?
@SDRIFTERAbdlmounaim 3 years ago
Use a loop and a table instead of a bunch of 'if's; those will stack up real quickly lol
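The table-and-loop suggestion above might look like the following sketch. It is written in Python to match the other examples on this page; the servo names and angles are made-up values, and on the Arduino side the same idea would be an array indexed by viseme number.

```python
# Hypothetical servo angles in degrees per viseme: (jaw, upper_lip, lower_lip).
VISEME_TABLE = {
    "rest": (90, 90, 90),
    "open": (60, 100, 80),
    "closed": (95, 85, 95),
}


def frames_for(visemes):
    """Look up servo angles for each viseme; unknown names fall back to 'rest'."""
    return [VISEME_TABLE.get(name, VISEME_TABLE["rest"]) for name in visemes]


for jaw, upper_lip, lower_lip in frames_for(["closed", "open", "rest"]):
    print(jaw, upper_lip, lower_lip)
```

Adding a new mouth shape then means adding one row to the table rather than another branch of conditionals.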
@REALVIBESTV11 months ago
Can I buy the code
@Allanusmonostat 1 year ago
So if you modded this just a hair it could be a phonetic filter.
@SpaceDave-on8uv 3 years ago
2:44 Does this mean switch statements do not exist in arduino?
@Jimmyfpv_
3 years ago
Yes, but bear in mind that they are not very useful when you do logic operations within the 'if'. You would need to map the possible results into values so that you could use the switch statements.
@CyberSyntek 4 years ago
Will, take a look at audioservocontroller dot com. I'm not sure how many servos it is capable of controlling at once as I haven't grabbed one yet, but I can inquire, as someone from the FB group has one. There are a few different audio servo controllers out there but they don't seem to be very common. Scary Terry is another one. I saw someone post the hardware layout at some point, so it might be easier to throw one together depending on the components. Might be the only way to get that many servos running in sync. Any more thoughts on the potential forum? XD *edit* Fernado from the group has a vid up with him testing it on his DARA robot: "DARA robot lip sync" if you are curious. I can ask him if he has played with it any more since. I think he had it just hooked up to the jaw and not his tongue model; he would know better, mind you. :9
@ViennaMike
4 years ago
Scary Terry and similar just work off the volume of the sound source, not visemes. Adequate for a jaw on a prop or toy (I use them and I'm developing a similar thing using a Raspberry Pi), but nowhere near what Will is doing with visemes, jaw, face, and tongue movements.
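For comparison, the volume-driven approach mentioned above can be sketched as follows: measure the loudness of each short audio frame and open the jaw in proportion. This is a generic illustration, not the circuit or code of any product named in the thread; the angle range, `full_scale` level, and function names are assumptions.

```python
import math


def rms(frame):
    """Root-mean-square level of one frame of audio samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def jaw_angles(samples, frame_size, closed_deg=0.0, open_deg=40.0, full_scale=1.0):
    """Map the loudness of each non-overlapping frame to a jaw servo angle."""
    angles = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        level = min(rms(samples[start:start + frame_size]) / full_scale, 1.0)
        angles.append(closed_deg + level * (open_deg - closed_deg))
    return angles


# Silence, a half-level frame, then a full-level frame.
audio = [0, 0, 0, 0, 0.5, -0.5, 0.5, -0.5, 1, -1, 1, -1]
print(jaw_angles(audio, 4))
```

Because only loudness is used, every sound of the same volume produces the same jaw position, which is why this method cannot distinguish visemes.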
@gone6442 2 years ago
Ok im making the mad hatter and march hare
@saucelessbones5872 1 year ago
gonna make me act up
@Bigbirddev 3 years ago
People who made this *it took me 2 years to make*
@abetusk 4 years ago
Unfortunately this isn't "open source". The source is available, as are the STLs, but there is no license on them and so they cannot be used, redistributed or altered legally. The commonly held definition of "open source" is (from en.wikipedia.org/wiki/Open-source_license): "... a type of license for computer software and other products that allows the source code, blueprint or design to be used, modified and/or shared ... Licenses which only permit non-commercial redistribution or modification of the source code for personal use only are generally not considered as open-source licenses." From the "terms of service" page at www.nilheim.co.uk/terms-of-service.html: " .. not to (or permit anyone else to) do or attempt any of the following: * distribute, rent, loan, lease, sell, sublicense, or otherwise transfer or offer the Service for any commercial purpose; " Which puts it in direct contradiction with the definition of "open source" most widely used. Please consider replacing the term "open source" with something more appropriate like "source available", or putting the source code and STL files under a free/libre license.
@ViennaMike
4 years ago
Of course I agree that prohibiting commercial use means it's not "open source" under the common definition. But I can certainly see reasons for doing so. I do think that the developer should consider some standard license rather than the current "terms of service", which has some clear wording errors and, more importantly, because a non-standard license restricts uses the creator intended to allow, as no one is familiar with its terms or exactly how they may be interpreted. Besides just changing to an open source license, wouldn't other options include: 1) While not intended for software, use the Creative Commons license limiting commercial use, 2) License under the VERY unrestrictive GPL, with options for commercial users to pay for closed licenses. This doesn't actually prohibit commercial use provided the user abides by the terms of the GPL, opening up their own changes under the same terms, but may make it more attractive for commercial users to pay for a restrictive license, or 3) While I haven't seen it used, use the Commons Clause (commonsclause.com/)?
@mr.e.484 4 years ago
#10
@Mr_Motor 3 years ago
On "L" the tongue should touch the top of the mouth.
@MrMoka15 4 years ago
Are you a Furry? You could make a lot of money by selling this to them :3
@ChrisD__
4 years ago
It might be hard to fit all this stuff into a mask, but I remember there being a few people building animatronic fursuit heads like this.