Tired of foreign dubbing trashing his movies, British film director Scott Mann, in collaboration with researchers from Germany's Max Planck Institute, has created an artificial intelligence capable of reproducing the movements of an actor's mouth and gestures with astonishing precision when a film is dubbed into another language.
In the video demonstration of the technology, which you can see a few lines below, we see Jack Nicholson and Tom Cruise performing the key scene of A Few Good Men in French, Robert De Niro speaking perfect German, and Tom Hanks playing the role of Forrest Gump in German, Spanish and Japanese. In every shot there is perfect synchronization between the actors' mouths and the audio.
Mann told Wired that it all started with the horror he felt on seeing the foreign-language dubbed version of his feature film Heist, starring Robert De Niro. The dubbing replaced the film's original dialogue with lines that more closely matched the movement of the actors' mouths, which for him destroyed the essence of scenes he had worked so hard to compose.
“I remember feeling devastated,” says Mann. “If you make a small change in a word or in a performance, you can have a big change in a character, in the rhythm of the story, and in turn in the movie.”
As a result, Mann became interested in deepfake technology: artificial intelligence algorithms that make it possible to substitute one person's face and gestures for another's in a remarkably realistic way. The tool has been highly controversial in recent years for its unethical uses: it has been employed, for example, to insert the faces of famous people into pornographic videos or to create fake images used to bully students at a school.
In his search, Mann came across research led by Christian Theobalt, director of the Graphics, Vision & Video research group at the Max Planck Institute in Germany. Theobalt has created a technology, related to deepfakes although much more complex, that makes it possible to modify the movements and gestures of the actors' lips as if they were speaking another language.
The German researchers' algorithm takes, on the one hand, the facial expressions and movements of an actor and, on the other, the lip movements of a person reciting the text in another language. The result is an animated 3D model that matches the actor's face exactly to the dubbing actor's lip movement. This model is then inserted into the film to replace the actor's original face.
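The idea of combining an actor's expression with a dubber's lip motion can be sketched with a parametric face model. This is an illustrative sketch only, not the researchers' actual system: the mesh size, the number of coefficients, and the assumption that the first 20 coefficients drive the mouth are all hypothetical placeholders.

```python
import numpy as np

# Hypothetical blendshape face model: a neutral mesh plus linear offsets
# controlled by expression coefficients. All dimensions are assumptions.
NUM_VERTICES = 5023
NUM_COEFFS = 50
rng = np.random.default_rng(0)
NEUTRAL = np.zeros((NUM_VERTICES, 3))                   # neutral face mesh
EXPR_BASIS = rng.random((NUM_COEFFS, NUM_VERTICES, 3))  # blendshape offsets

MOUTH = slice(0, 20)  # assume the first 20 coefficients drive the mouth/lips

def transfer_lip_motion(actor_expr, dubber_expr):
    """Keep the actor's overall expression but take the mouth from the dubber."""
    combined = actor_expr.copy()
    combined[MOUTH] = dubber_expr[MOUTH]
    return combined

def render_mesh(expr):
    """Deform the neutral mesh by the weighted expression blendshapes."""
    return NEUTRAL + np.tensordot(expr, EXPR_BASIS, axes=1)

# Per frame: fit coefficients to both videos (fitting not shown), blend, render.
actor_expr = rng.random(NUM_COEFFS)
dubber_expr = rng.random(NUM_COEFFS)
mesh = render_mesh(transfer_lip_motion(actor_expr, dubber_expr))
```

The rendered mesh would then be composited back into the original footage frame by frame, which is the step that replaces the actor's face on screen.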
“[This technology] will be invisible in no time,” says Mann. “People will see something and they won't know it was originally shot in French or whatever.”
The practical application of this tool in film production could be enormous. Not only because it can make dubbing more natural and credible, but also because it would allow directors to keep modifying dialogue after filming is finished. It is very common in any production to have to bring the crew and actors back together to re-shoot scenes that, for technical or artistic reasons, do not quite fit in the edit. Directors like Woody Allen are known to reserve a couple of weeks after the main shoot is over for these kinds of contingencies. But reshoots are expensive for producers and the money to carry them out is not always available, so a tool like this would save them a lot of headaches.
Mann is aware that actors may be shocked by this technology at first. “Fear and amazement: those are the two reactions I get,” he admits. Virginia Gardner, one of the actresses in his film, has also spoken to Wired: “I think it is the best way, as an actor, to keep your performance in another language,” she says. “If you trust your director and trust that this process is only going to improve a movie, then I really don't see any downside to it.”
Mann has taken Theobalt's technology and is beginning to commercialize it through his company, Flawless. According to the director, he has already contacted several studios to create foreign-language versions of various films.
Although we have already seen technologies similar to this one, none had yet managed to be fully credible. It is only a matter of time, and not much, before it becomes impossible to distinguish manipulated videos from real ones. In a very few years it will not only be possible to easily transform videos by processing them with tools like this; we will even be able to do it in real time.
This is how Nvidia's new video calls work
Last October Nvidia introduced a new technology based on artificial intelligence that reduces the bandwidth needed for a video call to a tenth of what is used today. In addition, this neural network is capable of modifying the video while the call is in progress, without loss of image quality or connection problems. The tool can correct the position of your head so that it always faces the camera, or even show you as an avatar that moves its mouth and gestures just as you do.
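The bandwidth saving comes from sending a single reference frame once and then only a handful of facial keypoints per frame, which the receiver's network uses to re-render the face. The sketch below is an assumption-laden illustration of why the numbers work out, not Nvidia's actual implementation: the frame size, keypoint count, and the `encode`/`decode` helpers are all hypothetical placeholders.

```python
import numpy as np

# Illustrative sketch only: the real system uses a trained generative
# network; these sizes are assumptions chosen to show the scale of savings.
FRAME_SHAPE = (256, 256, 3)   # one raw video frame, 8 bits per channel
NUM_KEYPOINTS = 10            # a small set of facial keypoints per frame

def encode(frame):
    """Stand-in for the keypoint detector: (x, y) per keypoint as float32."""
    return np.zeros((NUM_KEYPOINTS, 2), dtype=np.float32)  # placeholder

def decode(reference_frame, keypoints):
    """Stand-in for the generator that re-renders the face at the receiver."""
    return reference_frame  # a real system warps/synthesizes a new frame

# The sender ships one reference frame once, then only keypoints per frame.
raw_bytes_per_frame = int(np.prod(FRAME_SHAPE))
keypoint_bytes_per_frame = NUM_KEYPOINTS * 2 * 4  # float32 x/y coordinates
ratio = keypoint_bytes_per_frame / raw_bytes_per_frame
print(f"per-frame payload shrinks to {ratio:.4%} of a raw frame")
```

Under these assumptions the steady-state payload drops from roughly 196 kB per raw frame to about 80 bytes, which is why the headline figure of a tenth of today's bandwidth is, if anything, conservative for the video stream itself.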
Nvidia talks about a rebirth of video calling, and it is surely right. It won't be long before our video calls are never interrupted, are always well lit, and can be held in whatever language we want thanks to advances in real-time translation tools. In a few years, today's 'zooms' and 'skypes' will feel like experiences from the last century. Cinema, from what we are seeing, is headed down the same path.