The object looks like a science fiction movie prop. It is in fact a sophisticated project from a team at the Creative Machines laboratory at Columbia University in New York (United States). Called EVA, this humanoid robot head was presented at the ICRA 2021 robotics and automation conference, held from May 30 to June 5.
Six different types of expressions
It is made of supple blue synthetic skin, behind which a set of 25 motors, nylon cables and 3D-printed mechanisms animate as many "muscles" of the face and neck, along with the eyelids and the eyes.
Everything fits entirely inside the robot's skull. The computer vision system, however, is not housed in the robot's eyes: EVA relies on an external camera to which it is connected.
The purely mechanical side of the project allows for six different types of expressions: joy, sadness, fear, anger, disgust and surprise. But the main goal is to automate the robot's generation of facial expressions to match those of its human interlocutor. The aim is to make human-machine interaction more natural, especially in work contexts where humans and robots are "colleagues" or assist each other.
Learning from its own image
For this, the team used machine learning, in particular deep learning. The principle is simple: the robot learns to identify the facial expressions of the people it is looking at and, in return, to produce the same expression. Smile at it, and it smiles back. In practice, EVA first had to be trained to understand how its own synthetic face worked. The researchers placed it for several hours in front of a camera that filmed it and fed its image back to it while it ran through facial expressions at random. EVA thus learned which action of which motor activated which "muscle" in its face.
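This "motor babbling" stage can be sketched as a simple loop: drive the face with random motor commands, record what the camera sees, and fit a self-model that predicts the face's appearance from the commands. The sketch below is a minimal illustration, not the paper's actual implementation: the camera feedback is simulated as an unknown linear mapping, and the self-model is fit by least squares rather than a deep network; all names here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

N_MOTORS = 25       # EVA's 25 facial motors (from the article)
N_FEATURES = 10     # hypothetical: flattened facial-landmark features

# Hypothetical stand-in for the camera plus landmark extraction: in the
# real setup the robot's face is filmed and its image fed back to it.
# Here we simulate that feedback as an unknown linear mapping.
true_mapping = rng.normal(size=(N_MOTORS, N_FEATURES))

def observe_face(motor_commands):
    """Simulated camera feedback: motor positions -> landmark features."""
    return motor_commands @ true_mapping

# Motor babbling: run through random expressions and record
# (command, observation) training pairs.
commands = rng.uniform(-1.0, 1.0, size=(500, N_MOTORS))
observations = np.array([observe_face(c) for c in commands])

# Fit the self-model (plain least squares here; the actual system
# trains a neural network on the recorded pairs).
self_model, *_ = np.linalg.lstsq(commands, observations, rcond=None)

# With noiseless simulated data, the learned self-model should
# recover the simulated mapping almost exactly.
err = np.abs(self_model - true_mapping).max()
print(f"max self-model error: {err:.2e}")
```

The key property this illustrates is that no human labels are involved: the training pairs come entirely from the robot observing the consequences of its own random actions.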
After that, the camera films the expressions of a real interlocutor. OpenPose, an open-source posture-recognition software, extracts the key facial regions (around the eyes, the mouth, etc.) and maps them onto a static image of the robot's own face. The software thus generates, for the robot, a third-person image of itself adopting the facial expression that has just been captured, which triggers the motors to reproduce it. EVA was also trained this way on YouTube videos of eight different people. In all, the robot was exposed to the behavior of 12 people across 380 expressions.
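At inference time, the pipeline described above amounts to inverting the self-model: given the landmarks captured from the human's face (mapped onto the robot's geometry), find the motor commands whose predicted face best matches them. The sketch below illustrates that inversion under the same simplifying assumption as before, a linear self-model; the function and variable names are hypothetical, and the real system uses neural networks and OpenPose output rather than random vectors.

```python
import numpy as np

rng = np.random.default_rng(1)
N_MOTORS, N_FEATURES = 25, 10   # motor count from the article; feature size hypothetical

# Stand-in for the self-model learned during motor babbling:
# motor commands -> predicted landmark layout of the robot's face.
self_model = rng.normal(size=(N_MOTORS, N_FEATURES))

def motors_for_expression(target_landmarks):
    """Invert the self-model: solve for motor commands whose predicted
    face best matches the target landmarks (least-squares solution)."""
    cmds, *_ = np.linalg.lstsq(self_model.T, target_landmarks, rcond=None)
    return cmds

# Hypothetical landmark features extracted (e.g. by OpenPose) from the
# human interlocutor's face, already mapped onto the robot's geometry.
human_landmarks = rng.normal(size=N_FEATURES)

cmds = motors_for_expression(human_landmarks)

# Check: feeding the commands back through the self-model should
# reproduce the target expression.
reproduced = cmds @ self_model
print(np.allclose(reproduced, human_landmarks))
```

Because there are more motors (25) than landmark features here, the system is underdetermined and the least-squares solve picks the minimum-norm command vector that reproduces the expression exactly.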
Notably, the deep learning involved no supervision or dataset-labeling tasks at any point, unlike most training of automatic recognition systems (for images, shapes or people).
In the current state of the project, the robot is only imitating, which in a real situation may not be appropriate. But the project has at least demonstrated that it is possible to automate facial reactions and trigger them in real time. To refine the project, the researchers estimate at the end of their paper, the robot will need to acquire "a higher level of understanding of human emotions, desires and intentions." The eternal holy grail of artificial intelligence: making it appear less artificial.