Computer vision has an impressive track record. It can recognize people, faces, and objects with superhuman accuracy. It can even recognize many different types of actions, though still not as well as humans can.
But there are limits to its performance. Machines have a particularly tough time when people, faces, or objects are partially obscured. And when light levels drop too much, they are effectively blinded, just like humans.
But there is another part of the electromagnetic spectrum which is not limited in the same way. Radio waves fill our world, be it night or day. They pass through walls easily and are both transmitted and reflected by human bodies. Indeed, researchers have developed various ways to use Wi-Fi radio signals to see behind closed doors.
But these radio vision systems have some shortcomings. Their resolution is low; images are noisy and filled with distracting reflections, making it hard to figure out what’s going on.
In this sense, radio images and visible light images have complementary advantages and disadvantages. And this raises the possibility of using the strengths of one to overcome the shortcomings of the other.
Enter Tianhong Li and his colleagues at MIT, who have found a way to teach a radio vision system to recognize people's actions by training it with visible light images. The new radio vision system can see what people are doing in a wide range of situations where visible light imaging fails. "We introduce a neural network model that can detect human actions through walls and occlusions, and in poor lighting conditions," report Li and co.
The team's method uses an interesting trick. The basic idea is to record the same scene simultaneously with visible light and with radio waves. Machine vision systems can already recognize human actions from visible light images. The next step is to correlate those images with radio images of the same scene.
But the challenge is to ensure that the learning process focuses on human movement rather than other features, such as the background. Li and co therefore introduce an intermediate stage in which the machine generates 3D stickman models that reproduce the actions of the people in the scene.
“By translating the input into an intermediate skeleton-based representation, our model can learn from both vision-based and radio-frequency-based datasets, and allow the two tasks to help each other,” say Li and co.
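The training setup described above — a vision pipeline supplying skeleton targets that a radio-frequency encoder learns to reproduce — can be sketched with a toy linear model. Everything below (the encoder shapes, keypoint count, learning rate, and gradient-descent loop) is an illustrative assumption, not the authors' implementation, which uses deep convolutional networks:

```python
import numpy as np

rng = np.random.default_rng(0)
N_KEYPOINTS = 14                  # joints in the 3D "stickman" (assumed count)
SKEL_DIM = N_KEYPOINTS * 3        # (x, y, z) coordinates per joint

# A linear stand-in for the real convolutional RF encoder:
# it maps a 128-dim radio feature vector to skeleton coordinates.
rf_encoder = rng.normal(0.0, 0.01, (128, SKEL_DIM))

def skeleton_loss(W, x, target):
    """Mean squared error between predicted and target skeletons."""
    return float(np.mean((x @ W - target) ** 2))

def train_step(W, x, target, lr=0.05):
    """One gradient-descent step pulling the RF-predicted skeleton
    toward the skeleton produced by the vision pipeline."""
    err = x @ W - target
    W -= lr * np.outer(x, err) * (2.0 / SKEL_DIM)
    return W

# In the paper's scheme, a vision-based pose estimator supplies the
# target skeleton for each synchronized frame; here it is random data.
rf_input = rng.normal(size=128)          # stand-in radio measurement
target = rng.normal(size=SKEL_DIM)       # stand-in vision-derived skeleton

before = skeleton_loss(rf_encoder, rf_input, target)
for _ in range(200):
    rf_encoder = train_step(rf_encoder, rf_input, target)
after = skeleton_loss(rf_encoder, rf_input, target)
# The RF encoder's skeletons converge toward the vision targets,
# after which a shared action classifier can operate on skeletons alone.
```

Because the downstream action classifier sees only skeletons, it cannot tell which modality produced them — which is what lets supervision from visible light transfer to radio at test time.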
In this way, the system learns to recognize actions in visible light, and then to recognize the same actions taking place in the dark or behind walls, using radio waves. “We show that our model achieves accuracy comparable to vision-based action recognition systems in visible scenarios, while continuing to perform accurately when people are not visible,” the researchers say.
This is interesting work with great potential. The obvious applications are in scenarios where visible light imaging fails: in low-light conditions and behind closed doors.
But there are also other applications. A problem with visible light images is that people are recognizable, which raises privacy concerns.
But a radio system doesn’t have the resolution for facial recognition, and identifying actions without recognizing faces does not raise the same privacy concerns. “It can bring action recognition to people’s homes and enable its integration into smart home systems,” say Li and co. It could, for example, monitor an elderly person’s home and alert the appropriate services in the event of a fall. And it would do so without significant privacy risks.
This is beyond the capability of today’s vision-based systems.
Ref: arxiv.org/abs/1909.09300: Making the invisible visible: action recognition through walls and occlusions