Synthesis of human images from reflected radio waves

Researchers from China have developed a method of synthesizing near-photorealistic images of people without a camera, using radio waves and generative adversarial networks (GANs). The system is trained on real images taken in good lighting, but is capable of capturing relatively authentic ‘snapshots’ of humans even in poor lighting – and even through major obstructions that would hide people from conventional cameras.

The images are based on ‘heat maps’ produced by two radio antennas, one capturing data from the ceiling downwards and the other recording radio-wave perturbations from a ‘standing’ position.

Photos resulting from the researchers’ proof-of-concept experiments have a faceless, ‘J-Horror’ appearance:

RFGAN is trained on images of real people in controlled environments, together with radio-wave heat maps that record human activity. Having learned the characteristics of the data, RFGAN can then generate snapshots based on new RF data alone. The resulting image is an approximation, limited by the low resolution of the RF signals available. The process works even in dark environments and through a variety of potential obstructions. Source: https://arxiv.org/pdf/2112.03727.pdf

To train the GAN, dubbed RFGAN, the researchers used matched data from a standard RGB camera and concatenated radio heat maps produced at the exact moment of capture. The people synthesized in the new project tend to be blurry, in a manner reminiscent of early daguerreotype photographs, because the resolution of the radio waves used is very low: a depth resolution of 7.5cm, and an angular resolution of about 1.3 degrees.

Above, the image fed to the GAN network – below, the two heat maps, horizontal and vertical, which characterize the person in the room, and which are themselves fused inside the architecture into a 3D representation of the disturbance data.
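As a rough sanity check on the depth figure quoted above, the textbook FMCW range-resolution relation ΔR = c/(2B) lets us back out the swept bandwidth such a radar would need. The roughly 2GHz answer below is an inference from that standard formula, not a figure stated in the paper:

```python
# Back out the FMCW sweep bandwidth implied by a 7.5cm depth resolution,
# using the standard relation dR = c / (2 * B). The bandwidth is inferred,
# not quoted in the paper.
c = 3e8                                 # speed of light, m/s
depth_resolution = 0.075                # 7.5cm, as reported
bandwidth = c / (2 * depth_resolution)  # solve dR = c / (2B) for B
print(f"{bandwidth / 1e9:.1f} GHz")     # -> 2.0 GHz of swept bandwidth
```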

The new paper, titled RFGAN: RF-Based Human Synthesis, comes from six researchers at the University of Electronic Science and Technology of China.

Data and architecture

Since no previous datasets or projects shared this scope, and RF signals had not previously been used in a GAN image-synthesis setting, the researchers had to develop new methodologies.

The basic architecture of RFGAN.

Adaptive normalization was used to interpret the twin heat-map images during training, so that they correspond spatially to the captured image data.

The RF capture devices were millimeter-wave (mmWave) radars configured as two antenna arrays, one horizontal and one vertical. Frequency-modulated continuous-wave (FMCW) signals and linear antennas were used for transmission and reception.
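For readers unfamiliar with how FMCW returns become the ‘heat maps’ mentioned throughout, the sketch below shows the usual processing chain: a range FFT along each chirp, followed by an angle FFT across the linear antenna array. The array sizes and FFT lengths here are illustrative assumptions, not the paper’s radar configuration:

```python
# A minimal sketch of turning FMCW radar samples into a range-angle heat map;
# all dimensions below are illustrative, not the paper's radar settings.
import numpy as np

def range_angle_heatmap(adc_cube: np.ndarray) -> np.ndarray:
    """adc_cube: complex samples shaped (num_antennas, num_chirps, num_samples)."""
    # Range FFT along the fast-time (sample) axis: beat frequency maps to distance.
    range_fft = np.fft.fft(adc_cube, axis=2)
    # Average over chirps to suppress noise in this static snapshot.
    integrated = range_fft.mean(axis=1)
    # Angle FFT across the linear array resolves azimuth (zero-padded to 64 bins).
    angle_fft = np.fft.fftshift(np.fft.fft(integrated, n=64, axis=0), axes=0)
    # Magnitude in dB is the heat map handed to the network.
    return 20 * np.log10(np.abs(angle_fft) + 1e-6)

adc = np.random.randn(8, 128, 256) + 1j * np.random.randn(8, 128, 256)
print(range_angle_heatmap(adc).shape)  # -> (64, 256): angle bins x range bins
```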

The generator receives a source frame as an input layer, with the fused RF representation (heat map) steering the network via normalization at the convolutional layers – a mechanism sketched below.
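A minimal sketch of what such RF-conditioned normalization could look like, loosely in the spirit of SPADE-style adaptive normalization; the class name, layer choices and sizes are assumptions for illustration, not the authors’ code:

```python
# A sketch of adaptive normalization conditioned on fused RF features:
# the RF representation predicts per-pixel scale and shift for each
# normalized feature map. Names and layer sizes are assumptions.
import torch
import torch.nn as nn

class RFAdaptiveNorm(nn.Module):
    def __init__(self, num_features: int, rf_channels: int):
        super().__init__()
        # Normalize activations without learned affine parameters...
        self.norm = nn.BatchNorm2d(num_features, affine=False)
        # ...then predict modulation parameters from the RF features.
        self.to_gamma = nn.Conv2d(rf_channels, num_features, kernel_size=3, padding=1)
        self.to_beta = nn.Conv2d(rf_channels, num_features, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor, rf_feat: torch.Tensor) -> torch.Tensor:
        # Resize the RF representation to match the current feature map.
        rf = nn.functional.interpolate(rf_feat, size=x.shape[2:], mode='nearest')
        return self.norm(x) * (1 + self.to_gamma(rf)) + self.to_beta(rf)

block = RFAdaptiveNorm(num_features=64, rf_channels=8)
out = block(torch.randn(2, 64, 45, 80), torch.randn(2, 8, 12, 20))
print(out.shape)  # -> torch.Size([2, 64, 45, 80])
```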

Data

Data was collected from the RF signal reflections at the mmWave antennas at just 20Hz, with simultaneous video of the humans captured at a very low 10fps. Nine indoor scenes were captured, using six volunteers, each wearing different clothes across the various data-collection sessions.
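Since the two sensors run on different clocks (20Hz radar against 10fps video), each video frame has to be matched to its nearest radar capture before training pairs can be formed. A minimal nearest-timestamp pairing sketch, with an illustrative 50ms tolerance that the paper does not specify:

```python
# Pair each 10 fps video frame with the nearest 20 Hz radar frame by
# timestamp; the 50 ms tolerance is an illustrative assumption.
import numpy as np

def pair_frames(video_ts: np.ndarray, radar_ts: np.ndarray, tol: float = 0.05):
    """Return (video_idx, radar_idx) pairs whose timestamps differ by < tol seconds."""
    pairs = []
    for vi, t in enumerate(video_ts):
        ri = int(np.argmin(np.abs(radar_ts - t)))  # nearest radar frame
        if abs(radar_ts[ri] - t) < tol:
            pairs.append((vi, ri))
    return pairs

video_ts = np.arange(0, 10, 1 / 10)  # 10 fps camera clock over 10 s
radar_ts = np.arange(0, 10, 1 / 20)  # 20 Hz radar clock over 10 s
print(len(pair_frames(video_ts, radar_ts)))  # -> 100 matched pairs
```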

The result was two separate datasets, RF-Activity and RF-Walk: the former containing 68,860 images of people in various poses (such as squatting and walking), along with 137,760 corresponding heat-map frames; the latter containing 67,860 frames of humans walking randomly, along with 135,720 associated heat-map pairs.

The data was split unevenly between training and testing, as is conventional, with 55,225 image frames and 110,450 heat-map pairs used for training, and the remainder held out for testing. The RGB captures were resized to 320×180, and the heat maps to 201×160.

The model was then trained with Adam at a constant learning rate of 0.0002 for both generator and discriminator, over 80 epochs, at a (very small) batch size of 2. Training took place via PyTorch on a single consumer-grade GTX 1080 GPU, whose 8GB of VRAM would generally be considered quite modest for such a task (explaining the small batch size).
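Those stated settings translate directly into a standard PyTorch setup. The sketch below reproduces only the reported hyperparameters (Adam at 0.0002 for both networks, batch size 2, 80 epochs); the one-layer stand-in modules and random tensors are placeholders, not the RFGAN architecture:

```python
# A minimal GAN training loop with the reported hyperparameters; the
# single-conv "networks" and random tensors are placeholders only.
import torch
import torch.nn as nn
import torch.nn.functional as F

gen = nn.Conv2d(2, 3, 3, padding=1)    # stand-in generator (RF in, RGB out)
disc = nn.Conv2d(3, 1, 3, padding=1)   # stand-in discriminator
opt_g = torch.optim.Adam(gen.parameters(), lr=2e-4)   # lr 0.0002, as reported
opt_d = torch.optim.Adam(disc.parameters(), lr=2e-4)

for epoch in range(80):                 # 80 epochs, as reported
    rf = torch.randn(2, 2, 64, 64)      # batch size 2, as reported
    real = torch.randn(2, 3, 64, 64)
    # Discriminator step: score real frames high, generated frames low.
    d_loss = (F.binary_cross_entropy_with_logits(disc(real), torch.ones(2, 1, 64, 64))
              + F.binary_cross_entropy_with_logits(disc(gen(rf).detach()),
                                                   torch.zeros(2, 1, 64, 64)))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # Generator step: try to fool the discriminator.
    g_loss = F.binary_cross_entropy_with_logits(disc(gen(rf)), torch.ones(2, 1, 64, 64))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```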

Although the researchers adapted some conventional metrics to test the realism of the output (detailed in the paper) and performed the usual ablation studies, there was no prior equivalent work against which to measure RFGAN’s performance.

Open interest in secret signals

RFGAN isn’t the first project to attempt to use radio frequencies to create a volumetric picture of what’s going on in a room. In 2019, researchers at MIT CSAIL developed an architecture called RF-Avatar, capable of reconstructing humans in 3D from radio-frequency signals in the Wi-Fi range, under severe occlusion conditions.

In the 2019 MIT CSAIL project, radio waves were used to remove occlusions, including walls and clothing, to recreate the subjects captured in a more traditional CGI-based workflow. Source: https://people.csail.mit.edu/mingmin/papers/rf-avatar.pdf

The researchers of the new paper also acknowledge loosely related earlier work on environmental mapping with radio waves (none of which attempted to recreate photorealistic humans), which variously sought to estimate human speed; to see through walls with Wi-Fi; to estimate human poses; and even to recognize human gestures, among various other purposes.

Transferability and wider applicability

The researchers then set out to establish whether their system was overfitted to the initial capture environment and training circumstances, although the paper offers little detail on this phase of the experiments. They state:

“To deploy our model in a new scene, we do not need to retrain the whole model from the beginning. We can fine-tune the pre-trained RFGAN using very little data (about 40s of data) to achieve similar results.”

And continue:

“The loss functions and hyper-parameters are the same as at the training stage. From the quantitative results, we find that the pre-trained RFGAN model can generate desirable human activity frames in the new scene after fine-tuning with only a little data, which means that our proposed model has the potential to be widely used.”
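In practice this describes standard fine-tuning: reload the pre-trained weights and run the same losses over the small new-scene set. A minimal sketch under those assumptions, with the same stand-in module as the training sketch above and a placeholder reconstruction loss rather than the paper’s actual loss functions:

```python
# Scene adaptation by fine-tuning: copy pre-trained weights, keep the
# hyperparameters, train on only ~40 s of new data. Modules and the L1
# loss are placeholders, not the authors' code.
import torch
import torch.nn as nn

pretrained = nn.Conv2d(2, 3, 3, padding=1)     # stands in for the trained generator
gen = nn.Conv2d(2, 3, 3, padding=1)
gen.load_state_dict(pretrained.state_dict())   # start from pre-trained weights
opt = torch.optim.Adam(gen.parameters(), lr=2e-4)  # unchanged hyperparameters

# ~40 s of capture at 10 fps is only ~400 frame/heat-map pairs; fine-tuning
# runs the same objective over just this small new-scene set.
for rf, frame in [(torch.randn(2, 2, 64, 64), torch.randn(2, 3, 64, 64))]:
    loss = torch.nn.functional.l1_loss(gen(rf), frame)  # placeholder loss
    opt.zero_grad(); loss.backward(); opt.step()
```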

Based on the paper’s details of this seminal application of a new technique, it is not clear whether the network the researchers created is ‘fitted’ exclusively to the original subjects, or whether the RF heat maps can infer details such as clothing color – something that would seem to straddle the two very different types of frequencies involved in the optical and radio capture methods.

Either way, RFGAN represents a novel use of the imitative and representational powers of generative adversarial networks to create an intriguing new form of surveillance – one that could potentially work in the dark and through walls, in an even more impressive way than recent efforts to see around corners with reflected light.

Updated December 8, 2021 (day of first posting), 8:04 p.m. GMT+2 – repeated word removed. – MA
