Chinese researchers have developed a method to synthesize near-photorealistic images of people without a camera, using radio waves and generative adversarial networks (GANs). Their system is trained on real images captured in good lighting, but it can produce reasonably authentic ‘snapshots’ of humans even in poor lighting conditions – and even through major obstructions that would hide people from conventional cameras.
The images are generated from “heatmaps” produced by two radio antennas – one capturing data from the ceiling downwards, and the other recording radio-wave perturbations from a “standing” position.
Photos resulting from the researchers’ proof-of-concept experiments have a faceless “J-Horror” appearance:
To train the GAN, dubbed RFGAN, the researchers used data from a standard RGB camera together with concatenated radio heatmaps produced at the exact moment of capture. The people synthesized in the new project tend to be blurry, in a manner similar to early daguerreotype photographs, because the resolution of the radio waves used is very low, with a depth resolution of 7.5cm and an angular resolution of about 1.3 degrees.
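The 7.5cm figure is consistent with the standard FMCW radar relationship between range resolution and sweep bandwidth. As a rough, hedged illustration (the bandwidth value below is inferred from that formula, not stated in the source):

```python
# Sketch only: for an FMCW radar, range (depth) resolution is c / (2B),
# where B is the chirp's sweep bandwidth. The paper's reported ~7.5 cm
# depth resolution would imply a bandwidth of roughly 2 GHz.
C = 3e8  # speed of light, m/s

def range_resolution(bandwidth_hz: float) -> float:
    """Standard FMCW range-resolution formula: delta_r = c / (2 * B)."""
    return C / (2 * bandwidth_hz)

print(range_resolution(2e9))  # -> 0.075 m, i.e. 7.5 cm
```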
The new paper, titled RFGAN: RF-Based Human Synthesis, comes from six researchers at the University of Electronic Science and Technology of China.
Data and architecture
Since no previous datasets or projects share this scope, and since RF signals had not previously been used in a GAN image-synthesis setting, the researchers had to develop new methodologies.
Adaptive normalization was used to interpret the twinned heatmap images during training, so that they correspond spatially to the captured image data.
The RF capture devices were millimeter-wave (mmWave) radars configured into two antenna arrays, one horizontal and one vertical. Frequency-modulated continuous-wave (FMCW) signals and linear antenna arrays were used for transmission and reception.
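The heatmaps such a setup produces can be sketched in a few lines. This is a generic illustration of FMCW range-azimuth processing, not the paper's pipeline; the function name, array shapes, and processing order are all assumptions:

```python
import numpy as np

# Hypothetical sketch: an FMCW radar with a linear antenna array yields a cube
# of beat-signal samples (chirps x antennas x fast-time samples). An FFT along
# fast time resolves range; an FFT across the antenna array resolves angle.
def range_azimuth_heatmap(iq_cube: np.ndarray) -> np.ndarray:
    """iq_cube: complex array of shape (n_chirps, n_antennas, n_samples)."""
    range_fft = np.fft.fft(iq_cube, axis=-1)                             # range bins
    angle_fft = np.fft.fftshift(np.fft.fft(range_fft, axis=1), axes=1)   # angle bins
    return np.abs(angle_fft).mean(axis=0)  # average magnitude over chirps

rng = np.random.default_rng(0)
cube = rng.normal(size=(16, 8, 64)) + 1j * rng.normal(size=(16, 8, 64))
print(range_azimuth_heatmap(cube).shape)  # (8, 64): angle x range
```

A horizontal and a vertical array would each yield one such heatmap per timestep, giving the paired representation the article describes.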
The generator receives a source frame as an input layer, with the fused RF representation (the heatmap) orchestrating the network via normalization at the convolutional layers.
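This kind of heatmap-conditioned normalization can be sketched minimally. The snippet below is an illustration of the general technique (normalize, then spatially modulate), not the paper's architecture; in practice the scale and shift maps would be predicted by learned convolutions rather than the fixed affine stand-ins used here:

```python
import numpy as np

def conditional_norm(features: np.ndarray, heatmap: np.ndarray, eps: float = 1e-5):
    """Normalize features per channel, then modulate them with spatially
    varying scale/shift maps derived from the RF heatmap (stand-ins for
    learned conv projections)."""
    mu = features.mean(axis=(1, 2), keepdims=True)
    var = features.var(axis=(1, 2), keepdims=True)
    normed = (features - mu) / np.sqrt(var + eps)
    gamma = 1.0 + 0.1 * heatmap[None]  # spatially varying scale
    beta = 0.1 * heatmap[None]         # spatially varying shift
    return gamma * normed + beta

feat = np.random.default_rng(0).normal(size=(3, 4, 5))  # (C, H, W) feature block
hm = np.random.default_rng(1).random((4, 5))            # heatmap resized to (H, W)
print(conditional_norm(feat, hm).shape)  # (3, 4, 5)
```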
Data was collected from the RF signal reflections at the mmWave antennas at just 20Hz, with simultaneous human video captured at a very low 10fps. Nine indoor scenes were captured, using six volunteers, each wearing different clothing across the various data-collection sessions.
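With RF samples arriving at twice the video frame rate, each 10fps frame must be matched against the RF stream. One simple, hedged way to do this (the article does not describe the actual alignment procedure) is nearest-timestamp pairing:

```python
# Illustration only: pair each video frame (10 fps) with the RF heatmap
# sample (20 Hz) whose timestamp is closest. Names are hypothetical.
def pair_frames(frame_times, rf_times):
    """For each video timestamp, return the index of the nearest RF sample."""
    return [
        min(range(len(rf_times)), key=lambda i: abs(rf_times[i] - t))
        for t in frame_times
    ]

frames = [i / 10 for i in range(5)]  # 10 fps video timestamps
rf = [i / 20 for i in range(10)]     # 20 Hz RF timestamps
print(pair_frames(frames, rf))       # [0, 2, 4, 6, 8]: every other RF sample
```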
The result was two distinct datasets, RF-Activity and RF-Walk: the former containing 68,860 images of people in various poses (such as squatting and walking), along with 137,760 corresponding heatmap frames; and the latter containing 67,860 frames of humans walking randomly, along with 135,720 pairs of associated heatmaps.
The data was, conventionally, split unevenly between training and testing, with 55,225 image frames and 110,450 heatmap pairs used for training, and the remainder held back for testing. The RGB captures were resized to 320 × 180, and the heatmaps to 201 × 160.
The model was then trained with Adam at a constant learning rate of 0.0002 for both generator and discriminator, over 80 epochs at a (very sparse) batch size of 2. Training took place via PyTorch on a single consumer-grade GTX 1080 GPU, whose 8GB of VRAM would generally be considered quite modest for such a task (explaining the small batch size).
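For reference, the reported optimizer settings correspond to a standard Adam update. The minimal NumPy sketch below shows one such update at the stated learning rate; it assumes Adam's usual default moment coefficients (beta1=0.9, beta2=0.999), which the article does not specify, and is in no way the paper's training code:

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=2e-4, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update at the article's reported learning rate of 0.0002.
    m, v are the running first/second moment estimates; t is the step count."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)          # bias-corrected first moment
    v_hat = v / (1 - b2 ** t)          # bias-corrected second moment
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

# One step on the toy loss w**2 (gradient 2w) nudges w toward the minimum.
w, m, v = 1.0, 0.0, 0.0
w, m, v = adam_step(w, 2.0 * w, m, v, t=1)
print(w)  # slightly below 1.0 (roughly 1 - lr)
```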
Although the researchers adapted some conventional metrics to test the realism of the output (detailed in the paper), and performed the usual ablation tests, there was no prior equivalent work against which to measure RFGAN’s performance.
Open interest in secret signals
RFGAN isn’t the first project to attempt to use radio frequencies to create a volumetric image of what’s going on in a room. In 2019, researchers at MIT CSAIL developed an architecture called RF-Avatar, capable of reconstructing humans in 3D from radiofrequency signals in the Wi-Fi range, under severe occlusion conditions.
The researchers of the new paper also acknowledge loosely related earlier work around environmental mapping with radio waves (none of which attempted to recreate photoreal humans), which sought to estimate human speed; see through walls with Wi-Fi; assess human poses; and even recognize human gestures, among various other aims.
Transferability and wider applicability
The researchers then set out to discover whether their system was overfitted to the initial capture environment and training circumstances, though the paper offers little detail on this phase of the experiments. They state:
“To deploy our model in a new scene, we do not need to retrain the whole model from scratch. We can fine-tune the pre-trained RFGAN using very little data (about 40s of data) to achieve similar results.”
And continue:
“The loss functions and hyperparameters are the same as in the training stage. From the quantitative results, we find that the pre-trained RFGAN model can generate desirable human activity frames in the new scene after fine-tuning with only a little data, which means that our proposed model has the potential to be widely used.”
From the paper’s details on this seminal application of a new technique, it is not clear whether the network the researchers created is trained exclusively on the original subjects, or whether the RF heatmaps could ever infer details such as clothing color, since this would seem to conflate the two very different types of frequency involved in the optical and radio capture methods.
Either way, RFGAN represents a novel use of the imitative and representational powers of generative adversarial networks to create an intriguing new form of surveillance – one that could potentially operate in the dark and through walls, in an even more impressive way than recent efforts to see around corners with reflected light.
December 8, 2021 (date of first publication), 8:04 p.m. GMT+2 – repeated word deleted. – MY