Science

ImageNet, the disturbing privacy database

The ImageNet database, used for research in artificial intelligence and computer vision, contains 14,197,122 images. It dates back to 2009 and owes its fame to its heavy use in the development of “deep learning” from 2012 onward. But since March 11, 2021, 243,198 photos showing people have been altered. More precisely, the faces have been blurred, as indicated by an update and a research article posted on the ImageNet site. Some images contain several faces, which brings the total to more than 560,000 blurring operations. The initiative comes from the creators of the corpus, researchers at the American universities of Princeton (New Jersey) and Stanford (California).
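To put these figures in proportion, here is a minimal sketch of the arithmetic in Python. The three numbers come from the article; the variable names are illustrative only, and because the article says "more than 560,000", the faces-per-image value is a lower bound. The altered photos come to roughly 1.7% of the database, with at least about 2.3 blurred faces per altered photo.

```python
# Figures quoted in the article.
TOTAL_IMAGES = 14_197_122
IMAGES_WITH_BLURRED_FACES = 243_198
# The article says "more than 560,000", so this is a lower bound.
BLUR_OPERATIONS_LOWER_BOUND = 560_000

# Share of the whole database that was altered.
share_altered = IMAGES_WITH_BLURRED_FACES / TOTAL_IMAGES

# Minimum average number of blurred faces per altered image.
min_faces_per_image = BLUR_OPERATIONS_LOWER_BOUND / IMAGES_WITH_BLURRED_FACES

print(f"Share of images altered: {share_altered:.2%}")
print(f"Blurred faces per altered image (at least): {min_faces_per_image:.2f}")
```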

ImageNet is distinctive in that it presents real situations, scenes of everyday life, and objects in their context and sometimes in use: a bottle being drunk from, a traveler carrying a backpack, a rider at a rodeo, etc. Not to mention meetings, people working at computers, and classroom or office environments.

An anarchic deployment of facial recognition

Images showing people are only a small part of the total, but enough to raise ethical questions. With the performance of “deep learning”, a whole series of research efforts and applications have developed, facial recognition among them. In the United States, it is being deployed in a rather anarchic manner, marked by a lack of transparency, poorly supervised practices and biased results. These factors have led several cities to ban their departments (first and foremost the police) from using this technology, and some leading companies in the field to halt their work in this area as long as there is no federal regulation.

The creators of ImageNet decided that the database should not be used to train facial recognition algorithms, since it was not created for that purpose and the subject raises a serious privacy issue. Ironically, to locate and annotate the images containing faces, the researchers resorted to a technology that has been criticized for its use by many police forces: Amazon’s Rekognition. They then called on Internet users paid via the Amazon Mechanical Turk platform to verify and refine this work.

Blurred license plates

As they recall in their article, this is not the first time that a database has been modified in this way because of privacy concerns. Faces and license plates are blurred in Google Street View (following protests, particularly in Germany) as well as in the nuScenes corpus, which consists of driving scenes in the streets of Boston and Singapore and is used to train autonomous driving algorithms.

But they went further by evaluating the impact of this alteration on the performance of computer vision algorithms. Do they recognize the objects in the images as well as before? Fifteen programs were tested. Conclusion: overall performance is very slightly lower when these tools must recognize objects in images containing blurring. The drop in the success rate is less than 1%: 69.37% instead of 70.02%. Only one algorithm (MobileNet, designed by Google researchers) sees its performance drop by more than one percent (1.01%). On the other hand, the larger the blurred area, the more recognition performance suffers. Likewise, objects whose use directly involves the face, such as a harmonica or a tuba, are recognized less well than before.
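The gap between the two success rates quoted above can be read either as a difference in percentage points or as a relative decrease. The short Python sketch below computes both from the article's figures; the function and variable names are hypothetical, chosen only for illustration. Either way, the drop stays under 1.

```python
# Success rates quoted in the article, in percent.
ORIGINAL_ACCURACY = 70.02  # before blurring
BLURRED_ACCURACY = 69.37   # after blurring


def accuracy_drop(original: float, blurred: float) -> tuple[float, float]:
    """Return the drop in percentage points and as a relative percentage."""
    absolute_points = original - blurred
    relative_percent = absolute_points / original * 100
    return absolute_points, relative_percent


points, relative = accuracy_drop(ORIGINAL_ACCURACY, BLURRED_ACCURACY)
print(f"Drop: {points:.2f} percentage points ({relative:.2f}% relative)")
```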

A stranger consequence affects the recognition of objects that are inherently ambiguous. For example, algorithms generally have a hard time distinguishing between two similar breeds of dog, the American Eskimo and the Siberian Husky. Yet once the faces are blurred in ImageNet, the recognition rate of the first drops while that of the second rises! The same goes for the green anole, known as the American chameleon, versus the green lizard, or for a basin and a bathtub. And this even though no human face appears in these photos, so nothing in them was blurred or altered! The researchers offer no explanation for the phenomenon; they simply note it and call it “intriguing”. All of which shows how much caution the design of facial recognition algorithms requires.
