Facebook’s ‘Red Team’ Hacks Its Own AI Programs
Instagram encourages its billion or so users to add filters to their photos to make them more shareable. In February 2019, some Instagram users began editing their photos with a different audience in mind: Facebook’s automated porn filters.
Facebook depends heavily on moderation powered by artificial intelligence, and it says the tech is particularly good at spotting explicit content. But some users found they could sneak past Instagram’s filters by overlaying patterns such as grids or dots on rule-breaking displays of skin. That meant more work for Facebook’s human content reviewers.
Facebook’s AI engineers responded by training their system to recognize banned images with such patterns, but the fix was short-lived. Users “started adapting by going with different patterns,” says Manohar Paluri, who leads work on computer vision at Facebook. His team eventually tamed the problem of AI-evading nudity by adding another machine-learning system that checks for patterns such as grids on photos and tries to edit them out by emulating nearby pixels. The process doesn’t perfectly recreate the original, but it allows the porn classifier to do its work without getting tripped up.
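Facebook hasn’t published the internals of that cleanup system, but the behavior Paluri describes, detecting the overlay and filling the affected pixels in from their surroundings, is a form of image inpainting. The sketch below illustrates the idea in Python with OpenCV’s classical Telea inpainting standing in for Facebook’s learned model; the overlay mask is drawn synthetically here, a placeholder for whatever pattern detector runs upstream.

```python
import cv2
import numpy as np


def remove_overlay(image: np.ndarray, overlay_mask: np.ndarray) -> np.ndarray:
    """Fill masked overlay pixels by interpolating from nearby pixels.

    `image` is a BGR photo; `overlay_mask` is a uint8 mask set to 255
    wherever the grid/dot pattern was detected and 0 elsewhere. The
    detector producing the mask is assumed to exist upstream.
    """
    # Telea inpainting estimates each masked pixel from its surrounding
    # neighborhood; an approximation of the original, not a perfect
    # recovery, matching the caveat above.
    return cv2.inpaint(image, overlay_mask,
                       inpaintRadius=3, flags=cv2.INPAINT_TELEA)


if __name__ == "__main__":
    # Toy demonstration: draw a grid over a synthetic image, then edit it out.
    photo = np.full((200, 200, 3), 128, dtype=np.uint8)
    mask = np.zeros((200, 200), dtype=np.uint8)
    for x in range(0, 200, 20):
        cv2.line(photo, (x, 0), (x, 199), (255, 255, 255), 2)
        cv2.line(mask, (x, 0), (x, 199), 255, 2)
    cleaned = remove_overlay(photo, mask)
    # `cleaned` would then be handed to the nudity classifier in place
    # of the adversarially patterned original.
```

In the production pipeline the mask comes from a trained model rather than hand-drawn lines, but the division of labor is the same: one system strips the evasion pattern, the existing classifier judges the result.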