How do you keep an AI’s behavior from becoming predictable?
A lot of neural networks are black boxes. We know they can successfully categorize things—images with cats, X-rays with cancer, and so on—but for many of them, we can't understand what they use to reach that conclusion. But that doesn't mean that people can't infer the rules they use to fit things into different categories. And that creates a problem for companies like Facebook, which hopes to use AI to get rid of accounts that abuse its terms of service.
Most spammers and scammers create accounts in bulk, and they can easily look for differences between the ones that get banned and the ones that slip under the radar. Those differences can allow them to evade automated algorithms by structuring new accounts to avoid the features that trigger bans. The end result is an arms race between algorithms and spammers and scammers who try to guess their rules.
Facebook thinks it has found a way to avoid getting involved in this arms race while still using automated tools to police its users, and this week, it decided to tell the press about it. The result was an interesting window into how to keep AI-based moderation useful in the face of adversarial behavior, an approach that could be applicable well beyond Facebook.