Text-based AI models are vulnerable to paraphrasing attacks, researchers find
Thanks to advances in natural language processing (NLP), companies and organizations are increasingly putting AI algorithms in charge of text-related tasks such as filtering spam emails, analyzing the sentiment of social media posts and online reviews, evaluating resumes, and detecting fake news.
But how far can we trust these algorithms to perform their tasks reliably? New research by IBM, Amazon, and the University of Texas shows that, with the right tools, attackers can manipulate the behavior of text-classification algorithms in potentially malicious ways.
The research, being presented today at the SysML conference at Stanford, looks at “paraphrasing” attacks, in which an attacker modifies input text so that an AI algorithm classifies it differently, without changing its meaning to a human reader.
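To make the idea concrete, here is a minimal sketch in Python of how such an attack could work in principle. The keyword-based “spam” classifier and the hand-written synonym table are hypothetical stand-ins invented for illustration; the researchers’ actual method targets real models and generates paraphrases far more systematically.

```python
# Hypothetical sketch of a paraphrasing attack. The toy classifier and the
# synonym table below are illustrative stand-ins, not the researchers' method.

# Toy classifier: flags text as spam if it contains any trigger word.
SPAM_TRIGGERS = {"free", "winner", "prize"}

def classify(text: str) -> str:
    words = text.lower().split()
    return "spam" if any(w in SPAM_TRIGGERS for w in words) else "ham"

# Hand-written synonyms that preserve the meaning for a human reader.
SYNONYMS = {
    "free": "complimentary",
    "winner": "recipient",
    "prize": "reward",
}

def paraphrase_attack(text: str) -> str:
    """Greedily swap words for meaning-preserving synonyms until the label flips."""
    original_label = classify(text)
    words = text.split()
    for i, word in enumerate(words):
        replacement = SYNONYMS.get(word.lower())
        if replacement is None:
            continue
        words[i] = replacement
        if classify(" ".join(words)) != original_label:
            break  # the classifier's prediction flipped; stop editing
    return " ".join(words)

message = "You are a winner claim your free prize now"
print(classify(message))        # -> spam
adversarial = paraphrase_attack(message)
print(adversarial)              # meaning-preserving rewrite of the message
print(classify(adversarial))    # -> ham
```

The attack succeeds without the victim model ever being modified: the adversary only probes the classifier’s output while keeping the text’s meaning intact, which is what makes paraphrasing attacks hard to spot.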