Hacking internal AI chatbots with ASCII art
Insider threats are among the most devastating types of cyberattacks, targeting a company’s most strategically important systems and assets. As enterprises rush out new internal and customer-facing AI chatbots, they’re also creating new attack vectors and risks.
How porous AI chatbots are is reflected in the recently published research, ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs. Researchers were able to jailbreak five state-of-the-art (SOTA) large language models (LLMs), including Open AI’s ChatGPT-3.5, GPT-4, Gemini, Claude, and Meta’s Llama2 using ASCII art.
ArtPrompt is an attack strategy researchers created that capitalizes on the poor performance of LLMs in recognizing ASCII art to bypass guardrails and safety measures. The researchers note that ArtPrompt only requires black-box access to targeted LLMs and fewer iterations to jailbreak an LLM. While LLMs excel at semantic interpretation, their ability to interpret complex spatial and visual recognition differences is limited. Gaps in these two areas are why jailbreak attacks launched with ASCII art succeed. Researchers wanted to further validate why ASCII art could jailbreak five different LLMs.