Making MLLMs Blind: Adversarial Smuggling Attacks in MLLM Content Moderation
This article examines adversarial smuggling attacks against Multimodal Large Language Models (MLLMs) used for content moderation. By hiding harmful content inside formats that remain readable to humans, these attacks evade the model's detection, effectively rendering it blind to policy-violating material. The study introduces SmuggleBench, a benchmark of 1,700 attack examples, and discusses potential mitigation strategies.
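To make the core idea concrete, below is a minimal sketch of one well-known smuggling technique, homoglyph substitution. This is an assumed illustration, not the encoding scheme SmuggleBench actually uses: the text stays legible to a human viewer, but its underlying characters change, so an exact-match or naive text-based filter no longer recognizes the original string.

```python
# Hypothetical illustration of a "smuggling" transformation: the text remains
# human-readable, but its character-level representation changes, which can
# cause a text- or OCR-based moderation filter to miss it. This is NOT the
# encoding used by SmuggleBench; it is a generic, assumed example.

# Map a few Latin letters to visually near-identical Cyrillic homoglyphs.
HOMOGLYPHS = {
    "a": "\u0430",  # Cyrillic small a
    "c": "\u0441",  # Cyrillic small es
    "e": "\u0435",  # Cyrillic small ie
    "o": "\u043e",  # Cyrillic small o
}

def smuggle(text: str) -> str:
    """Replace selected Latin letters with Cyrillic look-alikes."""
    return "".join(HOMOGLYPHS.get(ch, ch) for ch in text)

if __name__ == "__main__":
    original = "contact me offsite"
    disguised = smuggle(original)
    print(original)                 # contact me offsite
    print(disguised)                # visually near-identical to a human
    print(original == disguised)    # False: byte-level mismatch evades exact filters
```

A human reading the disguised string sees essentially the same message, while a filter comparing raw character sequences sees an entirely different one; attacks on MLLM moderators exploit the same perceptual gap at the visual level.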