All Tags
Browse through all available tags to find articles on topics that interest you.
Showing 3 results for this tag.
Compression Tells Intelligence: Visual Coding, Visual Token Technology, and the Unification
This paper unifies classical visual coding and modern visual token technology, proposing a framework that bridges their distinct approaches to visual information compression. It demonstrates how principles from each field can enhance the other, particularly for multimodal AI applications.
Chameleon: Adaptive Adversarial Agents for Scaling-Based Visual Prompt Injection in Multimodal AI Systems
This paper introduces Chameleon, an adaptive adversarial framework that exploits image downscaling vulnerabilities in Vision-Language Models (VLMs) to inject hidden malicious visual prompts. By employing an iterative, feedback-driven optimization mechanism, Chameleon can craft imperceptible perturbations that hijack VLM execution and compromise agentic decision-making systems.
DraCo: Draft as CoT for Text-to-Image Preview and Rare Concept Generation
DraCo introduces Draft-as-CoT, an interleaved reasoning paradigm for text-to-image generation that uses both textual and visual content. It addresses limitations of existing methods by generating low-resolution draft images for visual planning and verification, which significantly improves the generation of rare attribute combinations and overall image quality.