All Tags
Browse through all available tags to find articles on topics that interest you.
Showing 4 results for this tag.
Enhancing multimodal affect recognition in healthcare: the robustness of appraisal dimensions over labels within age groups and in cross-age generalisation
This paper investigates multimodal affect recognition in AI-assisted Computerized Cognitive Training (CCT), comparing appraisal dimensions and categorical labels across young and older adult populations. It demonstrates that appraisal dimensions consistently outperform categorical labels and generalize better, especially across different age groups.
Compression Tells Intelligence: Visual Coding, Visual Token Technology, and the Unification
This paper unifies classical visual coding and modern visual token technology, proposing a framework that bridges their distinct approaches to visual information compression. It demonstrates how principles from each field can enhance the other, particularly for multimodal AI applications.
Chameleon: Adaptive Adversarial Agents for Scaling-Based Visual Prompt Injection in Multimodal AI Systems
This paper introduces Chameleon, an adaptive adversarial framework that exploits image downscaling vulnerabilities in Vision-Language Models (VLMs) to inject hidden malicious visual prompts. By employing an iterative, feedback-driven optimization mechanism, Chameleon can craft imperceptible perturbations that hijack VLM execution and compromise agentic decision-making systems.
DraCo: Draft as CoT for Text-to-Image Preview and Rare Concept Generation
DraCo introduces a novel interleaved reasoning paradigm, Draft-as-CoT, for text-to-image generation that leverages both textual and visual content. This approach addresses limitations of existing methods by generating low-resolution draft images for visual planning and verification, significantly improving the generation of rare attribute combinations and overall image quality.