Articles tagged with: Multimodal AI

Showing 4 results for this tag.

Advanced·Apr 29, 2026

Enhancing multimodal affect recognition in healthcare: the robustness of appraisal dimensions over labels within age groups and in cross-age generalisation

This paper investigates multimodal affect recognition in AI-assisted Computerized Cognitive Training (CCT), comparing appraisal dimensions with categorical labels across young and older adult populations. It shows that appraisal dimensions consistently outperform categorical labels within each age group and generalize better across age groups.

Affect Recognition

Multimodal AI

Healthcare AI

Advanced·Jan 27, 2026

Compression Tells Intelligence: Visual Coding, Visual Token Technology, and the Unification

This paper unifies classical visual coding and modern visual token technology, proposing a framework that bridges their distinct approaches to visual information compression. It demonstrates how principles from each field can enhance the other, particularly for multimodal AI applications.

Visual Compression

Multimodal AI

Information Theory

Advanced·Dec 3, 2025

Chameleon: Adaptive Adversarial Agents for Scaling-Based Visual Prompt Injection in Multimodal AI Systems

This paper introduces Chameleon, an adaptive adversarial framework that exploits image downscaling vulnerabilities in Vision-Language Models (VLMs) to inject hidden malicious visual prompts. By employing an iterative, feedback-driven optimization mechanism, Chameleon can craft imperceptible perturbations that hijack VLM execution and compromise agentic decision-making systems.

Adversarial Attacks

Multimodal AI

AI Security

Vision-Language Models

Advanced·Dec 3, 2025

DraCo: Draft as CoT for Text-to-Image Preview and Rare Concept Generation

DraCo introduces a novel interleaved reasoning paradigm, Draft-as-CoT, for text-to-image generation that leverages both textual and visual content. This approach addresses limitations of existing methods by generating low-resolution draft images for visual planning and verification, significantly improving the generation of rare attribute combinations and overall image quality.

Multimodal AI

Text-to-Image

Multimodal Large Language Models

Text-to-Image Generation

Chain-of-Thought

Multimodal LLMs

Machine Learning

Research Guy

Understand New Research — Instantly

Daily AI-generated explanations of the latest arXiv papers.