All Tags
Browse through all available tags to find articles on topics that interest you.
Showing 16 results for this tag.
eLasmobranc Dataset: An Image Dataset for Elasmobranch Species Recognition and Biodiversity Monitoring
This paper introduces the eLasmobranc Dataset, a new, curated image collection designed to improve fine-grained identification of elasmobranch species (sharks and rays) for conservation and biodiversity monitoring. It addresses limitations of existing datasets by providing high-quality, out-of-water images with expert-validated annotations and detailed metadata to support AI system development.
Fusion-CAM: Integrating Gradient and Region-Based Class Activation Maps for Robust Visual Explanations
Fusion-CAM is a novel framework that unifies gradient-based and region-based Class Activation Map (CAM) methods through a dedicated fusion mechanism. It aims to provide robust and highly discriminative visual explanations by first denoising gradient-based maps and then adaptively combining them with region-based maps to enhance class coverage and precision, outperforming existing CAM variants.
MonoRace: Winning Champion-Level Drone Racing with Robust Monocular AI
MonoRace is an autonomous drone racing system that utilizes a monocular camera and IMU to achieve champion-level performance, notably winning the A2RL 2025 competition. It features robust state estimation combining neural-network-based gate segmentation with a drone model, an offline optimization procedure, and a neural network for guidance and control.
Patch-Discontinuity Mining for Generalized Deepfake Detection
This paper introduces GenDF, a generalized deepfake detection framework that leverages a fine-tuned Vision Transformer (ViT) to identify subtle patch discontinuities in fake images and continuities in real ones. It employs deepfake-specific representation learning, feature space redistribution, and classification-invariant feature augmentation to achieve state-of-the-art generalization across various unseen deepfake patterns with minimal trainable parameters.
Your Reasoning Benchmark May Not Test Reasoning: Revealing Perception Bottleneck in Abstract Reasoning Benchmarks
This paper challenges the common interpretation of AI models' performance on abstract reasoning benchmarks like ARC, hypothesizing that visual perception limitations, not reasoning deficiencies, are the primary bottleneck. It introduces a two-stage pipeline that separates perception from reasoning, revealing that most model failures stem from perception errors and that the decoupled pipeline yields significant performance improvements.
Self-Supervised Learning for Transparent Object Depth Completion Using Depth from Non-Transparent Objects
This paper introduces a novel self-supervised learning method for completing depth maps of transparent objects, a challenging task for conventional sensors. By simulating transparent object depth deficits within non-transparent regions, the approach significantly reduces reliance on costly labeled data while achieving comparable performance to supervised methods.
CNN on 'Top': In Search of Scalable & Lightweight Image-based Jet Taggers
This paper explores the use of a lightweight and scalable EfficientNet architecture, combined with global jet features, for the computationally inexpensive yet competitive classification of top-quark jets. It aims to address the high computational demands of current state-of-the-art jet tagging methods like Transformers and GNNs.
Visual Reasoning Tracer: Object-Level Grounded Reasoning Benchmark
MLLMs often lack transparent reasoning, merely providing final predictions without intermediate steps or visual evidence. This paper introduces the Visual Reasoning Tracer (VRT) task and associated benchmarks (VRT-Bench, VRT-80k) to explicitly require models to localize intermediate objects in their reasoning paths, significantly enhancing model interpretability and reliability.
A Modular Architecture Design for Autonomous Driving Racing in Controlled Environments
This paper introduces a modular architecture for autonomous vehicles designed for racing in closed circuits. It integrates perception, localization, path planning, and control subsystems to achieve real-time, precise autonomous navigation in controlled environments.
Artificial Microsaccade Compensation: Stable Vision for an Ornithopter
This paper introduces "Artificial Microsaccade Compensation," a real-time video stabilization method inspired by biological microsaccades. It enables stable camera-based perception for aggressively shaking tailless ornithopters, a significant challenge for autonomous flapping-wing robots.