All Tags
Browse through all available tags to find articles on topics that interest you.
Showing 14 results for this tag.
MonoRace: Winning Champion-Level Drone Racing with Robust Monocular AI
MonoRace is an autonomous drone racing system that utilizes a monocular camera and IMU to achieve champion-level performance, notably winning the A2RL 2025 competition. It features robust state estimation combining neural-network-based gate segmentation with a drone model, an offline optimization procedure, and a neural network for guidance and control.
Patch-Discontinuity Mining for Generalized Deepfake Detection
This paper introduces GenDF, a generalized deepfake detection framework that leverages a fine-tuned Vision Transformer (ViT) to identify subtle patch discontinuities in fake images and continuities in real ones. It employs deepfake-specific representation learning, feature space redistribution, and classification-invariant feature augmentation to achieve state-of-the-art generalization across various unseen deepfake patterns with minimal trainable parameters.
Your Reasoning Benchmark May Not Test Reasoning: Revealing Perception Bottleneck in Abstract Reasoning Benchmarks
This paper challenges the common interpretation of AI models' performance on abstract reasoning benchmarks like ARC, hypothesizing that visual perception limitations, not reasoning deficiencies, are the primary bottleneck. It introduces a two-stage pipeline to separate perception and reasoning, revealing that most model failures stem from perception errors and demonstrating significant performance improvements.
CNN on 'Top': In Search of Scalable & Lightweight Image-based Jet Taggers
This paper explores the use of a lightweight and scalable EfficientNet architecture, combined with global jet features, for the computationally inexpensive yet competitive classification of top-quark jets. It aims to address the high computational demands of current state-of-the-art jet tagging methods like Transformers and GNNs.
Visual Reasoning Tracer: Object-Level Grounded Reasoning Benchmark
MLLMs often lack transparent reasoning, merely providing final predictions without intermediate steps or visual evidence. This paper introduces the Visual Reasoning Tracer (VRT) task and associated benchmarks (VRT-Bench, VRT-80k) to explicitly require models to localize intermediate objects in their reasoning paths, significantly enhancing model interpretability and reliability.
Self-Supervised Learning for Transparent Object Depth Completion Using Depth from Non-Transparent Objects
This paper introduces a self-supervised learning method for completing depth maps of transparent objects, which conventional depth sensors capture poorly. By simulating transparent-object depth deficits within non-transparent regions, the approach significantly reduces reliance on costly labeled data while achieving performance comparable to supervised methods.
Fast & Efficient Normalizing Flows and Applications of Image Generative Models
This PhD thesis presents new architectures and algorithms that improve the efficiency of normalizing flows, and applies generative models to diverse computer vision challenges such as agricultural quality assessment, privacy-preserving autonomous driving, geological mapping, art restoration, and missing traffic sign detection.
Artificial Microsaccade Compensation: Stable Vision for an Ornithopter
This paper introduces "Artificial Microsaccade Compensation," a real-time video stabilization method inspired by biological microsaccades. It enables stable camera-based perception for aggressively shaking tailless ornithopters, a significant challenge for autonomous flapping-wing robots.
A Modular Architecture Design for Autonomous Driving Racing in Controlled Environments
This paper introduces a modular architecture for autonomous vehicles designed for racing in closed circuits. It integrates perception, localization, path planning, and control subsystems to achieve real-time, precise autonomous navigation in controlled environments.
MRI Brain Tumor Detection with Computer Vision
This study explores the application of deep learning techniques in detecting and segmenting brain tumors from MRI scans, achieving significant improvements in accuracy and efficiency.