Detailed conference program
July 26 (Saturday)
Registration (8:30–)
Opening (9:00–9:30)
IAPR Invited Talk 1 (9:30–10:30)
Chair: TBD
Oral 1-1: 3D Vision and Scene Understanding (10:45–11:45)
Chair: TBD
- O1-1-1 Leveraging 2D-VLM for Label-Free 3D Segmentation in Large-Scale Outdoor Scene Understanding
- O1-1-2 FMDP: Leveraging a Foundation-Model for Dual-Pixel Disparity Estimation
- O1-1-3 Capturing Fine-Grained Alignments Improves 3D Affordance Detection
Poster 1 snapshot (13:00–13:30)
Chair: TBD
Poster 1 (13:30–15:00)
Posters from Oral Sessions 1-1, 1-2, 1-3, and 3-1 are also included in this session.
- P1-01 Noise-based Regularized Training for Diffusion Models
- P1-02 Bidirectional Action Sequence Learning for Long-term Action Anticipation with Large Language Models
- P1-03 RGB-Thermal Cooperative Robot Vision Strategy for Multi-Person Tracking in Both Well-Lit and Low-Light Scenes
- P1-04 A Minimalist Approach to HDR Image Compression with Applications to Low-Light Image Enhancement
- P1-05 Magic for the Age of Quantized DNNs
- P1-06 A Lightweight Convolutional Neural Network for Underwater Image Quality Enhancement
- P1-07 Real-Time Fire Detection Using Hybrid Feature Extraction: Color, Texture, and Motion Analysis
- P1-08 Viewpoint-Aware 3D Dense Captioning
- P1-09 Transformer-based Visual Grounding with Inter-Modality Cross-Attention
- P1-10 Unsupervised 3D Braided Hair Reconstruction from a Single-View Image
- P1-11 Enhancing Reliability of Medical Image Diagnosis through Top-rank Learning with Rejection Module
- P1-12 Pre-Manipulation Alignment Prediction for Open-Vocabulary Object Manipulation Based on End-Effector Trajectories
- P1-13 Domain Generalization of Pathological Image Segmentation by Patch-Level and WSI-Level Contrastive Learning
- P1-14 Dynamic Age Estimation via Mixture of Experts: Bridging Semantic and Structural Models
- P1-15 Temporal Conditioning for Realistic Performance Video Generation from Instrumental Sounds
- P1-16 Binned MSE for Imbalanced Dust Density Estimation
- P1-17 IG-ODAM: Instance-Aware Visual Explanations for Object Detection with Integrated Gradients
- P1-18 ShadowAug: A Multi-Strategy Data Augmentation Method for Image Shadow Removal
- P1-19 Any-scale Object Detection using Arbitrary-scaled Images
- P1-20 3D Object Reconstruction Through Integration of Hyperspectral and RGB-D Imaging
- P1-21 Parallel Sampling of Diffusion Models on SO(3)
- P1-22 Style-Preserving Diffusion for Scene Text Editing
- P1-23 FlowLoss: Dynamic Flow-Conditioned Loss Strategy for Video Diffusion Models
Oral 1-2: Medical Imaging and Microscopy (15:00–16:00)
Chair: TBD
- O1-2-1 Self-supervised 3D Image Deblurring for Lattice Light Sheet Microscopy
- O1-2-2 ZECO: ZeroFusion Guided 3D MRI Conditional Generation
- O1-2-3 Advancing Disease Detection Using Deep Learning in Low-Data Environments
Oral 1-3: Image Synthesis and Generation (16:15–17:15)
Chair: TBD
- O1-3-1 DLSF: Dual-Layer Synergistic Fusion for High-Fidelity Image Synthesis
- O1-3-2 Data-driven Head Motion Generation through Natural Gaze-Head Coordination
- O1-3-3 Low-Latency Real-Time Audio-Driven Talking Head Generation Based on Future Speech Feature Prediction
July 27 (Sunday)
Registration (8:30–)
IAPR Invited Talk 2 (9:00–10:00)
Chair: TBD
Technical event (10:15–11:00)
- TE-1 MVA 2025 Small Multi-Object Tracking for Spotting Birds Challenge: Dataset, Methods, and Results
- TE-2 Intersection-based Ensemble for Small Multi-Object Tracking in Challenging Environments
- TE-3 Boosting Small Object Tracking via Collaborative Detection Transformer
- TE-4 Confidence-based Adaptive Weighted Boxes Fusion for Multi-Object Tracking of Small Birds
- TE-5 Joint Q&A session for all presenters
- TE-6 Award announcement and closing remarks
Oral 2-1: Domain Adaptation and Segmentation (11:15–12:15)
Chair: TBD
- O2-1-1 Leveraging Masked Feature and Consistency Regularization for Unsupervised Domain Adaptation Based Semi-Supervised Semantic Segmentation
- O2-1-2 MoExDA: Domain Adaptation for Edge-based Action Recognition
- O2-1-3 MobileSACNet: Lightweight Spectral-Spatial Compression for Hyperspectral Segmentation in Autonomous Driving Systems
Poster 2 snapshot (13:30–14:00)
Poster 2 (14:00–15:30)
Posters from Oral Sessions 2-1, 2-2, and 2-3 are also included in this session.
- TE-1 MVA 2025 Small Multi-Object Tracking for Spotting Birds Challenge: Dataset, Methods, and Results
- TE-3 Boosting Small Object Tracking via Collaborative Detection Transformer
- P2-01 Age Prediction of Komatsuna using Hu Moments with Neural Networks for Small Datasets
- P2-02 Revisiting Self-Generating Simple Figure Patterns for Learning Microscopy Image Segmentation
- P2-03 Semantic Segmentation of iPS Cells: Case Study on Model Complexity in Biomedical Imaging
- P2-04 Snapshot Hyperspectral Imaging using Petrographic Thin Section
- P2-05 Cross-Modal Knowledge Distillation from First-Person Views to Third-Person BEV Maps for Universal Point Goal Navigation
- P2-06 Impact of Optical System Size on Robustness in Laser Speckle Authentication
- P2-07 Guidelines for Optimizing Optical System Design for Laser Speckle Authentication
- P2-08 Modifying Generative Distributions in Latent Diffusion Models to Improve Alignment with Desired Properties
- P2-09 Real-Time LiDAR Point Cloud Densification for Low-Latency Spatial Data Transmission
- P2-10 Statistic Temporal Checking and Depth Layering based Multi-Object Relative Size Estimation from Monocular Video
- P2-11 Edge-Augmented HLAC and Gaussian Distribution-Based Weighted Feature Extraction for 1-ms Abnormal Detection System in Logistics
- P2-12 Geometrically Constrained Position Estimation through Low-level Tracking
- P2-13 Very Similar Appearance Feature Classification for Chronic Endometritis Diagnosis in Hysteroscopy Images
- P2-14 Detecting Hand-Object Interaction Based on Movements in Hand Surrounding Region
- P2-15 Multi-Person Pose Estimation Evaluation Using Optimal Transportation and Improved Pose Matching
- P2-16 Object State Recognition in Cooking Videos through End State Frames Analysis
- P2-17 Detection of Medial Epicondyle Avulsion in Elbow Ultrasound Images via Bone Structure Reconstruction
- P2-18 Point Cloud Edge Extraction Based on 3D Point Separability Filter with Spherical Mask
- P2-19 Gaze Attention Estimation for Medical Environments
- P2-20 Modality Selection and Skill Segmentation via Cross-Modality Attention
- P2-21 Efficient Skeleton-Based Action Recognition using Superposed Shape Subspace
Oral 2-2: Computer Vision for Real-World Applications (15:30–16:30)
Chair: TBD
- O2-2-1 Supervised Domain Adaptation from Scene Text Recognition for Licence Plate Recognition
- O2-2-2 CLIP-Guided Cross-Modal Feature Fusion based Few-Shot Learning for Nighttime Pavement Defect Detection
- O2-2-3 Decoupled Scale and Appearance for Optimal Deep Diamond ReID
Oral 2-3: Image Enhancement and Restoration (16:45–17:45)
Chair: TBD
- O2-3-1 IRR-RADA: A Reflection-Aware Saliency Map and Adaptive Curriculum Learning Based Data Augmentation Method for Image Reflection Removal
- O2-3-2 Simple Yet Effective Way to Use Polarimetric Information in Stereo Matching
July 28 (Monday)
Registration (8:45–)
Oral 3-1: Scene Understanding and Human-Computer Interaction (9:15–10:15)
Chair: TBD
- O3-1-1 Scene Recognition Meets Knowledge Graphs: Enhancing Robustness to Object Diversity
- O3-1-2 An Automatic Rating Approach Using Machine Learning and Feature Selection for Finger Tapping in MDS-UPDRS Part III
- O3-1-3 Analysis and Prediction of Attractive Fonts on Title-overlaid Food Images
IAPR Invited Talk 3 (10:30–11:30)
Chair: TBD