Qualitative comparison of motion prediction on DanceTrack (frames 508–511). PlugTrack adaptively fuses Kalman filter and data-driven predictions to better handle both linear and non-linear motions, achieving up to +10.6 IoU gains.
The Coexistence of Motion Patterns:
A New Perspective on Real-World Tracking

Abstract
We identify a critical yet overlooked phenomenon in multi-object tracking: even in non-linear motion domains, linear patterns dominate a substantial fraction of scenarios. Our empirical analysis on DanceTrack, a dataset explicitly designed for complex non-linear motion, reveals that the Kalman filter produces superior predictions for 34% of all tracklets (1,700 out of 5,000), outperforming specialized data-driven predictors such as DiffMOT and TrackSSM.
This finding challenges the prevailing assumption that motion domains can be cleanly separated into "linear" and "non-linear" categories. Real-world tracking scenarios are inherently heterogeneous, mixing both motion patterns regardless of dataset characteristics. On MOT17, where linear motion predominates, the Kalman filter excels in 60.3% of cases, yet data-driven predictors still handle the remaining 39.7% more effectively. Conversely, on DanceTrack's dance sequences with frequent direction changes, data-driven methods dominate 66% of tracklets, but the Kalman filter still outperforms them on the remaining third.
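The per-tracklet comparison above can be reproduced with a simple counting procedure: for each tracklet, average the IoU of each predictor's boxes against ground truth and record which predictor wins. This is a minimal sketch, not the paper's exact protocol; the function names and the mean-IoU winner criterion are our assumptions for illustration.

```python
import numpy as np

def iou(box_a, box_b):
    """IoU between two boxes in (x1, y1, x2, y2) format."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def winner_fraction(kf_preds, dd_preds, gt_boxes):
    """Fraction of tracklets where the Kalman filter's mean prediction IoU
    beats the data-driven predictor's (one list of boxes per tracklet)."""
    kf_wins = sum(
        np.mean([iou(k, g) for k, g in zip(kf, gt)]) >
        np.mean([iou(d, g) for d, g in zip(dd, gt)])
        for kf, dd, gt in zip(kf_preds, dd_preds, gt_boxes)
    )
    return kf_wins / len(gt_boxes)
```

Running this over all DanceTrack tracklets is what yields a split like 34% Kalman-filter wins versus 66% data-driven wins.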
This coexistence of motion patterns within single sequences demands a paradigm shift: rather than selecting one predictor over another, we need adaptive fusion that dynamically leverages each predictor's strengths based on instantaneous motion context. PlugTrack is built on this fundamental insight—intelligently blending classical Kalman filtering with modern data-driven approaches to handle the full spectrum of real-world motion dynamics.
Core Contributions
- Key insight: Even on non-linear datasets, the Kalman filter can outperform data-driven predictors in a substantial fraction of cases (e.g., up to 34%), motivating adaptive fusion rather than replacement.
- PlugTrack framework: A plug-and-play mechanism that adaptively fuses Kalman filter and any data-driven motion predictor without modifying the base predictor.
- Multi-perceptive motion understanding: Contextual Motion Encoder (CME) that analyzes motion via:
- Motion Pattern Module (MPM),
- Prediction Discrepancy Module (PDM),
- Uncertainty Quantification Module (UQM).
- Stable training: Monte Carlo Alpha Search (MCAS) to generate pseudo-GT blending factors and prevent bias collapse.
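The MCAS idea can be sketched as follows: sample candidate blending factors, blend the two predictions with each candidate, and keep the factor whose blend lands closest to the ground-truth box as the pseudo-GT supervision target. This is a hypothetical sketch under our own assumptions (scalar alpha, L1 distance as the scoring objective); the paper's actual search space and objective may differ.

```python
import numpy as np

def monte_carlo_alpha_search(kf_pred, dd_pred, gt_box, n_samples=64, seed=0):
    """Monte Carlo search for a pseudo-ground-truth blending factor:
    sample candidate alphas, blend the two box predictions, and return
    the alpha whose blend is closest (L1 distance here, as a stand-in
    for the paper's objective) to the ground-truth box."""
    rng = np.random.default_rng(seed)
    alphas = rng.uniform(0.0, 1.0, size=(n_samples, 1))  # one scalar per sample
    blends = alphas * np.asarray(kf_pred, float) + (1 - alphas) * np.asarray(dd_pred, float)
    errors = np.abs(blends - np.asarray(gt_box, float)).sum(axis=1)
    return float(alphas[np.argmin(errors), 0])
```

Because the searched factor is grounded in the actual ground-truth box rather than in either predictor alone, supervising with it avoids collapsing the blending network toward one predictor.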
Proposed Method (Overview)
PlugTrack consists of (1) a Contextual Motion Encoder (CME) for multi-perceptive motion analysis and (2) an Adaptive Blending Generator (ABG) that predicts coordinate-wise blending factors for alpha blending. During training, MCAS generates pseudo-ground-truth blending factors to supervise the ABG.
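The coordinate-wise alpha blending itself reduces to a one-line convex combination. In the sketch below the blending factors are passed in directly for illustration; in PlugTrack they would come from the ABG conditioned on the CME's motion features.

```python
import numpy as np

def adaptive_blend(kf_pred, dd_pred, alpha):
    """Coordinate-wise alpha blending of two box predictions.
    `alpha` holds one factor per coordinate (e.g. cx, cy, w, h):
    alpha=1 trusts the Kalman filter, alpha=0 the data-driven predictor."""
    kf_pred = np.asarray(kf_pred, float)
    dd_pred = np.asarray(dd_pred, float)
    alpha = np.asarray(alpha, float)
    return alpha * kf_pred + (1.0 - alpha) * dd_pred
```

Predicting a separate factor per coordinate lets the tracker, for example, trust the Kalman filter on box scale while deferring to the data-driven predictor on a rapidly changing center position.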
Citation
@article{kim2025plugtrack,
title={PlugTrack: Multi-Perceptive Motion Analysis for Adaptive Fusion in Multi-Object Tracking},
author={Kim, Seungjae and Lee, SeungJoon and Cho, MyeongAh},
journal={arXiv preprint arXiv:2511.13105},
year={2025}
}