This paper is currently under review


BiSTS: A Biphasic Spatial-Temporal Synergy Architecture Improving Separability of Approaching Motion Recognition

Yi Zheng¹, Mingshuo Xu²*, Jigen Peng³, Haiyang Li³, Zewen Wang¹, and Shigang Yue², Senior Member, IEEE
¹School of Arts and Sciences, Guangzhou Maritime University, Guangzhou 510725, China
²School of Mathematics and Computing Science, University of Leicester, Leicester LE1 7RH, UK
³Machine Life and Intelligence Research Center, Guangzhou University, Guangzhou 510006, China

Abstract

Vision-based approaching motion recognition is essential for collision avoidance in resource-constrained robotic and edge systems. Compared with deep feature-extraction methods, bio-inspired motion-based models offer a compact and computationally efficient solution. In particular, Lobula Giant Movement Detector (LGMD) models, inspired by the locust visual pathway, have shown strong potential because they are learning-free and hardware-friendly. Nevertheless, existing LGMD-based models remain vulnerable to non-looming motion patterns and environmental artifacts (e.g., image noise and contrast variations), which cause missed warnings and false alarms in complex dynamic scenes. To address these challenges, we propose Biphasic Spatial-Temporal Synergy (BiSTS), a compact architecture that jointly exploits short- and long-term spatiotemporal cues to improve motion separability. BiSTS consists of two complementary modules. The Short-Term Spatiotemporal Antagonism (STSA) module, which rests on a rigorous mathematical foundation, suppresses transient non-motion disturbances, while the Long-Term Spatiotemporal Coupling (LTSC) module improves the separability of the approaching pattern under varying contrast conditions. We further build a benchmark that couples a 3,447-sequence dataset with a separability-oriented metric, termed SA, which quantifies how distinctly a model responds to approaching motion versus other motion patterns. Experiments on an edge device show that BiSTS outperforms state-of-the-art LGMD-based models, improving SA by 51.8% while achieving an inference speed of 158.5 FPS and an energy efficiency of 270.2 FPS/W. The proposed framework provides a reliable, efficient, and hardware-friendly baseline for visual collision detection on edge platforms. Code is available in the project repository.
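As a back-of-the-envelope check, and assuming energy efficiency is defined as throughput divided by average power draw, the two reported edge-device figures imply the following average power, per-frame latency, and per-frame energy (derived values, not separate measurements):

$$
\begin{aligned}
P_{\text{avg}} &= \frac{158.5\ \text{FPS}}{270.2\ \text{FPS/W}} \approx 0.587\ \text{W},\\
t_{\text{frame}} &= \frac{1}{158.5\ \text{FPS}} \approx 6.31\ \text{ms},\\
E_{\text{frame}} &= P_{\text{avg}}\, t_{\text{frame}} = \frac{1}{270.2\ \text{FPS/W}} \approx 3.70\ \text{mJ}.
\end{aligned}
$$

Under this assumption, BiSTS runs at well under 1 W on the edge device and spends a few millijoules per processed frame.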

BiSTS Architecture

LGMD Benchmark

Overview and SA metric

Videos from the benchmark dataset

Benchmark evaluation

Response curves of various computational models


System response curves of various computational models on synthesized and real-world datasets. Each column represents a distinct testing scenario, from left to right: (1) a synthesized looming ball superimposed on a driving background, (2) a synthesized looming square embedded in a natural woodland scene, (3) a real-world looming colored ball against a static background, (4) a real-world vehicle collision event, and (5) a real-world non-collision driving sequence. The top row shows sample keyframes from the corresponding video sequences. Each subsequent row shows the output response of one baseline model, and the bottom row shows our proposed BiSTS model. Red shaded areas mark the ground-truth time windows of the actual looming or collision events.
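The curves above are per-frame outputs of LGMD-style models. To illustrate how such a curve can arise, here is a minimal sketch of a generic, classic LGMD pipeline: luminance-change excitation (P layer), one-frame-delayed lateral inhibition (I layer), rectified subtraction (S layer), and a summed membrane potential squashed by a sigmoid. It is not BiSTS and contains neither STSA nor LTSC; the inhibition kernel, the inhibition weight `w_i`, the `scale` parameter, and the helper names `spread`, `lgmd_response`, and `looming_square` are all hypothetical choices made for illustration.

```python
import numpy as np

# Hypothetical 3x3 lateral-inhibition kernel: edge neighbours weigh more than
# diagonal ones, and the centre pixel does not inhibit itself.
W_KERNEL = np.array([[0.125, 0.25, 0.125],
                     [0.25,  0.0,  0.25],
                     [0.125, 0.25, 0.125]])


def spread(p: np.ndarray, kernel: np.ndarray = W_KERNEL) -> np.ndarray:
    """Weighted 3x3 neighbourhood sum with zero padding (kernel is symmetric)."""
    h, w = p.shape
    padded = np.pad(p, 1)  # zero border: out-of-frame neighbours contribute nothing
    out = np.zeros((h, w))
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out


def lgmd_response(frames, w_i: float = 0.6, scale: float = 1.0) -> np.ndarray:
    """Per-frame response of a generic LGMD sketch for grayscale frames in [0, 1].

    w_i (inhibition weight) and scale (sigmoid slope) are illustrative, untuned values.
    """
    frames = np.asarray(frames, dtype=float)
    n_cells = frames.shape[1] * frames.shape[2]
    prev_p = np.zeros(frames.shape[1:])
    responses = []
    for f in range(1, len(frames)):
        p = np.abs(frames[f] - frames[f - 1])       # P layer: luminance change
        inhibition = spread(prev_p)                 # I layer: previous P, spread laterally
        s = np.maximum(p - w_i * inhibition, 0.0)   # S layer: excitation minus inhibition
        k = s.sum() / n_cells                       # membrane potential, normalised by cell count
        responses.append(1.0 / (1.0 + np.exp(-k / scale)))  # sigmoid output in [0.5, 1)
        prev_p = p
    return np.array(responses)


def looming_square(size: int = 64, half_sizes=(2, 3, 4, 6, 9, 13, 19, 27)) -> np.ndarray:
    """Synthetic approach: a white square on black whose half-width grows faster and faster."""
    c = size // 2
    frames = []
    for h in half_sizes:
        img = np.zeros((size, size))
        img[c - h:c + h, c - h:c + h] = 1.0
        frames.append(img)
    return np.stack(frames)


if __name__ == "__main__":
    print(lgmd_response(looming_square()).round(3))        # approaching stimulus
    print(lgmd_response(looming_square()[::-1]).round(3))  # receding control
```

On the synthetic looming square, this sketched response rises frame by frame, while the time-reversed (receding) sequence starts high and decays. Because the delayed inhibition only trims the inner edge of each expanding ring, fast edge expansion outruns it; practical LGMD models add further stages, such as grouping layers and spike thresholds, on top of this core.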

Poster (generated by Google NotebookLM with Gemini)

BibTeX

@article{yi2026bists,
  title={BiSTS: A Biphasic Spatial-Temporal Synergy Architecture Improving Separability of Approaching Motion Recognition},
  author={Zheng, Yi and Xu, Mingshuo and Peng, Jigen and Li, Haiyang and Wang, Zewen and Yue, Shigang},
  journal={arXiv preprint arXiv:},
  year={2026}, 
}