JIITA, vol.9 no.1, p.1070-1075, DOI: 10.22664/ISITA.2025.9.1.1070
Hyunduk Kim, Sang Heon Lee, Myoung Kyu Sohn, Jung kwang Kim
Abstract. Remote photoplethysmography (rPPG) has emerged as a promising method for contactless heart rate estimation from video sequences. In this study, we propose CrossSTSPhys, which applies a cross-attention mechanism between two video input streams: the original RGB stream and a near-infrared (NIR) stream. This dual-path structure lets the network exploit complementary features from the two input modalities. The CrossSTSPhys architecture adopts Spatial-Temporal SwiftFormer blocks and integrates cross-attention layers at multiple hierarchical levels to exchange and refine information across the two streams. Experimental results show that CrossSTSPhys estimates heart rate more accurately on benchmark datasets than both the baseline STSPhys model and existing state-of-the-art methods.
Keywords: remote heart rate estimation, RGB-NIR fusion, visual transformer
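
The cross-stream exchange described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes plain scaled dot-product attention with no learned projections, a residual connection, and toy 2-dimensional tokens, whereas the abstract does not specify the form of attention used alongside the Spatial-Temporal SwiftFormer blocks. The names `cross_attend` and `exchange` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, context):
    """Each query token attends over all context tokens.

    Assumption: keys and values are the raw context tokens
    (no learned Q/K/V projections), to keep the sketch short.
    """
    dim = len(queries[0])
    attended = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
                  for k in context]
        weights = softmax(scores)
        attended.append([sum(w * v[j] for w, v in zip(weights, context))
                         for j in range(dim)])
    return attended


def exchange(rgb, nir):
    """Bidirectional exchange: RGB queries NIR and NIR queries RGB.

    Assumption: the attended features are added back residually.
    """
    rgb_from_nir = cross_attend(rgb, nir)
    nir_from_rgb = cross_attend(nir, rgb)
    rgb_out = [[x + y for x, y in zip(t, u)] for t, u in zip(rgb, rgb_from_nir)]
    nir_out = [[x + y for x, y in zip(t, u)] for t, u in zip(nir, nir_from_rgb)]
    return rgb_out, nir_out


# Toy feature tokens (made-up values, not real rPPG features).
rgb_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
nir_tokens = [[0.5, 0.5], [1.0, -1.0]]
rgb_out, nir_out = exchange(rgb_tokens, nir_tokens)
```

Each stream keeps its own token count and feature width after the exchange, so in principle such a step could be repeated at several levels of a hierarchy, as the abstract describes for CrossSTSPhys.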