3D Facial Landmarks Detection and Head Pose Estimation using Multi-task Learning and Vision Transformer

JIITA, vol.7 no.1, p.666-670, 2023, DOI: 10.22664/ISITA.2021.7.1.666

Hyunduk Kim 1*), Sang-Heon Lee 1), Myoung-Kyu Sohn 1)
1) Division of Automotive Technology, DGIST, Daegu, Republic of Korea

Abstract: In this paper, we present an algorithm for 3D facial landmark detection and head pose estimation. To solve the two tasks simultaneously, we apply multi-task learning. Our architecture consists of three components: a backbone that extracts the features shared by both tasks, multiple heads that handle the individual tasks, and linear layers that output the results. For real-time processing, we use MobileViT as the backbone network. Moreover, we employ the PCGrad algorithm for stable convergence during training. To evaluate the proposed algorithm, we trained and tested it on the AFLW2000-3D dataset. In the experiments, we compare the accuracy of MobileNetV3 and MobileViT.
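The role of PCGrad in multi-task training can be illustrated with a small sketch. PCGrad (projecting conflicting gradients) compares the gradients of the task losses and, when two of them point in opposing directions (negative inner product), removes the conflicting component from one by projecting it onto the normal plane of the other. The code below is a minimal pure-Python illustration under stated assumptions, not the training code of this paper: the function name `pcgrad`, the flat list representation of gradients, and the toy two-dimensional gradients for the landmark and pose tasks are all hypothetical.

```python
import random


def dot(a, b):
    """Inner product of two equal-length gradient vectors."""
    return sum(x * y for x, y in zip(a, b))


def pcgrad(task_grads, rng=None):
    """Combine per-task gradients with PCGrad-style projection.

    task_grads: list of flat gradient vectors, one per task (hypothetical layout).
    Returns the sum of the projected task gradients.
    """
    rng = rng or random.Random(0)  # fixed seed only to keep the sketch reproducible
    projected = []
    for i, g_i in enumerate(task_grads):
        g = list(g_i)
        others = [j for j in range(len(task_grads)) if j != i]
        rng.shuffle(others)  # the other tasks are visited in random order
        for j in others:
            g_j = task_grads[j]
            inner = dot(g, g_j)
            if inner < 0:  # conflict: remove the component of g along g_j
                scale = inner / dot(g_j, g_j)
                g = [x - scale * y for x, y in zip(g, g_j)]
        projected.append(g)
    # The update direction is the sum of the de-conflicted task gradients.
    return [sum(column) for column in zip(*projected)]


# Toy example: assumed gradients of the landmark loss and the pose loss.
landmark_grad = [1.0, 0.0]
pose_grad = [-1.0, 1.0]
print(pcgrad([landmark_grad, pose_grad]))  # prints [0.5, 1.5]
```

In this toy case the plain sum of the two gradients would be [0.0, 1.0], so the landmark task would make no progress along its own direction; after projection both tasks contribute without opposing each other. In a real training loop the same operation would be applied to the gradients of the shared backbone parameters.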

Keywords: 3D facial landmarks detection; head pose estimation; multi-task learning; vision transformer
