DGIST Scholar: Video Face Recognition with Audio-Visual Aggregation Network

Citation: Li, Qinbo. (2021-12-08). Video Face Recognition with Audio-Visual Aggregation Network. 28th International Conference on Neural Information Processing, ICONIP 2021, 150–161. doi: 10.1007/978-3-030-92273-3_13

Abstract: With the continuing improvement of deep learning methods in recent years, face recognition performance is starting to surpass human performance. However, current state-of-the-art approaches are usually trained on high-quality still images and do not work well for unconstrained video face recognition. We propose to use the audio information in a video to aid the face recognition task with mixed-quality inputs. We introduce an Audio-Visual Aggregation Network (AVAN) that aggregates multiple pieces of facial and voice information to improve face recognition performance. To effectively train and evaluate our approach, we constructed an Audio-Visual Face Recognition dataset. Empirical results show that our approach significantly improves face recognition accuracy on unconstrained videos. © 2021, Springer Nature Switzerland AG.