With continuing improvements in deep learning methods in recent years, face recognition performance has begun to surpass human performance. However, current state-of-the-art approaches are usually trained on high-quality still images and do not perform well on unconstrained video face recognition. We propose using the audio information in a video to aid face recognition when the inputs are of mixed quality. We introduce an Audio-Visual Aggregation Network (AVAN) that aggregates information from multiple facial and voice inputs to improve face recognition performance. To train and evaluate our approach, we constructed an Audio-Visual Face Recognition dataset. Empirical results show that our approach significantly improves face recognition accuracy on unconstrained videos. © 2021, Springer Nature Switzerland AG.