DGIST Scholar: Near Infra-Red Ray based Eye-tracking Method using Convolutional Neural Networks for Head Mounted Displays

Department of Electrical Engineering and Computer Science Theses Master

Title: Near Infra-Red Ray based Eye-tracking Method using Convolutional Neural Networks for Head Mounted Displays

Author(s): Junghun Kim

DGIST Authors: Cho, Sunghyun ; Park, Sanghyun ; Kim, Junghun

Advisor: Sunghyun Cho

Co-Advisor(s): Sanghyun Park

Issued Date: 2019

Awarded Date: 2019-02

Type: Thesis

Abstract: This thesis proposes a method for handling the pupil-tracking problems that arise during eye tracking in real Head Mounted Display (HMD) use by recasting pupil detection as a semantic segmentation task solved with a Convolutional Neural Network (CNN).
The accuracy of eye tracking is determined by the accuracy of pupil center detection. If the pupil is captured clearly, its center is easy to find. In most eye-tracking environments, however, the eye image contains many obstructions.
The eye-tracking module in an HMD must not block the user's field of view, so it can only capture the eye from a limited angle. Capturing from a limited angle introduces many obstacles to pupil detection: occlusion by the eyelid or eyelashes, reflections from glasses or contact lenses, shadows cast by the hardware, and objects such as eyeglass frames that can be mistaken for the pupil boundary. Together these create conditions in which conventional pupil-tracking algorithms do not work well.
We address this problem by converting it into a semantic segmentation task using a CNN. Because pupil center detection must run in real time, we weigh the trade-off between accuracy and speed and propose a ShuffleNetV2-based semantic segmentation model suited to pupil center detection. The model detects the pupil center at a level applicable to eye tracking in real HMD environments (average pixel-distance error = 7.1, average frames per second = 56).
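
The abstract reports accuracy as an average pixel-distance error between the detected and true pupil centers. The following is a minimal sketch of how such a number could be computed from a binary segmentation mask. It assumes the center is taken as the centroid of the pixels labeled as pupil, which this record does not confirm; the function names `pupil_center`, `pixel_distance_error`, and `mean_pixel_distance_error` are hypothetical and not taken from the thesis.

```python
import math


def pupil_center(mask):
    """Return the (x, y) centroid of nonzero pixels in a 2D mask.

    Assumption: the pupil center is the centroid of the segmented
    pupil pixels. Returns None when no pixel is labeled as pupil.
    """
    count = 0
    sum_x = 0
    sum_y = 0
    for y, row in enumerate(mask):
        for x, value in enumerate(row):
            if value:
                count += 1
                sum_x += x
                sum_y += y
    if count == 0:
        return None
    return (sum_x / count, sum_y / count)


def pixel_distance_error(predicted, ground_truth):
    """Euclidean distance in pixels between two (x, y) points."""
    return math.hypot(predicted[0] - ground_truth[0],
                      predicted[1] - ground_truth[1])


def mean_pixel_distance_error(pairs):
    """Average pixel-distance error over (predicted, ground_truth) pairs."""
    errors = [pixel_distance_error(p, g) for p, g in pairs]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # Toy 5x5 mask with a 3x3 "pupil" blob (illustrative data only).
    mask = [
        [0, 0, 0, 0, 0],
        [0, 0, 1, 1, 1],
        [0, 0, 1, 1, 1],
        [0, 0, 1, 1, 1],
        [0, 0, 0, 0, 0],
    ]
    center = pupil_center(mask)
    print(center)
    print(pixel_distance_error(center, (3, 5)))
```

A centroid is cheap to compute per frame, which fits the real-time constraint described above, but it can be biased when the eyelid occludes part of the pupil; fitting an ellipse to the mask boundary is a common alternative.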

Table Of Contents: Ⅰ. INTRODUCTION
Ⅱ. BACKGROUND
2.1. Head Mounted Display (HMD)
2.2. Gaze Estimation
2.3. Dark Pupil Technique
2.4. Convolutional Neural Network (CNN)
Ⅲ. RELATED WORKS
3.1. Starburst
3.2. ExCuSe
3.3. Semantic Segmentation
3.3.1. Fully Convolutional Networks
3.3.2. U-Net
3.4. Real-time semantic segmentation
3.4.1. MobileNet
3.4.2. ShuffleNet
Ⅳ. METHODS
4.1. Pupil Dataset
4.1.1. Existing pupil dataset
4.1.2. Our pupil dataset
4.2. Pre-trained Model
4.3. ShuffleNetV2
4.3.1. Evaluation guideline
4.3.2. Designs for good network
4.4. Segmentation Network
Ⅴ. Experiments
5.1. Dataset
5.2. Experiment Settings
5.3. Comparisons
Ⅵ. Conclusion