Multimodal interfaces keep evolving to better represent people's intentions. Gestures, as one modality of such interfaces, are an effective way for people to express their intentions. In particular, hand gesture recognition provides an intuitive and convenient means of human-machine interaction (HMI).
In this thesis, we investigate the problems of dynamic hand gesture recognition and develop a Korean sign language (KSL) recognition system that can help many hearing- and speech-impaired people communicate with the public.
To recognize sign language, the system must first determine the shape of the hand and the movement of the arm. Because sign language consists of a continuous sequence of movements, it is difficult to isolate an individual gesture from that sequence. To address this problem, the recognition system has to know where each gesture begins and ends. To obtain these start and end points, we define a basic posture. Sign language gestures also vary in length. For recognition, it is more effective to convert the input data (gestures) to a fixed length than to predefine the length of each gesture.
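As an illustration of the fixed-length input described above, the following minimal sketch stretches or compresses a variable-length gesture recording to a fixed number of frames. The function name `resample_fixed_length`, the frames-by-channels list layout, and the use of linear interpolation are assumptions made for illustration; the abstract does not state which resampling method the system uses.

```python
from typing import List, Sequence


def resample_fixed_length(
    samples: Sequence[Sequence[float]], target_len: int
) -> List[List[float]]:
    """Linearly interpolate a (frames x channels) recording to target_len frames.

    Hypothetical helper: the resampling method is an assumption, not the
    method reported in the thesis.
    """
    if target_len < 1:
        raise ValueError("target_len must be at least 1")
    n = len(samples)
    if n == 0:
        raise ValueError("samples must contain at least one frame")
    # A single frame (or a single requested frame) cannot be interpolated,
    # so the first frame is repeated instead.
    if n == 1 or target_len == 1:
        return [list(samples[0]) for _ in range(target_len)]

    step = (n - 1) / (target_len - 1)  # spacing of output frames on the input axis
    out: List[List[float]] = []
    for i in range(target_len):
        pos = i * step
        lo = min(int(pos), n - 2)  # index of the left neighbour frame
        frac = pos - lo            # weight of the right neighbour frame
        left, right = samples[lo], samples[lo + 1]
        out.append([a + frac * (b - a) for a, b in zip(left, right)])
    return out


# A 3-frame, single-channel gesture stretched to 5 frames.
gesture = [[0.0], [2.0], [4.0]]
fixed = resample_fixed_length(gesture, 5)
```

With this approach, every segmented gesture reaches the classifier with the same number of frames regardless of how long the signer took to perform it.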
Studies of hand gesture recognition commonly use various types of sensors, such as cameras, electromyography (EMG) sensors, glove sensors, and inertial measurement units (IMUs). The weight of these devices, shapes that are uncomfortable to wear, and cumbersome calibration procedures can reduce their usability. Wearable devices such as smart watches and armbands can solve this problem. Furthermore, an effective way to improve recognition accuracy is to exploit multiple heterogeneous sensors (both an EMG sensor and an IMU sensor), which provide redundant information about the same physical variable. Because gestures are classified from the values extracted from the sensors, the data must be preprocessed before classification. We evaluated the performance of two preprocessing methods, min-max and z-score normalization.
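The two normalization methods named above can be sketched as follows for a single sensor channel. This is a minimal illustration using the standard definitions; the function names, the per-channel application, and the handling of constant channels are assumptions, not details taken from the thesis.

```python
from statistics import fmean, pstdev
from typing import List, Sequence


def min_max_normalize(values: Sequence[float]) -> List[float]:
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant channel: avoid division by zero (assumed convention).
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_score_normalize(values: Sequence[float]) -> List[float]:
    """Shift and scale values to zero mean and unit (population) standard deviation."""
    mu = fmean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # Constant channel: avoid division by zero (assumed convention).
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical raw readings from one EMG channel.
channel = [2.0, 4.0, 6.0]
scaled = min_max_normalize(channel)
standardized = z_score_normalize(channel)
```

Min-max normalization bounds every channel to the same range but is sensitive to outliers, whereas z-score normalization is less affected by a single extreme reading; this difference is one reason the two are worth comparing on EMG and IMU data.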
In particular, we focus on the fact that EMG signals depend on a person's physical characteristics, because muscle mass and the thickness of the fat layer differ from person to person. Traditional recognition techniques do not consider these characteristics: they apply a single model to all users and therefore cannot guarantee accuracy. To address this issue, we create group-dependent neural network (NN) models based on sensor fusion. In our approach, the models are kept separate so that different people can use different models. After learning, people are experimentally divided into several groups according to the similarity of their data, which reflects similarity in body features. We show that physical similarity exists within the models we created.
Finally, we compare our model with artificial neural network (ANN) models, including convolutional neural networks (CNNs) and long short-term memory (LSTM) networks, since these perform well in classification. The experimental results show that the proposed method achieves high accuracy (99.13% for the CNN without dropout and 98.1% for the CNN with dropout). ⓒ 2017 DGIST
Table Of Contents
I. Introduction
II. Background
2.1 Hand Gesture Recognition
2.2 Sensors for Hand Gesture
III. Methodology
3.1 Feature Extraction
3.2 Preprocessing and Acquisition
3.3 Creation of Architectures Using Neural Networks