PhD. Thesis

Automatic Sign Language Recognition Inspired by Human Sign Perception

G.A. ten Holt
PhD. Thesis, Delft University of Technology, 2010

The thesis can be found in the TU Delft Repository

Abstract

Automatic sign language recognition is a relatively new field of research (since ca. 1990). Its objective is to analyze sign language utterances automatically. Until recently, research has focused mostly on direct issues such as capturing signs, extracting interesting information, and classification techniques. However, other issues are important as well: which elements of a sign are important? Are all elements equally informative, and if not, which should we focus on? In this thesis, we investigate these issues and attempt to use the results to improve automatic sign language recognition.

First, we explore the way human signers recognize sign language (part I of the thesis). In part II, we investigate if and how the results can be used to improve an automatic recognition system. The results show that this is a complex matter. Simple improvements are possible (e.g. determining which sign characteristics to use), but more complex results pertaining to human sign recognition strategies are not easily transferred to the automatic recognizer, possibly because of the differences in system architecture between human signers and the automatic system. Further research is needed to fully explore the possibilities of using human sign perception to improve automatic sign language recognition.

Summary

Automatic sign language recognition is a relatively new field of research (since ca. 1990). Its objective is to analyze sign language utterances automatically. There are several issues within the research area that merit investigation: how to capture the utterances (cameras, magnetic sensors, instrumented gloves), how to extract interesting information from the captured data, and how to classify signs or sentences automatically using the extracted information. These issues are of an immediate and basic nature, and must be solved before any automatic recognition of sign language can be achieved. But other issues, pertaining to the nature of sign language and human recognition, are no less interesting: which elements of a sign are important for the meaning of an utterance? How do consecutive signs influence one another? Why are certain types of variation unimportant while others change the meaning of the sign? Automatic sign language recognition has, until recently, mostly focused on the first set of issues. In this thesis, we attempt to integrate knowledge about sign languages and human sign recognition into the automatic sign recognition process. Research on the (psycho)linguistics of sign languages is itself quite young (since ca. 1960), and many questions are as yet unanswered. For this reason, we conduct our own studies of human sign language recognition. The knowledge gained from these experiments is applied in an existing automatic sign language recognition system.

The thesis is divided into two parts: the first part describes the experiments conducted with human signers, the second part describes experiments investigating the possibilities of integrating such knowledge in the automatic recognizer. This recognizer is meant to be used in an interactive environment for young children to practice sign language vocabulary. For this reason, it is vision-based (which is unobtrusive), and only handles isolated signs.

The experiments in part I of the thesis investigate the information content of various sign elements: fragments of a sign in time (chapter 2), and the sign aspects handshape and hand orientation (chapter 3). Of the temporal fragments, the central phase of a sign is the most informative one: it is as informative as the entire sign. Recognition based on other phases is also possible to a certain extent, and the transition from the preparation phase to the central phase appears to be a salient moment. As for the aspects, handshape proves more useful for recognition than hand orientation. Chapter 4 gives an overview of the human recognition research and discusses possibilities for application.

In part II, the possibilities of utilizing the results of part I in the recognition system are investigated. Chapter 5 describes the addition of the handshape feature to the system (which chapter 3 showed to be the most interesting feature to add). Adding handshape gives a small improvement in the recognition performance. In chapter 6, the salience of the sign fragments used in chapter 2 for the automatic recognizer is investigated. The central phase proves to be the most informative one, as it was for human signers. Chapter 7 describes experiments in which a small set of frames is used to represent a sign. The results show a deterioration in recognition performance. Strict demands on the correctness of the remaining frames are probably partly responsible for the performance decrease.

In conclusion, we can say that applying human knowledge in automatic sign language recognition is a complex task. Conclusions about human sign recognition do not necessarily hold for the automatic recognizer as well. The most important obstacles to utilizing information successfully seem to be: 1) data acquisition: computer vision is not as accomplished as human observers in capturing the complex, dynamic hand and face motions that form sign language. This means that information that is present in a sign movement for a human being may not be (correctly) observed by an automatic vision analysis system. Thus, the data that humans work with is not necessarily identical to the data the recognizer works with, and this may cause techniques that are successful for human signers to fail in the automatic system. And 2) differences in basic system architecture. Research into human sign recognition is still ongoing, and there is no clear model of human sign recognition yet. This makes it more difficult to translate observations from human sign recognition to the automatic recognizer: human signers may use techniques that are not compatible with the current architecture of the recognizer. For example, human signers may process aspects independently. If the recognition system processes all data as a single stream, then such a technique cannot be implemented. A more thorough understanding of human sign recognition, more sophisticated computer vision techniques, and close co-operation between the fields of automatic sign language recognition and human sign perception seem the best way to overcome these obstacles.
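
The architectural contrast mentioned above (one combined data stream versus aspects processed independently) can be illustrated with a small toy sketch. This is not the recognizer used in the thesis: the template values, the names `single_stream` and `independent_aspects`, the nearest-template rule, and the weights are all assumptions made up for illustration.

```python
from math import dist

# Hypothetical toy templates (not thesis data): each sign has two aspects,
# each described by a small made-up feature vector.
TEMPLATES = {
    "sign_a": {"handshape": (0.0, 0.0), "orientation": (0.0, 0.0)},
    "sign_b": {"handshape": (1.0, 1.0), "orientation": (1.0, 0.0)},
}
ASPECTS = ("handshape", "orientation")


def flatten(sample):
    """Concatenate all aspect vectors of a sample into one feature vector."""
    return [value for aspect in ASPECTS for value in sample[aspect]]


def single_stream(observation):
    """Compare the concatenated vector as a whole; aspects cannot be told apart."""
    obs = flatten(observation)
    return min(TEMPLATES, key=lambda sign: dist(obs, flatten(TEMPLATES[sign])))


def independent_aspects(observation, weights):
    """Compute one distance per aspect, then combine them with per-aspect weights."""
    def score(sign):
        return sum(
            weights[aspect] * dist(observation[aspect], TEMPLATES[sign][aspect])
            for aspect in ASPECTS
        )
    return min(TEMPLATES, key=score)


# Assumed observation: handshape close to sign_b, orientation poorly measured.
observation = {"handshape": (0.9, 0.9), "orientation": (-1.0, 0.0)}

# Assumed weights that trust handshape more than orientation.
weights = {"handshape": 1.0, "orientation": 0.2}

print(single_stream(observation))
print(independent_aspects(observation, weights))
```

In this toy setting the single-stream comparison is dominated by the badly observed orientation values, while the per-aspect version can down-weight them. A per-aspect strategy therefore requires an architecture that keeps the aspects separate in the first place, which is the incompatibility the paragraph above describes.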

Attachment: thesis.dvi (810.37 KB)