Only a few among the hundreds of sign languages in the world have attained legal recognition; the remainder hold no official status. In France, Law No. 2005-102 5 for the equal rights, opportunities, participation and citizenship of disabled persons has officially recognised French Sign Language (lsf) since 2005.
Having no writing system, sl are eminently oral. This distinguishes them doubly from spoken languages: sl are gestural rather than vocal, and, lacking a writing system, exclusively oral. They are thus languages with no forms of transmission or teaching other than immediate face-to-face exchange, or delayed exchange through video.
One of the main reasons for the difficulty of creating or borrowing a written form for sl lies in their modality: sl exist within space and time, through gestures, postures, facial expressions and gaze, all meaningful and potentially simultaneous. These languages' mode of expression is therefore multilinear and multidimensional. By contrast, all human writing systems derive from the (mono)linearity of spoken languages.
Currently, no graphical technology makes it possible to provide sl with the primary, everyday functions of writing (e.g. recording, note-taking on the fly, linear reading), even if video is sometimes used to fill some of these roles [BRUGEILLE 2006]. The only form of writing available to the deaf is generally that of the spl of the country where they live. But the majority of the profoundly deaf do not read and write well enough to access a high level of education and training, to use writing-based means of communication, or even to fully assume their role as citizens. These facts hinder their professional and personal development.
The fact that sl are unwritten languages, and that access to writing is difficult for the vast majority of the profoundly deaf, results in very incomplete access to information in cyberspace. A few rare sites offer a translation of written text into video in sl, but this remains a very limited phenomenon, with few updates.
5 http://www.legifrance.gouv.fr/affichTexte.do?cidTexte=JORFTEXT
Annelies Braffort & Patrice Dalle
This need has generated interest among researchers in Sign Language Processing (slp), which includes elements of recognition, generation and machine translation. This research relies on corpus analysis, principally of video.
The following section takes inventory of the available sl corpora. The two sections thereafter cover recognition and generation, and the last section addresses existing or future applications [DALLE 2007].
CORPUS: ON LESS-RESOURCED LANGUAGES
sl are less-resourced languages: they have access to few, and in some cases none, of the resources commonly available to other languages. Specifically, this includes a writing system, reference books describing a language's operation (grammars, dictionaries), mass publishing and distribution (books, press, cultural works), technical and educational books (technical, scientific and educational publications), the communication media of everyday life (letters, instruction manuals), as well as computer applications in that language. Similarly, corpora, which are the only way to establish and maintain a permanent patrimonial record in a sign language, are few and small-scale.
Some sign languages have reference works, cultural works on media such as dvds, and a few tv shows interpreted in sl as an inset embedded in the screen, but this is still very limited, and most shows lack even these resources. Research suffers in this situation; works detailing the operation of sl are rare and quite limited, corpora are rare and too small, and existing slp systems are only laboratory prototypes, rarely generalisable or even reusable.
Technological advances in video capture, storage and handling have recently (in the 21st century) enabled the initiation of several projects to create sl corpora, even multilingual corpora, in different countries (Figure 1). Workshops associated with conferences, or funded by national projects, have enabled the scientific community to share experiences on the creation, annotation, analysis and archiving of video corpora 6. Recommendations from the most experienced researchers are beginning to be incorporated into different projects [JOHNSTON 2008]. However, it is still too early to define standards and norms.
6 http://www.ru.nl/slcn ; http://www.sign-lang.uni-hamburg.de/lrec2008/programme.html
Figure 1: Extract from the LSF corpus of the European project Dicta-Sign 7
The following table lists some sl corpora that already exist or are under development. There are many others, but they rarely exceed a few dozen hours or speakers. This illustrates how recently most countries have begun to finance the development and production of large corpora. The funding itself is short-term, which often forces large projects to be conducted as a sequence of smaller ones, with the notable exception of Germany, which has funding for fifteen consecutive years. Its corpus, which will be the largest in the world, will nevertheless comprise only 400 hours of video, far from approaching the size of the written or audio corpora currently available.
Country | SL | Name of corpus | Size | Status
Australia | Auslan | Auslan Corpus (http://www.auslan.org.au) | 300 hours, 100 speakers | Completed; variously financed since 1990
Great Britain | BSL | BSL Corpus (http://www.bslcorpusproject.org) | 249 speakers | Completed; financed for 3 years from 2008
France | LSF | Corpus Creagest (http://www.umr7023.cnrs.fr/-Realisation-de-corpus-de-donnes-.html) | 130 hours, 125 speakers | In process; financed for 5 years: 2007-2011
Germany | DGS | DGS Corpus (http://www.sign-lang.uni-hamburg.de/dgs-korpus) | 400 hours, 300 speakers | In process; financed for 15 years: 2009-2023
7 http://www.dictasign.eu
The formation of significantly large and varied corpora (lexicon, monologue, dialogue, group discussion), both archived and accessible, is one of the most important mechanisms for preserving the heritage of these less-resourced languages, and also for enabling further research on slp, which requires, as with all languages, the analysis of large corpora.
ANALYSIS AND RECOGNITION
The ultimate goal of video analysis in sl [ONG 2005] is to automatically understand the meaning of an utterance, to translate it into another language, or to perform an action, such as a query to a database or a search for information in a document in sl. In general, this task remains out of reach for computer programs. This leads to restricting the field of processing (reduced lexical and semantic fields), imposing constraints on expression and its context, or targeting only semi-automatic programs that assist a human operator. However, more accessible intermediate stages of processing already enable the production of applications.
Where do these difficulties come from, and what can explain the performance gap with nlp or speech processing? First, consider the context of this research. As mentioned in the introduction, sl have only recently achieved the status of a language, and accordingly research on sl is also recent. Furthermore, the linguistic models, which could underpin the computer models, are not yet stable. There remain few researchers studying image analysis with sl as the research topic, and no established, directly applicable framework. In addition, almost none of these researchers understand sign language, the very object of their study.
Finally, we know that the tools of speech recognition (vocal language) progressed strongly when they were able to integrate statistical approaches built on large corpora. Due to the lack of data, it is not possible at present to follow the same approach in the case of sl [COOPER 2009].
Another difficulty lies in the nature of the video signal, which is extremely complex to analyse. Different bodily elements are brought into play in conjunction; their analysis must occur at very different spatial and temporal scales (e.g. simultaneously estimating a fleeting change in gaze and a repetitive swaying of the body), all from a projection of 3D postures and movements onto a plane, which entails a substantial loss of three-dimensional information and introduces numerous occlusions.
This spatial and multi-component character makes the use of analysis or speech recognition tools, developed for linear and mono-source spoken languages, unpredictable when applied to sl [DALLE 2006].
The processing chain generally comprises two main steps:
– analysis, that is, detecting (by identifying and tracking in every video frame) the characteristics of the relevant bodily elements and estimating their parameters; and
– recognition, that is, performing a temporal segmentation into units and identifying them by assigning them to a class. These linguistic units have different levels of granularity.
The recognition step is preceded by a learning phase, based on examples and prior knowledge (grammatical rules), which enables an estimation of the parameters of these classes in the framework of a given sl linguistic model.
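As a rough illustration, the two-step chain and the prior learning phase can be sketched as follows. Everything here is a toy assumption: a 2D hand position stands in for the many parameters actually tracked, and nearest-centroid matching stands in for real statistical models.

```python
from dataclasses import dataclass

# Hypothetical per-frame features produced by the analysis step
# (a toy 2D hand position stands in for the many tracked parameters).
@dataclass
class FrameFeatures:
    hand_x: float
    hand_y: float

def analyse(frames):
    """Analysis step: detect/track bodily elements in each video frame
    and estimate their parameters (here, just a hand position)."""
    return [FrameFeatures(x, y) for x, y in frames]

def recognise(features, classes):
    """Recognition step: assign each unit to the nearest known class
    (toy nearest-centroid matching)."""
    labels = []
    for f in features:
        best = min(classes, key=lambda c: (f.hand_x - classes[c][0]) ** 2
                                          + (f.hand_y - classes[c][1]) ** 2)
        labels.append(best)
    return labels

# Toy "learning phase": class parameters estimated beforehand from examples.
classes = {"SIGN_A": (0.0, 0.0), "SIGN_B": (1.0, 1.0)}
video = [(0.1, 0.0), (0.9, 1.1)]
print(recognise(analyse(video), classes))  # → ['SIGN_A', 'SIGN_B']
```

In a real system the learning phase would estimate far richer class parameters (movement dynamics, hand shapes, non-manual features) from annotated corpora, which is precisely why large corpora matter.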
Early work focused on recognition of the fingerspelling (dactylological) alphabet, that is, hand configurations that realise the alphabet of the written language. This is not really sl recognition, but rather the recognition of hand configurations, either isolated (single letters) or chained (to spell a word).
The next step aimed at recognising isolated signs, most often based on four manual parameters: hand shape, orientation, movement, and location of the hands in relation to the body; a given sign involves either one or both hands. Some works report recognising hundreds of signs with a recognition rate of over 90 %. In reality, performance varies greatly according to the nature of the data: whether they were acquired with conventional 2D cameras, with 3D stereovision devices, or with multi-camera motion-capture systems using face and body markers.
Generally, these latter systems are used only for learning models (body geometry, movement dynamics, sign signatures), while recognition is realised on 2D videos.
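The four manual parameters above can be pictured as a simple feature tuple. The parameter values, sign names and exact-match lexicon below are purely illustrative, not an actual sl coding scheme:

```python
from dataclasses import dataclass

# The four manual parameters of a sign, as illustrative symbolic values.
@dataclass(frozen=True)  # frozen → hashable, usable as a dictionary key
class ManualParameters:
    handshape: str    # e.g. "flat", "fist"
    orientation: str  # palm orientation relative to the signer
    movement: str     # e.g. "straight", "circular"
    location: str     # place of articulation relative to the body

# Hypothetical lexicon mapping parameter combinations to sign labels.
lexicon = {
    ManualParameters("flat", "palm-down", "straight", "chest"): "SIGN_1",
    ManualParameters("fist", "palm-up", "circular", "chin"): "SIGN_2",
}

observed = ManualParameters("fist", "palm-up", "circular", "chin")
print(lexicon.get(observed, "unknown"))  # → SIGN_2
```

Real recognisers match these parameters approximately rather than exactly, since every production of a sign varies; exact lookup is only the conceptual skeleton.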
The transition to the analysis of continuous signed production is not straightforward; a statement in sl is not simply a sequence of isolated signs concatenated together. It involves not only the hand gestures, but also the postures and movements of the trunk and head, and facial expression. The direction of the gaze also plays an important grammatical role. Yet all these elements are difficult to detect and characterise. Lastly, the signer 8 uses the space in front of her (the signing space) to support and structure her discourse.
Signs, then, are located in this space, and many pointing and referencing operations are observed. Here also, the loss of 3D information and the great variability in possible ways to use this space make it difficult to characterise and model its exploitation by the signer [LENSEIGNE 2005].
Furthermore, processing a continuous production reveals the problems of transitions between signs and coarticulations [SEGOUAT 2010].
The analysis of statements in sl can target various objectives:
• Corpus annotation: This assists the annotator by enriching the signal (3D information reconstruction), by automatically performing certain measurements (gesture dynamics, characterization of facial expressions, etc.), or by detecting specific events (hand contact, particular areas of the face, etc.).
• Identification and demonstration of syntactic structures: in particular the exploitation of the signing space (to identify instances of pointing and locate their targets) and to structure the utterance.
• Sign recognition: Searching for a sign within a continuous stream of sl implies using
– standardisation methods, to overcome the variations in aspect and scale between signers;
– time alignment methods, because the same signs can be produced at different speeds;
– characterization methods, to extract a sign’s intrinsic properties rather than the variability introduced by each speaker ;
– comparison methods [ALON 2009].
The most common methods are based on Hidden Markov Models and their variants (coupled, parallel), which attempt to take into account the parallelism of signs and their synchronisation and spatialisation. However, for these methods to succeed, they must rely on decomposing a sign into smaller units of a phonetic nature, whose relevance, definition and detection are still problematic [THEODORAKIS 2010].
• Sentence comprehension: The results in this area remain modest [JUNG-BAE 2002]: finding and explaining the grammatical characteristics of a sentence, or attempting to translate it into an spl, are not simple tasks. The order of signs is not the same as the order of words in a spoken language, and there is not always a systematic sign-word correspondence. Furthermore, the signer can choose between two forms of expression: an illustrative form (“showing while saying”, using structures that exploit iconicity) and a non-illustrative form (using standard signs). The illustrative form calls on perceptual-practical experience, and the extent to which a machine can interpret it is debatable.
8 The person who uses sign language.
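The time-alignment problem mentioned above is classically handled with dynamic time warping (DTW), which matches two realisations of the same sign produced at different speeds. A minimal sketch, assuming one-dimensional toy features in place of real multi-dimensional hand and body parameters:

```python
# Minimal dynamic time warping: the cumulative cost of the best alignment
# between two feature sequences, allowing stretches and compressions in time.
def dtw_distance(a, b):
    inf = float("inf")
    n, m = len(a), len(b)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])  # local frame-to-frame distance
            d[i][j] = cost + min(d[i - 1][j],      # stretch a
                                 d[i][j - 1],      # stretch b
                                 d[i - 1][j - 1])  # advance both
    return d[n][m]

slow = [0.0, 0.0, 1.0, 1.0, 2.0, 2.0]  # same trajectory, half speed
fast = [0.0, 1.0, 2.0]
print(dtw_distance(slow, fast))  # → 0.0 (identical up to speed)
```

Because the slow sequence is a time-stretched copy of the fast one, DTW finds a zero-cost alignment where a frame-by-frame comparison would not.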
GENERATION TO ANIMATION
sl generation software, coupled with the analysis software presented above, is conceived to enable complete, bidirectional access to information, for both expression and understanding in sl. It also responds to the difficulty the vast majority of deaf adults have in mastering writing. Its potential applications are manifold: web accessibility, sl subtitling, educational software using sl, and so on.
The process consists of two steps: the generation of the utterance from a linguistic point of view, followed by the generation of the perceivable signal, in the form of an animated virtual character called a “virtual signer”. No sl-generating software using these two steps yet exists, but research in this area is undergoing very active development.
To generate an utterance, there are three main approaches :
• Generating an utterance by concatenation. This method is used if the utterances are known in advance, are finite in number, and contain variable parts: typically, information messages or alerts in public places.
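Concatenation can be pictured as template filling over a store of pre-recorded sl segments. All names below (glosses, clip files, slot labels) are hypothetical illustrations, not the format of any real system:

```python
# Sketch of utterance generation by concatenation: fixed glosses of a public
# announcement are chained with variable slots filled from a segment store.
template = ["TRAIN", "<number>", "PLATFORM", "<platform>"]  # sign glosses

segments = {  # hypothetical pre-recorded clips, one per gloss
    "TRAIN": "train.mp4", "PLATFORM": "platform.mp4",
    "8456": "8456.mp4", "3": "3.mp4",
}

def generate(template, slots):
    """Fill the variable slots, then concatenate the matching clips."""
    glosses = [slots.get(g.strip("<>"), g) if g.startswith("<") else g
               for g in template]
    return [segments[g] for g in glosses]

print(generate(template, {"number": "8456", "platform": "3"}))
# → ['train.mp4', '8456.mp4', 'platform.mp4', '3.mp4']
```

This works only because the utterances are known in advance; coarticulation between concatenated segments, discussed earlier, remains a genuine difficulty for this approach.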