Fourth International Conference I.TECH 2006

Introduction

Automatic translation between human languages, also known as “machine translation”, is of enormous social, political, and scientific importance. In the age of the global, information-based society, real-time online translation on the Internet and other resources will support personal communication and information needs. There are many attempts to develop integrated translation software that can work in different scientific domains. There is already research on cross-lingual information retrieval, multilingual summarization, multilingual text generation from databases, integration of translation with summarization, database mining, document retrieval, information extraction, etc. In machine translation research, there is much interest in exploring new techniques in neural networks, parallel processing, and particularly in corpus-based approaches: statistical text analysis (alignment, etc.), example-based machine translation, hybrid systems combining traditional linguistic rules and statistical methods, etc. [Hutchins J., 2002]

Because of the importance of automatic translation systems in the current phase of society's development, a project for the design and coding of a Text-to-Text Machine Translation System (TTMT System)1 has started. In this paper we present an overview of the TTMT System and its hybrid architecture, which is based on different machine translation approaches. A digital script and method for communication2 is also used.
Machine Translation as a Subfield of Language Engineering and Natural Language Processing

Language engineering may be defined as a discipline or act of engineering software systems that perform tasks involving processing human language [Cunningham H., 1999]. Language engineering not only collects the information and knowledge of a language among the linguistic society but also serves as a foundation on which linguistic culture and technologies can be based [Oh et. al., 1004]. It is the basis for the development of natural language processing (NLP) systems. These systems use NLP technologies, which combine algorithms and methods from artificial intelligence and linguistics. They are designed to solve the problems of automated generation and understanding of natural human languages.
Machine translation (MT) is one of the major tasks in natural language processing. It investigates the use of computer software to translate text or speech between natural languages. It consists of two major parts: decoding the meaning of the source text, and re-encoding this meaning in the target language. It is based on computational linguistics and on modeling natural languages from a computational perspective. Knowledge of the syntax, morphology, semantics and pragmatics of the languages involved is needed, which makes translation a complex task.
Nowadays different approaches to machine translation exist. They can be divided into four main groups as follows:
1. Dictionary-based machine translation, which is based on dictionary entries (the translation is done word by word, without much correlation of meaning between them). For more information on dictionary-based machine translation, see [Ballesteros L., Croft W.B., 1996].
2. Statistical machine translation, which tries to capture regularities of natural language using probability distributions of linguistic events, such as the appearance of words within a context, sentences, or whole documents. For more information about statistical language modeling and statistical machine translation, see [Brown P., et. al., 1990; Casacuberta F., 1996].
3. Example-based machine translation, which is essentially translation by analogy and can be viewed as an implementation of the case-based reasoning approach to machine learning. For more information about example-based machine translation, see [Brown R., 1996].
4. Rule-based machine translation, which creates an intermediary symbolic representation from which the text in the target language is generated. Interlingual machine translation and transfer-based machine translation are variants of this approach. These methods require extensive lexicons with morphological, syntactic, and semantic information, and large sets of rules.

Knowledge Engineering

1 The project is currently funded by Gluon Technologies Ltd., Varna, Bulgaria, URL: http://www.gluontechnologies.com
2 Mitev K., BG Patent 63704 B1, Digital Script and Method for Communication on Mother Tongue
The TTMT System uses a hybrid approach to machine translation that combines the four approaches mentioned above.
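To make the idea of combining approaches concrete, the following is a minimal sketch of a two-stage cascade: an example-based lookup is tried first, and a word-by-word dictionary translation is used as a fallback. It is only an illustration under stated assumptions, not the TTMT System's actual method; the names `EXAMPLES`, `DICTIONARY` and `hybrid_translate` and all entries are hypothetical.

```python
# Hypothetical toy resources (English to German); entries are illustrative only.
EXAMPLES = {"good morning": "guten Morgen"}  # stored whole-sentence translations
DICTIONARY = {"the": "das", "house": "Haus", "is": "ist", "small": "klein"}


def hybrid_translate(sentence):
    """Return (translation, strategy) using a simple two-stage cascade."""
    key = sentence.lower().strip()
    # Stage 1: example-based translation, reusing a stored sentence if one matches.
    if key in EXAMPLES:
        return EXAMPLES[key], "example"
    # Stage 2: dictionary-based fallback, word by word.
    # Unknown words are passed through unchanged.
    words = [DICTIONARY.get(word, word) for word in key.split()]
    return " ".join(words), "dictionary"
```

A real hybrid system would add statistical scoring and rule-based analysis to such a cascade; the sketch only shows how several strategies can share one entry point.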
Text-to-Text Machine Translation System (TTMT System) Overview

The TTMT System is based on an in-depth analysis of human language. Human speech can be described by means of the digits of the decimal system, regardless of the speaker's mother tongue. For this purpose a digital script has been developed. It consists of combinations of the digits 0 to 9, which a computer can store in its memory and use to translate finished texts from any language into any other.
The digital script is applicable to all the world's languages and dialects. Thus, the ten digits can be turned into a universal tool for communication. A unique sequence of operations integrated into the software design allows real-time translation of a finished thought, sentence by sentence, in the respective word order. The operations are common to spoken and written language. This allows communication in the mother tongue: voice-to-voice, text-to-text, voice-to-text and vice versa.
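One way to picture a digit-based script acting as a shared representation is sketched below. Each word in each language is mapped to a string of decimal digits, and translation goes through that code. The code values, the lexicon contents and the function names are all hypothetical; the actual digital script is defined in the cited patent and is not reproduced here.

```python
# Hypothetical digit-string codes shared across languages (illustrative only).
LEXICONS = {
    "en": {"water": "1042", "bread": "2310"},
    "de": {"Wasser": "1042", "Brot": "2310"},
    "bg": {"вода": "1042", "хляб": "2310"},
}


def encode(word, lang):
    """Map a word of the given language to its digit-string code."""
    return LEXICONS[lang][word]


def decode(code, lang):
    """Map a digit-string code back to a word of the given language."""
    for word, word_code in LEXICONS[lang].items():
        if word_code == code:
            return word
    raise KeyError(code)


def translate_via_code(word, source_lang, target_lang):
    """Translate by going through the shared digit-string code."""
    return decode(encode(word, source_lang), target_lang)
```

With a shared code, adding a language only requires one new lexicon rather than a dictionary for every language pair, which is the usual argument for pivot representations.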
The TTMT System is designed to solve one of the main tasks in the field of natural language processing: text-to-text machine translation. The methodology has been developed for four base languages: English, Russian, German and Bulgarian. Translation is possible from each of these four languages into any of the others. The TTMT System design makes it easy to include new languages, and the flexible architecture of the system makes it easy to turn it into a universal translator. Many improvements to the system automatically benefit the output in all languages. The system can also incorporate ready-made sub-systems, either as black boxes or as open source software.
The approach is simple. The whole process is divided into different tasks which are solved independently by system modules.
TTMT System Architecture

The architecture of the TTMT System is highly modular. The complex problem of MT has been broken into smaller sub-problems. Every sub-problem is a task which is handled by an independent module. The modules are put together in a united system. The output of the previous module becomes the input of the following module.
Because each module has a specific task, the complexity remains under control.
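The chaining described above, where each module's output becomes the next module's input, can be sketched as a simple function pipeline. The three modules shown (`tokenize`, `lowercase`, `join_tokens`) are hypothetical placeholders chosen for brevity and are not the TTMT System's real modules.

```python
from functools import reduce


def tokenize(text):
    """Split raw text into tokens on whitespace."""
    return text.split()


def lowercase(tokens):
    """Normalize every token to lower case."""
    return [token.lower() for token in tokens]


def join_tokens(tokens):
    """Assemble the tokens back into a single string."""
    return " ".join(tokens)


def run_pipeline(modules, data):
    """Feed data through the modules in order, passing each output onward."""
    return reduce(lambda value, module: module(value), modules, data)
```

For example, `run_pipeline([tokenize, lowercase, join_tokens], text)` applies the three steps in sequence. Because each module only depends on the shape of its input, a module can be replaced or tested in isolation.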
The modules are grouped into the three main parts of the system: a front-end, a middle part and a back-end. The front-end takes input in the form of text from the Internet and other electronic sources. It also includes the user interface for adjusting the text part parameters. Its output is used to fill the database. A separate database exists for each language.
The middle part of the TTMT System consists of the language databases, the dictionary management sub-system, the language analyzer and model generation tools, the language alignment sub-system, and the validation and verification tools.
The language database consists of two parts: dictionary and metadata. The dictionaries can be represented as monolingual data banks, covering the data in a specific domain, and multilingual dictionaries for different domains (medical science, law, electrical and electronic engineering, technical and computer-science domain, every-day language, etc.). The available dictionaries are semantically oriented, using an ontology-based lexicon and text part parameters, which represent the metadata used in the system.
The data in the databases are coded digitally using the digital script. Syntax, morphology, semantics and pragmatics are also taken into account in the coding.
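As a rough illustration of a language database that pairs dictionary entries with metadata and stores digit-only codes, consider the sketch below. The field names (`code`, `domain`, `part_of_speech`), the class names and the validation rule are assumptions made for the example; the paper does not specify the actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class DictionaryEntry:
    word: str
    code: str            # hypothetical digit-string code for the entry
    domain: str          # metadata, e.g. "medical", "law", "everyday"
    part_of_speech: str  # metadata standing in for morphological information


@dataclass
class LanguageDatabase:
    language: str
    entries: list = field(default_factory=list)

    def add(self, entry):
        """Store an entry, accepting only non-empty codes made of digits 0-9."""
        if not entry.code or any(ch not in "0123456789" for ch in entry.code):
            raise ValueError("code must be a non-empty string of digits 0-9")
        self.entries.append(entry)

    def lookup(self, word, domain=None):
        """Return entries for a word, optionally restricted to one domain."""
        return [e for e in self.entries
                if e.word == word and (domain is None or e.domain == domain)]
```

Keeping the domain as metadata lets the same word carry different codes in different domains, which is one plausible reading of the domain-specific dictionaries described above.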
[Figure: block diagram with the components Electronic Resources; Tools for Loading Data into the Data Base; Front-End Sub-system; Data Base Management Sub-system; Language Analysis Sub-system; Monolingual Data Base; Monolingual Knowledge Base; Language Model Generation; Verification and Validation; Multilingual Data Base; Multilingual Knowledge Base; Multilingual Analysis; Multilingual Model Generation; Multilingual Verification and Validation; Back-End Sub-system. The monolingual components are repeated for each language (Language One, ...).]
Fig.1. TTMT System Architecture
The dictionary management sub-system provides customizable information and management functions for the text databases, as well as an environment for developing and managing dictionaries and for merging existing ones. It contains tools for extracting, indexing and searching the data. Because each stored text is large and many keywords must be indexed and searched for each text, special storage and management mechanisms are required; providing them is also a task of the dictionary management sub-system.
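Keyword indexing and search over many stored texts is commonly handled with an inverted index, which maps each keyword to the set of texts that contain it. The sketch below shows that general technique; it is an assumption about how such tools could work, not a description of the TTMT System's implementation, and the function names are hypothetical.

```python
from collections import defaultdict


def build_index(texts):
    """Build an inverted index from a {text_id: text} mapping."""
    index = defaultdict(set)
    for text_id, text in texts.items():
        for word in text.lower().split():
            index[word].add(text_id)
    return index


def search(index, *keywords):
    """Return the ids of the texts that contain every given keyword."""
    matches = [index.get(keyword.lower(), set()) for keyword in keywords]
    # With no keywords there is nothing to match, so return an empty set.
    return set.intersection(*matches) if matches else set()
```

Looking up a keyword then costs one dictionary access instead of a scan over every stored text, which matters when the texts are large and the keywords numerous.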
A language analysis package is designed for every language. The language model is the result of that analysis. The morphological analyzer is an important part of the analysis. It is extended to cover special symbols, abbreviated words, spelling errors, etc.
The language alignment sub-system gathers the correspondences between the representations of different languages.
Validation and verification tools are used to evaluate the result, to correct errors, and to train the system to avoid the same errors in future work.
The back-end outputs the synthesized translated text.
The TTMT System architecture is presented in Fig.1.
Conclusion

The developed methodology is a novel approach to natural language engineering and natural language translation. It makes it possible to build a Universal Machine Translation System and to implement it in modern telecommunication systems. Future work is to include voice recognition and voice generation sub-systems, and to produce speech-to-text, text-to-speech, and speech-to-speech automatic translation.

Bibliography

[Hutchins J., 2002] Hutchins J., Machine translation today and tomorrow, In Computerlinguistik: was geht, was kommt Festschrift für Winfried Lenders, hrsg. Gerd Willée, Bernhard Schröder, Hans-Christian Schmitz. Sankt Augustin: Gardez! Verlag, 2002, pp.159-162.
[Cunningham H., 1999] Cunningham H., A Definition and Short History of Language Engineering, Journal of Natural Language Engineering, 5(1):1-16.
[Oh et. al., 1004] Oh, Gil-Rok, Choi, Key-Sun, and Park, Se-Young, Hangul Engineering, Seoul, Korea: Daeyoungsa.
[Brown P., et. al., 1990] Brown P.F., Cocke J., Della Pietra S.S., Della Pietra V.J., Jelinek F., Lafferty J.D., Mercer R.L., Roossin P.S., A statistical approach to machine translation, Computational Linguistics, 16(2):79-85.
[Casacuberta F., 1996] Casacuberta F., Growth transformations for probabilistic functions of stochastic grammars, International Journal of Pattern Recognition and Artificial Intelligence, 10(3):183-201.
[Ballesteros L., Croft W.B., 1996] Ballesteros L., Croft W.B., Dictionary-based methods for cross-lingual information retrieval, In Proceedings of the 7th International DEXA Conference on Database and Expert Systems Applications, pp.791-801.
[Brown R., 1996] Brown R., Example-Based Machine Translation in the Pangloss System, In Proceedings of the 16th International Conference on Computational Linguistics (COLING-96), pp.169-174, Copenhagen, Denmark.

Authors’ Information

Todorka Kovacheva – Economical University of Varna, Kniaz Boris Str, Varna, Bulgaria, e-mail: firstname.lastname@example.org, phone: +
Koycho Mitev – e-mail: email@example.com, phone: +
Nikolay Dimitrov – Varna, Bulgaria, phone: +

Information Models

ON THE COHERENCE BETWEEN CONTINUOUS PROBABILITY AND POSSIBILITY MEASURES

Elena Castiñeira, Susana Cubillo, Enric Trillas

Abstract: The purpose of this paper is to study possibility and probability measures in continuous universes, taking a different line to the one proposed and dealt with by other authors. We study the coherence between the probability measure and the possibility measure determined by a function that is both a possibility density and distribution function. For this purpose, we first examine functions that satisfy this condition and then we analyse the coherence in some notable cases of probability distributions.
ACM Classification Keywords: I.2.3 Artificial Intelligence: Deduction and Theorem Proving (Uncertainty, “fuzzy” and probabilistic reasoning); I.2.4 Artificial Intelligence: Knowledge Representation Formalisms and Methods (Predicate logic, Representation languages).
Introduction

Possibility distributions are sometimes a good means of representing incomplete crisp information. It is precisely this incompleteness that often makes it impossible to determine a probability that could describe this information.
Now, if the possibility distribution meets certain requirements, for example, it is either a density function or its graph "encloses" a finite area, it will always be possible to consider either the probability whose density function is this possibility distribution or an associated density function.