Posted by: aintza | June 19, 2008

Example of Translation with an MT System Translator

         MT systems offer the possibility of translating a text or a fragment (no more than 1000 characters) for free. However, if we look closely at the translation, we realise that it contains several mistakes.
        These mistakes involve grammar, word order, wrongly translated words or even whole sentences, changes in meaning, etc.
I have chosen a fragment about global warming taken from Wikipedia and used Translendium to translate the text:

“Global warming is the increase in the average measured temperature of the Earth’s near-surface air and oceans since the mid-twentieth century, and its projected continuation.
Although the term has become popular with the public and media, according to the National Academy of Sciences, “the phrase ‘climate change’ is growing in preferred use to ‘global warming’ because it helps convey that there are [other] changes in addition to rising temperatures.” Climate change refers to any significant change in measures of climate (such as temperature, precipitation, or wind) lasting for an extended period (decades or longer). Global warming is an average increase in the temperature of the atmosphere near the Earth’s surface and in the troposphere, which can contribute to changes in global climate patterns. Global warming and climate change can be caused by a variety of factors, both natural and human-induced.[1]
Increasing global temperature is expected to cause sea level to rise, an increase in the intensity of extreme weather events, and significant changes to the amount and pattern of precipitation. Other expected effects of global warming include changes in agricultural yields, modifications of trade routes, glacier retreat, species extinctions and increases in the ranges of disease vectors.
Remaining scientific uncertainties include the amount of warming expected in the future, and how warming and related changes will vary from region to region around the globe. Most national governments have signed and ratified the Kyoto Protocol aimed at reducing greenhouse gas emissions, but there is ongoing political and public debate worldwide regarding what, if any, action should be taken to reduce or reverse future warming or to adapt to its expected consequences.”

El calentamiento global es el aumento en la temperatura medida media del aire de cerca superficie de la Tierra y los océanos desde el siglo de mid-twentieth, y su continuación proyectada.
Aunque el plazo|término se ha vuelto popular con el público y medios de comunicación, según el Nacional Academy of Sciences , “la frase|locución ‘cambio climático’ está aumentando en el uso preferido” a ‘calentamiento global’ porque ayuda a transmitir eso hay [otro] cambios además de temperaturas crecientes. ” El cambio climático se refiere a cualquier cambio significativo en medidas de clima (como temperatura, precipitación, o viento) que dura para|por un periodo extendido (décadas o más mucho tiempo). El calentamiento global es un aumento medio en la temperatura del ambiente/atmósfera cerca de la superficie de la Tierra y en la troposfera, que puede contribuir a cambios en patrones de clima globales. El calentamiento global y cambio climático pueden ser causados por una variedad de factores, los dos natural e induced.[1 humano]
Se espera que la temperatura global creciente cause nivel del nivel del mar para levantarse/subir, un aumento en la intensidad de acontecimientos de tiempo extremos, y cambios significativos para la cantidad y patrón de precipitación. Otros efectos esperados de calentamiento global incluyen cambios en rendimientos agrícolas, modificaciones de rutas comerciales, retirada de glaciar, extinciones de especie y aumentos en las series de vectores de enfermedad.
Las incertidumbres científicas restantes incluyen la cantidad de apreciar/calentar esperada en el futuro, y cómo apreciando/calentando y siendo relatado variarán los cambios de región a región alrededor del globo. Han firmado la mayor parte de gobiernos nacionales y ratificado el Protocolo de Kyoto apuntado a emisiones de gas de efecto invernadero que reducen, pero hay debate político y público en curso mundialmente en cuanto a qué, si hay, se debería considerar acción que redujera o invirtiera en futuro calentar o se adaptara a sus consecuencias esperadas.
 
       As we can see, there are some problems related to word order: “medida media” should be “media medida”, and “más mucho” is an impossible order in Spanish; the correct order is “mucho más”. In these examples only two words are in the wrong place, but there are also problems with whole sentences. For example, “Han firmado la mayor parte de gobiernos nacionales” should be “la mayor parte de gobiernos nacionales han firmado…”, and “a emisiones de gas de efecto invernadero que reducen” makes no sense in context, whereas if we change the order the meaning becomes clear: “apuntado a reducir las emisiones de gases que producen el efecto invernadero”.
       On the other hand, we can also find problems with articles, which are omitted in some cases: “de cerca de superficie” needs an article, “de cerca de la superficie”.

       We can also find some words that are not translated: “mid-twentieth” and “induced”.
       And, finally, some words have the wrong sense for the sentence: instead of “proyectada”, “prevista” or “planeada” would be better here. There are also some cases where we are given two options, but one of them does not fit the meaning of the sentence: “plazo|término” should be “término”, “ambiente/atmósfera” should be “atmósfera”, etc.

 

Sources:


* Translendium. Retrieved 16:05, 19 June, 2008 from http://www.translendium.net:8080/home/text.do
*Global warming. (2008, June 19). In Wikipedia, The Free Encyclopedia. Retrieved 19:50, June 19, 2008, from http://en.wikipedia.org/w/index.php?title=Global_warming&oldid=220411732

 

Posted by: aintza | April 21, 2008

Explanation of the Following Concepts (Q3)

     Machine Translation, sometimes referred to by the abbreviation MT, is a sub-field of computational linguistics that investigates the use of computer software to translate text or speech from one natural language to another. At its basic level, MT performs simple substitution of words in one natural language for words in another. Using corpus techniques, more complex translations may be attempted, allowing for better handling of differences in linguistic typology, phrase recognition, and translation of idioms, as well as the isolation of anomalies.

Current machine translation software often allows for customization by domain or profession (such as weather reports), improving output by limiting the scope of allowable substitutions. This technique is particularly effective in domains where formal or formulaic language is used. It follows then that the machine translation of government and legal documents more readily produces usable output than that of conversation or less standardized text.
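To make the idea of simple word substitution more concrete, here is a minimal Python sketch. The tiny lexicon and the function name are my own illustration, not part of any real MT system. Because it maps one word at a time, it keeps the English word order and leaves unknown words untranslated, which are the same kinds of mistakes seen in real MT output.

```python
# Toy word-for-word substitution. TOY_LEXICON is a hypothetical,
# hand-made dictionary used only for illustration.
TOY_LEXICON = {
    "global": "global",
    "warming": "calentamiento",
    "is": "es",
    "an": "un",
    "increase": "aumento",
}


def substitute_words(sentence, lexicon):
    """Replace each known word with its entry; keep unknown words unchanged."""
    return " ".join(lexicon.get(word.lower(), word) for word in sentence.split())


# The result keeps English word order: "global calentamiento", not "calentamiento global".
print(substitute_words("Global warming is an increase", TOY_LEXICON))
```

Restricting such a lexicon to one domain, as described above, would reduce the number of competing substitutions for each word.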

     Computer-assisted translation, computer-aided translation or CAT, is a form of translation wherein a human translator translates texts using computer software designed to support and facilitate the translation process. Computer-assisted translation is a broad and imprecise term covering a range of tools, from the fairly simple to the more complicated. These can include spell checkers, grammar checkers, terminology managers, dictionaries on CD-ROM, terminology databases, full-text search tools, project management software, translation memory managers (TMM), and systems that are nearly automatic, as in machine translation, but allow user decisions for ambiguous cases. The latter are sometimes called human-aided machine translation. Some advanced computer-assisted translation solutions include controlled machine translation (MT). Integration of MT into computer-assisted translation has been implemented in various ways by various parties. Although this type of technology is neither widely known nor available to individual translators, carefully customized user dictionaries based on correct terminology significantly improve the accuracy of MT, and as a result, they improve the efficiency of the translation process.
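As a rough illustration of what a translation memory manager does, the following Python sketch looks up the most similar stored segment for a new sentence. The stored sentence pairs, the 0.75 threshold and the function name are assumptions made for this example; real TMM tools use their own matching methods. Similarity is measured with `SequenceMatcher` from Python's standard `difflib` module.

```python
from difflib import SequenceMatcher

# Hypothetical translation memory: (source segment, stored human translation).
TRANSLATION_MEMORY = [
    ("Global warming is an average increase in temperature.",
     "El calentamiento global es un aumento medio de la temperatura."),
    ("Climate change can be caused by natural factors.",
     "El cambio climático puede deberse a factores naturales."),
]


def best_match(segment, memory, threshold=0.75):
    """Return ((source, target), score) for the closest stored segment,
    or (None, score) when nothing is similar enough to reuse."""
    best_score, best_pair = 0.0, None
    for source, target in memory:
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if score > best_score:
            best_score, best_pair = score, (source, target)
    if best_score >= threshold:
        return best_pair, best_score
    return None, best_score


pair, score = best_match("Global warming is an average increase in temperature.",
                         TRANSLATION_MEMORY)
print(pair[1], score)
```

The human translator stays in control: the tool only proposes a stored translation, and the translator decides whether to accept or adapt it.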

     Content management or CM is a set of technologies that support the evolutionary life cycle of digital information. This digital information is also referred to as content or, to be precise, digital content. Digital content may take the form of text (such as documents), multimedia files (such as audio or video files), or any other file type that follows a content life cycle which requires management. The digital content life cycle consists of six primary phases: create, update, publish, translate, archive and retrieve. A critical aspect of content management is the ability to manage versions of content as it evolves. Authors and editors often need to restore older versions of edited products due to a process failure or an undesirable series of edits.
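The version management mentioned above can be sketched in a few lines of Python. The class and method names are hypothetical and only illustrate the idea; a real content management system would also store authors, dates and the other life cycle phases.

```python
class VersionedDocument:
    """Keeps every version of a piece of content so older ones can be restored."""

    def __init__(self, text):
        self._versions = [text]

    @property
    def current(self):
        return self._versions[-1]

    def update(self, text):
        # Each edit is stored as a new version instead of overwriting the old one.
        self._versions.append(text)

    def restore(self, index):
        # Restoring re-publishes an older version as the newest one,
        # so the unwanted edits remain in the history.
        self._versions.append(self._versions[index])

    def history(self):
        return list(self._versions)


doc = VersionedDocument("draft")
doc.update("edited text")
doc.update("undesirable edit")
doc.restore(1)
print(doc.current)
```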

     Translation technology covers all the systems that are used for the translation process. Translation is the action of interpretation of the meaning of a text, and subsequent production of an equivalent text, also called a translation, that communicates the same message in another language. The text to be translated is called the “source text”, and the language it is to be translated into is called the “target language”; the final product is sometimes called the “target text”. Translation must take into account constraints that include context, the rules of grammar of the two languages, their writing conventions, and their idioms. With the advent of computers, attempts have been made to computerize or otherwise automate the translation of natural-language texts (machine translation) or to use computers as an aid to translation (computer-assisted translation).

 

Sources:

*Machine translation. (2008, April 19). In Wikipedia, The Free Encyclopedia. Retrieved 14:47, April 21, 2008, from http://en.wikipedia.org/w/index.php?title=Machine_translation&oldid=206756422

*Computer-assisted translation. (2008, April 18). In Wikipedia, The Free Encyclopedia. Retrieved 14:48, April 21, 2008, from http://en.wikipedia.org/w/index.php?title=Computer-assisted_translation&oldid=206481669

*Content management. (2008, April 19). In Wikipedia, The Free Encyclopedia. Retrieved 14:49, April 21, 2008, from http://en.wikipedia.org/w/index.php?title=Content_management&oldid=206669339

*Translation. (2008, April 21). In Wikipedia, The Free Encyclopedia. Retrieved 14:49, April 21, 2008, from http://en.wikipedia.org/w/index.php?title=Translation&oldid=207111486 

 

 

 

     This article gathers and summarizes the characteristics of a translation task according to the FEMTI report. As this report points out, the characteristics of the translation task refer to the information flow intended for the output, from the point of view of the agent (human or otherwise) who receives the translation.

Assimilation: the ultimate purpose of the assimilation task (of which translation forms a part) is to monitor a (relatively) large volume of texts produced by people outside the organization, in (usually) several languages.

Dissemination: the ultimate purpose of dissemination is to deliver to others a translation of documents produced inside the organization.

Communication: the ultimate purpose of the communication task is to support multi-turn dialogues between people who speak different languages. The translation quality must be high enough for painless conversation, despite possible syntactically ill-formed input and idiosyncratic word and format usage.

 

Sources:

* Eduard Hovy, Margaret King and Andrei Popescu-Belis. (2003). FEMTI. Retrieved 15:23, April 19, 2008, from http://www.issco.unige.ch:8080/cocoon/femti/printable.html

Posted by: aintza | April 13, 2008

Explanation of Some of the Topics: Speech Synthesis (Q2)

     Speech synthesis is an area in which research is being carried out by the Austrian Research Institute for Artificial Intelligence (OFAI).

     We can point out the explanation that Wikipedia gives about this topic. Speech synthesis is the artificial production of human speech. A computer system used for this purpose is called a speech synthesizer, and can be implemented in software or hardware. A text-to-speech (TTS) system converts normal language text into speech; other systems render symbolic linguistic representations like phonetic transcriptions into speech.

Synthesized speech can be created by concatenating pieces of recorded speech that are stored in a database. Systems differ in the size of the stored speech units. A synthesizer can incorporate a model of the vocal tract and other human voice characteristics to create a completely “synthetic” voice output.

     Thierry Dutoit explains that the ultimate goal of a text-to-speech (TTS) synthesizer is to read any text, whether it was directly introduced in the computer by an operator or scanned and submitted to an optical character recognition (OCR) system. Reading should be intelligible and natural.

In reference to the definition of text-to-speech systems Dutoit points out that specific talking machines termed as voice response systems produce artificial speech by simply concatenating isolated words or parts of sentences. They are, however, applicable only when a limited vocabulary is required and when the sentences to be pronounced share a very restricted structure, as is the case for the announcement of arrivals in train stations, for instance. In the context of TTS synthesis, it is impossible to record and store all the words of the focus language. It is thus more suitable to define text-to-speech as the production of speech by machines, by way of automatic phonetization of the sentences to utter.
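The limitation of voice response systems can be illustrated with a small Python sketch. The stored "recordings" are just placeholder strings and all the names are my own assumptions. The system can only announce what has been recorded beforehand, so a destination outside its vocabulary fails.

```python
# Hypothetical store of pre-recorded units; the values stand in for audio clips.
RECORDED_UNITS = {
    "the train to": "<clip:the_train_to>",
    "bilbao": "<clip:bilbao>",
    "madrid": "<clip:madrid>",
    "arrives at platform": "<clip:arrives_at_platform>",
    "one": "<clip:one>",
    "two": "<clip:two>",
}


def announce(destination, platform):
    """Concatenate stored clips following one fixed sentence structure."""
    parts = ["the train to", destination, "arrives at platform", platform]
    missing = [part for part in parts if part not in RECORDED_UNITS]
    if missing:
        # No clip was recorded for these words, so nothing can be played.
        raise KeyError(f"no recording for: {missing}")
    return [RECORDED_UNITS[part] for part in parts]


print(announce("bilbao", "two"))
```

A TTS system avoids this limit by converting any input text into sounds automatically instead of looking up whole recorded words.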

 

Sources:

*Speech synthesis. (2008, April 10). In Wikipedia, The Free Encyclopedia. Retrieved 14:04, April 13, 2008, from http://en.wikipedia.org/w/index.php?title=Speech_synthesis&oldid=204741034

*Thierry Dutoit. “An Introduction to Text-to-Speech Synthesis”. Published in 1997, Springer. 285 pages. Retrieved 13:46, April 13, 2008 from http://books.google.com/books?hl=en&lr=&id=bTmWkXi1e90C&oi=fnd&pg=PR13&dq=an+introduction+to+text-to-speech+synthesis&ots=Ik9kl1XoPC&sig=dYqHTVQuexMyMU0OxUbOoNKSJg0

 

 

Posted by: aintza | April 12, 2008

Explanation of Some of the Topics: Machine Learning (Q2)

      In this second article about the explanations of some of the topics that deal with Human Language Technologies, I’ll focus on another area where the Language Technology Group conducts research: Machine Learning.

   As an introduction we can mention what Wikipedia points out about this topic. Machine learning is a subfield of artificial intelligence that is concerned with the design and development of algorithms and techniques that allow computers to learn. As Martin Sewell mentions, in practice, this involves creating programs that optimize a performance criterion through the analysis of data. The major focus of machine learning research is to extract information from data automatically, by computational and statistical methods. Hence, machine learning is closely related not only to data mining and statistics, but also to theoretical computer science. Among the applications of machine learning, Wikipedia mentions the following: natural language processing, syntactic pattern recognition, search engines, medical diagnosis, bioinformatics and cheminformatics, detecting credit card fraud, stock market analysis, classifying DNA sequences, speech and handwriting recognition, object recognition in computer vision, game playing and robot locomotion.

       According to Ethem Alpaydin, in machine learning the approach to a task such as speech recognition is to collect a large collection of sample utterances from different people and learn to map these to words. Already, there are many successful applications of machine learning in various domains: there are commercially available systems for recognizing speech and handwriting.

Machine learning is not just a database problem; it is also a part of artificial intelligence, as said above. To be intelligent, a system that is in a changing environment should have the ability to learn. Machine learning also helps us find solutions to many problems in vision, speech recognition, and robotics.
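A very small example of "optimizing a performance criterion through the analysis of data" is fitting a straight line to sample points by least squares. This Python sketch is only an illustration of that idea, with made-up data and a function name of my own; it is not taken from the sources above.

```python
def fit_line(xs, ys):
    """Find the slope and intercept that minimize the squared prediction error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form least-squares solution for a single input variable.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up training data generated by the rule y = 2x + 1.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)
```

Here the performance criterion is the squared error, and the program "learns" the rule from the examples instead of being told it.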

Sources:

*Martin Sewell. “Machine Learning”. (2007). Retrieved 22:45, April 11, 2008, from http://www.machinelearning.net/machine-learning.pdf

*Ethem Alpaydin. “Introduction to Machine Learning”. Published in 2004, MIT Press. 415 pages. Retrieved 23:12, April 11, 2008, from http://books.google.com/books?hl=en&lr=&id=1k0_-WroiqEC&oi=fnd&pg=PR13&dq=introduction+to+machine+learning&ots=p8XJTOgLyQ&sig=o-HYojpOGv2AVM5voMAe0WHibjw

*Machine learning. (2008, April 1). In Wikipedia, The Free Encyclopedia. Retrieved 14:01, April 12, 2008, from http://en.wikipedia.org/w/index.php?title=Machine_learning&oldid=202557440

Posted by: aintza | April 12, 2008

Explanation of Some of the Topics: Text Mining (Q2)

     In this article I’m going to develop the concept of “Text mining” on which the Language Technology Group conducts research.

     The definition that Wikipedia gives of text mining is useful as a first step towards understanding what the concept refers to. Text mining refers generally to the process of deriving high quality information from text. High quality information is derived through the devising of patterns and trends through means such as statistical pattern learning. Text mining usually involves the process of structuring the input text, deriving patterns within the structured data, and finally evaluation and interpretation of the output.

     We can point out another definition given by Marti Hearst in her article “What is Text Mining?”. She refers to this term as the discovery by computer of new, previously unknown information, by automatically extracting information from different written resources. A key element is the linking together of the extracted information to form new facts or new hypotheses to be explored further by more conventional means of experimentation.

In text mining, the goal is to discover heretofore unknown information, something that no one yet knows and so could not have yet written down.

A typical example in data mining is using consumer purchasing patterns to predict which products to place close together on shelves, or to offer coupons for, and so on. A related application is automatic detection of fraud, such as in credit card usage. Analysts look across huge numbers of credit card records to find deviations from normal spending patterns.

     To go deeper into what text mining is, we can refer to what Ronen Feldman and James Sanger explain in their book “The text mining handbook: advanced approaches in analyzing unstructured data”. Text mining is a new research area that tries to solve the information overload problem by using techniques from data mining, machine learning, natural language processing (NLP), information retrieval (IR), and knowledge management. Text mining involves the preprocessing of document collections (text categorization, information extraction, term extraction), the storage of the intermediate representations, the techniques to analyze these intermediate representations (such as distribution analysis, clustering, trend analysis, and association rules), and visualization of the results.

Text mining can be broadly defined as a knowledge-intensive process in which a user interacts with a document collection over time using a suite of analysis tools. In a manner analogous to data mining, text mining seeks to extract useful information from data sources through the identification and exploration of interesting patterns. In the case of text mining, however, the data sources are document collections, and interesting patterns are found not among formalized database records but in the unstructured textual data in the documents in these collections.

For text mining systems, preprocessing operations center on the identification and extraction of representative features for natural language documents. These preprocessing operations are responsible for transforming unstructured data stored in document collections into a more explicitly structured intermediate format, which is a concern that is not relevant for most data mining systems.
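The preprocessing step described above can be sketched in Python as follows. This is a deliberately simple illustration: the stopword list, the function names and the choice of word counts as the "intermediate format" are my own assumptions, and real text mining systems use much richer features.

```python
import re
from collections import Counter

# A tiny, hypothetical stopword list; real systems use much longer ones.
STOPWORDS = {"the", "of", "and", "in", "is", "a", "to"}


def extract_terms(document):
    """Turn raw text into a structured feature: a count for each content word."""
    tokens = re.findall(r"[a-z]+", document.lower())
    return Counter(token for token in tokens if token not in STOPWORDS)


def structure_collection(documents):
    """Build the intermediate representation for a whole document collection."""
    return [extract_terms(document) for document in documents]


docs = [
    "Global warming is the increase in temperature.",
    "Warming of the oceans and the air.",
]
features = structure_collection(docs)
# Simple pattern analysis over the structured data: the most frequent terms.
totals = sum(features, Counter())
print(totals.most_common(3))
```

Once the documents are in this structured form, pattern-finding techniques similar to those of data mining can be applied to them.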

 

Sources:

*Ronen Feldman and James Sanger. “The text mining handbook: advanced approaches in analyzing unstructured data”. Published in 2006, Cambridge University Press, 410 pages. Retrieved 22:15, 11 April, 2008 from http://books.google.com/books?hl=en&lr=&id=3PcEoz48RBcC&oi=fnd&pg=PR10&dq=the+text+mining+handbook&ots=dDUECB3_k4&sig=yoHE5tmdhlhZ2q8Qb9N0Spwywxc

*Marti Hearst. “What is Text Mining?”. (October 17, 2003). Retrieved 22:05, 11 April, 2008, from http://www.jaist.ac.jp/~bao/MOT-Ishikawa/FurtherReadingNo1.pdf

*Text mining. (2008, April 8). In Wikipedia, The Free Encyclopedia. Retrieved 11:52, April 12, 2008, from http://en.wikipedia.org/w/index.php?title=Text_mining&oldid=204166609

Posted by: aintza | April 6, 2008

Some Research Topics on Human Language Technologies (Q2)

     In this article I’m going to mention some of the recent research topics that are developed at major sites on Human Language Technologies.

In its research, development and commercial projects, the Language Technology Lab (DFKI) works on the following themes:

  • exploiting – and automatically extending – ontologies for content processing.
  • tighter integration of shallow and deep techniques in processing.
  • enriching deep processing with statistical methods.
  • combining language checking with structuring tools in document authoring.
  • document indexing for German and English.
  • automatically associating recognized information with related information and thus building up collective knowledge.
  • automatically structuring and visualizing extracted information.
  • processing information encoded in multiple languages, among them Chinese and Japanese.

 

The European Network of Excellence in Human Language Technologies (ELSNET) points out these topics:

  • automated retrieval, extraction, and enrichment of information and knowledge from multimedia, multi-lingual, and multiparty information sources.
  • translingual and crosslingual retrieval, presentation, and sharing of knowledge.
  • automated detection and tracking of emerging topics from unstructured multimedia data.
  • use of knowledge sources to facilitate knowledge mapping and access.
  • automated question answering from heterogeneous sources.
  • intelligent tools that support the automated bibliometrics and document analysis/understanding in support of discovery of distributed experts and communities of expertise.
  • summarization and presentation generation of knowledge.
  • modeling of user knowledge, beliefs, plans, (dis)abilities and preferences from queries, created artifacts, and human computer interactions.

 

The Language Technology Group conducts research and development in a number of areas such as:

  • combining shallow semantics and domain knowledge.
  • text mining for biomedical content curation.
  • cross-lingual multi-agent retail comparison.
  • smart qualitative data: methods and community tools for data mark-up.
  • machine learning for named entity recognition.
  • integrated models and tools for fine-grained prosody in discourse.
  • joint action science and technology.
  • AMI consortium projects that are developing technologies for meeting browsing and to assist people participating in meetings from a remote location.
  • study of how pairs collaborate when planning a route on a map.

 

The Austrian Research Institute for Artificial Intelligence (OFAI) in the area of language and speech processing is conducting both basic and applied research. They develop linguistic resources and processes as well as application prototypes:

Linguistic Resources and Processes

  • Typed unification-based grammar formalisms.
  • Development of a HPSG-based grammar for German.
  • Natural Language Generation.
  • Speech Synthesis.
  • Computational Morphology.

 

Application Prototypes

  • Natural language interfaces and advisory systems.
  • Concept-to-speech systems.

 

The Common Language Resources and Technology Infrastructure (CLARIN) offers the following list:

  • Texts of all sorts, which can be digitized medieval resources, web-sites, newspapers, digitized books, etc.
  • Multimedia recordings (audio/video) and time series recorded during communication.
  • Various types of manually or automatically created annotations on texts, media streams, etc.
  • Tools such as aligners, speech recognizers, tokenizers, part-of-speech taggers, parsers, manual annotators, viewers, etc.
  • Various types of knowledge sources encapsulating knowledge about resources and languages such as metadata descriptions, GIS, lexica, concept registries, ontologies, etc.

 

Sources:

*Language Technology Lab (DFKI, Germany). Retrieved 13:34, March 15, 2008, from http://www.dfki.de/lt/projects.php

*European Network of Excellence in Human Language Technologies (ELSNET). Retrieved 18:45, March 15, 2008, from http://www.elsnet.org/

*Edinburgh Language Technology Group (LTG). Retrieved 13:15, March 15, 2008, from http://www.ltg.ed.ac.uk/

*Language Technology Group at the Austrian Research Institute for Artificial Intelligence (OFAI). Retrieved 13:34, March 15, 2008, from http://www.ofai.at/research/nlu/

*Common Language Resources and Technology Infrastructure (CLARIN). Retrieved 13:21, March 20, 2008, from http://www.clarin.eu/

 

 

 

  

 

 

 

 

Posted by: aintza | April 6, 2008

About Hans Uszkoreit (Q1)

     Hans Uszkoreit is not only Professor of Computational Linguistics at Saarland University, but also Scientific Director at the German Research Center for Artificial Intelligence (DFKI), where he heads the DFKI Language Technology Lab, and a professor in the Computer Science Department.

Furthermore, Uszkoreit is a Permanent Member of the International Committee on Computational Linguistics (ICCL), Member of the European Academy of Sciences, Past President of the European Association for Logic, Language and Information, Member of the Executive Board of the European Network of Language and Speech, and Member of the Board of the European Language Resources Association (ELRA), and he serves on several international editorial and advisory boards.

     Hans Uszkoreit studied Linguistics and Computer Science at the Technical University of Berlin and the University of Texas at Austin. During the time he spent at Austin, he worked as a research associate in a large machine translation project at the Linguistics Research Center. From 1982 to 1986, he worked as a computer scientist at the Artificial Intelligence Center of SRI International in Menlo Park, CA, and he was also affiliated with the Center for the Study of Language and Information at Stanford University as a senior researcher and later as a project leader. In 1986 he spent six months in Stuttgart on an IBM Research Fellowship at the Science Division of IBM Germany. In December of that year he returned to Stuttgart to work for IBM Germany as a project leader in the project LILOG (Linguistics and Logical Methods for the Understanding of German Texts). In 1988 Hans Uszkoreit was appointed to a newly created chair of Computational Linguistics and Phonetics. In 1989 he became the head of the newly founded Language Technology Lab at DFKI. He has been a co-founder and principal investigator of the Special Collaborative Research Division (SFB 378) “Resource-Adaptive Cognitive Processes” of the DFG (German Science Foundation). He is also co-founder and professor of the “European Postgraduate Program Language Technology and Cognitive Systems”, a joint Ph.D. program with the University of Edinburgh.

     His current research interests are computer models of natural language understanding and production, advanced applications of language and knowledge technologies such as semantic information systems, cognitive foundations of language and knowledge, grammar formalisms and their implementation, syntax and semantics of natural language and the grammar of German.

     Among all his publications we can point out some of the latest:

- Methods and Applications for Relation Detection. In: Proceedings of the Third IEEE International Conference on Natural Language Processing and Knowledge Engineering, Beijing. Uszkoreit, H. (2007).

- Challenges and Solutions of Multilingual and Translingual Information Service Systems. To appear in Proceedings of HCI International 2007, 12th International Conference on Human-Computer Interaction, Beijing, 2007. Uszkoreit, H., F. Xu, W. Liu (2007).

Resources:

*Hans Uszkoreit. Short Curriculum Vitae. Retrieved 11:30, April 6, 2008, from  http://www.coli.uni-saarland.de/~hansu/bio.html

                                                         

Posted by: aintza | March 30, 2008

Some Research Centres of HLT (Q1)

     There are several research centers where we can find more information about Human Language Technologies and get to know what is going on in this field:

  • The National Centre for Language Technology, which is directed by the professor Josef Van Genabith, points out in its web page that it conducts research into the processing of human language by computers, such as speech recognition and synthesis, machine translation, human-computer interfaces, information retrieval and extraction, the teaching and learning of languages using computers and software localisation and globalisation. Research in Human Language Technology (HLT) is interdisciplinary and includes Natural Language Processing (NLP) and Computational Linguistics (CL). HLT has substancial economic implications ans potential. The centre carries out basic research and develops applications. This centre has several research areas such as CALL Computer Assisted Language Learning, Corpus Linguistics, Machine Translation and Transaltion Technology, Treebank-Based Unification Grammar Acquisition, Semantics, Speech Technology, Multilingual Information Retrieval/Exatraction, Language Evolution. The National Centre for Language Technology has also several publications where we include books, journal articles, conference papers, etc.
  • The Language Tehnology Documentation Centre in Finland is being maintained by the Department of General Linguistics in the University of Helsinki. The Nordic language technology documentation project was financed partly by the Nordic Language Technology Research Program administered by NorFA, which later became NordForsk. They also cooperate via NorDokNet with similar sites in Denmark, Norway, Sweeden and Iceland.
  • The Austrian Research Institute for Artificial Intelligence (OFAI) has a group which focuses on Language Technology (LT) since its inception in 1984. This group conducts research in modelling and processing human languages, especially for German. This includes constructing linguistic resources (such as lexicons, grammars, discourse models), processing algorithms (such as morphological components, parsers, generators, speech sythesizers, discourse processing components), and application prototypes (such as natural language interfaces, advisory systems and concept-to-speech systems). The Language Technology Group at OFAI is a member at the EU’s European Network of Excellence in Human Language Technologies (ELSNET). In the area of language and speech processing they are conducting both basic and applied research. They develop linguistic resources and precesses as well as application prototypes. Most of their work concentrates on the German Language. The Language Technology Group at OFAI is involved in a number of projects in basci and applied research. On the European level, the group acts as a partner in projects funded by the EC, modtly in the Human Language Technologies/Language Engineering sector. On a national level, projects are funded mainly by the Austrian Science Foundation (FWF) and the FFF/ITF. In some of the projects, cooperation exist with Austrian University department and companies.
  • The Edinburgh Language Technology Group (LTG) is a research and development group that has been working in the area of natural language engineering since the early 1990s. The LTG was originally established as part of the Human Communication Research Centre, and is now based in the Institute for Communicating and Collaborative Systems of the Division of Informatics, University of Edinburgh, one of the largest communities of natural language processing specialists in Europe. The LTG’s work is application oriented: it focuses on building practical solutions to real problems in text processing. The group has worked in all areas of large-volume text handling, from text annotation through markup architectures, and from information extraction to automatic or computer-assisted generation of text. The Language Technology Group conducts research and development in a number of areas, and some of its projects are listed on its web page.

Resources:

*Language Technology Documentation Centre in Finland (FiLT). Retrieved 13:37, March 15, 2008, from http://www.ling.helsinki.fi/filt/info/index-en.shtml

*Language Technology Group at the Austrian Research Institute for Artificial Intelligence (OFAI). Retrieved 13:34, March 15, 2008, from http://www.ofai.at/research/nlu/

*National Centre for Language Technology (NCLT). Retrieved 11:52, March 15, 2008, from http://www.nclt.dcu.ie/index.html

*The Edinburgh Language Technology Group (LTG). Retrieved 13:15, March 15, 2008, from http://www.ltg.ed.ac.uk/

Posted by: aintza | March 30, 2008

Definition of Human Language Technologies (Q1)

     There are currently several definitions of Human Language Technologies, and because different terms are used to refer to the field, the number of definitions is larger still.

     The free encyclopedia, Wikipedia, points out that Human Language Technologies (HLT) is often called Language Technology or Natural Language Processing (NLP).

If we search for Human Language Technologies on Wikipedia, we find the following definition:

“Human Language Technology (HLT) consists of computational linguistics (or CL) and speech technology as its core but includes also many application oriented aspects of them. Language technology is closely connected to computer science and general linguistics.”

To clarify the concept of Human Language Technologies further, we can also look at Natural Language Processing, a term that is often used in place of HLT. Wikipedia gives the following definition of it:

“Natural Language Processing (NLP) is a subfield of artificial intelligence and linguistics. It studies the problems of automated generation and understanding of natural human languages. Natural language generation systems convert information from computer databases into normal-sounding human language, and natural language understanding systems convert samples of human language into more formal representations that are easier for computer programs to manipulate.”

     We can point out another definition given by Hans Uszkoreit in his study “What is Language Technology?”, published in 2007:

“Language Technology -sometimes also referred to as Human Language Technology- comprises computational methods, computer programs and electronic devices that are specialized for analyzing, producing or modifying texts and speech. These systems must be based on some knowledge of human language. Therefore language technology defines the engineering branch of computational linguistics.”

     Another definition of Language Technology is given by the Language Technology Documentation Centre in Finland:

“Language Technology is a multidisciplinary field, which studies technical means and methods that can be used to process natural language with computers. Some well-known applications of language technology are for example automatic authoring tools (such as spell checking) and speech recognition. Language technology has also many other application areas, which are introduced in the technologies section and in Language Technology World.”

     The web page of the Human Language Technology course at the University of Arizona gives a short explanation of what HLT is:

“Human Language Technology is a developing interdisciplinary field that encompasses most subdisciplines of linguistics, as well as computational linguistics, natural language processing, computer science, artificial intelligence, psychology, philosophy, mathematics and statistics.”

Resources:

*Language technology. (2007, December 19). In Wikipedia, The Free Encyclopedia. Retrieved 14:30, March 18, 2008, from http://en.wikipedia.org/w/index.php?title=Language_technology&oldid=179070229

*Natural language processing. (2008, March 7). In Wikipedia, The Free Encyclopedia. Retrieved 14:30, March 18, 2008, from http://en.wikipedia.org/w/index.php?title=Natural_language_processing&oldid=196512922

*Hans Uszkoreit. What is Language Technology? Retrieved 18:42, March 18, 2008, from http://www.dfki.de/~hansu/LT.pdf

*Human Language Technology at the University of Arizona. Retrieved 18:55, March 15, 2008, from http://hlt.arizona.edu/about/about.php

*Language Technology Documentation Centre in Finland (FiLT). Retrieved 13:37, March 15, 2008, from http://www.ling.helsinki.fi/filt/info/index-en.shtml
