Posted by: aintza | March 30, 2008

Definition of Human Language Technologies (Q1)

     There are several definitions of Human Language Technologies, and because several different terms refer to the same field, the number of definitions grows even further.

     The free encyclopedia, Wikipedia, points out that Human Language Technologies (HLT) is often called Language Technology or Natural Language Processing (NLP).

If we search for Human Language Technologies on Wikipedia, we find the following definition:

“Human Language Technology (HLT) consists of computational linguistics (or CL) and speech technology as its core but includes also many application oriented aspects of them. Language technology is closely connected to computer science and general linguistics.”

On the other hand, to clarify the concept of Human Language Technologies, we can also refer to Natural Language Processing. Wikipedia gives the following definition of this term, which is also used in place of the term that we are studying (HLT):

“Natural Language Processing (NLP) is a subfield of artificial intelligence and linguistics. It studies the problems of automated generation and understanding of natural human languages. Natural language generation systems convert information from computer databases into normal-sounding human language, and natural language understanding systems convert samples of human language into more formal representations that are easier for computer programs to manipulate.”

     We can point out another definition given by Hans Uszkoreit in his study “What is Language Technology?”, published in 2007:

“Language Technology -sometimes also referred to as Human Language Technology- comprises computational methods, computer programs and electronic devices that are specialized for analyzing, producing or modifying texts and speech. These systems must be based on some knowledge of human language. Therefore language technology defines the engineering branch of computational linguistics.”

     There’s another definition of Language Technology given by the Language Technology Documentation Centre in Finland:

“Language Technology is a multidisciplinary field, which studies technical means and methods that can be used to process natural language with computers. Some well-known applications of language technology are for example automatic authoring tools (such as spell checking) and speech recognition. Language technology has also many other application areas, which are introduced in the technologies section and in Language Technology World.”

     On the web page of the Human Language Technology course at the University of Arizona there’s a short explanation of what HLT is:

“Human Language Technology is a developing interdisciplinary field that encompasses most subdisciplines of linguistics, as well as computational linguistics, natural language processing, computer science, artificial intelligence, psychology, philosophy, mathematics and statistics.”

Resources:

*Language technology. (2007, December 19). In Wikipedia, The Free Encyclopedia. Retrieved 14:30, March 18, 2008, from http://en.wikipedia.org/w/index.php?title=Language_technology&oldid=179070229

*Natural language processing. (2008, March 7). In Wikipedia, The Free Encyclopedia. Retrieved 14:30, March 18, 2008, from http://en.wikipedia.org/w/index.php?title=Natural_language_processing&oldid=196512922

*Hans Uszkoreit. What is Language Technology? Retrieved 18:42, March 18, 2008, from http://www.dfki.de/~hansu/LT.pdf

*Human Language Technology at the University of Arizona. Retrieved 18:55, March 15, 2008, from http://hlt.arizona.edu/about/about.php

*Language Technology Documentation Centre in Finland (FiLT). Retrieved 13:37, March 15, 2008, from http://www.ling.helsinki.fi/filt/info/index-en.shtml

Posted by: aintza | February 16, 2008

Money and Fame over Everything

      When the first reality shows appeared and people realised how easy it was to become famous through these programmes, being famous became a very important thing. This can be seen in the way people’s behaviour has changed since the first reality show. The participants of the first edition were so famous when the show ended that everybody wanted to be like them and follow in their footsteps.

      In the following editions, as I said above, people’s behaviour changed in order for them to win the programme or just become famous. It seemed that the directors of the reality shows were looking for the most extravagant people or that they were even choosing actors just to gain audience. So, the essence of the reality show was lost because we couldn’t see how people really coexisted.

      Taking part in a reality show doesn’t mean only becoming famous. You also have the possibility of earning a lot of money. It’s not only the prize you receive from winning the reality show, but when you leave the programme you can make money just by appearing on different TV programmes or in magazines. So, it’s clear that a reality show does not offer just fame, but also money. And that’s what many people want to achieve nowadays.

      The problem with these reality shows and the way the participants behave is that education and respect are forgotten. The people in those reality shows are arguing or shouting most of the time, and show neither knowledge nor respect. I think that this is a very bad example for society, which will end up behaving like them.

      To sum up, and taking into account what I’ve explained, for me it’s clear that this type of programme is one of the worst things on TV. I’m not saying that they must be forbidden, but they should be controlled. The participants must have at least some knowledge and education, because it seems that they’re chosen for being the silliest people in the world. This has to be changed in order to improve our society.

      The first reaction against the decision the father has taken comes from Bruno, who does not understand why they are leaving their extraordinary house. Not only his house, but all his life. He loves everything in Berlin: the people, the houses, and of course his three best friends. As he is only nine years old, he is unable to find a reason which could make him understand his father’s behaviour. He’s only told that his father is a very important person and that going to Out-With is decisive for his job. Nevertheless, Bruno does not know exactly what his father’s job is. Meanwhile, all the people around him seem to show no reaction to the father’s decision.

      As the story develops, we can see how things start to change and the different characters evolve. Bruno’s sister confesses that she is upset with the place and the house, and has nothing to do there. However, she doesn’t show her feelings to anyone else (except Bruno) until the end of the book, when she is offered the opportunity to return to Berlin with her mother.

Bruno’s mother suddenly reaches the point where she can no longer stand living in such a horrible place. So, she decides to return to Berlin, even though she knows how important it is for her husband to be there.

      At the end it’s clear how the main characters think and what they really want to do. They put the father’s opinion in second place and begin to think for themselves. The influence that the father had at the beginning vanishes and each character goes his own way, which, in the end, makes the father realise what should have been the most important thing: his family, which ends up broken in the same way he was breaking up Jewish families.

Fifty-three sea lions have been found massacred, nearly all with crushed skulls, on an island in the Galápagos.

The brutal slaying has officials looking for clues and locals calling for tougher controls, especially on the archipelago’s uninhabited islands.

Park wardens from Galápagos National Park discovered the bodies while working in early January to remove feral goats from the islands of Floreana, Isabela, Santiago, and Pinta, all part of the famed Galápagos Islands located off Ecuador’s Pacific Coast.

On Pinta, a protected island surrounded by the Galápagos Marine Reserve, workers found the dead sea lions “in an advanced state of decomposition,” according to Victor Carrion, the park manager.

The animals, nearly all showing signs of being beaten in the head, were distributed within a half-mile (0.8-kilometer) radius in a spot known as Puerto Pasado, Carrion said.

The marine mammals are Galápagos sea lions, listed as a threatened species by The International Union for the Conservation of Nature and Natural Resources.

They are sometimes hunted for their fur and body parts, particularly penises, which are used to make aphrodisiacs by some practitioners of traditional Asian medicine.

But the dead animals found last month—9 adult males, 6 adult females, 25 “immature sea lions,” and 13 pups—bore no signs of cutting or dismemberment, Carrion said.

“One hundred percent of the animals had their skin intact,” he said in an email.

“That is to say, we can discard the theory that they were killed for their skin.” 

Source:

* Kelly Hearn. (04 February 2008). “Dozens of Sea Lions Found Massacred in Galapagos”. National Geographic. Retrieved: 09 February 2008.

http://news.nationalgeographic.com/news/2008/02/080204-sea-lions.html

 

Posted by: aintza | February 11, 2008

Mona Lisa’s mysterious smile

     The Mona Lisa, painted by Leonardo da Vinci in the 1500s, has intrigued art lovers for five centuries because of its subject’s mysterious smile. Da Vinci’s painting, possibly the most famous portrait of all time, is housed at the Louvre in Paris.

The smile on the face of the Mona Lisa is so enigmatic that it disappears when it is looked at directly, says the US scientist Professor Margaret Livingstone of Harvard University. According to Livingstone the smile only became apparent when the viewer looked at other parts of the painting.

The theory has been presented at the American Association for the Advancement of Science’s (AAAS) annual meeting in Denver, Colorado, this week.

The smile disappeared when it was looked at because of the way the human eye processes visual information, said Prof Livingstone. The eye uses two types of vision, foveal and peripheral. Foveal, or direct vision, is excellent at picking up detail but is less suited to picking up shadows.

 “The elusive quality of the Mona Lisa’s smile can be explained by the fact that her smile is almost entirely in low spatial frequencies, and so is seen best by your peripheral vision,” Prof Livingstone said.

The more a person stares fixedly ahead, the less useful is their peripheral vision. Prof Livingstone said the best example of this effect was if someone was to stare at a letter on a page of print. Concentrating on one letter made it difficult to pick out other letters even a short distance away, Prof Livingstone said. She said the same principle was used by da Vinci on the painting. The smile only became apparent if a viewer looked at her eyes or elsewhere on her face.

     According to research presented this week at the Uffizi Gallery in Florence, Italy, millions of invisible dots are hidden behind the Mona Lisa’s enigmatic smile. The Mona Lisa code consists of countless layers of dots applied with a technique of micro-divided brushstrokes. Jacques Franck, a consultant at the Armand Hammer Centre for Da Vinci Studies at the University of California, reported that the technique is somewhat similar to the pointillism used by the French Neo-Impressionists in the late 19th century. “Examples of this micro-division of tones exist since the ancient Romans. Leonardo took an existing technique, but used it to the extreme, like nobody else”. Called “sfumato,” from the Italian word “fumo” (meaning smoke), the painting technique produces an almost three-dimensional effect, the result of the delicate brushwork that blends light, shadow, and contours.

Da Vinci never really explained how he was able to blend shadow and light in such an imperceptible way.

The only reference to the sfumato technique appears in his notes on painting: “light and shade should blend without lines or borders, in the manner of smoke,” he wrote.

     Pascal Cotte, a French engineer, affirms that Mona Lisa’s smile was originally wider and more expressive. Mr. Cotte said that his 240-megapixel scans revealed traces of Mona Lisa’s left eyebrow, obliterated by long-ago restoration efforts. “The face of the Mona Lisa appears slightly wider and the smile is different and the eyes are different. The smile is more accentuated”.

     Another study, carried out at the University of Amsterdam using a computer program, concludes that she was mainly happy. The painting was analysed using “emotion recognition” software. It concluded that the subject was 83% happy, 9% disgusted, 6% fearful and 2% angry. The computer rated features such as the curvature of the lips and crinkles around the eyes.

The program, developed with researchers at the University of Illinois, US, draws on a database of young female faces to derive an average “neutral” expression.  The software uses this average expression as the standard for comparisons.

Sources:

* http://news.bbc.co.uk/2/hi/entertainment/2775817.stm “Mona Lisa smile secrets revealed”. Retrieved: 27 December 2007

* Rosella Lorenzi. (04 April 2006). “Secrets of Mona Lisa’s smile revealed?”. Retrieved: 09 February 2008.

http://www.sgallery.net/news/04_2006/04.php

* Matthew Weaver and agencies. (22 October 2007). Retrieved: 09 February 2008.

http://arts.guardian.co.uk/art/news/story/0,,2196892,00.html

*http://news.bbc.co.uk/2/hi/entertainment/4530650.stm Retrieved: 09 February 2008.

Posted by: aintza | February 10, 2008

Something about the future of TEI

The TEI has achieved a major milestone in establishing an intellectual foundation for text encoding and a set of encoding conventions substantial enough to serve the fundamental needs of most encoding projects, both large and small. However, much of this development has necessarily taken place in advance of experience. It is essential to continue the work of the TEI by extending the Guidelines more broadly and providing materials and facilities for user support. In addition, now that the core of a coherent set of encoding practices has been established, it is critical to provide for extensive evaluation and testing in large-scale use, and to implement mechanisms for continued extension and modification of the Guidelines in response.

The best way to promote a standard is to develop resources and software that embody it. Therefore, the primary focus of the TEI must shift to the wide-spread and large-scale implementation of the Guidelines. Actual use of the Guidelines will become the major force driving the development of extensions and modifications to them. Activity within the TEI will focus on user support, instruction, consulting, etc. One of the primary roles of the TEI will be to form a liaison with and provide consultancy for users, as appropriate, to ensure compatibility with the Guidelines as they currently exist, and to incorporate the results eventually into future versions. Another central concern of this phase will be systematic evaluation and review, again accomplished on the basis of actual experience using the Guidelines, the results of which will also guide the further development of the Guidelines.

Extension of the Guidelines will continue, to incorporate modifications, revisions, and extensions suggested or required on the basis of user responses; provide refinements and further developments of chapters in the current version; and form or encourage work groups for areas that have only been outlined, for example, physical description (manuscripts, papyri, inscriptions, etc.), literary analysis and interpretation, alignment mechanisms for multilingual corpora and for coordinating speech with speech transcriptions, multimedia processing, etc.

Sources:

* Nancy M. Ide and C.M. Sperberg-McQueen. (1995). “The Text Encoding Initiative: Its History, Goals and Future Development”. Retrieved: 06 February 2008.

http://www.cs.vassar.edu/~ide/papers/teiHistory.pdf

Posted by: aintza | February 10, 2008

Technical and cultural consequences of XML

The world in which we live is strongly affected, if not dominated, by a collection of amazingly varied and powerful technical, economic, political, and cultural norms and standards. Although it is easy to forget the impact of technological standards, their importance is recalled merely by contemplating the significance of agreements to drive on the same side of the road, standardized weights and measures, standards for a common electrical power grid, or TCP/IP. If we eliminated even a small number of technology standards, the world would be a very different place. With respect to XML, it has very broad applicability, and XML is achieving its potential through broad usage. In light of this, we contend that XML will take its place among the technical standards having the greatest import to the world. The authors believe that many computer scientists would agree with this observation. Why do we think XML is so important? Perhaps, this is because we can describe XML as a universally applicable, durable ‘‘Code of Integration’’; that is, a broadly applicable language for creating, storing, transmitting, accessing, and transforming information from a multitude of sources. It also naturally leads to a set of extensions which support semantically rich, tagged interchange and storage standards. Even though we would postulate that the von Neumann computing architecture, the techniques for analyzing algorithms, and the elegant structures that fuse complexity theory, formal language theory, compilers, and programming languages may be more important to computer science, and are in some sense considerably deeper accomplishments, the Code of Integration may be of comparable importance. This is because a Code of Integration can be applied coherently to a wide range of technical problems with a number of benefits, the most significant of which are the following:

 1. A consistent programming paradigm

As programming involves the expression of rich interfaces and the techniques for manipulating information, the XML Code of Integration can be a basis for significant consistency, automation, and reuse in expressing software processes. Although XML does not purport to solve all problems, it does provide the language in which solutions can be expressed. This will increasingly improve the economics of IT-based automation.

2. Simplicity of integration

A Code of Integration can greatly reduce the cost of integrating and processing information. Just as common languages and vocabulary are among the most important cultural bases of civilization, agreement on a standardized form for defining information is exceedingly valuable to enable knowledge synthesis and systems integration. With a common way to express semantic information, there will be more (albeit incomplete) standardization of semantic information, paving the way to numerous benefits: information fusion, totally automated or semiautomated assembly of systems, greatly increased use of machine learning, computer-based reasoning, and more.

3. Economies of scale

Because of the universality of the Code of Integration, skills related to its use are widely useful. Significant investment can wisely be made in its implementations, leading to a high degree of optimization. Examples of this include significant investments in high-performance XML software and hardware (e.g., IBM’s recent DataPower acquisition). Even beyond the probable importance of the XML Code of Integration as a primary technical standard, XML will become a defining element (albeit one that is behind the scenes) of economics, politics, and culture.

Sources:

* S. Adler, R. Cochrane, J. F. Morar and A. Spector. (01 June 2006). “Technical Context and Cultural Consequences of XML”. Retrieved: 05 January 2008.

https://www.research.ibm.com/journal/sj/452/adler.pdf

 

 

Posted by: aintza | February 10, 2008

The Text Encoding Initiative (TEI)

Before they can be studied with the help of computers, texts must be encoded in computer-readable form. Standard data processing practice provides convenient solutions for basic text representation problems, but many texts of interest to scholarly research present difficulties not resolved by industrial standards. Therefore, over the years scholars have developed a variety of methods for representing special characters, encoding logical divisions of a text, representing analytic or interpretative information, and reducing text-critical apparatus to a single linear sequence. Because of the lack of a unified, standard format, scores of such encoding schemes were developed in the 1960s, 70s, and 80s from scratch or adapted from existing schemes. These schemes typically reflected the specialized interests of their originators and were, by and large, incompatible; the end result was that a text encoded for one purpose or piece of software often required substantial editing to be used for another purpose or with other software, if it was reusable at all.

Following the Vassar conference, the ACH (Association for Computers and the Humanities) was joined by the Association for Literary and Linguistic Computing and the Association for Computational Linguistics in driving the standards effort, thus forming the Text Encoding Initiative (TEI).

The Text Encoding Initiative is an XML schema devoted to the markup of literary and linguistic texts. TEI allows useful abstractions of typographic features of source documents, but in a manner that enables effective searching, indexing, comparison, and print publication, something not possible with publications archived as mere photographic images.

Basically, TEI aims to encode all the semantically significant aspects of literary texts, both old ones that predate XML technology (or indeed, computers in general) and newly created ones. TEI offers varying degrees of typographic and semantic markup options.

In a general sense, any tool that can work with XML can work with TEI. DTDs are available for several TEI variations, as are XSLT stylesheets of various sorts. Naturally, customizations for working with TEI in Emacs, Framemaker, and MS-Word can be found at the TEI Web site. An XMetal customization is also downloadable.

Very quickly, it was recognized that the TEI’s goals served not only humanities scholarship, but were critical for a broad range of applications by the language industries more generally. It has become crucial for both research and industry to ensure that any text that is created can be used and, more importantly, reused for any number of applications and for more, as yet not fully understood, purposes. Thus since its inception, the work of the TEI has achieved increasingly central importance for text-based work across disciplines and applications.

The TEI’s achievements include:

1. determination that the Standard Generalized Markup Language (SGML) is the appropriate framework for development of the Guidelines;

2. specification of restrictions on and recommendations for SGML use that best serves the needs of interchange, as well as enables maximal generality and flexibility in order to serve the widest possible range of research, development, and application needs;

3. analysis and identification of categories and features for encoding textual data, at many levels of detail;

4. specification of a set of general text structure definitions that is effective, flexible, and extensible;

5. specification of a method for in-file documentation of electronic texts that is compatible with library cataloging conventions and can be used to trace the history of the texts and thus can assist in authenticating their provenance and the modifications they have undergone;

6. specification of encoding conventions for special kinds of texts or text features:

    a. character sets

    b. language corpora

    c. general linguistics

    d. dictionaries

    e. terminological data

    f. spoken texts

    g. hypermedia

    h. literary prose

    i. verse

    j. drama

    k. historical source materials

    l. text critical apparatus

The TEI Guidelines are the result of this work. They provide encoding conventions for describing the physical and logical structure of many classes of texts, as well as features particular to a given text type or not conventionally represented in typography. They treat common text encoding problems, including intra- and inter-textual cross reference, demarcation of arbitrary text segments, alignment of parallel elements, overlapping hierarchies, etc. In addition, they provide conventions for linking texts to acoustic and visual data. As such, the TEI Guidelines answer the fundamental needs of a wide range of users: researchers in the humanities, sciences, and social sciences, publishers, librarians, and those concerned generally with document retrieval and storage. They also answer many of the needs of the growing “language technology” community, which is amassing substantial multi-lingual, multi-modal corpora of spoken and written texts and lexicons in order to advance research in human language understanding, production, and translation.
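
To make the idea of a header for in-file documentation plus an encoded text body more concrete, here is a minimal sketch in Python. It is only an illustration under some assumptions: the sample document is invented for this post, it uses the XML form and element names of the later TEI Guidelines (such as `teiHeader`, `lg` and `l`) rather than the SGML form described above, and `title_and_lines` is a hypothetical helper, not part of any TEI tool.

```python
import xml.etree.ElementTree as ET

# Namespace used by the XML version of the TEI Guidelines.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Invented sample: a header documenting the file, then a two-line stanza.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Sample poem (invented for this example)</title></titleStmt>
      <publicationStmt><p>Unpublished illustration.</p></publicationStmt>
      <sourceDesc><p>No source; written for this sketch.</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <lg type="stanza">
        <l n="1">First invented line</l>
        <l n="2">Second invented line</l>
      </lg>
    </body>
  </text>
</TEI>"""


def title_and_lines(tei_text):
    """Return the title from the header and the numbered verse lines."""
    root = ET.fromstring(tei_text)
    title = root.findtext(
        "tei:teiHeader/tei:fileDesc/tei:titleStmt/tei:title",
        namespaces=TEI_NS,
    )
    lines = [
        (int(line.get("n")), line.text)
        for line in root.iterfind(".//tei:l", TEI_NS)
    ]
    return title, lines


title, lines = title_and_lines(SAMPLE)
print(title)
for number, text in lines:
    print(number, text)
```

Because the verse lines are marked as lines instead of being stored as a page image, a program can search, number or compare them directly.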

Sources:

* David Mertz. (04 September 2003). “XML Matters: TEI — the Text Encoding Initiative”. Retrieved: 02 February 2008.

  http://www.ibm.com/developerworks/library/x-matters30.html

* Nancy M. Ide and C.M. Sperberg-McQueen. (1995). “The Text Encoding Initiative: Its History, Goals and Future Development”. Retrieved: 06 February 2008.

http://www.cs.vassar.edu/~ide/papers/teiHistory.pdf

 

 

Posted by: aintza | February 8, 2008

Some advantages of XML

Simplicity
Information coded in XML is easy to read and understand, plus it can be processed easily by computers.

Openness
XML is a W3C standard, endorsed by software industry market leaders.

Extensibility
There is no fixed set of tags. New tags can be created as they are needed.

Self-description
In traditional databases, data records require schemas set up by the database administrator. XML documents can be stored without such definitions, because they contain meta data in the form of tags and attributes. XML provides a basis for author identification and versioning at the element level. Any XML tag can possess an unlimited number of attributes such as author or version.

Contains machine-readable context information
Tags, attributes and element structure provide context information that can be used to interpret the meaning of content, opening up new possibilities for highly efficient search engines, intelligent data mining, agents, etc.
This is a major advantage over HTML or plain text, where context information is difficult or impossible to evaluate.
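
As a small illustration of self-description and machine-readable context, the following sketch uses Python's standard `xml.etree.ElementTree` module. The document, its tag names and the `author` and `version` attributes are all invented for this example; the functions `describe` and `by_author` are hypothetical helpers.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: tag and attribute names are made up for this sketch.
DOC = """
<article>
  <title author="aintza" version="2">Some advantages of XML</title>
  <para author="aintza" version="1">XML has no fixed set of tags.</para>
  <para author="guest" version="3">New tags can be created as needed.</para>
</article>
"""


def describe(xml_text):
    """Return (tag, author, version, text) for every child of the root."""
    root = ET.fromstring(xml_text)
    return [
        (child.tag, child.get("author"), int(child.get("version")), child.text)
        for child in root
    ]


def by_author(xml_text, author):
    """Use the attribute context to select one author's content only."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iter() if el.get("author") == author]


rows = describe(DOC)
for row in rows:
    print(row)
print(by_author(DOC, "guest"))
```

No external schema is consulted here: the tags and attributes inside the document are enough for the program to know who wrote each element and which version it is.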

Separates content from presentation
XML tags describe meaning not presentation. The motto of HTML is: “I know how it looks”, whereas the motto of XML is: “I know what it means, and you tell me how it should look.” The look and feel of an XML document can be controlled by XSL style sheets, allowing the look of a document (or of a complete Web site) to be changed without touching the content of the document. Multiple views or presentations of the same content are easily rendered.
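
The separation of content and presentation can be sketched in a few lines of Python. This is an assumption-laden illustration and not XSL itself: two ordinary Python functions (`render_html` and `render_plain`, both hypothetical) stand in for two style sheets, and the `note` document is invented.

```python
import xml.etree.ElementTree as ET
from html import escape

# Invented content: it says what each piece means, not how it should look.
CONTENT = "<note><heading>Reminder</heading><body>Read the TEI Guidelines</body></note>"


def render_html(xml_text):
    """First 'style sheet': present the note as an HTML fragment."""
    root = ET.fromstring(xml_text)
    return "<h1>{}</h1><p>{}</p>".format(
        escape(root.findtext("heading")), escape(root.findtext("body"))
    )


def render_plain(xml_text):
    """Second 'style sheet': present the same note as underlined plain text."""
    root = ET.fromstring(xml_text)
    heading = root.findtext("heading")
    return "{}\n{}\n{}".format(
        heading.upper(), "=" * len(heading), root.findtext("body")
    )


html_view = render_html(CONTENT)
plain_view = render_plain(CONTENT)
print(html_view)
print(plain_view)
```

Both views are produced from the same unchanged content, which is the point the motto makes: the document knows what it means, and the renderer decides how it looks.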

Supports multilingual documents and Unicode
This is important for the internationalization of applications.

Facilitates the comparison and aggregation of data
The tree structure of XML documents allows documents to be compared and aggregated efficiently element by element. 
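
A rough sketch of what element-by-element comparison and aggregation could look like in Python follows. The two `order` documents are invented, and `diff_elements` and `aggregate` are hypothetical helpers written for this post; the comparison is deliberately simple and only pairs children by position.

```python
import xml.etree.ElementTree as ET


def diff_elements(a, b, path=""):
    """Yield the paths where two elements differ, walking both trees together."""
    here = f"{path}/{a.tag}"
    if a.tag != b.tag:
        yield f"{here}: tag {a.tag!r} != {b.tag!r}"
        return
    if a.attrib != b.attrib:
        yield f"{here}: attributes differ"
    if (a.text or "").strip() != (b.text or "").strip():
        yield f"{here}: text differs"
    if len(a) != len(b):
        yield f"{here}: {len(a)} children vs {len(b)}"
    # Children are paired by position only (a simplifying assumption).
    for child_a, child_b in zip(a, b):
        yield from diff_elements(child_a, child_b, here)


def aggregate(documents, tag):
    """Sum the numeric text of every <tag> element across several documents."""
    return sum(
        float(el.text) for doc in documents for el in ET.fromstring(doc).iter(tag)
    )


OLD = "<order><item qty='1'>pen</item><total>2.50</total></order>"
NEW = "<order><item qty='2'>pen</item><total>5.00</total></order>"

differences = list(diff_elements(ET.fromstring(OLD), ET.fromstring(NEW)))
grand_total = aggregate([OLD, NEW], "total")
print(differences)
print(grand_total)
```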

Can embed multiple data types
XML documents can contain any possible data type – from multimedia data (image, sound, video) to active components (Java applets, ActiveX).

Can embed existing data
Mapping existing data structures like file systems or relational databases to XML is simple. XML supports multiple data formats and can cover all existing data structures and .

Provides a ‘one-server view’ for distributed data
XML documents can consist of nested elements that are distributed over multiple remote servers. XML is currently the most sophisticated format for distributed data – the World Wide Web can be seen as one huge XML database.

Rapid adoption by industry
Software AG, IBM, Sun, Microsoft, Netscape, DataChannel, SAP and many others have already announced support for XML. Microsoft will use XML as the exchange format for its Office product line, while both Microsoft’s and Netscape’s Web browsers support XML. SAP has announced support of XML through the SAP Business Connector with R/3. Software AG supports XML in its Bolero and Natural product lines and provides Tamino, a native XML database.

Sources:

http://www.softwareag.com/xml/about/starters.htm  Retrieved: 15 January 2008

 

 

Posted by: aintza | February 8, 2008

What is XML?

XML is a standard, simple, self-describing way of encoding both text and data so that content can be processed with relatively little human intervention and exchanged across diverse hardware, operating systems, and applications. XML is a markup language for documents containing structured information.

Structured information contains both content (words, pictures, etc.) and some indication of what role that content plays (for example, content in a section heading has a different meaning from content in a footnote, which means something different than content in a figure caption or content in a database table, etc.). Almost all documents have some structure.

The XML specification defines a standard way to add markup to documents. The word “document” refers not only to traditional documents, like this one, but also to the myriad of other XML “data formats”. These include vector graphics, e-commerce transactions, mathematical equations, object meta-data, server APIs, and a thousand other kinds of structured information.
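
The idea of content plus an indication of its role can be shown with a short Python sketch. The sample document and its tag names (`heading`, `para`, `footnote`, `caption`) are invented for illustration, and `content_with_roles` is a hypothetical helper that simply treats the enclosing tag name as the role.

```python
import xml.etree.ElementTree as ET

# Invented document: the same kind of text plays different roles.
DOC = """<document>
  <section>
    <heading>What is XML?</heading>
    <para>XML is a markup language.<footnote>See the W3C specification.</footnote></para>
    <figure><caption>A small XML tree</caption></figure>
  </section>
</document>"""


def content_with_roles(xml_text):
    """Return (role, content) pairs; the role is the enclosing tag name."""
    root = ET.fromstring(xml_text)
    pairs = []
    for element in root.iter():
        text = (element.text or "").strip()
        if text:  # skip elements that only contain other elements
            pairs.append((element.tag, text))
    return pairs


pairs = content_with_roles(DOC)
for role, content in pairs:
    print(f"{role}: {content}")
```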

XML was created so that richly structured documents could be used over the web. The only viable alternatives, HTML and SGML, are not practical for this purpose. HTML comes bound with a set of semantics and does not provide arbitrary structure. SGML provides arbitrary structure, but is too difficult to implement just for a web browser. Full SGML systems solve large, complex problems that justify their expense. Viewing structured documents sent over the web rarely carries such justification.

Unlike records in traditional database systems, XML data does not require relational schemata, file description tables, external data type definitions, etc., because the data itself contains this information. In contrast to the widely used Web format, HTML, which only ensures the correct presentation of the formatted data, XML also guarantees total usability of data.

XML documents can contain any imaginable data type – from classical data like text and numbers, or multimedia objects such as sounds, to active formats like Java applets or ActiveX components.

You can change the look and feel of documents or even entire websites with XSL Style Sheets without manipulating the data itself.

In brief, XML offers a widely adopted standard way of representing text and data in a format that can be processed without much human or machine intelligence. Information formatted in XML can be exchanged across platforms, languages, and applications, and can be used with a wide range of development tools and utilities.

Sources:

* Norman Walsh. (October 03, 1998). “A Technical Introduction to XML”. Retrieved: 12 January 2008

    http://www.xml.com/pub/a/98/10/guide0.html

* http://www.softwareag.com/xml/about/starters.htm  Retrieved: 15 January 2008

 

 
