Latent Semantic Analysis and Machine Translation

ABSTRACT

In the decades since the 1970s, translation quality has gradually improved, although not as rapidly as many had hoped (Hutchins, 1986; Hutchins, 1988). In general, improvement in this field has come from research building upon computational and linguistic methods and techniques.

LITERATURE REVIEW
Latent Semantic Analysis (LSA, also known as Latent Semantic Indexing, or LSI) is a well-developed technique for representing word and passage meanings as vectors in a high-dimensional "semantic" space. Through the application of the linear-algebra methods of singular value decomposition (SVD) and dimensionality reduction, a co-occurrence matrix is transformed to better reflect the "latent," or hidden, similarities between words and documents. The technique can be used to determine the most likely meaning of a polysemous word in a given context by comparing a vector constructed from that context with document vectors. Vectors representing similar passage meanings should be near each other, as LSA is said by some of its creators to "closely approximate human judgments of meaning similarity between words" (Landauer et al., 1998).
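The procedure just described can be sketched in a few lines of numpy. The corpus, word counts, and context below are invented toys chosen only to make the finance/geography contrast visible; a real system would train on a large corpus. The fold-in of a new context follows the standard LSI formula q̂ = qᵀU_kS_k⁻¹:

```python
import numpy as np

# Toy term-document matrix (all words, documents, and counts invented
# for illustration).
terms = ["bank", "money", "river", "water", "loan"]
docs = [
    "bank money loan",    # finance
    "money loan bank",    # finance
    "river water bank",   # geography
    "water river",        # geography
]
A = np.array([[doc.split().count(t) for doc in docs] for t in terms], dtype=float)

# SVD and rank-k truncation expose the "latent" structure.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
doc_vecs = Vt[:k, :].T  # document coordinates in the reduced space

def fold_in(words):
    """Project a new context into the reduced space: q_hat = q^T U_k S_k^{-1}."""
    q = np.array([words.count(t) for t in terms], dtype=float)
    return q @ U[:, :k] / s[:k]

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Disambiguate "bank" from its context: compare the context vector
# with the document vectors and take the nearest one.
context = ["money", "loan"]
q_hat = fold_in(context)
sims = [cosine(q_hat, d) for d in doc_vecs]
best = int(np.argmax(sims))
print(best)  # nearest document; should be a finance one (0 or 1)
```

The ambiguous word "bank" occurs in all but one document, so raw counts alone cannot separate the readings; it is the reduced space that places the money/loan context firmly on the finance side.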
Most studies to date have focused on LSA's applications in searching and document retrieval. In this field, LSA has been shown to offer a marked improvement over other methods (Dumais, 1994).
Cross-language information retrieval, in which search results are returned in languages differing from that of the query, has also received attention (Rehder et al., 1998), as has LSA's use in language modeling (Kim and Khudanpur, 2004).
LSA has also been tried on human vocabulary, synonym, and word-sorting tests, in the course of research on how well LSA models human conceptual knowledge, and scored not far below group norms (Landauer et al., 1998). On the practical side, LSA has been used in a commercial product called the "Intelligent Essay Assessor," which evaluates students' knowledge and writing skills (Landauer et al., 2000).
However, at least one study has addressed LSA's potential in machine translation, specifically in dealing with polysemy in Korean-English translation (Kim et al., 2002). This study did not use the general context of an ambiguous word, but rather considered a single argument word standing in a specific grammatical relationship, such as subject-verb, to the target polysemous word. The correct meaning of the target was drawn
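A minimal sketch of that idea, with invented English stand-ins rather than the Korean data of Kim et al. (2002): each sense of the polysemous target is represented by the centroid of a few hypothetical seed words, and the sense is chosen by comparing the argument word's reduced-space vector against the centroids. Everything here (terms, documents, sense seeds) is illustrative, not the study's actual method or data:

```python
import numpy as np

# Invented toy data: pick the sense of the polysemous target "bank"
# from a single argument word (e.g. the verb in a subject-verb relation).
terms = ["lend", "borrow", "money", "flow", "stream", "water"]
docs = [
    "lend borrow money",
    "borrow money lend lend",
    "flow stream water",
    "water flow stream",
]
A = np.array([[d.split().count(t) for d in docs] for t in terms], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
term_vecs = {t: U[i, :k] * s[:k] for i, t in enumerate(terms)}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Each sense of "bank" is represented by the centroid of hypothetical seeds.
senses = {
    "bank/finance": ["lend", "money"],
    "bank/river": ["stream", "water"],
}
centroids = {name: np.mean([term_vecs[w] for w in words], axis=0)
             for name, words in senses.items()}

def disambiguate(argument_word):
    v = term_vecs[argument_word]
    return max(centroids, key=lambda name: cosine(v, centroids[name]))

print(disambiguate("borrow"))  # argument verb from the finance domain
print(disambiguate("flow"))    # argument verb from the river domain
```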

DISCUSSION
Translation has been defined as the production of a text in the target language (TL) that has the same effect as the original text in the source language (SL) (Newmark, 1981). Part of producing the same effect in the TL is knowing how words are perceived and comprehended in different TL contexts. The question is how to account for the different contextual usages of words in translation: how can one know which word is most likely to occur in a given context? Comprehension of the target-language text based on contextual usage is the key point, and it plays a fundamental role in depicting the way language is processed in the TL. In this way, texts produced in translation will be perceived and comprehended more naturally, because they will match the way words are stored in the mind and retrieved in different contexts. This article aims to prepare the way for facilitating translation, especially machine translation, by relying on the contextual usage of words in the TL. Latent semantic analysis (LSA) is the framework used here to address some of the problems computers face in this task.

Latent semantic analysis and translation
Latent semantic analysis is a general theory of acquired similarity and knowledge representation. It ignores all linguistic structure in the text, including syntax and morphology, and is sensitive only to the occurrences of words. The basic assumption of LSA is that words with similar meanings tend to occur in similar contexts. LSA's power lies in the fact that it is sensitive not only to direct co-occurrences but can also infer indirect relations between words across texts. Applying LSA in translation will enable a machine to cope with some of the difficulties it faces in choosing between words while translating into another language, since the model can represent the complex semantic structure of a given TL context and thus provide the reader with structure above the sentence level. LSA is a valuable analysis tool with a wide range of applications (Deerwester, Dumais and Landauer, 1990; Foltz and Dumais, 1992; Landauer and Dumais, 1997).
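The inference of indirect relations can be demonstrated on an invented toy corpus: "doctor" and "physician" never occur in the same document, so their raw co-occurrence vectors are orthogonal, yet the shared contexts "nurse" and "hospital" pull them together after dimensional reduction:

```python
import numpy as np

# Toy corpus (illustrative data): "doctor" and "physician" never
# co-occur directly, but share the contexts "nurse" and "hospital".
terms = ["doctor", "physician", "nurse", "hospital", "fruit", "apple"]
docs = [
    "doctor nurse hospital",
    "physician nurse hospital",
    "fruit apple",
]
A = np.array([[d.split().count(t) for d in docs] for t in terms], dtype=float)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Direct co-occurrence: the raw row vectors are orthogonal.
raw_sim = cosine(A[0], A[1])

# After SVD and rank-2 truncation, the shared contexts bring the
# two terms close together in the reduced space.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
term_vecs = U[:, :k] * s[:k]

reduced_sim = cosine(term_vecs[0], term_vecs[1])    # doctor vs physician
unrelated_sim = cosine(term_vecs[0], term_vecs[4])  # doctor vs fruit
print(round(raw_sim, 2), round(reduced_sim, 2), round(unrelated_sim, 2))
```

The raw similarity is zero while the reduced-space similarity is high, which is exactly the "latent" relation the technique is named for; the unrelated pair stays near zero.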
Applying LSA in machine translation should improve its efficiency beyond that of translation done without it. LSA can be used to identify locations in a text where a topic shift occurs, so that the text can be segmented into discrete topics (Landauer, Foltz and Kintsch, 1998). Discourse segmentation is based on the premise that coherence should be lower in areas of discourse where the topic changes.
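As a sketch of this premise, on an invented four-sentence discourse (stop words omitted for clarity): adjacent-sentence coherence is measured by cosine similarity in the reduced SVD space, and the deepest dip is taken as the topic boundary:

```python
import numpy as np

# Toy discourse: sentences 0-1 are about finance, 2-3 about rivers
# (illustrative data; a real system would use a large training corpus).
sentences = [
    "bank approved loan",
    "loan carried high interest",
    "river flooded bank",
    "water rose bank",
]
vocab = sorted({w for sent in sentences for w in sent.split()})
A = np.array([[sent.split().count(w) for sent in sentences] for w in vocab],
             dtype=float)

# Sentence vectors in the reduced space via truncated SVD.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
sent_vecs = (np.diag(s[:k]) @ Vt[:k, :]).T

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Coherence between adjacent sentences; the deepest dip marks the shift.
coherence = [cosine(sent_vecs[i], sent_vecs[i + 1])
             for i in range(len(sentences) - 1)]
boundary = int(np.argmin(coherence))  # topic boundary after this sentence
print(boundary)
```

Even though "bank" appears in both topics, the coherence minimum falls between sentences 1 and 2, where the discourse topic actually changes.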

Measuring topic shift in machine translation is a significant advantage that facilitates producing more accurate TL text with LSA. LSA does have limitations, however: it disregards word order (Landauer et al., 1998) and is unable to distinguish synonyms from antonyms (Aynat, 2002); for this reason, strategies should be adopted to enable the machine to distinguish the two.
It is also important to be aware that the relationships