Alinhamento de árvores sintáticas português-inglês
Araújo, Josué Garcia de
MetadataShow full item record
The manual translation of a source natural language into a target natural language is a task that demands time and expertise. In order to reduce the work needed for manual translations, the aim is to accomplish this task through Machine Translation (MT) systems. Since the 1940s, various approaches and techniques of MT have been proposed, investigated and evaluated in order to improve the quality of translations generated automatically. Nowadays, statistical machine translation methods are considered the state-of-art regarding the evaluation automatic measures commonly used in the area (such as BLEU and NIST), however a recent trend indicates that such systems will not improve their level of performance without the application of deeper linguistic knowledge, for instance, syntactic information. Thus, as an attempt to support the building of automatic translators, this dissertation presents the research, the implementation and the evaluation of parse trees alignment techniques. The computational tool for the automatic alignment of syntactic trees, result of this work, may be used to generate an extremely useful resource for various MT techniques: the aligned syntactic trees . This resource, so far unavailable for Brazilian Portuguese, will allow the development of new researches, which can provide the scientific advancement of the area. In this dissertation, a study of various techniques for parse trees alignment from the literature is presented. Besides, the pre-processing of a corpus for the inclusion of syntactic information from which the alignment is performed is also described, as well as the phases of lexical alignment and syntactic analysis. Some implementations and tests have been carried out with the pre-processed corpus, based on the theoretical foundations derived from the study of the techniques proposed in the literature. Based on the results of the intrinsic evaluation of the alignment, it was possible to conclude that the alignment of syntactic trees reached the accuracy of 97.36% and the coverage of 93.48% for tree pairs, representing parallel sentences in Brazilian Portuguese and in English by using different settings. Since the results have been promising, as future work, the aim is to apply the tool to a larger corpus of parallel syntactic trees, in order to obtain more examples of translation and, thus, allow its application to syntax-based machine translation techniques, such as syntax-based statistical methods or data-oriented translation.