A transition-based multilingual neural syntactic parser
Abstract
Dependency parsing consists of inducing a model capable of extracting the correct
dependency tree from an input natural language sentence. Multilingual techniques
are increasingly used in Natural Language Processing (NLP) (BROWN
et al., 1995; COHEN; DAS; SMITH, 2011), especially in the dependency parsing task.
Intuitively, a multilingual parser can be seen as a vector of different parsers, each
trained individually on one language. However, this approach is costly
in terms of processing time and resources. As an alternative, many parsing
techniques have been developed in order to solve this problem (MCDONALD; PETROV;
HALL, 2011; TACKSTROM; MCDONALD; USZKOREIT, 2012; TITOV; HENDERSON,
2007), but all of them depend on word alignment (TACKSTROM; MCDONALD;
USZKOREIT, 2012) or word clustering, which increases the complexity since it is difficult
to induce alignments between words and syntactic resources (TSARFATY et al., 2013;
BOHNET et al., 2013a). A simple solution proposed recently (NIVRE et al., 2016a)
uses a universal annotated corpus to reduce the complexity associated with the
construction of a multilingual parser. In this context, this work presents a universal
model for dependency parsing: the NNParser. Our model is a modification of Chen and
Manning (2014) with a greedier and more accurate model to capture distributional representations
(MIKOLOV et al., 2011). The NNParser reached 93.08% UAS on the English
Penn Treebank (WSJ) and outperformed the state-of-the-art Stack LSTM parser for
Portuguese (87.93% vs. 86.2% LAS) and Spanish (86.95% vs. 85.7% LAS) on the Universal
Dependencies corpus.