Investigação de estratégias de sumarização humana multidocumento

Camargo, Renata Tironi de

Files

5583.pdf (2.07 MB)

Date

2013-08-30

Authors

Camargo, Renata Tironi de

Publisher

Universidade Federal de São Carlos

Abstract

Multi-document human summarization (MHS), the manual production of a summary from a collection of texts from different sources on the same subject, is a little-explored linguistic task. Since single-document summaries comprise information with recurrent features that can reveal summarization strategies, we investigated multi-document summaries in order to identify MHS strategies. To identify these strategies, the source-text sentences of the CSTNews corpus (CARDOSO et al., 2011) were manually aligned to their human summaries. The corpus has 50 clusters of news texts in Portuguese, each with its multi-document summary. The alignment revealed the origin of the information selected to compose the summaries. To determine whether the selected information shows recurrent features, the aligned and non-aligned sentences were semi-automatically characterized according to a set of linguistic attributes identified in related work. These attributes capture the content selection strategies of single-document summarization and the existing clues about MHS. Through manual analysis of the characterizations of the aligned and non-aligned sentences, we found that the selected sentences commonly share certain attributes, such as sentence location in the text and redundancy. This observation was confirmed by a set of formal rules learned by a Machine Learning (ML) algorithm from the same characterizations. These rules therefore express MHS strategies. When the rules were learned and tested on CSTNews by ML, the precision was 71.25%. To assess the relevance of the rules, we performed intrinsic evaluations of different kinds: (i) verification of the occurrence of the same strategies in another corpus, and (ii) comparison of the quality of summaries produced by the MHS strategies with the quality of summaries produced by different strategies.
Regarding evaluation (i), which was performed automatically by ML, the rules learned from CSTNews were tested on a different news corpus and their precision was 70%, which is very close to the precision obtained on the training corpus (CSTNews). Concerning evaluation (ii), the quality of the summaries produced by the MHS strategies, manually assessed by 10 computational linguists, was judged better than the quality of the other summaries. Besides describing features of multi-document summaries, this work has the potential to support multi-document automatic summarization and to help it become more linguistically motivated. That task consists of automatically generating multi-document summaries and has so far been based either on adapting strategies identified in single-document summarization or on unconfirmed clues about MHS. Based on this work, the automatic content selection step in multi-document summarization methods may rely on strategies systematically identified in MHS.

Keywords

Linguistics, Automatic summarization, Multi-document human summarization, Content selection strategy, Multidocument automatic summarization

Citation

CAMARGO, Renata Tironi de. Investigação de estratégias de sumarização humana multidocumento. 2013. 135 f. Dissertação (Mestrado em Ciências Humanas) - Universidade Federal de São Carlos, São Carlos, 2013.

URI

https://repositorio.ufscar.br/handle/20.500.14289/5781

Collections

Theses and Dissertations
