Graduate Program in Computer Science (Ciência da Computação - PPGCC), Centro de Ciências Exatas e de Tecnologia - CCET

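A paragraph-based process for extracting treatments from scientific articles in the biomedical domain (Um processo baseado em parágrafos para a extração de tratamentos de artigos científicos do domínio biomédico)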

File
4310.pdf (3.114 MB)
Date
2012-02-24
Author
Duque, Juliana Lilian
Abstract
The medical field currently holds a large amount of unstructured information, i.e., information in textual form. Its volume makes it impossible for physicians and specialists to manually analyze all the relevant literature, so techniques for automatically analyzing documents are needed. This work proposes a paragraph-based process for extracting treatments from scientific papers in the biomedical domain. The process identifies relevant information, structures it, and stores it in a database, which enables the future discovery of significant relationships. The hypothesis is that first searching for sentences that contain complication terms improves the identification and extraction of treatment terms. The reason is that treatments mainly occur in the same sentence as a complication or in nearby sentences of the same paragraph. The methodology combines three information extraction approaches. The first is a machine-learning-based approach, which classifies the sentences of interest that contain the terms to be extracted. The second is a dictionary-based approach, which uses terms validated by a domain expert. The third is a rule-based approach. The methodology was validated as a proof of concept on biomedical papers, specifically papers on Sickle Cell Anemia, covering both sentence classification and the identification of relevant terms. Sentence classification accuracy was 79% for the complication classifier and 71% for the treatment classifier. These values correspond to combining the Support Vector Machine (SVM) learning algorithm with noise removal and class balancing filtering.
For the identification of relevant terms, the proposed methodology achieved a higher F-measure (42%) than manual classification (31%). It also outperformed the partial process, i.e., the process without the complication classifier (36%). Although recall was low, it had no observed impact on the extraction process, because 100% recall was obtained for distinct terms. In addition, the working hypothesis of this study was validated.
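The core idea of the process is that treatments are sought only in sentences that mention a complication, or in nearby sentences of the same paragraph. The sketch below is a minimal illustration of that idea under stated assumptions, not the method implemented in this work.

The work uses an SVM classifier to select sentences of interest. The sketch replaces it with a plain dictionary lookup so that the example stays self-contained. Several parts are hypothetical:

- the term dictionaries;
- the naive sentence splitter;
- the one-sentence window;
- the function names `extract_treatments` and `f_measure`.

The F-measure helper computes the standard harmonic mean of precision and recall that the abstract reports.

```python
from __future__ import annotations

import re

# Hypothetical example dictionaries; in the work, terms were validated by a domain expert.
COMPLICATION_TERMS = {"acute chest syndrome", "stroke", "vaso-occlusive crisis"}
TREATMENT_TERMS = {"hydroxyurea", "blood transfusion", "penicillin"}


def split_sentences(paragraph: str) -> list[str]:
    """Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", paragraph) if s.strip()]


def find_terms(sentence: str, terms: set[str]) -> set[str]:
    """Dictionary lookup: return every term that occurs as a whole word or phrase."""
    lowered = sentence.lower()
    return {t for t in terms if re.search(r"\b" + re.escape(t) + r"\b", lowered)}


def extract_treatments(paragraph: str, window: int = 1) -> set[str]:
    """Search for treatment terms only in sentences that mention a complication
    or that lie within `window` sentences of one, inside the same paragraph.
    A dictionary lookup stands in for the SVM sentence classifier used in the work."""
    sentences = split_sentences(paragraph)
    anchors = [i for i, s in enumerate(sentences) if find_terms(s, COMPLICATION_TERMS)]
    found: set[str] = set()
    for i in anchors:
        lo, hi = max(0, i - window), min(len(sentences), i + window + 1)
        for s in sentences[lo:hi]:
            found |= find_terms(s, TREATMENT_TERMS)
    return found


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1); 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical example paragraph, not taken from the work's corpus.
paragraph = (
    "Acute chest syndrome is a frequent complication in sickle cell anemia. "
    "It is usually managed with blood transfusion. "
    "The cohort was recruited from two hospitals. "
    "Penicillin was mentioned in the introduction."
)

print(sorted(extract_treatments(paragraph)))  # ['blood transfusion']
```

With the default window of one sentence, "penicillin" is ignored because it appears three sentences away from the complication. Widening `window` trades precision for recall, which is the balance the F-measure summarizes.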
URI
https://repositorio.ufscar.br/handle/ufscar/496
Collections
  • Theses and dissertations

Universidade Federal de São Carlos - UFSCar