MIDB: a biological data integration model
Abstract
In bioinformatics, a large volume of data on biomolecules and on nucleotide and amino acid sequences resides almost entirely in several Biological Databases (BDBs). A given sequence can be described by several classes of information: genomic data, evolutionary data, structural data, and others. Some BDBs store only one or a few of these classes. These BDBs are hosted on different sites and servers, under several database management systems with different data models. In addition, their instances and schemas may be semantically heterogeneous. In this scenario, the objective of this project is to propose a biological data integration model that adopts new schema integration and instance integration techniques. The proposed model has one mechanism for schema integration and another for instance integration, which is supported by a dictionary and allows conflicts in attribute values to be resolved; a Clustering Algorithm groups similar entities, and a domain specialist takes part in managing the resulting clusters. The model was validated through a case study on schema and instance integration of nucleotide sequence data from organisms of the genus Actinomyces, collected from four different data sources. About 97.91% of the attributes were correctly categorized in the schema integration, and the instance integration identified that about 50% of the clusters created need support from a specialist, avoiding errors in instance resolution. The contributions include the Attribute Categorization, the Clustering Algorithm, the proposed distance functions, and the proposed model itself.
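The instance integration step described above (cluster similar entities, then send conflicting clusters to a specialist) can be illustrated with a minimal sketch. It is not the MIDB algorithm: the abstract does not specify the distance functions or the clustering procedure, so the string-similarity distance, the greedy clustering, the threshold, and all names and sample records below are assumptions made only for illustration.

```python
from difflib import SequenceMatcher


def distance(a, b):
    """Hypothetical distance: 1 minus the mean string similarity of shared attributes."""
    keys = sorted(set(a) & set(b))
    if not keys:
        return 1.0  # nothing to compare, treat as maximally distant
    sims = [
        SequenceMatcher(None, str(a[k]).lower(), str(b[k]).lower()).ratio()
        for k in keys
    ]
    return 1.0 - sum(sims) / len(sims)


def cluster(records, threshold):
    """Greedy clustering (an assumption): a record joins the first cluster
    whose first member lies within the threshold, else it starts a new one."""
    clusters = []
    for rec in records:
        for members in clusters:
            if distance(rec, members[0]) <= threshold:
                members.append(rec)
                break
        else:
            clusters.append([rec])
    return clusters


def needs_specialist(members):
    """Flag a cluster when any attribute holds more than one distinct value."""
    keys = set().union(*members)
    for k in keys:
        values = {str(r[k]).strip().lower() for r in members if k in r}
        if len(values) > 1:
            return True
    return False


# Invented sample records standing in for entries from different sources.
records = [
    {"organism": "Actinomyces naeslundii", "length": "1500"},
    {"organism": "Actinomyces naeslundi", "length": "1500"},
    {"organism": "Actinomyces israelii", "length": "980"},
]

clusters = cluster(records, threshold=0.2)
for members in clusters:
    print(len(members), "record(s); specialist needed:", needs_specialist(members))
```

In this sketch the first two records fall into one cluster but disagree on the spelling of the organism name, so that cluster is flagged for review rather than resolved automatically. A dictionary of known synonyms, as the abstract mentions, could be consulted before flagging.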