Development of a pipeline for genomic and transcriptomic analysis based on Web services
Abstract
Pipeline systems for genomic and transcriptomic analysis aim to bridge existing analysis tools, thereby reducing researchers' effort. Most of the pipelines found in the literature lack features that would be useful in genome or transcriptome sequencing projects. These include the ability to track project results throughout development, including the generation of partial reports; a collaborative environment in which the participating laboratories can contribute new data and chromatograms; configurable analysis parameters; support for multiple pipelines; and the ability to add new tools and modules. In this work, a pipeline prototype was developed to overcome these shortcomings. The progress of each sequencing project is tracked throughout its development. Chromatograms are received progressively as the project advances, and partial reports on newly received data are generated. Communication with the processing server takes place through a Web service, which offers a platform-independent interface, allowing client applications on heterogeneous platforms to submit data and execute operations and queries. Pipelines are configured in XML documents written in a predefined format, through which researchers choose the tools and parameters to be used. The prototype supports multiple pipelines running simultaneously in the same project. Pipelines are executed in parallel by means of thread pools, which increases efficiency by distributing the workload on multiprocessor systems. Another feature of the prototype is its extensibility, since each pipeline step is wrapped in a module. New modules can be added to the system by implementing a programming interface, without the need to recompile the system. Modules are registered declaratively through XML documents.
A client application was also developed on the Sakai collaborative platform, allowing the different research groups involved in a sequencing project to create pipelines, view results, and exchange information on the project's current status. To evaluate the efficiency of the prototype, a case study was carried out. Sequences obtained from sequencing the Sphenophorus levis transcriptome were submitted, and a pipeline was configured to analyze the data. The case study indicated that the prototype is efficient and produces good results.
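As a rough illustration of the architecture summarized above (XML-configured pipelines, steps wrapped in modules behind a common interface, and parallel execution through a thread pool), a minimal Python sketch might look like the following. Everything in it is an assumption made for illustration: the XML element and attribute names, the `Module` interface, the three toy modules, and the `REGISTRY` dictionary (which stands in for the prototype's declarative XML module registration) do not come from the original work, whose actual schema and implementation language are not given here.

```python
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor

# Hypothetical pipeline description; the prototype's real XML schema is not
# specified in the abstract.
PIPELINE_XML = """
<pipelines>
  <pipeline name="quality">
    <step module="trim" min_length="4"/>
    <step module="uppercase"/>
  </pipeline>
  <pipeline name="stats">
    <step module="gc_content"/>
  </pipeline>
</pipelines>
"""


class Module:
    """Illustrative programming interface that every pipeline step implements."""

    def run(self, data, params):
        raise NotImplementedError


class Trim(Module):
    """Drops sequences shorter than the configured minimum length."""

    def run(self, data, params):
        min_length = int(params.get("min_length", 0))
        return [s for s in data if len(s) >= min_length]


class Uppercase(Module):
    """Normalizes sequences to upper case."""

    def run(self, data, params):
        return [s.upper() for s in data]


class GCContent(Module):
    """Computes the G+C fraction of each non-empty sequence."""

    def run(self, data, params):
        return [sum(c in "GCgc" for c in s) / len(s) for s in data if s]


# Stand-in for declarative module registration (assumed, not the original design).
REGISTRY = {"trim": Trim, "uppercase": Uppercase, "gc_content": GCContent}


def run_pipeline(pipeline_elem, sequences):
    """Runs the steps of one pipeline in order, feeding each step's output to the next."""
    data = list(sequences)
    for step in pipeline_elem.findall("step"):
        params = dict(step.attrib)
        module = REGISTRY[params.pop("module")]()
        data = module.run(data, params)
    return pipeline_elem.get("name"), data


def run_all(xml_text, sequences, workers=4):
    """Submits every configured pipeline to a thread pool and collects the results by name."""
    root = ET.fromstring(xml_text)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(run_pipeline, p, sequences)
            for p in root.findall("pipeline")
        ]
        return dict(f.result() for f in futures)
```

Calling `run_all(PIPELINE_XML, ["acgt", "ac", "ggcc"])` would run both pipelines concurrently on the same input. Note that in CPython, threads mainly help when the steps wait on external tools or I/O; the sketch is meant to show the structure, not to reproduce the prototype's performance characteristics.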