Uma arquitetura não intrusiva e reativa para realizar o processo ETL em tempo real em ambientes de data warehousing
Vilela, Flávio de Assis
MetadataShow full item record
There is a great interest in obtaining data that support the decision-making process in business. These data are available in data sources in the operational environment, which are autonomous, heterogeneous, and distributed. The data are extracted through the Extract, Transform, and Load process (ETL) and stored in the informational environment in a homogeneous, integrated, and dimensional database called data warehouse. The ETL process traditionally takes place at predefined periods, such as daily, weekly, monthly, or according to the organization's data update rules. However, there are applications that need operational data as quickly as possible or immediately after the data is available from data sources. Examples of these applications are medical systems, highway control systems and digital farming systems. Therefore, the traditional ETL process and currently available techniques are unable to make the data available for decision making in real-time, ensuring availability, low elapsed time, and scalability. This work presents an innovative, non-intrusive and reactive architecture, called Data Magnet, from which it is possible to perform the ETL process in real time in data warehousing environments. The non-intrusive feature means that the solution does not need to search for data in the operating environment and, therefore, it is not necessary to make a connection with the data sources or deal directly with the heterogeneity of the data. The reactive feature indicates that the solution will react to events in the operating environment and perform an automatic action in order to guarantee real-time requirements. Two experimental tests were performed, the first one in a real environment in the field of dairy farming, and the second one in a synthetic environment, in order to assess the Data Magnet with a high volume of data. In addition, the Data Magnet produced a good performance with low elapsed time, guaranteed availability and great scalability as the data volume increased. The Data Magnet also produced a huge performance gain for the average metric with regard to the traditional trigger technique commonly used in real-time ETL process.
The following license files are associated with this item: