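Análise de dados sequenciais heterogêneos baseada em árvore de decisão e modelos de Markov: aplicação na logística de transporte (Analysis of heterogeneous sequential data based on decision trees and Markov models: application to transport logistics)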
Ataky, Steve Tsham Mpinda
Recently, data mining techniques have been developed in many application fields to analyze large volumes of data, which may be simple and/or complex. Transport logistics, and the railway sector in particular, has this characteristic: the available data are of varied natures (classic variables such as top speed or train type, and symbolic variables such as the set of routes traveled by a train, the degree of track adherence, etc.). This dissertation addresses the problem of classifying and predicting heterogeneous data through two main approaches. First, an automatic classification approach based on the classification tree technique was implemented. It also allows new data to be integrated efficiently into previously initialized partitions. The second contribution concerns the analysis of sequential data. The classification method above is combined with Markov models to partition time series (temporal sequences) into homogeneous and meaningful groups on a probabilistic basis. The resulting model yields readily interpretable classes and makes it possible to estimate how the sequences of a particular vehicle evolve. Both approaches were then applied to real data from a Brazilian railway company's information system, with the aim of supporting strategic planning management and consistent prediction. The first aim of the work is to provide finer-grained planning that resolves the problems of the existing classification of train movements into homogeneous groups. The second is to define a typology of train paths (successive runs of the same train) in order to predict the statistical characteristics of the next movement of a train running the same route. The overall methodology provides a decision-support environment for monitoring and controlling the planning organization.
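The probabilistic grouping and next-movement prediction described above can be illustrated with a minimal sketch built on first-order Markov chains. This is a generic illustration, not the dissertation's method. The classification-tree step is omitted, and the groups are assumed to be already given (for example, produced by a tree). The state labels ("A", "B", "C"), group names ("g1", "g2"), the smoothing parameter `alpha`, and the helper names `fit_markov_chain`, `assign_group` and `predict_next` are all hypothetical. For each group, a transition matrix is estimated from its sequences. A new sequence is then assigned to the group under which it has the highest log-likelihood, and the most probable next state is read from the current state's row.

```python
import math


def fit_markov_chain(sequences, states, alpha=1.0):
    """Estimate a first-order transition matrix with additive (Laplace) smoothing.

    Hypothetical helper: every state must appear in `states`, otherwise a KeyError is raised.
    """
    counts = {s: {t: alpha for t in states} for s in states}
    for seq in sequences:
        # Count each observed transition a -> b between consecutive states.
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    probs = {}
    for s in states:
        total = sum(counts[s].values())
        # Normalize each row so that outgoing probabilities sum to 1.
        probs[s] = {t: c / total for t, c in counts[s].items()}
    return probs


def log_likelihood(seq, probs):
    """Log-probability of the sequence's transitions under one chain."""
    return sum(math.log(probs[a][b]) for a, b in zip(seq, seq[1:]))


def assign_group(seq, group_models):
    """Assign a sequence to the group whose chain makes it most likely."""
    return max(group_models, key=lambda g: log_likelihood(seq, group_models[g]))


def predict_next(current_state, probs):
    """Most probable next state given the current one."""
    row = probs[current_state]
    return max(row, key=row.get)


# Hypothetical toy data: each group holds sequences of visited states.
states = ["A", "B", "C"]
groups = {
    "g1": [["A", "B", "C"], ["A", "B", "C", "A"], ["B", "C", "A", "B"]],
    "g2": [["C", "B", "A"], ["C", "B", "A", "C"], ["B", "A", "C", "B"]],
}
models = {g: fit_markov_chain(seqs, states) for g, seqs in groups.items()}

print(assign_group(["A", "B", "C", "A"], models))  # → g1
print(predict_next("A", models["g1"]))  # → B
```

Additive smoothing keeps unseen transitions from receiving zero probability. This matters because a single unseen transition would otherwise make a sequence's log-likelihood undefined under that group.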
To this end, a formula with two variants was proposed to compute the degree of adherence between the path actually run (or currently being run) and the planned one.