Multimodal representation for information classification (Representação multimodal para classificação de informação)
Ito, Fernando Tadao
The most basic meaning of "multimodality" is the use of multiple means of information to compose an "artifact", a man-made object that expresses a concept. In our day-to-day life, most media outlets use multimedia to express information: news is composed of videos, narrations and ancillary texts; theater plays tell a story through actors, gestures and songs; electronic games take the player's physical gestures as actions and respond with visual or musical cues. To interpret such artifacts, we have to extract information from multiple media and combine them mathematically. Feature extraction is done by mathematical models that receive raw data (texts, images, audio signals) and turn it into numerical vectors, where the distance between instances denotes their relation: data that are close together encode similar meanings. To create a multimodal semantic space, we use models that "fuse" information from multiple data types. In this work, we investigate the interaction between different modes of information representation in the formation of multimodal representations, presenting some of the most used algorithms for vector representation of texts and images, and ways to merge them. To measure the relative performance of each combination of methods, we use classification and similarity tasks on databases with paired images and texts. We found that, on our data sets, different methods of unimodal representation can lead to vastly different results. We also note that poor performance of a representation in the classification task does not mean that the representation fails to encode the concept of an object, since the same representation can score differently on similarity tasks.
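The fusion step described in the abstract can be sketched minimally. The toy vectors below stand in for the outputs of real text and image encoders, and concatenation of L2-normalized vectors is used as the simplest fusion strategy; these choices are illustrative assumptions, not the models actually evaluated in the thesis:

```python
import numpy as np

def l2_normalize(v):
    # Scale a vector to unit length so each modality contributes equally.
    return v / np.linalg.norm(v)

def fuse_concat(text_vec, image_vec):
    # Early fusion by concatenation: the simplest way to merge two
    # unimodal embeddings into one multimodal vector.
    return np.concatenate([l2_normalize(text_vec), l2_normalize(image_vec)])

def cosine_similarity(a, b):
    # Cosine similarity: close vectors (similar meanings) score near 1.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 4-dimensional "embeddings" standing in for real encoder outputs.
cat_text, cat_image = np.array([1.0, 0.0, 0.2, 0.1]), np.array([0.9, 0.1, 0.0, 0.0])
dog_text, dog_image = np.array([0.9, 0.1, 0.3, 0.0]), np.array([0.8, 0.2, 0.1, 0.0])
car_text, car_image = np.array([0.0, 1.0, 0.0, 0.9]), np.array([0.1, 0.9, 0.8, 0.0])

cat = fuse_concat(cat_text, cat_image)
dog = fuse_concat(dog_text, dog_image)
car = fuse_concat(car_text, car_image)

# In the fused space, related concepts should end up closer together.
print(cosine_similarity(cat, dog) > cosine_similarity(cat, car))
```

In this sketch, "cat" and "dog" share most of their vector mass along the same dimensions, so their fused representations end up closer than "cat" and "car", mirroring the idea that distance in the multimodal space denotes relatedness.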