Text-image alignment on news websites
Veltroni, Wellington Cristiano
Text-image alignment is the task of linking elements of a text to elements of the image that accompanies it. In this work, text-image alignment is applied to news websites. Many news articles do not make clear which elements of the text correspond to which elements of the associated image. Text-image alignment addresses this by guiding the reader: it makes the news text and its image clearer by explicitly showing the direct correspondence between image regions and words (or named entities) in the text. The goal of this work is to combine Natural Language Processing (NLP) and Computer Vision (CV) techniques to produce text-image alignments for news: the LinkPICS aligner. LinkPICS uses the YOLO convolutional neural network (CNN) to detect people and objects in the image associated with the news text. Because YOLO detects only 80 object classes, we use three other CNNs to generate additional labels for the detected objects. In this work, text-image alignment is divided into two distinct processes: (1) people alignment and (2) object alignment. In people alignment, the named entities identified in the text are aligned with images of people. In an evaluation on the English-language Folha de São Paulo International news corpus, LinkPICS achieved 98% precision. In object alignment, physical words (words referring to concrete, physical things) are aligned with objects (including animals, fruits, etc.) present in the image associated with the news. In an evaluation on a BBC News corpus, also in English, LinkPICS achieved 72% precision. The main contributions of this work are the LinkPICS aligner and the strategy proposed for implementing it, both of which are novel contributions to the NLP and CV fields. 
In addition, the work makes it possible to generate a visual dictionary (words associated with images) of the aligned people and objects, which can be used in other research and applications, such as supporting second-language learning.
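The object-alignment idea, the precision metric, and the visual dictionary can be illustrated with a minimal sketch. It assumes the detectors (YOLO plus the extra CNNs) have already produced, for each detected region, a bounding box and a set of candidate labels. The matching rule (exact match between a lowercased, naively singularized text token and a detector label) and the names `Detection`, `normalize`, `align_objects`, `precision` and `visual_dictionary` are hypothetical illustrations, not the actual LinkPICS method. Precision is taken as correct alignments divided by all alignments produced.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """A detected image region (hypothetical structure, not LinkPICS's own)."""
    box: tuple[int, int, int, int]  # (x, y, width, height) in pixels
    labels: frozenset[str]          # YOLO label plus labels from the extra CNNs


def normalize(token: str) -> str:
    """Lowercase and drop a trailing 's' (a naive singularization, assumed here)."""
    token = token.lower()
    if len(token) > 3 and token.endswith("s"):
        token = token[:-1]
    return token


def align_objects(text: str, detections: list[Detection]) -> list[tuple[str, Detection]]:
    """Pair each distinct word in the text with every detection carrying it as a label."""
    words = sorted({normalize(w) for w in re.findall(r"[A-Za-z]+", text)})
    return [(w, d) for d in detections for w in words if w in d.labels]


def precision(predicted: list[tuple[str, Detection]],
              gold: set[tuple[str, Detection]]) -> float:
    """Correct alignments divided by all alignments produced."""
    if not predicted:
        return 0.0
    return sum(pair in gold for pair in predicted) / len(predicted)


def visual_dictionary(pairs: list[tuple[str, Detection]]) -> dict[str, list[tuple[int, int, int, int]]]:
    """Group the aligned image regions by word (a toy 'visual dictionary')."""
    entries = defaultdict(list)
    for word, det in pairs:
        entries[word].append(det.box)
    return dict(entries)


if __name__ == "__main__":
    dog = Detection((10, 20, 100, 80), frozenset({"dog", "animal"}))
    fruit = Detection((200, 40, 60, 60), frozenset({"apple", "fruit"}))
    # "Apple" the company is wrongly matched to the fruit: a false positive.
    pairs = align_objects("Apple shares rose while a dog slept.", [dog, fruit])
    print(precision(pairs, gold={("dog", dog)}))  # prints 0.5
    print(visual_dictionary(pairs))  # prints {'dog': [(10, 20, 100, 80)], 'apple': [(200, 40, 60, 60)]}
```

The example shows why a purely lexical match cannot reach 100% precision: a word can match a detector label while referring to something else entirely (here, a company rather than a fruit), so a real aligner would need additional disambiguation.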