Text normalization and semantic indexing applied to SMS spam filtering
Silva, Tiago Pasqualini da
The rapid popularization of smartphones has contributed to the growth of SMS usage as an alternative means of communication. The increasing number of users, along with the trust they inherently place in their devices, makes SMS messages a propitious environment for spammers. In fact, reports clearly indicate that the volume of mobile phone spam is increasing dramatically year by year. SMS spam represents a challenging problem for traditional filtering methods, since such messages are usually fairly short and normally rife with slang, idioms, symbols and acronyms that make even tokenization a difficult task. In this scenario, this thesis proposes and evaluates a method to normalize and expand the original short and messy SMS text messages in order to acquire better attributes and enhance classification performance. The proposed text processing approach is based on lexicographic and semantic dictionaries along with state-of-the-art techniques for semantic analysis and context detection. This technique is used to normalize terms and create new attributes, changing and expanding the original text samples so as to alleviate factors that can degrade the algorithms' performance, such as redundancies and inconsistencies. The approach was validated with a public, real and non-encoded dataset along with several established machine learning methods. The experiments were carefully designed to ensure statistically sound results, which indicate that the proposed text processing techniques can in fact enhance SMS spam filtering.
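As a rough illustration of the normalize-then-expand idea described in the abstract, the following minimal Python sketch maps informal tokens to standard words and then appends related terms as extra attributes. It is not the thesis implementation: the dictionaries `LINGO` and `RELATED`, their entries, and the function names are hypothetical placeholders, and the context detection (sense disambiguation) step mentioned in the abstract is omitted entirely.

```python
import re

# Hypothetical toy dictionaries; a real system would rely on full
# lexicographic and semantic resources rather than these few entries.
LINGO = {"u": "you", "ur": "your", "txt": "text", "pls": "please", "msg": "message"}
RELATED = {"prize": ["award", "reward"], "free": ["complimentary"]}


def tokenize(message):
    """Lowercase the message and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9']+", message.lower())


def normalize(tokens):
    """Replace known slang or abbreviations with their standard form."""
    return [LINGO.get(token, token) for token in tokens]


def expand(tokens):
    """Keep each token and append any related terms as new attributes."""
    expanded = []
    for token in tokens:
        expanded.append(token)
        expanded.extend(RELATED.get(token, []))
    return expanded


def preprocess(message):
    """Tokenize, normalize, and expand a raw SMS message."""
    return expand(normalize(tokenize(message)))


if __name__ == "__main__":
    print(preprocess("Pls txt u won a FREE prize!"))
```

Under these toy dictionaries, the sample message yields `['please', 'text', 'you', 'won', 'a', 'free', 'complimentary', 'prize', 'award', 'reward']`. The resulting token list could then be passed to any bag-of-words classifier.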