University of Leicester

A Novel Bursty Event Detection and Tracking Model for Online Social Media Data Based on Supervised Learning

thesis
posted on 2024-06-20, 10:00 authored by Ayodeji O. Ayorinde

Detecting bursty events, mostly perceived as an unsupervised task, is often done on big data whose volume runs into petabytes, most of it noise that adds no value to the intended purpose. Bursty events attract considerable attention from users, leading to an increased volume of communication about the event. Some unsupervised techniques require the number of clusters to be pre-specified and, at the time of this research, no established method existed for accurately determining the number of events in online social media data. This thesis addresses the challenge of having to pre-specify the number of groups in the data and proposes novel methods that can detect and track bursty events without first having to determine the number of clusters. The model, which is made up of three novel algorithms, also significantly reduces noise at the pre-processing stage, reduces excessive dimensionality, estimates the number of events, and performs classification for tracking event evolution.
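To make the notion of a bursty event concrete, the following is a minimal sketch of a generic volume-based burst check. It is not the thesis's method (ISCT, MCRCA, and LEDA are not reproduced here); it only illustrates the idea that a burst shows up as a sudden rise in message volume. The function name `detect_bursts`, its parameters `window`, `threshold`, and `min_sigma`, and the sample counts are all assumptions made for illustration.

```python
from statistics import mean, stdev


def detect_bursts(counts, window=5, threshold=3.0, min_sigma=1.0):
    """Return indices of time slots whose message count looks bursty.

    Hypothetical helper: a slot is flagged when its count exceeds the mean
    of the preceding `window` slots by more than `threshold` standard
    deviations (a rolling z-score test).
    """
    bursts = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        mu = mean(history)
        # Floor the deviation so a perfectly flat history cannot
        # cause a division by zero.
        sigma = max(stdev(history), min_sigma)
        if (counts[i] - mu) / sigma > threshold:
            bursts.append(i)
    return bursts


# Illustrative (made-up) hourly message counts for one topic.
hourly_counts = [10, 12, 11, 9, 10, 11, 95, 120, 14, 10]
print(detect_bursts(hourly_counts))  # prints [6]
```

In this made-up series only the first spike (index 6) is flagged: the following slot is even larger, but the spike already sits inside its history window and inflates the standard deviation. This is one reason simple volume thresholds are usually only a starting point for burst detection.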

For pre-processing, this thesis introduces an Induced Squared Correlation Thresholding (ISCT) algorithm with multi-level dimension and noise reduction strategies. ISCT also estimates the number of events in unstructured textual data. For detecting bursty events, this thesis proposes a novel Multi-Cycle Recursive Clustering Algorithm (MCRCA), which does not require the number of groups to be pre-specified and creates homogeneous clusters whose members share a coherent and consistent context. MCRCA also boosts the availability of signal data by using statistical inference to reduce noise. For classification and tracking, this thesis proposes a novel Likelihood Extracted Discriminative Attributes (LEDA) algorithm that can classify long and short documents with high accuracy, thereby making it possible to track event evolution in online social media data streams.

Experiments conducted using publicly available data (the IMDB Movie Review, Covid-19, and 20 Newsgroups datasets) showed that the proposed model as a whole outperformed the LDA, K-Means, hierarchical, and DBSCAN algorithms, and that it is promising for deployment in a distributed environment, where reduced latency is expected. The LEDA algorithm achieved 100% accuracy on binary classification using the IMDB Movie Review dataset and outperformed the Word2vec and GloVe techniques. All of these algorithms can be used to classify bursty events and track their evolution as new data flows in.

History

Supervisor(s)

John Panneerselvam; Lu Liu

Date of award

2024-05-01

Author affiliation

School of Computing and Mathematical Sciences

Awarding institution

University of Leicester

Qualification level

  • Doctoral

Qualification name

  • PhD

Language

en
