Alshattnawi, Sawsan and Shatnawi, Amani and AlSobeh, Anas M.R. and Magableh, Aws A. (2024) Beyond Word-Based Model Embeddings: Contextualized Representations for Enhanced Social Media Spam Detection. Applied Sciences, 14 (6). p. 2254. ISSN 2076-3417
Abstract
As social media platforms continue to grow rapidly, so do the threats targeting their security. Detecting disguised spam messages is a major challenge because spammers' tactics evolve constantly. This research investigates advanced artificial intelligence techniques to enhance multiplatform spam classification on Twitter and YouTube. We use state-of-the-art deep neural networks: recurrent neural network architectures with long short-term memory (LSTM) cells, fed by both static and contextualized word embeddings. Extensive comparative experiments are followed by rigorous hyperparameter tuning on the datasets. The results show that tailored, platform-specific AI techniques have a substantial impact in combating sophisticated, continually evolving threats. The key innovation lies in tailoring deep learning (DL) architectures to leverage both intrinsic platform contexts and extrinsic contextual embeddings for stronger generalization. The results include consistent accuracy improvements of more than 10–15% on multisource datasets, along with actionable guidelines on the optimal neural model components and embedding strategies for cross-platform defense systems. Contextualized embeddings such as BERT and ELMo consistently outperform their noncontextualized counterparts. The standalone ELMo model with logistic regression emerges as the top performer, attaining accuracy of 90% on Twitter data and 94% on YouTube data. This indicates the potential of contextualized language representations to capture the subtle semantic signals vital for identifying disguised spam. As emerging adversarial attacks exploit human vulnerabilities, advancing defense strategies through enhanced neural language understanding is imperative. We recommend that social media companies and academic researchers build on contextualized language models to strengthen social media security.
This research approach demonstrates the potential of personalized, platform-specific DL techniques to combat the continuously evolving threats to social media security.
Item Type: | Article |
---|---|
Subjects: | Impact Archive > Multidisciplinary |
Depositing User: | Managing Editor |
Date Deposited: | 08 Mar 2024 11:35 |
Last Modified: | 08 Mar 2024 11:35 |
URI: | http://research.sdpublishers.net/id/eprint/3972 |