Doaa Mostafa, Sally S. Ismail, and Mostafa Aref
A Hybrid Syntactic–Statistical–Semantic Framework for Detecting AI-Generated Text Across Domains
Recent advances in large language models (LLMs) have enabled highly human-like text generation, raising concerns related to misinformation, authorship verification, and academic integrity. Current approaches for detecting LLM-generated text suffer from several limitations, including limited robustness to linguistic diversity, sensitivity to text length variations and paraphrasing, weak domain generalization, and high computational cost. To address these challenges, this paper proposes a hybrid framework for detecting LLM-generated text that integrates syntactic and statistical features with deep semantic representations learned using GloVe embeddings, Convolutional Neural Networks (CNNs), and Bidirectional Long Short-Term Memory (BiLSTM) networks. By combining linguistic cues with contextual semantics, the proposed model captures both structural and semantic patterns to distinguish human-written text from LLM-generated content. Experiments conducted on the ChatGPT Research Abstracts and ElectAI datasets demonstrate strong cross-domain generalization and robustness to text length variations and paraphrasing. The proposed framework achieves an accuracy of 98.63%, an F1-score of 98.66%, and a minimum false positive rate (FPR) of 0.01. These results indicate the effectiveness, stability, and reliability of the framework for detecting LLM-generated text.
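For readers unfamiliar with the three reported metrics, the following is a minimal sketch of how accuracy, F1-score, and FPR are conventionally computed for a binary detector. It is not the authors' evaluation code. The function name `binary_metrics` is hypothetical, and the label convention (1 = LLM-generated as the positive class, 0 = human-written) is an assumption not stated in the abstract.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, F1 and false positive rate for 0/1 labels.

    Assumption (not from the paper): 1 = LLM-generated (positive class),
    0 = human-written (negative class).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four confusion-matrix cells.
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    # FPR: share of human-written texts wrongly flagged as LLM-generated.
    fpr = fp / (fp + tn) if (fp + tn) else 0.0

    return {"accuracy": accuracy, "f1": f1, "fpr": fpr}


if __name__ == "__main__":
    # Toy labels for illustration only; these are not data from the paper.
    truth = [1, 1, 1, 0, 0, 0, 0, 1]
    preds = [1, 1, 0, 0, 0, 1, 0, 1]
    print(binary_metrics(truth, preds))
```

Under this convention, a low FPR means that few human-written texts are misclassified as LLM-generated, which is the error type most relevant to the academic integrity and authorship verification concerns mentioned above.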
Reference:
DOI: 10.36244/ICJ.2026.1.6
Please cite this paper the following way:
Doaa Mostafa, Sally S. Ismail, and Mostafa Aref, "A Hybrid Syntactic–Statistical–Semantic Framework for Detecting AI-Generated Text Across Domains", Infocommunications Journal, Vol. XVIII, No. 1, March 2026, pp. 53-61, https://doi.org/10.36244/ICJ.2026.1.6





