Machine learning approach to detect GAI-disguised academic programming plagiarism

Oscar KARNALIM

oscar.karnalim@it.maranatha.edu

Yehezkiel David SETIAWAN

2479011@maranatha.ac.id

Maresha Caroline WIJANTO

maresha.cw@it.maranatha.edu

Rossevine Artha NATHASYA

rossevine.an@it.maranatha.edu

Abstract

Plagiarism is a common issue in programming education, and the emergence of Generative Artificial Intelligence (GAI) has exacerbated it. Acts of plagiarism can be disguised with GAI, resulting in pervasive, consistent changes across the entire program. We present a programming plagiarism detector dedicated to GAI disguises. It relies not only on program similarities but also on GAI characteristics, since GAI has its own way of writing programs. The detector employs 23 features: five are structural (program similarities), while the rest capture GAI characteristics (the use of list comprehension, recursion, etc.). It offers seven machine learning models to choose from: Logistic Regression, Random Forest, XGBoost, LightGBM, CatBoost, Voting Classifier, and Stacking Classifier. In our evaluation on 6344 instances from the machine intelligence course, the Stacking Classifier achieves the highest performance, with 89.17% accuracy, 88.94% precision, 89.17% recall, and 88.77% F-score. It outperforms similarity-based plagiarism detectors (which serve as the baseline) by a factor of 2 in most metrics. Our machine learning models consider all structural features (program similarities) important, along with several GAI-characteristic features. The most prominent GAI characteristics are the use of list comprehension, recursion, and branching condition statements without parentheses.
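To make the three GAI characteristics named in the abstract more concrete, the following is a minimal sketch of how they might be counted for Python submissions. It is not the authors' implementation: the function name `extract_gai_features`, the feature keys, and the sample program are hypothetical, and the abstract does not state how each feature is actually measured. The sketch uses only Python's standard `ast` module and assumes ASCII source, so that column offsets line up with character positions.

```python
import ast

# Hypothetical sample submission, used only for illustration.
SAMPLE = '''
def fact(n):
    if n <= 1:
        return 1
    return n * fact(n - 1)

squares = [i * i for i in range(5)]
if (len(squares) > 3):
    print(squares)
'''


def extract_gai_features(source: str) -> dict:
    """Count three illustrative GAI-characteristic features in Python source."""
    tree = ast.parse(source)
    lines = source.splitlines()

    list_comprehensions = 0
    recursive_functions = 0
    branches_without_parentheses = 0

    for node in ast.walk(tree):
        if isinstance(node, ast.ListComp):
            list_comprehensions += 1
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Assumption: "recursion" means a function calls itself by name.
            calls_itself = any(
                isinstance(inner, ast.Call)
                and isinstance(inner.func, ast.Name)
                and inner.func.id == node.name
                for inner in ast.walk(node)
            )
            recursive_functions += int(calls_itself)
        elif isinstance(node, (ast.If, ast.While)):
            # The AST drops enclosing parentheses, so inspect the text that
            # precedes the condition on its source line instead.
            test = node.test
            prefix = lines[test.lineno - 1][: test.col_offset].rstrip()
            if not prefix.endswith("("):
                branches_without_parentheses += 1

    return {
        "list_comprehensions": list_comprehensions,
        "recursive_functions": recursive_functions,
        "branches_without_parentheses": branches_without_parentheses,
    }


if __name__ == "__main__":
    print(extract_gai_features(SAMPLE))
```

Counts like these could form part of a feature vector alongside the structural similarity features, which a classifier such as the Stacking Classifier would then consume. Mutual recursion and method calls through `self` are not detected by this simple check.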

Keywords:

plagiarism, programming, machine learning, transformers, academic integrity

Sustainable Development Goals (SDG)

  • 4 - Quality education
  • 16 - Peace, justice and strong institutions

Article Details

KARNALIM, O., SETIAWAN, Y. D., WIJANTO, M. C., & NATHASYA, R. A. (2026). Machine learning approach to detect GAI-disguised academic programming plagiarism. Applied Computer Science, 22(2), 208–224. https://doi.org/10.35784/acs_8915