Machine learning approach to detect GAI-disguised academic programming plagiarism

Oscar KARNALIM; Yehezkiel David SETIAWAN; Maresha Caroline WIJANTO; Rossevine Artha NATHASYA

doi:10.35784/acs_8915

Published: Jun 30, 2026

Issue Vol. 22 No. 2 (2026)

Authors

Oscar KARNALIM

oscar.karnalim@it.maranatha.edu

Maranatha Christian University, Indonesia

https://orcid.org/0000-0003-4930-6249

Yehezkiel David SETIAWAN

2479011@maranatha.ac.id

Maranatha Christian University, Indonesia

https://orcid.org/0009-0004-0445-6280

Maresha Caroline WIJANTO

maresha.cw@it.maranatha.edu

Maranatha Christian University, Indonesia

https://orcid.org/0000-0003-4131-7760

Rossevine Artha NATHASYA

rossevine.an@it.maranatha.edu

Maranatha Christian University, Indonesia

https://orcid.org/0000-0002-7592-1360

Abstract

Plagiarism is a common issue in programming education, and the emergence of Generative Artificial Intelligence (GAI) has exacerbated it. Plagiarism can be disguised with GAI, which produces pervasive, consistent changes across the entire program. We present a programming plagiarism detector dedicated to GAI disguises. Because GAI has its own way of writing programs, the detector relies not only on program similarities but also on GAI characteristics. It employs 23 features: five are structural (program similarities), and the remaining 18 capture GAI characteristics (such as the use of list comprehension and recursion). It offers seven machine learning models to choose from: Logistic Regression, Random Forest, XGBoost, LightGBM, CatBoost, Voting Classifier, and Stacking Classifier. In our evaluation on 6344 instances from a machine intelligence course, Stacking Classifier achieves the highest performance, with 89.17% accuracy, 88.94% precision, 89.17% recall, and 88.77% F-score. It outperforms similarity-based plagiarism detectors (the baseline) by a factor of two on most metrics. Our machine learning models consider all structural features (program similarities) important, along with several GAI-characteristic features. The most prominent GAI characteristics are the use of list comprehension, recursion, and branching condition statements without parentheses.
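The abstract names three prominent GAI characteristics (list comprehension, recursion, and branching conditions without parentheses) but does not say how they are measured. The following is a minimal, hypothetical sketch of how such counts could be extracted, assuming Python submissions that parse with the standard `ast` and `tokenize` modules. The function names (`gai_style_features` and its helpers) are illustrative only and are not the authors' implementation; recursion detection covers direct calls by name only.

```python
import ast
import io
import tokenize


def count_list_comprehensions(tree: ast.AST) -> int:
    """Number of list comprehensions, e.g. [x * 2 for x in xs]."""
    return sum(isinstance(node, ast.ListComp) for node in ast.walk(tree))


def count_recursive_functions(tree: ast.AST) -> int:
    """Functions whose body calls the function itself by name (direct recursion only)."""
    count = 0
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            if any(
                isinstance(sub, ast.Call)
                and isinstance(sub.func, ast.Name)
                and sub.func.id == node.name
                for sub in ast.walk(node)
            ):
                count += 1
    return count


def _wrapped_in_parens(toks, j: int) -> bool:
    """True if toks[j:] reads '(' ... ')' ':' with the outer parentheses matching."""
    if not (toks[j].type == tokenize.OP and toks[j].string == "("):
        return False
    depth = 0
    for k in range(j, len(toks)):
        if toks[k].type == tokenize.OP and toks[k].string == "(":
            depth += 1
        elif toks[k].type == tokenize.OP and toks[k].string == ")":
            depth -= 1
            if depth == 0:
                return toks[k + 1].string == ":"
    return False


def count_bare_conditions(source: str) -> int:
    """if/elif statements whose whole condition is NOT wrapped in one pair of
    parentheses: `if x > 0:` counts, `if (x > 0):` does not. Conditional
    expressions and comprehension filters are skipped because their `if`
    does not start a statement."""
    toks = [
        t for t in tokenize.generate_tokens(io.StringIO(source).readline)
        if t.type not in (tokenize.NL, tokenize.COMMENT)
    ]
    line_start = (tokenize.NEWLINE, tokenize.INDENT, tokenize.DEDENT)
    bare = 0
    for i, tok in enumerate(toks):
        starts_statement = i == 0 or toks[i - 1].type in line_start
        if (starts_statement and tok.type == tokenize.NAME
                and tok.string in ("if", "elif")
                and not _wrapped_in_parens(toks, i + 1)):
            bare += 1
    return bare


def gai_style_features(source: str) -> dict:
    """Hypothetical subset of GAI-characteristic features for one submission."""
    tree = ast.parse(source)
    return {
        "list_comprehensions": count_list_comprehensions(tree),
        "recursive_functions": count_recursive_functions(tree),
        "bare_if_conditions": count_bare_conditions(source),
    }


sample = '''
def fact(n):
    if n <= 1:
        return 1
    return n * fact(n - 1)

def evens(xs):
    if (len(xs) == 0):
        return []
    return [x for x in xs if x % 2 == 0]
'''
print(gai_style_features(sample))
# {'list_comprehensions': 1, 'recursive_functions': 1, 'bare_if_conditions': 1}
```

In a full detector, counts like these would sit alongside the structural similarity features in each instance's feature vector; normalising them (for example, per function or per line) is a further design choice the abstract does not specify.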
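The reported accuracy and recall are identical (89.17%). That pattern is what one would expect if precision, recall, and F-score were averaged across classes with support weights, because support-weighted recall reduces algebraically to accuracy: the sum over classes of (n_c / n) * (TP_c / n_c) equals the total number of true positives divided by n. The abstract does not state which averaging scheme was used, so this is an assumption. A minimal standard-library sketch, using hypothetical labels (1 = plagiarised, 0 = not plagiarised):

```python
import math
from collections import Counter


def weighted_scores(y_true, y_pred):
    """Return accuracy and support-weighted precision, recall and F1."""
    n = len(y_true)
    support = Counter(y_true)  # true occurrences of each class
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    precision = recall = f1 = 0.0
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)
        p_c = tp / predicted if predicted else 0.0
        r_c = tp / support[label] if support[label] else 0.0
        f_c = 2 * p_c * r_c / (p_c + r_c) if p_c + r_c else 0.0
        weight = support[label] / n  # each class weighted by its share of true labels
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    return accuracy, precision, recall, f1


# Hypothetical labels for eight instances (not data from the study).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 0, 0, 0]
acc, prec, rec, f1 = weighted_scores(y_true, y_pred)
print(acc, prec, rec, f1)
print(math.isclose(acc, rec))  # True: weighted recall always equals accuracy
```

Under this scheme, weighted recall carries no information beyond accuracy, so weighted precision and F-score are the figures that distinguish models.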

Keywords:

plagiarism, programming, machine learning, transformers, academic integrity

Sustainable Development Goals (SDG)

4 - Quality education
16 - Peace, justice and strong institutions

KARNALIM, O., SETIAWAN, Y. D., WIJANTO, M. C., & NATHASYA, R. A. (2026). Machine learning approach to detect GAI-disguised academic programming plagiarism. Applied Computer Science, 22(2), 208–224. https://doi.org/10.35784/acs_8915
