Plagiarism Detection Automation: a New Approach to Code Analysis in the Mirera Digital Educational Platform
https://doi.org/10.25682/NIISI.2025.1.0007
Abstract
Detecting plagiarism in solutions to programming tasks is a high priority for digital educational platforms, which must assess users' learning progress accurately and reliably. Methods that compare submitted solutions solely by textual similarity, without accounting for the syntax of the programming languages in which they are written, often fail to deliver precise and trustworthy results, as do statistical approaches based on machine learning. This study proposes a block-based method for comparing students' programming solutions that detects plagiarism while taking the syntactic features of programming languages into account.
About the Authors
D. I. Kadina
Russian Federation
A. G. Leonov
Russian Federation
N. S. Martynov
Russian Federation
K. A. Mashchenko
Russian Federation
E. A. Orlov
Russian Federation
A. I. Strekalova
Russian Federation
For citations:
Kadina D.I., Leonov A.G., Martynov N.S., Mashchenko K.A., Orlov E.A., Strekalova A.I. Plagiarism Detection Automation: a New Approach to Code Analysis in the Mirera Digital Educational Platform. SRISA Proceedings. 2025;15(1):52-57. (In Russ.) https://doi.org/10.25682/NIISI.2025.1.0007