Detection confidential information by large language models
Issue Vol. 15 No. 3 (2025)
Detection confidential information by large language models
Oleh Deineka, Oleh Harasymchuk, Andrii Partyka, Yurii Dreis, Yuliia Khokhlachova, Yuriy Pepa, pp. 91-99
Abstract
Protecting personal and confidential customer data is paramount. As the volume of data being generated and processed grows, organizations face significant challenges in ensuring that sensitive information is adequately protected. A critical step in safeguarding this data is the detection and classification of personal and confidential information within text documents. This process involves identifying sensitive data, classifying it appropriately, and storing the results in a semi-structured format for further analysis and action. The need to detect and classify sensitive data is driven by regulatory compliance, data security, risk management, and operational efficiency. Methodologies for this task include rule-based systems, machine learning models, natural language processing (NLP), and hybrid approaches. Large Language Models (LLMs) such as GPT-3 and BERT, trained on extensive text data, are transforming data management and governance, areas crucial for SOC 2 Type 2 compliance. LLMs generate output in response to prompts and can automate tasks such as cataloging data, enhancing data quality, ensuring data privacy, and assisting in data integration. These capabilities can support a robust data classification policy, a key requirement for SOC 2 Type 2.
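The detect, classify, and store steps described in the abstract can be illustrated with a minimal rule-based sketch. This is not the authors' LLM-based method; it only shows the simplest of the listed approaches (a rule-based system) with JSON assumed as the semi-structured output format. The pattern names, the regular expressions, and the labels "personal" and "confidential" are hypothetical choices made for illustration.

```python
import json
import re

# Hypothetical rules: each entry maps a data type to a regex and a
# classification label. Real policies would define their own categories.
PATTERNS = {
    "email": (re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"), "personal"),
    "card_number": (re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"), "confidential"),
}


def detect(text):
    """Return a list of findings, ordered by position in the text."""
    findings = []
    for kind, (pattern, label) in PATTERNS.items():
        for match in pattern.finditer(text):
            findings.append({
                "type": kind,
                "classification": label,
                "value": match.group(),
                "start": match.start(),
                "end": match.end(),
            })
    return sorted(findings, key=lambda f: f["start"])


def to_json(text):
    """Serialize the findings as a JSON document for later analysis."""
    return json.dumps({"findings": detect(text)}, indent=2)


if __name__ == "__main__":
    sample = "Contact jane.doe@example.com, card 4111 1111 1111 1111."
    print(to_json(sample))
```

A hybrid approach of the kind the abstract mentions could, under the same assumptions, pass text that the rules do not cover to an LLM prompt and merge both sets of findings into the same JSON structure.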
Oleh Deineka, Lviv Polytechnic National University
Postgraduate student in cybersecurity, Department of Information Protection, Lviv Polytechnic National University, Lviv, Ukraine
Oleh Harasymchuk, Lviv Polytechnic National University
Ph.D., Associate Professor at the Department of Information Protection, Lviv Polytechnic National University, Lviv, Ukraine
Andrii Partyka, Lviv Polytechnic National University
Ph.D., Associate Professor at the Department of Information Protection, Lviv Polytechnic National University, Lviv, Ukraine
Yurii Dreis, Mariupol State University
Ph.D., Associate Professor at the Department of Analytics System and Information Technology, Mariupol State University, Kyiv, Ukraine
Yuliia Khokhlachova, State University of Trade and Economics
Ph.D., Professor of the Department of Software Engineering and Cybersecurity, State University of Trade and Economics, Kyiv, Ukraine

