Detection confidential information by large language models

Oleh Deineka

deinekaoleg.86@gmail.com

https://orcid.org/0009-0005-9156-3339

Oleh Harasymchuk

oleh.i.harasymchuk@lpnu.ua

https://orcid.org/0000-0002-8742-8872

Andrii Partyka

andrii.i.partyka@lpnu.ua

https://orcid.org/0000-0003-3037-8373

Yurii Dreis

y.dreis@mu.edu.ua

https://orcid.org/0000-0003-2699-1597

Yuliia Khokhlachova

y.khokhlachova@knute.edu.ua

https://orcid.org/0000-0002-0787-5112

Yuriy Pepa

yurka1144@gmail.com

https://orcid.org/0000-0003-2073-1364

Abstract

Protecting personal and confidential customer data is paramount. As the volume of data being generated and processed grows, organizations face significant challenges in ensuring that sensitive information is adequately protected. A critical step in safeguarding this data is detecting and classifying personal and confidential information within text documents. This process involves identifying sensitive data, classifying it appropriately, and storing the results in a semi-structured format for further analysis and action. The need to detect and classify sensitive data is driven by regulatory compliance, data security, risk management, and operational efficiency. Several methodologies are used for this purpose, including rule-based systems, machine learning models, natural language processing (NLP), and hybrid approaches. Large Language Models (LLMs) such as GPT-3 and BERT, trained on extensive text data, are transforming data management and governance, areas crucial for SOC 2 Type 2 compliance. LLMs generate output in response to prompts, and they can automate tasks such as data cataloging, data quality improvement, data privacy enforcement, and data integration. These capabilities can support a robust data classification policy, a key requirement for SOC 2 Type 2.
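
To make the detect, classify, and store pipeline described in the abstract concrete, the following is a minimal sketch of the rule-based end of the spectrum only; it is not the method used in the article. The two regular expressions, the category names, and the helper names `detect_sensitive` and `to_json` are hypothetical and chosen purely for illustration, and JSON is assumed here as one possible semi-structured output format.

```python
import json
import re

# Hypothetical patterns for illustration only; not taken from the article.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def detect_sensitive(text):
    """Return one record per match: category, matched value, character span."""
    findings = []
    for category, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            findings.append({
                "category": category,
                "value": match.group(),
                "start": match.start(),
                "end": match.end(),
            })
    # Order findings by their position in the document.
    return sorted(findings, key=lambda f: f["start"])


def to_json(findings):
    """Serialize the findings as a semi-structured JSON document."""
    return json.dumps({"findings": findings}, indent=2)


if __name__ == "__main__":
    sample = "Contact jane.doe@example.com or call +1 202 555 0147."
    print(to_json(detect_sensitive(sample)))
```

A rule-based pass like this is cheap and predictable but only catches formats it has a pattern for; the machine learning, NLP, and LLM-based approaches named in the abstract are aimed at the cases such rules miss.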

Keywords:

data security, prompt, confidence, quality, information classification

Article Details

Deineka, O., Harasymchuk, O., Partyka, A., Dreis, Y., Khokhlachova, Y., & Pepa, Y. (2025). Detection confidential information by large language models. Informatyka, Automatyka, Pomiary W Gospodarce I Ochronie Środowiska, 15(3), 91–99. https://doi.org/10.35784/iapgos.6910

Author Biographies

Oleh Deineka, Lviv Polytechnic National University

Postgraduate student in cybersecurity at the Department of Information Protection, Lviv Polytechnic National University, Lviv, Ukraine

Oleh Harasymchuk, Lviv Polytechnic National University

Ph.D., Associate Professor at the Department of Information Protection, Lviv Polytechnic National University, Lviv, Ukraine

Andrii Partyka, Lviv Polytechnic National University

Ph.D., Associate Professor at the Department of Information Protection, Lviv Polytechnic National University, Lviv, Ukraine

Yurii Dreis, Mariupol State University

Ph.D., Associate Professor at the Department of Analytics System and Information Technology, Mariupol State University, Kyiv, Ukraine

Yuliia Khokhlachova, State University of Trade and Economics

Ph.D., Professor of the Department of Software Engineering and Cybersecurity, State University of Trade and Economics, Kyiv, Ukraine