FEASIBILITY OF USING LOW-PARAMETER LOCAL LLMS IN ANSWERING QUESTIONS FROM ENTERPRISE KNOWLEDGE BASE
Marcin BADUROWICZ
m.badurowicz@pollub.pl
Lublin University of Technology (Poland)
https://orcid.org/0000-0003-2249-4219
Stanisław SKULIMOWSKI
Lublin University of Technology, Faculty of Electrical Engineering and Computer Science, Department of Computer Science (Poland)
https://orcid.org/0000-0002-9049-9516
Maciej LASKOWSKI
Independent researcher (Poland)
https://orcid.org/0009-0006-4255-0686
Abstract
This paper evaluates the feasibility of deploying locally run Large Language Models (LLMs) for retrieval-augmented question answering (RAG-QA) over internal knowledge bases in small and medium enterprises (SMEs), with a focus on Polish-language datasets. The study benchmarks eight popular open-source and source-available LLMs, including Google's Gemma-9B and SpeakLeash's Bielik-11B, assessing their performance across closed, open, and detailed question types, with metrics for language quality, factual accuracy, response stability, and processing efficiency. The results show that desktop-class LLMs, though limited in factual accuracy (with top scores of 45% and 43% for Gemma and Bielik, respectively), hold promise for early-stage enterprise implementations. Key findings include Bielik's superior performance on open-ended and detailed questions and Gemma's efficiency and reliability on closed-type queries. Distribution analyses revealed variability in model outputs, with Bielik and Gemma showing the most stable response distributions. This research underscores the potential of offline-capable LLMs as cost-effective tools for secure knowledge management in Polish SMEs.
Keywords:
large language models, benchmark, retrieval-augmented generation, desktop deployment, quantization
License
This work is licensed under a Creative Commons Attribution 4.0 International License.
All articles published in Applied Computer Science are open-access and distributed under the terms of the Creative Commons Attribution 4.0 International License.