FEASIBILITY OF USING LOW-PARAMETER LOCAL LLMS IN ANSWERING QUESTIONS FROM ENTERPRISE KNOWLEDGE BASE

Marcin BADUROWICZ

m.badurowicz@pollub.pl
Lublin University of Technology (Poland)
https://orcid.org/0000-0003-2249-4219

Stanisław SKULIMOWSKI


Lublin University of Technology, Faculty of Electrical Engineering and Computer Science, Department of Computer Science (Poland)
https://orcid.org/0000-0002-9049-9516

Maciej LASKOWSKI


Independent researcher (Poland)
https://orcid.org/0009-0006-4255-0686

Abstract

This paper evaluates the feasibility of deploying locally run Large Language Models (LLMs) for retrieval-augmented question answering (RAG-QA) over internal knowledge bases in small and medium enterprises (SMEs), with a focus on Polish-language datasets. The study benchmarks eight popular open-source and source-available LLMs, including Google’s Gemma-9B and SpeakLeash’s Bielik-11B, assessing their performance across closed, open, and detailed question types, with metrics for language quality, factual accuracy, response stability, and processing efficiency. The results show that desktop-class LLMs, though limited in factual accuracy (with top scores of 45% and 43% for Gemma and Bielik, respectively), hold promise for early-stage enterprise implementations. Key findings include Bielik’s superior performance on open-ended and detailed questions and Gemma’s efficiency and reliability in closed-type queries. Distribution analyses revealed variability in model outputs, with Bielik and Gemma showing the most stable response distributions. This research underscores the potential of offline-capable LLMs as cost-effective tools for secure knowledge management in Polish SMEs.
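
To make the setup concrete, the sketch below shows the general shape of a retrieval-augmented QA loop running entirely against a locally hosted model, in the spirit of the pipeline evaluated in the paper. It is a minimal illustration only, not the authors' configuration: the Ollama-style endpoint and model tag, the TF-IDF retriever, and the toy Polish knowledge-base fragments are assumptions introduced for this example.

# Minimal local RAG-QA sketch. Illustrative assumptions: a local Ollama server
# at http://localhost:11434, a quantized desktop-class model tagged "gemma2:9b",
# and a toy in-memory knowledge base. This does not reproduce the paper's setup.
import requests
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy enterprise "knowledge base"; in practice these would be chunks of
# internal Polish-language documents.
DOCUMENTS = [
    "Urlop wypoczynkowy wynosi 26 dni roboczych dla pracowników z ponad 10-letnim stażem.",
    "Zgłoszenia awarii sprzętu należy kierować do działu IT przez wewnętrzny helpdesk.",
    "Faktury kosztowe muszą być zatwierdzone przez kierownika działu przed końcem miesiąca.",
]

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed local endpoint
MODEL_NAME = "gemma2:9b"                            # assumed quantized small model


def retrieve(question: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Rank knowledge-base chunks by TF-IDF cosine similarity to the question."""
    vectorizer = TfidfVectorizer()
    matrix = vectorizer.fit_transform(documents + [question])
    scores = cosine_similarity(matrix[-1], matrix[:-1]).flatten()
    ranked = scores.argsort()[::-1][:top_k]
    return [documents[i] for i in ranked]


def answer(question: str) -> str:
    """Build a grounded prompt from retrieved chunks and query the local model."""
    context = "\n".join(retrieve(question, DOCUMENTS))
    prompt = (
        "Odpowiedz na pytanie wyłącznie na podstawie poniższego kontekstu.\n"
        f"Kontekst:\n{context}\n\nPytanie: {question}\nOdpowiedź:"
    )
    response = requests.post(
        OLLAMA_URL,
        json={"model": MODEL_NAME, "prompt": prompt, "stream": False},
        timeout=120,
    )
    response.raise_for_status()
    return response.json()["response"].strip()


if __name__ == "__main__":
    print(answer("Ile dni urlopu przysługuje pracownikowi z 12-letnim stażem?"))

Repeating each query several times and timing the HTTP calls against such a server would give a rough counterpart of the response-stability and processing-efficiency measurements mentioned in the abstract.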


Keywords:

large language models, benchmark, retrieval-augmented generation, desktop deployment, quantization


Published
2024-12-31

How to Cite

BADUROWICZ, M., SKULIMOWSKI, S., & LASKOWSKI, M. (2024). FEASIBILITY OF USING LOW-PARAMETER LOCAL LLMS IN ANSWERING QUESTIONS FROM ENTERPRISE KNOWLEDGE BASE. Applied Computer Science, 20(4), 175–191. https://doi.org/10.35784/acs-2024-46

License

This work is licensed under a Creative Commons Attribution 4.0 International License.

