FEASIBILITY OF USING LOW-PARAMETER LOCAL LLMS IN ANSWERING QUESTIONS FROM ENTERPRISE KNOWLEDGE BASE

Marcin BADUROWICZ; Stanisław SKULIMOWSKI; Maciej LASKOWSKI

doi:10.35784/acs-2024-46

Published: Dec 31, 2024

Authors

Marcin BADUROWICZ

m.badurowicz@pollub.pl

Lublin University of Technology

https://orcid.org/0000-0003-2249-4219

Stanisław SKULIMOWSKI

s.skulimowski@pollub.pl

Lublin University of Technology, Faculty of Electrical Engineering and Computer Science, Department of Computer Science

https://orcid.org/0000-0002-9049-9516

Maciej LASKOWSKI

maciej.laskowski@gmail.com

Independent researcher

https://orcid.org/0009-0006-4255-0686

Abstract

This paper evaluates the feasibility of deploying locally run Large Language Models (LLMs) for retrieval-augmented question answering (RAG-QA) over internal knowledge bases in small and medium enterprises (SMEs), with a focus on Polish-language datasets. The study benchmarks eight popular open-source and source-available LLMs, including Google’s Gemma-9B and SpeakLeash’s Bielik-11B, and assesses their performance across closed, open, and detailed question types using metrics for language quality, factual accuracy, response stability, and processing efficiency. The results indicate that desktop-class LLMs, though limited in factual accuracy (with top scores of 45% and 43% for Gemma and Bielik, respectively), hold promise for early-stage enterprise implementations. Key findings include Bielik’s superior performance on open-ended and detailed questions and Gemma’s efficiency and reliability on closed-type queries. Distribution analyses revealed variability in model outputs, with Bielik and Gemma showing the most stable response distributions. This research underscores the potential of offline-capable LLMs as cost-effective tools for secure knowledge management in Polish SMEs.

Keywords:

large language models, benchmark, retrieval-augmented generation, desktop deployment, quantization

BADUROWICZ, M., SKULIMOWSKI, S., & LASKOWSKI, M. (2024). FEASIBILITY OF USING LOW-PARAMETER LOCAL LLMS IN ANSWERING QUESTIONS FROM ENTERPRISE KNOWLEDGE BASE. Applied Computer Science, 20(4), 175–191. https://doi.org/10.35784/acs-2024-46
