FEASIBILITY OF USING LOW-PARAMETER LOCAL LLMS IN ANSWERING QUESTIONS FROM ENTERPRISE KNOWLEDGE BASE
Issue: Vol. 20 No. 4 (2024), pp. 175-191
Authors
Marcin BADUROWICZ, Stanisław SKULIMOWSKI, Maciej LASKOWSKI
Abstract
This paper evaluates the feasibility of deploying locally run Large Language Models (LLMs) for retrieval-augmented question answering (RAG-QA) over internal knowledge bases in small and medium enterprises (SMEs), with a focus on Polish-language datasets. The study benchmarks eight popular open-source and source-available LLMs, including Google's Gemma-9B and SpeakLeash's Bielik-11B, assessing their performance on closed, open, and detailed question types with metrics for language quality, factual accuracy, response stability, and processing efficiency. The results show that desktop-class LLMs, though limited in factual accuracy (with top scores of 45% for Gemma and 43% for Bielik), hold promise for early-stage enterprise implementations. Key findings include Bielik's superior performance on open-ended and detailed questions and Gemma's efficiency and reliability on closed-type queries. Distribution analyses revealed variability in model outputs, with Bielik and Gemma showing the most stable response distributions. This research underscores the potential of offline-capable LLMs as cost-effective tools for secure knowledge management in Polish SMEs.
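For readers unfamiliar with the retrieval-augmented setup described in the abstract, the following minimal Python sketch illustrates the general shape of such a pipeline: retrieve the knowledge-base chunks most similar to a question, then pass them as context to a locally hosted model. This is an illustration only, not the authors' implementation; the toy documents, the bag-of-words retriever, and the ask_local_llm placeholder are assumptions standing in for a production embedding store and a desktop-class LLM runtime.

# Minimal sketch of a local RAG-QA loop over an enterprise knowledge base.
# All names (documents, ask_local_llm) are illustrative placeholders, not the
# authors' actual pipeline.

import math
from collections import Counter

# Toy "knowledge base": in practice these would be chunks of internal documents.
documents = [
    "Employees submit vacation requests through the HR portal.",
    "The VPN must be enabled before accessing the internal wiki.",
    "Invoices are archived for five years in the finance system.",
]

def tokenize(text: str) -> list[str]:
    return [t.lower().strip(".,?") for t in text.split()]

def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words term-count vectors."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question: str, k: int = 2) -> list[str]:
    """Return the k knowledge-base chunks most similar to the question."""
    q_vec = Counter(tokenize(question))
    ranked = sorted(documents,
                    key=lambda d: cosine_similarity(q_vec, Counter(tokenize(d))),
                    reverse=True)
    return ranked[:k]

def ask_local_llm(prompt: str) -> str:
    """Placeholder for a call to a locally hosted model, e.g. an 8-11B
    instruction-tuned LLM served on desktop-class hardware."""
    return f"[model answer for a prompt of {len(prompt)} characters]"

def answer(question: str) -> str:
    # Ground the model in retrieved context instead of its parametric knowledge,
    # which is the core idea behind RAG-QA over a private knowledge base.
    context = "\n".join(retrieve(question))
    prompt = (f"Answer using only the context below.\n\n"
              f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")
    return ask_local_llm(prompt)

print(answer("How long are invoices kept?"))

In a real deployment the bag-of-words retriever would typically be replaced by dense embeddings with a vector index, and ask_local_llm by a call to whichever local inference runtime serves the chosen model.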
Keywords:
License

This work is licensed under a Creative Commons Attribution 4.0 International License.
All articles published in Applied Computer Science are open-access and distributed under the terms of the Creative Commons Attribution 4.0 International License.
