EVALUATING LARGE LANGUAGE MODELS FOR MEDICAL INFORMATION EXTRACTION: A COMPARATIVE STUDY OF ZERO-SHOT AND SCHEMA-BASED METHODS
Zakaria KADDARI
z.kaddari@ump.ac.ma
Université Mohammed Premier, National School of Applied Sciences, LaRSA laboratory, AIRES team (Morocco)
https://orcid.org/0000-0003-4034-5612
Ikram El HACHMI
Université Mohammed Premier, Faculty of Medicine and Pharmacy Oujda (Morocco)
https://orcid.org/0009-0008-7928-3088
Jamal BERRICH
Université Mohammed Premier, Faculty of Medicine and Pharmacy Oujda (Morocco)
https://orcid.org/0000-0001-8443-7223
Rim AMRANI
Université Mohammed Premier, Faculty of Medicine and Pharmacy Oujda (Morocco)
https://orcid.org/0000-0003-3906-5533
Toumi BOUCHENTOUF
Université Mohammed Premier, Faculty of Medicine and Pharmacy Oujda (Morocco)
https://orcid.org/0000-0002-2689-8678
Abstract
This study investigates the application of large language models, particularly ChatGPT, in the extraction and structuring of medical information from free-text patient reports. The authors explore two distinct methods: a zero-shot extraction approach and a schema-based extraction approach. The dataset, consisting of 1230 anonymized French medical reports from the Department of Neonatology of the Mohammed VI University Hospital, served as the basis for these experiments. The findings indicate that while ChatGPT demonstrates a significant capability in structuring medical data, certain challenges remain, particularly with complex and non-standardized text formats. The authors evaluate the model's performance using precision, recall, and F1 score metrics, providing a comprehensive assessment of its applicability in clinical settings.
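As an illustration of the evaluation metrics named in the abstract, the following minimal sketch computes precision, recall, and F1 over the fields extracted from a single report. The field names, the example values, and the matching rule (exact match after lowercasing and trimming whitespace) are assumptions made for this example only; the abstract does not specify the schema or the matching criterion the authors used.

```python
from typing import Dict, Optional


def _norm(value: str) -> str:
    """Normalize a field value for comparison (assumed rule: trim + lowercase)."""
    return value.strip().lower()


def field_scores(
    gold: Dict[str, Optional[str]],
    pred: Dict[str, Optional[str]],
) -> Dict[str, float]:
    """Field-level precision, recall, and F1 for one report.

    A field counts as a true positive when it is present in both the
    gold annotation and the prediction with the same normalized value.
    """
    tp = sum(
        1
        for key, value in pred.items()
        if value is not None
        and gold.get(key) is not None
        and _norm(value) == _norm(gold[key])
    )
    n_pred = sum(1 for value in pred.values() if value is not None)
    n_gold = sum(1 for value in gold.values() if value is not None)

    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical gold annotation and model output for one neonatology report.
gold = {"gestational_age_weeks": "34", "birth_weight_g": "2100", "apgar_5min": "9"}
pred = {"gestational_age_weeks": "34", "birth_weight_g": "2010", "apgar_5min": None}

scores = field_scores(gold, pred)
print(scores)
```

In this example one of the two predicted fields matches the gold value and three gold fields exist, so precision is 1/2 and recall is 1/3. A corpus-level score could be obtained by summing the counts across reports before dividing (micro-averaging), which is again a design choice and not something the abstract states.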
Keywords:
Medical Information Extraction, Large Language Models, ChatGPT, schema-based extraction
License
This work is licensed under a Creative Commons Attribution 4.0 International License.
All articles published in Applied Computer Science are open-access and distributed under the terms of the Creative Commons Attribution 4.0 International License.