EVALUATING LARGE LANGUAGE MODELS FOR MEDICAL INFORMATION EXTRACTION: A COMPARATIVE STUDY OF ZERO-SHOT AND SCHEMA-BASED METHODS
Zakaria KADDARI
z.kaddari@ump.ac.ma
Université Mohammed Premier, National School of Applied Sciences, LaRSA laboratory, AIRES team (Morocco)
https://orcid.org/0000-0003-4034-5612
Ikram El HACHMI
Université Mohammed Premier, Faculty of Medicine and Pharmacy Oujda (Morocco)
https://orcid.org/0009-0008-7928-3088
Jamal BERRICH
Université Mohammed Premier, Faculty of Medicine and Pharmacy Oujda (Morocco)
https://orcid.org/0000-0001-8443-7223
Rim AMRANI
Université Mohammed Premier, Faculty of Medicine and Pharmacy Oujda (Morocco)
https://orcid.org/0000-0003-3906-5533
Toumi BOUCHENTOUF
Université Mohammed Premier, Faculty of Medicine and Pharmacy Oujda (Morocco)
https://orcid.org/0000-0002-2689-8678
Abstract
This study investigates the application of large language models, particularly ChatGPT, in the extraction and structuring of medical information from free-text patient reports. The authors explore two distinct methods: a zero-shot extraction approach and a schema-based extraction approach. The dataset, consisting of 1230 anonymized French medical reports from the Department of Neonatology of the Mohammed VI University Hospital, served as the basis for these experiments. The findings indicate that while ChatGPT demonstrates a significant capability in structuring medical data, certain challenges remain, particularly with complex and non-standardized text formats. The authors evaluate the model's performance using precision, recall, and F1 score metrics, providing a comprehensive assessment of its applicability in clinical settings.
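As an illustration of the evaluation metrics named above, the following is a minimal sketch of field-level precision, recall, and F1 for one report. It assumes exact matching on (field, value) pairs; the function name `prf1` and the field names in the example are hypothetical and are not taken from the study, whose matching criteria may differ.

```python
from typing import Dict, Tuple


def prf1(gold: Dict[str, str], predicted: Dict[str, str]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) using exact match on (field, value) pairs."""
    gold_pairs = set(gold.items())
    pred_pairs = set(predicted.items())
    # A true positive is a predicted field whose value equals the gold value.
    true_positives = len(gold_pairs & pred_pairs)
    precision = true_positives / len(pred_pairs) if pred_pairs else 0.0
    recall = true_positives / len(gold_pairs) if gold_pairs else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Hypothetical annotations for a single report (illustrative values only).
gold = {"sex": "F", "gestational_age_weeks": "34", "birth_weight_g": "2100"}
predicted = {
    "sex": "F",
    "gestational_age_weeks": "35",  # wrong value: counts against both precision and recall
    "birth_weight_g": "2100",
    "apgar_1min": "8",              # field absent from gold: counts against precision
}

p, r, f = prf1(gold, predicted)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

In this example two of the four predicted pairs are correct and two of the three gold pairs are recovered, so precision is 1/2, recall is 2/3, and F1 is 4/7.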
Keywords:
Medical Information Extraction, Large Language Models, ChatGPT, schema-based extraction
License
This work is licensed under a Creative Commons Attribution 4.0 International License.
All articles published in Applied Computer Science are open-access and distributed under the terms of the Creative Commons Attribution 4.0 International License.