EXPLOITING BERT FOR MALFORMED SEGMENTATION DETECTION TO IMPROVE SCIENTIFIC WRITINGS

Abdelrahman Halawa; Shehab Gamalel-Din; Abdurrahman Nasr

doi:10.35784/acs-2023-20


Published: Jun 30, 2023

DOI: https://doi.org/10.35784/acs-2023-20


Authors

Abdelrahman Halawa

ahalawa@azhar.edu.eg

Al-Azhar University

https://orcid.org/0009-0004-7107-1049

Shehab Gamalel-Din

drshehabg@yahoo.com

https://orcid.org/0000-0002-0696-6119

Abdurrahman Nasr

anasr@azhar.edu.eg

Abstract

Writing well-structured scientific documents, such as articles and theses, is vital for readers to comprehend a document's argumentation and understand its messages. Document structure also affects the efficiency and time required to study the document. Proper document segmentation further yields better results when employing automated Natural Language Processing (NLP) algorithms, including summarization and other information retrieval and analysis functions. Unfortunately, inexperienced writers, such as young researchers and graduate students, often struggle to produce well-structured professional documents. Their writing frequently exhibits improper segmentation or lacks semantically coherent segments, a phenomenon referred to as "mal-segmentation." Examples of mal-segmentation include improper paragraph or section divisions and abrupt transitions between sentences and paragraphs. This research addresses mal-segmentation in scientific writing by introducing an automated method for detecting mal-segmentations that uses Sentence Bidirectional Encoder Representations from Transformers (sBERT) as an encoding mechanism. The experimental results are promising for the detection of mal-segmentation using the sBERT technique.

Keywords:

NLP, text segmentation, mal-segmentation, BERT

References

Halawa, A., Gamalel-Din, S., & Nasr, A. (2023). EXPLOITING BERT FOR MALFORMED SEGMENTATION DETECTION TO IMPROVE SCIENTIFIC WRITINGS. Applied Computer Science, 19(2), 126–141. https://doi.org/10.35784/acs-2023-20
