EXPLOITING BERT FOR MALFORMED SEGMENTATION DETECTION TO IMPROVE SCIENTIFIC WRITINGS
Issue Vol. 19 No. 2 (2023)
Abdelrahman Halawa, Shehab Gamalel-Din, Abdurrahman Nasr (pp. 126-141)
Abstract
Writing well-structured scientific documents, such as articles and theses, is vital for making a document's argumentation and message comprehensible. Structure also affects how much time and effort readers need to study the document. In addition, proper document segmentation yields better results from automated Natural Language Processing (NLP) algorithms, including summarization and other information retrieval and analysis functions. Unfortunately, inexperienced writers, such as young researchers and graduate students, often struggle to produce well-structured professional documents. Their writing frequently exhibits improper segmentation or lacks semantically coherent segments, a phenomenon referred to as "mal-segmentation." Examples of mal-segmentation include improper paragraph or section divisions and abrupt transitions between sentences and paragraphs. This research addresses mal-segmentation in scientific writing by introducing an automated method for detecting it that uses Sentence Bidirectional Encoder Representations from Transformers (sBERT) as the encoding mechanism. The experimental results are promising for the detection of mal-segmentation using the sBERT technique.
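The abstract does not spell out the detection procedure, so the following is only a minimal sketch of one plausible reading: encode each sentence, compare adjacent sentences by cosine similarity, and flag low-similarity pairs inside a paragraph (a possible abrupt transition) and high-similarity pairs across a paragraph break (a possible improper division). The function names, the two thresholds, and the bag-of-words `toy_encode` are all assumptions made for illustration; `toy_encode` is a stand-in that keeps the example self-contained, and an actual sBERT encoder would take its place.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List, Tuple

Vector = Dict[str, float]
Finding = Tuple[str, int, int, float]  # (kind, paragraph index, sentence index, similarity)


def toy_encode(sentence: str) -> Vector:
    """Hypothetical stand-in for an sBERT encoder: a bag-of-words count vector."""
    return dict(Counter(re.findall(r"[a-z]+", sentence.lower())))


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either vector is empty."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def flag_suspect_boundaries(
    paragraphs: List[List[str]],
    encode: Callable[[str], Vector] = toy_encode,
    low: float = 0.2,   # assumed threshold: below this, neighbours look unrelated
    high: float = 0.6,  # assumed threshold: at or above this, neighbours look related
) -> List[Finding]:
    """Flag adjacent sentence pairs whose similarity disagrees with the paragraphing."""
    findings: List[Finding] = []
    for p, sentences in enumerate(paragraphs):
        vectors = [encode(s) for s in sentences]
        # Inside a paragraph, neighbouring sentences are expected to be related.
        for i in range(len(vectors) - 1):
            sim = cosine(vectors[i], vectors[i + 1])
            if sim < low:
                findings.append(("weak_transition", p, i, sim))
        # Across a paragraph break, a strongly related pair suggests a misplaced break.
        if p + 1 < len(paragraphs) and sentences and paragraphs[p + 1]:
            sim = cosine(vectors[-1], encode(paragraphs[p + 1][0]))
            if sim >= high:
                findings.append(("suspect_break", p, len(sentences) - 1, sim))
    return findings


example = [
    [
        "Neural networks learn features from data.",
        "Deep neural networks learn hierarchical features.",
        "The weather in Cairo is hot in summer.",
    ],
    ["The summer weather in Cairo is also dry."],
]

for kind, para, sent, sim in flag_suspect_boundaries(example):
    print(f"{kind}: paragraph {para}, sentence {sent}, similarity {sim:.2f}")
```

With a real sentence encoder, only the `encode` argument would change; the thresholds here are arbitrary and would need tuning on annotated documents.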
License

This work is licensed under a Creative Commons Attribution 4.0 International License.
All articles published in Applied Computer Science are open-access and distributed under the terms of the Creative Commons Attribution 4.0 International License.
