DIGITAL NEWS CLASSIFICATION AND PUNCTUACTION USING MACHINE LEARNING AND TEXT MINING TECHNIQUES
Issue Vol. 20 No. 2 (2024)
Fernando Andrés CEVALLOS SALAS, pp. 24-42
Abstract
The persistent growth of information in recent decades, along with the development of new information technologies for managing it, has made it essential to develop systems that can synthesize this massive volume of information, better known as big data. This article presents a feedback-based system for the large-scale processing of digital newspapers. The system synthesizes the most relevant information from news stories obtained from several sources. It is fed with information from the Internet using web scraping techniques, and all this information is stored in a data lake implemented with NoSQL databases. Next, data processing is performed, focusing on words, their relevance, and their correlation with other words from related content groups or headlines. To perform this grouping, a machine learning Large Language Model (LLM), K-Nearest Neighbors (KNN), and text mining techniques are used. New text mining algorithms are also developed to adjust thresholds during content aggregation and synthesis. Finally, the results visualization mechanism is presented, which allows users to assign a score (a "punctuation") to the news stories. This user score acts as feedback to the system and is incorporated into the global score, which is the basis for displaying the results. The system can be useful for summarizing the information contained in news stories available on the Internet, giving users a fast way to stay informed.
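As a rough illustration of two ideas named in the abstract, the sketch below groups headlines by word similarity against a threshold and blends user feedback into a global score. It is not the article's algorithm: the cosine measure over word counts, the greedy nearest-group assignment, the stopword list, the threshold of 0.3, the feedback weight of 0.5, and all function names are assumptions chosen only to make the example concrete.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"the", "a", "an", "of", "in", "on", "to", "and", "for"}


def tokenize(text):
    """Lowercase the text and keep alphabetic words that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def cosine(a, b):
    """Cosine similarity between two word-count vectors stored as Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def group_headlines(headlines, threshold=0.3):
    """Greedy grouping: attach each headline to the most similar existing
    group (compared against that group's first headline) when the
    similarity reaches the threshold; otherwise start a new group."""
    groups = []  # each entry: (representative vector, list of member indices)
    for i, headline in enumerate(headlines):
        vec = Counter(tokenize(headline))
        best, best_sim = None, 0.0
        for group in groups:
            sim = cosine(vec, group[0])
            if sim > best_sim:
                best, best_sim = group, sim
        if best is not None and best_sim >= threshold:
            best[1].append(i)
        else:
            groups.append((vec, [i]))
    return [members for _, members in groups]


def global_score(system_score, user_scores, feedback_weight=0.5):
    """Blend the system's own score with the mean user score.
    With no user feedback, the system score is returned unchanged."""
    if not user_scores:
        return system_score
    mean_user = sum(user_scores) / len(user_scores)
    return (1 - feedback_weight) * system_score + feedback_weight * mean_user


headlines = [
    "Central bank raises interest rates",
    "Interest rates raised by central bank",
    "Local team wins championship final",
]
groups = group_headlines(headlines)
blended = global_score(0.6, [1.0, 0.8])
```

Raising the threshold in this sketch produces more, smaller groups, while lowering it merges loosely related headlines; the feedback weight controls how strongly user scores shift the ranking.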
License

This work is licensed under a Creative Commons Attribution 4.0 International License.
All articles published in Applied Computer Science are open-access and distributed under the terms of the Creative Commons Attribution 4.0 International License.
