Comparative analysis of selected programs for optical text recognition


Abstract

This article compares three programs for optical text recognition. The optical text recognition problem is defined, and the functionality of the technology is briefly described. The most important programs that address this problem are also characterized. The selected programs were tested on two samples of machine-written text in Polish. The speed of the recognition process was measured, and the accuracy of character and word recognition in the analyzed text was determined.


Keywords

Optical Character Recognition; OCR; Tesseract; Ocrad; GOCR


Published: 2018-09-30


Łukasik, E., & Zientarski, T. (2018). Comparative analysis of selected programs for optical text recognition. Journal of Computer Sciences Institute, 7, 191-194. https://doi.org/10.35784/jcsi.676

Edyta Łukasik, e.lukasik@pollub.pl
Institute of Computer Science, Lublin University of Technology, Nadbystrzycka 36B, 20-618 Lublin, Poland
Tomasz Zientarski
Institute of Computer Science, Lublin University of Technology, Nadbystrzycka 36B, 20-618 Lublin, Poland