Comparative analysis of selected programs for optical text recognition
Edyta Łukasik
e.lukasik@pollub.pl
Institute of Computer Science, Lublin University of Technology, Nadbystrzycka 36B, 20-618 Lublin, Poland
Tomasz Zientarski
Institute of Computer Science, Lublin University of Technology, Nadbystrzycka 36B, 20-618 Lublin, Poland
Abstract
This article compares three programs for optical text recognition. It first defines the problem of optical text recognition and briefly describes how the technology works, then characterizes the most important programs that address the problem. The selected programs were tested on two samples of machine-written text in Polish. For each program, the speed of the recognition process and the correctness of character and word recognition in the analyzed text were determined.
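The abstract does not state how recognition correctness and speed were computed. As a minimal sketch only, assuming correctness is scored as one minus the edit distance divided by the reference length, the two measures could be obtained as follows. The function names and the stand-in `fake_ocr` recognizer are hypothetical and are not taken from the article or from any of the tested programs.

```python
import time


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or word lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def accuracy(ref, hyp):
    """Assumed metric: 1 - distance / reference length, clipped at 0."""
    if not ref:
        return 1.0 if not hyp else 0.0
    return max(0.0, 1.0 - edit_distance(ref, hyp) / len(ref))


def char_accuracy(reference, recognized):
    """Character-level correctness of the recognized text."""
    return accuracy(reference, recognized)


def word_accuracy(reference, recognized):
    """Word-level correctness, splitting both texts on whitespace."""
    return accuracy(reference.split(), recognized.split())


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def fake_ocr(image_path):
    """Hypothetical stand-in for a real OCR call; returns text with two errors."""
    return "Zazolc gesla jazn"


if __name__ == "__main__":
    reference = "Zazolc gesla jaznia"  # hypothetical ground-truth text
    recognized, seconds = timed(fake_ocr, "sample.png")
    print(f"time: {seconds:.4f} s")
    print(f"character accuracy: {char_accuracy(reference, recognized):.3f}")
    print(f"word accuracy: {word_accuracy(reference, recognized):.3f}")
```

Scoring at both levels matters for Polish text in particular, because a single misread diacritic lowers character accuracy only slightly but makes the whole word count as wrong.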
Keywords:
Optical Character Recognition; OCR; Tesseract; Ocrad; GOCR
References
[1] Bieniecki, Analiza wymagań dla metod przetwarzania wstępnego obrazów w automatycznym rozpoznawaniu tekstu, http://wbieniec.kis.p.lodz.pl/research/files/05_Bronislawow_OCR.pdf [12.11.2017].
[2] Tobias Blanke, Michael Bryant, Mark Hedges, Open source optical character recognition for historical research, Journal of Documentation 68 (2012), 659-683.
[3] Inad Aljarrah, Osama Al-Khaleel, Khaldoon Mhaidat, Mu’ath Alrefai, Abdullah Alzu’bi, Mohammad Rabab’ah, Automated System for Arabic Optical Character Recognition with Lookup Dictionary, Journal of Emerging Technologies in Web Intelligence 4 (2012), 362-370.
[4] Abbyy Technology Portal, https://abbyy.technology/en:start, [22.11.2017].
[5] The Tesseract open source OCR engine, http://code.google.com/p/tesseract-ocr [20.11.2017].
[6] https://products.aspose.com/ocr, [01.11.2017].
[7] GOCR open-source character recognition, http://jocr.sourceforge.net, [25.11.2017].
[8] www.gnu.org/software/ocrad/manual/ocrad_manual.html, [10.12.2017].
[9] Review of Linux OCR software, https://www.mathstat.dal.ca/~selinger/ocr-test [01.12.2017].
[10] Linux OCR Software Comparison, https://www.splitbrain.org/blog/2010-06/15-linux_ocr_software_comparison, [02.12.2017].
Łukasik, E., & Zientarski, T. (2018). Comparative analysis of selected programs for optical text recognition. Journal of Computer Sciences Institute, 7, 191–194. https://doi.org/10.35784/jcsi.676
License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.