  • Build Tesseract 5 in Conda Environment

    Here’s a short guide to building Tesseract 5 from source (master branch on GitHub). I’m writing this mainly because, at the moment, conda only offers Tesseract packages up to version 4.1.1. The other reason is...


  • Improving Tesseract 4's OCR Accuracy through Image Preprocessing

    In this work, I examine how well Tesseract 4 recognizes characters from a challenging dataset and propose a minimal convolution-based preprocessing step for input images that boosts character-level accuracy from 13.4% to 61.6% (+359% relative...


  • Evaluating the Robustness of OCR Systems

    In this article, I discuss my Bachelor’s degree final project, which evaluates the robustness of OCR systems (such as Tesseract or Google’s Cloud Vision) when adversarial samples are presented as inputs. It’s somewhere in between fuzzing...