Hello again; I’m back - once again sacrificing my time for homework so I can publish something that I find more interesting.

So if anyone is still reading this: the whole article is about OCR (which stands for Optical Character Recognition) by using a method called Naive Bayes (aka the probabilistic approach). Considering the amount of information about this subject I thought some code/example might come in handy.

// ps: if you’re still wondering, it’s the same Bayes from the math class.

About Naive Bayes

This is a method that is used to solve classification problems. Basically OCR can be classified as a classification problem (see what I did there? :D). Here I’m going to explain this in a generic manner and then add the OCR part.

The idea is this: given an input, the program should be able to match it with a known output by taking into account various properties and comparing them to the ones that we used for training.

Naive Bayes implies a probabilistic approach; so it outputs a bunch of probabilities that our object is, in fact, an object that it already knows.

It is all based on Bayes’ theorem which looks like this:

\[P(A|B,C) = \frac{P(A,B,C)}{P(B,C)} = \frac{P(A,B,C)}{P(A)} \cdot \frac{P(A)}{P(B,C)} = \frac{P(A)\cdot P(B,C|A)}{P(B,C)}\]

However, this theorem is supposed to work with independent variables, and we can’t always guarantee that. So, it is incorrect to expand the denominator like this:

\[P(B,C) = P(B) \cdot P(C)\]

You can try it but you may get probabilities higher than 1 for some cases.

To avoid this, it might be better to think this in terms of likelihoods and to use this property:

\[P(A|B,C) \propto P(A) \cdot P(B|A) \cdot P(C|A)\]

The equation above says that the likelihood is directly proportional to the probability itself, so we can consider only the likelihood when we do the classification (highest likelihood = highest chance).

You can later convert the likelihoods to probabilities between 0 and 1 - I’ll explain below.

Implementing a basic OCR

Moving on, it should be clear by now how our program should look:

  • take a set of training images
  • train using each image by taking the color of each pixel (where white color = ~P and black = P)
  • for each unknown image: using its pixels compute the likelihood it resembles an image that was used during the training procedure

Something like this:

Naive Bayes model for Basic OCR

Naive Bayes model for Basic OCR

Where P(one|…) is the probability the image matches the one I used to train the classifier.

Computing the likelihoods

First, there’s going to be one big table containing both the data from the images and some additional sums that we’ll need in order to compute some probabilities - usually it’s called Frequency Table.

The table looks like this:

onetwothree...
P10 or 10 or 10 or 1...
P20 or 10 or 10 or 1...
P30 or 10 or 10 or 1...
...............
sums_colssum_col#1sum_col#2sum_col#3...


Where table[P1][one] is:

  • 1 if the training image containing an 1 has the first pixel(P1) colored black (so the pixel contributes to drawing the digit)
  • 0 if the pixel is white

sum_col#1 means the sum of the first (current) column.

We need these in order to compute some probabilities while applying Bayes’ theorem.

Once all the sums are computed we can start determining likelihoods.

Now, we implement the Bayes’ theorem and apply the data from the table.


Intermission.

The formula, for an image with 3 pixels, should look like this:

\[P(one|P1, P2, \sim P3) = P(one) \cdot P(P1|one) \cdot P(P2 | one) \cdot (1 - P(P3|one))\]

Where:

P(one) = 1 / number_of_digits_we_are_training_for

P(P1|one) = table[P1][one] / sum_col#1

P(~P3|one) = 1 - table[P3][one] / sum_col#1

^ ehm…these are actually likelihoods, not probabilities. Don’t get confused.


Now the real deal - applying this for N_OF_IMAGES images, each one containing IMG_SIZE * IMG_SIZE pixels.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
// for each known image
for (int k = 0; k < N_OF_IMAGES; k++)
{
    // determining P(one) / P(two) / P(three) 
    likelihoods[k] = 1 / (double)N_OF_IMAGES;

    int crtPixel = 0;

    for (int i = 0; i < IMG_SIZE; i++)
        for (int j = 0; j < IMG_SIZE; j++)
        {
            // getting the current pixel's position in the table
            crtPixel = i * IMG_SIZE + j;
            
            // applying the formula
            likelihoods[k] *= (testBmp.GetPixel(i, j).ToArgb() != Color.White.ToArgb() ? frequencies[crtPixel, k] / frequencies[N_OF_PIXELS, k] : (1 - frequencies[crtPixel, k] / frequencies[N_OF_PIXELS, k]));
        }

    // I use this to convert from likelihoods to probabilities (0-1)
    totalLikelihood += likelihoods[k];
}

Short note:

You can get the probability for each image by doing: likelihoods[k] / totalLikelihood.


Laplace smoothing

Given the fact that the frequency table may contain 0’s (not all pixels are black), we need to apply some smoothing to prevent working with null probabilities in our formula.

We add some “fake” inputs in the frequency table - kind of like a black background image for each digit. If it makes it easier think of it as adding 1 to the numerator and n to the denominator (n = number of pixels) when computing probabilities.

So…if in our frequency table there’s something like: table[P1][one] == 0 and sum_col#1 == 10, then we’d get P(P1|one) = 0/10 = 0

Now, with Laplace smoothing: table[P1][one] == 1 and sum_col#1 == 10 + n (let’s say n = 100 pixels, because the images I’m using are 10x10). Notice that P(P1|one) = 1/(10+100) = 0.009.

Results

These were the inputs

Images used for Training and Testing the Naive Bayes model

Images used for Training and Testing the Naive Bayes model

And the outputs:

The results achieved by the current implementation

The results achieved by the current implementation

Complete Sourcecode

The sourcecode that I used in this article; I guess there’s no need for additional documentation.

Final note: to me it seems to be working, but it’s the first time I’m coding stuff like this, so… it might contain mistakes.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
using System;
using System.Drawing;
using System.IO;

namespace Bayes_Classifier
{
    class Program
    {
        static void Main(string[] args)
        {
            string[] files = Directory.GetFiles(AppDomain.CurrentDomain.BaseDirectory + "/train/", "*.png");

            int N_OF_IMAGES = files.Length;
            int IMG_SIZE = 10; // 10x10 image
            int N_OF_PIXELS = IMG_SIZE * IMG_SIZE;

            double[,] table = new double[N_OF_PIXELS+1, N_OF_IMAGES];

            // Laplace smoothing
            for (int i = 0; i < N_OF_IMAGES; i++)
                for (int j = 0; j < N_OF_PIXELS; j++)
                    table[j, i] = 1;

            Bitmap bmp = null;

            // copying data in the frequency table
            for (int i = 0; i < N_OF_IMAGES; i++)
            {
                bmp = (Bitmap)Image.FromFile(files[i]);

                for (int j = 0; j < IMG_SIZE; j++)
                {
                    for (int k = 0; k < IMG_SIZE; k++)
                        if (bmp.GetPixel(j, k).ToArgb() != Color.White.ToArgb())
                            table[j * IMG_SIZE + k, i]++;
                }
            }

            // -- computing the sums
            for (int i = 0; i < N_OF_PIXELS; i++)
            {
                for (int j = 0; j < N_OF_IMAGES; j++)
                    table[N_OF_PIXELS, j] += table[i, j];
            }


            string[] testFiles = Directory.GetFiles(AppDomain.CurrentDomain.BaseDirectory + "/test/", "*.png");

            for (int testFileIndex = 0; testFileIndex < testFiles.Length; testFileIndex++)
            {
                Console.WriteLine("File: " + testFiles[testFileIndex]);
                Bitmap testBmp = (Bitmap)Image.FromFile(testFiles[testFileIndex]);

                double[] likelihoods = new double[N_OF_IMAGES];
                double totalLikelihood = 0;

                for (int k = 0; k < N_OF_IMAGES; k++)
                {
                    likelihoods[k] = 1 / (double)N_OF_IMAGES;

                    int crtPixel = 0;

                    for (int i = 0; i < IMG_SIZE; i++)
                        for (int j = 0; j < IMG_SIZE; j++)
                        {
                            crtPixel = i * IMG_SIZE + j;
                            likelihoods[k] *= (testBmp.GetPixel(i, j).ToArgb() != Color.White.ToArgb() ? table[crtPixel, k] / table[N_OF_PIXELS, k] : (1 - table[crtPixel, k] / table[N_OF_PIXELS, k]));
                        }

                    totalLikelihood += likelihoods[k];
                }


                for (int i = 0; i < N_OF_IMAGES; i++)
                    Console.WriteLine((i + 1) + " : " + likelihoods[i] / totalLikelihood);

                Console.WriteLine();
            }
            Console.ReadLine();

        }
    }
}