Recognizing Handwritten Digits with scikit-learn

Handwritten digit recognition is an area of machine learning in which a machine is trained to identify handwritten digits. Handwritten character recognition is a practically important problem in pattern recognition. Its applications include postal mail sorting, bank check processing, and form data entry.

In this blog post, we recognize handwritten single digits (0-9) using different machine learning algorithms and then compare them to find out which one provides the best results. We use the Digits dataset, which consists of 1,797 grayscale images of 8x8 pixels each. Each image shows one handwritten digit.

So, let’s get started.

1)    First, we import the libraries used in this analysis and load the Digits dataset that ships with scikit-learn.
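
A minimal sketch of this step, assuming the dataset is loaded with scikit-learn's load_digits function and stored in a variable named digits (the name the later steps refer to):

```python
# Import the dataset module from scikit-learn.
from sklearn import datasets

# Load the Digits dataset bundled with scikit-learn.
digits = datasets.load_digits()

# The returned object exposes the images, the flattened data and the labels.
print(digits.keys())
```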



2)    Now that the dataset is loaded, we can check its shape.
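
One way to check the shape, sketched here under the assumption that the dataset was loaded with load_digits:

```python
from sklearn import datasets

digits = datasets.load_digits()

# digits.data holds one flattened row of 64 pixel values per image.
print(digits.data.shape)    # (1797, 64)

# digits.images holds the same pixels arranged as 8x8 matrices.
print(digits.images.shape)  # (1797, 8, 8)
```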

3)    The images of the handwritten digits are stored in the digits.images array. Each element of this array is an image represented by an 8x8 matrix of numerical values from 0 to 16.
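
As an illustration, the first image and the range of pixel values can be inspected like this (index 0 is an arbitrary choice):

```python
from sklearn import datasets

digits = datasets.load_digits()

# Print the first image as an 8x8 matrix of pixel intensities.
first_image = digits.images[0]
print(first_image)

# Smallest and largest intensity found anywhere in the dataset.
print(digits.images.min(), digits.images.max())
```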


4)    Next, we display one of the digits as a grayscale image.
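
A possible way to draw the image with matplotlib is sketched below. The choice of the first image and of the reversed gray colormap are assumptions, not necessarily what the original post used:

```python
import matplotlib.pyplot as plt
from sklearn import datasets

digits = datasets.load_digits()

# Draw the first image; the reversed gray colormap shows dark ink on a light background.
fig, ax = plt.subplots(figsize=(3, 3))
ax.imshow(digits.images[0], cmap=plt.cm.gray_r, interpolation="nearest")
ax.set_axis_off()
plt.show()
```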

5)    The target values are stored in the digits.target array, which contains 1,797 labels, one per image. We can verify this by checking the size of the array.
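
The check can be sketched as follows:

```python
from sklearn import datasets

digits = datasets.load_digits()

# The label (0-9) of every image, in the same order as digits.images.
print(digits.target)

# Number of labels.
print(digits.target.size)  # 1797
```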

6)    Using the matplotlib library, we can visualize the images and labels in our dataset. The first 1,791 elements of the dataset are used as the training set and the remaining six as the validation set.
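
The sketch below plots a few images with their labels and then performs the split described above. The number of images shown (eight) and the variable names X_train, X_val, y_train and y_val are illustrative choices:

```python
import matplotlib.pyplot as plt
from sklearn import datasets

digits = datasets.load_digits()

# Show the first eight images with their labels as titles.
fig, axes = plt.subplots(2, 4, figsize=(8, 4))
for ax, image, label in zip(axes.ravel(), digits.images, digits.target):
    ax.imshow(image, cmap=plt.cm.gray_r, interpolation="nearest")
    ax.set_title(f"Digit: {label}")
    ax.set_axis_off()
plt.show()

# First 1,791 samples for training, the last six for validation.
X_train, y_train = digits.data[:1791], digits.target[:1791]
X_val, y_val = digits.data[1791:], digits.target[1791:]
print(X_train.shape, X_val.shape)  # (1791, 64) (6, 64)
```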



7)    Now we train the svc estimator, a support vector classifier, on the training set and then test it on the validation set.
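
A minimal sketch of training and testing, assuming svc is an instance of scikit-learn's SVC class. The hyperparameters gamma=0.001 and C=100 are assumed values, since the post does not state the ones it used:

```python
from sklearn import datasets, svm

digits = datasets.load_digits()

# Assumed hyperparameters; the post does not state the ones it used.
svc = svm.SVC(gamma=0.001, C=100.0)

# Train on the first 1,791 samples.
svc.fit(digits.data[:1791], digits.target[:1791])

# Predict the six held-out digits and compare them with the true labels.
predicted = svc.predict(digits.data[1791:])
print("predicted:", predicted)
print("actual:   ", digits.target[1791:])
```

Comparing the two printed arrays element by element shows how many of the six validation digits were recognized.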



We can see that the svc estimator has learned correctly, as it recognized all six digits of the validation set.

 

8)    Now let us look at the scikit-learn 4-step modeling pattern and use logistic regression to recognize the digits.

First, the dataset is split into a training set and a test set.
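
The split can be sketched with scikit-learn's train_test_split. The test_size and random_state values below are assumptions, since the post does not give the ones it used:

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()

# test_size and random_state are assumed values, not taken from the post.
x_train, x_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0
)
print(x_train.shape, x_test.shape)
```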

 

Step-1: Import the model to be used.

Step-2: Make an instance of the model.

Step-3: Train the model.

Step-4: Predict the labels of new data and measure the performance of our model.
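
The four steps above can be sketched as follows. The split parameters and max_iter are assumed values, so the accuracy printed here may differ from the figure reported in this post:

```python
# Step 1: import the model to be used.
from sklearn.linear_model import LogisticRegression
from sklearn import datasets
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()
x_train, x_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0
)

# Step 2: make an instance of the model.
# max_iter is raised (an assumption) to give the solver room to converge on unscaled pixels.
logistic_regr = LogisticRegression(max_iter=5000)

# Step 3: train the model.
logistic_regr.fit(x_train, y_train)

# Step 4: predict the labels of new data and measure the performance.
predictions = logistic_regr.predict(x_test)
score = logistic_regr.score(x_test, y_test)
print(f"Accuracy: {score:.4f}")
```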

9)    Now we use Seaborn to plot a confusion matrix. A confusion matrix is a table that counts, for each actual class, how often the model predicted each class, so it shows both the overall accuracy of a classification model and which classes it confuses.
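
A hedged sketch of this step: the matrix is computed with scikit-learn's confusion_matrix and drawn with Seaborn's heatmap. The model setup repeats the assumed parameters from the logistic regression step, and the colormap and axis labels are illustrative choices:

```python
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()
x_train, x_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0
)
model = LogisticRegression(max_iter=5000).fit(x_train, y_train)
predictions = model.predict(x_test)

# Rows are the actual digits, columns are the predicted digits.
cm = confusion_matrix(y_test, predictions)

# Annotate every cell with its count; the diagonal holds the correct predictions.
fig, ax = plt.subplots(figsize=(8, 8))
sns.heatmap(cm, annot=True, fmt="d", square=True, cmap="Blues", ax=ax)
ax.set_xlabel("Predicted label")
ax.set_ylabel("Actual label")
plt.show()
```

Large counts off the diagonal point to pairs of digits that the model tends to mix up.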


CONCLUSION:

We can see that our model recognizes the handwritten digits with an accuracy of 95.11%.
