This notebook details how to recognize a sudoku puzzle from a picture. We'll make use of simple image processing algorithms (edge detection, thresholds... ) and character recognition using the K-Nearest Neighbors (KNN) algorithm. It is a very simple but effective algorithm for solving multi-class classification problems. This puzzle matrix is a 9x9 array of known numbers 1-9 or 0s where the number is unknown
This ipython notebook is divided into two parts. The first part is related to computer vision and the second part is related to machine learning. To complete this task it is necessary to use a computer vision library, in this case we are going to use OpenCV library.
The first thing we need to do is to identify the puzzle. We have some challenges: The lines are not perfect, the black grid lines have similar color as a lot of elements in the image or the small squares are difficult to extract.
To simplify the problem we assumed: the puzzle is the biggest square in the image and the puzzle will be orientated reasonably correctly.
In the next picture you can see a sudoku puzzle in a newspaper. To load the image it's used imread.
%matplotlib inline import pylab as pl #import Opencv library try: import cv2 except ImportError: print "You must have OpenCV installed" #load image image_sudoku_original = cv2.imread('../../../data/ocr/sudoku.jpg') #Show Images _=pl.imshow(image_sudoku_original) _=pl.axis("off")
Once we have read the image we have to detect the lines. For this task, It used an adaptativeThreshold to extract the edge of the image (for each pixel in the image take the average value of the surrounding area). This function accepts a gray scale image, with the function cvtColor we change the color space (from RGB to gray scale). In this picture it's possible to see letters, lines or numbers. Light pixels are the paper and dark pixels are the ink.
The adaptativeThreshold function takes as first argument the input image, the second argument returns the binary image, the third argument is the non-zero value assigned to the pixels for which the condition is satisfied. The fourth is the adaptive thresholding algorithm, the fifth argument is the threshold type, the next argument is size of the pixel neighborhood and finally the last argument is a constant that is subtracted from the mean or weight mean.
#gray image image_sudoku_gray = cv2.cvtColor(image_sudoku_original,cv2.COLOR_BGR2GRAY) #adaptive threshold thresh = cv2.adaptiveThreshold(image_sudoku_gray,255,1,1,11,15) #show image _=pl.imshow(thresh, cmap=pl.gray()) _=pl.axis("off")
We have to find the connected contours in the image (using the function findContours). These function returns a vector with the corners of the contours. We'll find the sukodu in this contours.
The assumption is that the sudoku puzzle has 4 sides and it's convex. Checking the number of the contour is equal to four and using the funcion of OpenCV isContourConvex to check if the square is convex. We obtain the possible candidates.
We need to filter the candidates using the assumption: the sudoku puzzle is the biggest square in the image. We calculate the area of the possible contours (contourArea). The biggest square is the sudoku puzzle.
#find the countours contours0,hierarchy = cv2.findContours( thresh, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE) #size of the image (height, width) h, w = image_sudoku_original.shape[:2] #copy the original image to show the posible candidate image_sudoku_candidates = image_sudoku_original.copy() #biggest rectangle size_rectangle_max = 0; for i in range(len(contours0)): #aproximate countours to polygons approximation = cv2.approxPolyDP(contours0[i], 4, True) #has the polygon 4 sides? if(not (len (approximation)==4)): continue; #is the polygon convex ? if(not cv2.isContourConvex(approximation) ): continue; #area of the polygon size_rectangle = cv2.contourArea(approximation) #store the biggest if size_rectangle> size_rectangle_max: size_rectangle_max = size_rectangle big_rectangle = approximation
In the image below it's possible to see the 4 sides of the sudoku puzzle in red
#show the best candidate approximation = big_rectangle for i in range(len(approximation)): cv2.line(image_sudoku_candidates, (big_rectangle[(i%4)], big_rectangle[(i%4)]), (big_rectangle[((i+1)%4)], big_rectangle[((i+1)%4)]), (255, 0, 0), 2) #show image _=pl.imshow(image_sudoku_candidates, cmap=pl.gray()) _=pl.axis("off")
Now we have the sudoku puzzle segmented. We have got the corner points of the puzzle. It's currently not really usable for much. The sudoku puzzle is a bit distorted. It's necessary to correct the skewed perspective of the image. We need a way to mapping from the puzzle in the original picture back into a square. Where each corner of the sudoku puzzle corresponds to a corner on the a new image.
We use a transformation that will map one arbitrary 2D quadrilateral into another. We can use a perspective transformation:
This perspective transformation maps a point $ (x, y) $ in one quadrilateral into a new $ (X, Y) $ in another quadrilateral. These two equations contain 8 unknowns, but we have 8 values. (the corners $x$ and $y$ coordinates of the puzzle). Solving these equations gives us the $a,b,c,d,e,f,g,h$ which provide us with a mapping to get our puzzle out nice and straight.
The OpenCV function getperspectivetransform resolved the perspective transformation. Calculates a perspective transform from four pairs of the corresponding points, where the first parameter are the coordinates of quadrangle vertices in the source image and the second parameter are the coordinates of the corresponding quadrangle vertices in the destination image.
To applies a perspective transformation to an image we used the function warpPerspective. This function transforms the source image using the specified matrix. We obtains the image below.
We need to sort the corner of the sudoku puzzle and then associate each point with the new image dimension.
import numpy as np IMAGE_WIDHT = 16 IMAGE_HEIGHT = 16 SUDOKU_SIZE= 9 N_MIN_ACTVE_PIXELS = 10 #sort the corners to remap the image def getOuterPoints(rcCorners): ar = ; ar.append(rcCorners[0,0,:]); ar.append(rcCorners[1,0,:]); ar.append(rcCorners[2,0,:]); ar.append(rcCorners[3,0,:]); x_sum = sum(rcCorners[x, 0, 0] for x in range(len(rcCorners)) ) / len(rcCorners) y_sum = sum(rcCorners[x, 0, 1] for x in range(len(rcCorners)) ) / len(rcCorners) def algo(v): return (math.atan2(v - x_sum, v - y_sum) + 2 * math.pi) % 2*math.pi ar.sort(key=algo) return ( ar, ar, ar, ar)
The dataset images have 16x16 pixels. The size of the new image will be 144x144, because we have to divide each row and col by 9 and we have to return the same size of the images.
#point to remap points1 = np.array([ np.array([0.0,0.0] ,np.float32) + np.array([144,0], np.float32), np.array([0.0,0.0] ,np.float32), np.array([0.0,0.0] ,np.float32) + np.array([0.0,144], np.float32), np.array([0.0,0.0] ,np.float32) + np.array([144,144], np.float32), ],np.float32) outerPoints = getOuterPoints(approximation) points2 = np.array(outerPoints,np.float32) #Transformation matrix pers = cv2.getPerspectiveTransform(points2, points1 ); #remap the image warp = cv2.warpPerspective(image_sudoku_original, pers, (SUDOKU_SIZE*IMAGE_HEIGHT, SUDOKU_SIZE*IMAGE_WIDHT)); warp_gray = cv2.cvtColor(warp, cv2.COLOR_BGR2GRAY) #show image _=pl.imshow(warp_gray, cmap=pl.gray()) _=pl.axis("off")