Connecting the dots

On May 10th, 2019 by Frank Weilandt in Hands-on Machine Learning

We are currently working on an exciting project where the computer reads numbers in technical drawings. These numbers are used to tag the objects in the drawings. Companies often gather large amounts of similar drawings over the years, but their automatic interpretation is challenging. To simplify our presentation, we take only dots as objects.


Now we can turn this into a task that is easy for kids but more challenging for a computer: connecting numbered dots. Thick dots are printed on a sheet of paper, and each dot has a number next to it. One then draws lines from one dot to the next, in the order given by the numbers. If this description does not ring a bell, have a look at this Wikipedia entry.


In general, this can be quite a challenging task for a computer: If only a single digit is misinterpreted, sorting the dots becomes impossible. Also, a lot of these images for kids contain some printed lines which are neither dots nor numbers. In the examples here, we restricted ourselves to puzzles created using the website http://www.picturedots.com/. The algorithm assumes that we have at most 99 dots, that each number belongs only to one dot and that the numbers do not intersect in the given image file.


A lot of assumptions. But the overall strategy of separating the dots from the digits and then reading the digits using OCR still makes sense for similar tasks.

Preparations

This notebook uses Python 3. We begin by importing the necessary packages. OpenCV (version 3) is used several times throughout this notebook. The wrapper pytesseract for the OCR tool Tesseract is applied just once, in a cell further down. One could also use another tool to read each digit, for example a classifier you trained yourself and adapted to the font used for the digits. Here, luckily, no training was necessary. All files are read from or written to the directory image_dir. We use Matplotlib to visualize images if show_images is set to True.

import os
import cv2
import pytesseract

show_images = True
image_dir = 'dots'
image_file = os.path.join(image_dir, 'input.png')

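im = cv2.imread(image_file, cv2.IMREAD_COLOR)
if im is None:
    # cv2.imread returns None instead of raising if the file cannot be read
    raise FileNotFoundError(image_file)
if show_images:
    import matplotlib.pyplot as plt
    %matplotlib inline
    plt.figure(figsize=(15,10))
    # OpenCV stores images as BGR, Matplotlib expects RGB
    plt.imshow(cv2.cvtColor(im, cv2.COLOR_BGR2RGB));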

Boxing dots and numbers

The following function finds bounding boxes for all the objects in the image. Each box is classified as either the bounding box of a dot or a digit. Note that we assume that there are no other objects in the image. If there were more objects, one would need a rule to exclude them. To distinguish digits from dots, we use that a box enclosing a digit has larger height than width. This gives us two lists of boxes: rects_digits around the digits and rects_dots around the dots. The function also returns the thresholded image im_th and the contours ctrs, which can be used for further inspection of the intermediate steps.

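def find_dots_and_digits(im):
    im_gray = cv2.cvtColor(im, cv2.COLOR_BGR2GRAY)
    # Inverse threshold: dark pixels (dots and digits) become white foreground
    _, im_th = cv2.threshold(im_gray, 30, 255, cv2.THRESH_BINARY_INV)
    # findContours returns 3 values in OpenCV 3 but 2 in OpenCV 4;
    # the contours are the second-to-last entry in both cases
    ctrs = cv2.findContours(im_th, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2]
    rects = [cv2.boundingRect(ctr) for ctr in ctrs]  # (x, y, width, height)
    rects_digits = []
    rects_dots = []
    for rect in rects:
        # A digit box is clearly taller than wide; a dot box is roughly square
        if rect[3] > 1.1*rect[2]:
            rects_digits.append(rect)
        else:
            rects_dots.append(rect)
    return (im_th, ctrs, rects_digits, rects_dots)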
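
To see the height-versus-width rule in isolation, here is a minimal sketch that does not need OpenCV. The helper name split_by_aspect_ratio and the example boxes are made up for illustration; only the factor 1.1 and the (x, y, width, height) layout are taken from the function above.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height), as in cv2.boundingRect


def split_by_aspect_ratio(rects: List[Rect], ratio: float = 1.1):
    """Split boxes into (digits, dots): a digit box is clearly taller than wide."""
    digits, dots = [], []
    for rect in rects:
        _, _, w, h = rect
        (digits if h > ratio * w else dots).append(rect)
    return digits, dots


# Hypothetical boxes: one square dot (12x12) and two digits (10x16 and 9x15).
example_rects = [(100, 200, 12, 12), (114, 196, 10, 16), (125, 197, 9, 15)]
example_digits, example_dots = split_by_aspect_ratio(example_rects)
print(example_digits)
print(example_dots)
```

A rule this simple only works because dots and digits are the only objects in the image. A wide digit glyph or a slightly elongated dot would end up in the wrong list, so the factor may need tuning for other fonts.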

Using Tesseract


Now we read the digits using Tesseract OCR and write the output into the dictionary rect_to_ocr. Two Tesseract parameters are important here: the page segmentation mode (PSM) and the expected language. We set the PSM to 10, which means that Tesseract treats the input as a single character. This applies here since each box contains exactly one digit.


In our experiments, Tesseract sometimes interpreted a digit as a letter. To avoid this, one could use a character whitelist, but our version of Tesseract (4.0) does not support this feature. Hence, we experimented a bit with the character set used. Telling the software to expect digits or Hebrew letters removed the confusion, and the digits were identified correctly. If you want to experiment yourself, an explanation of the options of Tesseract can be found here.

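im_th, ctrs, rects_digits, rects_dots = find_dots_and_digits(im)
width_boxes_digits = 0  # mean width of the digit boxes
rect_to_ocr = {}

for rect in rects_digits:
    x, y, w, h = rect
    # Crop the digit with a 2-pixel margin (clamped at the image border,
    # since negative slice indices would wrap around) and invert it back
    # to dark text on a white background
    box = 255 - im_th[max(y-2, 0):y+h+2, max(x-2, 0):x+w+2]
    text = pytesseract.image_to_string(box, config='--psm 10', lang='heb')
    rect_to_ocr[rect] = int(text)
    width_boxes_digits += w/len(rects_digits)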
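
The call int(text) raises a ValueError as soon as the OCR returns a letter or nothing at all. If you would rather collect such boxes for inspection, a more forgiving parser could look like the sketch below. The helper parse_single_digit is hypothetical and not part of the notebook above; it only post-processes the string that the OCR returns.

```python
import re
from typing import Optional


def parse_single_digit(ocr_text: str) -> Optional[int]:
    """Return the digit if the OCR output contains exactly one, else None."""
    found = re.findall(r"[0-9]", ocr_text)
    if len(found) != 1:
        return None
    return int(found[0])


# Hypothetical OCR outputs: a clean read, a letter, and an empty result.
print(parse_single_digit("7\n"))
print(parse_single_digit("B"))
print(parse_single_digit(""))
```

Boxes for which this returns None could then be drawn in a separate color or passed to a second OCR attempt with different settings.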

Here we zoom into the image to visualize some of the bounding boxes found.

if show_images:
    im_boxes = im.copy()
    for rect in rects_dots:
        cv2.rectangle(im_boxes, (rect[0], rect[1]),
                      (rect[0] + rect[2], rect[1] + rect[3]), (0, 255, 0), 2)
    for rect in rects_digits:
        cv2.rectangle(im_boxes, (rect[0], rect[1]),
                      (rect[0] + rect[2], rect[1] + rect[3]), (0, 0, 255), 2)
    plt.figure(figsize=(20,10))
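    # Convert BGR to RGB so that the box colors are displayed as specified
    plt.imshow(cv2.cvtColor(im_boxes, cv2.COLOR_BGR2RGB)[400:800, 1300:1800])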


The following cell creates the dictionary numbered_dots, which maps each number to the midpoint of its dot. The dictionary is built as follows: for each dot, we look for digits that are on the right-hand side of the dot and not too far away. This strategy suffices for the kind of images we used. If there are two digits, we sort their bounding rectangles by the x coordinate and combine both digits into one number.

numbered_dots = {}
for rect_dot in rects_dots:
    # Collect the digits to the right of the dot, at roughly the same height
    rects_close_digits = []
    for rect_digit in rects_digits:
        if 0 < rect_digit[0] - rect_dot[0] < 3 * width_boxes_digits:
            if -5 < rect_dot[1] - rect_digit[1] < 2 * width_boxes_digits:
                rects_close_digits.append(rect_digit)
    # Tuples sort by their first entry (x), so the digits are read left to right
    rects_close_digits.sort()
    number = rect_to_ocr[rects_close_digits[0]]
    if len(rects_close_digits) == 2:
        number = 10 * number + rect_to_ocr[rects_close_digits[1]]
    midpoint = (rect_dot[0] + rect_dot[2]//2, rect_dot[1] + rect_dot[3]//2)
    numbered_dots[number] = midpoint
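
The cell above relies on every dot having one or two digits next to it. As a sketch of how the same proximity rule could be wrapped in a function that also copes with a dot without any digits, consider the following. The name number_for_dot and the example boxes are assumptions for illustration; the distance thresholds are copied from the cell above.

```python
from typing import Dict, Optional, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height)


def number_for_dot(rect_dot: Rect, rect_to_ocr: Dict[Rect, int],
                   mean_width: float) -> Optional[int]:
    """Combine the digits right of a dot into one number, or return None."""
    close = [
        r for r in rect_to_ocr
        if 0 < r[0] - rect_dot[0] < 3 * mean_width
        and -5 < rect_dot[1] - r[1] < 2 * mean_width
    ]
    if not close:
        return None
    number = 0
    for r in sorted(close):  # sorted by x, so digits are read left to right
        number = 10 * number + rect_to_ocr[r]
    return number


# Hypothetical boxes: the digits "1" and "2" next to a dot, and a far-away "7".
example_ocr = {(114, 196, 10, 16): 1, (125, 197, 9, 15): 2, (400, 50, 10, 16): 7}
print(number_for_dot((100, 200, 12, 12), example_ocr, mean_width=10.0))
print(number_for_dot((500, 500, 12, 12), example_ocr, mean_width=10.0))
```

Returning None instead of raising an IndexError makes it possible to report all unlabeled dots at once, which is often more helpful when tuning the thresholds for a new kind of image.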

It's an apple!


Now we can finally combine all the information and draw the connecting lines into the given image. The final image is written to image_dir.

picture = im.copy()
for n in range(1, len(numbered_dots)):
    cv2.line(picture, numbered_dots[n], numbered_dots[n+1], (255, 0, 0), 5)
cv2.imwrite(os.path.join(image_dir, 'connected_dots.png'), picture)
if show_images:
    plt.figure(figsize=(15,10))
    # Convert BGR to RGB so that the display matches the saved file
    plt.imshow(cv2.cvtColor(picture, cv2.COLOR_BGR2RGB))
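
As mentioned at the beginning, a single misread digit is enough to break the ordering: the drawing loop can then fail with a KeyError or silently connect the wrong dots. A small sanity check before drawing, sketched here with a hypothetical helper missing_numbers, lists the numbers for which no dot was found.

```python
from typing import Dict, List, Tuple

Point = Tuple[int, int]


def missing_numbers(numbered_dots: Dict[int, Point]) -> List[int]:
    """Return every number between 1 and the largest key that has no dot."""
    if not numbered_dots:
        return []
    return [n for n in range(1, max(numbered_dots) + 1) if n not in numbered_dots]


# Hypothetical result in which the dot numbered 3 was misread or not found.
example_numbered_dots = {1: (0, 0), 2: (10, 0), 4: (10, 10)}
print(missing_numbers(example_numbered_dots))
```

An empty list does not prove that every digit was read correctly, but a non-empty one is a cheap signal that the OCR step or the dot-digit assignment needs another look.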
