Applying OCR to technical drawings

Frank Weilandt (PhD)

On May 10th, 2019 in Hands-on Machine Learning

We are currently working on an exciting project in which a computer reads the numbers and letters in technical drawings. These numbers and letters tag the objects in the drawings. Here are some examples of this type of technical drawing:


Companies often gather large amounts of similar drawings over the years, but their automatic interpretation is challenging. For humans it might be obvious which object or part of the drawing a number refers to, but how can we teach a computer to make out the connections? Clearly, this calls for computer vision, and OCR (optical character recognition) in particular.


To simplify the presentation, we consider only dots as objects. This turns the problem into a task that is easy for kids but harder for a computer: connecting numbered dots. Thick dots are printed on a sheet of paper, and each dot has a number next to it. One then draws lines from one dot to the next, in the order given by the numbers. If this description does not ring a bell, have a look at this Wikipedia entry.


In general, this can be quite a challenging task for a computer: if a single digit is misread, the dots can no longer be sorted correctly. Also, many of these images for kids contain printed lines which are neither dots nor numbers. In the examples here, we restricted ourselves to puzzles created using the website http://www.picturedots.com/. The algorithm assumes that there are at most 99 dots, that each number belongs to exactly one dot and that the numbers do not intersect in the given image file.


A lot of assumptions. But the overall strategy of separating the dots from the digits and then reading the digits using OCR still makes sense for similar tasks.

Preparations

This notebook uses Python 3. We begin by importing the necessary packages. We use the computer vision library OpenCV (version 3) several times throughout this notebook. The Python wrapper pytesseract for Google's Tesseract OCR engine is applied just once, in a cell further down. One could also use another tool to read each digit, for example a classifier you trained yourself and adapted to the font used for the digits. Here, luckily, no training was necessary. All files are read from or written to the directory image_dir. We use Matplotlib to visualize images if show_images is set to True.

import os
import cv2
import pytesseract

show_images = True
image_dir = 'dots'
image_file = os.path.join(image_dir, 'input.png')

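im = cv2.imread(image_file, cv2.IMREAD_COLOR)
if im is None:
    # cv2.imread returns None instead of raising if the file cannot be read
    raise FileNotFoundError(image_file)
if show_images:
    import matplotlib.pyplot as plt
    %matplotlib inline
    plt.figure(figsize=(15,10))
    # OpenCV stores images as BGR, Matplotlib expects RGB
    plt.imshow(cv2.cvtColor(im, cv2.COLOR_BGR2RGB));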

Boxing dots and numbers

The following function finds bounding boxes for all the objects in the image. Each box is classified as the bounding box of either a dot or a digit. Note that we assume that there are no other objects in the image. If there were more objects, one would need a rule to exclude them. To distinguish digits from dots, we use the fact that a box enclosing a digit is taller than it is wide (the code requires the height to exceed 1.1 times the width). This gives us two lists of boxes: rects_digits around the digits and rects_dots around the dots. The function also returns the thresholded image im_th and the contours ctrs, which can be used for further inspection of the intermediate steps.

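def find_dots_and_digits(im):
    im_gray = cv2.cvtColor(im, cv2.COLOR_BGR2GRAY)
    # Inverse binary threshold: pixels with a gray value of at most 30 become 255
    ret, im_th = cv2.threshold(im_gray, 30, 255, cv2.THRESH_BINARY_INV)
    # OpenCV 3 returns (image, contours, hierarchy), OpenCV 4 returns
    # (contours, hierarchy); the contours are second to last in both cases
    ctrs = cv2.findContours(im_th, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2]
    # Each rect is a tuple (x, y, width, height)
    rects = [cv2.boundingRect(ctr) for ctr in ctrs]
    rects_digits = []
    rects_dots = []
    for rect in rects:
        # Boxes that are clearly taller than wide are treated as digits
        if rect[3] > 1.1*rect[2]:
            rects_digits.append(rect)
        else:
            rects_dots.append(rect)
    return (im_th, ctrs, rects_digits, rects_dots)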
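
The aspect-ratio rule can be tried out in isolation, without any image. The following is a minimal sketch and not part of the original notebook: the helper name split_by_aspect_ratio and the sample boxes are made up for illustration, and the boxes use the (x, y, width, height) layout returned by cv2.boundingRect.

```python
def split_by_aspect_ratio(rects, ratio=1.1):
    """Split (x, y, w, h) boxes into tall ones (digits) and the rest (dots).

    Hypothetical helper mirroring the rule in find_dots_and_digits:
    a box counts as a digit if its height exceeds ratio * width.
    """
    digits, dots = [], []
    for rect in rects:
        _, _, w, h = rect
        if h > ratio * w:
            digits.append(rect)
        else:
            dots.append(rect)
    return digits, dots


# Made-up sample boxes: one square dot and two tall digits
sample_rects = [(100, 100, 12, 12), (120, 95, 10, 18), (132, 95, 11, 18)]
sample_digits, sample_dots = split_by_aspect_ratio(sample_rects)
print(sample_digits)  # [(120, 95, 10, 18), (132, 95, 11, 18)]
print(sample_dots)    # [(100, 100, 12, 12)]
```

The rule depends on the font: a digit whose bounding box is nearly square would end up in the list of dots.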

Using Tesseract

Now we read the digits using Tesseract OCR and write the OCR output into the dictionary rect_to_ocr. There are two important parameters for Tesseract here: the page segmentation mode (PSM) and the expected language. We set the PSM to 10, which means that Tesseract treats the input as a single character. This applies here since each box contains exactly one digit.


In our experiments, Tesseract sometimes interpreted a digit as a letter. A character whitelist would avoid this, but our version of Tesseract 4.0 does not support that feature. Hence, we experimented a bit with the expected language. Telling the software to expect digits or Hebrew letters (lang='heb') removed the confusion, and the digits were identified correctly. If you want to experiment yourself, an explanation of the options of Tesseract can be found here.

im_th, ctrs, rects_digits, rects_dots = find_dots_and_digits(im)
width_boxes_digits = 0  # average width of the digit boxes
rect_to_ocr = {}

for rect in rects_digits:
    x, y, w, h = rect
    # Cut out the digit with a 2-pixel margin (clamped at the image border)
    # and invert it back to dark text on a white background
    box = 255 - im_th[max(y-2, 0):y+h+2, max(x-2, 0):x+w+2]
    text = pytesseract.image_to_string(box, config='--psm 10', lang='heb')
    rect_to_ocr[rect] = int(text)
    width_boxes_digits += w/len(rects_digits)
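
The call int(text) above raises a ValueError as soon as Tesseract returns a letter or an empty string. A minimal sketch of a more defensive parsing step follows. The names parse_single_digit and WHITELIST_CONFIG are hypothetical and not part of the original notebook. tessedit_char_whitelist is a Tesseract configuration variable, but whether it takes effect depends on the Tesseract version and engine in use, so treat that line as an assumption to verify locally.

```python
def parse_single_digit(text):
    """Return the digit in an OCR result as int, or None if there is none.

    Hypothetical helper: it strips whitespace and accepts the result
    only if exactly one character from 0-9 remains.
    """
    cleaned = text.strip()
    if len(cleaned) == 1 and cleaned in "0123456789":
        return int(cleaned)
    return None


# Assumption: on Tesseract builds that support whitelists, this string could
# be passed as pytesseract.image_to_string(box, config=WHITELIST_CONFIG)
WHITELIST_CONFIG = "--psm 10 -c tessedit_char_whitelist=0123456789"

print(parse_single_digit("7\n"))  # 7
print(parse_single_digit("S"))    # None
print(parse_single_digit(""))     # None
```

Boxes for which the helper returns None could then be logged and inspected by hand instead of stopping the whole run.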

Here we zoom into the image to visualize some of the bounding boxes found.

if show_images:
    im_boxes = im.copy()
    for rect in rects_dots:
        cv2.rectangle(im_boxes, (rect[0], rect[1]),
                      (rect[0] + rect[2], rect[1] + rect[3]), (0, 255, 0), 2)
    for rect in rects_digits:
        cv2.rectangle(im_boxes, (rect[0], rect[1]),
                      (rect[0] + rect[2], rect[1] + rect[3]), (0, 0, 255), 2)
    plt.figure(figsize=(20,10))
    # Convert from OpenCV's BGR to Matplotlib's RGB channel order
    plt.imshow(cv2.cvtColor(im_boxes[400:800, 1300:1800], cv2.COLOR_BGR2RGB))


The following cell creates the dictionary numbered_dots, which maps each number to the midpoint of the dot it belongs to. The dictionary is built as follows: for each dot, we look for digits which are on the right-hand side of the dot and not too far away. This strategy suffices for the kind of images we used. If there are two digits, we sort their bounding rectangles by the x coordinate and combine both digits into one number.

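numbered_dots = {}
for rect_dot in rects_dots:
    rects_close_digits = []
    for rect_digit in rects_digits:
        # Horizontal test: the digit starts to the right of the dot's left
        # edge, within three average digit widths
        if 0 < rect_digit[0] - rect_dot[0] < 3 * width_boxes_digits:
            # Vertical test: the digit's top edge lies less than 5 pixels below
            # and less than two digit widths above the dot's top edge
            if -5 < rect_dot[1] - rect_digit[1] < 2 * width_boxes_digits:
                rects_close_digits.append(rect_digit)
    # Tuples sort by their first entry (x), i.e. from left to right
    rects_close_digits.sort()
    number = rect_to_ocr[rects_close_digits[0]]
    if len(rects_close_digits) == 2:
        # Two digits: the left one is the tens digit
        number = 10 * number + rect_to_ocr[rects_close_digits[1]]
    midpoint = (rect_dot[0] + rect_dot[2]//2, rect_dot[1] + rect_dot[3]//2)
    numbered_dots[number] = midpoint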
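
To see the matching rule at work on a small example, here is a minimal sketch with made-up boxes. The helper read_number is hypothetical and not part of the original notebook. It applies the same two proximity tests as the cell above, but returns None instead of raising when no digit is close to the dot, and it combines all nearby digits from left to right.

```python
def read_number(rect_dot, rects_digits, rect_to_ocr, avg_width):
    """Return the number printed to the right of a dot, or None.

    Hypothetical helper using the same proximity tests as the notebook cell.
    All boxes are (x, y, w, h) tuples.
    """
    close = sorted(
        rect for rect in rects_digits
        if 0 < rect[0] - rect_dot[0] < 3 * avg_width
        and -5 < rect_dot[1] - rect[1] < 2 * avg_width
    )
    if not close:
        return None
    number = 0
    for rect in close:  # sorted by x, so digits are read left to right
        number = 10 * number + rect_to_ocr[rect]
    return number


# Made-up layout: a dot at x=100 followed by the digits "1" and "2",
# plus an unrelated digit far away
dot = (100, 100, 12, 12)
digits = {(116, 96, 10, 18): 1, (128, 96, 10, 18): 2, (400, 300, 10, 18): 7}
print(read_number(dot, list(digits), digits, avg_width=10))  # 12
```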

It's an apple!

Now we can finally combine all the information and draw the connecting lines into the given image. The final image is written to image_dir.

picture = im.copy()
for n in range(1, len(numbered_dots)):
    cv2.line(picture, numbered_dots[n], numbered_dots[n+1], (255, 0, 0), 5)
cv2.imwrite(os.path.join(image_dir, 'connected_dots.png'), picture)
if show_images:
    plt.figure(figsize=(15,10))
    # Convert from OpenCV's BGR to Matplotlib's RGB channel order
    plt.imshow(cv2.cvtColor(picture, cv2.COLOR_BGR2RGB))
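
Since a single misread digit breaks the ordering, it can help to check the dictionary before drawing. The following is a minimal sketch under the assumption that numbered_dots maps positive integers to (x, y) midpoints, as in the cell above. The helpers missing_numbers and path_segments are hypothetical and not part of the original notebook.

```python
def missing_numbers(numbered_dots):
    """Return the numbers absent from 1..max(numbered_dots), sorted."""
    if not numbered_dots:
        return []
    expected = range(1, max(numbered_dots) + 1)
    return [n for n in expected if n not in numbered_dots]


def path_segments(numbered_dots):
    """Return the line segments (start, end) in drawing order.

    Raises ValueError if a number is missing, since the drawing loop
    relies on every number from 1 to N being present.
    """
    missing = missing_numbers(numbered_dots)
    if missing:
        raise ValueError(f"no dot found for numbers {missing}")
    return [(numbered_dots[n], numbered_dots[n + 1])
            for n in range(1, len(numbered_dots))]


# Made-up midpoints for illustration
dots = {1: (10, 10), 2: (50, 10), 3: (50, 40)}
print(path_segments(dots))  # [((10, 10), (50, 10)), ((50, 10), (50, 40))]
print(missing_numbers({1: (10, 10), 3: (50, 40)}))  # [2]
```

Each returned segment could be passed to cv2.line exactly as in the loop above. Note that this check cannot detect a misread of the highest number.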
