© unsplash@thisisengineering

Manufacturing & Automotive

Information extraction from technical manuals


Technical manuals provide the basis for technicians to maintain production plants, cars or virtually any type of machinery. For engineers and mechanics, it is often difficult to find the relevant information for the task at hand, as most manuals and technical drawings are either paper-based or PDF files. Even when the files are available as PDF, finding relevant information remains a challenge: manuals often run to tens or hundreds of pages, and searching within documents is keyword-based, without any semantic understanding.
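
The limitation of keyword-based search can be illustrated with a minimal sketch. The page texts and the `keyword_search` helper below are hypothetical and only stand in for the search function of a typical PDF viewer; they are not taken from any real manual or tool.

```python
# Toy "manual": page number -> page text (hypothetical content, for illustration only).
pages = {
    12: "Tighten the cylinder head bolts to the specified torque.",
    47: "Replace the coolant every two years.",
}


def keyword_search(query: str, pages: dict[int, str]) -> list[int]:
    """Return the page numbers whose text contains every query word (case-insensitive)."""
    words = query.lower().split()
    return [
        number
        for number, text in pages.items()
        if all(word in text.lower() for word in words)
    ]


# A literal match finds the page.
print(keyword_search("torque", pages))            # [12]
# The same information need, phrased differently, finds nothing.
print(keyword_search("tightening force", pages))  # []
```

The second query describes the same need as the first, but because none of its words appear literally on the page, a purely keyword-based search returns no result.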


Most mechanical work involves machines from different manufacturers. As there is no uniform format or terminology across manufacturers, it is often challenging to find the same type of information for models from different manufacturers. Moreover, even for a single manufacturer, format and terminology might change over time, especially if the machinery is long-lived.

Another challenge is that content within manuals comes in very different formats such as tabular data, free text, bulleted lists or drawings. Since the information mechanics are looking for is often found in a combination of these formats, a solution based solely on NLP algorithms might not deliver satisfactory results. Moreover, training data might be scarce, as some parts might occur in only a small fraction of models.

Potential solution approaches

If the focus lies on the NLP part of the project, one option is to train a model on top of generic contextualized word embeddings such as BERT. Compared to traditional word embeddings such as Word2Vec and fastText, BERT has the advantage of capturing synonyms and context within free text and strings, and it is state-of-the-art in modern NLP applications. However, it has been shown that for very specific domains such as engineering, domain-specific word embeddings can improve model performance.

A (domain-specific) BERT model can then be trained to perform question answering, a technique that lets mechanics ask natural-language questions and receive highlighted areas of the manual relevant to the task at hand.
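
How such a model arrives at a highlighted area can be sketched as follows. Extractive question-answering models based on BERT typically score every token of a passage as a possible answer start and as a possible answer end, and the highlighted span is the one with the best combined score. The sketch below covers only this final selection step. The tokens, the scores and the `best_answer_span` helper are invented for illustration; in a real system the scores would come from the trained model.

```python
def best_answer_span(start_scores, end_scores, max_len=5):
    """Return (start, end, score) of the highest-scoring span with start <= end.

    The score of a span is the start score of its first token plus the
    end score of its last token; spans longer than max_len tokens are skipped.
    """
    best = None
    for i, start_score in enumerate(start_scores):
        for j in range(i, min(i + max_len, len(end_scores))):
            score = start_score + end_scores[j]
            if best is None or score > best[2]:
                best = (i, j, score)
    return best


# Hypothetical passage and scores (assumed values, not real model output).
tokens = ["tighten", "the", "bolts", "to", "25", "Nm", "in", "two", "steps"]
start_scores = [0.1, 0.0, 0.2, 0.3, 4.0, 0.5, 0.1, 0.2, 0.0]
end_scores = [0.0, 0.1, 0.3, 0.1, 1.0, 3.5, 0.2, 0.1, 0.4]

start, end, _ = best_answer_span(start_scores, end_scores)
print(" ".join(tokens[start:end + 1]))  # 25 Nm
```

Restricting the search to spans with `start <= end` and a bounded length keeps the highlighted area short and well-formed, which matters when the answer is meant to point a mechanic to a specific place on a page.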

Related Case Studies

Natural Language Processing

Legal review of rental contracts

Different methods from the field of NLP helped us to create software that spots errors in legal contracts.

Related webinars

Automated answering of questions with neural networks: BERT

In this webinar we will present a method based on the BERT model for automated answering of questions.

Mattes Mollenhauer

Machine Learning Scientist

Labeling Tools - The second step on the way to the successful implementation of an NLP project

The success of an NLP project depends on a series of steps, from data preparation to modeling and deployment. Since the input data are often scanned documents, the data preparation step initially involves the use of text recognition tools (OCR for short) and later on also the use of so-called labeling tools. In this webinar we will deal with the topic of selecting a suitable labeling tool.

Ewelina Fiebig

Machine Learning Scientist

Fabian Gringel

Machine Learning Scientist

Semantic search and understanding of natural text with neural networks: BERT

In this webinar you will get an introduction to the application of BERT for Semantic Search using a real case study: Every year millions of citizens interact with public authorities and are regularly overwhelmed by the technical language used there. We have successfully used BERT to deliver the right answer from government documents with the help of colloquial queries - without having to use technical terms in the queries.

Konrad Schultka

Machine Learning Scientist

Jona Welsch

Machine Learning Scientist

Recurrent neural networks: How computers learn to read

The webinar will give an introduction to the functioning of RNNs and illustrate their use in an example project from the field of legal tech.

Fabian Gringel

Machine Learning Scientist