© unsplash/@nci

Healthcare & Pharma

Information extraction for electronic health records

Context

The digitization of the healthcare sector is crucial for speeding up time-sensitive medical decisions and for relieving medical personnel of documentation duties. However, many patient records are still handwritten and non-standardized, which motivates a machine learning model capable of automatically producing digitized, standardized patient records.

Challenges

For handwritten records, a major challenge lies in digitizing the documents and choosing the right OCR tool to obtain accurate digital representations of the health records. Handwriting remains difficult for OCR because handwriting styles vary so widely. Low-quality OCR output would therefore undermine all further processing.

If the record is digital, doctors enter much of the information in unstructured formats such as free text. Moreover, abbreviations are very common but not used consistently across physicians.

To analyze and visualize the information about cases, these free text entries need to be converted to structured data.

Potential solution approaches

For text extraction from handwritten documents, the Google Cloud Vision OCR tool is currently the only viable option. If the OCR output is of sufficient quality, the recognized text can be analyzed in the same way as the digital free text entries.
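What counts as "sufficient quality" is not defined here. One possible gate, sketched below, assumes the OCR tool reports a confidence value between 0 and 1 for each recognized word. The function name `ocr_quality_ok`, the input format and all thresholds are hypothetical and would need tuning on real records.

```python
from statistics import mean

def ocr_quality_ok(words, min_mean_confidence=0.8, low_cutoff=0.5, max_low_fraction=0.2):
    """Accept OCR output if the mean confidence is high enough and
    only a small share of words falls below the low-confidence cutoff.

    `words` is a list of (text, confidence) pairs; all thresholds are
    illustrative assumptions, not recommended values.
    """
    if not words:
        return False
    confidences = [confidence for _, confidence in words]
    low_fraction = sum(c < low_cutoff for c in confidences) / len(confidences)
    return mean(confidences) >= min_mean_confidence and low_fraction <= max_low_fraction

# Made-up OCR results for illustration.
good = [("Patient", 0.98), ("reports", 0.95), ("headache", 0.91)]
poor = [("Pat1ent", 0.55), ("rep0rts", 0.40), ("hcadache", 0.35)]

print(ocr_quality_ok(good))
print(ocr_quality_ok(poor))
```

Records that fail such a check could be routed to manual transcription instead of the automated pipeline.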

To analyze the free text entries from both sources, the text input needs to be put into the context of the case, and entries need to be matched against common terminology and abbreviations ("dictionaries") to produce structured, machine-readable data. For this, a labeled dataset needs to be created based on expert knowledge from physicians.
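The dictionary matching step could look roughly like the following minimal sketch. The abbreviation table and the function name `expand_abbreviations` are hypothetical; a real dictionary would be curated by physicians and would have to handle abbreviations whose meaning depends on the context.

```python
import re

# Hypothetical abbreviation dictionary for illustration only.
ABBREVIATIONS = {
    "htn": "hypertension",
    "dm": "diabetes mellitus",
    "sob": "shortness of breath",
    "hx": "history",
}

def expand_abbreviations(text, dictionary):
    """Replace whole-word abbreviations (case-insensitive) with their full terms."""
    def replace(match):
        token = match.group(0)
        # Words that are not in the dictionary are returned unchanged.
        return dictionary.get(token.lower(), token)
    return re.sub(r"[A-Za-z]+", replace, text)

note = "Pt with hx of HTN and DM, reports SOB."
print(expand_abbreviations(note, ABBREVIATIONS))
```

A plain lookup like this cannot resolve abbreviations that physicians use inconsistently, which is one reason why a labeled dataset is needed in addition to the dictionaries.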

The machine learning model extracts the relevant information from the text data using Natural Language Processing (NLP) techniques such as word embeddings, naive Bayes classifiers and TF-IDF weighting. Trained on the labeled text data, these methods allow the model to capture the relations between words and sentences and their underlying meaning.
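To make two of these techniques concrete, here is a small sketch that combines TF-IDF weighting with a multinomial naive Bayes classifier, written in plain Python. The training sentences, the labels and all function names are made up for illustration; a real project would more likely use an established library and far more labeled data.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    return text.lower().split()

def fit_idf(docs):
    """Smoothed inverse document frequency for every term in the training docs."""
    n = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + df)) + 1 for term, df in doc_freq.items()}

def tfidf(doc, idf):
    """TF-IDF weights of one document; terms unseen in training are dropped."""
    counts = Counter(tokenize(doc))
    return {term: count * idf[term] for term, count in counts.items() if term in idf}

def train(docs, labels, idf):
    """Sum the TF-IDF weights per class (multinomial naive Bayes statistics)."""
    weights = defaultdict(lambda: defaultdict(float))
    for doc, label in zip(docs, labels):
        for term, weight in tfidf(doc, idf).items():
            weights[label][term] += weight
    totals = {label: sum(w.values()) for label, w in weights.items()}
    return {"weights": weights, "totals": totals,
            "class_counts": Counter(labels), "n_docs": len(docs)}

def predict(doc, model, idf):
    """Return the class with the highest log posterior, using add-one smoothing."""
    features = tfidf(doc, idf)
    vocab_size = len(idf)
    scores = {}
    for label, count in model["class_counts"].items():
        score = math.log(count / model["n_docs"])
        for term, weight in features.items():
            term_weight = model["weights"][label].get(term, 0.0)
            prob = (term_weight + 1) / (model["totals"][label] + vocab_size)
            score += weight * math.log(prob)
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical labeled free text entries.
docs = [
    "patient reports chest pain and shortness of breath",
    "chest pain radiating to left arm",
    "elevated blood glucose and increased thirst",
    "blood glucose poorly controlled on metformin",
]
labels = ["cardiology", "cardiology", "endocrinology", "endocrinology"]

idf = fit_idf(docs)
model = train(docs, labels, idf)
print(predict("chest pain at rest", model, idf))
print(predict("high blood glucose", model, idf))
```

Unlike word embeddings, this approach treats every word independently, so it serves as a simple baseline, not as a model of meaning.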

Once the free text input has been converted to structured data, it can be visualized, explored with exploratory data analysis, or used to map relationships, e.g. with graph neural networks.
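As a simple illustration of relationship mapping, the sketch below counts how often two diagnoses occur in the same case and stores the result as weighted graph edges. The record layout and the function name `cooccurrence_edges` are assumptions; such an edge list could serve as a starting point for graph-based models, but it is not a graph neural network.

```python
from collections import Counter
from itertools import combinations

# Hypothetical structured records, as they might look after extraction.
records = [
    {"case_id": 1, "diagnoses": ["hypertension", "diabetes mellitus"]},
    {"case_id": 2, "diagnoses": ["hypertension", "heart failure"]},
    {"case_id": 3, "diagnoses": ["diabetes mellitus", "hypertension"]},
]

def cooccurrence_edges(records):
    """Count how often two diagnoses appear in the same case (undirected edges)."""
    edges = Counter()
    for record in records:
        # Sorting makes (a, b) and (b, a) map to the same edge key.
        for a, b in combinations(sorted(set(record["diagnoses"])), 2):
            edges[(a, b)] += 1
    return edges

edges = cooccurrence_edges(records)
for (a, b), weight in edges.most_common():
    print(f"{a} -- {b}: {weight}")
```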

Related webinars

Text recognition (OCR) - The first step on the way to a successful implementation of an NLP project

In this talk we will deal with the topic of text recognition.

Ewelina Fiebig

Machine Learning Scientist

Fabian Gringel

Machine Learning Scientist

Labeling Tools - The second step on the way to the successful implementation of an NLP project

The success of an NLP project consists of a series of steps from data preparation to modeling and deployment. Since the input data are often scanned documents, the data preparation step initially involves the use of text recognition tools (OCR for short) and later on also the use of so-called labeling tools. In this webinar we will deal with the topic of selecting a suitable labeling tool.

Ewelina Fiebig

Machine Learning Scientist

Fabian Gringel

Machine Learning Scientist

Semantic search and understanding of natural text with neural networks: BERT

In this webinar you will get an introduction to the application of BERT for Semantic Search using a real case study: Every year millions of citizens interact with public authorities and are regularly overwhelmed by the technical language used there. We have successfully used BERT to deliver the right answer from government documents with the help of colloquial queries - without having to use technical terms in the queries.

Konrad Schultka

Machine Learning Scientist

Jona Welsch

Machine Learning Scientist

Automated answering of questions with neural networks: BERT

In this webinar we will present a method based on the BERT model for automated answering of questions.

Mattes Mollenhauer

Machine Learning Scientist

Recurrent neural networks: How computers learn to read

The webinar will give an introduction to how RNNs work and illustrate their use in an example project from the field of legal tech.

Fabian Gringel

Machine Learning Scientist