
Natural Language Processing

Natural Language Processing (NLP) deals with recognizing patterns in natural, unstructured text. Think of structured text as data in a database or Excel table, for instance a register of names. By unstructured text we mean the text in emails, documents, manuals, etc. The term "natural" highlights that the text was written by a human for another human.

Criteria to discover NLP projects

Criteria to discover attractive process automation projects in which text plays a crucial role:

Currently the process is cost-intensive and/or a faster decision creates substantial value

A (trained) human could make a good decision mainly based on text

There is enough data available (as a rule of thumb, 500 to 10,000 documents, though this depends heavily on the use case)

Recent years have seen tremendous improvements in the quality of pattern recognition in unstructured data. Besides hardware improvements, the main reason is a group of algorithms known as neural networks or deep learning. A key feature of these approaches is that, given enough training data, they form their own set of rules in order to achieve a certain goal. In this way, millions of implicit rules may be learned to recognize even rather complex patterns. In our experience, projects can only be framed well by combining know-how about internal operations with natural language processing expertise. Feel free to approach us with questions, especially about whether we consider your project technically feasible.

Our process

1. Process evaluation

Together we discuss your process automation projects along three dimensions: cost savings, strategic value and technical feasibility. After settling on a specific project, we put special emphasis on the needs of the end users.

2. Innovative solutions

We are an experienced team of machine learners. Our algorithms find complicated patterns in unstructured, mostly visual and text data. Once detected, these patterns are the basis for the automation of the underlying process.

3. Decision-support software

We make a point of giving our customers access to the project's code repository and involving them in weekly progress meetings. Agility, clean code and a modular program structure help us to deliver easy-to-maintain software that simply works.

Projects in Natural Language Processing

Legal Review of Rental Contracts

Different methods from the field of NLP helped us to create software that spots errors in legal contracts.

Automatic Checking of Service Charge Statements

Semantic Search for Public Administration

dida developed an AI-based algorithm to extract relevant information from documents issued by public authorities.

Numeric Attribute Extraction from Product Descriptions

Automatically extract numerical attributes from product descriptions in order to enrich the existing database.
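As an illustration only, the simplest form of this task can be sketched with a regular expression that looks for a number followed by a known unit. This is not the approach used in the project, which is not described here; the unit list, the function name extract_numeric_attributes and the sample text are assumptions made for this sketch.

```python
import re

# Hypothetical unit list, chosen for illustration only.
UNIT_PATTERN = re.compile(
    r"(?P<value>\d+(?:[.,]\d+)?)\s*(?P<unit>kg|g|mm|cm|m|ml|l|W|V)\b"
)


def extract_numeric_attributes(description: str) -> list[tuple[float, str]]:
    """Return (value, unit) pairs found in a free-text product description.

    Assumption: a comma between digits is a decimal separator ("1,5 kg"),
    so thousands separators such as "1,500 g" would be misread.
    """
    results = []
    for match in UNIT_PATTERN.finditer(description):
        value = float(match.group("value").replace(",", "."))
        results.append((value, match.group("unit")))
    return results


sample = "Cordless drill, 18 V, 1,5 kg, chuck 13 mm"
print(extract_numeric_attributes(sample))
```

For the sample description, the function returns the pairs (18.0, "V"), (1.5, "kg") and (13.0, "mm"). A fixed pattern like this breaks down quickly on varied phrasing, which is where learned models become useful.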

Extracting information from customer requests

In this project we created a model that, given a free-form reason for a vet appointment, extracts symptoms, diseases and requested services. Our client can then use this data to improve scheduling and preparation.

Blog Posts in Natural Language Processing

Recommendation systems - an overview

By Konrad Mundinger August 29th, 2022

In this blog post, I will give an overview of the underlying basic concepts, common use cases and limitations of recommendation systems. Among other topics, I will discuss content-based and collaborative filtering.

Computer Vision

Image Captioning with Attention

By Madina Kasymova May 31st, 2022

In this article, we examine how an image caption generation pipeline works. In particular, we look at the attention mechanism - a very promising approach to image captioning.

Natural Language Processing

OpenAI Codex: Why the revolution is still missing

By Fabian Gringel February 18th, 2022

In this blog post, I'll explain how Codex from OpenAI works, and in particular how it differs from GPT-3. I will outline why I think it should be used with caution and is not yet ready to revolutionize the software development process.


Ethics in Natural Language Processing

By Marty Oelschläger (PhD) December 20th, 2021

I explain why language models tend to reproduce stereotypes and prejudices with potentially harmful consequences - and how to use them with care.

Natural Language Processing

GPT-3 and beyond - Part 2: Shortcomings and remedies

By Fabian Gringel October 24th, 2021

Here I explain in which situations GPT-3 fails and why it is far from having proper natural language understanding. I also discuss which approaches can help to mitigate these issues and might lead to the next breakthrough, and which alternatives to GPT-3 already exist.

Computer Vision

Data-centric Machine Learning: Making customized ML solutions production-ready

By David Berscheid October 6th, 2021

In this article, we will see why many ML projects do not make it into production, introduce the concepts of model-centric and data-centric ML, and give examples of how we at dida improve projects by applying data-centric techniques.

Natural Language Processing

GPT-3 and beyond - Part 1: The basic recipe

By Fabian Gringel September 27th, 2021

In this blog article I will explain how GPT-3 works, why some people think it’s dangerous, and how you can try out a GPT-3-like model yourself for free.

Computer Vision

CLIP: Mining the treasure trove of unlabeled image data

By Fabian Gringel June 21st, 2021

Contrastive Language-Image Pretraining (CLIP for short) makes use of image captions to train a zero-shot image classifier. In this blog article I will give a rough non-technical outline of how CLIP works, and I will also show how you can try CLIP out yourself!
