Semantic Search for Public Administration

Our algorithm helps citizens navigate the bureaucracy of registering a business.

Input: Free-text business descriptions
Output: The corresponding industry codes
Goal: Simplify business registration for citizens

Starting Point

Citizens registering a new business in Germany have to provide an industry code along with their registration. This industry code is chosen from a list of over 800 different codes, each described and defined in complicated “public administration language”. Finding the correct code from all these options is hard, especially if someone is not accustomed to the language used in these descriptions.

Our client PublicPlan offers a chatbot that enables citizens to access public administration services. They wanted to enhance its functionality by offering an intuitive search function for industry codes.

The solution is implemented as a chatbot

Registering a new business can be tricky

Challenges

The input to the algorithm developed here should be the citizen’s free-text description of the business he or she wants to register.

Because authorities often use words and turns of phrase that differ widely from colloquial language, finding the correct industry code for a specific business registration is a non-trivial task.

Therefore, simple text search algorithms are not sufficient to find the correct industry code.
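To illustrate why plain lexical matching breaks down, here is a minimal sketch of a keyword-overlap score (Jaccard similarity over word sets), one of the simplest text search heuristics. The two descriptions are invented for illustration and are not entries from the actual industry-code list; the token pattern also accepts German umlauts and ß because the real data is German.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphabetic word tokens."""
    return set(re.findall(r"[a-zäöüß]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the two token sets (0.0 means no shared words)."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


# Hypothetical pair: a colloquial description and an official-style definition.
colloquial = "I fix people's laptops"
official = "Repair of computers and peripheral equipment"
print(jaccard(colloquial, official))  # 0.0: not a single word in common
```

Although both texts describe the same business, they share no words, so any purely lexical search would score this match as irrelevant.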

Solution

We adapted and trained an AI architecture that is especially well suited to Natural Language Processing tasks. For a given colloquial business description, the trained model suggests the relevant industry codes.

The training data consisted of historical colloquial business descriptions and their corresponding industry codes, provided by the client.

The final product was deployed as a feature of the client's existing chatbot. Our solution is flexible and easy to maintain: new industry codes can be integrated with very little effort.

Below you can see three example outputs of the algorithm for different business descriptions. Because Machine Learning models for Natural Language Processing are usually language-specific, the examples below are in German.

Wolf Winkler

Principal Consultant - AI, Automation and Digital Innovation

wolf.winkler@dida.do

Technical Details

The Client's Requirements

The client had an existing chatbot solution that could serve as the interface: citizens type in a description of their business in colloquial language and receive the five most relevant industry codes as the chatbot's response.

The chatbot routes different user questions to the corresponding API endpoints, so the business descriptions written by users reached us as API calls. Our algorithm had to answer each of these calls with the five most relevant industry codes.
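The request/response contract described above could look roughly like the following minimal sketch. It uses plain dataclasses as stand-ins for the Pydantic models that would sit behind a FastAPI endpoint, and all names (`SearchRequest`, `CodeSuggestion`, `SearchResponse`, `handle_request`, the injected `score_codes` function) are hypothetical, not the project's actual interface.

```python
from dataclasses import dataclass
from typing import Callable

TOP_K = 5  # the chatbot expects the five most relevant industry codes


@dataclass
class SearchRequest:
    """Hypothetical request body: the citizen's free-text business description."""
    description: str

    def __post_init__(self) -> None:
        # Mimics a Pydantic validator: reject empty or whitespace-only input.
        if not self.description.strip():
            raise ValueError("description must not be empty")


@dataclass
class CodeSuggestion:
    rank: int      # 1 = most relevant
    code: str      # industry code identifier
    score: float   # relevance score from the model


@dataclass
class SearchResponse:
    suggestions: list[CodeSuggestion]


def handle_request(
    request: SearchRequest,
    score_codes: Callable[[str], dict[str, float]],
) -> SearchResponse:
    """Score every industry code for the description and return the TOP_K best."""
    scores = score_codes(request.description)
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)[:TOP_K]
    return SearchResponse(
        suggestions=[
            CodeSuggestion(rank=i, code=code, score=score)
            for i, (code, score) in enumerate(ranked, start=1)
        ]
    )
```

In a deployed service, `score_codes` would wrap the trained classifier; passing it in as a function keeps the endpoint logic testable with a dummy scorer, without loading the model.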

An existing solution based on basic word embeddings often produced unsatisfactory results. This indicated that a better semantic understanding was needed, both of the definitions and descriptions of the industry codes and of the colloquial business descriptions.
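The case study does not say exactly how the earlier word-embedding solution worked. A common baseline of that kind, assumed here, averages one static vector per word and compares texts by cosine similarity. The sketch below shows one of its blind spots: every word has the same vector regardless of context, and averaging discards word order. The tiny 2-dimensional vectors are made up purely for illustration.

```python
import math

Vector = list[float]


def mean_vector(words: list[str], embeddings: dict[str, Vector]) -> Vector:
    """Average the vectors of all known words; unknown words are skipped.

    Assumes `embeddings` is non-empty so the vector dimension can be inferred.
    """
    known = [embeddings[w] for w in words if w in embeddings]
    if not known:
        dim = len(next(iter(embeddings.values())))
        return [0.0] * dim
    return [sum(col) / len(known) for col in zip(*known)]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical toy embeddings; real ones have hundreds of dimensions.
toy = {"dog": [1.0, 0.0], "bites": [0.0, 1.0], "man": [1.0, 1.0]}
a = mean_vector("dog bites man".split(), toy)
b = mean_vector("man bites dog".split(), toy)
print(a == b)  # True: both sentences collapse to the same vector
```

A contextual model such as BERT produces token representations that depend on the surrounding words, which is the kind of semantic understanding this baseline lacks.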

Our Solution

Backend: Python, spaCy, PyTorch, NumPy, pandas, FastAPI, Pydantic, Docker, Elasticsearch
Infrastructure: GCloud (training), Git, DVC, TensorBoard

The basis for our algorithm is a version of BERT, a neural network architecture developed by Google, which had already been pretrained on a large German text corpus. We fine-tuned the existing layers and enriched BERT's output with custom features. By adding a few new layers and postprocessing steps, we built a text classifier that leverages the full semantic capabilities of BERT while being efficient enough to run on a CPU.
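The idea of enriching BERT's output with custom features can be sketched as concatenating a pooled sentence vector with extra feature values and feeding the result into a dense layer that emits one logit per industry code. This is a minimal plain-Python sketch under stated assumptions: the pooled vector stands in for BERT's output (768 dimensions for a BERT-base model), the custom features are hypothetical because the case study does not name them, and a single layer replaces the "few new layers" of the real model. In PyTorch the same step would typically be `torch.cat` followed by `torch.nn.Linear`.

```python
def linear(x: list[float], weights: list[list[float]], bias: list[float]) -> list[float]:
    """Dense layer: one output per row of `weights` (here, one logit per industry code)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def classifier_head(
    pooled: list[float],
    custom_features: list[float],
    weights: list[list[float]],
    bias: list[float],
) -> list[float]:
    """Concatenate the sentence embedding with custom features, then score each code."""
    # Enrich the (assumed) BERT sentence vector with hypothetical extra features.
    x = list(pooled) + list(custom_features)
    if any(len(row) != len(x) for row in weights):
        raise ValueError("each weight row must match the concatenated input size")
    return linear(x, weights, bias)
```

Keeping the added head small is consistent with the CPU constraint mentioned above: nearly all of the compute stays in the BERT encoder itself.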

This final classifier was then trained on pairs of historical business descriptions and their industry codes. It takes a business description as input and outputs a relevance score for each industry code. The five highest-scoring industry codes are then sent back, in ranked order, as the response to the original user request.
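The case study does not state the training objective. For one description labeled with one industry code, a common choice, assumed here, is softmax cross-entropy over the per-code logits; at inference time, the scores are then reduced to the five best codes. A minimal sketch of both steps, with `top_k_codes` as a hypothetical helper name:

```python
import heapq
import math


def log_softmax(logits: list[float]) -> list[float]:
    """Numerically stable log-softmax over one logit per industry code."""
    m = max(logits)  # subtract the max before exponentiating to avoid overflow
    log_total = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_total for z in logits]


def cross_entropy(logits: list[float], target_index: int) -> float:
    """Loss for one (business description, industry code) training pair."""
    return -log_softmax(logits)[target_index]


def top_k_codes(codes: list[str], scores: list[float], k: int = 5) -> list[str]:
    """Return the k highest-scoring codes, best first."""
    best = heapq.nlargest(k, range(len(scores)), key=scores.__getitem__)
    return [codes[i] for i in best]
```

`heapq.nlargest` avoids sorting the full list of more than 800 scores when only the top five are needed, although at this size a plain sort would be just as practical.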
