Public administration in Germany suffers from a shortage of young talent. According to a McKinsey study, this shortage will amount to 731,000 employees by 2030. At the same time, public authorities are often perceived as old-fashioned in their work processes. Digitalizing public services could save costs, reduce waiting times and maintain a satisfactory service level even with fewer employees. One hurdle to overcome is administrative jargon, which is often not intuitive to citizens.
Automating public and governmental services requires a user interface in which users can describe their needs and the service returns all relevant information. This can be quite challenging, since user queries describe those needs informally, in wording that may differ strongly from the administrative jargon. The automated service therefore has to understand the meaning of the queries and translate them into administrative jargon.
For example, registering a company requires an industry code. The official description of this industry code may be very different from the business purpose that a user formulates in free text. A user who wants to register a beauty salon might enter 'I paint nails' and should still be shown the right business code for beauty salons.
This is especially challenging if a business code covers a number of very diverse services. For instance, the business code 'Miscellaneous other services' includes piercing studios as well as facility management. Traditional keyword-based approaches are therefore unlikely to deliver good results.
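To illustrate why keyword matching struggles here, the following minimal sketch counts the words a free-text query shares with each official description. The catalogue, its code names (`BEAUTY_SALON`, `MISC_SERVICES`) and the description wording are hypothetical placeholders chosen for illustration, not real industry codes.

```python
import re

# Hypothetical catalogue: code names and descriptions are illustrative only.
CATALOGUE = {
    "BEAUTY_SALON": "beauty salons and cosmetic treatment",
    "MISC_SERVICES": "miscellaneous other services",
}


def tokens(text):
    """Lowercase the text and return its alphabetic words as a set."""
    return set(re.findall(r"[a-z]+", text.lower()))


def keyword_scores(query, catalogue):
    """Count the words each official description shares with the query."""
    query_tokens = tokens(query)
    return {
        code: len(query_tokens & tokens(description))
        for code, description in catalogue.items()
    }


# The query shares no word with either description, so a pure keyword
# matcher has no basis for preferring the beauty salon code.
scores = keyword_scores("I paint nails", CATALOGUE)
print(scores)
```

Under these assumptions every code scores zero for 'I paint nails', so the ranking carries no information. The same happens for a query such as 'piercing studio', because the catch-all description never mentions the services it covers.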
The intent behind a user query can be inferred by a model built on the pre-trained BERT model (Bidirectional Encoder Representations from Transformers), developed by Google. The key advantage of BERT is that it can recognize semantically similar words and expressions by considering their context. For example, it knows that a 'car' is also a 'motor vehicle' in this context. The base model is then further specialized for the specific use case by training it on problem-specific labeled data. In this case, the data consists of past user input that has been manually mapped to the correct industry code.
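The fine-tuning step described above could be sketched as follows, assuming the Hugging Face `transformers` library and PyTorch. The checkpoint name `bert-base-multilingual-cased`, the example texts, the code names and the hyperparameters are all assumptions made for illustration; the source does not state which checkpoint, data or settings were used.

```python
# Hypothetical labeled data: past user input mapped by hand to an industry code.
LABELED = [
    ("I paint nails", "BEAUTY_SALON"),
    ("manicure and pedicure", "BEAUTY_SALON"),
    ("piercing studio", "MISC_SERVICES"),
    ("facility management for office buildings", "MISC_SERVICES"),
]


def build_label_map(examples):
    """Assign each distinct code an integer id, in sorted order."""
    codes = sorted({code for _, code in examples})
    return {code: index for index, code in enumerate(codes)}


def encode(examples, label_map):
    """Split the examples into parallel lists of texts and integer labels."""
    texts = [text for text, _ in examples]
    labels = [label_map[code] for _, code in examples]
    return texts, labels


def fine_tune(examples, model_name="bert-base-multilingual-cased",
              epochs=3, lr=2e-5):
    """Fine-tune a pre-trained BERT checkpoint as a text classifier.

    The checkpoint name and hyperparameters are assumptions. The heavy
    imports live here so the data preparation runs without them.
    """
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    label_map = build_label_map(examples)
    texts, labels = encode(examples, label_map)

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_name, num_labels=len(label_map)
    )

    # A single full batch is enough for this toy data set.
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    targets = torch.tensor(labels)
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)

    model.train()
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = model(**batch, labels=targets).loss  # cross-entropy loss
        loss.backward()
        optimizer.step()
    return model, tokenizer, label_map


LABEL_MAP = build_label_map(LABELED)
TEXTS, LABELS = encode(LABELED, LABEL_MAP)
print(LABEL_MAP, LABELS)
```

Calling `fine_tune(LABELED)` downloads the checkpoint and runs the training loop. A realistic setup would use far more labeled queries, mini-batches and a held-out validation set.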