AI project: Automatic Verification of Service Charge Statements

Starting Point

Our client is an online tenant protection club that provides its customers with the expertise of tenancy lawyers. Many customer inquiries concern the validity of service charge statements.

These may be invalid for various reasons:

Material errors:
- Cost items not allocable to tenants
- Disproportionately high costs
- Inadmissible distribution key
- ...

Formal errors:
- Statement not issued on time
- Incorrect billing period
- Statement and cost allocation not comprehensible
- ...

Until now, all incoming service charge statements have been checked manually by tenancy law experts, which is time-consuming and correspondingly cost-intensive.

The goal of the project was to develop AI-powered software that automatically checks service charge statements for these errors.

Challenges

The check should run live, i.e. it may only take a few seconds.
The source data are mostly scanned documents uploaded by customers. Therefore, the software must be robust against poor scan quality.
The results must be interpretable and transparent so that they can be reviewed by a legal expert if necessary.
The logic of the checks should be adaptable to potential future legal changes.

Solution

To develop the algorithms, we combined methods from Natural Language Processing and Computer Vision, including:

OCR (optical character recognition)
Fuzzy string search
Feature Engineering
Neural Networks (R-CNN)
Regular expressions

The developed software checks a three-page service charge statement in about 10 seconds and achieves accuracies of 88-95% for the checks of the different error types. It is thus on par with the performance of a tenancy law expert.
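
As an illustration of how regular expressions can feed a formal check such as timeliness, here is a minimal sketch. It is not the project's implementation: the date format (DD.MM.YYYY), the pattern PERIOD_RE, the helper names and the 12-month default deadline are all assumptions made for this example. The deadline is a parameter so that the rule can be adjusted if the legal basis changes.

```python
import calendar
import re
from datetime import date

# Assumed layout: "01.01.2022 - 31.12.2022" or "01.01.2022 to 31.12.2022".
DATE = r"(\d{2})\.(\d{2})\.(\d{4})"
PERIOD_RE = re.compile(DATE + r"\s*(?:-|to)\s*" + DATE)


def add_months(d, months):
    """Shift a date by whole months, clamping the day to the month length."""
    index = d.month - 1 + months
    year, month = d.year + index // 12, index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def issued_on_time(text, statement_date, deadline_months=12):
    """Return True/False, or None if no billing period could be found.

    deadline_months is a hypothetical, configurable rule parameter.
    """
    match = PERIOD_RE.search(text)
    if match is None:
        return None  # leave the decision to a human reviewer
    day, month, year = match.group(4, 5, 6)  # end of the billing period
    period_end = date(int(year), int(month), int(day))
    return statement_date <= add_months(period_end, deadline_months)


text = "Billing period: 01.01.2022 - 31.12.2022"
on_time = issued_on_time(text, date(2023, 11, 15))
too_late = issued_on_time(text, date(2024, 2, 1))
no_period = issued_on_time("no dates here", date(2023, 1, 1))
```

Returning None instead of guessing keeps the result transparent: a statement without a recognizable billing period is flagged for manual review rather than silently passed or failed.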

Technical Background

Table extraction

In almost all service charge statements, a large part of the relevant information is summarized in a single table. The recognition and extraction of this table are essential for many steps in the checks.

Since existing commercial and open-source solutions for table extraction did not perform well enough in our tests (only 60-70% of tables were correctly recognized), we decided to develop our own solution:

We use CascadeTabNet to identify the areas of the document that contain tables. This identification takes place exclusively at the image level.
Subsequently, we analyze the positions of the strings within these areas and their arrangement relative to each other in order to recognize the columns and rows of the tables and to read out their contents in a structured way.

Using this approach, we were able to increase the accuracy of table recognition to 93%.
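
The second step, deriving rows and columns from string positions, can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the project's code: the TextBox type, the gap thresholds and the sample coordinates are hypothetical, each OCR string is assumed to correspond to one cell, and columns are assumed to be left-aligned.

```python
from dataclasses import dataclass


@dataclass
class TextBox:
    text: str
    x: float  # left edge of the OCR bounding box
    y: float  # vertical centre of the OCR bounding box


def group_by_gap(values, max_gap):
    """Cluster 1-D coordinates; a new cluster starts when the gap exceeds max_gap."""
    clusters = []
    for v in sorted(values):
        if clusters and v - clusters[-1][-1] <= max_gap:
            clusters[-1].append(v)
        else:
            clusters.append([v])
    return clusters


def nearest(centres, value):
    """Index of the cluster centre closest to value."""
    return min(range(len(centres)), key=lambda i: abs(centres[i] - value))


def boxes_to_table(boxes, row_gap=5.0, col_gap=20.0):
    """Assign every string to a (row, column) cell based on its position."""
    rows = [sum(c) / len(c) for c in group_by_gap([b.y for b in boxes], row_gap)]
    cols = [sum(c) / len(c) for c in group_by_gap([b.x for b in boxes], col_gap)]
    table = [["" for _ in cols] for _ in rows]
    for b in sorted(boxes, key=lambda b: b.x):
        r, c = nearest(rows, b.y), nearest(cols, b.x)
        table[r][c] = (table[r][c] + " " + b.text).strip()
    return table


# Hypothetical OCR output with slightly jittered coordinates.
boxes = [
    TextBox("Cost item", 50, 100), TextBox("Total", 300, 101), TextBox("Your share", 420, 99),
    TextBox("Property tax", 52, 130), TextBox("1,200.00", 302, 131), TextBox("120.00", 421, 129),
    TextBox("Cable fees", 49, 160), TextBox("480.00", 299, 161), TextBox("48.00", 423, 160),
]
table = boxes_to_table(boxes)
```

Clustering coordinates by gaps rather than by fixed grid lines tolerates the small offsets and skew that scanned documents typically introduce.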

Review of the cost items

Based on the extracted table, listed cost items can be read out and evaluated. We want to check whether they can actually be passed on to the tenants.

Because uploaded documents are often of poor quality, we chose an approach that is robust against OCR errors: the individual cost items are compared (as strings) with lists of

known admissible items and
known inadmissible items.

The comparison is done using a fuzzy string search, which outputs a similarity value for a pair of strings to be compared:

>>> from thefuzz import fuzz  # assumed library; the source does not name it
>>> fuzz.ratio("cable fees", "cable fees")  # -> 100
>>> fuzz.ratio("cable fees", "cable/TV fees")  # -> 87
>>> fuzz.ratio("cable fees", "property tax")  # -> 18
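
The comparison against the two lists can be sketched as follows. This is a minimal illustration, not the project's code: it uses difflib from the Python standard library as a stand-in for a dedicated fuzzy matching library, and the list contents, the threshold of 80 and the function names are hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical placeholder lists; the real lists are maintained by legal experts.
ADMISSIBLE = ["property tax", "cable fees", "waste disposal"]
INADMISSIBLE = ["administration costs", "repair costs"]


def similarity(a, b):
    """Similarity between 0 and 100, case-insensitive."""
    return 100 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_match(item, candidates):
    """Return (score, candidate) for the most similar list entry."""
    return max((similarity(item, c), c) for c in candidates)


def classify_item(item, threshold=80):
    """Label a cost item by its closest known entry, or 'unknown' if nothing is close."""
    adm_score, _ = best_match(item, ADMISSIBLE)
    inadm_score, _ = best_match(item, INADMISSIBLE)
    if max(adm_score, inadm_score) < threshold:
        return "unknown"  # hand over to a human reviewer
    return "admissible" if adm_score >= inadm_score else "inadmissible"


ocr_noise = classify_item("cab1e fees")          # OCR read "l" as "1"
typo = classify_item("Adminstration costs")      # missing letter
unseen = classify_item("elevator maintenance")   # not on either list
```

Keeping the best-matching list entry alongside the score makes each decision traceable, since a reviewer can see which known item triggered the result.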

Because different fuzzy string search algorithms embody different definitions of string similarity, we trained a machine learning classifier that takes several types of similarity score as input and weights them. From these scores, the classifier estimates whether a given cost item is allocable or not.
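
The idea of weighting several similarity scores can be sketched as follows. The source does not say which classifier or which similarity measures were used, so everything here is an assumption for illustration: a small logistic regression written in plain Python, three hand-picked measures, and hypothetical reference lists and training samples.

```python
import math
from difflib import SequenceMatcher

ADMISSIBLE = ["property tax", "cable fees", "waste disposal"]   # hypothetical
INADMISSIBLE = ["administration costs", "repair costs"]         # hypothetical


def char_ratio(a, b):
    return SequenceMatcher(None, a, b).ratio()


def token_jaccard(a, b):
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def bigram_dice(a, b):
    ba = {a[i:i + 2] for i in range(len(a) - 1)}
    bb = {b[i:i + 2] for i in range(len(b) - 1)}
    return 2 * len(ba & bb) / (len(ba) + len(bb)) if ba or bb else 0.0


MEASURES = [char_ratio, token_jaccard, bigram_dice]


def features(item):
    """Per measure: best score against the admissible list minus best against the inadmissible list."""
    item = item.lower()
    return [
        max(m(item, ref) for ref in ADMISSIBLE) - max(m(item, ref) for ref in INADMISSIBLE)
        for m in MEASURES
    ]


def sigmoid(z):
    return 1 / (1 + math.exp(-z))


def train(samples, epochs=200, lr=0.5):
    """Logistic regression without intercept, fitted by stochastic gradient descent."""
    weights = [0.0] * len(MEASURES)
    data = [(features(text), label) for text, label in samples]
    for _ in range(epochs):
        for x, y in data:
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)))
            weights = [w + lr * (y - p) * xi for w, xi in zip(weights, x)]
    return weights


def p_allocable(item, weights):
    return sigmoid(sum(w * xi for w, xi in zip(weights, features(item))))


# Hypothetical labelled samples (1 = allocable, 0 = not allocable), including OCR noise.
samples = [
    ("property tax", 1), ("cab1e fees", 1), ("waste disposa1", 1),
    ("administration costs", 0), ("repair cost", 0), ("adminstration costs", 0),
]
weights = train(samples)
p_good = p_allocable("property tox", weights)
p_bad = p_allocable("repair c0sts", weights)
```

Because each feature is a difference of scores, a value of zero means "no evidence either way", which is why the sketch omits an intercept. The learned weights show how much each similarity measure contributes, which keeps the estimate interpretable.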

Contact

If you would like to speak with us about this project, please reach out and we will schedule an introductory meeting right away.
