Automatic Checking of Service Charge Statements

Our software protects tenants against excessive service charge bills.

Input:  Scanned service charge statement
Output: Assessment of the validity of the cost items
Goal:   Lower the cost of the review process

Key Facts

10 Seconds
for a complete review
88-95% Accuracy
on the individual tasks

Starting Point

Our client is an online tenant protection club that provides its customers with the expertise of tenancy lawyers. Many customer inquiries concern the validity of service charge statements.

These may be invalid for various reasons:

  • Material Errors:

    • Cost items not allocable to tenants

    • Costs disproportionately high

    • Distribution key inadmissible

    • ...

  • Formal errors:

    • Invoicing not on time

    • Incorrect billing period

    • Statement and allocation of costs not comprehensible

    • ...

Until now, all incoming service charge statements have been checked manually by tenancy law experts, which is time-consuming and correspondingly cost-intensive. 

The goal of the project was to develop AI-powered software that automatically checks service charge statements for these errors.

Not all costs may be passed on to the tenants

Challenges

  • The check must run live, i.e. it may take only a few seconds.

  • The source data are mostly scanned documents uploaded by customers. Therefore, the software must be robust against poor scan quality.

  • The results must be interpretable and transparent so that they can be reviewed by a legal expert if necessary.

  • The logic of the checks should be adaptable to potential future legal changes.

Solution

For the development of the algorithms, we combined techniques from Natural Language Processing and Computer Vision, relying on the following methods, among others:

  • OCR (optical character recognition)

  • Fuzzy string search

  • Feature Engineering

  • Neural Networks (R-CNN)

  • Regular expressions
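To illustrate the regex-based formal checks, here is a minimal sketch of how a billing period could be extracted and validated. The date pattern and the twelve-month rule shown here are simplified assumptions; the production patterns are more extensive:

```python
import re
from datetime import date

# Illustrative pattern for German-style date ranges like "01.01.2022 - 31.12.2022".
PERIOD_RE = re.compile(
    r"(\d{2})\.(\d{2})\.(\d{4})\s*[-–]\s*(\d{2})\.(\d{2})\.(\d{4})"
)

def extract_billing_period(text: str):
    """Return (start, end) dates of the billing period, or None if not found."""
    m = PERIOD_RE.search(text)
    if m is None:
        return None
    d1, m1, y1, d2, m2, y2 = (int(g) for g in m.groups())
    return date(y1, m1, d1), date(y2, m2, d2)

def period_is_valid(start: date, end: date) -> bool:
    """A formal check: the billing period may not exceed twelve months."""
    return (end - start).days < 366

start, end = extract_billing_period("Abrechnungszeitraum: 01.01.2022 - 31.12.2022")
print(period_is_valid(start, end))  # True
```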

The developed software checks a three-page service charge statement in about 10 seconds and achieves accuracies of 88-95% for the checks of the different error types. It is thus on a par with the performance of a tenancy law expert.


Technical Details

Table extraction

In almost all service charge statements, a large part of the relevant information is summarized in a single table. The recognition and extraction of this table are essential for many steps in the checks.

Since the performance of existing commercial and open source solutions for table extraction was not sufficient in tests (only 60-70% of tables were correctly recognized), we decided to develop our own custom solution:

  1. We use CascadeTabNet, a Cascade R-CNN-based detection model, to identify areas of the document where tables are located. This identification takes place exclusively on the image level.

  2. Subsequently, we analyze the positions of the strings within these areas and their relative arrangement to each other in order to recognize columns and rows of the tables and to be able to read out their contents in a structured way.
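Step 2 can be sketched roughly as follows. The word-box format `(x, y, text)` and the clustering tolerance are simplifying assumptions; the production logic additionally has to handle skew, merged cells, and multi-line cells:

```python
def cluster_1d(values, tol):
    """Group sorted coordinate values whose gaps are below tol."""
    groups = []
    for v in sorted(values):
        if groups and v - groups[-1][-1] <= tol:
            groups[-1].append(v)
        else:
            groups.append([v])
    return [sum(g) / len(g) for g in groups]  # one representative per group

def boxes_to_table(boxes, tol=5):
    """Assign each OCR word box to the nearest (row, column) grid cell."""
    rows = cluster_1d([y for _, y, _ in boxes], tol)
    cols = cluster_1d([x for x, _, _ in boxes], tol)
    table = [["" for _ in cols] for _ in rows]
    for x, y, text in boxes:
        r = min(range(len(rows)), key=lambda i: abs(rows[i] - y))
        c = min(range(len(cols)), key=lambda j: abs(cols[j] - x))
        table[r][c] = (table[r][c] + " " + text).strip()
    return table

boxes = [(10, 12, "Heizung"), (200, 11, "1.200,00"),
         (10, 40, "Grundsteuer"), (200, 41, "350,00")]
print(boxes_to_table(boxes))
# [['Heizung', '1.200,00'], ['Grundsteuer', '350,00']]
```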

Using this approach, we were able to increase the accuracy of table recognition to 93%.

Review of the cost items 

From the extracted table, the listed cost items can be read out and evaluated. We want to check whether they may actually be passed on to the tenants.

Due to the often poor quality of uploaded documents, we decided to use an approach that is robust against OCR errors: the individual cost items are compared (as strings) with lists of

  1. known admissible and 

  2. known inadmissible positions.

The comparison is done using a fuzzy string search, which outputs a similarity value for a pair of strings to be compared:

>>> fuzz.ratio("cable fees", "cable fees")
100
>>> fuzz.ratio("cable fees", "cable/TV fees")
87
>>> fuzz.ratio("cable fees", "property tax")
18
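The list comparison can be sketched as follows. As a stand-in for `fuzz.ratio` (provided by libraries such as rapidfuzz), this sketch uses the standard library's `SequenceMatcher`; the reference lists and the threshold are illustrative assumptions:

```python
from difflib import SequenceMatcher

def ratio(a: str, b: str) -> float:
    """Stdlib stand-in for fuzz.ratio: a 0-100 string similarity score."""
    return 100 * SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Illustrative lists -- the real reference lists are far longer.
ADMISSIBLE = ["property tax", "water supply", "heating costs", "cable fees"]
INADMISSIBLE = ["administration costs", "repairs", "bank charges"]

def classify_item(item: str, threshold: float = 80):
    """Compare an OCR'd cost item against both lists; return the best label."""
    score, ref, label = max(
        (ratio(item, ref), ref, label)
        for label, refs in (("admissible", ADMISSIBLE), ("inadmissible", INADMISSIBLE))
        for ref in refs
    )
    return label if score >= threshold else "unknown"

print(classify_item("Cable/TV fees"))  # matches "cable fees" despite the extra tokens
print(classify_item("repalrs"))        # OCR error, still matched to "repairs"
```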


Since a variety of algorithms for fuzzy string search exist (corresponding to different definitions of string similarity), we trained a machine learning classifier that considers and weights multiple types of similarity scores. Based on these scores, the classifier estimates whether a given item is allocable or not.
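The idea of combining several similarity definitions can be sketched as follows. The two scorers mirror the character-level and token-sorted variants found in fuzzy-matching libraries; the hand-set weights stand in for the trained classifier, which in the project would be learned (e.g. via logistic regression) from labelled item pairs:

```python
from difflib import SequenceMatcher

def char_ratio(a: str, b: str) -> float:
    """Character-level similarity (cf. fuzz.ratio)."""
    return SequenceMatcher(None, a, b).ratio()

def token_sort_ratio(a: str, b: str) -> float:
    """Similarity after sorting tokens (cf. fuzz.token_sort_ratio):
    makes 'fees cable' match 'cable fees' perfectly."""
    return char_ratio(" ".join(sorted(a.split())), " ".join(sorted(b.split())))

def features(item: str, reference: str) -> list[float]:
    return [char_ratio(item, reference), token_sort_ratio(item, reference)]

# Hand-set weights as a stand-in for the trained classifier's learned weighting.
WEIGHTS = [0.5, 0.5]

def match_score(item: str, reference: str) -> float:
    return sum(w * f for w, f in zip(WEIGHTS, features(item, reference)))

print(match_score("fees cable", "cable fees"))
```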
