Various methods from natural language processing (NLP) helped us build software that spots errors in rental contracts.
Input: scan or photo of a rental contract
Output: effectiveness of the cosmetic repair clause
Goal: automate the review process in an explainable way
Our client, MieterEngel, is an online tenant protection club that offers its members a legal review of their tenancy agreements by experienced partner lawyers. The aim of the project was to develop an intelligent software tool that supports lawyers in analyzing contracts.
So far, the process has been very labour-intensive. Customers contact the tenant protection club with photos of their rental contract. The MieterEngel service team checks them for legibility and forwards legible documents to partner lawyers, who then have to read the entire contract carefully in order to prepare an expert opinion tailored to the client's questions.
Overall, the process is...
slower (separate readability check by service team)
more expensive (use of partner lawyers for tasks that require little legal expertise)
and more prone to error (overlooking relevant provisions)
...than it should be.
The goal of the project was to develop a software tool that automates repetitive workflows and lets the lawyers and the service team focus on their areas of expertise. Using artificial intelligence, i.e. self-learning algorithms, was the obvious choice.
First, however, a number of open questions had to be answered, including the following:
Scope: Which areas can usefully be automated, and where must or should the lawyers' expertise be applied instead? How can the two be integrated as effectively as possible?
Transparency: Lawyers would not accept a "black box" algorithm that does not arrive at its assessments in a transparent way. How can we present the AI's decision-making process transparently enough for users to follow it?
Legal situation: The legal situation that the algorithm has to represent is by no means unambiguous. How can the software take into account contradictory court rulings and differing legal opinions?
In the course of the project, we developed a program that fully automates all steps, from evaluating the quality of the document image to deciding on the admissibility of contractual provisions. The results of the individual steps are presented in such a way that the user can follow the decision-making process and adjust it if necessary.
The implementation took place within an agile project framework (MVP after 6 months, continuous user feedback). Together with the client's IT department and Tobias Sterbak, we implemented the solution as a web application, which offers several advantages:
fast roll-out of improvements
unlimited addition of users
independence from the operating system, etc.
See the video below to get an impression of the workflow (the software is in German only).
The central element of our client's service is an online platform on which customers can upload their digitized documents and partner lawyers can then retrieve them.
The platform allows lawyers to process requests entirely in the frontend, i.e. to read rental agreements, write expert opinions, and communicate with clients.
In principle, the platform can be expanded with additional modules that optimize or even automate existing processes.
Background: Description of the previous process
To initiate the review of a customer's rental agreement, MieterEngel first needs scans or photos of the document. These are checked for legibility by MieterEngel's service team and then made available to a partner lawyer. If they are not legible, the customer receives a message asking them to provide a better image.
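The software replaces this manual legibility check with an automatic one. The article does not say how image quality is assessed, so the following is only a sketch of one common heuristic we assume for illustration: the variance of the Laplacian, which is high for sharp images with crisp edges and low for blurry ones. It is written in pure Python for self-containment (with OpenCV the same measure is usually computed as `cv2.Laplacian(gray, cv2.CV_64F).var()`), and the threshold of `100.0` is a hypothetical value that would have to be tuned on real scans.

```python
from typing import List

Image = List[List[float]]  # grayscale pixel values, e.g. 0-255


def laplacian_variance(img: Image) -> float:
    """Variance of the 4-neighbour Laplacian over all interior pixels.

    Sharp images have strong edges and hence a high variance;
    blurry or empty images have a low one.
    """
    h, w = len(img), len(img[0])
    if h < 3 or w < 3:
        raise ValueError("image must be at least 3x3 pixels")
    responses = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Kernel [[0, 1, 0], [1, -4, 1], [0, 1, 0]]
            lap = (img[y - 1][x] + img[y + 1][x] + img[y][x - 1]
                   + img[y][x + 1] - 4 * img[y][x])
            responses.append(lap)
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)


def is_legible(img: Image, threshold: float = 100.0) -> bool:
    """Hypothetical legibility gate: the threshold is an assumption, not a tuned value."""
    return laplacian_variance(img) >= threshold
```

A uniform grey image yields a variance of 0 and is rejected, while a high-contrast checkerboard passes. A real system would likely combine such a blur score with other signals, such as OCR confidence.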
The lawyer examines the contract for relevant information and wording. Since rental agreements do not follow a fixed template and their structure varies greatly, the lawyer must read the entire agreement carefully. For example, the provisions relevant to the renovation obligation are often spread across the entire contract. The workload for the lawyer is therefore relatively high, even though their legal expertise is really only needed to evaluate a few relevant sentences. In addition, even experienced tenancy law experts occasionally overlook individual contractual provisions.
To inform the customer of the result of the review, the partner lawyer prepares an expert opinion. Since questions and explanations recur, it helps to use text modules; however, the lawyer has to select and insert them manually.
Available input: scan or photograph of the rental agreement as a PDF file.
Desired output: an explained assessment of the effectiveness of the cosmetic repair clause, substantiated by the contract text.
On the technical side, we had to meet the following challenges:
At the beginning of the project, we only had unlabeled data. For high-quality annotations, we had to involve not only data scientists but also tenancy law experts.
We only had a few hundred rental contracts available for training purposes. At the same time, the algorithms had to learn relatively complex patterns.
Many of the PDF documents were poor-quality smartphone photographs, and many rental contracts contained handwritten additions. The software nevertheless had to work under these conditions.
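With only a few hundred contracts, how the data is split for evaluation matters. The article does not describe its evaluation setup; one common precaution, which we assume here for illustration, is to split by contract rather than by sentence. Sentences from the same contract often share template wording, so letting them appear in both the training and the validation set would make validation scores look better than they are. The function name and the `(contract_id, sentence, label)` tuple layout are hypothetical; scikit-learn's `GroupShuffleSplit` implements the same idea.

```python
import random
from typing import List, Sequence, Tuple

Example = Tuple[str, str, str]  # (contract_id, sentence, label), hypothetical layout


def split_by_contract(
    examples: Sequence[Example], val_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[Example], List[Example]]:
    """Split sentence-level examples so that every contract ends up
    entirely in either the training set or the validation set."""
    contract_ids = sorted({cid for cid, _, _ in examples})
    rng = random.Random(seed)  # fixed seed for a reproducible split
    rng.shuffle(contract_ids)
    n_val = max(1, round(len(contract_ids) * val_fraction))
    val_ids = set(contract_ids[:n_val])
    train = [ex for ex in examples if ex[0] not in val_ids]
    val = [ex for ex in examples if ex[0] in val_ids]
    return train, val
```

Because the split is made over contract IDs, the validation set measures how well the model generalizes to contracts it has never seen, which is the situation the software faces in production.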
Backend: Python, TensorFlow, Keras, spaCy, NLTK, scikit-learn, Flask
Infrastructure: Google Cloud (training and Google Cloud Vision), Docker, brat rapid annotation tool
In the following, we present solutions to some selected partial problems.
In order to apply supervised learning algorithms, we developed an annotation scheme that systematically captures all information relevant to the decision on effectiveness. The scheme was developed iteratively in joint dida workshops with selected tenancy law experts and evaluated continuously. It was important that the relationships between contract components, as expressed by the structure of the annotations, correspond as closely as possible to established legal reasoning.
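The brat rapid annotation tool listed in the stack stores annotations in its standoff format: an `.ann` file next to the text, where `T` lines are labelled text spans with character offsets and `R` lines are binary relations between spans. The project's actual annotation scheme is not published, so the labels `RenovationDuty` and `Deadline` and the relation `Specifies` below are hypothetical. This is a minimal sketch of reading such a file into Python; events, attributes and notes are deliberately ignored.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Span:
    id: str
    label: str
    start: int
    end: int
    text: str


@dataclass
class Relation:
    id: str
    type: str
    arg1: str
    arg2: str


def parse_brat(ann: str) -> Tuple[Dict[str, Span], List[Relation]]:
    """Parse text-bound (T) and relation (R) lines of a brat .ann file."""
    spans: Dict[str, Span] = {}
    relations: List[Relation] = []
    for line in ann.splitlines():
        if not line.strip():
            continue
        ann_id, _, rest = line.partition("\t")
        if ann_id.startswith("T"):
            # Format: "T1<TAB>Label start end<TAB>covered text"
            info, _, text = rest.partition("\t")
            label, offsets = info.split(" ", 1)
            # Discontinuous spans look like "0 5;10 15": keep the outer bounds.
            fragments = [tuple(map(int, f.split())) for f in offsets.split(";")]
            spans[ann_id] = Span(ann_id, label, fragments[0][0], fragments[-1][1], text)
        elif ann_id.startswith("R"):
            # Format: "R1<TAB>Type Arg1:T1 Arg2:T2"
            rel_type, arg1, arg2 = rest.split()
            relations.append(
                Relation(ann_id, rel_type, arg1.split(":", 1)[1], arg2.split(":", 1)[1])
            )
    return spans, relations
```

Usage example with a hypothetical annotation:

```python
ann = (
    "T1\tRenovationDuty 0 43\tThe tenant shall carry out cosmetic repairs\n"
    "T2\tDeadline 44 61\tevery three years\n"
    "R1\tSpecifies Arg1:T2 Arg2:T1\n"
)
spans, relations = parse_brat(ann)
```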
Based on this annotation scheme, the decision-making process could be divided into two steps. First, the sentences that make up the contract text are classified, which determines the legal structure of the contract. The classification results can easily be displayed in the contract text (e.g. by colour-coding) and thus checked by the user. Second, this information is evaluated by a rule-based system, which is therefore completely transparent and can be modified individually to reflect a given legal opinion.
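The second, rule-based step can be sketched as follows. The sentence labels and the two example rules are hypothetical placeholders chosen for illustration; the project's actual rule set is not described in the article. The point of the sketch is the design: every rule carries a human-readable explanation, and the `enabled` mapping lets a lawyer switch individual rules off when they follow a different legal opinion.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set

Labels = Set[str]  # sentence labels predicted for one contract (hypothetical names)
CLAUSE_LABEL = "RenovationDuty"  # hypothetical label for the cosmetic repair clause


@dataclass
class Rule:
    name: str
    fires: Callable[[Labels], bool]  # True if the rule renders the clause ineffective
    explanation: str


@dataclass
class Assessment:
    effective: Optional[bool]  # None if no clause was found at all
    reasons: List[str] = field(default_factory=list)


# Hypothetical example rules, for illustration only.
RULES: List[Rule] = [
    Rule("rigid_deadline",
         lambda labels: "RigidDeadline" in labels,
         "The clause prescribes fixed renovation intervals."),
    Rule("unrenovated_handover",
         lambda labels: "UnrenovatedHandover" in labels and "Compensation" not in labels,
         "The flat was handed over unrenovated without compensation."),
]


def evaluate(labels: Labels, rules: List[Rule], enabled: Dict[str, bool]) -> Assessment:
    """Apply every enabled rule; any rule that fires makes the clause ineffective."""
    if CLAUSE_LABEL not in labels:
        return Assessment(effective=None, reasons=["No cosmetic repair clause found."])
    reasons = [r.explanation for r in rules
               if enabled.get(r.name, True) and r.fires(labels)]
    return Assessment(effective=not reasons, reasons=reasons)
```

Because the output includes the explanation of every rule that fired, the user can see exactly why the clause was judged ineffective, which addresses the transparency requirement without exposing the internals of the classifier.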
The level of detail in the classification results also makes it possible to suggest text modules for the lawyer's reply automatically, so that, ideally, the lawyer only has to check and confirm the algorithm's suggestions.
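A minimal sketch of such suggestions, assuming (hypothetically) that each text module is keyed by the sentence label that triggers it. The module texts and label names are illustrative placeholders, not MieterEngel's actual templates. Suggestions are returned in a fixed order so that the draft reply is reproducible, and the lawyer remains responsible for confirming or editing them.

```python
from typing import Dict, List, Set

CLAUSE_LABEL = "RenovationDuty"  # hypothetical label for the cosmetic repair clause

# Hypothetical library of text modules, keyed by the label that triggers them.
# Dicts preserve insertion order, which fixes the order of the suggestions.
TEXT_MODULES: Dict[str, str] = {
    "RigidDeadline": "Your contract requires renovation at fixed intervals "
                     "regardless of the condition of the flat.",
    "UnrenovatedHandover": "According to your contract, the flat was handed over "
                           "without having been renovated.",
    "HandwrittenAddition": "Your contract contains handwritten additions, "
                           "which we have taken into account.",
}
NO_CLAUSE_MODULE = "We could not find a cosmetic repair clause in your contract."


def suggest_modules(labels: Set[str]) -> List[str]:
    """Return the text modules triggered by the predicted sentence labels."""
    if CLAUSE_LABEL not in labels:
        return [NO_CLAUSE_MODULE]
    return [text for trigger, text in TEXT_MODULES.items() if trigger in labels]
```

Keeping the mapping as plain data means new modules can be added by the legal team without touching the model.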
Recurrent neural networks (RNNs) are well suited to text classification. After experimenting with different model architectures, we chose a Bi-LSTM (bidirectional long short-term memory network), which performed better than the alternative approaches we tried. Because the dataset was small, we used pre-trained word embeddings. We also added an attention mechanism to the model, which, according to current research, significantly improves the performance of RNNs; this was confirmed in our case.
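The article does not specify which attention variant was used. As an illustration only, here is additive attention pooling in the style of hierarchical attention networks (Yang et al., 2016), a common choice for RNN text classifiers: each per-token hidden state `h_t` of the Bi-LSTM is projected with `tanh(W h_t + b)`, scored against a learned context vector, and the softmax-normalized scores weight the hidden states into a single sentence vector. In Keras this would typically be a custom layer on top of `Bidirectional(LSTM(..., return_sequences=True))`; the pure-Python version below only shows the arithmetic, with `W`, `b` and `context` standing in for learned parameters.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def softmax(xs: Sequence[float]) -> Vector:
    m = max(xs)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(
    hidden: Sequence[Vector], W: Matrix, b: Vector, context: Vector
) -> Tuple[Vector, Vector]:
    """Additive attention pooling over per-token hidden states.

    hidden:  T vectors of size d (e.g. concatenated forward/backward LSTM states)
    W, b:    projection to size k (W has k rows of length d)
    context: learned context vector of size k
    Returns the pooled sentence vector (size d) and the attention weights (size T).
    """
    scores = []
    for h in hidden:
        # u_t = tanh(W h_t + b)
        u = [math.tanh(sum(w_ij * h_j for w_ij, h_j in zip(row, h)) + b_i)
             for row, b_i in zip(W, b)]
        # score_t = u_t . context
        scores.append(sum(u_i * c_i for u_i, c_i in zip(u, context)))
    weights = softmax(scores)
    d = len(hidden[0])
    pooled = [sum(a * h[j] for a, h in zip(weights, hidden)) for j in range(d)]
    return pooled, weights
```

Besides improving classification, the attention weights indicate which words the model relied on for a sentence, which fits the project's emphasis on explainability.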
We would be happy to present more detailed information to you in a personal meeting.