Various methods from the field of NLP helped us build software that spots errors in legal contracts.
Customers contact the tenant protection club with their rental agreements and specific or general review requests. The club then passes the documents on to partner lawyers, who prepare expert opinions tailored to the customer's questions.
To initiate the review of a customer's rental agreement, the club first needs scans or photos of the document. The club's service team checks them for legibility and then makes them available to a partner lawyer. If they are not legible, the customer receives a message asking them to improve the quality of the image.
The lawyer examines the contract for relevant information and wording. Since rental agreements generally do not follow a fixed scheme and can vary greatly in structure, the lawyer must read the entire agreement carefully. For example, the provisions relevant to the renovation obligation are often spread over the entire length of the contract. The workload for the lawyer is therefore relatively high, even though their legal expertise is effectively only required to evaluate a few relevant sentences. In addition, even experienced tenancy law experts occasionally overlook individual contractual provisions.
To inform the customer of the result of the review, the partner lawyer prepares an expert opinion. Since inquiries and explanations recur, it is beneficial to use text modules, which the lawyer must, however, select and insert manually.
Overall, the process is slower (separate readability check by service team), more expensive (use of partner lawyers for tasks that require little legal expertise) and more prone to error (overlooking relevant provisions) than it should be.
The goal of the project was to develop a software tool that automates repetitive workflows and allows the attorney and service team to focus on their areas of expertise. The use of artificial intelligence, i.e. self-learning algorithms, was the obvious choice.
First, however, a number of open questions had to be answered, including the following:
Scope: Which areas can be advantageously automated and where must or should the lawyers' expertise be used instead? How can both spheres be optimally interlocked?
Transparency: A "black box" algorithm that does not arrive at its assessments in a transparent way would not be accepted by lawyers as users. How can we present the decision-making process of the AI in such a transparent way that it is comprehensible for the users?
Legal situation: The legal situation to be represented by the algorithm is by no means unambiguous. How can the software take into account contradictory judgements on precedents and differing legal opinions?
During the project we developed software that fully automates every step, from assessing the quality of the document image to deciding on the admissibility of contractual provisions. The result of each step is presented in such a way that the user can follow the decision-making process and adjust it if necessary.
The project was run within an agile framework (MVP after 6 months, continuous user feedback). Together with the client's IT department, we implemented the software as a web application, which has several advantages, e.g. fast roll-out of improvements, unlimited addition of users and independence from the operating system.
The central element of our client's service is an online platform on which customers can upload their digitised documents and partner lawyers can then retrieve them.
The platform allows the lawyer to completely process the requests in the frontend, i.e. to read rental agreements, formulate expert opinions and communicate with clients.
In principle, the platform can be expanded with additional modules that optimize or even automate existing processes.
Available input: scan or photograph of the rental agreement as PDF file.
Desired output: Explained and textually substantiated assessment of the effectiveness of the cosmetic repair clause.
Technically we had to meet the following challenges:
At the beginning of the project, we only had unlabeled data. For high-quality annotations, we had to involve not only data scientists but also tenancy law experts.
We only had a few hundred rental contracts available for training purposes. At the same time, the algorithms had to learn relatively complex patterns.
Among the PDF documents, there were many poor-quality photographs taken with a smartphone, and many rental contracts had handwritten additions. The software had to work under these circumstances as well.
Backend: Python, TensorFlow, Keras, spaCy, NLTK, scikit-learn, Flask
Infrastructure: GCloud (Training and Google Cloud Vision), Docker, brat rapid annotation tool
Below we present our solutions to a few selected sub-problems.
1. Annotation scheme
In order to apply supervised learning algorithms, we developed an annotation scheme that systematically captures all information relevant to the decision on effectiveness. The scheme was developed iteratively in joint workshops between dida and selected tenancy law experts and was evaluated continuously. It was important that the relationships between contract components, as defined by the structure of the annotations, mirror established legal reasoning as closely as possible.
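Since the annotations were created with the brat rapid annotation tool, they can be read back from brat's standoff format for training. The following is a minimal sketch of such a reader, not the project's actual code. The label names (`RenovationObligation`, `FixedDeadline`, `Qualifies`) and the example text are hypothetical, and only text-bound annotations and binary relations are handled.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Span:
    id: str
    label: str  # hypothetical label name, e.g. "RenovationObligation"
    start: int
    end: int
    text: str


@dataclass
class Relation:
    id: str
    label: str
    arg1: str  # id of the first linked span
    arg2: str  # id of the second linked span


def parse_brat(ann: str) -> Tuple[Dict[str, Span], List[Relation]]:
    """Parse text-bound (T) and relation (R) lines of a brat .ann file."""
    spans: Dict[str, Span] = {}
    relations: List[Relation] = []
    for line in ann.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        if fields[0].startswith("T"):
            label, offsets = fields[1].split(" ", 1)
            # Discontinuous spans ("0 5;10 15") are reduced to their outer extent.
            numbers = offsets.replace(";", " ").split()
            spans[fields[0]] = Span(
                fields[0], label, int(numbers[0]), int(numbers[-1]), fields[2]
            )
        elif fields[0].startswith("R"):
            label, arg1, arg2 = fields[1].split()
            relations.append(
                Relation(fields[0], label, arg1.split(":", 1)[1], arg2.split(":", 1)[1])
            )
        # Other annotation types (events, attributes, notes) are ignored here.
    return spans, relations


# Hypothetical annotation of two contract passages and their relationship.
EXAMPLE_ANN = "\n".join(
    [
        "T1\tRenovationObligation 0 43\tThe tenant shall carry out cosmetic repairs",
        "T2\tFixedDeadline 44 61\tevery three years",
        "R1\tQualifies Arg1:T2 Arg2:T1",
    ]
)

spans, relations = parse_brat(EXAMPLE_ANN)
print(spans["T2"].label, "->", spans[relations[0].arg2].label)
```

Keeping relations separate from spans is one way to let the data structure reflect how contract components depend on each other.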
2. Decision Process
Based on the annotation scheme, we split the decision-making process into two steps. First, the sentences that make up the contract text are classified, which yields the legal structure of the contract. The classification results can easily be displayed in the contract text (e.g. by colour coding) and thus checked by the user. Second, this information is evaluated by a set of rules. Because this step is rule-based, it is completely transparent and can be adapted to individual legal opinions.
The depth of information of the classification results also allows automated suggestions of text modules for the lawyer's reply, so that in the ideal case the lawyer only has to check and confirm the suggestions of the algorithm.
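The two-step process can be illustrated with a small sketch of the second, rule-based step. It assumes the classifier has already assigned one label per sentence. The labels, rules, verdicts and text module ids are hypothetical placeholders, not the project's rule set and not legal advice.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple


@dataclass(frozen=True)
class Rule:
    name: str
    applies: Callable[[Set[str]], bool]  # decides from the set of labels found
    verdict: str
    text_module: str  # id of the text module suggested for the lawyer's reply


# Hypothetical rules; the first matching rule wins, so order matters.
RULES: List[Rule] = [
    Rule(
        "no_obligation",
        lambda labels: "renovation_obligation" not in labels,
        "no cosmetic repair clause found",
        "TM_NO_CLAUSE",
    ),
    Rule(
        "rigid_schedule",
        lambda labels: "fixed_deadline" in labels,
        "clause possibly ineffective",
        "TM_RIGID_SCHEDULE",
    ),
    Rule(
        "needs_review",
        lambda labels: True,  # fallback when no other rule matches
        "manual review required",
        "TM_MANUAL_REVIEW",
    ),
]


def evaluate(sentence_labels: List[Tuple[str, str]]) -> Dict[str, object]:
    """Apply the rules to classified sentences and keep the evidence."""
    labels = {label for _, label in sentence_labels}
    evidence = [sentence for sentence, label in sentence_labels if label != "other"]
    for rule in RULES:
        if rule.applies(labels):
            return {
                "rule": rule.name,
                "verdict": rule.verdict,
                "text_module": rule.text_module,
                "evidence": evidence,
            }
    raise RuntimeError("no rule matched")  # unreachable with the fallback rule


classified = [
    ("The tenant shall carry out cosmetic repairs.", "renovation_obligation"),
    ("Repairs are due every three years.", "fixed_deadline"),
    ("The rent is due monthly.", "other"),
]
result = evaluate(classified)
print(result["rule"], "->", result["text_module"])
```

Returning the name of the rule that fired together with the supporting sentences keeps the decision traceable, and a lawyer who holds a different legal opinion only needs to change or reorder entries in `RULES`.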
3. ML architecture
Recurrent neural networks (RNNs) are well suited to text classification. After experimenting with several model architectures, we chose a Bi-LSTM (bidirectional long short-term memory), which outperformed the alternatives we tried. Because the dataset was small, we used pre-trained word embeddings. In addition, we added an attention mechanism, which has been reported in the research literature to significantly improve the performance of RNNs. This was confirmed in our case.
Figure: A Bi-LSTM with attention. The inputs x_i are the words of a sentence.
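To make the attention step more concrete, the sketch below shows attention pooling over the per-word hidden states of a recurrent layer in plain Python. It is an illustration under assumptions: the text does not say which attention variant was used, so a simple dot-product score against a query vector is assumed, and the hidden states and query are made-up numbers rather than outputs of a trained Bi-LSTM.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(
    hidden_states: Sequence[Sequence[float]], query: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Combine per-word hidden states into one sentence vector.

    Each hidden state is scored by its dot product with the query vector
    (assumed to be a learned parameter), the scores are normalised with
    softmax, and the weighted sum of the hidden states is returned.
    """
    scores = [sum(h_k * q_k for h_k, q_k in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    context = [
        sum(w * h[k] for w, h in zip(weights, hidden_states)) for k in range(dim)
    ]
    return context, weights


# Made-up hidden states for a three-word sentence (one vector per word x_i).
hidden_states = [[0.1, 0.0], [2.0, 1.0], [0.0, 0.2]]
query = [1.0, 1.0]

context, weights = attention_pool(hidden_states, query)
print([round(w, 3) for w in weights])
```

The sentence vector `context` would then be passed to the classification layer. The per-word `weights` could also be shown to the user as a hint at which words influenced a classification.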
We would be happy to present more detailed information to you in a personal meeting.