The central element of our client's service is an online platform on which customers can upload their digitized documents and partner lawyers can then retrieve them.
The platform lets the lawyer handle requests entirely in the frontend, i.e. read rental agreements, draft expert opinions, and communicate with clients.
In principle, the platform can be expanded with additional modules that optimize or even automate existing processes.
Background: Description of the previous process
To initiate the review of a customer's rental agreement, our client first needs scans or photos of the document. The client's service team checks them for legibility and then makes them available to a partner lawyer. If they are not legible, the customer receives a message asking for a better-quality image.
The lawyer examines the contract for relevant information and wording. Because rental agreements generally do not follow a fixed scheme and can vary greatly in structure, the lawyer must read the entire agreement carefully. For example, the contract provisions relevant to the renovation obligation are often spread over the entire length of the contract. The workload for the lawyer is therefore relatively high, even though legal expertise is effectively only required to evaluate a few relevant sentences. In addition, even experienced tenancy law experts occasionally overlook individual contractual provisions.
To inform the customer of the result of the review, the partner lawyer prepares an expert opinion. Because inquiries and explanations recur, text modules are useful, but the lawyer has to select and insert them manually.
Challenges
Available input: scan or photograph of the rental agreement as a PDF file.
Desired output: Explained and textually substantiated assessment of the effectiveness of the cosmetic repair clause.
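To make the desired output more concrete, the following is a minimal sketch of how such an assessment could be represented as a data structure. All names here (Verdict, Evidence, ClauseAssessment, is_substantiated) are hypothetical and chosen for illustration only; they do not describe the actual implementation.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Verdict(Enum):
    """Possible outcomes of the clause review (hypothetical labels)."""
    EFFECTIVE = "effective"
    INEFFECTIVE = "ineffective"
    UNCLEAR = "unclear"


@dataclass
class Evidence:
    """A passage from the contract that supports the assessment."""
    page: int
    quote: str


@dataclass
class ClauseAssessment:
    """Assessment of the cosmetic repair clause with its justification."""
    verdict: Verdict
    explanation: str
    evidence: List[Evidence] = field(default_factory=list)

    def is_substantiated(self) -> bool:
        # "Explained and textually substantiated" is read here as:
        # a non-empty explanation plus at least one quoted passage.
        return bool(self.explanation.strip()) and len(self.evidence) > 0


# Illustrative example with made-up contract wording.
example = ClauseAssessment(
    verdict=Verdict.INEFFECTIVE,
    explanation="The clause prescribes a rigid renovation schedule.",
    evidence=[Evidence(page=3, quote="Rooms must be painted every three years.")],
)
```

Keeping the quoted passages next to the verdict means every conclusion can be traced back to the contract text.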
Technically, we had to meet the following challenges:
At the beginning of the project, we only had unlabeled data. For high-quality annotations, we had to involve not only data scientists but also tenancy law experts.
We only had a few hundred rental contracts available for training purposes. At the same time, the algorithms had to learn relatively complex patterns.
Many of the PDF documents were poor-quality photographs taken with a smartphone, and many rental contracts had handwritten additions. The software nevertheless had to work under these conditions.
Solution
Technologies used
Backend: Python, TensorFlow, Keras, spaCy, NLTK, scikit-learn, Flask
Infrastructure: GCloud (Training and Google Cloud Vision), Docker, brat rapid annotation tool
In the following, we present solutions to some selected partial problems.
1. Labeling
To apply supervised learning algorithms, we developed an annotation scheme that systematically captures all information relevant to the decision on effectiveness. The scheme was developed iteratively in joint dida workshops with selected tenancy law experts and evaluated continuously. It was important that the relationships between contract components, as encoded in the structure of the annotations, correspond as closely as possible to common legal logic.
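Since brat is listed among the tools, annotations of this kind can be pictured in brat's standoff format: text-bound annotations (lines starting with T) mark spans, and relations (lines starting with R) link them. The following is a minimal sketch of reading such a file. The labels RenovationDuty, FixedSchedule and Specifies, as well as the sample sentence, are invented for illustration and are not the scheme actually used in the project.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Span:
    id: str
    label: str
    start: int
    end: int
    text: str


@dataclass(frozen=True)
class Relation:
    id: str
    label: str
    arg1: str  # id of the first linked span
    arg2: str  # id of the second linked span


def parse_brat(ann: str) -> Tuple[Dict[str, Span], List[Relation]]:
    """Parse text-bound (T) and relation (R) lines of a brat .ann file."""
    spans: Dict[str, Span] = {}
    relations: List[Relation] = []
    for line in ann.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        if line.startswith("T"):
            label, offsets = fields[1].split(" ", 1)
            # Discontinuous spans separate fragments with ';'.
            # This sketch keeps only the outer bounds.
            parts = offsets.replace(";", " ").split()
            spans[fields[0]] = Span(
                fields[0], label, int(parts[0]), int(parts[-1]), fields[2]
            )
        elif line.startswith("R"):
            label, a1, a2 = fields[1].split()
            relations.append(
                Relation(fields[0], label, a1.split(":", 1)[1], a2.split(":", 1)[1])
            )
    return spans, relations


# Made-up contract snippet and matching annotations (hypothetical labels).
CONTRACT_TEXT = (
    "The tenant shall bear cosmetic repairs. "
    "Rooms must be painted every three years."
)
SAMPLE_ANN = (
    "T1\tRenovationDuty 0 38\tThe tenant shall bear cosmetic repairs\n"
    "T2\tFixedSchedule 62 79\tevery three years\n"
    "R1\tSpecifies Arg1:T2 Arg2:T1\n"
)

spans, relations = parse_brat(SAMPLE_ANN)
```

The relation line is what carries the legal structure: it records that the schedule qualifies the renovation duty, rather than merely marking both passages independently.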