FewTuRe: Few-shot learning for information extraction from invoices and documents


FewTuRe is a research project that aims to employ Few-Shot Learning to automate document processing for SMEs, extracting data from documents with minimal training data. In collaboration with the Machine Learning Group at HU Berlin and the Federal Ministry of Education and Research of Germany, dida employs transfer learning and generative AI to handle data scarcity and complex formats, ensuring a scalable, transparent solution for digital transformation.

Input

10-20 labeled examples of semi-structured documents

Output

JSON-formatted information with confidence scores

Goal

AI-powered document automation


Starting Point


The goal of this project is to explore how large amounts of semi-structured documents can be processed systematically, inexpensively, and reliably into digital information using only small amounts of enterprise-native training data. This project is all about Few-Shot Learning.


Challenges


The amount of data available to an SME is usually much smaller than that available to larger companies. This lack of data can limit the performance of general ML models, as these typically require significant amounts of data to achieve satisfactory accuracy. In particular, the reduced availability of such data poses a crucial challenge when processing linguistic and visual information.

Another challenge for ML scientists is the semi-structured nature of documents themselves: their layout contains a mixture of free text, tables, and other unique formatting choices. The issuers of these documents are not bound to follow any standard templates, which is why the proposed solution needs to have seen dissimilar documents of the same type (e.g., invoices) and learned to recognize different fields regardless of their positioning.


Solutions & Approaches


Among other topics, this project investigates OCR, few-shot fine-tuning, reinforcement learning, and multi-modal LLMs.

To train an Information Extraction model, it is customary to use a labeled dataset of 1,000 to 10,000 entries. When such datasets are unavailable, often due to budget constraints or simple data scarcity, Few-Shot Learning presents a viable compromise, enabling fine-tuning of a pre-trained model with only minimal input data. The ultimate goal here is to develop models that perform well using the smallest possible number of data examples (e.g., 10 to 20).
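
As a rough illustration of what such a minimal training set might look like, the sketch below converts a handful of OCR'd, manually labelled documents into prompt/target pairs for fine-tuning. The field names, prompt wording, and data format are assumptions made for this example, not the project's actual setup.

```python
import json

# Hypothetical few-shot training set: 10-20 OCR'd documents, each with
# manually labelled target fields (field names and values are illustrative).
labelled_documents = [
    {
        "ocr_text": (
            "Muster GmbH\nRechnung Nr. RE-2024-0815\n"
            "Datum: 12.03.2024\nGesamtbetrag: 1.214,50 EUR"
        ),
        "labels": {
            "invoice_number": "RE-2024-0815",
            "invoice_date": "2024-03-12",
            "total_amount": "1214.50",
        },
    },
    # ... 9 to 19 further labelled examples
]

def to_training_pair(doc: dict) -> dict:
    """Turn one labelled document into a prompt/target pair for fine-tuning."""
    prompt = (
        "Extract invoice_number, invoice_date and total_amount as JSON "
        "from the following document:\n" + doc["ocr_text"]
    )
    target = json.dumps(doc["labels"], ensure_ascii=False)
    return {"prompt": prompt, "target": target}

training_pairs = [to_training_pair(d) for d in labelled_documents]
```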


Technical Background


To extract insights applicable to the project’s initial goal, it is first necessary to investigate the subtleties of the domain. dida reached out to its previous industry partners and was able to obtain some proprietary data vital for research.

The core technologies that will be incorporated into the final product include large language models and reinforcement learning, chosen for their document understanding capabilities, and generative AI for post-processing and presenting results in a comprehensible way.

To address the above-mentioned challenges, we will employ the following combination of techniques to develop a solution that is both secure and reliable. Few-shot learning, as implemented by dida, will draw on the following findings from parameter-efficient fine-tuning research:

  1. LoRA (Low-Rank Adaptation) – First and foremost, to achieve the highest possible accuracy with the limited number of examples available, we will use LoRA to freeze most of the pre-trained model’s parameters and reduce the number of trainable parameters. This will allow us to “inject” information about the documents into the knowledge base of the model without losing previously acquired capabilities (a minimal sketch follows after this list).

  2. BitFit – Secondly, BitFit, a technique that modifies only the model’s bias terms, will be employed due to its demonstrated effectiveness on small to medium datasets. This approach ensures that only the absolutely necessary parameters are changed, avoiding the need for full fine-tuning (see the second sketch after this list).

  3. Multilingual LLMs for Transfer Learning – The third technique worth mentioning, due to its ingenious simplicity, is the use of multilingual LLMs for transfer learning. By fine-tuning a multilingual model on documents in one language, we can rely on transfer learning to apply it to documents in other languages already present within the model’s knowledge base.
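
The following is a minimal sketch of how LoRA (point 1 above) could be configured with Hugging Face’s peft library. The base model name, target modules, and hyperparameters are placeholder assumptions, not the project’s final configuration.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Placeholder base model; its pre-trained weights stay frozen.
base_model = AutoModelForCausalLM.from_pretrained("some-pretrained-llm")

# LoRA injects small, trainable low-rank matrices into selected weight matrices.
lora_config = LoraConfig(
    r=8,                                  # rank of the low-rank update
    lora_alpha=16,                        # scaling factor for the update
    target_modules=["q_proj", "v_proj"],  # assumed attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of all parameters
```

In practice, the adapted model could then be fine-tuned on the handful of prompt/target pairs described above with a standard training loop or the transformers Trainer.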
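
BitFit (point 2 above) can be sketched in plain PyTorch as follows; treating every parameter whose name contains "bias" as trainable is a simplifying assumption made for illustration.

```python
import torch

def apply_bitfit(model: torch.nn.Module) -> torch.nn.Module:
    """Freeze all weights and leave only bias terms trainable (BitFit)."""
    for name, param in model.named_parameters():
        param.requires_grad = "bias" in name
    return model

# Only the (comparatively few) bias parameters are handed to the optimizer:
# model = apply_bitfit(base_model)
# optimizer = torch.optim.AdamW(
#     (p for p in model.parameters() if p.requires_grad), lr=1e-4
# )
```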

Of course, achieving satisfactory performance with as few as ten examples is an ambitious goal. To ensure that the model meets our standards, we will introduce explainability methods into the solution. These methods will:

  • Enable users to track the extracted information back to its original location within the document.

  • Provide a confidence score for each prediction.

This will allow us to better understand how the trained algorithm utilizes visual information, while also giving end-users the means to verify the extracted content and assess its accuracy for themselves.
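
As a purely illustrative assumption of how these two requirements could surface in the JSON output mentioned above, each extracted field might carry its value, a confidence score, and a pointer back to its location in the document:

```python
# Hypothetical output for a single extracted field; the schema, score range,
# and coordinate convention are assumptions, not the project's final format.
extracted_field = {
    "field": "total_amount",
    "value": "1,214.50 EUR",
    "confidence": 0.88,          # model's confidence in this prediction
    "source": {
        "page": 1,
        "bounding_box": [412, 980, 560, 1002],  # pixel coordinates in the scan
    },
}
```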


Contact


If you would like to speak with us about this project, please reach out and we will schedule an introductory meeting right away.

