To extract insights relevant to the project’s initial goal, it was first necessary to investigate the subtleties of the domain. dida reached out to its previous industry partners and obtained proprietary data vital to the research.
The core technologies that will be incorporated into the final product include large language models and reinforcement learning for document understanding, and generative AI for post-processing and presenting the results in a comprehensible way.
To address the above-mentioned challenges, we will employ the following combination of techniques to develop a solution that is both secure and reliable. Few-shot learning, as implemented by dida, will draw on the following findings from parameter-efficient fine-tuning research:
LoRA (Low-Rank Adaptation) – First and foremost, to achieve the highest possible accuracy with the limited number of examples available, we will use LoRA, which freezes the pre-trained model’s weights and trains only small low-rank adapter matrices, drastically reducing the number of trainable parameters. This allows us to “inject” information about the documents into the model without losing its previously acquired capabilities.
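As a rough illustration, the sketch below shows how a LoRA setup might look with the Hugging Face peft library; the base model, rank and target modules are placeholder assumptions, not the project’s final configuration.

```python
# Minimal LoRA sketch using the Hugging Face `peft` library.
# Model name, rank and target modules are illustrative assumptions.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")

lora_config = LoraConfig(
    r=8,                                   # rank of the low-rank update matrices
    lora_alpha=16,                         # scaling factor for the update
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

# Wrap the frozen base model; only the small LoRA matrices remain trainable.
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of all parameters
```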
BitFit – Secondly, BitFit, a technique that updates only the model’s bias terms, will be employed due to its demonstrated effectiveness on small to medium datasets. This approach ensures that only a minimal set of parameters is changed, avoiding the need for full fine-tuning.
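A minimal sketch of what BitFit amounts to in PyTorch is shown below: every parameter is frozen except the bias terms and the freshly initialised task head. The model name and label count are illustrative assumptions.

```python
# BitFit sketch: train only bias terms (plus the new classification head).
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=5
)

for name, param in model.named_parameters():
    # Keep gradients only for bias parameters and the classifier head.
    param.requires_grad = name.endswith(".bias") or name.startswith("classifier")

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable parameters: {trainable} / {total}")
```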
Multilingual LLMs for Transfer Learning – The third technique worth mentioning, due to its ingenious simplicity, is the use of multilingual LLMs for transfer learning. We will fine-tune the model on documents in one language, and because other languages are already present in the model’s knowledge base, it can then be applied to documents in those languages as well.
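The sketch below illustrates this idea with a multilingual encoder: it is fine-tuned on a couple of German placeholder examples and then queried with an English document. The model choice, texts and labels are invented for illustration only.

```python
# Cross-lingual transfer sketch: fine-tune a multilingual model on examples in
# one language and apply it to another. Texts and labels are placeholders.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "xlm-roberta-base", num_labels=2
)

# Tiny German "training" batch (stand-in for the few annotated documents).
train_texts = ["Rechnung über 120 Euro", "Lieferschein für Bestellung 42"]
train_labels = torch.tensor([0, 1])
batch = tokenizer(train_texts, padding=True, return_tensors="pt")

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
loss = model(**batch, labels=train_labels).loss
loss.backward()
optimizer.step()

# Inference on an English document the model never saw during fine-tuning.
model.eval()
with torch.no_grad():
    english = tokenizer(["Invoice for 120 euros"], return_tensors="pt")
    prediction = model(**english).logits.argmax(dim=-1)
print(prediction)
```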
Of course, achieving satisfactory performance with as few as ten examples is an ambitious goal. To ensure that the model meets our standards, we will introduce explainability methods into the solution. These methods will allow us to better understand how the trained algorithm uses visual information, while also giving the end user the means to verify the extracted content and assess its accuracy.
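As one possible example of such a method, the sketch below computes a simple gradient-times-input saliency over token embeddings, highlighting which words contributed most to a prediction. The model, input text and attribution technique are illustrative assumptions, not the project’s final explainability stack.

```python
# Gradient-x-input saliency sketch: which tokens drove the model's decision?
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=2
)
model.eval()

inputs = tokenizer("Invoice total: 120 euros", return_tensors="pt")

# Detach the input embeddings so gradients accumulate on them directly.
embeddings = model.get_input_embeddings()(inputs["input_ids"]).detach()
embeddings.requires_grad_(True)

logits = model(inputs_embeds=embeddings,
               attention_mask=inputs["attention_mask"]).logits
predicted = logits.argmax(dim=-1).item()
logits[0, predicted].backward()

# Per-token relevance: magnitude of gradient times the embedding itself.
saliency = (embeddings.grad * embeddings).sum(dim=-1).abs().squeeze(0)
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for token, score in zip(tokens, saliency.tolist()):
    print(f"{token:>12s}  {score:.4f}")
```

A reviewer can read such a per-token relevance listing next to the extracted field and judge whether the model actually relied on the relevant part of the document.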