Data extraction in the age of LLMs

Axel Besinger and Augusto Stoffel (PhD), May 31st, 2024

In recent years, Large Language Models (LLMs) have changed the landscape of data extraction. They offer strong text processing capabilities and come pre-trained on vast amounts of data, which makes them effective for information retrieval tasks. Traditional methods such as graph neural networks and extractive models, on the other hand, have historically been favored for their efficient use of resources. This raises the question: how do LLMs compare with those models in practical data extraction applications? This presentation examines the advantages and disadvantages of LLMs compared to extractive models. Drawing on our project experience and internal research, we discuss the practical implications of using LLMs for data extraction, including their efficacy, resource requirements, and overall performance in real-world scenarios. Attendees will gain a deeper understanding of the role of LLMs in modern data extraction workflows and of the considerations involved in implementing them.

Link to the information extraction software: smartextract (https://smartextract.ai)