From OCR to LLMs: The journey to reliable data extraction from complex retail documents


Axel Besinger and Augusto Stoffel (PhD)

AI-powered data extraction works, until it doesn’t. When handling structured tables in invoices, orders, or financial documents, we expect OCR, LLMs, and Vision AI to extract data reliably. However, complex documents with nested tables, irregular structures, and edge cases pose real challenges for document data extraction models. With our solution smartextract, we tackled a real-world customer challenge: automating order entry from complex order documents and tables for a German shoe retailer. OCR and text-based LLMs struggled, and Vision LLMs were inconsistent. Only extensive customization could solve the problems that arose: segmentation, few-shot prompting, fine-tuning, and possibly even training a custom computer vision model. In this talk, we will show why standard AI models struggle with complex tables and demonstrate in which cases segmentation helps. We will also present benchmarks of commercial vs. open-source models and discuss the trade-offs between OCR, LLMs, and computer vision models.