Recorded talks


Data Extraction in the Age of LLMs


Axel Besinger and Augusto Stoffel (PhD)

May 31st, 2024


In recent years, the advent of Large Language Models (LLMs) has changed the landscape of data extraction. LLMs offer powerful text processing capabilities and come pre-trained on vast amounts of data, which makes them effective for information retrieval tasks. Traditional approaches such as graph neural networks and extractive models, by contrast, have long been favored because they use far fewer computational resources. This raises the question of how LLMs compare with these models in practical data extraction applications. This presentation examines that question and weighs the advantages and disadvantages of LLMs against those of extractive models. Drawing on our project experience and internal research, we discuss the practical implications of using LLMs for data extraction, including their effectiveness, resource requirements, and overall performance in real-world scenarios. Attendees will gain a better understanding of the role of LLMs in modern data extraction workflows and of the considerations involved in implementing them. Link to the information extraction software: smartextract (https://smartextract.ai)

Information extraction: from graph neural networks to transformers


Augusto Stoffel

April 28th, 2023


This talk compares two prominent classes of models used for information extraction from semi-structured documents: Graph Neural Networks (GNNs) and specialized transformer-based architectures. Transformers are renowned for their text processing capabilities and come with pretrained weights, whereas GNNs require much less computational power. The goal is to evaluate how these two types of models perform in practice, based on both project experience and internal research.

Open NLP meetup: Ethics in Natural Language Processing


Marty Oelschläger and Sara Zanzottera

December 1st, 2022


This talk covers two main topics. The first part addresses ethical considerations in Natural Language Processing (NLP): how language models can be developed and used responsibly, and issues such as data privacy, algorithmic bias, and the societal impact of automated language systems. The second part provides a hands-on introduction to image retrieval, explaining the techniques and algorithms that make it possible to search for images based on their content, metadata, or descriptive tags. This may include demonstrations of image indexing, feature extraction, and the use of search queries to navigate large image databases effectively.

ML for Remote Sensing: Analyse satellite data automatically


Moritz Besser and Jona Welsch

December 6th, 2021


The availability of remote sensing data, and especially satellite data, has increased sharply in recent years. As data volumes grow, manual evaluation becomes less and less efficient. Machine learning methods are well suited to bridging this gap between the availability of data and the expertise needed to evaluate it, making it possible for a larger group of users to extract information from satellite data and apply it in an enterprise context. In this webinar, Moritz Besser (Machine Learning Consultant) and Jona Welsch (Machine Learning Project Lead) give an overview of the different types of available satellite data, the machine learning methods used to evaluate them, and practical use cases.

Real Added Value from ML Projects - Our Success Factors


Petar Tomov and Philipp Jackmuth

October 26th, 2021


Progress in machine learning (ML) over the last 10 to 15 years has been so impressive that many companies in Germany have now set up their own ML departments. We have had the privilege of supporting some of these companies in recent years, for example in moving proofs of concept (POCs) into production. In this webinar, Philipp Jackmuth (Managing Director of dida) and Dr. Petar Tomov (Machine Learning Project Manager) share their experience of the decisive factors that distinguish successful ML projects from failed ones.

Graph neural networks for information extraction with PyTorch


Augusto Stoffel

July 30th, 2021


In this talk, Augusto Stoffel introduces graph neural networks (GNNs) by comparing them to convolutional neural networks (CNNs). He shows how an image can be represented as a graph, which leads naturally to the basics of GNN architectures. The talk then covers implementations in Python, in particular with the PyTorch framework, and focuses on applications of GNNs to information extraction from tabular documents in the field of NLP.
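The image-as-graph idea can be illustrated with a minimal sketch. It is not code from the talk: it assumes each pixel is a node connected to its 4 neighbours, and it uses one round of message passing with fixed mixing weights (a real GNN learns these weights, typically in PyTorch). The helper names `image_to_grid_graph` and `message_passing_step` are hypothetical.

```python
from typing import Dict, List, Tuple

Node = Tuple[int, int]  # (row, column) of a pixel


def image_to_grid_graph(height: int, width: int) -> Dict[Node, List[Node]]:
    """Build adjacency lists that connect every pixel to its 4-neighbours."""
    graph: Dict[Node, List[Node]] = {}
    for r in range(height):
        for c in range(width):
            neighbours = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < height and 0 <= nc < width:
                    neighbours.append((nr, nc))
            graph[(r, c)] = neighbours
    return graph


def message_passing_step(
    graph: Dict[Node, List[Node]],
    features: Dict[Node, float],
    self_weight: float = 0.5,
    neighbour_weight: float = 0.5,
) -> Dict[Node, float]:
    """One layer: mix each node's own feature with the mean of its neighbours.

    The weights are fixed here for illustration; in a GNN they are learned.
    """
    updated: Dict[Node, float] = {}
    for node, neighbours in graph.items():
        if neighbours:
            neighbour_mean = sum(features[n] for n in neighbours) / len(neighbours)
        else:  # isolated node (e.g. a 1x1 image): use its own value as the mean
            neighbour_mean = features[node]
        updated[node] = self_weight * features[node] + neighbour_weight * neighbour_mean
    return updated


# Usage: a 2x2 grayscale "image" with one scalar feature per pixel.
image = [[1.0, 2.0], [3.0, 4.0]]
graph = image_to_grid_graph(2, 2)
features = {(r, c): image[r][c] for r in range(2) for c in range(2)}
print(message_passing_step(graph, features))
# → {(0, 0): 1.75, (0, 1): 2.25, (1, 0): 2.75, (1, 1): 3.25}
```

In the interior of the image, this update acts like a fixed cross-shaped 3×3 convolution filter, which is the link to CNNs. The graph view becomes useful when the nodes are no longer on a regular grid, for example text boxes scattered across a document.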

Automated answering of questions with neural networks: BERT


Mattes Mollenhauer

May 26th, 2021


In this webinar, we present a BERT-based method for automatically answering questions. The potential applications are numerous: the approach can be used, for example, in chatbots, for information extraction from texts, and in the Q&A sections of websites. As a concrete example, we discuss extracting information from biomedical research using the open CORD-19 dataset for COVID-19 research.
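The webinar does not spell out its pipeline. In the common extractive question-answering setup (e.g. SQuAD-style fine-tuning), a BERT model assigns each passage token a start score and an end score, and the answer is the span that maximizes their sum. The sketch below shows only that decoding step. It assumes the scores have already been produced by a fine-tuned model, and the tokens and numbers are made up for illustration. `best_answer_span` and `max_answer_len` are hypothetical names.

```python
from typing import List, Sequence, Tuple


def best_answer_span(
    start_scores: Sequence[float],
    end_scores: Sequence[float],
    max_answer_len: int = 30,
) -> Tuple[int, int]:
    """Return (start, end) token indices maximising start + end score.

    Only spans with start <= end and at most max_answer_len tokens are considered.
    """
    best_span = (0, 0)
    best_score = float("-inf")
    for i, s in enumerate(start_scores):
        # The end index may not precede the start or exceed the length limit.
        for j in range(i, min(i + max_answer_len, len(end_scores))):
            score = s + end_scores[j]
            if score > best_score:
                best_score, best_span = score, (i, j)
    return best_span


# Usage with hypothetical per-token scores for a short passage.
tokens: List[str] = ["the", "virus", "binds", "to", "ACE2", "receptors"]
start_scores = [0.1, 0.2, 0.0, 0.3, 2.0, 0.5]
end_scores = [0.0, 0.1, 0.2, 0.1, 1.0, 1.5]
start, end = best_answer_span(start_scores, end_scores)
print(" ".join(tokens[start : end + 1]))  # → ACE2 receptors
```

The `start <= end` constraint matters. Choosing the best start and the best end independently can yield an invalid span that ends before it begins.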

Recurrent neural networks: How computers learn to read


Fabian Gringel

May 26th, 2021


Applications of Natural Language Processing (NLP) such as semantic search (e.g. Google), automated text translation (e.g. DeepL), and text classification (e.g. email spam filters) have become an integral part of everyday life. In many areas of NLP, decisive progress has come from research on a class of artificial neural networks that is particularly well adapted to the sequential structure of natural language: recurrent neural networks, or RNNs for short. The webinar introduces how RNNs work and illustrates their use in an example project from the field of legal tech. It concludes with an outlook on the future importance of RNNs compared with alternative approaches such as BERT and convolutional neural networks.
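The sequential structure that RNNs exploit comes down to one recurrence: each hidden state depends on the current input and on the previous hidden state. The following is a minimal sketch of a simple (Elman) RNN forward pass in plain Python, assuming tanh activation and a zero initial state. It is not the model from the legal-tech project, and the weights in the usage example are arbitrary toy values.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def matvec(matrix: Matrix, vector: Sequence[float]) -> Vector:
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rnn_forward(
    inputs: Sequence[Sequence[float]],
    w_input: Matrix,
    w_hidden: Matrix,
    bias: Sequence[float],
) -> List[Vector]:
    """Run a simple (Elman) RNN and return the hidden state after each step.

    h_t = tanh(W_input x_t + W_hidden h_{t-1} + b), with h_0 = 0.
    """
    hidden: Vector = [0.0] * len(bias)
    states: List[Vector] = []
    for x_t in inputs:
        pre_activation = [
            from_input + from_hidden + b_k
            for from_input, from_hidden, b_k in zip(
                matvec(w_input, x_t), matvec(w_hidden, hidden), bias
            )
        ]
        hidden = [math.tanh(p) for p in pre_activation]
        states.append(hidden)
    return states


# Usage: the same inputs in a different order give a different final state,
# because each h_t depends on h_{t-1}.
forward = rnn_forward([[1.0], [0.0]], [[1.0]], [[1.0]], [0.0])
backward = rnn_forward([[0.0], [1.0]], [[1.0]], [[1.0]], [0.0])
print(forward[-1], backward[-1])
```

This order sensitivity is what distinguishes an RNN from a bag-of-words model, which would treat both sequences identically.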

Labeling Tools - The second step on the way to the successful implementation of an NLP project


Ewelina Fiebig and Fabian Gringel

May 26th, 2021


A successful NLP project involves a series of steps, from data preparation to modeling and deployment. Since the input data are often scanned documents, data preparation first involves text recognition tools (OCR, short for optical character recognition) and later also so-called labeling tools. This webinar deals with selecting a suitable labeling tool.

Semantic search and understanding of natural text with neural networks: BERT


Konrad Schultka and Jona Welsch

May 26th, 2021


This webinar introduces the use of BERT for semantic search through a real case study: every year, millions of citizens interact with public authorities and are regularly overwhelmed by the technical language used there. We have successfully used BERT to retrieve the right answers from government documents in response to colloquial queries, so that users do not need to use technical terms in their queries.
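Semantic search of this kind is commonly implemented by embedding the query and the documents as vectors and ranking the documents by cosine similarity. The following is a minimal sketch of the ranking step only. It assumes the embeddings have already been computed by a BERT-style encoder, which is not shown, and the document ids and 2-dimensional vectors are hypothetical toy values rather than data from the project.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(
    query_vec: Sequence[float],
    doc_vecs: Dict[str, Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the top_k (document id, similarity) pairs, best match first."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in doc_vecs.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Usage with toy vectors standing in for sentence embeddings.
docs = {
    "passport_renewal": [1.0, 0.0],
    "parking_permit": [0.0, 1.0],
    "id_card_application": [1.0, 1.0],
}
query = [1.0, 0.1]  # stands in for the embedding of a colloquial question
for doc_id, score in rank_documents(query, docs, top_k=2):
    print(doc_id, round(score, 3))
```

Because the ranking compares meanings encoded in vectors rather than exact words, a colloquial query can match a document written in technical language, provided the encoder places both close together.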

Detecting Convective Clouds in Geostationary Satellite


William Clemens

February 26th, 2020


Detecting convective clouds is crucial for weather forecasting and climate studies. In his work, William Clemens, a Machine Learning Scientist at dida, uses convolutional neural networks (CNNs) to analyze geostationary satellite data for this purpose. CNNs are particularly well suited to image recognition tasks, which makes them a good fit for identifying the complex patterns and structures characteristic of convective clouds. Clemens's approach likely involves training the CNNs on large datasets of satellite imagery labeled with the presence of convective clouds, so that the model learns the features that distinguish these clouds.