After an exciting journey over the last five years, we want to celebrate with you at a day-long conference.

The dida conference will include machine learning talks (both technical and applied), panel discussions, workshops, space for networking and good food. You are invited to join us for a day full of machine learning and networking.

We have a limited number of places available. Please register here for free:

Conference schedule: 10:00-23:00

-

Thomas Schnake (TU Berlin)

Extending explainable artificial intelligence - explaining the classification of graphs, sequences and more in advanced domains

Explainable artificial intelligence (XAI) is becoming more and more important for the applicability of artificial intelligence (AI), particularly in light of the upcoming EU regulation on AI (the AI Act). Yet understanding the prediction of a machine learning model can mean very different things, depending on the data and the domain it is applied to. In many cases a simple pixel-wise heat map is not sufficient to strengthen the intuition of the practitioner applying the model. We want to dig deeper into the question of which complex structures a neural network is able to learn from, particularly when learning from graphs. In addition, we will see how the expectation of what an explanation should reflect varies significantly across domains.

-

Jona Welsch (dida)

Information extraction with BERT from free-form text

This talk explains how to extract information from free-form text using modern deep learning methods such as BERT. Guided by the findings of an actual project with our client idealo, we show how to sensibly combine simple rule-based algorithms and deep learning methods to go from unstructured seller data (such as product descriptions) to a structured database of product properties. After a quick introduction to transformer-based language encoders (such as BERT), the advantages and disadvantages of different approaches for extracting product properties from text are discussed. We also show how to use existing enterprise databases to generate (weakly labelled) training data and thereby reduce the need for large labelling efforts.
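As a rough, purely illustrative sketch of the deep learning part of such a pipeline (not the actual project code), product property extraction can be framed as token classification on top of a BERT encoder; the model name, label set and example sentence below are hypothetical:

```python
# Illustrative sketch: product property extraction as token classification
# with a BERT encoder. Model name, labels and text are placeholders.
from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

labels = ["O", "B-COLOR", "I-COLOR", "B-MATERIAL", "I-MATERIAL"]  # hypothetical tag set

tokenizer = AutoTokenizer.from_pretrained("bert-base-german-cased")
model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-german-cased", num_labels=len(labels)
)
# (in practice, the classification head would first be fine-tuned,
#  e.g. on the weakly labelled data generated from the enterprise database)

text = "Eleganter Esstisch aus massiver Eiche in dunkelbraun"
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits          # shape: (1, seq_len, num_labels)
predictions = logits.argmax(dim=-1)[0]       # predicted tag id per token

for token, tag_id in zip(tokenizer.convert_ids_to_tokens(inputs["input_ids"][0]), predictions):
    print(token, labels[int(tag_id)])
```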

-

Prof. Dr. Marco Körner (TU Munich)

Machine learning in remote sensing

TBA

-

Denis Stalz-John (codecentric)

Empowering fast and reproducible machine learning

Machine learning researchers and developers face challenges in conducting fast and reproducible experiments. To address these challenges, we present a set of methods for modularized configurations and reproducible experiments. Our approach enables researchers to easily set up and configure experiments, and run them locally or on Kubernetes. We also provide a customizable dashboard for analysis, allowing researchers to visualize and investigate their results. Through our methods, researchers can streamline their workflow and accelerate their research towards production. In this talk, we discuss the benefits of our approach and how it can be used to improve the efficiency and reproducibility of machine learning projects.

-

Lunch break

We will offer free food at the event location.

-

Jakiw Pidstrigach (University of Potsdam)

Introduction to diffusion models

In the last few years, diffusion models have shown huge empirical success in image modeling tasks. They are used for medical inverse problems and form the backbone of models like Stable Diffusion or DALL-E 2. In this talk, we will introduce diffusion models and their inner workings and implement them on some small examples.
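As a small taste of those inner workings, here is a minimal, illustrative sketch of the forward (noising) process that diffusion models are built around, using a standard DDPM-style variance schedule; all numbers and shapes are placeholders, not material from the talk:

```python
# Minimal sketch of the forward (noising) process behind diffusion models.
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)               # noise schedule
alphas_bar = torch.cumprod(1.0 - betas, dim=0)      # cumulative signal retention

def q_sample(x0, t):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(a_bar_t) * x_0, (1 - a_bar_t) * I)."""
    noise = torch.randn_like(x0)
    return alphas_bar[t].sqrt() * x0 + (1.0 - alphas_bar[t]).sqrt() * noise, noise

# A model is then trained to predict `noise` from the noisy sample x_t and t;
# generation runs the learned reverse process from pure noise back to data.
x0 = torch.randn(1, 3, 32, 32)                      # stand-in for an image batch
x_t, eps = q_sample(x0, t=500)
```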

-

Dr. Augusto Stoffel (dida)

Information extraction: from graph neural networks to transformers

Recent years have seen much research activity in the field of information extraction from semi-structured documents containing tables and related visual structures. Until recently, graph neural networks were among the state-of-the-art solutions for such tasks. By now, specialized transformer-based architectures have been developed. The latter have much stronger text processing capabilities, and weights pretrained on large amounts of data are readily available. On the other hand, GNN models require drastically fewer compute resources, and one could argue that there isn't much language modeling involved in a typical information extraction task. This raises the question: how do these two classes of models compare in practice? This talk will provide a survey of the models and attempt to answer this question based on our project experience and internal research.

-

Jakob Pörschmann (Google)

One decade of learnings from serving machine learning models to billions of users

Google is arguably the largest user of ML in production. At the scale of billions of users and thousands of models served every day, manual processes obviously don't suffice. Out of necessity, Google has also become one of the most active contributors to MLOps research and the open-source community. This talk gives an overview of what we have learned over the years and provides insight into the tools and best practices we use ourselves. Finally, we explore how all of this translates into the design of MLOps tools on GCP.

-

Jakob Scharlau (dida)

Domain-specific semantic search: finding the fitting document

This talk takes a look at information retrieval and explains how machine learning methods such as pre-trained language models can be used to build search systems with a semantic understanding of language. We will go into some recently developed techniques and the different factors to take into account when building such a tool. Afterwards, we will present a concrete example of a production system we developed whose task is to find the correct technical document given a search query written in more colloquial language.
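To give an idea of the basic mechanism behind such systems (not the production system presented in the talk), here is a minimal sketch of embedding-based semantic search: documents and query are encoded into the same vector space and ranked by cosine similarity. The model name and documents are placeholders:

```python
# Minimal sketch of embedding-based semantic search with a sentence encoder.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")    # any sentence encoder works here

docs = [
    "Installation manual for the XC-200 heat pump",
    "Troubleshooting guide: error codes of the XC series",
    "Warranty terms and service contacts",
]
doc_vecs = model.encode(docs, normalize_embeddings=True)

query = "my heating shows error 42, what do I do?"
query_vec = model.encode([query], normalize_embeddings=True)[0]

scores = doc_vecs @ query_vec                      # cosine similarity (vectors are normalized)
for idx in np.argsort(-scores):
    print(f"{scores[idx]:.2f}  {docs[idx]}")
```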

-

Panel Discussion

Künstliche Intelligenz, menschliche Learnings ("Artificial intelligence, human learnings"; in German)

What can I do to make my AI project go thoroughly wrong? We will get to the bottom of this question and others in our entertaining panel. The participants all have hands-on experience in defining and implementing software and AI projects and will share their experiences and best practices with us.

Participants: Dr. Jan Anderssen (idealo), Dr. Amit Ghosh (INWT), Robert Heesen (Enpal), Andreas Henninger (APCOA). Moderator: Philipp Jackmuth (dida)

-

Dinner & networking

Workshops

-

Room 1

Paper reading group
Dr. William Clemens

dida has a weekly internal reading group where our ML scientists discuss recent papers and how they can help with our projects. In this session we'll hold a live reading group meeting, with the paper to be announced closer to the time.

-

Room 1

MLOps session
Denis Stalz-John

This workshop will be the hands-on continuation of Denis' talk "Empowering fast and reproducible machine learning".

-

Room 1

An introduction to JAX
Dr. Rustam Antia & Dr. Augusto Stoffel

JAX is a powerful and elegant Python library for high-performance computing in machine learning that has been gaining popularity. In this session we will first cover the basics of JAX and then walk through an implementation of the meta-learning algorithm MAML to illustrate the simplicity and expressiveness of the core JAX abstractions. Join us to learn how to leverage functional programming and the magic of grad, vmap, and jit for your own projects and take your machine learning development to the next level.
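For a first taste of those abstractions, here is a tiny, self-contained example (purely illustrative, not workshop material) composing grad, vmap and jit on a plain Python function:

```python
# Tiny illustration of the core JAX abstractions: grad, vmap and jit.
import jax
import jax.numpy as jnp

def loss(w, x, y):
    pred = jnp.dot(x, w)                 # simple linear model
    return jnp.mean((pred - y) ** 2)

grad_fn = jax.jit(jax.grad(loss))                 # compiled gradient w.r.t. w
batched = jax.vmap(loss, in_axes=(None, 0, 0))    # per-example losses over a batch

w = jnp.ones(3)
x = jnp.arange(12.0).reshape(4, 3)
y = jnp.arange(4.0)
print(grad_fn(w, x, y))   # gradient of the batch loss
print(batched(w, x, y))   # loss of each example separately
```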

-

Room 2

Get to know your team better (in German)
Iris van Baarsen

Every team has different characters, needs and abilities, which does not exactly make collaboration, or resolving conflicts, any easier. But if you are aware of these differences, you can deal with each type more consciously and appropriately. In this session we will look at the different types, find out who YOU are, and see which tasks and traits fit the different types.

-

Room 2

Managing unconscious biases
Angela Maennel & Dr. Marty Oelschläger

Whether at work or in private life, unconscious biases travel with us and strongly shape our actions without us even noticing. In this workshop, we take a close look at the mechanisms that influence us behind the scenes and at techniques for managing unconscious bias.


Speakers

Thomas Schnake

Thomas has a background in mathematics and is a PhD student in the machine learning group of Klaus-Robert Müller at TU Berlin.

Jona Welsch

Jona has a background in physics and works as a machine learning project lead at dida.

Prof. Dr. Marco Körner

Marco is a professor at TU Munich and works at the intersection of remote sensing and machine learning.

Denis Stalz-John

Denis works as a machine learning specialist at codecentric. He focuses on bringing machine learning to production as well as enabling fast experimentation.

Jakiw Pidstrigach

Jakiw is a PhD student in mathematics at the University of Potsdam, focusing on stochastic processes and diffusion-based generative modeling.

Dr. Augusto Stoffel

Augusto holds a PhD in mathematics (University of Notre Dame, USA) and did research in the field of algebraic topology and its applications as a foundation of quantum field theory. At dida, he works as a machine learning scientist.

Jakob Pörschmann

Jakob has a background in data science and works as a customer engineer at Google.

Jakob Scharlau

Jakob studied theoretical physics (Uni Heidelberg) and worked at the intersection of quantum information theory and thermodynamics. At dida, he works as a machine learning project lead.

Iris van Baarsen

Iris has many years of professional and management experience in the field of human resources. She is responsible for recruiting and organizational development at dida. In addition, she has been working as a business coach for more than ten years.

Dr. William Clemens

Will holds a PhD in string theory and quantum chromodynamics from the University of Southampton. At dida, he works as a machine learning scientist.

Angela Maennel

Angela studied mathematics at ETH Zürich. At dida, she works as a machine learning scientist focusing on NLP projects.

Dr. Marty Oelschläger

Marty holds a PhD in physics (HU Berlin) focusing on fluctuation-induced phenomena, where he investigated the interplay of classical and quantum statistics. At dida, he works as a machine learning scientist.

Dr. Jan Anderssen

As domain lead in e-commerce, Jan, a computational linguist with a PhD, has extensive experience with IT projects in general and data projects in particular.

Robert Heesen

In his role as VP of partnerships and business development, Robert has extensive experience in setting up and optimizing (IT) processes. Previously, he worked in IT sales and helped clients identify and define AI projects.

Andreas Henninger

With over 20 years of experience in the IT field, the physicist understands the various perspectives on IT projects, be it that of an application developer, a project lead, or a head of IT.

Dr. Amit Ghosh

A statistician with a PhD, Amit is the managing director and co-founder of INWT Statistics, a company specializing in data science, data analysis consulting, and predictive analytics.

Dr. Rustam Antia

Rustam holds a PhD in mathematics from UT Austin, where he did research in derived algebraic geometry. At dida, he works as a machine learning scientist.

For further questions, please send an e-mail to conference@dida.do.

Location

B-Part, at the Gleisdreieckpark, Luckenwalder Str. 6b, 10963 Berlin