Blog - Theory & Algorithms


Deep Learning vs Machine Learning: What is the difference?


Serdar Palaoglu


In the realm of artificial intelligence, two fundamental concepts, Machine Learning and Deep Learning, have emerged as key components in the advancement of computer-based learning systems. Machine Learning serves as a foundational principle where computers gain the ability to learn from data without explicit programming. Deep Learning, an evolution within the Machine Learning framework, utilizes artificial neural networks inspired by the human brain to achieve complex data analysis. This article delves into a comprehensive exploration of these domains, elucidating their differences, practical applications, and significance in artificial intelligence.

How ChatGPT is fine-tuned using Reinforcement Learning


Thanh Long Phan


At the end of 2022, OpenAI released ChatGPT (a Transformer-based language model) to the public. Although based on the already widely discussed GPT-3, it set off an unprecedented boom in generative AI. It is capable of generating human-like text and has a wide range of applications, including language translation, language modeling, and generating text for applications such as chatbots. Feel free to also read our introduction to LLMs. ChatGPT seems to be so powerful that many people consider it a substantial step towards artificial general intelligence. The main reason for the recent successes of language models such as ChatGPT lies in their size (in terms of trainable parameters). But making language models bigger does not inherently make them better at following a user's intent. A bigger model can also become more toxic and more likely to "hallucinate". To mitigate these issues and, more generally, to align models with user intentions, one option is to apply Reinforcement Learning. In this blog post, we will present an overview of the training process of ChatGPT and have a closer look at the use of Reinforcement Learning in language modeling. Also interesting: our aggregated collection of LLM content.
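
To give a flavour of the idea, here is a minimal Python sketch of the KL-penalized reward commonly used in RLHF-style fine-tuning. The function name, variable names and the penalty coefficient beta are illustrative assumptions, not ChatGPT's actual training code:

```python
import numpy as np

def rlhf_reward(reward_model_score, logprobs_policy, logprobs_reference, beta=0.02):
    """Combine a learned reward with a KL penalty that keeps the fine-tuned
    policy close to the original (pre-RL) language model.

    reward_model_score: scalar score of the generated answer
    logprobs_policy:    per-token log-probabilities under the fine-tuned model
    logprobs_reference: per-token log-probabilities under the frozen reference model
    beta:               strength of the KL penalty (illustrative value)
    """
    kl_estimate = np.sum(logprobs_policy - logprobs_reference)  # sample-based KL estimate
    return reward_model_score - beta * kl_estimate

# Toy usage: the penalty lowers the reward when the policy drifts away
# from the reference model on this sample.
print(rlhf_reward(1.3, np.array([-0.5, -1.2, -0.8]), np.array([-0.6, -1.5, -0.9])))
```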

Early Classification of Crop Fields through Satellite Image Time Series


Tiago Sanona


In a fast-paced and ever-changing global economy, the ability to classify crop fields via remote sensing only at the end of a growth cycle does not provide the immediate insight that decision makers need. To address this problem, we developed a model that allows continuous classification of crop fields at any point in time and improves its predictions as more data becomes available. In practice, we developed a single model capable of delivering predictions about which crops are growing at any point in time based on satellite data. The data available at the time of inference could be a few images at the beginning of the year or a full time series of images from a complete growing cycle. This exceeds the capabilities of current deep learning solutions, which either only offer predictions at the end of the growing cycle or have to use multiple models specialized to return results at pre-specified points in time. This article details the key changes we made to the model described in a previous blog post, “Classification of Crop Fields through Satellite Image Time Series”, that extend its functionality and improve its performance. The results presented in this article are based on a research paper recently published by dida. For more detailed information about this topic and other experiments on this model, please check out the original manuscript: “Early Crop Classification via Multi-Modal Satellite Data Fusion and Temporal Attention”.

Collaborative Filtering in Recommender Systems


Konrad Mundinger


In this blog post, I give an overview of several collaborative filtering techniques and provide some Python code for them. This is the second blog post in a series of articles about recommendation engines. Check out the first article if you want to get an overview of recommendation systems in general or need a refresher on the terminology. The Jupyter notebook I used for creating the plots will be made available soon. The techniques will be illustrated on the famous MovieLens-100K dataset, which contains 100,000 user-movie rating pairs from 943 users on 1682 movies. For most of the algorithms, I have used an existing implementation from the surprise library for Python. Even though it takes some getting used to, I think it is a nice library that you should check out if you are starting to play around with recommendation engines.
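
As a small taste of what the post covers, here is a minimal sketch using the surprise library on MovieLens-100K, with matrix-factorization-based collaborative filtering (SVD). The hyperparameters are the library defaults, not necessarily the ones discussed in the post:

```python
from surprise import SVD, Dataset
from surprise.model_selection import cross_validate

# Download (on first use) and load the built-in MovieLens-100K dataset.
data = Dataset.load_builtin("ml-100k")

# Matrix-factorization-based collaborative filtering.
algo = SVD()

# 5-fold cross-validation, reporting RMSE and MAE.
cross_validate(algo, data, measures=["RMSE", "MAE"], cv=5, verbose=True)
```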

An Introduction to Metric Learning


William Clemens (PhD)


Probably the most common type of problem we tackle with machine learning is classification: taking new data points and assigning them to one of a fixed number of classes. But what if we don't necessarily know all the classes when we train the model? A good example of this is face recognition, where we want a system that can store faces and then identify whether any new images it sees contain one of those faces. Obviously, we can't retrain the model every time we add someone new to the database, so we need a better solution. One way to solve this problem is metric learning. In metric learning, our goal is to learn a metric, or distance measure, between different data points. If we train our model correctly, then this distance measure will put examples of the same class close together and examples of different classes further apart.
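
To make the idea concrete, here is a minimal PyTorch sketch of one training step with a triplet loss, a common metric-learning objective. The tiny embedding network and the random batch are purely illustrative:

```python
import torch
import torch.nn as nn

# A tiny embedding network: maps 128-dimensional inputs to a 32-dimensional embedding space.
embed = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 32))

# Triplet loss: pulls anchor and positive together, pushes the negative away
# by at least the given margin.
criterion = nn.TripletMarginLoss(margin=1.0)
optimizer = torch.optim.Adam(embed.parameters(), lr=1e-3)

# Illustrative random batch: anchor and positive share a class, the negative does not.
anchor, positive, negative = (torch.randn(16, 128) for _ in range(3))

loss = criterion(embed(anchor), embed(positive), embed(negative))
loss.backward()
optimizer.step()
```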

Recommendation systems - an overview


Konrad Mundinger


Recommendation systems are everywhere. We use them to buy clothes, find restaurants and choose which TV show to watch. In this blog post, I will give an overview of the underlying basic concepts and common use cases, and discuss some limitations. This is the first of a series of articles about recommendation engines. Stay tuned for the follow-ups, where we will explore some of the mentioned concepts in much more detail! Already in 2010, 60% of watch time on YouTube came from recommendations [1], and personalized recommendations are said to increase conversion rates on e-commerce sites by up to 5 times [2]. It is safe to say that if customers are presented with a nice pre-selection of products, they will be less overwhelmed, more likely to consume something, and have an overall better experience on the website. But how do recommendation engines work? Let's dive right in.

GPT-3 and beyond - Part 2: Shortcomings and remedies


Fabian Gringel


In the first part of this article, I described the basic idea behind GPT-3 and gave some examples of what it is good at. This second and final part is dedicated to the “beyond” in the title. Here you will learn in which situations GPT-3 fails and why it is far from having proper natural language understanding, which approaches can help to mitigate these issues and might lead to the next breakthrough, what alternatives to GPT-3 already exist, and, in case you are wondering, what the connection between GPT-3 and an octopus is. Update, February 14th '22: I have also included a section about OpenAI's new InstructGPT.

Classification of Crop Fields through Satellite Image Time Series


Tiago Sanona


The field of remote sensing has been benefiting from the advancements made in Machine Learning (ML). In this article we explore a state-of-the-art model architecture, the Transformer, initially developed for Natural Language Processing (NLP) problems, which is now widely used with many forms of sequential data. Following the paper by Garnot et al., we utilize an altered version of this architecture to classify crop fields from time series of satellite images. With this, we achieve better results than traditional methods (e.g. random forests) and with fewer resources than recurrent networks.
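
As a rough sketch of the idea (not the exact architecture from Garnot et al. or the altered version we use), a standard PyTorch transformer encoder can ingest a satellite image time series once each acquisition date has been reduced to a feature vector. All dimensions and the number of classes below are illustrative:

```python
import torch
import torch.nn as nn

# Toy batch: 8 fields, 24 acquisition dates, 10 spectral features per date.
x = torch.randn(8, 24, 10)

project = nn.Linear(10, 64)                      # per-date feature embedding
encoder_layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)
classify = nn.Linear(64, 5)                      # e.g. 5 crop classes

h = encoder(project(x))                          # self-attention over the time axis
logits = classify(h.mean(dim=1))                 # pool over time, then classify
print(logits.shape)                              # torch.Size([8, 5])
```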

Extracting information from technical drawings


Frank Weilandt (PhD)


Did you ever need to combine data about an object from two different sources, say, images and text? We are often facing such challenges during our work at dida. Here we present an example from the realm of technical drawings. Such drawings are used in many fields for specialists to share information. They follow very specific guidelines so that every specialist can understand what is depicted on them. Normally, technical drawings are given in formats that allow indexing, such as svg, html, dwg, dwf, etc., but many, especially older ones, only exist in image formats (jpeg, png, bmp, etc.), for example from book scans. This kind of drawing is hard to access automatically, which makes its use difficult and time-consuming. In this regard, automatic detection tools could be used to facilitate the search. In this blog post, we will demonstrate how both traditional and deep-learning-based computer vision techniques can be applied to information extraction from exploded-view drawings. We assume that such a drawing is given together with some textual information for each object on the drawing. The objects can be identified by numbers connected to them. Here is a rather simple example of such a drawing: an electric drill machine. There are three key components on each drawing: the numbers, the objects and the auxiliary lines. The auxiliary lines are used to connect the objects to the numbers. The task at hand will be to find all objects of a certain kind/class over a large number of drawings, e.g. the socket with number 653 in the image above appears in several drawings and even in drawings from other manufacturers. This is a typical classification task, but with a caveat: since there is additional information for each object accessible through the numbers, we need to assign each number on the image to the corresponding object first. Next, we describe how this auxiliary task can be solved by using traditional computer vision techniques.
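
For illustration, here is a minimal OpenCV sketch of one such traditional step: detecting straight line segments that could be the auxiliary lines connecting part numbers to parts. The file name, thresholds and parameters are illustrative assumptions, not the ones used in the actual project:

```python
import cv2
import numpy as np

# Load the drawing as a grayscale image (hypothetical file name).
image = cv2.imread("exploded_view.png", cv2.IMREAD_GRAYSCALE)

# Edge detection; technical drawings are usually high-contrast line art.
edges = cv2.Canny(image, 50, 150)

# Probabilistic Hough transform: returns line segments as (x1, y1, x2, y2).
lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=80,
                        minLineLength=40, maxLineGap=5)

# Each segment is a candidate auxiliary line whose endpoints can later be
# matched to a detected number on one side and an object on the other.
print(0 if lines is None else len(lines), "candidate line segments")
```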

Visual Transformers: How an architecture designed for NLP enters the field of Computer Vision


Konrad Mundinger


Since its first introduction in late 2017, the Transformer has quickly become the state-of-the-art architecture in the field of natural language processing (NLP). Recently, researchers have started to apply the underlying ideas to the field of computer vision, and the results suggest that the resulting Visual Transformers are outperforming their CNN-based predecessors in terms of both speed and accuracy. In this blog post, we will have a closer look at how to apply transformers to computer vision tasks and what it means to tokenize an image.
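
A minimal sketch of what "tokenizing an image" means in practice: cutting it into fixed-size patches and flattening each patch into a vector. The image size and patch size below are illustrative (224 x 224 images with 16 x 16 patches, as in the original ViT setup):

```python
import torch

# Toy batch: 2 RGB images of size 224 x 224.
images = torch.randn(2, 3, 224, 224)
patch_size = 16

# Cut each image into non-overlapping 16 x 16 patches ...
patches = images.unfold(2, patch_size, patch_size).unfold(3, patch_size, patch_size)
# ... and flatten every patch into one vector ("token") of length 3 * 16 * 16 = 768.
tokens = patches.permute(0, 2, 3, 1, 4, 5).reshape(2, -1, 3 * patch_size * patch_size)

print(tokens.shape)  # torch.Size([2, 196, 768]) -- 196 tokens per image
```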

CLIP: Mining the treasure trove of unlabeled image data


Fabian Gringel


Digitization and the internet in particular have not only provided us with a seemingly inexhaustible source of textual data, but also of images. In the case of texts, this treasure has been lifted in the form of task-agnostic pretraining by language models such as BERT or GPT-3. Contrastive Language-Image Pretraining (short: CLIP) now does a similar thing with images, or rather: the combination of images and texts. In this blog article I will give a rough non-technical outline of how CLIP works, and I will also show how you can try CLIP out yourself! If you are more technically minded and care about the details, then I recommend reading the original publication, which I think is well written and comprehensible.
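
If you want a quick taste before reading on, here is a minimal sketch of zero-shot classification with OpenAI's open-source CLIP package. The image file name and the candidate labels are placeholders:

```python
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Placeholder image and candidate text labels.
image = preprocess(Image.open("photo.jpg")).unsqueeze(0).to(device)
texts = clip.tokenize(["a photo of a dog", "a photo of a cat"]).to(device)

with torch.no_grad():
    logits_per_image, _ = model(image, texts)
    probs = logits_per_image.softmax(dim=-1)

print(probs)  # similarity of the image to each text prompt
```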

Understanding graph neural networks by way of convolutional nets


Augusto Stoffel (PhD)


In this article, we will introduce the basic ideas behind graph neural networks (GNNs) through an analogy with convolutional neural networks (CNNs), which are very well known due to their prevalence in the field of computer vision. In fact, we'll see that convolutional nets are an example of GNNs, albeit one where the underlying graph is very simple, perhaps even boring. Once we see how to think of a convolutional net through this lens, it won't be hard to replace that boring graph with more interesting ones, and we'll arrive naturally at the general concept of GNN. After that, we will survey some applications of GNNs, including our use here at dida. But let's start with the basics.
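
To anticipate where the analogy leads, here is a minimal numpy sketch of a single graph-convolution layer in the style of Kipf and Welling. The toy graph and random weights are illustrative:

```python
import numpy as np

# Toy graph with 4 nodes and 3-dimensional node features.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 0],
              [0, 1, 0, 0]], dtype=float)   # adjacency matrix
H = np.random.randn(4, 3)                   # node features
W = np.random.randn(3, 8)                   # learnable weights

# Symmetrically normalized adjacency with self-loops.
A_hat = A + np.eye(4)
D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt

# One layer: every node aggregates its neighbours' features, then a shared
# linear map and a nonlinearity are applied -- like a convolution, but on a graph.
H_next = np.maximum(0, A_norm @ H @ W)
print(H_next.shape)  # (4, 8)
```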

What is Reinforcement Learning? (Part 2)


Matthias Werner


In the previous post we introduced the basics of reinforcement learning (RL) and the type of problem it can be applied to. The discussed setting was limited in the sense that we were dealing with a single agent acting in a stationary environment. Now we will take it one step further and discuss Multi-Agent Reinforcement Learning ( MARL ). Here we deal with multiple explicitly modeled agents in the same environment, hence every agent is part of the environment as it is perceived by all others. Since all agents learn over time and start to behave differently, the assumption of a stationary environment is violated.

BERT for question answering (Part 1)


Mattes Mollenhauer (PhD)


In this article, we are going to have a closer look at BERT - a state-of-the-art model for a range of problems in natural language processing. BERT was developed by Google, published in 2018, and is, for example, used as a part of Google's search engine. The term BERT is an acronym for Bidirectional Encoder Representations from Transformers, which may seem quite cryptic at first. The article is split up into two parts: in the first part we are going to see how BERT works, and in the second part we will have a look at some of its practical applications - in particular, we are going to examine the problem of automated question answering.
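
As a preview of the second part, here is a minimal sketch of extractive question answering with a BERT-style model from the Hugging Face transformers library. The specific checkpoint is just one publicly available example, not necessarily the one used in the article:

```python
from transformers import pipeline

# A BERT-style model fine-tuned for extractive question answering.
qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")

context = ("BERT was developed by Google and published in 2018. "
           "It is used, for example, as a part of Google's search engine.")

result = qa(question="Who developed BERT?", context=context)
print(result["answer"], result["score"])
```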

What is Reinforcement Learning? (Part 1)


Matthias Werner


Machine Learning concerns itself with solving complicated tasks by having software learn the rules of a process from data. One can try to discover structure in an unknown data set (unsupervised learning), or one can try to learn a mathematical function between related quantities (supervised learning). But what if you wanted the algorithm to learn to react to its environment and to behave in a particular way? No worries, machine learning's got you covered! This branch of Machine Learning (ML) is called Reinforcement Learning (RL). In this post we will give a quick introduction to the general framework and look at a few basic solution attempts in more detail. Finally, we will give a visual example of RL at work and discuss further approaches. In the second part of the blog post we will discuss Multi-Agent Reinforcement Learning (MARL).
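
To give a flavour of such basic solution attempts, here is a minimal sketch of the tabular Q-learning update rule. The tiny state/action space and the hyperparameters are illustrative, not taken from the post:

```python
import numpy as np

n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))          # tabular action-value estimates
alpha, gamma = 0.1, 0.99                     # learning rate and discount factor

def q_update(state, action, reward, next_state):
    """One Q-learning step: move Q(s, a) towards the bootstrapped target."""
    target = reward + gamma * Q[next_state].max()
    Q[state, action] += alpha * (target - Q[state, action])

# Toy transition: in state 0 the agent takes action 1, gets reward 1.0, lands in state 3.
q_update(state=0, action=1, reward=1.0, next_state=3)
print(Q[0])
```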

What is Bayesian Linear Regression? (Part 1)


Matthias Werner


Bayesian regression methods are very powerful, as they not only provide us with point estimates of regression parameters, but deliver an entire distribution over these parameters. This can be understood as not learning just one model, but an entire family of models, giving them different weights according to their likelihood of being correct. As this weight distribution depends on the observed data, Bayesian methods can give us an uncertainty quantification of our predictions that represents what the model was able to learn from the data. The uncertainty measure could be, e.g., the standard deviation of the predictions of all the models, something that point estimators will not provide by default. Knowing what the model doesn't know helps to make AI more explainable. To clarify the basic idea of Bayesian regression, we will stick to discussing Bayesian Linear Regression (BLR), the Bayesian approach to linear regression analysis. We will start with an example to motivate the method. To make things clearer, we will then introduce a couple of non-Bayesian methods that the reader might already be familiar with and discuss how they relate to Bayesian regression. In the following, I assume that you have elementary knowledge of linear algebra and probability theory. Let's get started!
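
For readers who like to see the mechanics up front, here is a minimal numpy sketch of the BLR posterior under a Gaussian prior on the weights and Gaussian observation noise. The toy data, the prior precision alpha and the noise precision beta are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1D data: y = 2x + noise, with a bias feature added.
x = rng.uniform(-1, 1, size=20)
X = np.column_stack([np.ones_like(x), x])          # design matrix (bias, x)
y = 2.0 * x + rng.normal(scale=0.3, size=20)

alpha, beta = 1.0, 1.0 / 0.3**2                    # prior precision, noise precision

# Posterior over the weights (Gaussian, since prior and likelihood are Gaussian).
S_inv = alpha * np.eye(2) + beta * X.T @ X         # posterior precision
S = np.linalg.inv(S_inv)                           # posterior covariance
m = beta * S @ X.T @ y                             # posterior mean

# Predictive uncertainty at a new point: noise variance plus weight uncertainty.
x_new = np.array([1.0, 0.5])
pred_mean = x_new @ m
pred_std = np.sqrt(1.0 / beta + x_new @ S @ x_new)
print(pred_mean, pred_std)
```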

Beat Tracking with Deep Neural Networks


Julius Richter


This is the last post in the three-part series covering machine learning approaches for time series and sequence modeling. In the first post, the basic principles and techniques for serial sequences in artificial neural networks were shown. The second post introduced a recent convolutional approach for time series called the temporal convolutional network (TCN), which shows great performance on sequence-to-sequence tasks (Bai, 2018). In this post, however, I will talk about a real-world application which employs a machine learning model for time series analysis. To this end, I will present a beat tracking algorithm, a computational method for extracting the beat positions from audio signals. The presented beat tracking system (Davies, 2019) is based on the TCN architecture, which captures the sequential structure of the audio input.

Temporal convolutional networks for sequence modeling


Julius Richter


This blog post is the second in a three-part series covering machine learning approaches for time series. In the first post, I talked about how to deal with serial sequences in artificial neural networks. In particular, recurrent models such as the LSTM were presented as an approach to process temporal data in order to analyze or predict future events. In this post, however, I will present a simple but powerful convolutional approach for sequences called the Temporal Convolutional Network (TCN). The network architecture was proposed in (Bai, 2018) and shows great performance on sequence-to-sequence tasks like machine translation or speech synthesis in text-to-speech (TTS) systems. Before I describe the architectural elements in detail, I will give a short introduction to sequence-to-sequence learning and the background of TCNs.
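
As a small preview, here is a minimal PyTorch sketch of the core ingredient of a TCN, a dilated causal 1D convolution. The channel sizes, kernel size and dilation are illustrative:

```python
import torch
import torch.nn as nn

kernel_size, dilation = 3, 2
# Left-padding by (kernel_size - 1) * dilation keeps the convolution causal:
# the output at time t only depends on inputs at times <= t.
padding = (kernel_size - 1) * dilation

conv = nn.Conv1d(in_channels=16, out_channels=32,
                 kernel_size=kernel_size, dilation=dilation)

x = torch.randn(4, 16, 100)                    # (batch, channels, time)
x_padded = nn.functional.pad(x, (padding, 0))  # pad only on the left
y = conv(x_padded)
print(y.shape)                                 # torch.Size([4, 32, 100]) -- same length
```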

Machine Learning Approaches for Time Series


Julius Richter


This post is the first part of a series of posts that all deal with the topic of time series and sequence modeling. To keep the content comprehensive yet easy to grasp, the series is segmented into three parts: (1) how to deal with time series and serial sequences - a recurrent approach, (2) temporal convolutional networks (TCNs) for sequence modeling, and (3) beat tracking in audio files as an application of sequence modeling.