Blog


LLM strategies part 1: Possibilities of implementing Large Language Models in your organization


David Berscheid


Large Language Models (LLMs) are a highly discussed topic in current strategy meetings of organizations across all industries. This article is the first part of two, providing some guidelines for organizations to determine their LLM strategy. It will help you identify the strategy with the most benefits while finding ways of solving associated complexities. For more content on LLMs, see our LLM hub .

X-ROCKET to the moon


Felix Brunner


These are the voyages of the encoder model X-ROCKET. Its continuing mission: to explore strange, new time series; to seek out new explanations and new interpretations; to boldly seek meaning where no one has sought before. Previously in this series, we completed our training in the basics of time series classification in part one and learned how to operate X-ROCKET in part two. But enough with all the talking, it is time to fire up the X-ROCKET engines and see this model in action. Let's rocket!

Data, prepare for takeoff!

We will use the "AsphaltPavementTypeCoordinates" dataset from Souza (2018) as an example. This dataset consists of 2,111 examples of accelerometer data recorded from cars passing over various types of pavement. Every time series example in the dataset has three channels (corresponding to the X, Y, and Z directions), each of which is measured at 100 Hz. The length of recordings varies from 66 time observations up to 2,371. The classes are "flexible" (38.6%), "cobblestone" (25.0%), and "dirt road" (36.4%). According to the description, the best model achieved an accuracy of 80.66% on this task, which we will use as a benchmark. So, Houston, we have our problem — a relatively balanced three-way multivariate time series classification problem, to be precise. The aeon module provides a simple way to load this dataset for our machine learning task. We will also use scikit-learn to follow the original authors and divide the full dataset into equally-sized train and test splits:

from aeon.datasets import load_classification
from sklearn.model_selection import train_test_split

X, y, meta = load_classification("AsphaltPavementTypeCoordinates")
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)

How to build a ROCKET

Next, let's put together a suitable vessel to encode this dataset. Having installed the xrocket module with its dependencies in our environment, we can immediately import the full encoder module. Then, all we have to do is initialize an instance of it with suitable parameters for our problem. Since our dataset has three channels, the choice of in_channels is clear. Next, as the time series length varies widely within our dataset, it makes sense to set max_kernel_span to a value suitable also for the shorter examples; let's do 100 in this case. Finally, we leave combination_order and feature_cap at their default values of one and 10,000 for now:

from xrocket import XRocket

encoder = XRocket(
    in_channels=3,
    max_kernel_span=100,
    combination_order=1,
    feature_cap=10_000,
)

Given these inputs, our encoder is automatically set up to have the usual 84 MiniROCKET kernels at 12 distinct dilation values. With three data channels, X-ROCKET chooses to use three pooling thresholds for each kernel-dilation-channel combination to stay within the feature_cap. Hence, the embedding dimension is 84 * 12 * 3 * 3 = 9,072. To finally prepare this contraption for boarding, all we have to do is find suitable values for the 9,072 pooling thresholds. We do this by fitting our XRocket instance to a data example. As the model operates on PyTorch tensors, where the first dimension is reserved for stacking multiple examples in a batch, all we have to do is transform the data from a 2D numpy array into a 3D tensor and feed it to the encoder:

from torch import Tensor

encoder.fit(Tensor(X_train[0]).unsqueeze(0))

Punch it!

Now that our X-ROCKET is calibrated, let's start the countdown.
Again, inputs need to be in the 3D tensor format, so we need to transform the examples to PyTorch tensors before passing them to the model. Due to the varying time series lengths, we cannot easily concatenate multiple examples into a batch. Therefore, it is more convenient to encode the examples one by one and collect the embeddings in two lists, one for the training set and one for the test set. Time to go to full thrust, godspeed!

embed_train, embed_test = [], []
for x in X_train:
    embed_train.append(encoder(Tensor(x).unsqueeze(0)))
for x in X_test:
    embed_test.append(encoder(Tensor(x).unsqueeze(0)))

8.02 seconds on a moderately fast consumer-grade CPU later, the embeddings of both the train and the test set are ready. That is, we now have a representation of the varying-size input data in fixed-dimensional vectors. Hence, the time has come to make this a tabular problem with named features stored in a DataFrame. The encoder provides the attribute feature_names, which readily contains the names of each embedding value as a tuple of (pattern, dilation, channel, threshold). Let's put these tuples in an index and name them accordingly. Then finally, we create the frames to store the transformed datasets. Who said time series classification had to be rocket science?

from torch import concat
import pandas as pd

feature_names = pd.Index(encoder.feature_names)
df_train = pd.DataFrame(data=concat(embed_train), columns=feature_names)
df_test = pd.DataFrame(data=concat(embed_test), columns=feature_names)

Giving X-ROCKET a purpose

As with so many things in the universe, X-ROCKET struggles to find its way without a head. To make sure it can follow its trajectory to the intended destination — time series classification — let's find a suitable prediction head that delivers the payload. As mentioned before, any prediction model that fits the intended purpose is fine in principle. Note that in theory, this also includes deep PyTorch feed-forward neural networks, which allow running backpropagation end to end back to the X-ROCKET weights to improve its embeddings. But don't panic, it is possible to find answers even without Deep Thought! Since we are eventually interested in the explainability of the predictions, let's pick a simple and explainable classification model instead. Scikit-learn's RandomForestClassifier is a solid start on that end; all we have to do is load it and fit it on our training data:

from sklearn.ensemble import RandomForestClassifier

clf = RandomForestClassifier(random_state=0)
clf.fit(df_train, y_train)

Wow, it almost went off like a rocket! Just 3.13 seconds later, we have our classifier. Let's see how it does on the dataset. Since the original work claims to achieve 80.66% accuracy, let's score our model in the same way on the hold-out set as they did:

from sklearn.metrics import accuracy_score

pred_test = clf.predict(df_test)
acc_test = accuracy_score(y_test, pred_test)

And there we have it, our model achieves an accuracy of 90.19% on the test set! Not bad, but is it enough to make a little rocket man proud? To conclusively answer that question, of course, more rigorous comparisons are warranted. Nevertheless, this appears to have been a successful launch!

Where no ROCKET man has gone before

The time has come to take X-ROCKET to the final frontier on its ultimate quest for meaning. Since the model seems to work acceptably well, it is valid to also analyze the explanations it provides about its predictions.
Luckily, the random forest classifier we chose provides an attribute called feature_importances_, which ascribes importance scores to all features of the model. Since we have stored the corresponding index in feature_names, we can easily bring both arrays together:

feature_importances = pd.Series(
    data=clf.feature_importances_,
    index=feature_names,
)

As it is, analyzing this object is only so useful. For example, we can see that the most important embedding for our model is the pattern HLHLLLHLL at dilation two in the Y-channel with a pooling threshold of -10.84. An H in the pattern indicates a high value and an L a low value, such that the pattern looks something like |_|___|__ . However, it is now easy to pool importance values to examine the relative importances of, say, the input channels. Summing over each channel, we get the importance scores below. Since X-ROCKET removes the randomness in how the embeddings are put together, the same features are extracted from each channel and each dilation value. Hence, comparing grouped feature importances this way offers a fair comparison.

Relative importances of the input channels for the predictions.

That is, the Y-channel seems to be the clear favorite, followed by the X-channel. Similarly, if we sum over the various dilation values, a clear insight is that higher frequencies are the ones that matter. With entries being recorded at 100 Hz, a dilation value of 2 corresponds to a frequency of 50 Hz, for example. As can be seen in the image below, most information is contained in these higher frequencies, that is, the ones with smaller dilation values.

Relative importances of various frequency dilations for the predictions.

What did the doctor say to the ROCKET? "Time to get your booster shot!" Accordingly, one might wonder what could be ways to provide an extra performance boost to this rocket ship. In machine learning space, of course, the possibilities are endless. For example, one could try alternative model heads such as gradient boosting algorithms, or better optimize the corresponding hyperparameters. On a different route, one could think about how to improve the data quality or augment the existing dataset with artificial examples. However, this is beyond the scope of this simple demonstration. What would be interesting to see, though, is whether the encoder can be further improved to gain additional insight into the drivers of predictiveness when also considering multi-channel features besides the previously seen univariate ones. So let's leave everything unchanged, but alter only the encoder by setting combination_order=2 and increasing the number of features slightly with feature_cap=15_000 when initializing X-ROCKET. The resulting embedding is now 12,096-dimensional, with 6 channel combinations instead of only the 3 channels and 2 pooling thresholds for each output. Besides a slight increase in test set accuracy to 91.13%, we observe that the Y-channel again seems to be the most important, but now combinations of Y with the other channels carry increased importances:

Relative importance of input channel combinations for the predictions.
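To reproduce such grouped importance scores yourself, a minimal sketch along the following lines should work; it assumes, as described above, that the feature names are (pattern, dilation, channel, threshold) tuples and that clf is the fitted random forest.

import pandas as pd

# Rebuild the importances with a named MultiIndex so that grouping by its levels is easy.
index = pd.MultiIndex.from_tuples(
    encoder.feature_names,
    names=["pattern", "dilation", "channel", "threshold"],
)
importances = pd.Series(clf.feature_importances_, index=index)

# Aggregate the importances per input channel and per dilation value.
channel_importances = importances.groupby(level="channel").sum()
dilation_importances = importances.groupby(level="dilation").sum()

print(channel_importances.sort_values(ascending=False))
print(dilation_importances.sort_values(ascending=False))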
Conclusions

In this series of articles, we have seen how an existing time series encoder framework can be restructured to derive new insight into the prediction drivers. Part one has shed light on some of the advances in machine learning for the time series domain. Then, part two and this third part presented X-ROCKET, an explainable time series encoder, both technically and with a practical usage example. While this construct has completed its mission in the example here, it is important to point out that the explanations provided by X-ROCKET are only as good as the model's prediction capabilities on the respective problem. That is, there is no point in interpreting a model that does not perform well enough in terms of its predictions. Hence, there is no guarantee that the same approach works equally well in different settings, in particular, if there is little signal in the input data. Nonetheless, rockets are cool, there is no getting around that!

References

Dempster, A., Schmidt, D. F., & Webb, G. I. (2021, August). MiniRocket: A very fast (almost) deterministic transform for time series classification. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining (pp. 248–257).

Souza, V. M. (2018). Asphalt pavement classification using smartphone accelerometer and complexity invariant distance. Engineering Applications of Artificial Intelligence, 74, 198–211.

This article was created within the "AI-gent3D — AI-supported, generative 3D-Printing" project, funded by the German Federal Ministry of Education and Research (BMBF) with the funding reference 02P20A501 under the coordination of PTKA Karlsruhe.

Inside X-ROCKET: Explaining the explainable ROCKET


Felix Brunner


Welcome to the bridge, pilot! In this second part of our three-part journey, we will have a detailed look at the interior of the X-ROCKET implementation. After setting the stage of time series classification and a basic introduction of the ROCKET model in part one, this article provides a tour of the necessary mechanisms for explainable embeddings, before part three will launch X-ROCKET into a bumpy space race on a real dataset.

The blueprint to explainability

Again, our goal is to add explainability to a potent time series encoder, the ROCKET. One way to achieve this is by tracing each element of the embedding vectors to its origins and thereby attaching meaning to them. Put differently, if we manage to meaningfully name each embedding element, we effectively transform downstream tasks into tabular problems. Now with the complexities and nonlinearities of neural networks, this is usually easier said than done. In the case of ROCKET, however, the architecture is shallow enough to shed light on its inner workings with a little bit of engineering and trickery. More precisely, the MiniROCKET of Dempster et al. (2021) will serve as a starting point, to which we add transparency by fully backtracking its encoding mechanisms.

While convolutions do not necessarily need to be implemented in a deep-learning framework, doing so can help computational speed by leveraging GPUs. Accordingly, there already exist good implementations of various ROCKET variants in Python. For example, the original authors' numpy code is part of the sktime library, and tsai contains a GPU-ready PyTorch version of it. However, although these implementations are already computationally very efficient, our endeavors require a few changes that are more easily achieved after restructuring the model.

Let's dive more into the technical details of the X-ROCKET implementation. As mentioned before, ROCKET architectures resemble very simple CNNs, so why not also structure their implementation like a neural network? That is, let's treat the steps of the calculation as layer objects and plug them together in line with the ideas behind ROCKET. More precisely, we define modules for each calculation step such that it is easier to understand the underlying computational graph. The diagram below schematically presents the full architecture of X-ROCKET. An input time series is served to several dilation blocks in parallel, each of which consists of a convolutional module, a channel mixing module, and a threshold pooling module. After processing the data sequentially in its submodules, each dilation block outputs a vector of embeddings. Finally, these embeddings are concatenated together to form the full X-ROCKET output embedding, which downstream models can pick up to produce a prediction — in our case a classification. Note that the interpretability of the final prediction depends on how explainable the downstream prediction model is. While explainable AI (XAI) is a very active field of research with a whole literature dedicated to making algorithms explainable, we will follow the original authors' suggestion to use relatively simple prediction heads that are explainable without any additional sophistication.

Full overview of the X-ROCKET architecture.

In what follows, I provide a more detailed look at the various modules that make up X-ROCKET.

ROCKET convolutions

The first step in processing the data is applying convolutional kernels that scan for fixed patterns in the data.
As we are dealing with time series, 1-dimensional kernels are the appropriate choice. The drawing below illustrates how the convolutions are applied. Given a sequence of input data, convolutional kernels are applied by sliding them over the input and summing element-wise products in the respective window. Effectively, this scans the input for the prevalence of the respective pattern and results in an output that has the same shape as the input. Note how in the image below, the output sequence always has large values when there is a peak in the input. Conversely, the output is negative if there is a dip in the input. This is due to the fact that in this example, the input is filtered for the pattern [-1, 2, -1], which has the shape of a spike itself. X-ROCKET uses the same 84 filters with a length of nine values as suggested in Dempster et al. (2021), but in contrast to the original authors, we always pad the inputs to obtain identical-length output sequences. To maintain explainability in this step, it is enough to store the kernels corresponding to each output sequence.

Illustration of a 1D convolution.

Channel mixing

When dealing with multivariate time series, that is, time series with multiple channels, one might want to consider correlations of patterns in multiple channels. While the original implementation mainly focuses on the univariate case and suggests naïvely adding random combinations of ROCKET convolutions together, we want to provide a balanced comparison of features. Therefore, X-ROCKET removes the randomness and instead provides the option to expand the feature pool with channel combinations up to a chosen order. As an additional option, channels can be combined multiplicatively instead of additively for closer resemblance to the concept of a correlation. Explainability in this step is ensured by remembering the channels the mixed outputs are built with.

Illustration of channel combinations.

PPV threshold pooling

The transformations up to this point have anything but reduced the size of the data. That is, applying multiple convolutional filters to each channel and adding combinations of the input channels on top of single-channel convolutional outputs results in a far greater number of equal-length output channels than were originally put in. Therefore, it is time to collapse the time dimension through a pooling mechanism. Following the original paper's suggestions, X-ROCKET applies proportion-of-positive-values pooling (PPV). More precisely, the values in each intermediary channel are thresholded at one or more bias values per channel, where the bias values are automatically chosen based on representative examples in an initial fitting step. Then, PPV counts the fraction of values that surpass the respective threshold across the timeline. Finally, the resulting percentages directly serve as feature values in the embedding vector. Hence, for explainability, elements in the embedding can be unambiguously linked to a combination of convolutional kernel, one or more input channels, and a threshold value.

Illustration of proportion-of-positive-values pooling via thresholds.
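To make the pooling step more concrete, here is a minimal PyTorch sketch of PPV pooling; it only illustrates the idea and is not X-ROCKET's actual implementation.

import torch

def ppv_pool(activations: torch.Tensor, thresholds: torch.Tensor) -> torch.Tensor:
    # activations: convolution outputs of shape (batch, channels, time)
    # thresholds: one bias value per channel, shape (channels,)
    # Compare each time step against its channel-wise threshold ...
    above = activations > thresholds.view(1, -1, 1)
    # ... and average over the time axis: one number per channel, independent of length.
    return above.float().mean(dim=-1)

acts = torch.randn(1, 2, 10)   # toy example: two channels, ten time steps
bias = torch.tensor([0.0, 0.5])
print(ppv_pool(acts, bias))    # fractions of values above the thresholds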
Dilation blocks

With the considered convolutional kernels only spanning nine observations, the capacity of the model is so far limited to detect a very narrow set of input characteristics. To change that, multiple dilation values are applied to identical kernels simultaneously to widen their receptive fields. X-ROCKET achieves this in practice by executing the aforementioned sequence of convolution, channel mixing, and PPV thresholding in multiple dilation blocks in parallel. In principle, dilations are a standard procedure in the context of CNNs, but most architectures only use a single value at each step. Having said that, a similar idea has recently shown promise to drastically improve the contextual capabilities of LLMs by enlarging context windows through dilated attention (see Ding et al. (2023)). To better understand how filter dilation works, consider the drawing below. Applying a dilation value is spreading the kernel over a longer period of time, and thereby scanning lower frequencies for the respective patterns. For example, the resulting activation with a dilation value of two indicates the occurrence of the pattern at half the data frequency. For explainability, it is therefore important to store the dilation value corresponding to each embedding element as well.

Illustration of frequency dilations.

The full model

Coming back to the full model, we can now put the pieces together. To initialize the encoder, we need to choose a few hyperparameters that determine the exact structure of the model. First, the number of input channels in_channels needs to be specified according to the number of channels in the data. Second, to automatically choose the dilation values to consider, the model requires an upper bound for the width of the convolutional receptive fields, called the max_kernel_span. Typically, X-ROCKET then picks 20–30 distinct frequencies to consider. Next, the combination_order determines how many channels are combined together when looking for correlations. By default, this keyword argument is set to 1 for simplicity. Finally, the feature_cap limits the dimensionality of the output to 10,000 features by default. X-ROCKET then builds the feature pool deterministically, that is, it is careful to include all channel-dilation-kernel combinations. Hence, the resulting number of features needs to be a multiple of all possible combinations and is not necessarily close to the specified value. If there is room within the feature cap, multiple thresholds are applied to each channel-dilation-kernel combination in the pooling step to create additional features.

Finally, to turn the embeddings into predictions, the encoder needs to be combined with a prediction model. As we are interested in interpretability, explainable models are the suggested choice here. Having effectively structured the problem tabularly through the X-ROCKET encoder, many models for tabular data are valid candidates. For example, scikit-learn offers a large selection of insightful algorithms for tabular data. Similarly, gradient boosting algorithms such as XGBoost are high-performance alternatives. Note that standardizing the embedding vectors may be an essential intermediary processing step to ensure the interpretability of some of these prediction algorithms. Finally, with the X-ROCKET code living in the PyTorch framework, it is also easy to combine the encoder with a deep feed-forward neural network. However, anything beyond a single linear layer might again be difficult to interpret in this case.
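As a rough sketch of how such a head can be attached, the snippet below standardizes the embeddings and fits a simple linear classifier; the random matrix merely stands in for real X-ROCKET embeddings, and the model choice is only one of many reasonable options.

import numpy as np
from sklearn.linear_model import RidgeClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in data: 200 examples with 9,072 embedding features and three classes.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(200, 9_072))
labels = rng.integers(0, 3, size=200)

# Standardize first, then fit an interpretable linear head on the embeddings.
head = make_pipeline(StandardScaler(), RidgeClassifier(alpha=1.0))
head.fit(embeddings, labels)
print(head.score(embeddings, labels))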
In the next and final part, I will show a simple usage example of the X-ROCKET implementation that also illustrates what kind of insight one can derive from X-ROCKET besides pure predictive performance.

References

Dempster, A., Schmidt, D. F., & Webb, G. I. (2021, August). MiniRocket: A very fast (almost) deterministic transform for time series classification. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining (pp. 248–257).

Ding, J., Ma, S., Dong, L., Zhang, X., Huang, S., Wang, W., & Wei, F. (2023). LongNet: Scaling transformers to 1,000,000,000 tokens. arXiv preprint arXiv:2307.02486.

Drawings were created in excalidraw.

This article was created within the "AI-gent3D — AI-supported, generative 3D-Printing" project, funded by the German Federal Ministry of Education and Research (BMBF) with the funding reference 02P20A501 under the coordination of PTKA Karlsruhe.

Explainable time series classification with X-ROCKET


Felix Brunner


With the lasting hype in the domains of computer vision and natural language processing, time series are frequently overlooked when talking about impactful applications of machine learning. However, time series data is ubiquitous in many domains, and predictive modeling of such data often carries significant business value. One important task in this context is time series classification, which is attracting rising levels of attention due to its diverse applications in domains such as finance, healthcare, and manufacturing. Numerous techniques have been developed to tackle the unique challenges posed by time series data, where increased capacity often comes at the expense of interpretability and computational speed. While the race for a common state-of-the-art embedding model for time series continues, the RandOm Convolutional KErnel Transform (ROCKET) of Dempster et al. (2020) has gained significant attention as a simple yet powerful encoder model. In this series of articles, I will introduce the model's underlying ideas and show an augmentation that adds explainability to its embeddings for use in downstream tasks. It consists of three parts: This first part provides background information on time series classification and ROCKET. The second part sheds light on the inner workings of the X-ROCKET implementation. The third part takes us on an exploration of X-ROCKET's capabilities in a practical setting.

The fundamentals of time series classification

A common task in the time series domain is to identify which of a set of categories an input belongs to. For example, one might be interested in diagnosing the state of a production machine given a sequence of sensor measurements, or in predicting the health of an organism from biomedical observations over a time interval. Formally, the problem can be described as follows: Given a sequence of observations at a regular frequency, calculate the probabilities of the input belonging to one of a fixed set of classes. The input data for each example is usually structured as a 1D-array of numerical values in the univariate case, or a 2D-array if there are multiple channels. A prediction model then calculates class probabilities as its output. In this context, models are commonly composed of an encoder block that produces feature embeddings, and a classification algorithm that processes the embeddings to calculate the output probabilities, as schematically indicated in the diagram below.

Illustration of a time series classification pipeline (drawn in excalidraw).

While classification algorithms in machine learning have matured, it is less clear how to extract suitable features from time series inputs. Traditional time series approaches, such as Dynamic Time Warping and Fourier transforms, have shown promise in handling time series similarity and feature extraction. More recently, with the advent of deep learning, Recurrent Neural Networks (RNNs) and Convolutional Neural Networks (CNNs) have emerged as dominant methodologies to capture sequential patterns and spatial features, respectively. Finally, Transformer-based models with temporal attention have shown promise to further advance the field of time series classification in the most up-to-date research (e.g. Zerveas et al. (2021)). Despite these advancements, there are still substantial challenges to harvesting time series data.
Where images or texts are immediately interpretable by our human brains in most cases, examining the fluctuations in time series recordings can be unintuitive to the extent that it is impossible to assign class labels in the first place. In particular, it is often unclear how informative specific time series recordings even are, which is aggravated by the widespread prevalence of noise. Hence, it is an open question how the data should be processed to extract potential signals from an input. Additionally, unlike images, time series often vary in terms of length, so methods for feature extraction should be able to summarize inputs in fixed-dimensional embedding vectors independent of input size. Finally, time series data may or may not be stationary, which potentially has adverse effects on prediction quality.

What works?

So what is the go-to model for time series classification? Alas, the answer is not that simple. This is mainly due to the lack of widely accepted benchmarks, which makes it impossible to fairly compare the numerous and diverse models proposed in the literature. But even if one wanted to construct such a unified benchmark dataset, it is not clear what it would contain to be representative of the diversity that is time series. In other words, measuring a model's performance on low-frequency weather data might not be a good indication of its success with high-frequency audio files or DNA sequences. To get a sense of how different data in the time series domain can be, compare for example the visualizations of examples from various datasets in Middlehurst et al. (2023) below. Moreover, there is an important distinction between univariate and multivariate time series, that is, whether one or more different variables are being measured simultaneously. Unfortunately, evidence is particularly thin for the multivariate case, which is the more relevant one in many practical applications.

Visualizations of examples from various time series datasets from Middlehurst et al. (2023).

Having said that, there are a few resources that attempt to compare different methods in the domain of time series classification. On the one hand, the constantly updated time series classification leaderboard on Papers with Code provides scores for a few models on selected datasets. On the other hand, members of the research group behind the time series classification website have published papers (compare, e.g., Bagnall et al. (2017), Ruiz et al. (2021), and Middlehurst et al. (2023)) that conduct horse races between time series classification methods on their time series data archive. While the former favors a variety of RNNs and CNNs on its benchmarks, non-deep-learning methods such as ROCKET fare particularly well on the latter. Therefore, it would be presumptuous to declare a general winner, and the answer is a resolute "well, it depends".

In many cases, there are additional considerations besides pure performance that tip the scales when it comes to model choice. With limited availability of training data, more complex and capable models that require extensive training are often out of place. Ideally, there would be a pre-trained encoder model that could be used out-of-the-box to extract meaningful patterns from any time series input without additional training and could be fine-tuned to a specific use case with moderate effort, as is the case in computer vision or NLP. Hence, there is often a trade-off between performance and computational efficiency.
Moreover, practical applications often require predictions to be explainable; that is, domain experts often demand to understand which features of the input time series evoke a prediction. This is particularly true for sensitive use cases such as health care or autonomous driving. Therefore, choosing an explainable model is often crucial for the suitability and credibility of machine learning techniques.

Team ROCKET to the rescue

One relatively simple modeling approach for time series embeddings is the so-called ROCKET, short for RandOm Convolutional KErnel Transform. This methodology was first introduced in Dempster et al. (2020) and has been further developed in subsequent research papers. Noteworthy variants here are the MiniROCKET of Dempster et al. (2021), the MultiROCKET of Tan et al. (2022), and HYDRA of Dempster et al. (2023). A main advantage over more complex methods is that ROCKET models are very fast in terms of computation and do not normally require any training to learn an informative embedding mapping, while predictive performance is on par with state-of-the-art models. For example, Ruiz et al. (2021) find that training time is orders of magnitude faster for ROCKET compared to other time series classification algorithms that achieve similar accuracy (see image below). This difference mainly stems from the fact that ROCKET encoders scan an input for a pre-defined set of possibly dispensable patterns and then only let the classifier learn which ones matter, instead of learning everything from scratch.

Model comparison chart taken from Ruiz et al. (2021).

The main idea behind ROCKET encodings banks on the recent successes of Convolutional Neural Networks (CNNs) and transfers them to feature extraction in time series datasets. In contrast to most CNNs in the image domain, however, the architecture does not involve any hidden layers or other non-linearities. Instead, a large number of preset kernels is convolved with the input separately, resulting in a transformation that indicates the strength of occurrences of the convolutional patterns in different parts of the input sequence. This process is repeated with various dilation values, which is the same as scanning at different frequencies. As for the choice of filters, the original paper suggests using random kernels, while later installments use a small set of deterministic patterns. Next, the high-dimensional outputs of this step are pooled across time via proportion-of-positive-values pooling (PPV), that is, by counting the times when the convolutional activations surpass channel-wise bias thresholds, which can be learned from representative examples. As a result, the output of the encoder is a feature vector that summarizes the input time series independent of its length. The transformed features can then serve as the input to any prediction algorithm that can deal with feature redundancy. For example, the original work advises to use simple algorithms like regularized linear models. For a more detailed explanation of the transformations, please refer to the original authors' paper or to the more detailed descriptions in the second installment of this article.
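To make the mechanism more tangible, the following is a heavily simplified numpy sketch of a ROCKET-style transform for a univariate series. The real algorithm additionally samples kernel lengths, paddings, and biases from the data and adds further pooling operators, so treat this purely as an illustration of random dilated kernels plus PPV pooling.

import numpy as np

rng = np.random.default_rng(0)

def rocket_like_features(x, n_kernels=100, kernel_length=9):
    # For each random kernel: pick a dilation, slide the dilated kernel over x,
    # and record the proportion of positive values (PPV) of the activations.
    features = []
    max_exponent = np.log2((len(x) - 1) / (kernel_length - 1))
    for _ in range(n_kernels):
        weights = rng.normal(size=kernel_length)
        bias = rng.normal()
        dilation = int(2 ** rng.uniform(0, max_exponent))
        width = (kernel_length - 1) * dilation + 1   # receptive field of the dilated kernel
        activations = [
            np.dot(weights, x[start:start + width:dilation]) + bias
            for start in range(len(x) - width + 1)
        ]
        features.append(np.mean(np.asarray(activations) > 0))
    return np.asarray(features)

series = np.sin(np.linspace(0, 20, 300))      # toy univariate input
print(rocket_like_features(series).shape)     # (100,) fixed-size feature vector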
So if ROCKET achieves state-of-the-art performance while being computationally much more efficient than most methods, what could possibly go wrong? Well oftentimes, performance is not everything… Coming back to the explainability requirements that machine learning models often encounter in practice, is ROCKET a suitable model? As it comes, the answer is no. However, the algorithm requires only slight changes to attach meaning to its embeddings. In the second part, I will demonstrate how this can be achieved by means of a slightly altered implementation, the explainable ROCKET — or short, X-ROCKET.

References

Bagnall, A., Lines, J., Bostrom, A., Large, J., & Keogh, E. (2017). The great time series classification bake off: a review and experimental evaluation of recent algorithmic advances. Data Mining and Knowledge Discovery, 31, 606–660.

Dempster, A., Petitjean, F., & Webb, G. I. (2020). ROCKET: exceptionally fast and accurate time series classification using random convolutional kernels. Data Mining and Knowledge Discovery, 34(5), 1454–1495.

Dempster, A., Schmidt, D. F., & Webb, G. I. (2021, August). MiniRocket: A very fast (almost) deterministic transform for time series classification. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining (pp. 248–257).

Dempster, A., Schmidt, D. F., & Webb, G. I. (2023). Hydra: Competing convolutional kernels for fast and accurate time series classification. Data Mining and Knowledge Discovery, 1–27.

Middlehurst, M., Schäfer, P., & Bagnall, A. (2023). Bake off redux: a review and experimental evaluation of recent time series classification algorithms. arXiv preprint arXiv:2304.13029.

Ruiz, A. P., Flynn, M., Large, J., Middlehurst, M., & Bagnall, A. (2021). The great multivariate time series classification bake off: a review and experimental evaluation of recent algorithmic advances. Data Mining and Knowledge Discovery, 35(2), 401–449.

Tan, C. W., Dempster, A., Bergmeir, C., & Webb, G. I. (2022). MultiRocket: multiple pooling operators and transformations for fast and effective time series classification. Data Mining and Knowledge Discovery, 36(5), 1623–1646.

Zerveas, G., Jayaraman, S., Patel, D., Bhamidipaty, A., & Eickhoff, C. (2021, August). A transformer-based framework for multivariate time series representation learning. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining (pp. 2114–2124).

This article was created within the "AI-gent3D — AI-supported, generative 3D-Printing" project, funded by the German Federal Ministry of Education and Research (BMBF) with the funding reference 02P20A501 under the coordination of PTKA Karlsruhe.

Extend the knowledge of your Large Language Model with RAG


Thanh Long Phan, Fabian Dechent


Large Language Models (LLMs) have rapidly gained popularity in natural language tasks due to their remarkable human-like ability to understand and generate text. Amidst great advances, there are still challenges to be solved on the way to building perfectly reliable assistants. LLMs are known to make up answers, often producing text that adheres to the expected style but lacks accuracy or factual grounding. Generated words and phrases are chosen because they are likely to follow previous text, where the likelihood is adjusted to fit the training corpus as closely as possible. This gives rise to the possibility that a piece of information is outdated if the corpus is not updated and the model retrained, or that it is just factually incorrect, while the generated words have the quality of sounding correct and can be matched to the required genre. The core problem here is that the LLM does not know what it does not know. In addition, even if a piece of information is correct, it is hard to track its source in order to enable fact-checking. In this article, we introduce RAG (Retrieval-Augmented Generation) as a method that addresses both problems and thus aims to enhance the reliability and accuracy of information generated by LLMs.
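To make the retrieval idea concrete, here is a toy sketch of the retrieval step only, with TF-IDF standing in for the neural text embeddings and vector database a real RAG system would use; the documents and the question are made up for illustration.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# A toy document store; in practice these would be chunks of your knowledge base.
documents = [
    "dida is a machine learning company based in Berlin.",
    "Retrieval-Augmented Generation grounds LLM answers in retrieved documents.",
    "The GDPR regulates the processing of personal data in the EU.",
]
question = "How can LLM answers be grounded in external knowledge?"

# Retrieve the most relevant document for the question ...
vectorizer = TfidfVectorizer().fit(documents + [question])
doc_vecs = vectorizer.transform(documents)
query_vec = vectorizer.transform([question])
best = cosine_similarity(query_vec, doc_vecs).argmax()

# ... and prepend it to the prompt, so the LLM can answer from (and cite) a known source.
prompt = f"Answer using only this context:\n{documents[best]}\n\nQuestion: {question}"
print(prompt)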

Fairness in Machine Learning


Cornelius Braun


In a previous blog post , we explained the plenitude of human biases that are often present in real-world data sets. Since practitioners may be forced to work with biased data, it is crucial to know about ways in which the fairness of model decisions can nevertheless be guaranteed. Thus, in this post, I explain the most important ideas around fairness in machine learning (ML). This includes a short summary of the main metrics to measure the fairness of your model decisions and an overview of tools that can help you guarantee or improve your model's fairness.

What is Kernel in Machine Learning?


Serdar Palaoglu


In the realm of machine learning, kernels hold a pivotal role, especially in algorithms designed for classification and regression tasks like Support Vector Machines (SVMs). The kernel function is the heart of these algorithms, adept at simplifying the complexity inherent in data. It transforms non-linear relationships into a linear format, making them accessible for algorithms that traditionally only handle linear data. This transformation is important for allowing SVMs to unravel and make sense of complex patterns and relationships. Kernels achieve this without the computational intensity of mapping data to higher dimensions explicitly. Their efficiency and effectiveness in revealing hidden patterns make them a cornerstone in modern machine learning. As we explore kernels further, we uncover their significance in enhancing the performance and applicability of SVMs in diverse scenarios.
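As a small illustration of this idea (not taken from the article, and with an arbitrary toy dataset), the snippet below compares a linear SVM with an RBF-kernel SVM on data that no straight line can separate:

from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# One class lies inside a circle, the other outside of it.
X, y = make_circles(n_samples=500, factor=0.3, noise=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The RBF kernel handles the non-linear boundary without ever computing
# the higher-dimensional feature mapping explicitly.
for kernel in ["linear", "rbf"]:
    clf = SVC(kernel=kernel).fit(X_train, y_train)
    print(kernel, clf.score(X_test, y_test))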

Ensembles in Machine Learning: Combining Multiple Models


Serdar Palaoglu


In the ever-evolving landscape of machine learning, the quest for improved predictive accuracy has led to the development of ensemble methods. These techniques harness the collective power of multiple models to achieve better performance than any single model could on its own. This article delves into ensemble learning, exploring how the combination of diverse algorithms can lead to more robust, generalizable, and accurate machine learning solutions.
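For a quick impression of the idea, here is a minimal scikit-learn sketch that combines three different base models into a soft-voting ensemble; the dataset and model choices are arbitrary examples, not taken from the article.

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Three diverse base models and an ensemble that averages their predicted probabilities.
members = [
    ("logreg", LogisticRegression(max_iter=5000)),
    ("tree", DecisionTreeClassifier(random_state=0)),
    ("nb", GaussianNB()),
]
ensemble = VotingClassifier(estimators=members, voting="soft")

for name, model in members + [("ensemble", ensemble)]:
    print(name, cross_val_score(model, X, y, cv=5).mean().round(3))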

Deep Learning vs Machine Learning: What is the difference?


Serdar Palaoglu


In the realm of artificial intelligence, two fundamental concepts, Machine Learning and Deep Learning, have emerged as key components in the advancement of computer-based learning systems. Machine Learning serves as a foundational principle where computers gain the ability to learn from data without explicit programming. Deep Learning, an evolution within the Machine Learning framework, utilizes artificial neural networks inspired by the human brain to achieve complex data analysis. This article delves into a comprehensive exploration of these domains, elucidating their differences, practical applications, and significance in artificial intelligence.

What is Meta-Learning? Benefits, Applications and Challenges


Jan Macdonald (PhD)


Data-driven algorithms, such as machine learning and particularly deep learning models, have achieved unprecedented successes in diverse application areas, ranging from computer vision to audio and signal processing to natural language processing. Most commonly, machines “learn” to solve a specific task in a supervised manner by observing a large amount of labeled example data. Think of an image classification model that learns to distinguish different animals by being presented with many example images of each different animal type. This differs significantly from the way we humans tend to learn: After having been exposed to recognizing different animals repeatedly throughout our life, we are able to learn the concept of a new type of animal after seeing only very few examples. Incorporating such “adaptive” learning strategies into the field of machine learning is at the core of meta-learning. This was already explored in the 1980s and 1990s, e.g., by Schmidhuber (Schmidhuber, 1987) and Bengio et al. (Bengio et al., 1991). Recently, with the rapid improvements in deep learning, the interest in neural network based meta-learning approaches has increased and a wide range of variants have been proposed and developed. We will take a more detailed look at a selection of them below.

Latest developments in the world of Natural Language Processing: A comparison of different language models


Justus Tschötsch


Natural language processing (NLP) is a rapidly evolving sub-field of artificial intelligence. With ever new developments and breakthroughs, language models are already able to understand and generate human-like language with impressive accuracy. To keep track and catch up, we will compare different language models and have a look at the latest advancements, opportunities, and challenges of natural language processing.

How ChatGPT is fine-tuned using Reinforcement Learning


Thanh Long Phan


At the end of 2022, OpenAI released ChatGPT (a Transformer-based language model) to the public. Although based on the already widely discussed GPT-3, it launched an unprecedented boom in generative AI. It is capable of generating human-like text and has a wide range of applications, including language translation, language modeling, and generating text for applications such as chatbots. Feel free to also read our introduction to LLMs . ChatGPT seems to be so powerful that many people consider it to be a substantial step towards artificial general intelligence. The main reason for the recent successes of language models such as ChatGPT lies in their size (in terms of trainable parameters). But making language models bigger does not inherently make them better at following a user's intent. A bigger model can also become more toxic and more likely to "hallucinate". To mitigate these issues and to more generally align models to user intentions, one option is to apply Reinforcement Learning. In this blog post, we will present an overview of the training process of ChatGPT, and have a closer look at the use of Reinforcement Learning in language modeling. Also interesting: Our aggregated collection of LLM content .

Early Classification of Crop Fields through Satellite Image Time Series


Tiago Sanona


In a fast-paced and ever-changing global economy, the ability to classify crop fields via remote sensing only at the end of a growth cycle does not provide the much-needed immediate insight required by decision makers. To address this problem, we developed a model that allows continuous classification of crop fields at any point in time and improves predictions as more data becomes available. In practice, we developed a single model capable of delivering predictions about which crops are growing at any point in time based on satellite data. The data available at the time of inference could be a few images at the beginning of the year or a full time series of images from a complete growing cycle. This exceeds the capabilities of current deep learning solutions that either only offer predictions at the end of the growing cycle or have to use multiple models that are specialized to return results from pre-specified points in time. This article details the key changes we made to the model described in a previous blog post, "Classification of Crop Fields through Satellite Image Time Series", which extend its functionality and performance. The results presented in this article are based on a research paper recently published by dida. For more detailed information about this topic and other experiments on this model, please check out the original manuscript: "Early Crop Classification via Multi-Modal Satellite Data Fusion and Temporal Attention".

Leveraging Machine Learning for Environmental Protection


Edit Szügyi


Machine Learning has been solving complex problems for decades. Just think about how Computer Vision methods can reliably predict life-threatening diseases, self-driving cars are on their way to revolutionize traffic safety, or automatic translation gives us the ability to talk to just about anyone on the planet. The power of Machine Learning has been embraced by many branches of industry and science. There are some areas however where the potential of Machine Learning is harder to see and also less utilized. One of these is environmental protection. Protecting the natural environment is one of the biggest challenges our generation is facing, with pressing issues such as climate change, plastic pollution or resource depletion. Let us now look at how Machine Learning has been and can be used as a tool in environmental protection.

Managing layered requirements with pip-tools


Augusto Stoffel (PhD)


When building Python applications for production, it's good practice to pin all dependency versions, a process also known as “freezing the requirements”. This makes the deployments reproducible and predictable. (For libraries and user applications, the needs are quite different; in this case, one should support a large range of versions for each dependency, in order to reduce the potential for conflicts.) In this post, we explain how to manage a layered requirements setup without forgoing the improved conflict resolution algorithm introduced recently in pip. We provide a Makefile that you can use right away in any of your projects!

Collaborative Filtering in Recommender Systems


Konrad Mundinger


In this blog post, I give an overview and provide some Python code for several collaborative filtering techniques. This is the second blog post in a series of articles about recommendation engines. Check out the first article if you want to get an overview of recommendation systems in general or need a refresher on the terminology. The Jupyter notebook I used for creating the plots will be made available soon. The techniques will be illustrated on the famous MovieLens-100K dataset. It contains 100,000 user-movie rating pairs from 943 users on 1,682 movies. For most of the algorithms, I have used an existing implementation from the surprise library for Python. Even though it needs some getting used to, I think it is a nice library that you should check out if you are starting to play around with recommendation engines.
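As a taste of what that looks like, a minimal surprise-based sketch on MovieLens-100K could read as follows; the exact algorithms and settings used in the article may differ.

from surprise import SVD, Dataset, KNNBasic
from surprise.model_selection import cross_validate

# Downloads the MovieLens-100K dataset on first use (after asking for confirmation).
data = Dataset.load_builtin("ml-100k")

# Two classic collaborative filtering approaches: a user-based neighborhood model
# and a matrix factorization model, each evaluated with 5-fold cross-validation.
for algo in (KNNBasic(sim_options={"user_based": True}), SVD(random_state=0)):
    cross_validate(algo, data, measures=["RMSE", "MAE"], cv=5, verbose=True)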

An Introduction to Metric Learning


William Clemens (PhD)


Probably the most common form of problem we tackle with machine learning is classification, that is, taking new data points and putting them into one of a number of fixed sets or classes. But what if we don't necessarily know all the classes when we train the model? A good example of this is face recognition, where we want a system that can store faces and then identify if any new images it sees contain that face. Obviously, we can't retrain the model every time we add someone new to the database, so we need a better solution. One way to solve this problem is metric learning. In metric learning, our goal is to learn a metric or distance measure between different data points. If we train our model correctly, then this distance measure will put examples of the same class close together and different classes further apart.
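One common way to train such a distance is a triplet loss; the minimal PyTorch sketch below (with random stand-in data and an arbitrary toy network) shows the main ingredients.

import torch
import torch.nn as nn

# A tiny embedding network that maps inputs to points in a metric space.
embed = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 32))
criterion = nn.TripletMarginLoss(margin=1.0)

# anchor and positive share a class, negative comes from a different class.
anchor, positive, negative = (torch.randn(16, 128) for _ in range(3))
loss = criterion(embed(anchor), embed(positive), embed(negative))

# Minimizing the loss pulls same-class pairs together and pushes other pairs
# at least `margin` apart in the learned embedding space.
loss.backward()
print(loss.item())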

Recommendation systems - an overview


Konrad Mundinger


Recommendation systems are everywhere. We use them to buy clothes, find restaurants and choose which TV show to watch. In this blog post, I will give an overview of the underlying basic concepts, common use cases and discuss some limitations. This is the first of a series of articles about recommendation engines. Stay tuned for the follow-ups, where we will explore some of the mentioned concepts in much more detail! Already in 2010, 60 % of watch time on Youtube came from recommendations [1] and personalized recommendations are said to increase conversion rates on e-commerce sites by up to 5 times [2]. It is safe to say that if customers are presented with a nice pre-selection of products they will be less overwhelmed, more likely to consume something and have an overall better experience on the website. But how do recommendation engines work? Let's dive right in.

The best (Python) tools for remote sensing


Emilius Richter


An estimated 906 Earth observation satellites are currently in orbit, providing science and industry with many terabytes of data every day. The satellites operate with both radar as well as optical sensors and cover different spectral ranges with varying spectral, spatial, and temporal resolutions. Due to this broad spectrum of geospatial data, it is possible to find new applications for remote sensing methods in many industrial and governmental institutions. On our website, you can find some projects in which we have successfully used satellite data and possible use cases of remote sensing methods for various industries. Well-known satellite systems and programs include Sentinel-1 (radar) and Sentinel-2 (optical) from ESA, Landsat (optical) from NASA, TerraSAR-X and TanDEM-X (both radar) from DLR, and PlanetScope (optical) from Planet. There are basically two types of geospatial data: raster data and vector data.

Raster data

Raster data are a grid of regularly spaced pixels, where each pixel is associated with a geographic location, and are represented as a matrix. The pixel values depend on the type of information that is stored, e.g., brightness values for digital images or temperature values for thermal images. The size of the pixels also determines the spatial resolution of the raster. Geospatial raster data are thus used to represent satellite imagery. Raster images usually contain several bands or channels, e.g. a red, green, and blue channel. In satellite data, there are also often infrared and/or ultraviolet bands.

Vector data

Vector data represent geographic features on the earth's surface, such as cities, country borders, roads, bodies of water, property rights, etc. Such features are represented by one or more connected vertices, where a vertex defines a position in space by x-, y- and z-values. A single vertex is a point, multiple connected vertices are a line, and multiple (>3) connected and closed vertices are called polygons. The x-, y-, and z-values are always related to the corresponding coordinate reference system (CRS) that is stored in vector files as meta information. The most common file formats for vector data are GeoJSON, KML, and SHAPEFILE.

In order to process and analyze these data, various tools are required. In the following, I will present the tools we at dida have had the best experience with and which are regularly used in our remote sensing projects. I present the tools one by one, grouped into the following sections:

Requesting satellite data: EOBrowser, Sentinelsat, Sentinelhub
Processing raster data: Rasterio, Pyproj, SNAP, pyroSAR, Rioxarray
Processing vector data: Shapely, Python-geojson, Geojson.io, Geopandas, Fiona
Providing geospatial data: QGIS, GeoServer, Leafmap
Processing meteorological satellite data: Wetterdienst, Wradlib
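As a first taste of two of these libraries, the sketch below reads a raster with Rasterio and a vector file with Geopandas; the file names and the target CRS are placeholders for your own data.

import rasterio
import geopandas as gpd

# Raster data: open a (hypothetical) Sentinel-2 band stored as a GeoTIFF.
with rasterio.open("sentinel2_B04.tif") as src:
    band = src.read(1)                       # pixel values as a 2D numpy array
    print(src.crs, src.transform, band.shape)

# Vector data: read (hypothetical) field boundaries and reproject them.
fields = gpd.read_file("fields.geojson")
fields = fields.to_crs("EPSG:32633")         # e.g. reproject to match the raster CRS
print(fields.geometry.area.head())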

Project proposals - the first step to a successful ML project


Emilius Richter


Many machine learning (ML) projects are doomed to fail. This can be due to various reasons and often they occur in combination. To avoid failure, all involved stakeholders need to understand the technical and organizational requirements of the project. Besides all preliminary discussions that define the project, it is important to summarize the project-relevant information in a comprehensive proposal. It should cover the technical and organizational requirements, possible problem areas and technical restrictions. In this article, I will describe the most important modules in machine learning project proposals. For a software provider like dida, the project proposal is the first step towards meeting the needs of the customer.

Image Captioning with Attention


Madina Kasymova


One sees an image and easily tells what is happening in it because it is humans’ basic ability to grasp and describe details about an image by just having a glance. Can machines recognize different objects and their relationships in an image and describe them in a natural language just like humans do? This is the problem image captioning tries to solve. Image captioning is all about describing images in natural language (such as English), combining two core topics of artificial intelligence: computer vision and natural language processing . Image captioning is an incredible application of deep learning that evolved considerably in recent years. This article will provide a high-level overview of image captioning architecture and explore the attention mechanism – the most common approach proposed to solve this problem. The most recent image captioning works have shown benefits in using a transformer-based approach, which is based solely on attention and learns the relationships between elements of a sequence without using recurrent or convolutional layers. We will not be considering transformer-based architectures here, instead we will focus only on the attention-based approach.

AI Index Report 2022: key findings about the status quo of AI


David Berscheid


The AI Index Report tracks and collects data regarding the worldwide development of artificial intelligence (AI). This year's fifth edition, by the independent initiative at the Stanford Institute for Human-Centered Artificial Intelligence (HAI), is again aimed at informing relevant stakeholders like policy makers, researchers or related industries about the enormous advances of AI and the technological and societal stages of the most prominent AI disciplines, as well as creating awareness for arising problems. In this article, we will discuss a selection of the report's machine learning (ML)-related key messages and add dida's perspective on the following topics:

Research and Development
Technical Performance
Technical AI Ethics
The Economy and Education
AI Policy and Governance

For the full report, please visit the original source here.

Data Privacy: Machine Learning and the GDPR


Ana Guerra


Datasets are essential for the research and development of models in the fields of Natural Language Processing (NLP) and Machine Learning (ML). However, while the use, collection, and storage of data increases, concerns about data privacy intensify as well. To be in line with best practices, it is relevant to understand what data privacy means and how it is regulated. This post will therefore offer a brief overview of how data privacy is regulated within the European Union. Besides following EU regulation, data driven projects have also to be ethically responsible. In consequence, this article ends with some words about ethics while processing personal data.

How to implement a labeling tool for image classification in a Jupyter notebook


Felix Brunner


'Hotdog' or 'not hotdog'? That could be the question — at least when performing an image classification task. To be able to address this or a similarly important question by means of a machine learning model, we first need to come up with a labeled dataset for training. That is, we sometimes have to manually look at hundreds or even thousands of images that do or do not contain hotdogs, and decide if they do. One way to do that would be to open up one image at a time and keep track of image classes in another file, e.g., a spreadsheet. However, such a heavy-handed approach sounds rather tedious and is likely prone to fat-fingering errors. Wouldn't it be great if there was a streamlined solution that makes this labeling process more efficient, even fun? That is exactly right and also what we set out to do in this article: Create a simple annotation tool to easily assign class labels to a set of images.
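The article builds its own tool; as a rough idea of the ingredients, a bare-bones version with ipywidgets might look like the sketch below (the folder name and class labels are placeholders, and the actual implementation in the article may differ).

from pathlib import Path
import ipywidgets as widgets
from IPython.display import display

image_paths = sorted(Path("images").glob("*.jpg"))  # hypothetical image folder
labels, current = {}, 0

img_widget = widgets.Image(width=300)
buttons = [widgets.Button(description=c) for c in ("hotdog", "not hotdog")]

def show(i):
    # Display the i-th image inside the notebook.
    img_widget.value = image_paths[i].read_bytes()

def on_click(button):
    # Record the chosen class for the current image and move on to the next one.
    global current
    labels[image_paths[current].name] = button.description
    current += 1
    if current < len(image_paths):
        show(current)

for b in buttons:
    b.on_click(on_click)

show(current)
display(widgets.VBox([img_widget, widgets.HBox(buttons)]))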

Ethics in Natural Language Processing


Marty Oelschläger (PhD)


AI and machine learning have become a significant part of our day-to-day lives. For example, we use search queries and are startled or even angered if the algorithm did not understand what we were actually looking for. Just imagine what an effort it would be to have humans process all those queries. In case you can't imagine it, CollegeHumor already prepared a vision of that: Fortunately, we have taught machines, at least to some degree, to "understand" human language. This branch of machine learning is called natural language processing (NLP). We already gave an introduction , if you want to review the basics. However, since search engines, chat bots and other NLP algorithms are not humans, we can employ them at large, i.e. global, scale. Since they are ubiquitous and used by very different people in various contexts, we want them to be objective and neutral (and not an annoyed and skeptical man as in the video above). But what if they are not neutral number crunchers? What if they are subjective and even carry harmful stereotypes against specific groups?

GPT-3 and beyond - Part 2: Shortcomings and remedies


Fabian Gringel


In the first part of this article I have described the basic idea behind GPT-3 and given some examples of what it is good at. This second and final part is dedicated to the “beyond” in the title. Here you will learn in which situations GPT-3 fails and why it is far from having proper natural language understanding, which approaches can help to mitigate the issues and might lead to the next breakthrough, what alternatives to GPT-3 there are already, and, in case you are wondering, what's the connection between GPT-3 and an octopus. Update February 14th '22: I have also included a section about OpenAI's new InstructGPT.

Data-centric Machine Learning: Making customized ML solutions production-ready


David Berscheid


By 2021, there is little doubt that Machine Learning (ML) brings great potential to today's world. In a study by Bitkom , 30% of companies in Germany state that they have planned or at least discussed attempts to leverage the value of ML. But while companies' willingness to invest in ML is rising, Accenture estimates that 80% – 85% of these projects remain a proof of concept and are never brought into production. Therefore, at dida we made it our core mission to bridge the gap between proof of concept and production software, which we achieve by applying data-centric techniques, among other things. In this article, we will see why many ML projects do not make it into production, introduce the concepts of model- and data-centric ML, and give examples of how we at dida improve projects by applying data-centric techniques.

GPT-3 and beyond - Part 1: The basic recipe


Fabian Gringel


GPT-3 is a neural network capable of solving a wide range of natural language processing (NLP) tasks, presented by OpenAI in the summer of 2020 (upscaling the previous models GPT and GPT-2). For various tasks it has set new state-of-the-art performances and it is considered by many as a substantial step in the direction of artificial general intelligence. “General intelligence” refers to the capability of not only behaving intelligently with respect to one set task, but also being able to adapt to and accomplish new, unforeseen tasks. This blog article is the first of a two-article series on GPT-3. In this first article I will explain how GPT-3 works, what it is good at, why some people think it's dangerous, and how you can try out a GPT-3-like model for free. The second part will deal with GPT-3's weaknesses and where to expect the next breakthrough in the future.

Classification of Crop Fields through Satellite Image Time Series


Tiago Sanona


The field of remote sensing has been benefiting from the advancements made in Machine Learning (ML). In this article we explore a state-of-the-art model architecture, the Transformer , initially developed for Natural Language Processing (NLP) problems and now widely used with many forms of sequential data. Following the paper by Garnot et al. , we utilize an altered version of this architecture to classify crop fields from time series of satellite images . With this, we achieve better results than traditional methods (e.g. random forests) and with fewer resources than recurrent networks.

Extracting information from technical drawings


Frank Weilandt (PhD)


Did you ever need to combine data about an object from two different sources, say, images and text? We often face such challenges during our work at dida. Here we present an example from the realm of technical drawings. Such drawings are used in many fields for specialists to share information. They follow very specific guidelines so that every specialist can understand what is depicted on them. Normally, technical drawings come in formats that allow indexing, such as svg, html, dwg or dwf, but many, especially older ones, only exist in image formats (jpeg, png, bmp, etc.), for example from book scans. These drawings are hard to access automatically, which makes using them difficult and time-consuming. In this regard, automatic detection tools could be used to facilitate the search. In this blogpost, we will demonstrate how both traditional and deep-learning based computer vision techniques can be applied for information extraction from exploded-view drawings. We assume that such a drawing is given together with some textual information for each object on the drawing. The objects can be identified by numbers connected to them. Here is a rather simple example of such a drawing: an electric drill machine. There are three key components on each drawing: the numbers, the objects and the auxiliary lines. The auxiliary lines connect the objects to the numbers. The task at hand will be to find all objects of a certain kind or class over a large number of drawings , e.g. the socket with number 653 in the image above appears in several drawings and even in drawings from other manufacturers. This is a typical classification task, but with a caveat: since there is additional information for each object accessible through the numbers, we need to assign each number on the image to the corresponding object first. Next we describe how this auxiliary task can be solved using traditional computer vision techniques.

Visual Transformers: How an architecture designed for NLP enters the field of Computer Vision


Konrad Mundinger


Since its first introduction in late 2017, the Transformer has quickly become the state-of-the-art architecture in the field of natural language processing (NLP). Recently, researchers have started to apply the underlying ideas to the field of computer vision, and the results suggest that the resulting Visual Transformers outperform their CNN-based predecessors in terms of both speed and accuracy. In this blogpost, we will have a closer look at how to apply transformers to computer vision tasks and what it means to tokenize an image.

CLIP: Mining the treasure trove of unlabeled image data


Fabian Gringel


Digitization and the internet in particular have provided us with a seemingly inexhaustible source not only of textual data, but also of images. In the case of texts, this treasure has been tapped in the form of task-agnostic pretraining by language models such as BERT or GPT-3. Contrastive Language-Image Pretraining (short: CLIP) now does a similar thing with images, or rather: the combination of images and texts. In this blog article I will give a rough, non-technical outline of how CLIP works, and I will also show how you can try CLIP out yourself! If you are more technically minded and care about the details, then I recommend reading the original publication , which I think is well written and comprehensible.

21 questions we ask our clients: Starting a successful ML project


Emilius Richter


Automating processes using machine learning (ML) algorithms can increase the efficiency of a system beyond human capacity, which makes it more and more popular in many industries. But between an idea and a well-defined project there are several points that need to be considered in order to properly assess the economic potential and technical complexity of the project. Especially for companies like dida that offer custom workflow automation software, a well-prepared project helps to quickly assess the feasibility and the overall technical complexity of the project goals, which in turn makes it possible to deliver software that fulfills the client's requirements. In this article, we discuss which topics should be considered in advance and why the questions we ask are important for starting a successful ML software project.

Enhancing Search with Question Answering


Angela Maennel


What is called open-domain question answering in machine learning papers is nothing other than answering a question based on a large collection of texts, for example answering the question of a visitor to a large website using the website's content. Due to recent progress in machine reading comprehension, open-domain question answering systems have improved drastically. They used to rely on redundancy of information, but now they are able to “read” more carefully. Modern systems are able to quote a section of text that answers the question or even reformulate it. What is still an aspiration is to generate longer, paragraph-length answers or to use multiple sources to puzzle together an answer. Google recently implemented such a feature into their search engine: if they find a passage that answers the question typed into the search field, the first result shows the corresponding website with the passage highlighted. There are many different systems that tackle open-domain question answering; here I will go into detail on one system in particular, DrQA (by Chen et al. 2017 ). This particular system splits the task into two parts, for each of which it is easier to get data than for the combined task. I will also explain how this idea can be used to create a question answering system for a website from an already existing search function.

The best image labeling tools for Computer Vision


Dmitrii Iakushechkin


Creating a high quality data set is a crucial part of any machine learning project . In practice, this often takes longer than the actual training and hyperparameter optimization. Thus choosing an appropriate tool for labeling is essential. Here we will have a closer look at some of the best image labeling tools for Computer Vision tasks: labelme, labelImg, CVAT, hasty.ai and Labelbox. We will install and configure the tools and illustrate their capabilities by applying them to label real images for an object detection task. We will proceed by looking at the above tools one by one. Our collection of computer vision content also clearly shows how central the use of such labeling tools is for us as machine learning specialists.

Using satellite imagery for greenfield exploration


Fabian Dechent


Unsurprisingly, a major requirement that makes mining endeavours successful is the right location - one where the enterprise knows with confidence that the soil bears high grade minerals of interest. Finding such a site, however, poses a significant challenge. Conventionally, when mining enterprises pursue greenfield exploration, field studies and drillings are conducted. As these are very expensive, they should only serve as a last assurance after potentially interesting regions have been identified. This is where Remote Sensing comes into play. In this article, we will have a look at the possibilities that spaceborne imaging provides for greenfield exploration. Let’s have a satellite scout promising spots.

Understanding graph neural networks by way of convolutional nets


Augusto Stoffel (PhD)


In this article, we will introduce the basic ideas behind graph neural networks (GNNs) through an analogy with convolutional neural networks (CNNs), which are very well known due to their prevalence in the field of computer vision. In fact, we'll see that convolutional nets are an example of GNNs, albeit one where the underlying graph is very simple, perhaps even boring. Once we see how to think of a convolutional net through this lens, it won't be hard to replace that boring graph with more interesting ones, and we'll arrive naturally at the general concept of GNN. After that, we will survey some applications of GNNs, including our use here at dida. But let's start with the basics.

Understanding and converting MGRS coordinates in Python


Tiago Sanona


Working with satellite data , one needs to understand and possibly convert the coordinates the data is given in. Sometimes, especially if released by official bodies, satellite data is provided in MGRS tiles , which are derived from the UTM coordinate system. For example, this is true for Sentinel-2 tiles. I want to answer the following three questions in this post, using the Python libraries mgrs and pyproj : What is the difference between MGRS and UTM? To which MGRS tile does a certain point referenced in latitude and longitude degrees belong? How can I express an MGRS tile in Lat/Lon coordinates? Before we answer these questions, let's first look into what MGRS is.
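As a small appetizer, converting back and forth with the mgrs package can be as short as the following sketch (the coordinates are made up for illustration):

import mgrs

m = mgrs.MGRS()
tile = m.toMGRS(52.52, 13.40)      # latitude/longitude of a hypothetical point in Berlin
print(tile)                        # MGRS reference of the containing grid square
lat, lon = m.toLatLon(tile)        # and back to latitude/longitude degrees
print(lat, lon)

The post goes into what the parts of such an MGRS string mean and how pyproj handles the underlying UTM projection.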

Monitoring urban development from space


Johan Dettmar


Urbanisation on a global scale is happening at an ever increasing rate. In 2008, more than 50% of the world's population lived in cities, and it is predicted that by 2050 about 64% of the developing world and 86% of the developed world will be urbanised. This trend puts significant stress on infrastructure planning. Providing everything from sanitation, water systems and transportation to adequate housing for more than 1.1 billion new urbanites over the next 10 years will be an extraordinary challenge. In a research project for the European Space Agency's program "AI for social impact", dida assessed the use of state-of-the-art computer vision methods for monitoring the urban development of three rapidly growing cities in west Africa over time: Lagos, Accra and Luanda. The populations of these cities are expected to grow by 30-55% by the end of 2030, which means that in-situ data collection about how these cities develop is almost impossible given the available resources. Instead, we came up with a concept that relies solely on satellite images and machine learning.

How to identify duplicate files with Python


Ewelina Fiebig


Suppose you are working on an NLP project. Your input data are probably files like PDF, JPG, XML, TXT or similar and there are a lot of them. It is not unusual that in large data sets some documents with different names have exactly the same content, i.e. they are duplicates. There can be various reasons for this. Probably the most common one is improper storage and archiving of the documents. Regardless of the cause, it is important to find the duplicates and remove them from the data set before you start labeling the documents. In this blog post I will briefly demonstrate how the contents of different files can be compared using the Python module filecmp . After the duplicates have been identified, I will show how they can be deleted automatically.
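To give a flavour of the approach, a minimal sketch (the data directory is a placeholder) that compares file contents with filecmp and then deletes the redundant copies could look like this:

import filecmp
import os
from itertools import combinations
from pathlib import Path

files = sorted(Path("data").glob("*"))     # hypothetical folder containing the documents
duplicates = set()
for a, b in combinations(files, 2):
    if b not in duplicates and filecmp.cmp(a, b, shallow=False):  # compare contents, not metadata
        duplicates.add(b)
for f in duplicates:
    os.remove(f)                           # remove the duplicate files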

Detecting illegal mines from space


Matthias Werner


Throughout the globe, rain forests and other natural landscapes are endangered by illegal mining, which transforms areas formerly rich in flora and fauna into wasteland. In order for local governments to take countermeasures, they first need to know the locations of illegal mines. In countries covered by vast areas of impenetrable rain forest, such as Brazil or Congo, obtaining this information is a difficult problem. In this blog post I describe an approach based on deep learning and remote sensing that we have developed to detect illegal mines and to support the conservation efforts of governments and NGOs. In particular, we use a U-Net for semantic segmentation , a branch of computer vision. As part of the project of automatic detection of illegal mines , we were also joined by scientists from the Institute of Mineral Resources Engineering of RWTH Aachen University, who contributed their mining-specific expertise. The project was funded by the European Space Agency .

How to extract text from PDF files


Lovis Schmidt


In NLP projects the input documents often come as PDFs. Sometimes the PDFs already contain underlying text information, which makes it possible to extract text without the use of OCR tools. In the following I want to present some open-source PDF tools available in Python that can be used to extract text: PyPDF2 , pdfminer and PyMuPDF . I will compare their features and point out some drawbacks. There are other Python PDF libraries which are either not able to extract text or are focused on other tasks; furthermore, there are tools that can extract text from PDF documents but are not available in Python. Neither will be discussed here. You might also want to read about past dida projects where we developed an information extraction with AI for product descriptions, an information extraction from customer requests or an information extraction from PDF invoices .
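As a quick preview of the simplest case, extracting the embedded text with PyMuPDF might look like this minimal sketch (the file name is a placeholder, and the method name differs slightly between PyMuPDF versions):

import fitz  # PyMuPDF

with fitz.open("document.pdf") as doc:                  # hypothetical input file
    text = "\n".join(page.get_text() for page in doc)   # 'getText' in older PyMuPDF releases
print(text[:500])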

What is Reinforcement Learning? (Part 2)


Matthias Werner


In the previous post we introduced the basics of reinforcement learning (RL) and the type of problem it can be applied to. The discussed setting was limited in the sense that we were dealing with a single agent acting in a stationary environment. Now we will take it one step further and discuss Multi-Agent Reinforcement Learning ( MARL ). Here we deal with multiple explicitly modeled agents in the same environment, hence every agent is part of the environment as it is perceived by all others. Since all agents learn over time and start to behave differently, the assumption of a stationary environment is violated.

BERT for question answering (Part 1)


Mattes Mollenhauer (PhD)


In this article, we are going to have a closer look at BERT - a state-of-the-art model for a range of various problems in natural language processing. BERT was developed by Google, published in 2018, and is for example used as a part of Google's search engine . The term BERT is an acronym for Bidirectional Encoder Representations from Transformers , which may seem quite cryptic at first. The article is split up into two parts: in the first part we are going to see how BERT works, and in the second part we will have a look at some of its practical applications - in particular, we are going to examine the problem of automated question answering .

What is Reinforcement Learning? (Part 1)


Matthias Werner


Machine Learning concerns itself with solving complicated tasks by having a software learn the rules of a process from data. One can try to discover structure in an unknown data set (unsupervised learning) or one can try to learn a mathematical function between related quantities (supervised learning). But what if you wanted the algorithm to learn to react to its environment and to behave in a particular way? No worries, machine learning’s got you covered! This branch of Machine Learning (ML) is called Reinforcement Learning (RL). In this post we will give a quick introduction to the general framework and look at a few basic solution attempts in more detail. Finally, we will give a visual example of RL at work and discuss further approaches. In the second part of the blog post we will discuss Multi-Agent Reinforcement Learning (MARL).

Can we do without labeled data? (Un)supervised ML


Lorenzo Melchior


It seems to be a common mistake to believe that machine learning is usually an unsupervised task : you have data (without pre-existing labels) that you train e.g. a neural network on for tasks like classification or image segmentation. The truth is that most models in machine learning are supervised , that is, they rely on labeled training data . But labeling often takes a lot of time and can be very tedious. In this blog post I want to find out if I am able to perform the same classification task once with labels, once without. For this task I will use the famous MNIST data set , which contains 60,000 training and 10,000 validation images of handwritten digits, all of them labeled. Every image consists of 28x28 greyscale pixels and contains only one digit, located in the center of the image. To make things easier, I use the CSV version of the data set.
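As a rough sketch of the two setups compared in the post (the CSV layout with a header row and the label in the first column is an assumption for illustration), the supervised and unsupervised routes might start like this:

import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

data = np.loadtxt("mnist_train.csv", delimiter=",", skiprows=1)  # hypothetical CSV layout
y, X = data[:, 0], data[:, 1:] / 255.0

# supervised: the labels are used directly
clf = KNeighborsClassifier(n_neighbors=3).fit(X, y)

# without labels: cluster the images first, then map clusters to digits afterwards
clusters = KMeans(n_clusters=10, random_state=0).fit_predict(X)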

The best free labeling tools for text annotation in NLP


Fabian Gringel


In this blog post I'm going to present the three best free text annotation tools for manually labeling documents in NLP ( Natural Language Processing ) projects. You will learn how to install, configure and use them and find out which one of them suits your purposes best. The tools I'm going to present are brat , doccano and INCEpTION . The selection is based on this comprehensive scientific review article and our hands-on experience from dida's NLP projects . I will discuss the tools one by one. For each of them, I will first give a general overview of what the tool is suited for, and then provide details (or links) regarding installation, configuration and usage. You might also find it interesting to check out our NLP content collection .

How to recognise objects in videos with PyTorch


William Clemens (PhD)


Self-driving cars still have difficulties detecting objects in front of them with sufficient reliability. In general, though, the performance of state-of-the-art object detection models is already very impressive - and they are not too difficult to apply. Here I will walk you through streaming a YouTube video into Python and then applying a pre-trained PyTorch model to it in order to detect objects. We'll be applying a model pre-trained on the object detection dataset COCO . (In reality, the model would of course be fine-tuned to the task at hand.)
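To hint at the detection step, applying a COCO-pretrained detector from torchvision to a single frame could look roughly like this (the streaming part is omitted, and the exact weights argument depends on your torchvision version):

import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

model = fasterrcnn_resnet50_fpn(pretrained=True).eval()  # detector pre-trained on COCO
frame = torch.rand(3, 480, 640)                          # placeholder for one video frame, values in [0, 1]
with torch.no_grad():
    predictions = model([frame])                         # list of dicts with boxes, labels and scores
print(predictions[0]["boxes"].shape, predictions[0]["labels"][:5])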

Digital public administration: intuitive online access through AI


Jona Welsch


The following article describes how AI can help to establish digital public administration services. To begin with, we describe a fundamental problem that AI can solve here: authorities often speak a language that is very different from everyday colloquial language. Using the example of business registrations and the AI model "BERT", a possible solution is explained and ideas for further areas of application are shown.

What is Bayesian Linear Regression? (Part 1)


Matthias Werner


Bayesian regression methods are very powerful, as they not only provide us with point estimates of regression parameters, but rather deliver an entire distribution over these parameters. This can be understood as not only learning one model, but an entire family of models and giving them different weights according to their likelihood of being correct. As this weight distribution depends on the observed data, Bayesian methods can give us an uncertainty quantification of our predictions representing what the model was able to learn from the data. The uncertainty measure could be e.g. the standard deviation of the predictions of all the models, something that point estimators will not provide by default. Knowing what the model doesn't know helps to make AI more explainable. To clarify the basic idea of Bayesian regression, we will stick to discussing Bayesian Linear Regression (BLR). BLR is the Bayesian approach to linear regression analysis. We will start with an example to motivate the method. To make things clearer, we will then introduce a couple of non-Bayesian methods that the reader might already be familiar with and discuss how they relate to Bayesian regression. In the following I assume that you have elementary knowledge of linear algebra and stochastics. Let's get started!
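To illustrate the kind of output this buys you, here is a small stand-in example using scikit-learn's BayesianRidge (not the derivation from the article): the model returns a standard deviation alongside each prediction, and it grows for inputs far away from the training data.

import numpy as np
from sklearn.linear_model import BayesianRidge

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(50, 1))
y = 2.0 * X[:, 0] + rng.normal(scale=0.5, size=50)           # noisy linear data

model = BayesianRidge().fit(X, y)
mean, std = model.predict([[0.0], [10.0]], return_std=True)  # predictive mean and uncertainty
print(mean, std)                                             # the point far outside the training range gets a larger std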

Beat Tracking with Deep Neural Networks


Julius Richter


This is the last post in the three part series covering machine learning approaches for time series and sequence modeling. In the first post , the basic principles and techniques for serial sequences in artificial neural networks were shown. The second post introduced a recent convolutional approach for time series called temporal convolutional network (TCN), which shows great performance on sequence-to-sequence tasks ( Bai, 2018 ). In this post, however, I will talk about a real world application which employs a machine learning model for time series analysis. To this end, I will present a beat tracking algorithm, which is a computational method for extracting the beat positions from audio signals. The presented beat tracking system ( Davies, 2019 ) is based on the TCN architecture which captures the sequential structure of audio input.

Comparison of OCR tools: how to choose the best tool for your project


Fabian Gringel


Optical character recognition (short: OCR) is the task of automatically extracting text from images (coming in typical image formats such as PNG or JPG, but possibly also as a PDF file). Nowadays, there is a variety of OCR software tools and services for text recognition which are easy to use and make this task a no-brainer. In this blog post, I will compare four of the most popular tools: Tesseract OCR, ABBYY FineReader, Google Cloud Vision and Amazon Textract. I will show how to use them and assess their strengths and weaknesses based on their performance on a number of tasks. After reading this article you will be able to choose and apply an OCR tool suiting the needs of your project. Note that we restrict our focus to OCR for document images only, as opposed to arbitrary images containing text incidentally. Now let’s have a look at the document images we will use to assess the OCR engines.
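As a taste of the simplest of the four, running Tesseract through its Python wrapper takes only a few lines (the file name is a placeholder):

from PIL import Image
import pytesseract

image = Image.open("document_page.png")   # hypothetical scanned document page
text = pytesseract.image_to_string(image, lang="eng")
print(text)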

Temporal convolutional networks for sequence modeling


Julius Richter


This blog post is the second in a three part series covering machine learning approaches for time series. In the first post , I talked about how to deal with serial sequences in artificial neural networks. In particular, recurrent models such as the LSTM were presented as an approach to process temporal data in order to analyze or predict future events. In this post, however, I will present a simple but powerful convolutional approach for sequences which is called Temporal Convolutional Network (TCN). The network architecture was proposed in ( Bai, 2018 ) and shows great performance on sequence-to-sequence tasks like machine translation or speech synthesis in text-to-speech (TTS) systems. Before I describe the architectural elements in detail, I will give a short introduction about sequence-to-sequence learning and the background of TCNs.

Machine Learning Approaches for Time Series


Julius Richter


This post is the first part of a series of posts that are linked together, as they all deal with the topic of time series and sequence modeling. To keep the content comprehensive yet easy to grasp, the series is segmented into three parts: 1. How to deal with time series and serial sequences? A recurrent approach. 2. Temporal Convolutional Networks (TCNs) for sequence modeling. 3. Beat tracking in audio files as an application of sequence modeling.

How to distribute a Tensorflow model as a JavaScript web app


Johan Dettmar


Anyone wanting to train a Machine Learning (ML) model these days has a plethora of Python frameworks to choose from. However, when it comes to distributing your trained model to something other than a Python environment, the number of options quickly drops. Luckily there is Tensorflow.js , a JavaScript (JS) subset of the popular Python framework with the same name. By converting a model such that it can be loaded by the JS framework, the inference can be done effectively in a web browser or a mobile app. The goal of this article is to show how to train a model in Python and then deploy it as a JS app which can be distributed online.
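As a rough sketch of the Python side (assuming the tensorflowjs pip package and a small Keras model; the JavaScript loading code is not shown here), the export step can be as short as:

import tensorflow as tf
import tensorflowjs as tfjs

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(4,)),
    tf.keras.layers.Dense(3, activation="softmax"),
])
# ... train the model here ...
tfjs.converters.save_keras_model(model, "web_model/")   # writes model.json plus binary weight shards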

Detecting clouds in satellite images using convolutional neural networks


William Clemens (PhD)


Here I’m going to walk through how we approached the problem of detecting convective clouds in satellite data including what we are looking for (and why!) and the machine learning approach we used. This post will consist of four sections: First we will introduce convective clouds and give a brief overview of the problem. In section 2 we will discuss the satellite data we are working with. In section 3 we discuss how we go about manually labelling the data, which is a particularly difficult task requiring the use of some external data. Finally, in section 4 we will give a brief overview of the neural network architecture that we use, the U-Net, and how we go about training it. You can also have a look at my talk at 2020's Applied Machine Learning Days in Lausanne, Switzerland:

How Google Cloud facilitates Machine Learning projects


Johan Dettmar


Since not only the complexity of Machine Learning (ML) models but also the size of data sets continue to grow, so does the need for computer power. While most laptops today can handle a significant workload, the performance is often simply not enough for our purposes at dida. In the following article, we walk you through some of the most common bottlenecks and show how cloud services can help to speed things up.

Data Augmentation with GANs for Defect Detection


Lorenzo Melchior


In Machine Learning, an insufficient amount of training data often hinders the performance of classification algorithms. Experience shows that shortage of training data is rather the rule than the exception, which is why people have come up with clever data augmentation methods. In this blog post I demonstrate how you can create new images of a distribution of images with a Generative Adversarial Network ( GAN ). This can be applied as a data augmentation method for problems such as defect detection in industrial production.

Pattern Recognition in Medical Imaging


Matthias Werner


Artificial intelligence (AI) and in particular computer vision promise to be valuable aids for diagnosing diseases based on medical imaging techniques . For humans, it takes years of academic and on-the-job training to e.g. perform medical diagnosis from X-ray images. As we will see, it is also quite a challenge for intelligent algorithms. At this year's KIS-RIS-PACS and DICOM convention organized by the Department of Medicine at the University of Mainz, Germany, researchers from radiology and adjacent fields gathered to discuss the state-of-the-art of AI in their field. Philipp Jackmuth from dida was the speaker of choice for this topic and here we will discuss key points of his talk.

What is Natural Language Processing (NLP)?


Fabian Gringel


Natural Language Processing (short: NLP , sometimes also called Computational Linguistics ) is one of the fields which has undergone a revolution since methods from Machine Learning (ML) have been applied to it. In this blog post I will explain what NLP is about and show how Machine Learning comes into play. In the end you will have learned which problems NLP deals with, what kinds of methods it uses and how Machine Learning models can be adapted to the specific structure of natural language data.

Semantic segmentation of satellite images


Nelson Martins (PhD)


This post presents some key learnings from our project on identifying roofs on satellite images . Our aim was to develop a planning tool for the placement of solar panels on roofs. For this purpose we set up a machine learning model that accurately partitions those images into different types of roof parts and background. We learned that the UNet model with dice loss, enforced with a pixel weighting strategy, outperforms cross-entropy-based loss functions by a significant margin in semantic segmentation of satellite images. The following idealized pipeline illustrates the functionality of the planning tool:
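For readers curious what a dice loss looks like in code, here is a minimal, generic PyTorch sketch for binary segmentation (not the exact pixel-weighted variant used in the project):

import torch

def dice_loss(logits, targets, eps=1e-6):
    # logits and targets have shape (batch, 1, H, W); targets contain 0/1 masks
    probs = torch.sigmoid(logits)
    intersection = (probs * targets).sum(dim=(1, 2, 3))
    union = probs.sum(dim=(1, 2, 3)) + targets.sum(dim=(1, 2, 3))
    dice = (2 * intersection + eps) / (union + eps)    # per-example dice coefficient
    return 1 - dice.mean()                             # minimize 1 minus the overlap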

Extracting information from documents


Frank Weilandt (PhD)


There is a growing demand for automatically processing letters and other documents. Powered by machine learning, modern OCR (optical character recognition) methods can digitize the text. But the next step consists of interpreting it. This requires approaches from fields such as information extraction and NLP (natural language processing) . Here we go through some heuristics for reading the date of a letter automatically using the Python OCR tool pytesseract . Hopefully, you can adapt some of these ideas to your own project.
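One of the simplest heuristics, sketched below with pytesseract and a regular expression (the file name and the date pattern are assumptions for illustration), is to OCR the letter and look for a date-like string:

import re
from PIL import Image
import pytesseract

text = pytesseract.image_to_string(Image.open("letter.png"), lang="deu")   # hypothetical scanned letter
match = re.search(r"\b(\d{1,2})\.(\d{1,2})\.(\d{4})\b", text)              # e.g. 24.03.2020
if match:
    day, month, year = match.groups()
    print(f"{year}-{int(month):02d}-{int(day):02d}")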