Computer Vision


We develop custom computer vision solutions, actively participate in computer vision research and regularly publish technical content on related topics.

Talk with Sr. Machine Learning Scientist Mark Bugden (PhD) about your computer vision projects.


What is a Convolutional Neural Network? - Explained


Mark Bugden (PhD)


Neural Networks, particularly Convolutional Neural Networks (CNNs), have surged in popularity over the past few years. They are ubiquitous in many image recognition and processing tasks and have also found applications in several areas not based on image analysis. In this article, we will give an introduction to CNNs, and answer some of those questions you have been too embarrassed to ask your IT department. Note: If you are interested in a 30min conversation with our dedicated Computer Vision contact person regarding the topics of CNNs and Computer Vision, please take a look at our free Computer Vision Talk offering.
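To make this concrete before diving in: a convolutional network is essentially a stack of learnable filters and downsampling steps followed by a classifier. A minimal PyTorch sketch (layer sizes are illustrative, not taken from the article):

```python
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    """A toy CNN for 10-class classification of 28x28 grayscale images."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),  # learnable filters slide over the image
            nn.ReLU(),
            nn.MaxPool2d(2),                             # downsample 28x28 -> 14x14
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),                             # 14x14 -> 7x7
        )
        self.classifier = nn.Linear(32 * 7 * 7, 10)

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(1))

logits = SmallCNN()(torch.randn(1, 1, 28, 28))  # -> shape (1, 10)
```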

Early Classification of Crop Fields through Satellite Image Time Series


Tiago Sanona


In a fast-paced and ever-changing global economy, classifying crop fields via remote sensing only at the end of a growth cycle does not provide the immediate insight decision makers need. To address this problem, we developed a model that allows continuous classification of crop fields at any point in time and improves its predictions as more data becomes available. In practice, we developed a single model capable of delivering predictions about which crops are growing at any point in time, based on satellite data. The data available at the time of inference could be a few images at the beginning of the year or a full time series of images from a complete growing cycle. This exceeds the capabilities of current deep learning solutions, which either only offer predictions at the end of the growing cycle or have to use multiple models specialized to return results at pre-specified points in time. This article details the key changes we made to the model described in a previous blog post, “Classification of Crop fields through Satellite Image Time Series”, that extend its functionality and improve its performance. The results presented in this article are based on a research paper recently published by dida. For more detailed information about this topic and other experiments on this model, please check out the original manuscript: “Early Crop Classification via Multi-Modal Satellite Data Fusion and Temporal Attention”. Note: If you are interested in a 30min conversation with our dedicated contact person for Remote Sensing and Earth Observation, please take a look at our free Earth Observation Talk offering.

Leveraging Machine Learning for Environmental Protection


Edit Szügyi


Machine Learning has been solving complex problems for decades. Just think of how Computer Vision methods can reliably detect life-threatening diseases, how self-driving cars are on their way to revolutionizing traffic safety, or how automatic translation lets us talk to just about anyone on the planet. The power of Machine Learning has been embraced by many branches of industry and science. There are some areas, however, where its potential is harder to see and also less utilized. One of these is environmental protection. Protecting the natural environment is one of the biggest challenges our generation is facing, with pressing issues such as climate change, plastic pollution and resource depletion. Let us now look at how Machine Learning has been, and can be, used as a tool in environmental protection.

The best (Python) tools for remote sensing


Emilius Richter


An estimated 906 Earth observation satellites are currently in orbit, providing science and industry with many terabytes of data every day. The satellites operate with both radar and optical sensors and cover different spectral ranges with varying spectral, spatial, and temporal resolutions. Thanks to this broad spectrum of geospatial data, remote sensing methods find new applications in many industrial and governmental institutions. On our website, you can find some projects in which we have successfully used satellite data, as well as possible use cases of remote sensing methods for various industries. Well-known satellite systems and programs include Sentinel-1 (radar) and Sentinel-2 (optical) from ESA, Landsat (optical) from NASA, TerraSAR-X and TanDEM-X (both radar) from DLR, and PlanetScope (optical) from Planet.

There are basically two types of geospatial data: raster data and vector data.

Raster data: Raster data are a grid of regularly spaced pixels, where each pixel is associated with a geographic location, and are represented as a matrix. The pixel values depend on the type of information that is stored, e.g., brightness values for digital images or temperature values for thermal images. The size of the pixels also determines the spatial resolution of the raster. Geospatial raster data are thus used to represent satellite imagery. Raster images usually contain several bands or channels, e.g. a red, green, and blue channel. In satellite data, there are also often infrared and/or ultraviolet bands.

Vector data: Vector data represent geographic features on the earth's surface, such as cities, country borders, roads, bodies of water, property rights, etc. Such features are represented by one or more connected vertices, where a vertex defines a position in space by x-, y- and z-values. A single vertex is a point, multiple connected vertices form a line, and multiple (>3) connected and closed vertices form a polygon. The x-, y-, and z-values always refer to a coordinate reference system (CRS) that is stored in vector files as meta information. The most common file formats for vector data are GeoJSON, KML, and SHAPEFILE.

In order to process and analyze these data, various tools are required. In the following, I will present the tools we at dida have had the best experience with and which are regularly used in our remote sensing projects, grouped into the following sections:

  • Requesting satellite data: EOBrowser, Sentinelsat, Sentinelhub
  • Processing raster data: Rasterio, Pyproj, SNAP, pyroSAR, Rioxarray
  • Processing vector data: Shapely, Python-geojson, Geojson.io, Geopandas, Fiona
  • Providing geospatial data: QGIS, GeoServer, Leafmap
  • Processing meteorological satellite data: Wetterdienst, Wradlib
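To make the raster/vector distinction concrete, here is a minimal sketch using two of the tools listed above, Rasterio and Shapely (the file name and coordinates are placeholders):

```python
import rasterio
from shapely.geometry import Point, Polygon

# Raster data: read one band of a (hypothetical) GeoTIFF as a NumPy array
with rasterio.open("scene.tif") as src:
    band = src.read(1)           # pixel values of the first band
    print(src.crs, src.res)      # coordinate reference system and pixel size

# Vector data: geometries built from vertices
point = Point(13.40, 52.52)      # a single vertex
polygon = Polygon([(13.0, 52.0), (14.0, 52.0), (14.0, 53.0), (13.0, 53.0)])
print(polygon.contains(point))   # True: the point lies inside the polygon
```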

Image Captioning with Attention


Madina Kasymova


Humans can glance at an image and effortlessly grasp and describe what is happening in it. Can machines recognize different objects and their relationships in an image and describe them in natural language, just like humans do? This is the problem image captioning tries to solve. Image captioning is all about describing images in natural language (such as English), combining two core topics of artificial intelligence: computer vision and natural language processing. Image captioning is an incredible application of deep learning that has evolved considerably in recent years. This article will provide a high-level overview of image captioning architecture and explore the attention mechanism – the most common approach proposed to solve this problem. The most recent image captioning works have shown the benefits of a transformer-based approach, which relies solely on attention and learns the relationships between elements of a sequence without using recurrent or convolutional layers. We will not be considering transformer-based architectures here; instead, we will focus only on the attention-based approach.
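As a rough illustration of the attention-based approach (a sketch with illustrative names and dimensions, not code from the article): at each decoding step, the decoder's hidden state is scored against every image region, and the normalized scores weight the visual features into a context vector.

```python
import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):
    """Bahdanau-style attention, as used in 'Show, Attend and Tell'-like captioners."""

    def __init__(self, feature_dim, hidden_dim, attn_dim):
        super().__init__()
        self.feat_proj = nn.Linear(feature_dim, attn_dim)    # projects image region features
        self.hidden_proj = nn.Linear(hidden_dim, attn_dim)   # projects the decoder state
        self.score = nn.Linear(attn_dim, 1)

    def forward(self, features, hidden):
        # features: (batch, num_regions, feature_dim); hidden: (batch, hidden_dim)
        scores = self.score(torch.tanh(
            self.feat_proj(features) + self.hidden_proj(hidden).unsqueeze(1)
        )).squeeze(-1)                                       # (batch, num_regions)
        alpha = scores.softmax(dim=1)                        # attention weights over regions
        context = (alpha.unsqueeze(-1) * features).sum(dim=1)  # weighted visual summary
        return context, alpha
```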

How to implement a labeling tool for image classification in a Jupyter notebook


Felix Brunner


'Hotdog' or 'not hotdog'? That could be the question — at least when performing an image classification task. To be able to address this or a similarly important question by means of a machine learning model, we first need to come up with a labeled dataset for training. That is, we sometimes have to manually look at hundreds or even thousands of images and decide whether each one contains a hotdog. One way to do that would be to open up one image at a time and keep track of image classes in another file, e.g., a spreadsheet. However, such a heavy-handed approach sounds rather tedious and is likely prone to fat-fingering errors. Wouldn't it be great if there was a streamlined solution that makes the labeling process more efficient, even fun? That is exactly what we set out to do in this article: create a simple annotation tool to easily assign class labels to a set of images.
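To give an idea of what such a tool can look like, here is a minimal sketch built on ipywidgets (the folder name and class labels are placeholders; the article's implementation may differ):

```python
from pathlib import Path

import ipywidgets as widgets
from IPython.display import display

images = sorted(Path("images").glob("*.jpg"))  # hypothetical image folder
labels, index = {}, 0

image_widget = widgets.Image(value=images[0].read_bytes(), width=300)
buttons = [widgets.Button(description=c) for c in ("hotdog", "not hotdog")]

def on_click(button):
    global index
    labels[images[index].name] = button.description      # record the chosen class
    index += 1
    if index < len(images):
        image_widget.value = images[index].read_bytes()  # show the next image

for b in buttons:
    b.on_click(on_click)
display(image_widget, widgets.HBox(buttons))
```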

Extracting information from technical drawings


Frank Weilandt (PhD)


Did you ever need to combine data about an object from two different sources, say, images and text? We often face such challenges in our work at dida. Here we present an example from the realm of technical drawings. Such drawings are used in many fields by specialists to share information. They follow very specific guidelines so that every specialist can understand what is depicted in them. Normally, technical drawings are given in formats that allow indexing, such as svg, html, dwg, dwf, etc., but many, especially older ones, exist only in image formats (jpeg, png, bmp, etc.), for example from book scans. Such drawings are hard to access automatically, which makes their use difficult and time-consuming. In this regard, automatic detection tools could be used to facilitate the search. In this blogpost, we will demonstrate how both traditional and deep-learning-based computer vision techniques can be applied for information extraction from exploded-view drawings. We assume that such a drawing is given together with some textual information for each object on the drawing, and that the objects can be identified by numbers connected to them. Here is a rather simple example of such a drawing: an electric drill machine. There are three key components on each drawing: the numbers, the objects and the auxiliary lines. The auxiliary lines connect the objects to the numbers. The task at hand is to find all objects of a certain kind/class over a large number of drawings, e.g. the socket with number 653 in the image above appears in several drawings and even in drawings from other manufacturers. This is a typical classification task, but with a caveat: since there is additional information for each object accessible through the numbers, we first need to assign each number on the image to the corresponding object. Next, we describe how this auxiliary task can be solved using traditional computer vision techniques.
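As a taste of the traditional side, here is a sketch of how one might detect the auxiliary lines with OpenCV (the file name is a placeholder, and this is not necessarily the exact pipeline from the post):

```python
import cv2
import numpy as np

# Classical line detection: edge map + probabilistic Hough transform
# ("drawing.png" stands in for a scanned exploded-view drawing).
img = cv2.imread("drawing.png", cv2.IMREAD_GRAYSCALE)
edges = cv2.Canny(img, 50, 150)
lines = cv2.HoughLinesP(edges, 1, np.pi / 180, 80,
                        minLineLength=40, maxLineGap=5)

if lines is not None:
    for x1, y1, x2, y2 in lines[:, 0]:
        # each segment is a candidate auxiliary line between a number and an object
        print((x1, y1), "->", (x2, y2))
```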

Visual Transformers: How an architecture designed for NLP enters the field of Computer Vision


Konrad Mundinger


Since its first introduction in late 2017, the Transformer has quickly become the state-of-the-art architecture in natural language processing (NLP). Recently, researchers have started to apply the underlying ideas to the field of computer vision, and the results suggest that the resulting Visual Transformers outperform their CNN-based predecessors in terms of both speed and accuracy. In this blogpost, we will have a closer look at how to apply transformers to computer vision tasks and what it means to tokenize an image.
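What "tokenizing an image" boils down to, in a minimal sketch (ViT-style patch embedding with the usual default dimensions, not necessarily those discussed in the post): the image is cut into fixed-size patches, and each patch is flattened and projected to an embedding vector, yielding one token per patch.

```python
import torch
import torch.nn as nn

patch_size, embed_dim = 16, 768

# A strided convolution implements "cut into patches + linear projection" in one step.
to_patches = nn.Conv2d(3, embed_dim, kernel_size=patch_size, stride=patch_size)

image = torch.randn(1, 3, 224, 224)
tokens = to_patches(image)                  # (1, 768, 14, 14): one embedding per patch
tokens = tokens.flatten(2).transpose(1, 2)  # (1, 196, 768): a sequence of 196 tokens
```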

CLIP: Mining the treasure trove of unlabeled image data


Fabian Gringel


Digitization, and the internet in particular, have provided us with a seemingly inexhaustible source not only of textual data but also of images. In the case of texts, this treasure has been tapped in the form of task-agnostic pretraining by language models such as BERT or GPT-3. Contrastive Language-Image Pretraining (short: CLIP) now does a similar thing with images, or rather: the combination of images and texts. In this blog article I will give a rough, non-technical outline of how CLIP works, and I will also show how you can try CLIP out yourself! If you are more technically minded and care about the details, then I recommend reading the original publication, which I think is well written and comprehensible.
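If you want a head start on trying it out, here is one way to query a pretrained CLIP model via the Hugging Face transformers library (the image file and candidate captions are placeholders):

```python
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("cat.jpg")  # hypothetical local image
texts = ["a photo of a cat", "a photo of a dog"]

# CLIP scores every (image, text) pair; softmax turns the scores into probabilities.
inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)
probs = outputs.logits_per_image.softmax(dim=-1)
print(dict(zip(texts, probs[0].tolist())))
```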

The best image labeling tools for Computer Vision


Dmitrii Iakushechkin


Creating a high-quality data set is a crucial part of any machine learning project. In practice, this often takes longer than the actual training and hyperparameter optimization, so choosing an appropriate labeling tool is essential. Here we will have a closer look at some of the best image labeling tools for Computer Vision tasks:

  • labelme
  • labelImg
  • CVAT
  • hasty.ai
  • Labelbox

We will install and configure the tools and illustrate their capabilities by applying them to label real images for an object detection task, proceeding through the tools one by one. Our collection of computer vision content also shows how central such labeling tools are to our work as machine learning specialists.

Using satellite imagery for greenfield exploration


Fabian Dechent


Unsurprisingly, a major requirement for a successful mining endeavour is the right location - one where the enterprise knows with confidence that the soil bears high-grade minerals of interest. Finding such a site, however, poses a significant challenge. Conventionally, when mining enterprises pursue greenfield exploration, field studies and drillings are conducted. As these are very expensive, they should only serve as a last assurance after potentially interesting regions have been identified. This is where Remote Sensing comes into play. In this article, we will have a look at the possibilities that spaceborne imaging provides for greenfield exploration. Let's have a satellite scout promising spots.

Understanding graph neural networks by way of convolutional nets


Augusto Stoffel (PhD)


In this article, we will introduce the basic ideas behind graph neural networks (GNNs) through an analogy with convolutional neural networks (CNNs), which are very well known due to their prevalence in the field of computer vision. In fact, we'll see that convolutional nets are an example of GNNs, albeit one where the underlying graph is very simple, perhaps even boring. Once we see how to think of a convolutional net through this lens, it won't be hard to replace that boring graph with more interesting ones, and we'll arrive naturally at the general concept of GNN. After that, we will survey some applications of GNNs, including our use here at dida. But let's start with the basics.
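To hint at where that generalization ends up: one widely used graph convolution (the GCN layer of Kipf and Welling; one of several variants, and not necessarily the one the article settles on) updates node features as

```latex
H^{(l+1)} = \sigma\!\left( \tilde{D}^{-1/2}\,\tilde{A}\,\tilde{D}^{-1/2}\,H^{(l)}\,W^{(l)} \right),
\qquad \tilde{A} = A + I,
```

where $A$ is the graph's adjacency matrix, $\tilde{D}$ the degree matrix of $\tilde{A}$, $H^{(l)}$ the matrix of node features and $W^{(l)}$ a learned weight matrix. Taking the graph to be the regular pixel grid of an image recovers exactly the kind of local, weight-sharing update a convolutional layer performs.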

Understanding and converting MGRS coordinates in Python


Tiago Sanona


Working with satellite data, one needs to understand and possibly convert the coordinates the data is given in. Sometimes, especially if released by official bodies, satellite data is provided in MGRS tiles, which are derived from the UTM coordinate system. For example, this is true for Sentinel-2 tiles. In this post, I want to answer the following three questions, using the Python libraries mgrs and pyproj: What is the difference between MGRS and UTM? To which MGRS tile does a point given in latitude and longitude degrees belong? How can I express an MGRS tile in Lat/Lon coordinates? Before we answer these questions, let's first look into what MGRS is.
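As a preview of where we will end up, here is a minimal sketch using the two libraries (Berlin's coordinates serve as an example; note that older versions of mgrs return bytes rather than strings):

```python
import mgrs
from pyproj import Transformer

m = mgrs.MGRS()

# Lat/Lon (WGS84) -> MGRS: precision 0 yields the 100 km tile identifier
tile = m.toMGRS(52.52, 13.40, MGRSPrecision=0)
print(tile)  # '33UUU' for Berlin

# MGRS -> Lat/Lon of the tile's reference corner
lat, lon = m.toLatLon("33UUU")

# Lat/Lon -> UTM zone 33N coordinates via pyproj
transformer = Transformer.from_crs("EPSG:4326", "EPSG:32633", always_xy=True)
easting, northing = transformer.transform(13.40, 52.52)  # note: (lon, lat) order
```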

Monitoring urban development from space


Johan Dettmar


Urbanisation on a global scale is happening at an ever-increasing rate. In 2008, more than 50% of the world's population lived in cities, and it is predicted that by 2050 about 64% of the developing world and 86% of the developed world will be urbanised. This trend puts significant stress on infrastructure planning. Providing everything from sanitation, water systems and transportation to adequate housing for more than 1.1 billion new urbanites over the next 10 years will be an extraordinary challenge. In a research project for the European Space Agency's program "AI for social impact", dida assessed the use of state-of-the-art computer vision methods for monitoring the development over time of three rapidly growing cities in west Africa: Lagos, Accra and Luanda. The populations of these cities are expected to grow by 30-55% by the end of 2030, which means that in-situ data collection about how these cities develop is almost impossible given the available resources. Instead, we came up with a concept that relies solely on satellite images and machine learning.

Detecting illegal mines from space


Matthias Werner


Throughout the globe, rain forests and other natural landscapes are endangered by illegal mining, which transforms areas formerly rich in flora and fauna into wasteland. For local governments to take countermeasures, they first need to know the locations of illegal mines. In countries covered by vast areas of impenetrable rain forest, such as Brazil or Congo, obtaining this information is a difficult problem. In this blog post I describe an approach based on deep learning and remote sensing that we developed to detect illegal mines and support the conservation efforts of governments and NGOs. In particular, we use a U-Net for semantic segmentation, a branch of computer vision. As part of the project on automatic detection of illegal mines, we were joined by scientists from the Institute of Mineral Resources Engineering of RWTH Aachen University, who contributed their mining-specific expertise. The project was funded by the European Space Agency.

How to recognise objects in videos with PyTorch


William Clemens (PhD)


Self-driving cars still have difficulties detecting objects in front of them with sufficient reliability. In general, though, the performance of state-of-the-art object detection models is already very impressive - and they are not too difficult to apply. Here I will walk you through streaming a YouTube video into Python and then applying a pre-trained PyTorch model to it in order to detect objects. We'll be applying a model pre-trained on the object detection dataset COCO. (In reality, the model would of course be fine-tuned to the task at hand.)
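As a preview, applying one of torchvision's COCO-pre-trained detectors to a single frame can be as short as this (the frame file is a placeholder; torchvision versions before 0.13 use pretrained=True instead of weights):

```python
import torch
import torchvision
from PIL import Image
from torchvision.transforms.functional import to_tensor

# Faster R-CNN pre-trained on COCO, one of several detection models in torchvision
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

frame = to_tensor(Image.open("frame.png").convert("RGB"))  # hypothetical video frame
with torch.no_grad():
    prediction = model([frame])[0]  # the model takes a list of image tensors

boxes = prediction["boxes"]    # (N, 4) bounding boxes in pixel coordinates
labels = prediction["labels"]  # COCO category ids
scores = prediction["scores"]  # confidence scores, sorted in descending order
```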

Comparison of OCR tools: how to choose the best tool for your project


Fabian Gringel


Optical character recognition (short: OCR) is the task of automatically extracting text from images (coming in typical image formats such as PNG or JPG, but possibly also as a PDF file). Nowadays, there are a variety of OCR software tools and services for text recognition which are easy to use and make this task a no-brainer. In this blog post, I will compare four of the most popular tools:

  • Tesseract OCR
  • ABBYY FineReader
  • Google Cloud Vision
  • Amazon Textract

I will show how to use them and assess their strengths and weaknesses based on their performance on a number of tasks. After reading this article you will be able to choose and apply an OCR tool suiting the needs of your project. Note that we restrict our focus to OCR for document images only, as opposed to images that merely happen to contain text. Now let's have a look at the document images we will use to assess the OCR engines.
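To give a flavour of how simple the open-source entry point is, here is a minimal call to Tesseract via the pytesseract wrapper (the document image is a placeholder, and the Tesseract binary must be installed separately):

```python
from PIL import Image
import pytesseract  # Python wrapper around the Tesseract OCR engine

# Extract the text of a (hypothetical) scanned document page
text = pytesseract.image_to_string(Image.open("document.png"), lang="eng")
print(text)
```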

Detecting clouds in satellite images using convolutional neural networks


William Clemens (PhD)


Here I’m going to walk through how we approached the problem of detecting convective clouds in satellite data, including what we are looking for (and why!) and the machine learning approach we used. This post consists of four sections: First, we will introduce convective clouds and give a brief overview of the problem. In section 2, we will discuss the satellite data we are working with. In section 3, we discuss how we go about manually labelling the data, which is a particularly difficult task requiring the use of some external data. Finally, in section 4, we will give a brief overview of the neural network architecture that we use, the U-Net, and how we go about training it. You can also have a look at my talk at 2020's Applied Machine Learning Days in Lausanne, Switzerland.

Data Augmentation with GANs for Defect Detection


Lorenzo Melchior


In Machine Learning, an insufficient amount of training data often hinders the performance of classification algorithms. Experience shows that a shortage of training data is the rule rather than the exception, which is why people have come up with clever data augmentation methods. In this blog post I demonstrate how you can create new images from a distribution of images with a Generative Adversarial Network (GAN). This can be applied as a data augmentation method for problems such as defect detection in industrial production.
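For orientation, the adversarial training loop at the heart of any GAN fits in a few lines. A deliberately tiny sketch with fully connected networks and random stand-in data (the post's actual architecture and data differ):

```python
import torch
import torch.nn as nn

latent_dim, image_dim = 64, 28 * 28

G = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                  nn.Linear(256, image_dim), nn.Tanh())       # generator
D = nn.Sequential(nn.Linear(image_dim, 256), nn.LeakyReLU(0.2),
                  nn.Linear(256, 1))                          # discriminator

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

real = torch.rand(32, image_dim)          # stand-in for a batch of real defect images
fake = G(torch.randn(32, latent_dim))     # a batch of generated images

# Discriminator step: real images -> 1, generated images -> 0
loss_d = bce(D(real), torch.ones(32, 1)) + bce(D(fake.detach()), torch.zeros(32, 1))
opt_d.zero_grad(); loss_d.backward(); opt_d.step()

# Generator step: try to fool the discriminator into outputting 1
loss_g = bce(D(fake), torch.ones(32, 1))
opt_g.zero_grad(); loss_g.backward(); opt_g.step()

# After training, G(torch.randn(n, latent_dim)) yields new synthetic images.
```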

Pattern Recognition in Medical Imaging


Matthias Werner


Artificial intelligence (AI), and computer vision in particular, promise to be valuable aids for diagnosing diseases based on medical imaging techniques. For humans, it takes years of academic and on-the-job training to, for example, perform medical diagnosis from X-ray images. As we will see, it is also quite a challenge for intelligent algorithms. At this year's KIS-RIS-PACS and DICOM convention, organized by the Department of Medicine at the University of Mainz, Germany, researchers from radiology and adjacent fields gathered to discuss the state of the art of AI in their field. Philipp Jackmuth from dida was the speaker of choice for this topic, and here we will discuss the key points of his talk.

Semantic segmentation of satellite images


Nelson Martins (PhD)


This post presents some key learnings from our project on identifying roofs on satellite images. Our aim was to develop a planning tool for the placement of solar panels on roofs. For this purpose we set up a machine learning model that accurately partitions those images into different types of roof parts and background. We learned that the UNet model with dice loss, enforced with a pixel weighting strategy, outperforms cross-entropy-based loss functions by a significant margin in semantic segmentation of satellite images. An idealized pipeline, illustrated in the post, shows the functionality of the planning tool.
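For readers wondering what a pixel-weighted dice loss could look like, here is a sketch of the idea (not necessarily the exact formulation used in the project):

```python
import torch

def weighted_dice_loss(probs, targets, weights=None, eps=1e-6):
    """Dice loss with an optional per-pixel weight map.

    probs, targets: (N, C, H, W) softmax outputs and one-hot labels;
    weights: (N, 1, H, W) per-pixel weights (e.g. emphasizing roof edges), or None.
    """
    if weights is None:
        weights = torch.ones_like(probs[:, :1])
    intersection = (weights * probs * targets).sum(dim=(2, 3))
    total = (weights * (probs + targets)).sum(dim=(2, 3))
    # Dice coefficient per class, averaged over batch and classes, turned into a loss
    return (1 - (2 * intersection + eps) / (total + eps)).mean()
```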