
Convective Clouds Detection

We automated the detection of certain cloud structures for Deutscher Wetterdienst (DWD).

Input: Meteosat satellite images
Output: Classification of cloud type
Goal: Detect clouds that can be dangerous for aviation

Key Facts

Accuracy: 98%
Update frequency: 15 minutes

Starting Point

Our client is Deutscher Wetterdienst (DWD). Part of their responsibility is to prepare weather reports for pilots in a system called METAR. We were tasked with creating a model that detects convective clouds in satellite imagery, supporting the other forms of detection already in use.

Cumulonimbus (CB) clouds cause significant downdrafts, which are dangerous for aircraft taking off and landing. It is therefore important for DWD to reliably detect them, as well as Towering Cumulus (TCU) clouds, which can evolve into Cumulonimbus.

The data for this project comes from a geostationary satellite called Meteosat Second Generation, which provides an image of Germany from a fixed viewpoint every 5 or 15 minutes. Our goal is to classify each pixel in the image as CB, TCU, or neither.

Challenges

The most important component of any machine learning solution is a large, high-quality training data set, and this project was no exception. The satellite images used for training have to be annotated by humans to provide the algorithm with examples of the relevant cloud types. Identifying these clouds is extremely difficult even for humans, so labeling presents a challenge in itself.

Additionally, the input data is in an unusual format that requires extensive preprocessing using domain-specific knowledge before machine learning can be applied.

Solution

To accomplish this, we manually labeled a dataset, using external radar data to help inform the decision-making process. In addition, our labels were reviewed by a trained meteorologist to ensure that they were correct.

We trained a deep neural network using the satellite data as the input and our manual labels as the targets. Its performance was measured on a held-out dataset that was not used at all during training. On this dataset, we obtained an accuracy of 98%.
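The exact accuracy definition is not spelled out here; as a minimal sketch, assuming a simple pixel-wise accuracy and made-up prediction and label arrays, the evaluation could look like this:

    import numpy as np

    def pixel_accuracy(pred, target):
        """Fraction of pixels whose predicted class matches the label.

        pred, target: integer arrays of shape (H, W); the class encoding
        (0 = neither, 1 = TCU, 2 = CB) is an assumption for illustration.
        """
        return float((pred == target).mean())

    # Hypothetical predictions and labels for one held-out scene.
    pred = np.random.randint(0, 3, size=(512, 512))
    target = np.random.randint(0, 3, size=(512, 512))
    print(f"pixel accuracy: {pixel_accuracy(pred, target):.3f}")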

Here are some example outputs from the test set with CB clouds labeled in blue and TCU in green. Both examples show an area of central Europe focused on Germany.


Technical Details

Background

Pilots receive a weather report called a METAR, which describes conditions at the destination airport. Currently, it is produced manually from observations by trained meteorologists on the ground, but DWD is working to automate as much of this as possible in order to free up the meteorologists’ time for other tasks. This project focuses on the section of the report that concerns convective clouds.

For an automated system to be reliable and trusted, it must be redundant, with models based on several independent data sources. Algorithms using radar and lightning data already exist, and our goal was to provide an independent model, based on Meteosat Second Generation (MSG) data, to support these. The machine learning component of the system is therefore a semantic segmentation of the satellite data into three classes: CB, TCU, or neither.
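To make this formulation concrete, here is a minimal sketch of how a three-class, per-pixel prediction is decoded from network output; the tensor sizes and class encoding are illustrative assumptions, not values from the project:

    import torch

    # Hypothetical network output for one scene: one logit channel per class
    # (0 = neither, 1 = TCU, 2 = CB).
    logits = torch.randn(1, 3, 512, 512)      # (batch, classes, height, width)
    probs = torch.softmax(logits, dim=1)      # per-pixel class probabilities
    class_map = probs.argmax(dim=1)           # (batch, H, W) integer class map
    cb_mask = class_map == 2                  # boolean mask of CB pixels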

Challenges

Labelling was a significant challenge here. The signs of convective clouds are extremely subtle, and it is difficult even for a trained human to recognise them in the data.

There were also a number of technical challenges with the input data. The satellite orbits above the equator and images the entire Earth disk, which puts it at an angle of ~20° above normal. As a result, the image must be geometrically transformed to give the correct perspective.
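SatPy (listed under the technologies below) handles exactly this kind of reprojection. A minimal sketch, assuming MSG native files on disk and one of SatPy's predefined European areas ('eurol'); the file path, reader and target area are illustrative choices, not necessarily those used in the project:

    from glob import glob
    from satpy import Scene

    # Read a Meteosat/SEVIRI scene and load the HRV and 10.8 µm IR channels.
    filenames = glob("/data/msg/*.nat")                  # hypothetical path
    scn = Scene(filenames=filenames, reader="seviri_l1b_native")
    scn.load(["HRV", "IR_108"])

    # Reproject from the satellite's oblique view onto a map projection over
    # Europe, so that pixels line up geographically.
    local = scn.resample("eurol")
    hrv = local["HRV"].values                            # NumPy array on the target grid
    ir = local["IR_108"].values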

In addition, we have 12 channels: 3 visual channels, which are all different wavelengths of red rather than RGB, and 9 infrared channels. One of the visual channels, the High Resolution Visual (HRV) channel, has three times the resolution of the others. This presents us with a choice: we can downsample the HRV channel or upsample the others. We chose to upsample the remaining channels to ensure that the fine structure in the HRV channel is preserved.
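A minimal sketch of this upsampling step using OpenCV; the interpolation method and array sizes are assumptions for illustration:

    import cv2
    import numpy as np

    def upsample_to_hrv(channel, hrv_shape):
        """Upsample a lower-resolution channel onto the HRV grid.

        channel: 2D array of one non-HRV channel; hrv_shape: (height, width)
        of the HRV channel (roughly three times finer).
        """
        h, w = hrv_shape
        # cv2.resize expects (width, height); bilinear interpolation is a
        # reasonable default here, not necessarily what the project used.
        return cv2.resize(channel.astype(np.float32), (w, h),
                          interpolation=cv2.INTER_LINEAR)

    # Hypothetical sizes: stack one upsampled channel with the HRV channel.
    low_res = np.random.rand(1000, 1000).astype(np.float32)
    hrv = np.random.rand(3000, 3000).astype(np.float32)
    stacked = np.stack([upsample_to_hrv(low_res, hrv.shape), hrv])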

Our Solution

The labelling process was conducted using a composite image of the HRV channel together with one of the infrared channels, which made the relevant clouds stand out more clearly. However, even this is not sufficient to make a judgement in every case, so external data in the form of radar and ground observations was also consulted in order to label correctly. Additionally, a trained meteorologist reviewed all labels to ensure that they were correct.
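The exact channel combination used for labelling is not described in detail; as a rough sketch, such a composite can be built by blending the normalised HRV image with an inverted infrared channel, so that cold, high cloud tops appear bright:

    import numpy as np

    def normalise(img):
        """Scale an array to [0, 1] for display."""
        lo, hi = np.nanmin(img), np.nanmax(img)
        return (img - lo) / (hi - lo + 1e-9)

    def labelling_composite(hrv, ir, alpha=0.5):
        """Blend the HRV channel with an inverted IR channel.

        Cold, high cloud tops become bright in the inverted IR image, so
        convective towers stand out against the HRV texture. The weighting
        is an illustrative choice.
        """
        ir_inverted = 1.0 - normalise(ir)
        return alpha * normalise(hrv) + (1 - alpha) * ir_inverted

    composite = labelling_composite(np.random.rand(512, 512),
                                    np.random.rand(512, 512))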

The model itself is a U-Net implemented in the PyTorch framework and trained using the Adam optimiser with a one-cycle learning rate annealing schedule.
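A minimal sketch of this training setup in PyTorch; the stand-in model, synthetic data and hyperparameters are placeholders rather than the project's actual configuration:

    import torch
    from torch import nn
    from torch.utils.data import DataLoader, TensorDataset

    # Stand-in for the real U-Net: any module mapping 12 input channels
    # (the MSG bands) to 3 class logits per pixel fits this loop.
    model = nn.Conv2d(12, 3, kernel_size=3, padding=1)

    # Tiny synthetic dataset of (image, mask) pairs, just so the loop runs.
    images = torch.randn(8, 12, 64, 64)
    masks = torch.randint(0, 3, (8, 64, 64))
    train_loader = DataLoader(TensorDataset(images, masks), batch_size=2)

    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    epochs = 2  # illustrative; real training runs far longer
    scheduler = torch.optim.lr_scheduler.OneCycleLR(
        optimizer, max_lr=1e-3, epochs=epochs, steps_per_epoch=len(train_loader))

    for epoch in range(epochs):
        for batch_images, batch_masks in train_loader:
            optimizer.zero_grad()
            logits = model(batch_images)            # (B, 3, H, W)
            loss = criterion(logits, batch_masks)   # per-pixel cross-entropy
            loss.backward()
            optimizer.step()
            scheduler.step()                        # OneCycleLR steps once per batch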

Extensive data augmentation was also utilised to get the most from our dataset.
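The augmentation pipeline itself is not described; a minimal sketch of typical geometric augmentations, applied identically to image and label mask, could look like this:

    import numpy as np

    def augment(image, mask, rng):
        """Apply the same random flips and 90° rotation to image and mask.

        image: (channels, H, W) array; mask: (H, W) class map. Only simple
        geometric augmentations are shown here.
        """
        if rng.random() < 0.5:                     # horizontal flip
            image, mask = image[:, :, ::-1], mask[:, ::-1]
        if rng.random() < 0.5:                     # vertical flip
            image, mask = image[:, ::-1, :], mask[::-1, :]
        k = rng.integers(0, 4)                     # 0-3 quarter turns
        image = np.rot90(image, k, axes=(1, 2))
        mask = np.rot90(mask, k)
        return image.copy(), mask.copy()

    rng = np.random.default_rng(0)
    img, msk = augment(np.random.rand(12, 64, 64),
                       np.random.randint(0, 3, (64, 64)), rng)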

Technologies used

Backend: Python, PyTorch, SatPy, OpenCV, NumPy
Infrastructure: GCloud (Training), Git, nevergrad, TensorBoard
