Leveraging Machine Learning for Environmental Protection

Edit Szügyi

Machine Learning has been solving complex problems for decades. Just think about how Computer Vision methods can help detect life-threatening diseases, how self-driving cars are on their way to revolutionizing traffic safety, or how automatic translation lets us talk to just about anyone on the planet. The power of Machine Learning has been embraced by many branches of industry and science.

There are, however, some areas where the potential of Machine Learning is harder to see and less utilized. One of these is environmental protection. Protecting the natural environment is one of the biggest challenges our generation faces, with pressing issues such as climate change, plastic pollution and resource depletion. Let us now look at how Machine Learning has been, and can be, used as a tool in environmental protection.

The technology is already there

Data is all around us. New technologies and devices can collect large quantities of data; satellite images are a good example. This kind of data holds great insight not only into the present moment but into the potential future as well. However, tasking human experts with continuously monitoring a global or even a local live feed of imagery is impractical and often infeasible.

This is where Machine Learning enters the equation. Unlike humans, computers and more specifically Machine Learning models excel at looking through large quantities of data, and more importantly, making sense of that data by finding patterns. This understanding can then help us make better decisions.

The technology behind successful Machine Learning methods is backed by decades of research and real-life applications, so many of these methods are mature and well tested. Fine-tuned on environmentally relevant data sets, they can achieve far-reaching results. A Machine Learning model that detects tumors in medical images can, with some adjustments and retraining, be applied to satellite images. One built to forecast financial time series can be adapted to make predictions about water or air quality.


Making predictions

The most standard of all Machine Learning tasks is making predictions. When we feed training data to a model, we assume that the distribution of the training data is representative of the data in general. With this assumption, the model can predict how the data will look at a future time or a different location, which is not present in the training data itself.
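As a minimal sketch of this idea, the snippet below fits a straight line to training data and then checks its predictions on points that were not part of the training set. The data is invented and noise-free purely for illustration (a hypothetical rainfall index against crop yield), and the function names are our own; real environmental data would be noisy and would call for a more capable model.

```python
# Toy data, invented for illustration: x = rainfall index, y = crop yield.
# It follows y = 2x + 1 exactly so that the result is easy to verify.
train_x = [1.0, 2.0, 3.0, 4.0, 5.0]
train_y = [3.0, 5.0, 7.0, 9.0, 11.0]

# Held-out points: never shown to the model during fitting.
held_out_x = [6.0, 7.0]
held_out_y = [13.0, 15.0]


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the fitted line to a new input."""
    slope, intercept = model
    return slope * x + intercept


def mean_squared_error(model, xs, ys):
    """Average squared difference between predictions and true values."""
    return sum((predict(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


model = fit_line(train_x, train_y)
held_out_error = mean_squared_error(model, held_out_x, held_out_y)
print("slope, intercept:", model)
print("error on held-out data:", held_out_error)
```

The held-out error is the number that matters: it estimates how well the model will do at a future time or a different location, provided the training data really is representative.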

Academic research has shown that advanced Deep Learning methods are able to predict the effects of changing weather patterns and other complex factors on crop yield, which could be crucial for addressing issues such as food insecurity. Similarly, by predicting the future supply and demand of an operation, overproduction can be reduced, which translates to less waste as well as a more (cost-)efficient supply chain.

Kenyan scientists have created a warning system that uses soil moisture, wind, humidity, surface temperature and vegetation index values to alert local farmers about locust invasions two to three months before they happen, giving them ample time to prepare and save precious resources.

Semantic segmentation

 A photograph of a crosswalk next to the result of semantically segmenting the image.

Machine Learning, and more specifically Computer Vision, methods can be used to detect semantic objects and separate them from the background of an image by deciding for each pixel whether or not it belongs to a certain category.
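The per-pixel decision can be sketched in a few lines. In the snippet below, a hand-written color rule stands in for a trained segmentation model (in practice this would be a neural network), and the tiny 2x3 "image" and all names are hypothetical. What it shows is the shape of the output: a mask with one label per pixel.

```python
# A tiny RGB "image" as nested lists: 2 rows x 3 columns, values invented.
image = [
    [(30, 120, 40), (150, 110, 70), (28, 100, 35)],
    [(160, 120, 80), (155, 115, 75), (25, 110, 30)],
]


def looks_like_bare_ground(pixel):
    """Stand-in for a trained model: red stronger than green means bare ground.

    This rule is an assumption made for the example, not a real classifier.
    """
    red, green, _blue = pixel
    return red > green


def segment(img, belongs_to_class):
    """Return a binary mask: 1 where the pixel belongs to the class, else 0."""
    return [[1 if belongs_to_class(pixel) else 0 for pixel in row] for row in img]


def coverage(mask):
    """Fraction of pixels assigned to the class."""
    total = sum(len(row) for row in mask)
    return sum(sum(row) for row in mask) / total


mask = segment(image, looks_like_bare_ground)
print(mask)
print("share of image covered:", coverage(mask))
```

Once a mask exists, quantities such as the location and extent of a detected area follow from simple counting.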

We at dida have developed a tool that detects illegal mines. These mines endanger natural landscapes, and before local governments can act against them, they first need to know where the mines are. This is not a trivial task due to the vast areas that have to be monitored. With the help of our automatic surveillance system, the relevant authorities in Guyana, Peru and Suriname can now have timely access to the location and extent of illegal mining sites.

Anomaly detection

 Photo of a forest fire

When we feed data to a Machine Learning model, it learns what is normal and expected based on that data. Armed with that information, it can then determine which events are improbable in new, previously unseen data, as they deviate from the trend. This is the standard method for preventing credit card fraud in FinTech, but there are also a lot of ways environmental sciences can benefit from anomaly detection.
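A minimal sketch of this principle, assuming a single sensor reading and a simple statistical notion of "normal": learn the mean and standard deviation from historical values, then flag new values that lie too many standard deviations away. The temperature values and the threshold of 3 are invented for illustration, and production systems use far richer models.

```python
from statistics import mean, stdev

# Historical readings considered normal (invented values, degrees Celsius).
normal_readings = [21.0, 22.5, 20.8, 21.7, 22.1, 21.3, 20.9, 22.0]


def fit_normal_range(values):
    """Learn what is 'normal' as the mean and sample standard deviation."""
    return mean(values), stdev(values)


def is_anomaly(value, mu, sigma, threshold=3.0):
    """Flag a value whose z-score exceeds the threshold (an assumed cutoff)."""
    return abs(value - mu) / sigma > threshold


mu, sigma = fit_normal_range(normal_readings)

# New, previously unseen readings; the last one deviates strongly.
new_readings = [21.5, 22.3, 48.0]
flags = [is_anomaly(value, mu, sigma) for value in new_readings]
print(list(zip(new_readings, flags)))
```

The same pattern applies whether the reading is a card transaction amount or a temperature measured over a forest.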

One real-life success story is a software product built on the same technology. It detects wildfires within minutes of their outbreak, which can be hours before a human would notice and report the emergency, and so helps to save acres of land and human lives.


As stated earlier, data is everywhere. However, this is mostly true for unlabeled data, which makes advancing unsupervised Machine Learning methods such as clustering very important. T-distributed stochastic neighbor embedding (t-SNE), for example, has been used to project complex, high-dimensional data into a two- or three-dimensional space where humans can identify clusters visually.

To give an example of an application, this method has been used to create more efficient electricity networks by clustering households and understanding their specific categories of demand.
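To make the clustering idea concrete, here is a small sketch that groups households by their electricity usage. Note that it uses plain k-means rather than t-SNE, because k-means fits in a few lines without external libraries. The household data, the choice of two clusters and the starting centroids are all assumptions made for the example.

```python
# Each household: (average daytime kWh, average evening kWh). Invented values.
households = [
    (1.0, 5.0), (1.2, 4.8), (0.9, 5.3),   # mostly evening usage
    (4.0, 1.0), (4.2, 1.1), (3.9, 0.8),   # mostly daytime usage
]


def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def nearest_centroid(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(point, centroids[i]))


def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points and recomputing centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            clusters[nearest_centroid(point, centroids)].append(point)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [mean_point(cluster) if cluster else centroids[i]
                     for i, cluster in enumerate(clusters)]
    labels = [nearest_centroid(point, centroids) for point in points]
    return centroids, labels


# Deterministic start: one household from each apparent group.
initial_centroids = [households[0], households[3]]
centroids, labels = kmeans(households, initial_centroids)
print(labels)
print(centroids)
```

No labels were given to the algorithm; the two demand categories emerge from the data alone.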

Identifying important features

 Photograph of a smoking factory chimney

Nature provides us with a lot of remarkably complex data sets. One example is air quality data. A huge number of factors play a role in air quality, and when the aim is to reduce air pollution, it is important to identify the most important of them. Given a suitable task, we can train a Machine Learning model on the air quality data. Once it makes good predictions on data it has not seen during training, we can assume that it has learned the factors that are actually relevant for making generalizations about the data, so we can then turn to observing the model itself.

If we take the model trained on the air pollution data and study its latent space (a mathematical representation of the feature values the Machine Learning model learns from its training data), we can learn which of the many features are the most important. With that knowledge, we can make an impact by trying to change only a limited number of factors.
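Inspecting a latent space directly is hard to show in a few lines, so the sketch below swaps in a different and simpler technique, permutation importance: disturb one input feature at a time and measure how much the model's error grows. The model is a hand-written stand-in for a trained one, and the feature names, weights and data are all invented. A cyclic shift replaces the usual random shuffle to keep the example deterministic.

```python
FEATURES = ["traffic", "wind", "humidity"]

# Invented observations, one row per day: (traffic, wind, humidity).
rows = [
    (10.0, 5.0, 60.0),
    (20.0, 3.0, 55.0),
    (30.0, 4.0, 70.0),
    (40.0, 1.0, 65.0),
    (50.0, 2.0, 50.0),
]


def model(row):
    """Stand-in for a trained pollution model (weights are assumptions)."""
    traffic, wind, humidity = row
    return 2.0 * traffic - 1.0 * wind + 0.0 * humidity


# For the sketch, the targets are exactly what the model predicts.
targets = [model(row) for row in rows]


def mse(data):
    """Mean squared error of the model on the given rows."""
    return sum((model(r) - t) ** 2 for r, t in zip(data, targets)) / len(data)


def permute_feature(data, index):
    """Cyclically shift one feature column by one position."""
    column = [row[index] for row in data]
    shifted = column[-1:] + column[:-1]
    return [row[:index] + (value,) + row[index + 1:]
            for row, value in zip(data, shifted)]


baseline = mse(rows)
importances = {
    name: mse(permute_feature(rows, i)) - baseline
    for i, name in enumerate(FEATURES)
}
ranking = sorted(importances, key=importances.get, reverse=True)
print(importances)
print("most to least important:", ranking)
```

A feature whose disturbance barely changes the error is one the model does not rely on, which narrows down the factors worth acting on.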


Machine Learning can also aid scientists in discovering new materials and chemicals. One huge problem where it can help is developing new materials that can be used instead of plastic, eventually reducing plastic waste.

In research on biodegradable polymer materials, Machine Learning models are trained on data about polymer properties to identify promising hypothetical polymers. Generative models such as Variational Autoencoders (VAEs) or Generative Adversarial Networks (GANs) can be trained on existing materials and then generate novel hypothetical ones via their decoders or generators, respectively. These candidates then have to be validated by chemical experiments.


Environmental experts agree that Machine Learning methods are powerful tools for tackling environmental problems, and the number of academic papers on the topic has been growing significantly over the past decade. There are significant initiatives calling for cooperation, such as Climate Change AI, which was kick-started by a 2019 paper co-authored by some of the top Machine Learning experts. However, real-life applications are harder to find. This is not unexpected, as putting academic studies into production is generally a rather slow process, and the interest in Machine Learning tools in environmental studies is relatively new.

When talking about Machine Learning and the environment, one must also consider the other side of the coin, for instance the large carbon emissions caused by training and running complex models. Other general issues in Machine Learning also have to be considered: sometimes there is a lack of historical or global data, or the interpretability of the models is questionable. While we do have to be mindful of these drawbacks, they should not steer anyone away from using Machine Learning methods. Different Machine Learning models excel at different tasks, and part of a successful project is taking all requirements and challenges into account when choosing or implementing a solution. For instance, not all models are black boxes, and there is a whole research field, Interpretable AI, devoted to making decisions made by Machine Learning algorithms easier for humans to understand.


In order to tackle an issue as grave as the decay of our natural environment, we need all hands on deck. In this post, we have sampled a small but colorful subset of Machine Learning projects which are either already part of the solution or are showing excellent research results and are ready to be put into practice. By grouping the ideas according to the common Machine Learning tasks they involve, we have shown that problems in environmental protection can be broken down into the same kinds of problems found in any other field. This means that the technology to help solve these problems is tried and tested, so with the right expertise, Machine Learning models can indeed help to protect the natural environment.