The best (Python) tools for remote sensing


Emilius Richter


Satellite that is used for remote sensing

An estimated number of 906 Earth observation satellites are currently in orbit, providing science and industry with many terabytes of data every day. The satellites operate with both radar as well as optical sensors and cover different spectral ranges with varying spectral, spatial, and temporal resolutions. Due to this broad spectrum of geospatial data, it is possible to find new applications for remote sensing methods in many industrial and governmental institutions. On our website, you can find some projects in which we have successfully used satellite data and possible use cases of remote sensing methods for various industries.

Well-known satellite systems and programs include Sentinel-1 (radar) and Sentinel-2 (optical) from ESA, Landsat (optical) from NASA, TerraSAR-X and TanDEM-X (both radar) from DLR, and PlanetScope (optical) from Planet.

There are basically two types of geospatial data: raster data and vector data.

Raster data

Raster data are a grid of regularly spaced pixels, where each pixel is associated with a geographic location, and are represented as a matrix. The pixel values depend on the type of information that is stored, e.g., brightness values for digital images or temperature values for thermal images. The size of the pixels also determines the spatial resolution of the raster. Geospatial raster data are thus used to represent satellite imagery. Raster images usually contain several bands or channels, e.g. a red, green, and blue channel. In satellite data, there are also often infrared and/or ultraviolet bands.

Vector data

Vector data represent geographic features on the earth's surface, such as cities, country borders, roads, bodies of water, property rights, etc.. Such features are represented by one or more connected vertices, where a vertex defines a position in space by x-, y- and z-values. A single vertex is a point, multiple connected vertices are a line, and multiple (>3) connected and closed vertices are called polygons. The x-, y-, and z-values are always related to the corresponding coordinate reference system (CRS) that is stored in vector files as meta information. The most common file formats for vector data are GeoJSON, KML, and SHAPEFILE.

In order to process and analyze these data, various tools are required. In the following, I will present the tools we at dida have had the best experience with and which are regularly used in our remote sensing projects. I present the tools one by one, grouped into the following sections:

  • Requesting satellite data

    • EOBrowser

    • Sentinelsat

    • Sentinelhub

  • Processing raster data

    • Rasterio

    • Pyproj

    • SNAP

    • pyroSAR

    • Rioxarray

  • Processing vector data

    • Shapely

    • Python-geojson

    • Geojson.io

    • Geopandas

    • Fiona

  • Providing geospatial data

    • QGIS

    • GeoServer

    • Leafmap

  • Processing meteorological satellite data

    • Wetterdienst

    • Wradlib


Requesting satellite data


EOBrowser

EOBrowser is a web application from sentinelhub, which can be used to display various satellite data, such as Sentinel-1, Sentinel-2, Landsat, MODIS, etc. The desired region for which the data should be displayed can be selected via the search bar, by uploading KML or GeoJSON files, or by drawing polygons. The "Advanced search" button can be used to adjust certain parameters that refine the search. The search provides a link to download the corresponding raw data as well as different visualizations.

For Sentinel-2 data, for example, the RGB image can be displayed, but also the visualized Normalized Difference Water Index (NDWI) or Normalized Difference Vegetation Index (NDWI).

Sentinelsat

Sentinelsat is a Python package that can be used to request sentinel data from the Copernicus Open Access Hub. The implementation is very simple and the request is based on parameters like region (e.g. in form of a GeoJSON file), date, platform (e.g. Sentinel-2), product type (e.g. L1C), maximum cloud cover, etc.

However, sentinelsat does not provide tools to process the downloaded raw data and convert it into suitable formats. This can be implemented relatively easily with the help of rasterio.

 sentinelhub logo

Sentinelhub

The Python package sentinelhub is the Python interface to use the sentinelhub services. The methods and functions allow to download and process the different supported satellite data (same as EOBrowser). For example, there are methods to display images with a cloud mask or mosaics of the least cloudy images. The many functions are described in detail in both the Python and the API documentation in an understandable way. Sentinelhub also offers to provide satellite data and meta information via the OGC services WMS, WMTS, WCS, and WFS.

Unfortunately, many features of sentinelhub are not available for free. Depending on the plan some features cannot be used and there are limits on data requests.


Processing raster data


Rasterio

Rasterio is a Python package that can be used to process geospatial raster data, such as GeoTIFFs. For this purpose many modules and functions are available, for example, to process raw data from satellites, read raster data, retrieve their geographic metadata, transform their coordinates, crop the images, merge multiple images, and save the data in other formats. This large number of functions and the ease of implementation make Rasterio a standard tool in the analysis of satellite data.

The code example below shows how to use Rasterio to read a Sentinel-2 scene and display and save it as an RGB image:

import numpy as np
import rasterio
from rasterio import plot

file = 'S2BMSIL2A20210903T144719N0301R139T19LDG20210903T184347.tif'
# opening the GeoTIFF file and reading the R, G and B bands
with rasterio.open(file) as src:
	b, g, r = src.read(1), src.read(2), src.read(3)
	rgb = np.array([r, g, b])
	meta = src.meta.copy()

# updating the metadata to 3 bands
meta.update({'count': 3})

# saving the data as new GeoTIFF
with rasterio.open(file[:-4] + '_rgb.tif', 'w', **meta) as dest:
	dest.write(rgb)

# adjusting the RGB image for creating a nice plot
rgb[rgb > 3000] = 3000
img = ((rgb / rgb.max()) * 255).astype(np.uint8)
plot.show(img)

The output should look like this:

 Satellite image for remote sensing

Pyproj

The Python library pyproj is a Python interface for PROJ, a software library to transform geographic coordinates from one coordinate reference system (CRS) to another. Both geodetic transformations and cartographic projections are supported. Pyproj is a very simple tool and essential in remote sensing projects to unify coordinate reference systems that may arise from different data sources used.

The following example shows how to perform a simple coordinate transformation of the boundaries of the Sentinel-2 image above:

import rasterio
import pyproj

file = 'S2B_MSIL2A_20210903T144719_N0301_R139_T19LDG_20210903T184347.tif'
# reading the GeoTIFF file and returning the bounds and the coordinate reference
# system (CRS) which is EPS:32719 for Sentinel-2
with rasterio.open(file) as src:
    bounds = src.bounds
    crs_from = src.crs

# defining the CRS to be transformed into
epsg_id = 4326
crs_to = pyproj.CRS.from_epsg(epsg_id)

# defining the transformer which transforms from one CRS to another
proj = pyproj.Transformer.from_crs(crs_from=crs_from, crs_to=crs_to)
bounds_4326 = proj.transform_bounds(*bounds)

print('bounds in EPSG 32719:', bounds)
print('bounds in EPSG 4326:', bounds_4326)

The output would be:

bounds in EPSG 32719: BoundingBox(left=399960.0, bottom=8590240.0, right=509760.0, top=8700040.0)
bounds in EPSG 4326: (-12.752446747147742, -69.92157939546101, -11.7580241630838, -68.91008576435853)

SNAP

SNAP stands for Sentinel Application Platform and is a common toolbox for scientific exploration of satellite data developed by ESA. SNAP offers an intuitive platform for processing, modeling, and visualizing satellite imagery, especially for the Sentinel missions. The software is optimized to handle large amounts of satellite data. It also facilitates the processing of SAR data, like from Sentinel-1.

The SNAP software comes with an installation of the Sen2Cor tool. Sen2Cor is a processor that performs atmospheric correction for Sentinel-2 imagery, i.e. it converts Level-1C Top-Of-Atmosphere (TOA) products to Level-2A Bottom-Of-Atmosphere (BOA) reflectance products. Technical specifications for these products can be found here: Level-1C and Level-2A.

You can also use the SNAP tools and methods within Python. A helpful guide on how to install and configure the SNAP modules for Python can be found here.

pyroSAR

With the Python library pyroSAR it is possible to process Synthetic Aperture Radar (SAR) satellite data, such as Sentinel-1, TerraSAR-X, TanDEM-X or ALOS-2/PALSAR-2. Methods that access SNAP and GAMMA Remote Sensing tools and functions developed specifically for Sentinel-1 are available for this purpose.

Rioxarray

xarray is a Python package that provides arrays with labels, such as dimensions, coordinates, and other specific attributes. Thus, it makes the work with large dimensional arrays more intuitive. Rioxarray combines the functionalities of rasterio with all advantages of xarray.

Rioxarray simplifies the work with large dimensional raster files. It gives a good overview of the attributes of a raster file, such as boundaries, dimensions, coordinate reference system, and more. Unfortunately, the package is not well documented. And for more sophisticated operations on satellite data, one still needs to use rasterio.


Processing vector data


Shapely

Shapely is a Python library for analyzing and manipulating planar geospatial features. The different geographic objects are represented by individual classes. A point is defined by the Point class, a curve by the LineString and LinearRing classes, and a surface by the Polygon class. A collection of objects are implemented by the MultiPoint, MultiLineString, and MultiPolygon classes. These classes are the standard for processing and manipulating vector data and are also used by other software libraries and Python packages.

Shapely has a variety of built-in methods that determine both the geometric properties of an object such as area, boundary, and length, but also the relationships to other objects such as whether one object contains another, whether one object intersects with another, whether one object touches another, and much more. Shapely itself does not have functions to read vector data, for which other libraries such as python-geojson, fiona or geopandas (see below) are needed. However, the transformation of the geospatial features into shapely objects is simple.

In the code example below, a few of the standard methods presented above are demonstrated using features from a GeoJSON file:

import shapely.geometry
import geojson

file = 'natural_areas.geojson'
# reading GeoJSON file with geojson package (see below)
with open('natural_areas.geojson') as f:
    natural_areas = geojson.load(f)

natural_areas_shapely = []
# converting every polygon into a shapely polygon object
for polygon in natural_areas['features']:
    p = shapely.geometry.asShape(polygon['geometry'])
    # since the GeoJSON were downloaded from another source,
    # errors in the polygon can occur (e.g. it crosses itself)
    if p.is_valid:
      natural_areas_shapely.append(p)
    else:
      natural_areas_shapely.append(p.buffer(0))

print('geom_type:', natural_areas_shapely[1].geom_type)
print('area in km^2:', natural_areas_shapely[1].area)
print('bounds:', natural_areas_shapely[1].bounds)
print('object contains point:', natural_areas_shapely[1].contains(shapely.geometry.Point(-76.86, -6.81)))
print('object contains another object:', natural_areas_shapely[1].intersects(natural_areas_shapely[2]))

The output would be:

geom_type: Polygon
area in km^2: 0.010795427253500077
bounds: (-76.952331, -6.932597, -76.794716, -6.783479)
object contains point: True
object contains another object: False

Python-geojson

GeoJSON is a standard format to store geospatial features like points, lines or polygons, and their coordinates. The Python package of the same name provides functions to read this data, process it, and save it as GeoJSON. Because of its ease of use, geojson is a very useful tool when working with vector data that is available as GeoJSON.

The code example above demonstrates how to read a GeoJSON file with the geojson package.

Geojson.io

Geojson.io is a geobrowser that allows you to display geospatial features on a map. You can choose between maps from OpenStreetMap, Mapbox, or satellite images from Maxar. The satellite images come with a high spatial resolution but are often not quite up to date. Geojson.io offers also the possibility to upload and display GeoJSON files and to draw points, lines, and polygons. The features can be saved as GeoJSON, Shapefile, KML, WKT, and other formats. This makes geojson.io a very useful tool to create and visualize geographic vector data without any programming knowledge.

 GeoPandas logo

Geopandas

Geopandas is an extension of pandas, a standard library for data analysis and manipulation. The data types and methods known from pandas are extended by geographic and geometric objects and methods based on shapely. Geopandas also has functions to read and store vector data. These functions depend on fiona (see below).

In geopandas, shapely objects can be grouped in a GeoSeries, to which one can apply the various geometric methods but also classical operations for data analysis and manipulation. In addition, geopandas has practical tools for geocoding.

Fiona

With fiona, all raster and vector file formats supported by GDAL can be read and data can also be saved in this format. Beyond that fiona has a few basic functions for geometric analysis, e.g. the determination of boundaries of an object, and for coordinate transformations, which depend on shapely and pyproj.


Providing geospatial data


QGIS

QGIS is a free geographic information system for the desktop. It is a software solution that allows displaying, editing, creating, analyzing, and publishing geographic information (raster and vector data).

Since QGIS is equipped with many functions and features, the full potential of the software is revealed mainly to people who work with it a lot. But the display of satellite data is easy to implement and helps a lot in the exploration of geospatial data.

 GeoServer logo

GeoServer

GeoServer is a free server that can be used to publish geographic data. It uses the Open Geospatial Consortium standard protocols Web Map Service (WMS), Web Feature Service (WFS), Web Coverage Service (WCS), Web Processing Service (WPC), and Web Map Tile Service (WMTS).

Geospatial raster and vector data can be published using the above protocols and subsequently used in Web GIS applications (e.g. via Leaflet, Mapbox or OpenLayers) or in GIS software (e.g. QGIS or ArcGIS). This gives you the possibility to provide your geographic data interactively without long loading times and large data transfers.

Leafmap

Leafmap is a Python package for interactive visualizations of geospatial data in Jupyter notebooks. With a few lines of code, it is easy to design interactive maps. Leafmap is very powerful in that it builds on and integrates various geospatial mapping and analysis tools such as ipyleaflet, whiteboxtools, and folium.

Leafmap supports all common vector and raster data formats. You can select from different base maps including satellite maps, hybrid maps, streets, terrain, and many more. It is possible to draw your own polygons and export them as vector files or use them for subsequent operations. The package also provides an upload functionality. The maps are highly configurable as there exist many style options for polygons, lines, markers, popups, and legends. In the case you host your geospatial data on a tile server, e.g. via Geoserver, leafmap can stream raster and vector data from services like WMS or WFS.


Processing meteorological satellite data


Wetterdienst

Wetterdienst is a Python package that can be used to retrieve meteorological data from the German Meteorological Service (DWD). These include historical weather data, forecasts based on weather models, and radar data (RADOLAN and RADVOR among others). The temporal resolution of the simple radar data (such as temperature, atmospheric precipitation, etc.) ranges from monthly to annual. RADOLAN data are provided with minute-by-minute to daily temporal resolution.

Unfortunately, there are currently no built-in functions to create time series of RADOLAN data, which are commonly used for the analysis of weather data.

Wradlib

Wradlib is a Python library that can be used to process various weather radar data, including RADOLAN, DX, and BUFR. Especially helpful is that the weather data can be converted to suitable raster formats, e.g. GeoTIFF.


Conclusion


Due to the large number of earth observation satellites, remote sensing methods find new application areas in many fields and provide valuable information about the earth's surface and its composition. dida has successfully demonstrated in several projects how remote sensing methods in combination with machine learning can provide new insights that are difficult to achieve by classical non-machine learning manual methods. However, to be able to process and analyze different satellite data, different tools are needed. Some of them, which we use regularly at dida, have been explained and demonstrated in this article. I hope this has provided a technical foundation to get started with your next remote sensing project. We also completed some projects involving remote sensing, for example the automatic classification of crop types and the automatic detection of artisanal and small mines.