The steep increase in urban population and the associated urban growth pose challenges to public authorities and utility companies, especially in developing and emerging countries. They need to ensure sufficient living space and good public transport, and to provide adequate infrastructure such as schools, sanitation systems, electricity and waste management.
However, infrastructure planning takes years or decades, whereas the population influx into fast-growing urban areas is in the tens to hundreds of thousands per year. Through constant monitoring of urban areas, the urban population can be estimated and infrastructure needs derived at an earlier stage. Moreover, in retrospect, urban growth and settlement movements can be related to specific events to understand the underlying causes.
The automated monitoring of urban change in terms of land spread, built-up height and density can be achieved by processing satellite image data. For this, a machine learning model can be trained on labeled satellite data to recognize patterns, such as changes in land cover and land use, the road network or surface sealing, and to make predictions for new, unseen satellite imagery based on past settlement patterns.
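As a minimal sketch of the monitoring step, assume a trained model has already produced a binary built-up mask for each of two acquisition dates on the same pixel grid. The function name `built_up_change`, the toy masks and the 10 m pixel size are illustrative assumptions and do not come from a specific product or library.

```python
import numpy as np

def built_up_change(mask_t0, mask_t1, pixel_size_m=10.0):
    """Summarise horizontal growth between two binary built-up masks."""
    mask_t0 = np.asarray(mask_t0, dtype=bool)
    mask_t1 = np.asarray(mask_t1, dtype=bool)
    if mask_t0.shape != mask_t1.shape:
        raise ValueError("both masks must cover the same pixel grid")

    # Area of one pixel in square kilometres.
    pixel_area_km2 = pixel_size_m ** 2 / 1e6

    new_built_up = ~mask_t0 & mask_t1   # not built-up before, built-up now
    lost_built_up = mask_t0 & ~mask_t1  # built-up before, not built-up now

    return {
        "area_t0_km2": float(mask_t0.sum() * pixel_area_km2),
        "area_t1_km2": float(mask_t1.sum() * pixel_area_km2),
        "new_km2": float(new_built_up.sum() * pixel_area_km2),
        "lost_km2": float(lost_built_up.sum() * pixel_area_km2),
    }

# Toy example: a 2x2 settlement grows into a 3x3 settlement.
t0 = np.zeros((4, 4), dtype=bool)
t0[1:3, 1:3] = True
t1 = np.zeros((4, 4), dtype=bool)
t1[1:4, 1:4] = True

change = built_up_change(t0, t1, pixel_size_m=10.0)
print(change)
```

Comparing masks rather than raw imagery keeps the change measure independent of illumination and seasonal differences between the two acquisitions, at the price of inheriting any segmentation errors.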
However, the spatial resolution of open-access satellite data such as Landsat-8, Sentinel-2 and Sentinel-1 may be too coarse, so finer details are hard to extract, which degrades prediction performance. This holds particularly for the height estimation of urban areas. The satellite data for training can be taken from several open data sources, which provide data in the visible and radar spectrum; both can be used to analyze urban growth.
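To make the resolution problem concrete, the following sketch counts how many pixels cover a single building footprint. The pixel sizes are nominal values assumed here (30 m for the Landsat-8 multispectral bands, 10 m for the finest Sentinel-2 bands). The 20 m by 15 m footprint and the helper name `pixels_per_footprint` are hypothetical.

```python
# Nominal pixel sizes in metres (assumed values for illustration).
PIXEL_SIZE_M = {
    "Landsat-8 multispectral": 30.0,
    "Sentinel-2 10 m bands": 10.0,
}

def pixels_per_footprint(width_m, depth_m, pixel_size_m):
    """Number of pixels (possibly fractional) covering a rectangular footprint."""
    return (width_m * depth_m) / pixel_size_m ** 2

# Hypothetical building footprint of 20 m x 15 m.
coverage = {
    sensor: pixels_per_footprint(20.0, 15.0, size)
    for sensor, size in PIXEL_SIZE_M.items()
}

for sensor, n_pixels in coverage.items():
    print(f"{sensor}: {n_pixels:.2f} pixels per building")
```

Under these assumptions the building fills about a third of a 30 m pixel and only three 10 m pixels, so an individual building contributes very little signal from which a model could infer its shape or height.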
The solution for monitoring urban growth and change with satellite data consists of two tasks: horizontal segmentation and vertical estimation. Machine learning models for image segmentation tasks are typically specialized convolutional neural networks (CNNs), such as U-Net or Mask R-CNN.
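One way the outputs of the two tasks might be combined is sketched below: a binary built-up mask from the segmentation task and a per-pixel height map from the estimation task are multiplied to obtain a rough built-up volume. The function name `built_up_volume_m3`, the toy arrays and the 10 m pixel size are assumptions for illustration only.

```python
import numpy as np

def built_up_volume_m3(built_up_mask, height_m, pixel_size_m=10.0):
    """Approximate built-up volume from a segmentation mask and a height map."""
    mask = np.asarray(built_up_mask, dtype=bool)
    height = np.asarray(height_m, dtype=float)
    if mask.shape != height.shape:
        raise ValueError("mask and height map must cover the same pixel grid")

    pixel_area_m2 = pixel_size_m ** 2
    # Heights predicted outside the built-up mask are ignored.
    masked_height = np.where(mask, height, 0.0)
    return float(masked_height.sum() * pixel_area_m2)

# Toy 2x2 scene: three built-up pixels with estimated heights in metres.
mask = np.array([[0, 1],
                 [1, 1]])
height = np.array([[5.0, 12.0],
                   [30.0, 9.0]])

volume = built_up_volume_m3(mask, height, pixel_size_m=10.0)
print(volume)  # 5100.0
```

Dividing such a volume by the built-up area would give a mean building height, which is one possible proxy for the density mentioned above.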
For height estimation from satellite imagery, the Im2Height approach can be used, whose architecture is composed of a convolutional sub-network and a deconvolutional sub-network. This method is particularly sensitive to low-resolution data, because the height difference between neighboring pixels can be very large, so the right choice of data source is important.
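The interplay of a convolutional and a deconvolutional sub-network can be illustrated by tracing the spatial size of a feature map through such an encoder-decoder. The sketch uses the standard output-size formulas for strided and transposed convolutions. The number of stages and the kernel, stride and padding values are assumptions chosen for illustration and are not the published Im2Height configuration.

```python
def conv_out(size, kernel, stride, padding):
    """Output length of a strided convolution along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1

def deconv_out(size, kernel, stride, padding):
    """Output length of a transposed convolution along one spatial axis."""
    return (size - 1) * stride - 2 * padding + kernel

def trace_sizes(size, n_stages=3):
    """Trace feature-map size through n_stages of down- and up-sampling."""
    sizes = [size]
    # Convolutional sub-network: each stage halves the spatial size.
    for _ in range(n_stages):
        size = conv_out(size, kernel=3, stride=2, padding=1)
        sizes.append(size)
    # Deconvolutional sub-network: each stage doubles the spatial size.
    for _ in range(n_stages):
        size = deconv_out(size, kernel=2, stride=2, padding=0)
        sizes.append(size)
    return sizes

sizes = trace_sizes(256)
print(sizes)  # [256, 128, 64, 32, 64, 128, 256]
```

Because the output returns to the input size, the network can assign one height value to every input pixel, which is also why coarse pixels that mix tall buildings and open ground are hard to regress.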
In addition, data from different wavelength ranges should be used, since they can capture different features of tall objects, e.g. the shadows of skyscrapers in visible imagery, in contrast to radar data.
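A simple way to feed both spectra to one model is to stack them as channels of a single input array. The sketch below assumes the optical and radar images are already co-registered and resampled to the same pixel grid. The function names, the channel counts (three optical, two radar) and the random stand-in data are hypothetical.

```python
import numpy as np

def standardize(channels):
    """Scale each channel to zero mean and unit variance."""
    channels = np.asarray(channels, dtype=np.float64)
    mean = channels.mean(axis=(1, 2), keepdims=True)
    std = channels.std(axis=(1, 2), keepdims=True)
    # Guard against division by zero for constant channels.
    return (channels - mean) / np.maximum(std, 1e-8)

def stack_modalities(optical, radar):
    """Stack optical and radar images of shape (channels, H, W) along the channel axis."""
    optical = np.asarray(optical)
    radar = np.asarray(radar)
    if optical.shape[1:] != radar.shape[1:]:
        raise ValueError("optical and radar images must share the same grid")
    return np.concatenate([standardize(optical), standardize(radar)], axis=0)

# Random stand-ins for real imagery.
rng = np.random.default_rng(0)
optical = rng.uniform(0.0, 1.0, size=(3, 8, 8))   # e.g. red, green, blue reflectance
radar = rng.normal(-12.0, 3.0, size=(2, 8, 8))    # e.g. two backscatter channels in dB

stacked = stack_modalities(optical, radar)
print(stacked.shape)  # (5, 8, 8)
```

Standardizing each channel separately keeps reflectance values and backscatter values, which live on very different scales, from dominating one another in the first network layer.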