Metadata
- Name
- Continental Europe land cover mapping at 30m resolution based on CORINE and LUCAS samples
- Repository
- ZENODO
- Identifier
- doi:10.5281/zenodo.4725429
- Description
- Annual land cover mapping for continental Europe based on Ensemble Machine Learning (EML), samples obtained from LUCAS (Land Use and Coverage Area frame Survey) and CLC (CORINE Land Cover) Maps, and several harmonized raster layers (e.g. GLAD Landsat ARD imagery and Continental EU DTM). The EML predicted the dominant land cover, probabilities and uncertainties for 33 classes compatible with CLC over 20 years (2000–2019), and was implemented in R and Python (eumap library).
The raster layers consisted mainly of GLAD Landsat ARD imagery, which was downloaded for the years 1999 to 2020 over the continental Europe extent (land mask area and tiling system), screened to reduce cloud cover (GLAD quality assessment band), aggregated by season into three quantiles (25th, 50th and 75th percentiles), and gap-filled using the Temporal Moving Window Median approach available in the eumap library. The images for each season were selected using the same calendar dates for the whole period:
Winter: December 2 of previous year until March 20 of current year
Spring: March 21 until June 24 of current year
Summer: June 25 until September 12 of current year
Fall: September 13 until December 1 of current year
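The season boundaries above can be expressed as a small lookup. The following is a minimal sketch, not the eumap implementation: the names season_for and seasonal_quantiles are hypothetical, and the quantile interpolation method ("inclusive") is an assumption, since the record does not state how the 25th, 50th and 75th percentiles were computed.

```python
from datetime import date
from statistics import quantiles


def season_for(d: date) -> tuple[str, int]:
    """Return (season, reference_year) for an acquisition date.

    Winter starts on December 2 of the previous year, so dates on or
    after December 2 are assigned to the winter of the following year.
    """
    md = (d.month, d.day)
    if md >= (12, 2):
        return "winter", d.year + 1
    if md <= (3, 20):
        return "winter", d.year
    if md <= (6, 24):
        return "spring", d.year
    if md <= (9, 12):
        return "summer", d.year
    return "fall", d.year  # September 13 to December 1


def seasonal_quantiles(values: list[float]) -> list[float]:
    """25th, 50th and 75th percentiles of one pixel's cloud-free values
    within a season (needs at least two values; method is an assumption)."""
    return quantiles(values, n=4, method="inclusive")


print(season_for(date(2018, 12, 2)))
print(seasonal_quantiles([0.10, 0.12, 0.15, 0.20, 0.22]))
```

Because the winter of a given year reaches back into December of the previous year, the winter composite for 2000 draws on imagery from December 1999.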
In addition to Landsat spectral data, the EML considered night lights (VIIRS/SUOMI NPP), global surface water frequency, the Continental EU DTM, Landsat spectral indices (SAVI, NDVI, NBR, NBR2, REI and NDWI) and the monthly maximum and minimum geometric temperature, estimated per pixel for each month.
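For reference, four of the listed spectral indices have widely used textbook definitions, sketched below for a single pixel. This is an illustration only: the function names are hypothetical, reflectances are assumed to be scaled to 0–1, and the SAVI soil adjustment factor of 0.5 is the conventional default rather than a value stated in this record. NDWI and REI are left out because the record does not say which formulation was used.

```python
def normalized_difference(a: float, b: float) -> float:
    """(a - b) / (a + b), or NaN when the denominator is zero."""
    total = a + b
    return (a - b) / total if total != 0 else float("nan")


def ndvi(nir: float, red: float) -> float:
    """Normalized Difference Vegetation Index."""
    return normalized_difference(nir, red)


def nbr(nir: float, swir2: float) -> float:
    """Normalized Burn Ratio."""
    return normalized_difference(nir, swir2)


def nbr2(swir1: float, swir2: float) -> float:
    """Normalized Burn Ratio 2."""
    return normalized_difference(swir1, swir2)


def savi(nir: float, red: float, soil_factor: float = 0.5) -> float:
    """Soil Adjusted Vegetation Index (soil_factor is the usual L term)."""
    return (1 + soil_factor) * (nir - red) / (nir + red + soil_factor)


# Example pixel with made-up reflectance values
print(ndvi(0.5, 0.1), savi(0.5, 0.1), nbr(0.5, 0.1), nbr2(0.3, 0.1))
```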
The training data were obtained from the geographic locations of LUCAS points (in-situ source) and the centroids of all CORINE polygons (supplementary source), harmonized to the 33 CLC classes and organized by year. Each unique combination of longitude, latitude and year was treated as an independent sample with one of the following classes (see the Zenodo record for the class descriptions):
111: Urban fabric
122: Road and rail networks and associated land
123: Port areas
124: Airports
131: Mineral extraction sites
132: Dump sites
133: Construction sites
141: Green urban areas
211: Non-irrigated arable land
212: Permanently irrigated arable land
213: Rice fields
221: Vineyards
222: Fruit trees and berry plantations
223: Olive groves
231: Pastures
311: Broad-leaved forest
312: Coniferous forest
321: Natural grasslands
322: Moors and heathland
323: Sclerophyllous vegetation
324: Transitional woodland-shrub
331: Beaches, dunes, sands
332: Bare rocks
333: Sparsely vegetated areas
334: Burnt areas
335: Glaciers and perpetual snow
411: Inland wetlands
421: Maritime wetlands
511: Water courses
512: Water bodies
521: Coastal lagoons
522: Estuaries
523: Sea and ocean
The LUCAS points with a unique land cover class received a confidence rating of 100%, while CORINE points received 85%; the EML used these values as sample weights in the training phase. The points were used in a spacetime overlay approach, which used the location and the year of each point to retrieve the pixel values of all rasters. Samples of some land cover classes (111, 122, 131, 141, 211, 221, 222, 223, 231, 311, 312, 321, 411, 512) were screened for agreement with pre-existing mapping products (OSM roads, OSM railways and Copernicus-OSM buildings; Copernicus high resolution layers). For example, “111: Urban fabric” samples located in low density building areas (< 50% according to the Copernicus-OSM building layer) were removed. The final training data comprised ~5.3 million samples and 178 covariates/features.
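The idea of a spacetime overlay with confidence-based sample weights can be sketched as follows. This is a simplified illustration, not the eumap code: the sample dictionaries, the rasters_by_year mapping and the reader callables (which stand in for real raster access) are all hypothetical, and only the 100% and 85% confidence values come from the description.

```python
from typing import Callable, Dict, List

# Hypothetical reader: (lon, lat) -> pixel value of one raster layer
Reader = Callable[[float, float], float]

# Confidence ratings from the description, used as sample weights
CONFIDENCE = {"lucas": 1.0, "corine": 0.85}


def spacetime_overlay(
    samples: List[dict],
    rasters_by_year: Dict[int, Dict[str, Reader]],
) -> List[dict]:
    """Attach to each sample the pixel values of the rasters that match
    its year, plus a training weight derived from its source."""
    rows = []
    for s in samples:
        layers = rasters_by_year[s["year"]]  # pick layers by sample year
        row = {name: read(s["lon"], s["lat"]) for name, read in layers.items()}
        row["clc"] = s["clc"]
        row["sample_weight"] = CONFIDENCE[s["source"]]
        rows.append(row)
    return rows


# Toy example: one constant-valued "raster" per year
rasters = {
    2000: {"ndvi_summer_p50": lambda lon, lat: 0.61},
    2018: {"ndvi_summer_p50": lambda lon, lat: 0.48},
}
samples = [
    {"lon": 5.1, "lat": 52.0, "year": 2000, "clc": 231, "source": "corine"},
    {"lon": 5.1, "lat": 52.0, "year": 2018, "clc": 211, "source": "lucas"},
]
for row in spacetime_overlay(samples, rasters):
    print(row)
```

The same location appears twice with different years, so it yields two independent rows with different covariate values.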
Using this training data, three ML models (Random Forest, XGBoost and an Artificial Neural Network) were trained to predict probabilities, which served as input to train a linear meta-model (a logistic regression classifier) responsible for predicting the final land cover probabilities of all classes. Hyperparameter optimization was conducted using 5-fold spatial cross-validation based on a 30×30 km tiling system. The uncertainties were calculated for all classes as the standard deviation of the three predicted probabilities for each pixel, and the class with the highest probability was selected as the dominant land cover class, resulting in 20 annual maps for continental Europe.
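Two of the steps above lend themselves to a short sketch: grouping samples by tile so that cross-validation folds are spatially separated, and deriving the per-class uncertainty and dominant class for one pixel. The code below is an illustration under stated assumptions, not the production code. All function names are hypothetical, coordinates are assumed to be in a projected system in metres, tiles are assigned to folds round-robin, and the population standard deviation is used because the record does not say which variant was applied.

```python
import math
from statistics import pstdev
from typing import List, Sequence, Tuple


def tile_of(x_m: float, y_m: float, tile_size_m: float = 30_000.0) -> Tuple[int, int]:
    """Index of the 30x30 km tile containing a projected coordinate."""
    return (math.floor(x_m / tile_size_m), math.floor(y_m / tile_size_m))


def spatial_folds(points: Sequence[Tuple[float, float]], n_folds: int = 5) -> List[int]:
    """Assign each point a fold so that all points in one tile share a fold."""
    tiles = [tile_of(x, y) for x, y in points]
    fold_of_tile = {t: i % n_folds for i, t in enumerate(sorted(set(tiles)))}
    return [fold_of_tile[t] for t in tiles]


def class_uncertainty(base_probs: Sequence[Sequence[float]]) -> List[float]:
    """Per-class standard deviation across the base models' probabilities.

    base_probs holds one probability vector per base model for one pixel.
    """
    return [pstdev(per_class) for per_class in zip(*base_probs)]


def dominant_class(codes: Sequence[int], final_probs: Sequence[float]) -> int:
    """Class code with the highest meta-model probability."""
    best = max(range(len(codes)), key=lambda i: final_probs[i])
    return codes[best]


# Toy pixel with two classes and three base models
codes = [211, 231]
base = [[0.6, 0.4], [0.8, 0.2], [0.7, 0.3]]
print(class_uncertainty(base))
print(dominant_class(codes, [0.72, 0.28]))
print(spatial_folds([(1_000, 1_000), (29_000, 500), (31_000, 1_000)]))
```

Keeping whole tiles inside a single fold prevents nearby, highly similar samples from appearing in both the training and validation splits.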
The training samples, covariates/features and fitted models are available through lcv_landcover.hcl_lucas.corine.eml_p_landmapper_full.lz4, a LandMapper class instance that can be loaded by the eumap library (check the code demonstration). The production code used to generate the current version of the annual land cover maps is available in the spatial layer repository and uses a lighter LandMapper class instance (lcv_landcover.hcl_lucas.corine.eml_p_landmapper_light.lz4), which does not include the training samples.
Only the dominant land cover classes are provided here. To access the probabilities and uncertainties, use:
Open Data Science Europe viewer: https://maps.opendatascience.eu
S3 Cloud Object Service: https://medium.com/swlh/europe-from-above-space-time-machine-learning-reveals-our-changing-environment-1b05cb7be520
A publication describing in detail all processing steps, the accuracy assessment and a general analysis of land cover changes in continental Europe is under preparation. To suggest any improvement or fix, use https://gitlab.com/geoharmonizer_inea/spatial-layers/-/issues
- Data or Study Types
- multiple
- Source Organization
- Unknown
- Access Conditions
- available
- Year
- 2021
- Access Hyperlink
- https://doi.org/10.5281/zenodo.4725429
Distributions
- Encoding Format: HTML ; URL: https://doi.org/10.5281/zenodo.4725429