Skip to main content

Zonal Statistics: A Comprehensive Guide

Zonal statistics calculates aggregate statistics (e.g., mean, sum, maximum) of the pixel values of a raster that fall within areas defined by a Polygon dataset. The method has diverse applications such as determining vegetation health in an agricultural field, assessing surface water availability, or approximating building heights from DSM rasters.

Building Polygon features colored based on DSM aggregate values.

Applications

  • Height approximation: Estimate the height of buildings from DSM
  • Crop yield estimation: Determine vegetation health within crop polygons using NDVI rasters
  • Water resource management: Assess surface water availability in watersheds using NDWI rasters
  • Forestry: Track changes in forest cover over time within administrative boundaries

Implementing zonal statistics

The analysis involves first determining the pixels within a polygon and then calculating their aggregate statistics, such as their count, mean, sum, or maximum.

Implementation steps

To perform this operation, first load a raster and a vector dataset for spatially overlapping areas, then perform zonal aggregations as shown below using the xarray and GeoPandas libraries.

  1. Load overlapping the raster and vector data
  2. For the extent of each Polygon, clip the raster to create a mask array
  3. Perform an aggregation operation (e.g., mean, sum, max) on each masked array
  4. Return the results in a GeoDataFrame, where each row corresponds to a Polygon and the columns contain the value for the aggregate zonal statistics

Example UDF

This example UDF loads the building footprint table gdf and DSM raster arr from S3 only for a section defined by bbox, which corresponds to a map tile. It then calculates the zonal statistics and returns the results as the GeoDataFrame gdf_zonal. This process enables Fused Workbench to perform the calculation dynamically as users scroll and zoom in on the map.

@fused.udf
def udf(
bbox: fused.types.TileGDF=None, min_zoom=15, table="s3://fused-asset/infra/building_msft_us/"
):
dsm_to_tile = fused.load("https://github.com/fusedio/udfs/tree/eda5aec/public/DSM_JAXA_Example").utils.dsm_to_tile
utils = fused.load("https://github.com/fusedio/udfs/tree/2a76b6a/public/common/").utils

gdf = utils.table_to_tile(bbox, table, min_zoom)
arr = dsm_to_tile(bbox, z_levels=[4, 6, 9, 11], verbose=False)
gdf_zonal = utils.geom_stats(gdf, arr)
return gdf_zonal

The UDF would return a GeoDataFrame like the following, with an aggregate stats column for each input Polygon:

                                                geometry      stats  count
660 POLYGON ((-122.39806 37.76221, -122.39806 37.7... 22.00 5
661 POLYGON ((-122.39881 37.76219, -122.39876 37.7... 21.99 193
1452 POLYGON ((-122.39619 37.76326, -122.39613 37.7... 13.50 2
1458 POLYGON ((-122.39774 37.76327, -122.39776 37.7... 11.00 28
3033 POLYGON ((-122.39680 37.76362, -122.39701 37.7... 10.20 5
...

Conclusion

While zonal statistics is a valuable tool, it's often just the starting point for more complex analyses. Combining zonal statistics with other techniques such timeseries analysis or machine learning approaches can provide even richer insights into your data.

Demo app [beta]