Zonal Statistics: A Comprehensive Guide

Zonal statistics calculates aggregate statistics (e.g., mean, sum, maximum) of the pixel values of a raster that fall within areas defined by a Polygon dataset. The method has diverse applications such as determining vegetation health in an agricultural field, assessing surface water availability, or approximating building heights from DSM rasters.

note

The dynamic map only shows data when zoomed in. If zoomed out too much we don’t load data to prevent slowing down the docs. Make sure to zoom in to see buildings

Building Polygon features colored based on DSM aggregate values.

Applications

Height approximation: Estimate the height of buildings from DSM
Crop yield estimation: Determine vegetation health within crop polygons using NDVI rasters
Water resource management: Assess surface water availability in watersheds using NDWI rasters
Forestry: Track changes in forest cover over time within administrative boundaries

Implementing zonal statistics

The analysis involves first determining the pixels within a polygon and then calculating their aggregate statistics, such as their count, mean, sum, or maximum.

Implementation steps

To perform this operation, first load a raster and a vector dataset for spatially overlapping areas, then perform zonal aggregations as shown below using the xarray and GeoPandas libraries.

Load overlapping the raster and vector data
For the extent of each Polygon, clip the raster to create a mask array
Perform an aggregation operation (e.g., mean, sum, max) on each masked array
Return the results in a GeoDataFrame, where each row corresponds to a Polygon and the columns contain the value for the aggregate zonal statistics

Example UDF

This example UDF loads the building footprint table gdf and DSM raster arr from S3 only for a section defined by bounds, which corresponds to a map tile. It then calculates the zonal statistics and returns the results as the GeoDataFrame gdf_zonal. This process enables Fused Workbench to perform the calculation dynamically as users scroll and zoom in on the map.

@fused.udf
def udf(
    bounds: fused.types.Bounds=None, min_zoom=15, table="s3://fused-asset/infra/building_msft_us/"
):
    dsm_to_tile = fused.load("https://github.com/fusedio/udfs/tree/91845c4/public/DSM_JAXA_Example").utils.dsm_to_tile
    utils = fused.load("https://github.com/fusedio/udfs/tree/2a76b6a/public/common/").utils

    gdf = utils.table_to_tile(bounds, table, min_zoom)
    arr = dsm_to_tile(bounds, z_levels=[4, 6, 9, 11], verbose=False)
    gdf_zonal = utils.geom_stats(gdf, arr)
    return gdf_zonal

The UDF would return a GeoDataFrame like the following, with an aggregate stats column for each input Polygon:

                                                geometry      stats  count
  POLYGON ((-122.39806 37.76221, -122.39806 37.7...      22.00      5
  POLYGON ((-122.39881 37.76219, -122.39876 37.7...      21.99    193
 POLYGON ((-122.39619 37.76326, -122.39613 37.7...      13.50      2
 POLYGON ((-122.39774 37.76327, -122.39776 37.7...      11.00     28
 POLYGON ((-122.39680 37.76362, -122.39701 37.7...      10.20      5
...

Conclusion

While zonal statistics is a valuable tool, it's often just the starting point for more complex analyses. Combining zonal statistics with other techniques such timeseries analysis or machine learning approaches can provide even richer insights into your data.

Applications​

Implementing zonal statistics​

Implementation steps​

Example UDF​

Conclusion​

Demo app​

Applications

Implementing zonal statistics

Implementation steps

Example UDF

Conclusion

Demo app