# Zonal Statistics: A Comprehensive Guide

Zonal statistics calculates aggregate statistics (e.g., mean, sum, maximum) of the pixel values of a raster that fall within areas defined by a `Polygon`

dataset. The method has diverse applications such as determining vegetation health in an agricultural field, assessing surface water availability, or approximating building heights from DSM rasters.

*Building Polygon features colored based on DSM aggregate values.*

## Applications

- Height approximation: Estimate the height of buildings from DSM
- Crop yield estimation: Determine vegetation health within crop polygons using NDVI rasters
- Water resource management: Assess surface water availability in watersheds using NDWI rasters
- Forestry: Track changes in forest cover over time within administrative boundaries

## Implementing zonal statistics

The analysis involves first determining the pixels within a polygon and then calculating their aggregate statistics, such as their count, mean, sum, or maximum.

### Implementation steps

To perform this operation, first load a raster and a vector dataset for spatially overlapping areas, then perform zonal aggregations as shown below using the `xarray`

and `GeoPandas`

libraries.

- Load overlapping the raster and vector data
- For the extent of each
`Polygon`

, clip the raster to create a mask array - Perform an aggregation operation (e.g., mean, sum, max) on each masked array
- Return the results in a
`GeoDataFrame`

, where each row corresponds to a`Polygon`

and the columns contain the value for the aggregate zonal statistics

### Example UDF

This example UDF loads the building footprint table `gdf`

and DSM raster `arr`

from S3 only for a section defined by `bbox`

, which corresponds to a map tile. It then calculates the zonal statistics and returns the results as the `GeoDataFrame`

`gdf_zonal`

. This process enables Fused Workbench to perform the calculation dynamically as users scroll and zoom in on the map.

`@fused.udf`

def udf(

bbox: fused.types.TileGDF=None, min_zoom=15, table="s3://fused-asset/infra/building_msft_us/"

):

dsm_to_tile = fused.load("https://github.com/fusedio/udfs/tree/eda5aec/public/DSM_JAXA_Example").utils.dsm_to_tile

utils = fused.load("https://github.com/fusedio/udfs/tree/2a76b6a/public/common/").utils

gdf = utils.table_to_tile(bbox, table, min_zoom)

arr = dsm_to_tile(bbox, z_levels=[4, 6, 9, 11], verbose=False)

gdf_zonal = utils.geom_stats(gdf, arr)

return gdf_zonal

The UDF would return a `GeoDataFrame`

like the following, with an aggregate `stats`

column for each input `Polygon`

:

` geometry stats count`

660 POLYGON ((-122.39806 37.76221, -122.39806 37.7... 22.00 5

661 POLYGON ((-122.39881 37.76219, -122.39876 37.7... 21.99 193

1452 POLYGON ((-122.39619 37.76326, -122.39613 37.7... 13.50 2

1458 POLYGON ((-122.39774 37.76327, -122.39776 37.7... 11.00 28

3033 POLYGON ((-122.39680 37.76362, -122.39701 37.7... 10.20 5

...

## Conclusion

While zonal statistics is a valuable tool, it's often just the starting point for more complex analyses. Combining zonal statistics with other techniques such timeseries analysis or machine learning approaches can provide even richer insights into your data.