
Aggregate

After ingesting a dataset into H3 hexagons you can visualize it in Fused, but you can also work with the data directly.

This page shows how to aggregate data at different resolutions or create derivative layers.

Aggregating H3 Data

Example aggregating elevation data across the US at different H3 resolutions:

@fused.udf
def udf(
    bounds: fused.types.Bounds = [-74.556, 40.400, -73.374, 41.029],  # Default to full NYC
):
    path = "s3://fused-asset/hex/copernicus-dem-90m/"
    hex_reader = fused.load("https://github.com/fusedio/udfs/tree/8024b5c/community/joris/Read_H3_dataset/")

    df = hex_reader.read_h3_dataset(path, bounds, res=None)  # res=None lets the hex_reader determine the resolution based on the current bounds

    return df
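
To inspect the result outside the map view, you can also run the UDF directly. A minimal usage sketch, assuming an environment where fused is installed and authenticated:

# Minimal usage sketch (assumes fused is installed and you are authenticated)
df = fused.run(udf, bounds=[-74.556, 40.400, -73.374, 41.029])
print(df.head())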
Using the existing ingested H3 reader

You can use hex_reader.read_h3_dataset directly and specify:

  • res to read the data at a specific resolution
  • value to filter on a specific value (especially useful for categorical data)

This reader uses DuckDB under the hood, so you can run additional queries on the result after reading it, as shown in Derivative layers below.
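
For example, a minimal sketch of reading at a fixed resolution, filtering to a value, and following up with a DuckDB query. The path and bounds come from the example above; the resolution, value, and the column being counted are illustrative assumptions:

# Illustrative sketch: read at a fixed resolution and filter to a single value
df = hex_reader.read_h3_dataset(path, bounds, res=6, value=1)  # res and value are example inputs

# The result is a regular DataFrame, so you can query it again with DuckDB
common = fused.load("https://github.com/fusedio/udfs/tree/5b11e17/public/common/")
con = common.duckdb_connect()
summary = con.query("SELECT COUNT(*) AS n_cells FROM df").to_df()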

Aggregate at lower resolutions

H3 hexagons allow for aggregation at different resolutions, and the ingestion process already creates overviews at several H3 resolutions.

Numerical data

Example numerical values:

  • Temperature
  • Elevation
  • Income
  • Population

Aggregating means:

  • Summing values (e.g. how many people live in an H3 cell in total)
  • Taking the mean / max / min / stddev (e.g. what is the average temperature in an H3 cell); see the sketch below
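
A minimal sketch of what this looks like with the h3 package and pandas, rolling values up to a coarser parent resolution (the hex and data column names and the target resolution are illustrative assumptions):

import h3.api.basic_int as h3

# Sketch: roll numerical values up to a coarser parent resolution (column names are illustrative)
target_res = 5
df["hex_parent"] = df["hex"].apply(lambda h: h3.cell_to_parent(h, target_res))
agg = df.groupby("hex_parent")["data"].agg(["sum", "mean", "max", "min", "std"]).reset_index()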

Example: Aggregating elevation across the US at different H3 levels

@fused.udf
def udf(
    bounds: fused.types.Bounds = [-74.556, 40.400, -73.374, 41.029],  # Default to full NYC
    res: int = 4,  # Default to H3 resolution 4
):
    path = "s3://fused-asset/hex/copernicus-dem-90m/"
    hex_reader = fused.load("https://github.com/fusedio/udfs/tree/dd40354/community/joris/Read_H3_dataset/")
    df = hex_reader.read_h3_dataset(
        path,
        bounds,
        res=res,
    )
    return df
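
As a usage sketch, you could run this UDF at two resolutions and compare how many cells come back (assuming fused.run is available in your environment; the resolution values are illustrative):

# Coarser resolution -> fewer, larger hexagons; finer resolution -> more, smaller hexagons
coarse = fused.run(udf, res=3)
fine = fused.run(udf, res=6)
print(len(coarse), len(fine))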

Categorical data

Example categorical data:

  • Land Use
  • Crop Type
  • Zone

Aggregating means:

  • Counting values (e.g. how many different crop types are there in an H3 cell)
  • Taking the mode (e.g. what is the most common crop type in an H3 cell); see the sketch below
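
As with the numerical case, a minimal sketch with the h3 package and pandas (the column names and target resolution are illustrative assumptions):

import h3.api.basic_int as h3

# Sketch: aggregate categorical values up to a coarser parent resolution (column names are illustrative)
target_res = 5
df["hex_parent"] = df["hex"].apply(lambda h: h3.cell_to_parent(h, target_res))
agg = df.groupby("hex_parent")["data"].agg(
    n_classes="nunique",              # number of distinct classes per parent cell
    mode=lambda s: s.mode().iloc[0],  # most common class per parent cell
).reset_index()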

Example: Aggregating corn coverage (CDL) across the US at different H3 levels

@fused.udf
def udf(
    bounds: fused.types.Bounds = [-127.54220803198237, 10.667151173068717, -66.93703570835524, 55.22298640160706],
    res: int = None,  # if left to None, hex_reader will determine resolution itself
    data_value: int = 1,  # 1=Corn in CDL for this example
    year: int = 2024,
):
    import h3.api.basic_int as h3

    path = f"s3://fused-asset/hex/cdls_v8/year={year}/"

    common = fused.load("https://github.com/fusedio/udfs/tree/6dd2c4e/public/common/")
    hex_reader = fused.load("https://github.com/fusedio/udfs/tree/f2b3909/community/joris/Read_H3_dataset/")

    # Read all hexagons where the CDL value matches the requested data class (e.g. Corn)
    df = hex_reader.read_h3_dataset(path, bounds, res=res, value=data_value)
    print(df.T)

    if 'pct' not in df.columns:
        # Dynamically compute the % of each hex covered by the specified crop type
        data_res = h3.get_resolution(df["hex"].iloc[0])
        print(f"{data_res=}")

        con = common.duckdb_connect()
        df = con.query(f"""
            SELECT
                hex,
                SUM(area) AS total_area,
                ANY_VALUE(data) AS data,
                h3_get_hexagon_area_avg({data_res}, 'm^2') AS hex_area,
                (SUM(area) / h3_get_hexagon_area_avg({data_res}, 'm^2')) * 100 AS pct
            FROM df
            WHERE data == {data_value}
            GROUP BY hex
        """).to_df()

    print(df.shape)
    if df.shape[0] > 0:
        return df
    else:
        return None

Derivative layers

You can also create derivative layers by writing custom logic after reading the data with hex_reader.read_h3_dataset.

Example: Calculating slope from elevation

@fused.udf
def udf(
    bounds: fused.types.Bounds = [-74.556, 40.400, -73.374, 41.029],  # Default to full NYC
):
    path = "s3://fused-asset/hex/copernicus-dem-90m/"

    hex_reader = fused.load("https://github.com/fusedio/udfs/tree/8024b5c/community/joris/Read_H3_dataset/")
    df = hex_reader.read_h3_dataset(path, bounds, res=None)

    # Slope calculation using h3_grid_ring
    common = fused.load("https://github.com/fusedio/udfs/tree/5b11e17/public/common/")
    con = common.duckdb_connect()
    qr = f"""
        WITH hex_neighbors AS (
            SELECT
                h1.hex AS hex,
                h1.data_avg AS elevation,
                UNNEST(h3_grid_ring(h1.hex, 1)) AS neighbor_hex
            FROM df h1  -- Notice that DuckDB allows you to directly call the dataframe by its name
        ),
        neighbor_data AS (
            SELECT
                n.hex,
                n.elevation,
                n.neighbor_hex,
                df2.data_avg AS neighbor_elevation,
                h3_get_hexagon_edge_length_avg(h3_get_resolution(n.hex), 'm') AS edge_length_m
            FROM hex_neighbors n
            JOIN df df2 ON n.neighbor_hex = df2.hex
        ),
        calculations AS (
            SELECT
                hex,
                elevation,
                DEGREES(ATAN(ABS(elevation - neighbor_elevation) / edge_length_m)) AS slope_deg
            FROM neighbor_data
        ),
        slope_aggregated AS (
            SELECT
                hex,
                elevation,
                AVG(slope_deg) * 100 AS avg_slope_deg,
                MAX(slope_deg) * 100 AS max_slope_deg
            FROM calculations
            GROUP BY hex, elevation
        )
        SELECT
            s.hex,
            ROUND(s.elevation, 2) AS elevation,
            ROUND(s.avg_slope_deg, 2) AS avg_slope_deg,
            ROUND(s.max_slope_deg, 2) AS max_slope_deg
        FROM slope_aggregated s;
    """
    processed_df = con.execute(qr).df()

    print(processed_df.describe())

    return processed_df