Ingesting Dataset to H3

Vector to Hex

The example below ingests a release of the Overture Buildings dataset into H3 hexagons.

Run this in a notebook or a Python shell, not in Workbench

The following runs on batch jobs, much like a vector dataset ingestion, so we recommend running it from a notebook or a Python shell rather than from Workbench.

# Run this in a notebook or a Python shell, not in Workbench
import fused

username = fused.api.whoami()  # Getting username to save data in your own S3 bucket
release = '2025-01-22-0'
args = [{
    'input_path': f's3://us-west-2.opendata.source.coop/fused/overture/{release}/theme=buildings/type=building/',
    'output_path': f's3://fused-users/fused/{username}/overture_overview/{release}/',
    'hex_res': 11,
}]
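
Because `args` is a list of dictionaries, the same pattern could plausibly cover several releases at once. The sketch below only builds the argument dictionaries and does not call Fused. The helper `build_args`, the placeholder username `my_user`, and the idea that each dictionary corresponds to one ingestion run are assumptions made for illustration, not confirmed behavior.

```python
def build_args(releases, username, hex_res=11):
    """Return one argument dict per release (hypothetical helper)."""
    base_in = 's3://us-west-2.opendata.source.coop/fused/overture'
    base_out = f's3://fused-users/fused/{username}/overture_overview'
    return [
        {
            'input_path': f'{base_in}/{release}/theme=buildings/type=building/',
            'output_path': f'{base_out}/{release}/',
            'hex_res': hex_res,
        }
        for release in releases
    ]

# 'my_user' is a placeholder; in practice use the value from fused.api.whoami()
args = build_args(['2024-12-18-0', '2025-01-22-0'], username='my_user')
for a in args:
    print(a['output_path'])
```

Keeping the release in both the input and output path means each release is written to its own folder, so re-running one release does not touch the others.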

In this example we simply count the number of buildings per hexagon.

# Run this in a notebook or a Python shell, not in Workbench
Overture_Hexify = fused.load("https://github.com/fusedlabs/fusedudfs/tree/main/Overture_Hexify/")
j = Overture_Hexify(arg_list=args)

j.run_remote(instance_type='r5.16xlarge', disk_size_gb=999)
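
To make "counting buildings per hexagon" concrete, here is a minimal, dependency-free sketch of the group-and-count step. It is not the `Overture_Hexify` implementation: a square grid (`toy_cell_id`) stands in for real H3 indexing, and the centroid coordinates are made-up sample points.

```python
import math
from collections import Counter

def toy_cell_id(lat, lng, cell_size_deg=0.01):
    """Stand-in for an H3 index: snap a point to a square grid cell."""
    return (math.floor(lat / cell_size_deg), math.floor(lng / cell_size_deg))

def count_per_cell(centroids, cell_size_deg=0.01):
    """Count how many points fall into each grid cell."""
    return Counter(toy_cell_id(lat, lng, cell_size_deg) for lat, lng in centroids)

# Hypothetical building centroids as (lat, lng) pairs
centroids = [
    (37.7749, -122.4194),
    (37.7751, -122.4191),  # lands in the same cell as the first point
    (37.8044, -122.2712),
]
counts = count_per_cell(centroids)
for cell, n in counts.items():
    print(cell, n)
```

With a real H3 library, `toy_cell_id` would be replaced by the library's point-to-cell function at the chosen `hex_res`; the counting step stays the same.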

Once ingested, the data is much simpler and faster to read:

common = fused.load("https://github.com/fusedio/udfs/blob/main/public/common/")

@fused.udf
def udf(bounds: fused.types.Bounds = [-122.71963771127753,36.53196328805067,-120.70395948802646,38.082911654639275]):
    # Pick an H3 resolution that matches the current map extent
    res = bounds_to_res(bounds)
    print(res)
    releases = ['2024-02-15-alpha-0', '2024-03-12-alpha-0', '2024-08-20-0', '2024-09-18-0', '2024-10-23-0', '2024-11-13-0', '2024-12-18-0', '2025-01-22-0', '2025-03-19-1', '2025-04-23-0', '2025-05-21-0']
    release1 = releases[-1]  # last release in the list
    df1 = common.read_hexfile(bounds, f"s3://fused-users/fused/sina/overture_overview/{release1}/hex{res}.parquet", clip=True)

    return df1

@fused.cache
def bounds_to_res(bounds, res_offset=0, max_res=14, min_res=3):
    # Map the estimated zoom level to an H3 resolution, clamped to [min_res, max_res]
    z = common.estimate_zoom(bounds)
    return max(min(int(3 + max(0, z - 3) / 1.7 + res_offset), max_res), min_res)
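
To see how the formula in `bounds_to_res` behaves, the sketch below applies the same arithmetic to a zoom level passed in directly, so it runs without Fused. The name `zoom_to_res` is introduced here for illustration only; in the UDF the zoom comes from `common.estimate_zoom(bounds)`.

```python
def zoom_to_res(z, res_offset=0, max_res=14, min_res=3):
    """Same arithmetic as bounds_to_res, with the zoom level given explicitly."""
    return max(min(int(3 + max(0, z - 3) / 1.7 + res_offset), max_res), min_res)

# Zoom levels at or below 3 stay at the minimum resolution,
# and very high zoom levels are capped at max_res.
for z in (0, 8, 12, 15, 25):
    print(z, zoom_to_res(z))
# 0 -> 3, 8 -> 5, 12 -> 8, 15 -> 10, 25 -> 14
```

Roughly, every 1.7 zoom levels above zoom 3 adds one H3 resolution level, and `res_offset` shifts the result up or down before clamping.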