Writing Geospatial Data

When working with geospatial data in Fused, we recommend saving files in these formats:

  • Vector data: GeoParquet
  • Raster data: Cloud Optimized GeoTIFF (COG)

Vector: GeoParquet

@fused.udf
def udf(path: str = "s3://fused-sample/demo_data/subway_stations.geojson"):
    import geopandas as gpd

    gdf = gpd.read_file(path)

    # Process data...

    # Save to your Fused bucket
    username = fused.api.whoami()['handle']
    output_path = f"fd://{username}/subway_stations.parquet"
    gdf.to_parquet(output_path)

    return f"File saved to {output_path}"

Raster: Cloud Optimized GeoTIFF (COG)

@fused.udf
def udf(path: str = "s3://fused-sample/demo_data/satellite_imagery/wildfires.tiff"):
    import rasterio
    import numpy as np

    # Read the raster data
    with rasterio.open(path) as src:
        data = src.read()
        profile = src.profile

    # Process the data
    processed_data = np.where(data > np.percentile(data, 80), 255, 0).astype(np.uint8)

    # Update profile for writing
    profile.update({
        'driver': 'GTiff',
        'compress': 'lzw',
        'dtype': 'uint8'
    })

    # Write to Fused's shared disk (accessible to all UDFs in org)
    username = fused.api.whoami()['handle']
    output_path = f"/mnt/cache/wildfires_processed_{username}.tif"

    with rasterio.open(output_path, 'w', **profile) as dst:
        dst.write(processed_data)

    return f"File saved to shared disk at {output_path}"

Large Datasets: fused.ingest()

For large geospatial datasets, use fused.ingest() to create optimized, geo-partitioned files. This enables efficient spatial queries on datasets of any size.

import fused

# Get your user handle
user = fused.api.whoami()['handle']

# Ingest Washington DC Census data
job = fused.ingest(
    input="https://www2.census.gov/geo/tiger/TIGER_RD18/LAYER/TRACT/tl_rd22_11_tract.zip",
    output=f"fd://{user}/data/census/partitioned/",
)

job.run_batch()

You can tail logs to see how the job is progressing:

fused.api.job_tail_logs("your-job-id")
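
Once the job finishes, the partitioned output supports the efficient spatial queries mentioned above: a tile UDF can read only the partitions that intersect the requested area. A minimal sketch, assuming the table_to_tile helper from Fused's public common utilities and the output path used in the ingest example (the handle placeholder, parameter types, and helper names follow the public Fused UDF examples and may differ across releases):

@fused.udf
def udf(bbox: fused.types.TileGDF = None, table_path: str = "fd://your-handle/data/census/partitioned/"):
    # Load Fused's public "common" utilities, which include table_to_tile
    utils = fused.load("https://github.com/fusedio/udfs/tree/main/public/common/").utils

    # Read only the chunks of the partitioned table that intersect this tile's bbox
    return utils.table_to_tile(bbox, table=table_path)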

Learn more about geospatial data ingestion.