
Exploring Maxar Open Data

A guide showing how to use Fused to get all of Maxar's Open Data from all the available STAC catalogs and explore the imagery

Requirements

Summary

Maxar, the high-resolution satellite imagery company, offers some of its data openly, specifically for monitoring natural disasters. This data is available as a series of STAC Catalogs, one for each event.

One limitation of this setup is that each event is its own STAC Catalog, making it hard to parse through all the available Maxar Open Data when looking for specific images. While a STAC Browser is available, it still only gives us access to a single event at a time.

Maxar STAC Catalogs

In this in-depth example we'll:

  • Show how to use fused.submit() to fetch all the available STAC Catalogs all at once in parallel in just a few seconds
  • Filter for any data we want (latest event, only cloud free images, etc.)
  • Write a UDF to then visualise some of the data

Maxar's Open Data

To access Maxar's Open Data we first need to understand how it's structured. After a quick Google search we find the "Maxar ARD Open Data Catalog" containing:

  • A main catalog containing a list of all the events: https://maxar-opendata.s3.amazonaws.com/events/catalog.json
  • Every event sitting under its own events/ directory, for example: https://maxar-opendata.s3.amazonaws.com/events/BayofBengal-Cyclone-Mocha-May-23/collection.json. Each collection then contains child links to the individual acquisition collections.

We can use Workbench to write a UDF to explore one of the collections. At the time of writing one of the most recent events is the Jan 2025 Los Angeles Wildfires:

@fused.udf
def udf(event_name: str = "WildFires-LosAngeles-Jan-2025"):
    common = fused.load("https://github.com/fusedio/udfs/tree/495e84/public/common/").utils
    gdf = common.stac_to_gdf_maxar(event_name, 1000)

    print(f"{gdf.shape=}")
    return gdf

We've abstracted away some of the logic for gathering the STAC catalog inside a common function.

How Fused uses helper functions: looking at stac_to_gdf_maxar

Working with Fused UDFs also gives you the option to easily use functions defined in other UDFs. In practice this means we've created a common public UDF that contains some functions we've found useful when working with any type of data.

You can explore it yourself by reading the code directly on GitHub. If you see any functions you'd like to use, we strongly recommend you use fused.load() and pass the latest commit hash at the time you want to use it:

commit_hash = "39d93ca" # Latest commit hash of https://github.com/fusedio/udfs/blob/main/public/common/utils.py at time of writing
common = fused.load(f"https://github.com/fusedio/udfs/tree/{commit_hash}/public/common/").utils

Each Maxar event itself contains multiple collections. We created a simple function that loops over all the available UNIQUE_ID/collection.json files, reads them, and appends them into a single GeoDataFrame.

Looking at the WildFires-LosAngeles-Jan-2025/collection.json file:

{
  "type": "Collection",
  "id": "WildFires-LosAngeles-Jan-2025",
  "stac_version": "1.0.0",
  "links": [
    {"rel": "root", "href": "../collection.json", "type": "application/json"},
    {
      "rel": "child",
      "href": "./ard/acquisition_collections/103001010A705C00_collection.json",
      "type": "application/json"
    },
    {...}
  ],
  "extent": {
    "spatial": {
      "bbox": [
        [-118.83595849685837, 33.94834763200993, -117.96801495199365, 34.38301736],
        [-118.65050916791418, 34.1935474876183, -118.46364201282341, 34.33393673056412],
        [...]
      ]
    },
    "temporal": {"interval": [["2024-12-14 18:39:04Z", "2025-01-20 18:32:44Z"]]}
  },
  "title": "Los Angeles Wildfires 2025",
  "description": "Driven by strong Santa Ana winds, multiple wildfires are burning in the Los Angeles, California, area. More than 40,000 acres and more than 12,300 structures have burned; at least 19 people have died.",
  "license": "CC-BY-NC-4.0"
}

So we create stac_to_gdf_maxar to:

  • Loop over all the UNIQUE_ID/collection.json files
  • Read each collection.json file
  • Extract the metadata & extent of each collection
  • Convert the extent into a GeoDataFrame
  • Concat all into a single GeoDataFrame

Once again, you can read the code directly on GitHub to see exactly how we do this.
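If you just want the gist without leaving this page, here's a minimal sketch of that logic using pystac. The function name stac_to_gdf_maxar_sketch and the exact fields it extracts are our own simplification for illustration, not the public UDF's actual code:

import geopandas as gpd
from pystac import Collection
from shapely.geometry import box

def stac_to_gdf_maxar_sketch(event_name: str, limit: int = 1000) -> gpd.GeoDataFrame:
    # Simplified stand-in for the public UDF's stac_to_gdf_maxar helper
    base = "https://maxar-opendata.s3.amazonaws.com/events"
    event = Collection.from_file(f"{base}/{event_name}/collection.json")

    records = []
    # Each child link points to an acquisition UNIQUE_ID/collection.json
    for child in event.get_children():
        bbox = child.extent.spatial.bboxes[0]
        records.append({
            "id": child.id,
            "datetime": str(child.extent.temporal.intervals[0][0]),
            "geometry": box(*bbox),  # spatial extent -> shapely polygon
        })
        if len(records) >= limit:
            break

    # Concat everything into a single GeoDataFrame
    return gpd.GeoDataFrame(records, geometry="geometry", crs=4326)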

tip

You can easily rename your UDFs in Workbench. Rename this UDF to Maxar_Open_Data_STAC_single_catalog so we can call it later directly by name.

Make sure to save your UDF with Cmd + S (or Ctrl + S on Windows / Linux) or in the Workbench UI for these changes to take effect.

Renaming UDF

And here we get all the images for the Los Angeles Wildfires 2025 event:

Single collection Maxar STAC

Aggregating all available data

Getting all events

To be able to explore all of Maxar's Open Data Program we now need to run this specific UDF over all the available events.

We'll do this in 2 steps:

  • Fetch all the event names
  • Use fused.submit() to fetch all the STAC Collections for each event name
@fused.udf
def udf():
    import pandas as pd
    from pystac import Catalog

    @fused.cache
    def getting_stac_collection(stac_url="https://maxar-opendata.s3.amazonaws.com/events/catalog.json"):
        root_catalog = Catalog.from_file(stac_url)
        collections = root_catalog.get_collections()
        return [collection.id for collection in collections]

    collections = getting_stac_collection()
    print(f"{collections[:5]=}")

    return collections

Let's break this UDF down:

  • We're using @fused.cache to cache the Catalog request so our UDF doesn't need to repeat it each time we execute it. This avoids hitting rate limits and making the same request over and over against the same endpoint
  • We're returning a list (collections), so if you run this in Workbench you'll notice nothing shows up on the map! That's also why we print the first 5 elements.
tip

Read through the Best Practices for more handy tips on how to write efficient and easy to debug UDFs

This UDF returns a list of all the available event names currently accessible through Maxar's Open STAC Catalog:

>>> print(f"{collections[:5]=}")
collections[:5]=['BayofBengal-Cyclone-Mocha-May-23', 'Belize-Wildfires-June24', 'Brazil-Flooding-May24', 'Cyclone-Chido-Dec15', 'Earthquake-Myanmar-March-2025']

Maxar STAC Events

Preparing fused.submit() to run in parallel

We're going to use fused.submit() to run our first UDF in parallel. To do this we need a few things:

  • Prepare our inputs (in this case the names of all the events). We recommend doing this as a dataframe as it's simple to read and work with
  • Pass our first UDF to fused.submit()
@fused.udf
def udf():
    import pandas as pd
    from pystac import Catalog

    @fused.cache
    def getting_stac_collection(stac_url="https://maxar-opendata.s3.amazonaws.com/events/catalog.json"):
        root_catalog = Catalog.from_file(stac_url)
        collections = root_catalog.get_collections()
        return [collection.id for collection in collections]

    collections = getting_stac_collection()
    collections_df = pd.DataFrame({'event_name': collections})

    dfs_out = fused.submit(
        "Maxar_Open_Data_STAC_single_catalog",
        collections_df,
        debug_mode=True,  # Using debug mode to run just the 1st event at first
    )
    print(f"{dfs_out.head(3)=}")

    return dfs_out

Let's unpack this:

  • We're calling the UDF named "Maxar_Open_Data_STAC_single_catalog" that we renamed earlier over collections_df. At the time of writing this represents 46 events
  • We use fused.submit(..., debug_mode=True) to run only the first value from collections_df. This allows us to test that our fused.submit() job is written correctly.
tip

fused.submit() allows you to run a single UDF over a list / dataframe of inputs in parallel. Under the hood Fused spins up many realtime instances (see technical docs for details) that will each run the given UDF (in this case "Maxar_Open_Data_STAC_single_catalog") all at the same time.

This is a powerful way to scale a process with just a single function call.

Read the dedicated Docs section on fused.submit() for more
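To build intuition for what fused.submit() is doing, here's a rough local analogue of that parallel map using plain Python threads. This is purely illustrative (udf_fn stands in for a Fused UDF) and is not how Fused is actually implemented:

from concurrent.futures import ThreadPoolExecutor
import pandas as pd

def local_submit(udf_fn, inputs_df: pd.DataFrame, max_workers: int = 32) -> pd.DataFrame:
    # One call per row of the inputs dataframe, all running concurrently
    rows = inputs_df.to_dict("records")
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(lambda row: udf_fn(**row), rows))
    # Stitch the per-event results back into one dataframe
    return pd.concat(results, ignore_index=True)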

Maxar submit debug mode

Getting all Maxar open data

Once we're confident that our fused.submit() job setup is correct, we can remove debug_mode=True (it's set to False by default) and run our UDF across all events.

We can also increase max_workers: we have 46 events and the default max_workers is 32. We can thus ask the Fused server to spin up more instances for us so this parallel job runs even faster:

@fused.udf
def udf():
    import pandas as pd
    from pystac import Catalog

    @fused.cache
    def getting_stac_collection(stac_url="https://maxar-opendata.s3.amazonaws.com/events/catalog.json"):
        root_catalog = Catalog.from_file(stac_url)
        collections = root_catalog.get_collections()
        return [collection.id for collection in collections]

    collections = getting_stac_collection()
    collections_df = pd.DataFrame({'event_name': collections})

    dfs_out = fused.submit(
        "Maxar_Open_Data_STAC_single_catalog",
        collections_df,
        max_workers=50,  # Increasing max_workers as we have more events than the default value
    )
    print(f"{dfs_out.head(3)=}")

    return dfs_out

After a few seconds, we get back a GeoDataFrame containing all the Maxar open data STAC catalogs:

Maxar submit all STACs

This allows us to do a few different things:

  • Explore all of the available Maxar Open Data on a map directly. This helps us see what data Maxar has available that might be of interest, for example to compare image quality across areas.
  • Offer a wide variety of high-resolution imagery to query against, for example retrieving as much cloud-free imagery as possible (see the sketch below)
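As a sketch of that second point, a filter over the aggregated GeoDataFrame might look like the following. The column names ("datetime", "eo:cloud_cover") are assumptions about what the helper extracts, so adjust them to whatever your GeoDataFrame actually contains:

import pandas as pd

def filter_recent_clear(gdf, after: str = "2025-01-01", max_cloud: float = 10.0):
    # NOTE: "datetime" and "eo:cloud_cover" are assumed column names
    gdf = gdf.copy()
    gdf["datetime"] = pd.to_datetime(gdf["datetime"], utc=True)
    recent = gdf[gdf["datetime"] >= pd.Timestamp(after, tz="UTC")]
    # Keep only near-cloud-free acquisitions
    return recent[recent["eo:cloud_cover"] <= max_cloud]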

Choosing 1 Event to display

With access to all the images from Maxar, we can navigate the map and choose any image we'd like to display. Let's select one and display it in Workbench.

First we can use the Results Tab to find the URL of an image we'd like to display:

@fused.udf
def udf(
    bounds: fused.types.Bounds,
    path: str = "https://maxar-opendata.s3.amazonaws.com/events/Emilia-Romagna-Italy-flooding-may23/ard/33/031111210233/2023-05-23/1050010033C95B00-visual.tif",
    chip_len: int = 256,
    display_extent: bool = True,
):
    import rasterio
    import numpy as np
    import geopandas as gpd
    from shapely.geometry import box
    from rasterio.session import AWSSession

    # Getting just the bounds of the image so we can zoom to layer
    if display_extent:
        print("Returning extent")
        with rasterio.Env(session=AWSSession()):
            with rasterio.open(path) as src:
                bbox_gdf = gpd.GeoDataFrame(geometry=[box(*src.bounds)], crs=src.crs)
                bbox_gdf.to_crs(4326, inplace=True)

        return bbox_gdf

    # Otherwise reading the GeoTIFF
    else:
        print("Returning image")
        utils = fused.load("https://github.com/fusedio/udfs/tree/5432edc/public/common/").utils
        tiles = utils.get_tiles(bounds)

        arr = utils.read_tiff(tiles, path, output_shape=(chip_len, chip_len))
        print(f"{arr.shape=}")
        return arr

Unpacking this UDF:

  • This UDF takes:
    • a bounds object, which allows us to pass the current Workbench Map Viewport to our UDF
    • path, the S3 path to the image we want to display
    • chip_len, the pixel size we'd like each returned chip to be
note

These images can be loaded using bounds and Tile mode because Maxar has provided them as Cloud Optimized GeoTIFFs. This allows us to leverage their internal tiles & overviews and only load the data we need as we pan around the map.

We can check this by reading the metadata from the CLI with gdalinfo and seeing that each band has a Block size, meaning the file is internally tiled:

gdalinfo /vsis3/maxar-opendata/events/Cyclone-Chido-Dec15/ard/38/300200022120/2024-06-11/104001009713BA00-visual.tif

>>>
Driver: GTiff/GeoTIFF

...

Band 1 Block=512x512 Type=Byte, ColorInterp=Red
Overviews: 8704x8704, 4352x4352, 2176x2176, 1088x1088, 544x544, 272x272
Mask Flags: PER_DATASET
Overviews of mask band: 8704x8704, 4352x4352, 2176x2176, 1088x1088, 544x544, 272x272
Band 2 Block=512x512 Type=Byte, ColorInterp=Green
Overviews: 8704x8704, 4352x4352, 2176x2176, 1088x1088, 544x544, 272x272
Mask Flags: PER_DATASET
Overviews of mask band: 8704x8704, 4352x4352, 2176x2176, 1088x1088, 544x544, 272x272
Band 3 Block=512x512 Type=Byte, ColorInterp=Blue
Overviews: 8704x8704, 4352x4352, 2176x2176, 1088x1088, 544x544, 272x272
Mask Flags: PER_DATASET
Overviews of mask band: 8704x8704, 4352x4352, 2176x2176, 1088x1088, 544x544, 272x272
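To see the benefit of those overviews yourself, rasterio can read a downsampled version of the whole image by passing a small out_shape, in which case GDAL fetches a low-resolution overview instead of the full-resolution tiles. A minimal sketch, reading straight over HTTP:

import rasterio

url = (
    "https://maxar-opendata.s3.amazonaws.com/events/Cyclone-Chido-Dec15"
    "/ard/38/300200022120/2024-06-11/104001009713BA00-visual.tif"
)

with rasterio.open(url) as src:
    # Asking for 256x256 lets GDAL pick a small overview level instead of
    # downloading the full-resolution data
    preview = src.read(out_shape=(src.count, 256, 256))
    print(preview.shape)  # e.g. (3, 256, 256)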

Running the above UDF, for now we return the extent of the image:

maxar return img extent

This allows us to introduce 2 concepts in Workbench:

1. Setting a default view in Workbench

After getting the extent of our image, we're going to Zoom to layer and set this view as the default view:

This allows us to make any change we want to this UDF, or pan anywhere on the map, and always be able to zoom back to this default view!

2. Displaying the image

Now we can edit our UDF to return the image by changing display_extent to False:

@fused.udf
def udf(
    bounds: fused.types.Bounds,
    path: str = "https://maxar-opendata.s3.amazonaws.com/events/Emilia-Romagna-Italy-flooding-may23/ard/33/031111210233/2023-05-23/1050010033C95B00-visual.tif",
    chip_len: int = 256,
    display_extent: bool = False,
):
    import rasterio
    import numpy as np
    import geopandas as gpd
    from shapely.geometry import box
    from rasterio.session import AWSSession

    # Getting just the bounds of the image so we can zoom to layer
    if display_extent:
        print("Returning extent")
        with rasterio.Env(session=AWSSession()):
            with rasterio.open(path) as src:
                bbox_gdf = gpd.GeoDataFrame(geometry=[box(*src.bounds)], crs=src.crs)
                bbox_gdf.to_crs(4326, inplace=True)

        return bbox_gdf

    # Otherwise reading the GeoTIFF
    else:
        print("Returning image")
        utils = fused.load("https://github.com/fusedio/udfs/tree/5432edc/public/common/").utils
        tiles = utils.get_tiles(bounds)

        arr = utils.read_tiff(tiles, path, output_shape=(chip_len, chip_len))
        print(f"{arr.shape=}")
        return arr

This change returns the image instead of the extent, but it returns the image all at once, because Workbench is set to "File" mode by default.

File mode array return

If you reproduce this yourself and pan around the map you'll notice:

  • We see the whole image but with a relatively low resolution.
  • Nothing changes as we pan around the map (resolution doesn't change)

We can change this by setting Workbench to "Tile" mode:

Under the hood, switching to "Tile" mode tells Workbench to run this UDF not just once, but once per Mercantile tile covering the viewport. This is why you see the image being broken up into a grid of tiles.

These different modes don't change what code is being executed, as our UDF didn't change. They only change what geospatial parameters are passed:

  • "File" mode passes the bounds of the current viewport (the UDF runs once)
  • "Tile" mode passes the bounds of the current viewport broken up into a grid of tiles (the UDF runs once per tile, as illustrated in the sketch below)

Next steps

We've shown you how to:

  • Use fused.submit() to fetch all the available Maxar STAC Catalogs in parallel in just a few seconds
  • Aggregate them into a single GeoDataFrame so you can explore all of Maxar's Open Data on a map
  • Write a UDF to visualise any of the images in Workbench, in both "File" and "Tile" mode

If you want to go a bit further you could:

  • Filter the aggregated GeoDataFrame for the data you care about (latest event, only cloud-free images, etc.)
  • Read the Best Practices docs and the dedicated fused.submit() section to make your own UDFs faster and easier to debug