Skip to main content

Python SDK

The Fused Python SDK (fused-py) contains utility functions that can be called from Workbench and can also be installed in Python environments to interact with Fused. Use it to run Fused UDFs from Jupyter notebooks.

Documentation overview

Install

fused-py is a breeze to get started with.

  1. Set up a Python environment:
python3 -m venv .venv
source .venv/bin/activate
  1. Run:
pip install fused
note

Fused support Python versions >=3.8 to <3.12.

  1. Authenticate:
from fused import NotebookCredentials
credentials = NotebookCredentials()

Run this snippet from a Notebook Cell and follow the authentication flow, which will store a credentials file in ~/.fused/credentials.

Quickstart

This snippet shows how to import and then run a UDF from the UDF Catalog GitHub repo.

import fused

udf = fused.load("https://github.com/fusedio/udfs/tree/main/public/DuckDB_NYC_Example")
gdf = udf.run_local()
gdf

Similarly, as a bash oneliner.

python -c "import fused; gdf = fused.load('https://github.com/fusedio/udfs/tree/main/public/DuckDB_NYC_Example').run_local(); print(gdf);"

The following "API Reference" sections show how to write, manage, and run UDFs, as well as the different utility functions designed to make your life easy.

The main thing to remember at this point is that UDFs are simply Python functions decorated with @fused.udf.

Load a UDF

Loading UDFs is a fundamental aspect of collaborative and streamlined workflows. It fosters discoverability within teams and the UDF community, promotes reuse of existing code, and simplifies your code.

UDFs can be loaded from various sources, including GitHub repositories, local files, and the Fused cloud.

Loading a UDF from a GitHub URL:

udf = fused.load("https://github.com/fusedio/udfs/tree/main/public/REM_with_HyRiver/")

Loading a UDF from a local file:

udf = fused.load("/localpath/REM_with_HyRiver/")

Loading a UDF using a Fused platform-specific identifier:

udf = fused.load("username@fused.io/REM_with_HyRiver")
note

Loading UDFs from GitHub repositories or local files does not require authentication to the Fused platform.

Run a UDF

☝️ Read more about File & Tile execution modes in the core concepts section.

Once a UDF is loaded, running it executes the parametrized function code and returns the function output.

UDFs by default run as a single operation, called File mode, and can run as spatially partitioned, called Tile.

  • File. By default, UDFs run as a single operation and return all data in one call. This option is suitable for localized and smaller outputs where fetching the entire dataset at once is feasible.
  • Tile. In this mode, UDFs process data for specific geographic areas defined by predefined bounding boxes. These bounding boxes can be specified in various ways. This option is suitable for datasets that cover geographic extents and allow for spatial queries. Compute tasks are distributed among worker, with each worker processing only the fraction of data corresponding to a specific tile. This enables parallel processing and efficient computation.

Deciding which to use is based on the underlying dataset and the client. This is specified by the parameters of fused.run.

Run as File

To run as File, a UDF definition is run without specifying geometry parameters.

import fused

udf = fused.load("https://github.com/fusedio/udfs/tree/main/public/DuckDB_NYC_Example")
gdf = fused.run(udf=udf)
gdf

Run as Tile

To run as a Tile, a UDF definition needs to have its initial parameter specified as bbox, serving as a reserved keyword parameter. When this bounding box parameter is specified, UDFs slice data based on the bounds of individual tiles.

When a UDF is called with parameters that specify a tile, Fused will convert them to the corresponding bbox. Below are the different ways to specify tiles.

a) With lat, lng, z parameters

See the documentation for the mercantile Python library, for reference.

import fused

udf = fused.load("https://github.com/fusedio/udfs/tree/main/public/DEM_10m_Tile_Example")
fused.run(udf=udf, lat=37.1, lng=-122.0, z=13)

b) With x, y , z parameters

import fused

# Load and run UDF
udf = fused.load("https://github.com/fusedio/udfs/tree/main/public/DEM_10m_Tile_Example")
fused.run(udf=udf, x=2411, y=3079, z=13)

c) Shapely box (coming soon)

Specify the bounding box with a shapely.geometry.box type.

import fused
from shapely.geometry import box

# Load and run UDF
udf = fused.load("https://github.com/fusedio/udfs/tree/main/public/LULC_Tile_Example")
fused.run(udf=udf, bbox=box(-77.34375, 38.41055, -77.167968, 38.54816))

d) Shapely polygon (coming soon)

Specify the bounding box with a shapely.geometry.Polygon type.

import fused
from shapely.geometry import Polygon

# Define bbox polygon
polygon = Polygon([[-77.16796, 38.54816], [-77.16796, 38.41055], [-77.34375, 38.41055], [-77.34375, 38.54816], [-77.16796, 38.54816]])

# Load and run UDF
udf = fused.load("https://github.com/fusedio/udfs/tree/main/public/LULC_Tile_Example")
fused.run(udf=udf, bbox=polygon)

e) GeoDataFrame

Specify the bounding box with a geopandas.geodataframe.GeoDataFrame type.

import fused
import geopandas as gpd

# Define GeoDataFrame
gdf = gpd.read_file('{"geometry": {"type": "Polygon", "coordinates": [[[-77.16796875, 38.54816542304658], [-77.16796875, 38.410558250946075], [-77.34375, 38.410558250946075], [-77.34375, 38.54816542304658], [-77.16796875, 38.54816542304658]]]}}]')

# Load and run UDF
udf = fused.load("https://github.com/fusedio/udfs/tree/main/public/LULC_Tile_Example")
fused.run(udf=udf, bbox=gdf)
note

The fused.run convenience function wraps run_tile and run_file functions, which can optionally be used.

Save a UDF

UDFs can be saved to the local file system, to the Fused cloud, and to GitHub.

  • UDFs saved to the Fused cloud can be used as HTTP endpoints.
  • UDFs saved to the local file system or GitHub can be loaded with fused.load as described above.

First, create a UDF object.

import fused

@fused.udf
def my_udf():
return "Hello from Fused!"

Save locally as a directory:

my_udf.to_directory("my_udf")

Save locally as a .zip file:

my_udf.to_file("my_udf.zip")

Save to a GitHub gist:

my_udf.to_gist()

Save remotely to Fused (under the same name as the function object):

my_udf.to_fused()

UDFs saved to file systems are structured as a directory, which makes them easy to share and transport. Each UDF, like Sample_UDF, is contained within its own subdirectory within the public directory - along with its documentation, code, metadata, and utility function code. This means they can be thought of as a standalone Python package.

└── Sample_UDF
├── README.MD
├── Sample_UDF.py
├── meta.json
└── utils.py

Files relevant to each UDF are:

  • README.md Provides details of the UDF's purpose and how it works.
  • Sample_UDF.py This eponymous Python file contains the UDF's business logic as a Python function decorated with @fused.udf.
  • meta.json This file contains metadata needed to show the UDF in the UDF Catalog.
  • utils.py This (optional) Python file contains helper functions the UDF imports and references.

Typing

Fused UDFs support Python function annotations. Annotated parameters convert to the specified type before the UDF is called.

This is important to ensure that parameters serialized on HTTP calls resolve to the intended type. For example, the UDF below takes an integer and a dictionary, and is annotated as follows.

from typing import Dict
import fused

@fused.udf
def udf(my_param: int, my_dict: Dict):
print(my_param, type(my_param)) # int
print(my_dict, type(my_dict)) # Dict

Supported types

This feature is under active development. Presently supported types are int, float, bool, list, dict, List, Dict, Iterable, uuid.UUID, Optional[], and gpd.GeoDataFrame from a GeoJSON. Union is not supported. Parameters that are not annotated are handled as strings.

Fused also exposes special types to specify whether the output in Workbench should be handled as Tile or File. These are fused.types.TileXYZ and fused.types.TileGDF respectively. The bbox parameter is typed as fused.types.Bbox. You can read about them in the bbox object types section of the documentation.

The gpd.GeoDataFrame type

A common pattern in geospatial is to run operations on used-defined polygons. UDFs can accept a gpd.GeoDataFrame type parameter if serialized as a stringified GeoJSON object. This is because Fused needs a serializable object to pass to the UDF via an API call and GeoDataFrames don't serialize well. To address this, the gpd.GeoDataFrame type signals Fused to typecast the input GeoJSON string to a GeoDataFrame.

For example the following UDF accepts a gpd.GeoDataFrame type for the target_gdf parameter.

import geopandas as gpd

@fused.udf
def my_udf(target_gdf: gpd.GeoDataFrame=None):
return target_gdf

When calling the UDF with a GeoJSON string for the target_gdf parameter, Fused will automatically convert the string to a GeoDataFrame object.

import fused
import geopandas as gpd
import json

target_geom = json.dumps({"type":"FeatureCollection","features":[{"type":"Feature","properties":{},"geometry":{"type":"Polygon","coordinates":[[[-122.51121183018593,37.77096317381872],[-122.47612122306056,37.77228925202057],[-122.44351531587931,37.77597823096521],[-122.44362991745042,37.7664402812457],[-122.50998289649768,37.76279226591039],[-122.51121183018593,37.77096317381872]]]}}]})

gdf = fused.run(my_udf, target_gdf=target_geom)
warning

Note that the parameter must not be named bbox because fused.run will handle that reserved parameter name in a special way.