Python SDK
The Fused Python SDK (fused-py) contains utility functions that can be called from Workbench and can also be installed in Python environments to interact with Fused. Use it to run Fused UDFs from Jupyter notebooks.
Documentation overview
📄️ Top-Level Functions
@fused.udf
🗃️ API Reference
3 items
📄️ Authentication
Authenticate Jupyter Notebooks to use the Fused Python SDK
📄️ Dependencies
To keep things simple, Fused maintains a single runtime image. The Python packages it includes are listed below and can also be found in this public .txt file.
📄️ Changelog
v1.8.0 (2024-06-25)
📄️ Contribute to Fused
Overview
Install
fused-py is a breeze to get started with.
- Set up a Python environment:
python3 -m venv .venv
source .venv/bin/activate
- Run:
pip install fused
Fused supports Python versions >=3.8 and <3.12.
- Authenticate:
from fused import NotebookCredentials
credentials = NotebookCredentials()
Run this snippet from a notebook cell and follow the authentication flow, which stores a credentials file in ~/.fused/credentials.
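The install requirements above can be checked before running anything else. The following is a minimal sketch using only the standard library; the helper names python_is_supported and credentials_path are hypothetical and not part of fused-py. It assumes the version range and the credentials location are exactly as stated above.

```python
import sys
from pathlib import Path

# Supported range as stated above: >=3.8 and <3.12.
MIN_VERSION = (3, 8)
MAX_VERSION_EXCLUSIVE = (3, 12)


def python_is_supported(version=None):
    """Return True if the (major, minor) version falls in the supported range."""
    major_minor = tuple((version or sys.version_info)[:2])
    return MIN_VERSION <= major_minor < MAX_VERSION_EXCLUSIVE


def credentials_path():
    """Path where the authentication flow is described as storing credentials."""
    return Path.home() / ".fused" / "credentials"


if __name__ == "__main__":
    print("Python version supported:", python_is_supported())
    print("Credentials file present:", credentials_path().exists())
```

If the credentials file is missing, re-running the authentication snippet from a notebook cell is the likely next step.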
Quickstart
This snippet shows how to import and then run a UDF from the UDF Catalog GitHub repo.
import fused
udf = fused.load("https://github.com/fusedio/udfs/tree/main/public/DuckDB_NYC_Example")
gdf = udf.run_local()
gdf
The same can be run as a bash one-liner:
python -c "import fused; gdf = fused.load('https://github.com/fusedio/udfs/tree/main/public/DuckDB_NYC_Example').run_local(); print(gdf);"
The following "API Reference" sections show how to write, manage, and run UDFs, as well as the different utility functions designed to make your life easy.
The main thing to remember at this point is that UDFs are simply Python functions decorated with @fused.udf.
Load a UDF
Loading UDFs is a fundamental aspect of collaborative and streamlined workflows. It fosters discoverability within teams and the UDF community, promotes reuse of existing code, and simplifies your code.
UDFs can be loaded from various sources, including GitHub repositories, local files, and the Fused cloud.
Loading a UDF from a GitHub URL:
udf = fused.load("https://github.com/fusedio/udfs/tree/main/public/REM_with_HyRiver/")
Loading a UDF from a local file:
udf = fused.load("/localpath/REM_with_HyRiver/")
Loading a UDF using a Fused platform-specific identifier:
udf = fused.load("username@fused.io/REM_with_HyRiver")
Loading UDFs from GitHub repositories or local files does not require authentication to the Fused platform.
Run a UDF
☝️ Read more about File & Tile execution modes in the core concepts section.
Once a UDF is loaded, running it executes the parametrized function code and returns the function output.
By default, UDFs run as a single operation, called File mode. They can also run as spatially partitioned operations, called Tile mode.
- File. By default, UDFs run as a single operation and return all data in one call. This option suits localized and smaller outputs where fetching the entire dataset at once is feasible.
- Tile. In this mode, UDFs process data for specific geographic areas defined by bounding boxes, which can be specified in various ways. This option suits datasets that cover geographic extents and allow for spatial queries. Compute tasks are distributed among workers, with each worker processing only the fraction of data that corresponds to a specific tile, which enables parallel processing.
Which mode to use depends on the underlying dataset and the client. The mode is determined by the parameters passed to fused.run.
Run as File
To run as File, a UDF definition is run without specifying geometry parameters.
import fused
udf = fused.load("https://github.com/fusedio/udfs/tree/main/public/DuckDB_NYC_Example")
gdf = fused.run(udf=udf)
gdf
Run as Tile
To run as a Tile, a UDF definition needs to have its initial parameter specified as bbox, serving as a reserved keyword parameter. When this bounding box parameter is specified, UDFs slice data based on the bounds of individual tiles.
When a UDF is called with parameters that specify a tile, Fused will convert them to the corresponding bbox. Below are the different ways to specify tiles.
a) With lat, lng, z parameters
See the documentation for the mercantile Python library, for reference.
import fused
udf = fused.load("https://github.com/fusedio/udfs/tree/main/public/DEM_10m_Tile_Example")
fused.run(udf=udf, lat=37.1, lng=-122.0, z=13)
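To make the relationship between lat, lng, z and a tile concrete, here is a minimal sketch of the standard Web Mercator (slippy map) tiling formula, which is the scheme libraries such as mercantile implement. The function name lat_lng_to_tile is hypothetical, and the code is an illustration written for this page rather than anything taken from Fused.

```python
import math


def lat_lng_to_tile(lat, lng, z):
    """Return the (x, y) indices of the Web Mercator tile containing a point."""
    n = 2 ** z  # number of tiles along each axis at zoom level z
    x = int(math.floor((lng + 180.0) / 360.0 * n))
    lat_rad = math.radians(lat)
    y = int(math.floor((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n))
    # Clamp so points on the far edge still map to a valid tile index.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


# Same point and zoom level as the fused.run call above.
x, y = lat_lng_to_tile(37.1, -122.0, 13)
print(x, y)
```

The resulting x, y pair, together with z, identifies the same tile in the x, y, z form.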
b) With x, y, z parameters
import fused
# Load and run UDF
udf = fused.load("https://github.com/fusedio/udfs/tree/main/public/DEM_10m_Tile_Example")
fused.run(udf=udf, x=2411, y=3079, z=13)
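The conversion from a tile index to its bounding box can be sketched with the same Web Mercator tiling formula. The function name tile_bounds is hypothetical, and this is an illustration of the general technique rather than Fused's own conversion code.

```python
import math


def tile_bounds(x, y, z):
    """Return (west, south, east, north) in degrees for an XYZ tile."""
    n = 2 ** z  # number of tiles along each axis at zoom level z

    def lng_of(col):
        return col / n * 360.0 - 180.0

    def lat_of(row):
        return math.degrees(math.atan(math.sinh(math.pi * (1.0 - 2.0 * row / n))))

    # Rows count downward from the north, so row y is the tile's top edge.
    return lng_of(x), lat_of(y + 1), lng_of(x + 1), lat_of(y)


# Same tile as the fused.run call above.
west, south, east, north = tile_bounds(2411, 3079, 13)
print(west, south, east, north)
```

The four values returned are in the same (west, south, east, north) order that shapely.geometry.box expects.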
c) Shapely box (coming soon)
Specify the bounding box with a shapely.geometry.box type.
import fused
from shapely.geometry import box
# Load and run UDF
udf = fused.load("https://github.com/fusedio/udfs/tree/main/public/LULC_Tile_Example")
fused.run(udf=udf, bbox=box(-77.34375, 38.41055, -77.167968, 38.54816))
d) Shapely polygon (coming soon)
Specify the bounding box with a shapely.geometry.Polygon type.
import fused
from shapely.geometry import Polygon
# Define bbox polygon
polygon = Polygon([[-77.16796, 38.54816], [-77.16796, 38.41055], [-77.34375, 38.41055], [-77.34375, 38.54816], [-77.16796, 38.54816]])
# Load and run UDF
udf = fused.load("https://github.com/fusedio/udfs/tree/main/public/LULC_Tile_Example")
fused.run(udf=udf, bbox=polygon)
e) GeoDataFrame
Specify the bounding box with a geopandas.geodataframe.GeoDataFrame type.
import fused
import geopandas as gpd
# Define GeoDataFrame
gdf = gpd.read_file('{"type": "FeatureCollection", "features": [{"type": "Feature", "properties": {}, "geometry": {"type": "Polygon", "coordinates": [[[-77.16796875, 38.54816542304658], [-77.16796875, 38.410558250946075], [-77.34375, 38.410558250946075], [-77.34375, 38.54816542304658], [-77.16796875, 38.54816542304658]]]}}]}')
# Load and run UDF
udf = fused.load("https://github.com/fusedio/udfs/tree/main/public/LULC_Tile_Example")
fused.run(udf=udf, bbox=gdf)
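The File and Tile distinction described in this section can be summarized in a small helper. This is only a sketch of the rule as stated above (no geometry parameters means File; a bbox, an x, y, z triple, or a lat, lng, z triple means Tile). The name execution_mode and the handling of incomplete parameter sets are assumptions made for illustration, not behavior confirmed for fused.run.

```python
# Parameter combinations that this section describes as specifying a tile.
TILE_PARAM_SETS = [
    {"bbox"},
    {"x", "y", "z"},
    {"lat", "lng", "z"},
]


def execution_mode(**params):
    """Classify a set of run parameters as "Tile" or "File"."""
    given = {name for name, value in params.items() if value is not None}
    for required in TILE_PARAM_SETS:
        if required <= given:  # every parameter of the set was supplied
            return "Tile"
    # Assumption: anything else, including incomplete sets, counts as File.
    return "File"


print(execution_mode())
print(execution_mode(x=2411, y=3079, z=13))
print(execution_mode(lat=37.1, lng=-122.0, z=13))
```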
Save a UDF
UDFs can be saved to the local file system, to the Fused cloud, and to GitHub.
- UDFs saved to the Fused cloud can be used as HTTP endpoints.
- UDFs saved to the local file system or GitHub can be loaded with fused.load as described above.
First, create a UDF object.
import fused
@fused.udf
def my_udf():
    return "Hello from Fused!"
Save locally as a directory:
my_udf.to_directory("my_udf")
Save locally as a .zip file:
my_udf.to_file("my_udf.zip")
Save to a GitHub gist:
my_udf.to_gist()
Save remotely to Fused (under the same name as the function object):
my_udf.to_fused()
UDFs saved to file systems are structured as a directory, which makes them easy to share and transport. Each UDF, like Sample_UDF, is contained within its own subdirectory within the public directory, along with its documentation, code, metadata, and utility function code. This means they can be thought of as a standalone Python package.
└── Sample_UDF
    ├── README.md
    ├── Sample_UDF.py
    ├── meta.json
    └── utils.py
Files relevant to each UDF are:
- README.md: Provides details of the UDF's purpose and how it works.
- Sample_UDF.py: This eponymous Python file contains the UDF's business logic as a Python function decorated with @fused.udf.
- meta.json: This file contains metadata needed to show the UDF in the UDF Catalog.
- utils.py: This (optional) Python file contains helper functions the UDF imports and references.
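The layout above can be checked with a few lines of standard-library Python before sharing a UDF directory. This is a minimal sketch: the helper missing_udf_files is hypothetical, and it assumes the required files are exactly the three non-optional ones listed above. File names are compared case-insensitively.

```python
import tempfile
from pathlib import Path


def missing_udf_files(udf_dir):
    """Return the required files that are absent from a UDF directory."""
    udf_dir = Path(udf_dir)
    # utils.py is optional, so it is not part of the required set.
    required = ["README.md", f"{udf_dir.name}.py", "meta.json"]
    present = {p.name.lower() for p in udf_dir.iterdir() if p.is_file()}
    return [name for name in required if name.lower() not in present]


# Build a throwaway directory that lacks meta.json and check it.
with tempfile.TemporaryDirectory() as tmp:
    udf_dir = Path(tmp) / "Sample_UDF"
    udf_dir.mkdir()
    for name in ["README.md", "Sample_UDF.py"]:
        (udf_dir / name).write_text("")
    print(missing_udf_files(udf_dir))
```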
Typing
Fused UDFs support Python function annotations. Annotated parameters convert to the specified type before the UDF is called.
This is important to ensure that parameters serialized on HTTP calls resolve to the intended type. For example, the UDF below takes an integer and a dictionary, and is annotated as follows.
from typing import Dict
import fused
@fused.udf
def udf(my_param: int, my_dict: Dict):
    print(my_param, type(my_param)) # int
    print(my_dict, type(my_dict)) # Dict
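To illustrate why annotations matter for parameters that arrive as strings over HTTP, here is a minimal sketch of annotation-driven conversion. It is not Fused's implementation: coerce_params and the example function are hypothetical, and only a few simple types are handled.

```python
import json
from typing import Dict, get_origin, get_type_hints


def coerce_params(func, raw_params):
    """Convert string parameters to the types annotated on func."""
    hints = get_type_hints(func)
    coerced = {}
    for name, raw in raw_params.items():
        annotation = hints.get(name)
        # Bare typing aliases such as Dict resolve to their builtin origin.
        target = get_origin(annotation) or annotation
        if target is bool:
            coerced[name] = raw.strip().lower() in ("true", "1")
        elif target in (int, float):
            coerced[name] = target(raw)
        elif target in (dict, list):
            coerced[name] = json.loads(raw)
        else:
            coerced[name] = raw  # unannotated or unhandled: keep the string
    return coerced


def example(my_param: int, my_dict: Dict, label=None):
    return my_param, my_dict, label


args = coerce_params(example, {"my_param": "3", "my_dict": '{"a": 1}', "label": "x"})
print(args)
```

The bool branch is handled separately because calling bool() on any non-empty string, including "false", returns True.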
Supported types
This feature is under active development. Presently supported types are int, float, bool, list, dict, List, Dict, Iterable, uuid.UUID, Optional[], and gpd.GeoDataFrame from a GeoJSON. Union is not supported. Parameters that are not annotated are handled as strings.
Fused also exposes special types to specify whether the output in Workbench should be handled as Tile or File. These are fused.types.TileXYZ and fused.types.TileGDF respectively. The bbox parameter is typed as fused.types.Bbox. You can read about them in the bbox object types section of the documentation.
The gpd.GeoDataFrame type
A common pattern in geospatial workflows is to run operations on user-defined polygons. UDFs can accept a gpd.GeoDataFrame type parameter if it is passed as a stringified GeoJSON object. This is because Fused needs a serializable object to pass to the UDF via an API call, and GeoDataFrames don't serialize well. To address this, the gpd.GeoDataFrame type signals Fused to typecast the input GeoJSON string to a GeoDataFrame.
For example, the following UDF accepts a gpd.GeoDataFrame type for the target_gdf parameter.
import geopandas as gpd
@fused.udf
def my_udf(target_gdf: gpd.GeoDataFrame=None):
    return target_gdf
When calling the UDF with a GeoJSON string for the target_gdf parameter, Fused will automatically convert the string to a GeoDataFrame object.
import fused
import geopandas as gpd
import json
target_geom = json.dumps({"type":"FeatureCollection","features":[{"type":"Feature","properties":{},"geometry":{"type":"Polygon","coordinates":[[[-122.51121183018593,37.77096317381872],[-122.47612122306056,37.77228925202057],[-122.44351531587931,37.77597823096521],[-122.44362991745042,37.7664402812457],[-122.50998289649768,37.76279226591039],[-122.51121183018593,37.77096317381872]]]}}]})
gdf = fused.run(my_udf, target_gdf=target_geom)
Note that the parameter must not be named bbox because fused.run will handle that reserved parameter name in a special way.
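When the polygon of interest is a simple rectangle, the GeoJSON string for a parameter such as target_gdf can be built with the standard library alone. This is a minimal sketch; bounds_to_geojson is a hypothetical helper, and the coordinates are illustrative values near the example above.

```python
import json


def bounds_to_geojson(west, south, east, north):
    """Serialize a rectangle as a GeoJSON FeatureCollection string."""
    # GeoJSON polygon rings must be closed: the last point repeats the first.
    ring = [[west, south], [east, south], [east, north], [west, north], [west, south]]
    collection = {
        "type": "FeatureCollection",
        "features": [
            {
                "type": "Feature",
                "properties": {},
                "geometry": {"type": "Polygon", "coordinates": [ring]},
            }
        ],
    }
    return json.dumps(collection)


target_geom = bounds_to_geojson(-122.51, 37.76, -122.44, 37.78)
print(target_geom)
```

The resulting string has the same FeatureCollection shape as the hand-written target_geom in the example above.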