Un-pythonic aspects of Fused
Fused UDFs look like Python functions but run in a managed, distributed environment. That environment imposes a few constraints that feel unusual if you're used to writing standard Python scripts.
Imports must be inside the UDF
In regular Python you put imports at the top of the file. In a Fused UDF they must go inside the function body:
```python
# ❌ Fails: import at module level
import pandas as pd

@fused.udf
def udf():
    return pd.DataFrame({"a": [1, 2, 3]})
```
```python
# ✅ Works: import inside the function body
@fused.udf
def udf():
    import pandas as pd
    return pd.DataFrame({"a": [1, 2, 3]})
```
Why: Fused serializes and ships your UDF function to a remote worker. Only the function body travels — the surrounding module scope does not. Any name referenced at module level won't exist on the worker when the function runs.
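The same discipline can be illustrated in plain Python, without Fused: a function whose body binds every name it uses is self-sufficient, while one that leans on module scope is not. A minimal sketch (using the standard library's `statistics` in place of pandas):

```python
# Plain-Python sketch of why module scope doesn't travel.
# Assumption: the worker receives only the function body, so every
# name the function references must be bound inside that body.

def self_contained():
    import statistics  # bound inside the body, so it always resolves
    return statistics.mean([1, 2, 3])

result = self_contained()
```

Because `statistics` is imported inside the body, the function works no matter where the body ends up running.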
Fused serializes the outputs
UDFs must return a value that Fused can serialize and send back to the caller. Anything else will fail at return time. See Output formats for the full list of supported types.
```python
@fused.udf
def udf():
    import duckdb
    con = duckdb.connect()
    return con.execute("SELECT 1 AS a")  # DuckDB cursor: cannot be serialized
```

```text
Result serialization failed: ValueError: Return value not in an expected vector format (gpd.GeoDataFrame, pd.DataFrame, gpd.GeoSeries, pd.Series, shapely geometry) Was: <class '_duckdb.DuckDBPyConnection'>
```
```python
@fused.udf
def udf():
    import duckdb
    con = duckdb.connect()
    return con.execute("SELECT 1 AS a").df()  # DataFrame: serialized and returned
```
Leverage udf.map() for parallel processing
@fused.udf turns your function into a Udf object — by default, calling it submits a job to Fused's remote workers, not a local Python call. udf.map() gives you flexibility over execution:
- `engine='remote'` (default): fans out and spins up new UDFs
- `engine='local'`: runs multiple UDFs in parallel using the current UDF's available cores and memory
```python
pool = my_udf.map(list_of_inputs)
pool.wait()
results = pool.df()
```
We built udf.map() as a simple way to parallelize jobs, either locally within your current UDF or remotely by spinning up new compute on demand.
See Scaling out UDFs.
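For intuition, the map/wait/collect pattern behaves much like a standard-library executor pool. A rough local analogue (the names here are illustrative, not Fused's API; `ThreadPoolExecutor.map` returns results in input order, similar to how `pool.df()` gathers each run's output):

```python
from concurrent.futures import ThreadPoolExecutor

# Illustrative stand-in for a UDF: any function of one input.
def my_udf(n):
    return n * n

list_of_inputs = [1, 2, 3, 4]

# Fan the inputs out across workers and collect the results
# in input order.
with ThreadPoolExecutor() as pool:
    results = list(pool.map(my_udf, list_of_inputs))
```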
Reuse UDFs with fused.load()
To reuse a UDF, call fused.load() with a UDF name or URL rather than using a Python import:
```python
other_udf = fused.load("https://github.com/fusedio/udfs/tree/main/public/DuckDB_H3_Example")
```
You're not just importing the code — you're importing the whole UDF, which lets you:
- Inspect it: `other_udf.meta`
- Execute it directly: `other_udf(params)`
- Run it in parallel: `other_udf.map(list_of_inputs)`
Fused environments come with predefined packages
Fused provides a hosted environment with the most common Python packages for data manipulation. See the full list in Dependencies.
Need a package that isn't included? Reach out to info@fused.io.