Load & Export Data

Common examples for loading and saving data in Fused.

Load Data

pandas

@fused.udf
def udf(path: str = "s3://fused-sample/demo_data/housing_2024.csv"):
    import pandas as pd

    return pd.read_csv(path)

duckdb

@fused.udf
def udf(path: str = "s3://fused-sample/demo_data/housing_2024.parquet"):
    import duckdb

    conn = duckdb.connect()
    result = conn.execute(f"""
        SELECT *
        FROM '{path}'
        LIMIT 10
    """).df()

    return result

From other UDFs

@fused.udf
def udf(bounds: fused.types.Bounds):
    overture_udf = fused.load('https://github.com/fusedio/udfs/tree/main/public/Overture_Maps_Example/')
    buildings = fused.run(overture_udf, bounds=bounds, theme='buildings', overture_type='building')

    return buildings

Download data to shared Fused mount

@fused.udf
def udf(url='https://www2.census.gov/geo/tiger/TIGER_RD18/STATE/11_DISTRICT_OF_COLUMBIA/11/tl_rd22_11_bg.zip'):
    out_path = fused.download(url=url, file_path='out.zip')
    return str(out_path)

Files will be written to /mount/tmp/, where any other UDF can then access them.

Read more in the fused.download() documentation.
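Once the archive is on the shared mount, another UDF can open it by path. The sketch below is a plain Python helper, not part of the Fused API: `list_archive` is a hypothetical name, and it assumes you pass in the path string returned by the download UDF above. It only lists what the zip contains, which is a quick way to confirm the download before reading any files.

```python
import zipfile
from pathlib import Path


def list_archive(path: str) -> list[str]:
    """Return the names of the files stored in a zip archive."""
    archive = Path(path)
    if not archive.exists():
        # The path is assumed to be the value returned by the download UDF.
        raise FileNotFoundError(f"{archive} not found; run the download first")
    with zipfile.ZipFile(archive) as zf:
        return zf.namelist()
```

The same body could be placed inside a function decorated with @fused.udf, so the check runs where the mount is available.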

Export Data

Fused managed storage: fd://

df.to_parquet("fd://my-dataset/data.parquet")

Read more about the fd:// S3 bucket

Fused mount disk: /mnt/cache

df.to_parquet("/mnt/cache/data.parquet")

Read more about the /mnt/cache mount disk

AWS S3: s3://

df.to_parquet("s3://my-bucket/data.parquet")

Google Cloud Storage: gcs://

df.to_parquet("gcs://my-bucket/data.parquet")

Use as API (No file saving required)

You can call your UDFs directly as APIs, so you may not need to save your data at all.

After creating a Shared Token for your UDF, you can change the output format of your HTTPS endpoint:

https://fused.io/.../run/file?dtype_out_vector=json

Tabular data downloads:

?dtype_out_vector=csv      # CSV download
?dtype_out_vector=geojson  # GeoJSON download
?dtype_out_vector=parquet  # Parquet download
?dtype_out_vector=json     # JSON download
?dtype_out_vector=mvt      # Mapbox Vector Tile download

Image data downloads:

?dtype_out_raster=png   # PNG image
?dtype_out_raster=tiff  # GeoTIFF download
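If you build these URLs in code, the parameter can be appended with the standard library rather than by string concatenation. This is a minimal sketch: `with_output_format` is a hypothetical helper, not a Fused function, and it assumes the only formats available are the ones listed above. The endpoint you pass in is your own shared-token URL.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Formats listed above for each output type.
VECTOR_FORMATS = {"csv", "geojson", "parquet", "json", "mvt"}
RASTER_FORMATS = {"png", "tiff"}


def with_output_format(endpoint: str, fmt: str) -> str:
    """Return `endpoint` with the matching dtype_out_* parameter set to `fmt`."""
    if fmt in VECTOR_FORMATS:
        key = "dtype_out_vector"
    elif fmt in RASTER_FORMATS:
        key = "dtype_out_raster"
    else:
        raise ValueError(f"unsupported output format: {fmt!r}")

    parts = urlsplit(endpoint)
    # Keep existing parameters, dropping any earlier value for the same key.
    query = [
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k != key
    ]
    query.append((key, fmt))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Calling the helper again with a different format replaces the earlier value instead of adding a second, conflicting parameter.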

Integrations

Call your UDFs from other tools after creating a Shared Token for your UDF:

DuckDB

select * from read_parquet('https://fused.io/.../run/file?');

Curl

curl -L -XGET 'https://fused.io/.../run/file?'
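The same request can be made from Python with only the standard library. The sketch below is one possible equivalent of the curl command, not an official client: `build_request` and `fetch` are hypothetical names, and the endpoint argument is assumed to be your own shared-token URL.

```python
import urllib.request


def build_request(endpoint: str) -> urllib.request.Request:
    """Build a GET request for a UDF endpoint (nothing is sent yet)."""
    return urllib.request.Request(endpoint, method="GET")


def fetch(endpoint: str, timeout: float = 30.0) -> bytes:
    """Send the request and return the raw response body."""
    # urlopen follows redirects by default, like curl's -L flag.
    with urllib.request.urlopen(build_request(endpoint), timeout=timeout) as resp:
        return resp.read()
```

`fetch` returns bytes, so the caller decides how to decode them based on the output format requested in the URL.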

Google Sheets

=IMPORTDATA("https://fused.io/.../run/file?")

Notion

  • Use an /embed block with the UDF endpoint: 'https://fused.io/.../run/file?'