Load & Export Data
Common examples for loading and saving data in Fused.
Load Data
pandas
@fused.udf
def udf(path: str = "s3://fused-sample/demo_data/housing_2024.csv"):
import pandas as pd
return pd.read_csv(path)
duckdb
@fused.udf
def udf(path: str = "s3://fused-sample/demo_data/housing_2024.parquet"):
import duckdb
conn = duckdb.connect()
result = conn.execute(f"""
SELECT *
FROM '{path}'
LIMIT 10
""").df()
return result
From other UDFs
@fused.udf
def udf(bounds: fused.types.Bounds):
overture_udf = fused.load('https://github.com/fusedio/udfs/tree/main/public/Overture_Maps_Example/')
buildings = fused.run(overture_udf, bounds=bounds, theme='buildings', overture_type='building')
return buildings
Download data to shared Fused mount
@fused.udf
def udf(url='https://www2.census.gov/geo/tiger/TIGER_RD18/STATE/11_DISTRICT_OF_COLUMBIA/11/tl_rd22_11_bg.zip'):
out_path = fused.download(url=url, file_path='out.zip')
return str(out_path)
Files will be written to /mount/tmp/
, where any other UDF can then access them.
Read more about fused.download()
here
Export Data
Fused managed storage: fd://
df.to_parquet("fd://my-dataset/data.parquet")
Read more about about fd://
S3 Bucket
Fused mount disk: /mnt/cache
df.to_parquet("/mnt/cache/data.parquet")
Read more about about /mnt/cache
mount disk
AWS S3: s3://
df.to_parquet("s3://my-bucket/data.parquet")
Google Cloud Storage: gcs://
df.to_parquet("gcs://my-bucket/data.parquet")
Use as API (No file saving required)
You can directly call your UDFs as APIs, removing the need to even save your data at all!
After creating a Shared Token for your UDF, you can change the output format of your HTTPS endpoint:
https://fused.io/.../run/file?dtype_out_vector=json
Tabular data downloads:
?dtype_out_vector=csv # CSV download
?dtype_out_vector=geojson # GeoJSON download
?dtype_out_vector=parquet # Parquet download
?dtype_out_vector=json # JSON download
?dtype_out_vector=mvt # Mapbox Vector Tile download
Image data downloads:
?dtype_out_raster=png # PNG image
?dtype_out_raster=tiff # GeoTIFF download
Integrations
Call your UDFs from other tools after creating a Shared Token for your UDF:
DuckDB
select * from read_parquet('https://fused.io/.../run/file?');
Curl
curl -L -XGET 'https://fused.io/.../run/file?'
Google Sheets
=importData('https://fused.io/.../run/file?')
Notion
- Use
/embed
block with UDF endpoint:'https://fused.io/.../run/file?'