Load & Export Data

Common examples for loading and saving data in Fused.

Load Data

`pandas`

@fused.udf
def udf(path: str = "s3://fused-sample/demo_data/housing_2024.csv"):
    import pandas as pd
    
    return pd.read_csv(path)

`duckdb`

@fused.udf
def udf(path: str = "s3://fused-sample/demo_data/housing_2024.parquet"):
    import duckdb
    
    conn = duckdb.connect()
    result = conn.execute(f"""
        SELECT * 
        FROM '{path}'
        LIMIT 10
    """).df()
    
    return result

From other UDFs

@fused.udf
def udf(bounds: fused.types.Bounds):
    overture_udf = fused.load('https://github.com/fusedio/udfs/tree/main/public/Overture_Maps_Example/')
    buildings = fused.run(overture_udf, bounds=bounds, theme='buildings', overture_type='building')
 
    return buildings

Download data to shared Fused mount

@fused.udf
def udf(url='https://www2.census.gov/geo/tiger/TIGER_RD18/STATE/11_DISTRICT_OF_COLUMBIA/11/tl_rd22_11_bg.zip'):
    out_path = fused.download(url=url, file_path='out.zip')
    return str(out_path)

Files will be written to /mount/tmp/, where any other UDF can then access them.

Export Data

Fused managed storage: `fd://`

df.to_parquet("fd://my-dataset/data.parquet")

Fused mount disk: `/mnt/cache`

df.to_parquet("/mnt/cache/data.parquet")

Read more about about /mnt/cache mount disk

AWS S3: `s3://`

df.to_parquet("s3://my-bucket/data.parquet")

Google Cloud Storage: `gcs://`

df.to_parquet("gcs://my-bucket/data.parquet")

Use as API (No file saving required)

You can directly call your UDFs as APIs, removing the need to even save your data at all!

After creating a Shared Token for your UDF, you can change the output format of your HTTPS endpoint:

https://fused.io/.../run/file?dtype_out_vector=json

Tabular data downloads:

?dtype_out_vector=csv          # CSV download
?dtype_out_vector=geojson      # GeoJSON download
?dtype_out_vector=parquet      # Parquet download
?dtype_out_vector=json         # JSON download
?dtype_out_vector=mvt          # Mapbox Vector Tile download

Image data downloads:

?dtype_out_raster=png          # PNG image
?dtype_out_raster=tiff         # GeoTIFF download

Integrations

Call your UDFs from other tools after creating a Shared Token for your UDF:

DuckDB

select * from read_parquet('https://fused.io/.../run/file?');

Curl

curl -L -XGET 'https://fused.io/.../run/file?'

Google Sheets

=importData('https://fused.io/.../run/file?')

Notion

Use /embed block with UDF endpoint: 'https://fused.io/.../run/file?'

Load Data​

pandas​

duckdb​

From other UDFs​

Download data to shared Fused mount​

Export Data​

Fused managed storage: fd://​

Fused mount disk: /mnt/cache​

AWS S3: s3://​

Google Cloud Storage: gcs://​

Use as API (No file saving required)​

Integrations​

DuckDB​

Curl​

Google Sheets​

Notion​