Engineering & ETL

Connect Data to Fused

Connect your own data sources

You can connect your data buckets directly to Fused.

Bring data directly inside Fused

Quickly bring data that is not yet in the cloud into Fused:

  1. Drag & Drop in File Explorer!

Drag and drop files directly into Workbench

  2. Use fused.api.upload()

Install the fused Python package, authenticate, and run:

fused.api.upload("my_local_file.csv", "fd://my_data/file.csv")

Note: fd:// is the Fused provisioned private S3 path for your team.
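When several local files need to go up at once, the single call above can be wrapped in a loop. The following is only a sketch: the helper names `fd_destination` and `upload_all` and the `my_data` prefix are hypothetical, and it assumes `fused.api.upload` takes a local path and a destination path exactly as in the example above.

```python
from pathlib import Path, PurePosixPath


def fd_destination(local_file: str, prefix: str = "my_data") -> str:
    """Map a local file to an fd:// path under `prefix`, keeping the file name."""
    name = Path(local_file).name
    return "fd://" + str(PurePosixPath(prefix.strip("/")) / name)


def upload_all(local_files: list[str], prefix: str = "my_data") -> list[str]:
    """Upload each file with fused.api.upload and return the destination paths."""
    # Imported lazily so fd_destination can be used without fused installed.
    import fused

    destinations = []
    for local_file in local_files:
        destination = fd_destination(local_file, prefix)
        # Same call shape as the single-file example in the text.
        fused.api.upload(local_file, destination)
        destinations.append(destination)
    return destinations
```

Calling `upload_all(["a.csv", "b.csv"])` would, under these assumptions, place the files at `fd://my_data/a.csv` and `fd://my_data/b.csv`.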

Optimize data loading

For files < 1GB:

Use the caching built into Fused to speed up repeated loads:

@fused.udf
def udf(path: str = "s3://fused-sample/demo_data/housing_2024.csv"):
    import pandas as pd

    @fused.cache
    def load_data(path):
        return pd.read_csv(path)

    # Some processing

    return load_data(path)
  • As you make changes inside your UDF, load_data() returns its cached result instead of re-reading the file.
  • This is especially useful for slow formats (CSV, Excel, etc.) or files that are not partitioned well.
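To make the idea behind the bullets above concrete, here is a generic disk-backed cache decorator written with the standard library only. It is not how `fused.cache` is implemented; `disk_cache`, `CACHE_DIR`, and the stand-in `load_data` are hypothetical names used purely to illustrate why a second call with the same arguments skips the slow read.

```python
import hashlib
import pickle
import tempfile
from functools import wraps
from pathlib import Path

# Hypothetical cache location for this sketch.
CACHE_DIR = Path(tempfile.gettempdir()) / "udf_cache_sketch"


def disk_cache(func):
    """Store results on disk, keyed by function name and arguments."""

    @wraps(func)
    def wrapper(*args, **kwargs):
        key_source = repr((func.__name__, args, sorted(kwargs.items())))
        key = hashlib.sha256(key_source.encode()).hexdigest()
        cache_file = CACHE_DIR / f"{key}.pkl"
        if cache_file.exists():
            # Cache hit: return the stored result without calling func.
            return pickle.loads(cache_file.read_bytes())
        result = func(*args, **kwargs)
        CACHE_DIR.mkdir(parents=True, exist_ok=True)
        cache_file.write_bytes(pickle.dumps(result))
        return result

    return wrapper


load_calls = []  # records every real (non-cached) load


@disk_cache
def load_data(path):
    # Stand-in for a slow read such as pd.read_csv(path).
    load_calls.append(path)
    return {"path": path}
```

The first `load_data(path)` call runs the function body and writes the result to disk; later calls with the same `path` read the pickled result back, so `load_calls` does not grow.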

For files > 1GB:

Use fused.ingest() to convert large datasets into cloud-optimized, partitioned files.

# user_id is a placeholder: define it before running this snippet.
job = fused.ingest(
    input="https://www2.census.gov/geo/tiger/TIGER_RD18/LAYER/TRACT/tl_rd22_11_tract.zip",
    output=f"s3://fused-users/{user_id}/census/dc_tract/",
)

job.run_batch()

Read more about how to ingest your data.
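The size rule of thumb in this section can be written down as a small helper. This is a sketch under assumptions: the names `ONE_GB`, `loading_strategy`, and `strategy_for_local_file` are hypothetical, 1GB is taken to mean 10^9 bytes, and a file of exactly 1GB is sent to ingestion, since the text does not say which side that case falls on.

```python
import os

# Assumption: decimal gigabyte. The text only says "1GB".
ONE_GB = 10**9


def loading_strategy(size_bytes: int) -> str:
    """Return "cache" for files under 1GB and "ingest" otherwise."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    return "cache" if size_bytes < ONE_GB else "ingest"


def strategy_for_local_file(path: str) -> str:
    """Apply the same rule to a file on the local disk."""
    return loading_strategy(os.path.getsize(path))
```

For example, a 5 MB CSV maps to "cache" (load it inside a cached function), while a 3 GB archive maps to "ingest" (run it through fused.ingest() first).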

Turn your data into an API

  1. In Workbench, create a new UDF that returns data, for example:
@fused.udf
def udf(path: str = "s3://fused-sample/demo_data/housing/housing_2024.csv"):
    import pandas as pd

    housing = pd.read_csv(path)
    housing['price_per_area'] = (housing['price'] / housing['area']).round(2)

    return housing[['price', 'price_per_area']]
  2. Click "Share" to open your UDF in a new tab as an HTML page.
  3. Optionally, edit the URL to return the data in your preferred file format.

For example, change the dtype_out_vector parameter from html to json or csv:

# Returning as JSON
https://fused.io/.../run/file?dtype_out_vector=json

# Returning as CSV
https://fused.io/.../run/file?dtype_out_vector=csv
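Editing the URL by hand works, but the same change can be scripted. The sketch below uses only the Python standard library to set the `dtype_out_vector` query parameter on a shared URL; the helper name `with_output_format` is hypothetical, and the example.com address in the usage note is a stand-in, not a real Fused endpoint.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_output_format(url: str, fmt: str) -> str:
    """Return `url` with its dtype_out_vector query parameter set to `fmt`."""
    parts = urlsplit(url)
    # Keep every other query parameter and drop any existing dtype_out_vector.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "dtype_out_vector"
    ]
    query.append(("dtype_out_vector", fmt))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Given a shared URL such as `https://example.com/run/file?dtype_out_vector=html`, calling `with_output_format(url, "csv")` yields the same URL ending in `dtype_out_vector=csv`, which a client could then pass to a reader such as `pandas.read_csv`.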

Infrastructure (GitHub / Secrets / On-Prem)

You can use Fused with your own infrastructure.

Examples