Engineering & ETL
Connect Data to Fused
Connect your own data sources
You can connect your own data buckets directly to Fused.
Bring data directly inside Fused
Quickly bring any data that isn't already in the cloud into Fused:
- Drag & Drop in File Explorer!
- Use fused.api.upload(): install the fused Python package, authenticate, and run:
import fused
fused.api.upload("my_local_file.csv", "fd://my_data/file.csv")
Note: fd:// is the Fused-provisioned private S3 path for your team.
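If you have a whole local folder to bring in, one option is to call fused.api.upload() once per file. The sketch below only assumes that fused.api.upload takes a local path and a destination path, as in the example above. remote_path_for and upload_folder are hypothetical helper names, and the upload function is passed in as a parameter so the path mapping can be tried without a Fused account.

```python
from pathlib import Path
from typing import Callable, List


def remote_path_for(local_path: str, local_root: str, remote_prefix: str) -> str:
    """Build the destination path for one file, keeping its sub-folders."""
    rel = Path(local_path).relative_to(Path(local_root)).as_posix()
    return f"{remote_prefix.rstrip('/')}/{rel}"


def upload_folder(local_root: str, remote_prefix: str,
                  upload: Callable[[str, str], object]) -> List[str]:
    """Call upload(src, dst) for every file under local_root.

    `upload` is expected to be fused.api.upload (an assumption based on the
    example above); it is injected so the path logic is testable offline.
    """
    uploaded = []
    for path in sorted(Path(local_root).rglob("*")):
        if path.is_file():
            dst = remote_path_for(str(path), local_root, remote_prefix)
            upload(str(path), dst)
            uploaded.append(dst)
    return uploaded
```

For example, upload_folder("./my_data", "fd://my_data", fused.api.upload) would send ./my_data/sub/file.csv to fd://my_data/sub/file.csv.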
Optimize data loading
For files < 1GB:
Use Fused's built-in caching to make repeated data loading faster:
import fused

@fused.udf
def udf(path: str = "s3://fused-sample/demo_data/housing_2024.csv"):
    import pandas as pd

    @fused.cache
    def load_data(path):
        # Cached by Fused: later runs with the same path reuse the result
        return pd.read_csv(path)

    # Some processing
    return load_data(path)
- As you make changes inside your UDF, load_data() returns its result from the cache instead of re-reading the file.
- This is especially useful for slow formats (CSV, Excel, etc.) or files that are not well partitioned.
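To build intuition for why repeated runs get faster, here is a minimal, stdlib-only sketch of the general idea behind a persistent function cache: results are stored on disk, keyed by the function name and its arguments, and reused on the next call with the same inputs. This is an illustration under that assumption, not how @fused.cache is implemented; disk_cache and the .cache directory are hypothetical.

```python
import functools
import hashlib
import pickle
from pathlib import Path


def disk_cache(cache_dir: str = ".cache"):
    """Decorator that persists a function's results to disk with pickle."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Key on function name + arguments so different inputs don't collide.
            raw = pickle.dumps((func.__qualname__, args, sorted(kwargs.items())))
            key = hashlib.sha256(raw).hexdigest()
            path = Path(cache_dir) / f"{key}.pkl"
            if path.exists():
                # Cache hit: skip the slow call entirely.
                return pickle.loads(path.read_bytes())
            result = func(*args, **kwargs)
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_bytes(pickle.dumps(result))
            return result
        return wrapper
    return decorator
```

One consequence of keying on the arguments: if the file behind a path changes but the path string stays the same, a cache like this keeps returning the old result until its entry is cleared.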
For files > 1GB:
Use fused.ingest() to ingest large datasets into cloud-optimized, partitioned files:
import fused

# Placeholder: replace with your own user or team directory
user_id = "YOUR_USER_ID"

job = fused.ingest(
    input="https://www2.census.gov/geo/tiger/TIGER_RD18/LAYER/TRACT/tl_rd22_11_tract.zip",
    output=f"s3://fused-users/{user_id}/census/dc_tract/",
)
job.run_batch()
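fused.ingest() takes care of partitioning for you. As a rough intuition for why partitioned output helps, the stdlib sketch below splits one large CSV into numbered chunks so a reader can open only the pieces it needs. It deliberately uses a simpler technique (fixed row-count chunking, not the spatial partitioning an ingestion pipeline may use); split_csv and the part-NNNN.csv naming are hypothetical.

```python
import csv
from pathlib import Path
from typing import List


def split_csv(src: str, out_dir: str, rows_per_part: int = 100_000) -> List[str]:
    """Split src into numbered CSV parts, each with the header and at most rows_per_part rows."""
    if rows_per_part < 1:
        raise ValueError("rows_per_part must be >= 1")
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    parts: List[str] = []
    with open(src, newline="") as f:
        reader = csv.reader(f)
        header = next(reader, None)
        if header is None:  # empty file: nothing to split
            return parts
        handle = None
        writer = None
        try:
            for i, row in enumerate(reader):
                if i % rows_per_part == 0:
                    # Close the previous part and start a new one with the header.
                    if handle is not None:
                        handle.close()
                    part_path = out / f"part-{len(parts):04d}.csv"
                    handle = open(part_path, "w", newline="")
                    writer = csv.writer(handle)
                    writer.writerow(header)
                    parts.append(str(part_path))
                writer.writerow(row)
        finally:
            if handle is not None:
                handle.close()
    return parts
```

Each part is a valid CSV on its own, so a downstream step can read just one of them instead of the whole file.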
Read more about how to ingest your data.
Turn your data into an API
- In Workbench, create a new UDF that returns data, for example:
import fused

@fused.udf
def udf(path: str = "s3://fused-sample/demo_data/housing/housing_2024.csv"):
    import pandas as pd

    housing = pd.read_csv(path)
    # Price per unit of area, rounded to 2 decimals
    housing["price_per_area"] = (housing["price"] / housing["area"]).round(2)
    return housing[["price", "price_per_area"]]
- In the UDF Editor:
  - Click "Share" to open your UDF in a new tab as an HTML page.
  - Optionally, edit the URL to return the data in your preferred file format, for example by changing dtype_out_vector from html to json:
# Returning as JSON
https://fused.io/.../run/file?dtype_out_vector=json
# Returning as CSV
https://fused.io/.../run/file?dtype_out_vector=csv
- In the Advanced UDF Editor:
  - Save your UDF with Cmd + S (macOS) / Ctrl + S (Windows / Linux).
  - Click the 3 dots and open "Settings".
  - Click the HTTPS URL to open the UDF in its default file format.
  - Optionally, change the output format the same way, by setting dtype_out_vector in the URL (for example, to json or csv).
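If you call the UDF URL from code rather than a browser, you may want to switch formats programmatically. The sketch below uses only the Python standard library to set dtype_out_vector on a URL and parse a CSV response. with_output_format, parse_csv_response and fetch_csv are hypothetical helper names, and the URL itself stays elided here, so substitute the one you get from "Share" or "Settings".

```python
import csv
import io
from typing import Dict, List
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit
from urllib.request import urlopen


def with_output_format(url: str, fmt: str) -> str:
    """Return url with dtype_out_vector set to fmt (e.g. "json" or "csv").

    Other query parameters are kept; duplicate keys collapse to one value.
    """
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query["dtype_out_vector"] = fmt
    return urlunsplit(parts._replace(query=urlencode(query)))


def parse_csv_response(text: str) -> List[Dict[str, str]]:
    """Turn a CSV response body into a list of row dicts (all values are strings)."""
    return list(csv.DictReader(io.StringIO(text)))


def fetch_csv(url: str) -> List[Dict[str, str]]:
    """Download the UDF output as CSV and parse it (requires network access)."""
    with urlopen(with_output_format(url, "csv")) as resp:
        return parse_csv_response(resp.read().decode("utf-8"))
```

For example, with_output_format(url, "json") turns a URL ending in ?dtype_out_vector=html into the same URL ending in ?dtype_out_vector=json.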
Infrastructure (GitHub / Secrets / On-Prem)
You can use Fused with your own infrastructure:
- Let your team save UDFs to your own GitHub repo
- Save and access secrets in Fused
- Run Fused on your own servers (on-prem option)
Examples
- Ingesting ship transponder data in Fused