Engineering & ETL
Connect Data to Fused
Connect your own data sources
You can directly connect your data buckets to Fused:
Bring data directly inside Fused
Quickly bring any data not on the cloud into Fused:
- Drag & Drop in File Explorer!
- Use fused.api.upload()
Install the fused Python package, authenticate, and run:
fused.api.upload("my_local_file.csv", "fd://my_data/file.csv")
Note: fd:// is the Fused-provisioned private S3 path for your team.
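As a rough end-to-end sketch of the upload step above, the snippet below writes a small CSV with the standard library and wraps the upload call in a helper. The helper names, the sample rows, and the destination path are illustrative. It assumes fused is installed and authenticated, and that fused.api.upload takes the local path followed by the remote path, as in the call above.

```python
import csv
from pathlib import Path


def write_sample_csv(local_path: str) -> Path:
    """Write a tiny CSV so there is something to upload (illustrative data)."""
    path = Path(local_path)
    with path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["price", "area"])
        writer.writerow([1250000, 180])
        writer.writerow([640000, 95])
    return path


def upload_to_fused(local_path: str, remote_path: str) -> None:
    """Upload a local file to Fused storage (hypothetical helper).

    Assumes fused is installed and authenticated. The import is lazy so the
    rest of this sketch runs without it.
    """
    import fused

    fused.api.upload(local_path, remote_path)


local_file = write_sample_csv("my_local_file.csv")
# Uncomment once fused is installed and you are authenticated:
# upload_to_fused(str(local_file), "fd://my_data/file.csv")
```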
Optimize data loading
For files < 1GB:
Use the caching built into Fused to load data faster:
@fused.udf
def udf(path: str = "s3://fused-sample/demo_data/housing_2024.csv"):
    import pandas as pd

    @fused.cache
    def load_data(path):
        return pd.read_csv(path)

    # Some processing
    return load_data(path)
- As you make changes inside your UDF, load_data() will be served from the cache instead of re-reading the file.
- This is especially useful for slow formats (CSV, Excel, etc.) or files that are not partitioned well.
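To illustrate why a cached loader helps, here is a minimal stand-in that uses Python's functools.lru_cache rather than @fused.cache. It is not how Fused implements caching; it only shows that the slow read runs once and later calls with the same path reuse the stored result. The sample file and the read_count counter are hypothetical.

```python
import csv
import functools
import tempfile
from pathlib import Path

read_count = 0  # counts how many times the file is actually read


@functools.lru_cache(maxsize=None)
def load_data(path: str) -> tuple:
    """Read a CSV into a tuple of row dicts; cached per distinct path."""
    global read_count
    read_count += 1
    with open(path, newline="") as f:
        return tuple(csv.DictReader(f))


# Create a small CSV to load (illustrative data).
sample = Path(tempfile.mkdtemp()) / "housing.csv"
sample.write_text("price,area\n1250000,180\n640000,95\n")

first = load_data(str(sample))   # reads the file
second = load_data(str(sample))  # same argument, so the cached result is reused
```

The second call returns the same object without touching the file, which is the behavior you rely on while iterating on the processing code around the loader.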
For files > 1GB:
Use fused.ingest() to ingest large datasets into cloud-optimized, partitioned files.
import fused

# user_id is a placeholder: set it to your own Fused user ID before running.
job = fused.ingest(
    input="https://www2.census.gov/geo/tiger/TIGER_RD18/LAYER/TRACT/tl_rd22_11_tract.zip",
    output=f"s3://fused-users/{user_id}/census/dc_tract/",
)
job.run_remote()
Read more about how to ingest your data.
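The size guidance in this section can be written as a small decision helper. This is only a sketch: choose_loading_strategy is a hypothetical name, 1 GB is taken as 10^9 bytes, and files of exactly 1 GB (which the guidance does not address) are assumed to go to ingestion.

```python
import os

ONE_GB = 1_000_000_000  # assumption: decimal gigabyte


def choose_loading_strategy(size_bytes: int) -> str:
    """Return "cache" for files under 1 GB, otherwise "ingest"."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    return "cache" if size_bytes < ONE_GB else "ingest"


def strategy_for_local_file(path: str) -> str:
    """Apply the same rule to a file on disk using its size."""
    return choose_loading_strategy(os.path.getsize(path))
```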
Turn your data into an API
Share your data with the world by turning it into an API:
@fused.udf
def udf(path: str = "s3://fused-sample/demo_data/housing_2024.csv"):
    import pandas as pd

    df = pd.read_csv(path)
    # Only return the relevant data for my team
    df = df[df['price'] > 1000000]
    return df[['price', 'area']]
In Workbench:
- Save your UDF
- Click "Share"
- Create a shared token
- You now have an HTTPS endpoint to call this UDF, which returns data in your desired format:
https://fused.io/.../run/file?
Learn more about creating a shared token.
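Once you have the shared endpoint, a client can call it like any other HTTPS URL. The sketch below builds the request URL and fetches it with the standard library. The helper names are hypothetical, the endpoint is whatever Workbench shows for your token (the elided URL above is not filled in here), and passing the UDF's path argument as a query parameter is an assumption based on the UDF signature.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_udf_url(endpoint: str, **params: str) -> str:
    """Append query parameters to a shared UDF endpoint URL."""
    base = endpoint.rstrip("?&")
    if not params:
        return base
    separator = "&" if "?" in base else "?"
    return f"{base}{separator}{urlencode(params)}"


def call_udf(endpoint: str, **params: str) -> bytes:
    """Call the endpoint and return the raw response body."""
    with urlopen(build_udf_url(endpoint, **params), timeout=30) as response:
        return response.read()


# Placeholder endpoint; copy the real one from the Share dialog in Workbench.
url = build_udf_url(
    "https://example.com/run/file?",
    path="s3://fused-sample/demo_data/housing_2024.csv",
)
```

call_udf returns raw bytes so the caller can decode them according to whichever output format the endpoint was configured to return.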
Infrastructure (GitHub / Secrets / On-Prem)
You can use Fused with your own infrastructure:
- Allow your team to save UDFs in your own GitHub repo
- Save & access secrets in Fused
- Use Fused on your own servers (on-prem option)
Examples
- Ingesting ship transponder data in Fused