Advanced Fused Features
Install the fused Python package locally:
pip install "fused[all]"
Read more about installing Fused.
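To confirm the install, check the package version from your notebook (a minimal check using only the standard library; the version you see will differ):

from importlib.metadata import version
print(version("fused"))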
Run the following code snippets in a notebook for best interactivity!
Run 100s of jobs in seconds: fused.submit()
Calculate the average price of houses over the years:
import fused

@fused.udf
def udf(path: str = "s3://fused-sample/demo_data/housing/housing_2024.csv"):
    import pandas as pd

    housing = pd.read_csv(path)
    # Price per unit area, rounded to 2 decimal places
    housing['price_per_area'] = (housing['price'] / housing['area']).round(2)
    return housing['price_per_area'].mean()
# List all available files
housing_lists = fused.api.list('s3://fused-sample/demo_data/housing/')

results = fused.submit(
    udf,
    housing_lists[:10]  # Running only the first 10 files for demo purposes
)
Print results:
results
>>>
path
s3://fused-sample/demo_data/housing/housing_1970.csv 1376.804413
s3://fused-sample/demo_data/housing/housing_1971.csv 1492.044819
s3://fused-sample/demo_data/housing/housing_1972.csv 1484.496548
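Because each UDF call returns a single number, the collected results can be handled like any other pandas object. A minimal sketch, assuming results behaves like a pandas Series indexed by file path (as the printout above suggests); the column names introduced here are illustrative:

# Turn the per-file averages into a tidy frame and pull the year out of each path.
summary = results.rename("avg_price_per_area").reset_index()
summary["year"] = summary["path"].str.extract(r"housing_(\d{4})\.csv", expand=False).astype(int)
print(summary.sort_values("year"))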
Read more about running your code in parallel here
Batch jobs
Run larger jobs that exceed the memory & time constraints of a single UDF job.
@fused.udf
def udf(val):
    import time
    import pandas as pd

    # Simulating a complex processing step
    time.sleep(200)
    return pd.DataFrame({'val': [val]})
# Running with 5 inputs
job = udf(arg_list=[0,1,2,3,4])
job.run_remote()
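The same arg_list pattern applies when each input names a file to process rather than a toy value: each element is passed as the UDF's first parameter, and each invocation runs remotely. A minimal sketch, assuming a hypothetical list of monthly Parquet partitions (the bucket and paths are illustrative, not real files):

@fused.udf
def process_month(path: str):
    import pandas as pd

    # Heavy per-partition work happens remotely, one invocation per path.
    df = pd.read_parquet(path)
    return df.describe().reset_index()

# Hypothetical partition list -- replace with your own files.
months = [f"s3://my-bucket/data/month={m:02d}.parquet" for m in range(1, 13)]

job = process_month(arg_list=months)
job.run_remote()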
Read more about setting up batch jobs here
Preview large files from your laptop
Use Fused to sample large datasets for local exploration: the data is loaded inside your UDF, and only the sample is sent back to your machine.
Using the ERA5 weather dataset as an example (~140 MB file):
@fused.udf
def udf(path: str = 's3://fused-asset/data/era5/t2m/datestr=2024-01-01/0.parquet'):
    import pandas as pd

    df = pd.read_parquet(path)
    return df.head(10)
Get a small sample of data:
sample = fused.run(udf)
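The returned sample is a regular pandas DataFrame, so you can inspect it locally like any other. A short sketch (the alternate path in the comment is illustrative only):

print(sample.shape)
print(sample.head())

# Keyword arguments passed to fused.run are forwarded to the UDF's parameters,
# so the same UDF can preview a different file (illustrative path):
# other_sample = fused.run(udf, path='s3://fused-asset/data/era5/t2m/datestr=2024-01-02/0.parquet')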