File systems
Fused provides two file systems to make files accessible to all UDFs: an S3 bucket and a disk. Access is scoped at the organization level.
fd:// S3 bucket
Fused provisions a private S3 bucket namespace for your organization. It's ideal for large-scale, cloud-native, or globally accessible datasets, such as ingested tables, GeoTIFFs, and files that need to be read outside of Fused.
Use the File explorer to browse the bucket and see its full path.
Fused utility functions may reference it with the fd:// alias.
import fused

# Ingest a Census block-group shapefile into the organization's bucket.
job = fused.ingest(
    input="https://www2.census.gov/geo/tiger/TIGER_RD18/STATE/06_CALIFORNIA/06/tl_rd22_06_bg.zip",
    output="fd://census/ca_bg_2022/",
).run_remote()
/mnt/cache disk
/mnt/cache is the path to a mounted disk for storing files shared between UDFs. This is where @fused.cache and fused.download write data. It's ideal for files that UDFs need to read with low latency, downloaded files, the output of cached functions, access keys, .env files, and ML model weights.
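A common way to use a shared disk like this is a "build once, read many times" pattern: the first caller creates a file and later callers read it back. The sketch below illustrates the idea with the Python standard library only. The helper names (shared_dir, load_or_create) and the my_project subdirectory are hypothetical, and the fallback to a temporary directory is an assumption added so the example also runs outside Fused, where /mnt/cache does not exist.

```python
import json
import os
import tempfile
from pathlib import Path

# Assumption: /mnt/cache is only present (and writable) inside the Fused runtime.
SHARED_ROOT = Path("/mnt/cache")


def shared_dir(subdir: str) -> Path:
    """Return a writable directory under the shared disk (hypothetical helper)."""
    if SHARED_ROOT.is_dir() and os.access(SHARED_ROOT, os.W_OK):
        root = SHARED_ROOT
    else:
        # Fall back to the system temp directory when running elsewhere.
        root = Path(tempfile.gettempdir())
    path = root / subdir
    path.mkdir(parents=True, exist_ok=True)
    return path


def load_or_create(name: str, build) -> dict:
    """Read a JSON file from the shared directory, building it on first use."""
    target = shared_dir("my_project") / name
    if target.exists():
        return json.loads(target.read_text())
    data = build()
    # Write under a temporary name, then rename, so readers never see a partial file.
    tmp = target.with_name(target.name + ".tmp")
    tmp.write_text(json.dumps(data))
    tmp.replace(target)
    return data
```

Inside a UDF, a call such as load_or_create("settings.json", lambda: {"threshold": 0.5}) would run the builder only when the file is missing. For function outputs and downloads, the @fused.cache and fused.download utilities mentioned above already write to this disk, so a hand-rolled helper like this is mainly useful for other kinds of files.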
UDFs may interact with the disk as with a local file system. For example, to list files in the directory:
import fused

@fused.udf
def udf():
    import os

    # Print the name of each entry at the top level of the shared disk.
    for each in os.listdir('/mnt/cache/'):
        print(each)
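When the directory grows, a recursive listing with file sizes can be more informative than top-level names. The following is a minimal sketch using only pathlib. The function name list_files is hypothetical, and nothing in it is specific to Fused.

```python
from pathlib import Path


def list_files(root: str) -> list[tuple[str, int]]:
    """Return (relative path, size in bytes) for every file under root, sorted by path."""
    base = Path(root)
    entries = []
    for path in base.rglob("*"):
        if path.is_file():
            # Store paths relative to root, with forward slashes.
            entries.append((path.relative_to(base).as_posix(), path.stat().st_size))
    return sorted(entries)
```

Within a UDF, calling list_files('/mnt/cache/') would return the entries for the shared disk. On a disk with many files, the recursive walk may be slow, so pointing it at a subdirectory is usually the safer choice.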