# Cloud Storage
Fused supports multiple cloud storage options for reading and writing data.
## Supported Storage Paths
| Provider | Format | Example |
|---|---|---|
| Fused managed | `fd://` | `fd://my-data/file.parquet` |
| AWS S3 | `s3://` | `s3://bucket-name/path/file.parquet` |
| Google Cloud | `gs://` or `gcs://` | `gs://bucket-name/path/file.parquet` |
| HTTP(S) | `https://` | `https://example.com/file.csv` |
For details on using `fd://` paths and the `/mnt/cache` disk, see File System.
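For instance, `s3://`, `gs://`, and `https://` paths can be passed directly to libraries like pandas; a minimal sketch with placeholder paths:

```python
@fused.udf
def udf():
    import pandas as pd

    # Placeholder paths: each scheme resolves through its filesystem
    # backend (s3fs for s3://, gcsfs for gs://, plain HTTP for https://)
    df = pd.read_parquet("s3://bucket-name/path/file.parquet")
    # df = pd.read_csv("https://example.com/file.csv")
    return df
```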
## Connect Your Own Bucket
**Enterprise:** This feature is accessible to organizations with a Fused Enterprise subscription.
Connect S3 or GCS buckets to access their files interactively in the File Explorer UI and programmatically from UDFs.
Contact Fused to set an S3 or GCS bucket on the File Explorer for all users in your organization. Alternatively, set a bucket as a "favorite" so it appears in the File Explorer for your account only.
### Amazon S3
Set the policy below on your bucket, replacing `YOUR_BUCKET_NAME` with the bucket's name. Fused will provide `YOUR_ENV_NAME`.
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Allow object access by Fused account",
      "Effect": "Allow",
      "Principal": {
        "AWS": [
          "arn:aws:iam::926411091187:role/rt-production-YOUR_ENV_NAME",
          "arn:aws:iam::926411091187:role/ec2_job_task_role-v2-production-YOUR_ENV_NAME"
        ]
      },
      "Action": [
        "s3:ListBucket",
        "s3:GetObjectAttributes",
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject"
      ],
      "Resource": [
        "arn:aws:s3:::YOUR_BUCKET_NAME/*",
        "arn:aws:s3:::YOUR_BUCKET_NAME"
      ]
    }
  ]
}
```
Alternatively, use this Fused app to automatically structure the policy for you.
The bucket must also have the following CORS settings enabled to allow uploading files from Fused:
```json
[
  {
    "AllowedHeaders": [
      "range",
      "content-type",
      "content-length"
    ],
    "AllowedMethods": [
      "GET",
      "HEAD",
      "PUT",
      "POST"
    ],
    "AllowedOrigins": [
      "*"
    ],
    "ExposeHeaders": [
      "content-range"
    ],
    "MaxAgeSeconds": 0
  }
]
```
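Once the policy and CORS settings are in place, one way to confirm the connection is to list the bucket from a UDF; a minimal sketch, with a placeholder bucket name:

```python
@fused.udf
def udf(bucket: str = "s3://your-bucket-name"):
    import pandas as pd
    import s3fs

    # Listing succeeds only if the bucket policy grants s3:ListBucket
    # to the Fused roles above
    fs = s3fs.S3FileSystem()
    return pd.DataFrame({"path": fs.ls(bucket)})
```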
#### Encrypted S3 Buckets
To connect an encrypted S3 bucket, access to both the bucket and the KMS key is required. The KMS key must be in the same region as the bucket.
Add the following statement to the KMS key policy (Fused will provide the account ID and role name):
```json
{
  "Sid": "AllowCrossAccountUseOfKMS",
  "Effect": "Allow",
  "Principal": {
    "AWS": "arn:aws:iam::<FUSED_ACCOUNT>:role/<FUSED_ROLE_NAME>"
  },
  "Action": [
    "kms:Decrypt",
    "kms:Encrypt",
    "kms:GenerateDataKey*",
    "kms:DescribeKey"
  ],
  "Resource": "*"
}
```
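Once both the bucket policy and the key policy are in place, reads from the encrypted bucket should work like any other S3 read; a minimal sketch with a placeholder path:

```python
@fused.udf
def udf(path: str = "s3://your-encrypted-bucket/data.parquet"):
    import pandas as pd

    # Succeeds only if both the bucket policy and the KMS key policy grant
    # access to the Fused roles; SSE-KMS decryption is transparent to the client
    return pd.read_parquet(path)
```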
### Google Cloud Storage (GCS)
To connect a Google Cloud Storage bucket to your Fused environment:
1. **Create a Service Account in GCS**

   Set up a Google Cloud service account with permissions to read, write, and list objects in the GCS bucket. See the Google Cloud documentation for instructions.
2. **Download the JSON Key File**

   Download the JSON key file associated with the service account. This file contains the credentials Fused will use to access the GCS bucket.
3. **Set the JSON Key as a Secret**

   Set the JSON key as a secret in the secrets management UI. The secret must be named `gcs_fused`.
Within a UDF, write these credentials to a JSON file and point the Google client libraries at it:
```python
@fused.udf
def udf():
    import os
    from google.cloud import storage

    # Write the GCS credentials from the Fused secret to a key file
    with open("/tmp/gcs_key.json", "w") as f:
        f.write(fused.secrets["gcs_fused"])
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/tmp/gcs_key.json"
    # your code here
```
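With `GOOGLE_APPLICATION_CREDENTIALS` set, the standard Google client libraries pick up the key automatically. A minimal sketch of what could follow, with a placeholder bucket name:

```python
@fused.udf
def udf(bucket_name: str = "my-gcs-bucket"):
    import os
    import pandas as pd
    from google.cloud import storage

    # Same credential setup as above
    with open("/tmp/gcs_key.json", "w") as f:
        f.write(fused.secrets["gcs_fused"])
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/tmp/gcs_key.json"

    # The client reads GOOGLE_APPLICATION_CREDENTIALS automatically
    client = storage.Client()
    blobs = client.list_blobs(bucket_name, max_results=10)
    return pd.DataFrame({"name": [b.name for b in blobs]})
```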
## Read & Write Examples
### Reading from S3
```python
@fused.udf
def udf(path: str = "s3://fused-sample/demo_data/housing_2024.csv"):
    import pandas as pd
    return pd.read_csv(path)
```
### Writing to S3

```python
df.to_parquet("s3://my-bucket/data.parquet")
```
### Writing to GCS

```python
df.to_parquet("gcs://my-bucket/data.parquet")
```
### Download to Fused mount
```python
@fused.udf
def udf(url='https://www2.census.gov/geo/tiger/TIGER_RD18/STATE/11_DISTRICT_OF_COLUMBIA/11/tl_rd22_11_bg.zip'):
    out_path = fused.download(url=url, file_path='out.zip')
    return str(out_path)
```
Files will be written to `/mnt/cache/`, where any other UDF can then access them.
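For example, another UDF could read the downloaded archive straight off the mount; a minimal sketch, assuming the `out_path` returned above points at a zipped shapefile under `/mnt/cache/`:

```python
@fused.udf
def udf(path: str = "/mnt/cache/out.zip"):
    import geopandas as gpd

    # Placeholder path: use the out_path returned by fused.download
    return gpd.read_file(path)
```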
## Downloading Large Remote Files
For datasets from external sources like Zenodo or Humanitarian Data Exchange that take longer than 120s to download, run the download as a batch job:
```python
@fused.udf(
    instance_type='c2-standard-4',  # Small instance - the download uses few resources
    disk_size_gb=999                # Large disk for the file
)
def udf():
    import tempfile
    import requests
    import s3fs

    url = "https://zenodo.org/records/4395621/files/my_large_file.zip"
    s3_path = f"s3://fused-asset/data/my-files/{url.split('/')[-1]}"

    # Skip if already downloaded
    fs = s3fs.S3FileSystem()
    if fs.exists(s3_path):
        return f'File exists: {s3_path}'

    # Stream the download to a temporary file
    temp_path = tempfile.NamedTemporaryFile(delete=False).name
    resp = requests.get(url, stream=True)
    resp.raise_for_status()
    with open(temp_path, 'wb') as f:
        for chunk in resp.iter_content(chunk_size=8192):
            f.write(chunk)

    # Upload to S3
    fs.put(temp_path, s3_path)
    return f"Uploaded to: {s3_path}"
```
Once downloaded, use the Reading Data guide for extracting compressed files (ZIP/RAR).
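As a rough sketch of that step (paths are placeholders), the uploaded archive can be opened with `s3fs` and extracted onto the shared mount with the standard library:

```python
@fused.udf
def udf():
    import zipfile
    import s3fs

    # Placeholder path: the object uploaded by the batch job above
    s3_path = "s3://fused-asset/data/my-files/my_large_file.zip"

    # s3fs file objects are seekable, so zipfile can read them directly
    fs = s3fs.S3FileSystem()
    with fs.open(s3_path, "rb") as f, zipfile.ZipFile(f) as z:
        z.extractall("/mnt/cache/my_large_file/")
    return "Extracted to /mnt/cache/my_large_file/"
```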
## /mnt/cache Disk
`/mnt/cache` is the path to a mounted disk that stores files shared between UDFs. This is where `@fused.cache` and `fused.download` write data. It's ideal for files UDFs need to read with low latency: downloaded files, the output of cached functions, access keys, `.env` files, and ML model weights.
UDFs can interact with the disk as they would with a local file system:
```python
# Write to the mount
df.to_parquet("/mnt/cache/data.parquet")

# List files on the mount
@fused.udf
def udf():
    import os
    for each in os.listdir('/mnt/cache/'):
        print(each)
```
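Because `@fused.cache` persists function output to this mount, expensive steps can be computed once and reused across runs; a minimal sketch with a placeholder URL:

```python
@fused.udf
def udf():
    import pandas as pd

    @fused.cache  # Result is written to /mnt/cache and reused on later calls
    def load(url):
        return pd.read_csv(url)

    return load("https://example.com/file.csv")
```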
If you encounter `Error: No such file or directory: '/mnt/cache/'`, contact the Fused team to enable it for your environment.