download

download(url: str, file_path: str, storage: StorageStr = 'auto') -> str

Download a file.

It is safe to call this function from multiple processes with the same inputs: every call returns the same result.

Fused runs UDFs from top to bottom each time code changes. This means objects in the UDF are recreated each time, which can slow down a UDF that downloads files from a remote server.

Tip: Downloaded files are written to a mounted volume shared across all UDFs in an organization. This means that a file downloaded by one UDF can be read by other UDFs.

Fused addresses this download latency with the download utility function. It stores files in the mounted filesystem, so each file is downloaded only the first time it is requested and read from disk afterwards.
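The download-once behavior can be illustrated with a minimal sketch. This is not Fused's implementation: `download_once` and `fake_fetch` are hypothetical names, and the fetcher is passed in as a plain function so the example runs without network access.

```python
import os
import tempfile
from typing import Callable


def download_once(url: str, file_path: str, fetch: Callable[[str], bytes]) -> str:
    """Return file_path, calling fetch(url) only if the file is not cached yet.

    Hypothetical helper for illustration; not part of the Fused API.
    """
    if os.path.exists(file_path):
        # Cache hit: skip the download entirely.
        return file_path

    target_dir = os.path.dirname(file_path) or "."
    os.makedirs(target_dir, exist_ok=True)
    data = fetch(url)

    # Write to a temporary file first, then rename, so a reader never
    # observes a partially written file.
    fd, tmp_path = tempfile.mkstemp(dir=target_dir)
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    os.replace(tmp_path, file_path)
    return file_path


# Stand-in fetcher that records how often it is called.
calls = []


def fake_fetch(url: str) -> bytes:
    calls.append(url)
    return b"payload"


cache_dir = tempfile.mkdtemp()
target = os.path.join(cache_dir, "data", "my_file.bin")
first = download_once("https://example.com/my_file.bin", target, fake_fetch)
second = download_once("https://example.com/my_file.bin", target, fake_fetch)
```

The second call returns the same path without invoking the fetcher again, which is the property that keeps repeated UDF runs fast.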

Tip: Because a Tile UDF runs multiple chunks in parallel, the download function sets a signal lock during the first download attempt to ensure the download happens only once.
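One way such a lock could work is sketched below with an exclusive lock file. This is an assumption for illustration, not a description of how Fused implements its lock; `download_with_lock` and `slow_fetch` are hypothetical names, and threads stand in for parallel chunks.

```python
import os
import tempfile
import threading
import time
from typing import Callable


def download_with_lock(url: str, file_path: str,
                       fetch: Callable[[str], bytes],
                       poll_seconds: float = 0.01) -> str:
    """Download url to file_path at most once, even with concurrent callers."""
    lock_path = file_path + ".lock"
    while not os.path.exists(file_path):
        try:
            # O_EXCL makes creation atomic: exactly one caller succeeds.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            # Another caller holds the lock; wait and check again.
            time.sleep(poll_seconds)
            continue
        try:
            os.close(fd)
            # Re-check: the file may have appeared just before we got the lock.
            if not os.path.exists(file_path):
                tmp_path = file_path + ".part"
                with open(tmp_path, "wb") as f:
                    f.write(fetch(url))
                os.replace(tmp_path, file_path)
        finally:
            os.remove(lock_path)
    return file_path


fetch_calls = []


def slow_fetch(url: str) -> bytes:
    fetch_calls.append(url)
    time.sleep(0.05)  # simulate network latency
    return b"payload"


target = os.path.join(tempfile.mkdtemp(), "my_geojson.zip")
results = []


def worker() -> None:
    results.append(download_with_lock("https://example.com/f.zip", target, slow_fetch))


threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

All eight workers receive the same path, but only one of them performs the fetch. A production lock would also need to handle a stale lock file left behind by a crashed process, which this sketch ignores.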

Parameters

  • url (str) – The URL to download.
  • file_path (str) – The local path where the file is saved.
  • storage (StorageStr) – Where the cached data is stored. Defaults to "auto". Supported values are "auto", "mount" and "local". "auto" selects the storage location defined in options (mount if it exists, otherwise local) and ensures that it exists and is writable. "mount" storage is shared across executions, whereas "local" storage is shared only within the same execution.
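The selection rule described for the storage parameter can be sketched as follows. `resolve_storage_dir` and its `mount_dir` and `local_dir` arguments are hypothetical, introduced only to illustrate the "mount if it exists, otherwise local" logic; Fused reads the real locations from its options.

```python
import os
import tempfile


def resolve_storage_dir(storage: str, mount_dir: str, local_dir: str) -> str:
    """Pick a cache directory following the rule described above.

    Hypothetical helper for illustration; not part of the Fused API.
    """
    if storage == "auto":
        # Prefer the shared mount when it is present, otherwise fall back.
        chosen = mount_dir if os.path.isdir(mount_dir) else local_dir
    elif storage == "mount":
        chosen = mount_dir
    elif storage == "local":
        chosen = local_dir
    else:
        raise ValueError(f"unsupported storage value: {storage!r}")

    # Ensure the chosen location exists and is writable.
    os.makedirs(chosen, exist_ok=True)
    if not os.access(chosen, os.W_OK):
        raise PermissionError(f"{chosen} is not writable")
    return chosen


base = tempfile.mkdtemp()
mount = os.path.join(base, "mount")  # does not exist yet
local = os.path.join(base, "local")

without_mount = resolve_storage_dir("auto", mount, local)
os.makedirs(mount)
with_mount = resolve_storage_dir("auto", mount, local)
```

Before the mount directory exists, "auto" resolves to the local directory; once it exists, the same call resolves to the mount.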

Returns

  • str – The path to the downloaded file. The file is downloaded only on the first execution; later executions return the path of the existing file.

Example

import fused

@fused.udf
def geodataframe_from_geojson():
    import geopandas as gpd

    # The file is downloaded on the first run only; later runs reuse it.
    url = "s3://sample_bucket/my_geojson.zip"
    path = fused.core.download(url, "tmp/my_geojson.zip")
    gdf = gpd.read_file(path)
    return gdf