Skip to main content

fused.api

fused.api.whoami

whoami()

Returns information on the currently logged in user


fused.api.delete

delete(path: str, max_deletion_depth: int | Literal['unlimited'] = 3) -> bool

Delete the files at the path.

Parameters:

  • path (str) – Directory or file to delete, like fd://my-old-table/
  • max_deletion_depth (int | Literal['unlimited']) – If set (defaults to 3), the maximum depth the operation will recurse to. This option is to help avoid accidentally deleting more data that intended. Pass "unlimited" for unlimited.

Examples:

fused.api.delete("fd://bucket-name/deprecated_table/")

fused.api.list

list(path: str, *, details: bool = False) -> list[str] | list[ListDetails]

List the files at the path.

Parameters:

  • path (str) – Parent directory URL, like fd://bucket-name/
  • details (bool) – If True, return additional metadata about each record.

Returns:

Examples:

fused.api.list("fd://bucket-name/")

fused.api.get

get(path: str) -> bytes

Download the contents at the path to memory.

Parameters:

  • path (str) – URL to a file, like fd://bucket-name/file.parquet

Returns:

  • bytes – bytes of the file

Examples:

fused.api.get("fd://bucket-name/file.parquet")

fused.api.download

download(path: str, local_path: str | Path) -> None

Download the contents at the path to disk.

Parameters:

  • path (str) – URL to a file, like fd://bucket-name/file.parquet
  • local_path (str | Path) – Path to a local file.

fused.api.upload

upload(
local_path: str | Path | bytes | BinaryIO | pd.DataFrame | gpd.GeoDataFrame,
remote_path: str,
timeout: float | None = None,
) -> None

Upload local file to S3.

Parameters:

  • local_path (str | Path | bytes | BinaryIO | pd.DataFrame | gpd.GeoDataFrame) – Either a path to a local file (str, Path), a (Geo)DataFrame (which will get uploaded as Parquet file), or the contents to upload. Any string will be treated as a Path, if you wish to upload the contents of the string, first encode it: s.encode("utf-8")
  • remote_path (str) – URL to upload to, like fd://new-file.txt
  • timeout (float | None) – Optional timeout in seconds for the upload (will default to OPTIONS.request_timeout if not specified).

Examples:

To upload a local json file to your Fused-managed S3 bucket:

fused.api.upload("my_file.json", "fd://my_bucket/my_file.json")

fused.api.sign_url

sign_url(path: str) -> str

Create a signed URL to access the path.

This function may not check that the file represented by the path exists.

Parameters:

  • path (str) – URL to a file, like fd://bucket-name/file.parquet

Returns:

  • str – HTTPS URL to access the file using signed access.

Examples:

fused.api.sign_url("fd://bucket-name/table_directory/file.parquet")

fused.api.sign_url_prefix

sign_url_prefix(path: str) -> dict[str, str]

Create signed URLs to access all blobs under the path.

Parameters:

  • path (str) – URL to a prefix, like fd://bucket-name/some_directory/

Returns:

  • dict[str, str] – Dictionary mapping from blob store key to signed HTTPS URL.

Examples:

fused.api.sign_url_prefix("fd://bucket-name/table_directory/")

fused.api.get_udfs

get_udfs(
n: int = 10,
*,
skip: int = 0,
per_request: int = 25,
max_requests: int | None = None,
by: Literal["name", "id", "slug"] = "name",
whose: Literal["self", "public"] = "self"
) -> dict

Fetches a list of UDFs.

Parameters:

  • n (int) – The total number of UDFs to fetch. Defaults to 10.
  • skip (int) – The number of UDFs to skip before starting to collect the result set. Defaults to 0.
  • per_request (int) – The number of UDFs to fetch in each API request. Defaults to 25.
  • max_requests (int | None) – The maximum number of API requests to make.
  • by (Literal['name', 'id', 'slug']) – The attribute by which to sort the UDFs. Can be "name", "id", or "slug". Defaults to "name".
  • whose (Literal['self', 'public']) – Specifies whose UDFs to fetch. Can be "self" for the user's own UDFs or "public" for UDFs available publicly. Defaults to "self".

Returns:

  • dict – A list of UDFs.

Examples:

Fetch UDFs under the user account:

fused.api.get_udfs()

fused.api.job_get_logs

job_get_logs(job: CoerceableToJobId, since_ms: int | None = None) -> list[Any]

Fetch logs for a job

Parameters:

  • job (CoerceableToJobId) – the identifier of a job or a RunResponse object.
  • since_ms (int | None) – Timestamp, in milliseconds since epoch, to get logs for. Defaults to None for all logs.

Returns:

  • list[Any] – Log messages for the given job.

fused.api.job_print_logs

job_print_logs(
job: CoerceableToJobId, since_ms: int | None = None, file: IO | None = None
) -> None

Fetch and print logs for a job

Parameters:

  • job (CoerceableToJobId) – the identifier of a job or a RunResponse object.
  • since_ms (int | None) – Timestamp, in milliseconds since epoch, to get logs for. Defaults to None for all logs.
  • file (IO | None) – Where to print logs to. Defaults to sys.stdout.

Returns:

  • None – None

fused.api.job_tail_logs

job_tail_logs(
job: CoerceableToJobId,
refresh_seconds: float = 1,
sample_logs: bool = True,
timeout: float | None = None,
)

Continuously print logs for a job

Parameters:

  • job (CoerceableToJobId) – the identifier of a job or a RunResponse object.
  • refresh_seconds (float) – how frequently, in seconds, to check for new logs. Defaults to 1.
  • sample_logs (bool) – if true, print out only a sample of logs. Defaults to True.
  • timeout (float | None) – if not None, how long to continue tailing logs for. Defaults to None for indefinite.

fused.api.job_get_status

job_get_status(job: CoerceableToJobId) -> RunResponse

Fetch the status of a running job

Parameters:

  • job (CoerceableToJobId) – the identifier of a job or a RunResponse object.

Returns:


fused.api.job_cancel

job_cancel(job: CoerceableToJobId) -> RunResponse

Cancel an existing job

Parameters:

  • job (CoerceableToJobId) – the identifier of a job or a RunResponse object.

Returns:


fused.api.job_get_exec_time

job_get_exec_time(job: CoerceableToJobId) -> timedelta

Determine the execution time of this job, using the logs.

Returns:

  • timedelta – Time the job took. If the job is in progress, time from first to last log message is returned.

fused.api.job_wait_for_job

job_wait_for_job(
job: CoerceableToJobId,
poll_interval_seconds: float = 5,
timeout: float | None = None,
) -> RunResponse

Block the Python kernel until this job has finished

Parameters:

  • poll_interval_seconds (float) – How often (in seconds) to poll for status updates. Defaults to 5.
  • timeout (float | None) – The length of time in seconds to wait for the job. Defaults to None.

Raises:

Returns:


fused.api.FusedAPI

FusedAPI(
*,
base_url: str | None = None,
set_global_api: bool = True,
credentials_needed: bool = True
)

API for running jobs in the Fused service.

Create a FusedAPI instance.

Other Parameters:

  • base_url (str | None) – The Fused instance to send requests to. Defaults to https://www.fused.io/server/v1.
  • set_global_api (bool) – Set this as the global API object. Defaults to True.
  • credentials_needed (bool) – If True, automatically attempt to log in. Defaults to True.

create_udf_access_token

create_udf_access_token(
udf_email_or_name_or_id: str | None = None,
/,
udf_name: str | None = None,
*,
udf_email: str | None = None,
udf_id: str | None = None,
client_id: str | Ellipsis | None = ...,
public_read: bool | None = None,
access_scope: str | None = None,
cache: bool = True,
metadata_json: dict[str, Any] | None = None,
enabled: bool = True,
) -> UdfAccessToken

Create a token for running a UDF. The token allows anyone who has it to run the UDF, with the parameters they choose. The UDF will run under your environment.

The token does not allow running any other UDF on your account.

Parameters:

  • udf_email_or_name_or_id (str | None) – A UDF ID, email address (for use with udf_name), or UDF name.
  • udf_name (str | None) – The name of the UDF to create the token for.

Other Parameters:

  • udf_email (str | None) – The email of the user owning the UDF, or, if udf_name is None, the name of the UDF.
  • udf_id (str | None) – The backend ID of the UDF to create the token for.
  • client_id (str | Ellipsis | None) – If specified, overrides which realtime environment to run the UDF under.
  • cache (bool) – If True, UDF tiles will be cached.
  • metadata_json (dict[str, Any] | None) – Additional metadata to serve as part of the tiles metadata.json.
  • enable – If True, the token can be used.

upload

upload(path: str, data: bytes | BinaryIO, timeout: float | None = None) -> None

Upload a binary blob to a cloud location


start_job

start_job(
config: JobConfig | JobStepConfig,
*,
instance_type: WHITELISTED_INSTANCE_TYPES | None = None,
region: str | None = None,
disk_size_gb: int | None = None,
additional_env: Sequence[str] | None = ("FUSED_CREDENTIAL_PROVIDER=ec2",),
image_name: str | None = None
) -> RunResponse

Execute an operation

Parameters:

  • config (JobConfig | JobStepConfig) – the configuration object to run in the job.

Other Parameters:

  • instance_type (WHITELISTED_INSTANCE_TYPES | None) – The AWS EC2 instance type to use for the job. Acceptable strings are "m5.large", "m5.xlarge", "m5.2xlarge", "m5.4xlarge", "r5.large", "r5.xlarge", "r5.2xlarge", "r5.4xlarge". Defaults to None.
  • region (str | None) – The AWS region in which to run. Defaults to None.
  • disk_size_gb (int | None) – The disk size to specify for the job. Defaults to None.
  • additional_env (Sequence[str] | None) – Any additional environment variables to be passed into the job, each in the form KEY=value. Defaults to None.
  • image_name (str | None) – Custom image name to run. Defaults to None for default image.

get_jobs

get_jobs(
n: int = 5,
*,
skip: int = 0,
per_request: int = 25,
max_requests: int | None = 1
) -> Jobs

Get the job history.

Parameters:

  • n (int) – The number of jobs to fetch. Defaults to 5.

Other Parameters:

  • skip (int) – Where in the job history to begin. Defaults to 0, which retrieves the most recent job.
  • per_request (int) – Number of jobs per request to fetch. Defaults to 25.
  • max_requests (int | None) – Maximum number of requests to make. May be None to fetch all jobs. Defaults to 1.

Returns:

  • Jobs – The job history.

get_status

get_status(job: CoerceableToJobId) -> RunResponse

Fetch the status of a running job

Parameters:

  • job (CoerceableToJobId) – the identifier of a job or a RunResponse object.

Returns:


get_logs

get_logs(job: CoerceableToJobId, since_ms: int | None = None) -> list[Any]

Fetch logs for a job

Parameters:

  • job (CoerceableToJobId) – the identifier of a job or a RunResponse object.
  • since_ms (int | None) – Timestamp, in milliseconds since epoch, to get logs for. Defaults to None for all logs.

Returns:

  • list[Any] – Log messages for the given job.

tail_logs

tail_logs(
job: CoerceableToJobId,
refresh_seconds: float = 1,
sample_logs: bool = False,
timeout: float | None = None,
)

Continuously print logs for a job

Parameters:

  • job (CoerceableToJobId) – the identifier of a job or a RunResponse object.
  • refresh_seconds (float) – how frequently, in seconds, to check for new logs. Defaults to 1.
  • sample_logs (bool) – if true, print out only a sample of logs. Defaults to False.
  • timeout (float | None) – if not None, how long to continue tailing logs for. Defaults to None for indefinite.

wait_for_job

wait_for_job(
job: CoerceableToJobId,
poll_interval_seconds: float = 5,
timeout: float | None = None,
) -> RunResponse

Block the Python kernel until the given job has finished

Parameters:

  • job (CoerceableToJobId) – the identifier of a job or a RunResponse object.
  • poll_interval_seconds (float) – How often (in seconds) to poll for status updates. Defaults to 5.
  • timeout (float | None) – The length of time in seconds to wait for the job. Defaults to None.

Raises:

Returns:


cancel_job

cancel_job(job: CoerceableToJobId) -> RunResponse

Cancel an existing job

Parameters:

  • job (CoerceableToJobId) – the identifier of a job or a RunResponse object.

Returns:


auth_token

auth_token() -> str

Returns the current user's Fused environment (team) auth token