fused.api
fused.api.whoami
whoami()
Returns information on the currently logged in user
fused.api.delete
delete(path: str, max_deletion_depth: int | Literal['unlimited'] = 3) -> bool
Delete the files at the path.
Parameters:
- path (
str
) – Directory or file to delete, likefd://my-old-table/
- max_deletion_depth (
int | Literal['unlimited']
) – If set (defaults to 3), the maximum depth the operation will recurse to. This option is to help avoid accidentally deleting more data that intended. Pass"unlimited"
for unlimited.
Examples:
fused.api.delete("fd://bucket-name/deprecated_table/")
fused.api.list
list(path: str, *, details: bool = False) -> list[str] | list[ListDetails]
List the files at the path.
Parameters:
- path (
str
) – Parent directory URL, likefd://bucket-name/
- details (
bool
) – If True, return additional metadata about each record.
Returns:
list[str] | list[ListDetails]
– A list of paths as URLs, or as metadata objects.
Examples:
fused.api.list("fd://bucket-name/")
fused.api.get
get(path: str) -> bytes
Download the contents at the path to memory.
Parameters:
- path (
str
) – URL to a file, likefd://bucket-name/file.parquet
Returns:
bytes
– bytes of the file
Examples:
fused.api.get("fd://bucket-name/file.parquet")
fused.api.download
download(path: str, local_path: str | Path) -> None
Download the contents at the path to disk.
Parameters:
- path (
str
) – URL to a file, likefd://bucket-name/file.parquet
- local_path (
str | Path
) – Path to a local file.
fused.api.upload
upload(
local_path: str | Path | bytes | BinaryIO | pd.DataFrame | gpd.GeoDataFrame,
remote_path: str,
timeout: float | None = None,
) -> None
Upload local file to S3.
Parameters:
- local_path (
str | Path | bytes | BinaryIO | pd.DataFrame | gpd.GeoDataFrame
) – Either a path to a local file (str
,Path
), a (Geo)DataFrame (which will get uploaded as Parquet file), or the contents to upload. Any string will be treated as a Path, if you wish to upload the contents of the string, first encode it:s.encode("utf-8")
- remote_path (
str
) – URL to upload to, likefd://new-file.txt
- timeout (
float | None
) – Optional timeout in seconds for the upload (will default toOPTIONS.request_timeout
if not specified).
Examples:
To upload a local json file to your Fused-managed S3 bucket:
fused.api.upload("my_file.json", "fd://my_bucket/my_file.json")
fused.api.sign_url
sign_url(path: str) -> str
Create a signed URL to access the path.
This function may not check that the file represented by the path exists.
Parameters:
- path (
str
) – URL to a file, likefd://bucket-name/file.parquet
Returns:
str
– HTTPS URL to access the file using signed access.
Examples:
fused.api.sign_url("fd://bucket-name/table_directory/file.parquet")
fused.api.sign_url_prefix
sign_url_prefix(path: str) -> dict[str, str]
Create signed URLs to access all blobs under the path.
Parameters:
- path (
str
) – URL to a prefix, likefd://bucket-name/some_directory/
Returns:
Examples:
fused.api.sign_url_prefix("fd://bucket-name/table_directory/")
fused.api.get_udfs
get_udfs(
n: int = 10,
*,
skip: int = 0,
per_request: int = 25,
max_requests: int | None = None,
by: Literal["name", "id", "slug"] = "name",
whose: Literal["self", "public"] = "self"
) -> dict
Fetches a list of UDFs.
Parameters:
- n (
int
) – The total number of UDFs to fetch. Defaults to 10. - skip (
int
) – The number of UDFs to skip before starting to collect the result set. Defaults to 0. - per_request (
int
) – The number of UDFs to fetch in each API request. Defaults to 25. - max_requests (
int | None
) – The maximum number of API requests to make. - by (
Literal['name', 'id', 'slug']
) – The attribute by which to sort the UDFs. Can be "name", "id", or "slug". Defaults to "name". - whose (
Literal['self', 'public']
) – Specifies whose UDFs to fetch. Can be "self" for the user's own UDFs or "public" for UDFs available publicly. Defaults to "self".
Returns:
dict
– A list of UDFs.
Examples:
Fetch UDFs under the user account:
fused.api.get_udfs()
fused.api.job_get_logs
job_get_logs(job: CoerceableToJobId, since_ms: int | None = None) -> list[Any]
Fetch logs for a job
Parameters:
- job (
CoerceableToJobId
) – the identifier of a job or aRunResponse
object. - since_ms (
int | None
) – Timestamp, in milliseconds since epoch, to get logs for. Defaults to None for all logs.
Returns:
fused.api.job_print_logs
job_print_logs(
job: CoerceableToJobId, since_ms: int | None = None, file: IO | None = None
) -> None
Fetch and print logs for a job
Parameters:
- job (
CoerceableToJobId
) – the identifier of a job or aRunResponse
object. - since_ms (
int | None
) – Timestamp, in milliseconds since epoch, to get logs for. Defaults to None for all logs. - file (
IO | None
) – Where to print logs to. Defaults to sys.stdout.
Returns:
None
– None
fused.api.job_tail_logs
job_tail_logs(
job: CoerceableToJobId,
refresh_seconds: float = 1,
sample_logs: bool = True,
timeout: float | None = None,
)
Continuously print logs for a job
Parameters:
- job (
CoerceableToJobId
) – the identifier of a job or aRunResponse
object. - refresh_seconds (
float
) – how frequently, in seconds, to check for new logs. Defaults to 1. - sample_logs (
bool
) – if true, print out only a sample of logs. Defaults to True. - timeout (
float | None
) – if not None, how long to continue tailing logs for. Defaults to None for indefinite.
fused.api.job_get_status
job_get_status(job: CoerceableToJobId) -> RunResponse
Fetch the status of a running job
Parameters:
- job (
CoerceableToJobId
) – the identifier of a job or aRunResponse
object.
Returns:
RunResponse
– The status of the given job.
fused.api.job_cancel
job_cancel(job: CoerceableToJobId) -> RunResponse
Cancel an existing job
Parameters:
- job (
CoerceableToJobId
) – the identifier of a job or aRunResponse
object.
Returns:
RunResponse
– A new job object.
fused.api.job_get_exec_time
job_get_exec_time(job: CoerceableToJobId) -> timedelta
Determine the execution time of this job, using the logs.
Returns:
timedelta
– Time the job took. If the job is in progress, time from first to last log message is returned.
fused.api.job_wait_for_job
job_wait_for_job(
job: CoerceableToJobId,
poll_interval_seconds: float = 5,
timeout: float | None = None,
) -> RunResponse
Block the Python kernel until this job has finished
Parameters:
- poll_interval_seconds (
float
) – How often (in seconds) to poll for status updates. Defaults to 5. - timeout (
float | None
) – The length of time in seconds to wait for the job. Defaults to None.
Raises:
TimeoutError
– if waiting for the job timed out.
Returns:
RunResponse
– The status of the given job.
fused.api.FusedAPI
FusedAPI(
*,
base_url: str | None = None,
set_global_api: bool = True,
credentials_needed: bool = True
)
API for running jobs in the Fused service.
Create a FusedAPI instance.
Other Parameters:
- base_url (
str | None
) – The Fused instance to send requests to. Defaults tohttps://www.fused.io/server/v1
. - set_global_api (
bool
) – Set this as the global API object. Defaults to True. - credentials_needed (
bool
) – If True, automatically attempt to log in. Defaults to True.
create_udf_access_token
create_udf_access_token(
udf_email_or_name_or_id: str | None = None,
/,
udf_name: str | None = None,
*,
udf_email: str | None = None,
udf_id: str | None = None,
client_id: str | Ellipsis | None = ...,
public_read: bool | None = None,
access_scope: str | None = None,
cache: bool = True,
metadata_json: dict[str, Any] | None = None,
enabled: bool = True,
) -> UdfAccessToken
Create a token for running a UDF. The token allows anyone who has it to run the UDF, with the parameters they choose. The UDF will run under your environment.
The token does not allow running any other UDF on your account.
Parameters:
- udf_email_or_name_or_id (
str | None
) – A UDF ID, email address (for use with udf_name), or UDF name. - udf_name (
str | None
) – The name of the UDF to create the token for.
Other Parameters:
- udf_email (
str | None
) – The email of the user owning the UDF, or, if udf_name is None, the name of the UDF. - udf_id (
str | None
) – The backend ID of the UDF to create the token for. - client_id (
str | Ellipsis | None
) – If specified, overrides which realtime environment to run the UDF under. - cache (
bool
) – If True, UDF tiles will be cached. - metadata_json (
dict[str, Any] | None
) – Additional metadata to serve as part of the tiles metadata.json. - enable – If True, the token can be used.
upload
upload(path: str, data: bytes | BinaryIO, timeout: float | None = None) -> None
Upload a binary blob to a cloud location
start_job
start_job(
config: JobConfig | JobStepConfig,
*,
instance_type: WHITELISTED_INSTANCE_TYPES | None = None,
region: str | None = None,
disk_size_gb: int | None = None,
additional_env: Sequence[str] | None = ("FUSED_CREDENTIAL_PROVIDER=ec2",),
image_name: str | None = None
) -> RunResponse
Execute an operation
Parameters:
- config (
JobConfig | JobStepConfig
) – the configuration object to run in the job.
Other Parameters:
- instance_type (
WHITELISTED_INSTANCE_TYPES | None
) – The AWS EC2 instance type to use for the job. Acceptable strings are "m5.large", "m5.xlarge", "m5.2xlarge", "m5.4xlarge", "r5.large", "r5.xlarge", "r5.2xlarge", "r5.4xlarge". Defaults to None. - region (
str | None
) – The AWS region in which to run. Defaults to None. - disk_size_gb (
int | None
) – The disk size to specify for the job. Defaults to None. - additional_env (
Sequence[str] | None
) – Any additional environment variables to be passed into the job, each in the form KEY=value. Defaults to None. - image_name (
str | None
) – Custom image name to run. Defaults to None for default image.
get_jobs
get_jobs(
n: int = 5,
*,
skip: int = 0,
per_request: int = 25,
max_requests: int | None = 1
) -> Jobs
Get the job history.
Parameters:
- n (
int
) – The number of jobs to fetch. Defaults to 5.
Other Parameters:
- skip (
int
) – Where in the job history to begin. Defaults to 0, which retrieves the most recent job. - per_request (
int
) – Number of jobs per request to fetch. Defaults to 25. - max_requests (
int | None
) – Maximum number of requests to make. May be None to fetch all jobs. Defaults to 1.
Returns:
Jobs
– The job history.
get_status
get_status(job: CoerceableToJobId) -> RunResponse
Fetch the status of a running job
Parameters:
- job (
CoerceableToJobId
) – the identifier of a job or aRunResponse
object.
Returns:
RunResponse
– The status of the given job.
get_logs
get_logs(job: CoerceableToJobId, since_ms: int | None = None) -> list[Any]
Fetch logs for a job
Parameters:
- job (
CoerceableToJobId
) – the identifier of a job or aRunResponse
object. - since_ms (
int | None
) – Timestamp, in milliseconds since epoch, to get logs for. Defaults to None for all logs.
Returns:
tail_logs
tail_logs(
job: CoerceableToJobId,
refresh_seconds: float = 1,
sample_logs: bool = False,
timeout: float | None = None,
)
Continuously print logs for a job
Parameters:
- job (
CoerceableToJobId
) – the identifier of a job or aRunResponse
object. - refresh_seconds (
float
) – how frequently, in seconds, to check for new logs. Defaults to 1. - sample_logs (
bool
) – if true, print out only a sample of logs. Defaults to False. - timeout (
float | None
) – if not None, how long to continue tailing logs for. Defaults to None for indefinite.
wait_for_job
wait_for_job(
job: CoerceableToJobId,
poll_interval_seconds: float = 5,
timeout: float | None = None,
) -> RunResponse
Block the Python kernel until the given job has finished
Parameters:
- job (
CoerceableToJobId
) – the identifier of a job or aRunResponse
object. - poll_interval_seconds (
float
) – How often (in seconds) to poll for status updates. Defaults to 5. - timeout (
float | None
) – The length of time in seconds to wait for the job. Defaults to None.
Raises:
TimeoutError
– if waiting for the job timed out.
Returns:
RunResponse
– The status of the given job.
cancel_job
cancel_job(job: CoerceableToJobId) -> RunResponse
Cancel an existing job
Parameters:
- job (
CoerceableToJobId
) – the identifier of a job or aRunResponse
object.
Returns:
RunResponse
– A new job object.
auth_token
auth_token() -> str
Returns the current user's Fused environment (team) auth token