register_dataset

Register a dataset for indexed queries.

fused.register_dataset(
    dataset_path: str,
    base_url: str | None = None,
) -> dict[str, Any]

This function registers a directory in your file storage as a dataset, enabling fast geospatial queries using H3 indexing.

Parameters

  • dataset_path (str) – Path to the dataset directory. The path should point to a directory containing Parquet files.
  • base_url (str | None) – Base URL for the API. If None, the URL of the current environment is used.

Returns

  • dict – Dictionary with registration results:
    • dataset_id: ID of the created/updated dataset
    • location: Normalized URL of the dataset
    • visit_status: Status of the dataset visit (success/timeout/error)
    • items_discovered: Total number of items found
    • new_items: Number of new items added
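
The returned dictionary can be inspected before relying on the dataset. The following is a minimal sketch, assuming the keys listed above and assuming that visit_status is returned as one of the literal strings "success", "timeout", or "error". The helper name summarize_registration and the sample values are hypothetical and not part of fused.

```python
from typing import Any


def summarize_registration(result: dict[str, Any]) -> str:
    """Build a one-line summary of a register_dataset result (hypothetical helper)."""
    status = result.get("visit_status", "unknown")
    discovered = result.get("items_discovered", 0)
    new = result.get("new_items", 0)
    summary = (
        f"dataset {result.get('dataset_id')} at {result.get('location')}: "
        f"{discovered} items discovered, {new} new (visit: {status})"
    )
    # Assumption: any status other than "success" deserves a visible flag.
    if status != "success":
        summary += " [visit did not succeed]"
    return summary


# Illustrative values only; a real result comes from fused.register_dataset(...).
sample = {
    "dataset_id": "ds-123",
    "location": "s3://my-bucket/my-data/buildings/",
    "visit_status": "timeout",
    "items_discovered": 42,
    "new_items": 7,
}
print(summarize_registration(sample))
```

Using .get() with defaults keeps the helper from raising KeyError if a key is missing, which may or may not match how you want to treat an incomplete response.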

Raises

  • requests.HTTPError – If the API request fails.
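
Because a failed API request surfaces as requests.HTTPError, callers may want to catch it explicitly. The sketch below shows one possible pattern; the wrapper try_register and its register parameter are hypothetical conveniences (the parameter exists only so the wrapper can be exercised with a stub), and returning None on failure is a design choice of this example, not behavior of fused.

```python
from typing import Any, Callable, Optional

import requests


def try_register(
    dataset_path: str,
    register: Optional[Callable[[str], dict[str, Any]]] = None,
) -> Optional[dict[str, Any]]:
    """Call register_dataset and return None on HTTP failure (hypothetical wrapper)."""
    if register is None:
        import fused  # imported lazily so a stub can be injected instead

        register = fused.register_dataset
    try:
        return register(dataset_path)
    except requests.HTTPError as exc:
        # exc.response can be None if the error was raised without a response.
        status = exc.response.status_code if exc.response is not None else "n/a"
        print(f"Registration of {dataset_path} failed (HTTP status: {status})")
        return None
```

In real use, try_register("s3://my-bucket/my-data/buildings/") would call fused.register_dataset directly; errors other than requests.HTTPError are deliberately left to propagate.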

Example

import fused

# Register a dataset from your storage
result = fused.register_dataset("s3://my-bucket/my-data/buildings/")
print(f"Registered dataset with ID: {result['dataset_id']}")
print(f"Found {result['items_discovered']} files")

Note

  • Regular users can use any storage path they have access to
  • Datasets are registered as private (only accessible to your team)
  • Files are automatically queued for metadata extraction

See also