fused.h3.run_ingest_raster_to_h3
run_ingest_raster_to_h3(
src_path: str | list[str],
output_path: str,
metrics: str | list[str] = "cnt",
res: int | None = None,
k_ring: int = 1,
res_offset: int = 1,
chunk_res: int | None = None,
file_res: int | None = None,
overview_res: list[int] | None = None,
overview_chunk_res: int | list[int] | None = None,
max_rows_per_chunk: int = 0,
target_chunk_size: int = 10000000,
debug_mode: bool = False,
remove_tmp_files: bool = True,
overwrite: bool = False,
extract_kwargs: dict = {},
partition_kwargs: dict = {},
overview_kwargs: dict = {},
**kwargs
)
Run the raster to H3 ingestion process.
This process involves multiple steps:
- extract pixel values and assign them to H3 cells in chunks (extract step)
- combine the chunks per partition (file) and prepare metadata (partition step)
- create the metadata, _sample file, and overview files (overview step)
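As a minimal sketch, a basic ingestion might look like the following (the paths and resolution below are placeholders, not values from this reference):
import fused

fused.h3.run_ingest_raster_to_h3(
    src_path="s3://my-bucket/elevation.tif",     # placeholder input raster
    output_path="s3://my-bucket/elevation_h3/",  # placeholder output location
    metrics=["avg", "min", "max"],               # metrics from the supported list
    res=8,                                       # target H3 resolution
)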
Parameters:
- src_path (str, list) – Path(s) to the input raster data. When this is a single path, the file is chunked up for processing based on target_chunk_size. When this is a list of paths, each file is processed as one chunk.
- output_path (str) – Path for the resulting Parquet dataset.
- metrics (str or list of str) – The metrics to compute per H3 cell. Supported metrics are either "cnt" or a list containing any of "avg", "min", "max", "stddev", and "sum".
- res (int) – The resolution of the H3 cells in the output data. The pixel values are assigned to H3 cells at resolution res + res_offset and then aggregated to res.
- k_ring (int) – The k-ring distance at resolution res + res_offset to which the pixel value is assigned (in addition to the center cell). Defaults to 1.
- res_offset (int) – Offset to the child resolution (relative to res) at which to assign the raw pixel values to H3 cells.
- file_res (int) – The H3 resolution used to chunk the resulting files of the Parquet dataset. By default, this is inferred from the target resolution res. You can specify file_res=-1 to produce a single output file.
- chunk_res (int) – The H3 resolution used to chunk the row groups within each file of the Parquet dataset (ignored when max_rows_per_chunk is specified). By default, this is inferred from the target resolution res.
- overview_res (list of int) – The H3 resolutions for which to create overview files. By default, overviews are created for resolutions 3 to 7 (or capped at a lower value if the res of the output dataset is lower).
- overview_chunk_res (int or list of int) – The H3 resolution(s) used to chunk the row groups within each overview file of the Parquet dataset. By default, each overview file is chunked at the overview resolution minus 5 (clamped between 0 and the res of the output dataset).
- max_rows_per_chunk (int) – The maximum number of rows per chunk in the resulting data and overview files. If 0 (the default), chunk_res and overview_chunk_res are used to determine the chunking.
- target_chunk_size (int) – The approximate number of pixel values to process per chunk in the first "extract" step (only used when ingesting a single file).
- debug_mode (bool) – If True, run only the first two chunks for debugging purposes. Defaults to False.
- remove_tmp_files (bool) – If True, remove the temporary files after ingestion is complete. Defaults to True.
- overwrite (bool) – If True, overwrite the output path if it already exists, by first removing the existing content before writing the new files. Defaults to False, in which case an error is raised if the output_path is not empty.
- extract_kwargs (dict) – Additional keyword arguments to pass to fused.submit for the extract step.
- partition_kwargs (dict) – Additional keyword arguments to pass to fused.submit for the partition step.
- overview_kwargs (dict) – Additional keyword arguments to pass to fused.submit for the overview step.
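To illustrate how these parameters combine, a call that tunes the chunking and overview behavior might look like this (paths and values are illustrative assumptions, not recommendations):
fused.h3.run_ingest_raster_to_h3(
    src_path="s3://my-bucket/landcover.tif",     # placeholder input raster
    output_path="s3://my-bucket/landcover_h3/",  # placeholder output location
    metrics="cnt",                 # count of pixels per H3 cell
    res=9,                         # target output resolution
    res_offset=2,                  # assign raw pixels at resolution 11, aggregate to 9
    overview_res=[3, 4, 5, 6, 7],  # resolutions for the overview files
    max_rows_per_chunk=500_000,    # row-count chunking instead of chunk_res
    overwrite=True,                # replace any existing content at output_path
)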
The extract, partition, and overview steps are run in parallel using
fused.submit(). By default, the function first attempts to run them on
"realtime" instances, and retries any failed runs on "large" instances.
You can override this behavior by passing the engine, instance_type,
max_workers, n_processes_per_worker, etc. parameters as additional
keyword arguments to this function (**kwargs). To specify those per
step, use extract_kwargs, partition_kwargs, and overview_kwargs.
For example, to run everything locally on the same machine where this
function runs, use:
run_ingest_raster_to_h3(..., engine="local")
To run the extract step on realtime instances and the partition step on medium instances, you could do:
run_ingest_raster_to_h3(...,
extract_kwargs={"instance_type": "realtime", "max_workers": 256, "max_retry": 1},
partition_kwargs={"instance_type": "medium", "max_workers": 5, "n_processes_per_worker": 2},
)
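Similarly, for a quick local test before a full ingestion, one option (a sketch combining two parameters documented above) is to pair engine="local" with debug_mode:
run_ingest_raster_to_h3(...,
    engine="local",   # run all steps on the machine where this function runs
    debug_mode=True,  # process only the first two chunks
)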