ingest_nongeospatial

ingest_nongeospatial(
    input: str | Path | Sequence[str | Path] | pd.DataFrame | gpd.GeoDataFrame,
    output: str | None = None,
    *,
    output_metadata: str | None = None,
    partition_col: str | None = None,
    partitioning_maximum_per_file: int = 2500000,
    partitioning_maximum_per_chunk: int = 65000
) -> NonGeospatialPartitionJobStepConfig

Ingest a non-geospatial dataset into the Fused partitioned format.

Parameters

  • input (str | Path | Sequence[str | Path] | pd.DataFrame | gpd.GeoDataFrame) – A DataFrame, or a path (or sequence of paths) to files on S3 to ingest. Files may be Parquet or another data format.
  • output (str | None) – Location on S3 to write the main table to.
  • output_metadata (str | None) – Location on S3 to write the Fused metadata table to.
  • partition_col (str | None) – Partition the non-geospatial dataset along this column.
  • partitioning_maximum_per_file (int) – Maximum number of items to store in a single file. Defaults to 2,500,000.
  • partitioning_maximum_per_chunk (int) – Maximum number of items to store in a single chunk. Defaults to 65,000.
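
The two maxima work at different levels: rows are split into files of at most partitioning_maximum_per_file rows, and each file is further divided into chunks of at most partitioning_maximum_per_chunk rows. A rough sketch of the resulting counts under the defaults (illustrative arithmetic only, with a hypothetical row count; this is not Fused's actual partitioning code):

```python
import math

# Hypothetical dataset size chosen for illustration.
rows = 10_000_000
max_per_file = 2_500_000   # partitioning_maximum_per_file default
max_per_chunk = 65_000     # partitioning_maximum_per_chunk default

# Upper bound on the number of output files.
n_files = math.ceil(rows / max_per_file)
# Upper bound on the number of chunks within a full file.
chunks_per_file = math.ceil(max_per_file / max_per_chunk)

print(n_files, chunks_per_file)  # 4 39
```

Lowering partitioning_maximum_per_chunk yields smaller, more granular reads at the cost of more chunks per file.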

Returns

NonGeospatialPartitionJobStepConfig – Configuration object describing the ingestion process. Call .execute() on this object to start an ingestion job.

Example

# `gdf` is assumed to be an existing DataFrame (or GeoDataFrame) to ingest.
job = fused.ingest_nongeospatial(
    input=gdf,
    output="s3://sample-bucket/file.parquet",
).execute()