ingest_nongeospatial

ingest_nongeospatial(
    input: str | Path | Sequence[str | Path] | pd.DataFrame | gpd.GeoDataFrame,
    output: str | None = None,
    *,
    output_metadata: str | None = None,
    partition_col: str | None = None,
    partitioning_maximum_per_file: int = 2500000,
    partitioning_maximum_per_chunk: int = 65000
) -> NonGeospatialPartitionJobStepConfig

Ingest a non-geospatial dataset into the Fused partitioned format.

Parameters

  • input (str | Path | Sequence[str | Path] | pd.DataFrame | gpd.GeoDataFrame) – A DataFrame, or a path (or sequence of paths) to files on S3 to ingest. Files may be Parquet or another data format.
  • output (str | None) – Location on S3 to write the main table to.
  • output_metadata (str | None) – Location on S3 to write the Fused metadata table to.
  • partition_col (str | None) – Partition the non-geospatial dataset along this column.
  • partitioning_maximum_per_file (int) – Maximum number of items to store in a single file. Defaults to 2,500,000.
  • partitioning_maximum_per_chunk (int) – Maximum number of items to store in a single chunk. Defaults to 65,000.
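
The two maxima work at different levels: rows are split into files of at most partitioning_maximum_per_file rows, and each file is further divided into chunks of at most partitioning_maximum_per_chunk rows. A rough sketch of the resulting counts under the defaults (illustrative arithmetic only, with a hypothetical row count; this is not Fused's actual partitioning code):

```python
import math

# Hypothetical dataset size chosen for illustration.
rows = 10_000_000
max_per_file = 2_500_000   # partitioning_maximum_per_file default
max_per_chunk = 65_000     # partitioning_maximum_per_chunk default

# Upper bound on the number of output files.
n_files = math.ceil(rows / max_per_file)
# Upper bound on the number of chunks within a full file.
chunks_per_file = math.ceil(max_per_file / max_per_chunk)

print(n_files, chunks_per_file)  # 4 39
```

Lowering partitioning_maximum_per_chunk yields smaller, more granular reads at the cost of more chunks per file.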

Returns

NonGeospatialPartitionJobStepConfig – Configuration object describing the ingestion process. Call .execute() on this object to start an ingestion job.

Example

# `gdf` is assumed to be an existing DataFrame (or GeoDataFrame) to ingest.
job = fused.ingest_nongeospatial(
    input=gdf,
    output="s3://sample-bucket/file.parquet",
).execute()