ingest_nongeospatial
ingest_nongeospatial(
input: str | Path | Sequence[str | Path] | pd.DataFrame | gpd.GeoDataFrame,
output: str | None = None,
*,
output_metadata: str | None = None,
partition_col: str | None = None,
partitioning_maximum_per_file: int = 2500000,
partitioning_maximum_per_chunk: int = 65000
) -> NonGeospatialPartitionJobStepConfig
Ingest a non-geospatial dataset into the Fused partitioned format.
Parameters
- input (str | Path | Sequence[str | Path] | pd.DataFrame | gpd.GeoDataFrame) – A DataFrame, or a path to a file or files on S3, to ingest. Files may be Parquet or another data format.
- output (str | None) – Location on S3 to write the main table to.
- output_metadata (str | None) – Location on S3 to write the fused table to.
- partition_col (str | None) – Partition along this column for non-geospatial datasets.
- partitioning_maximum_per_file (int) – Maximum number of items to store in a single file. Defaults to 2,500,000.
- partitioning_maximum_per_chunk (int) – Maximum number of items to store in a single chunk. Defaults to 65,000.
Returns
NonGeospatialPartitionJobStepConfig – Configuration object describing the ingestion process. Call .execute on this object to start a job.
Example
job = fused.ingest_nongeospatial(
input=gdf,
output="s3://sample-bucket/file.parquet",
).execute()
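To build intuition for how partition_col and partitioning_maximum_per_chunk relate, the sketch below mimics the partitioning scheme with plain pandas: rows are grouped by the partition column, and each group is split into chunks no larger than the chunk maximum. This is an illustration of the concept only, not the library's actual implementation; the function name sketch_partition and the small chunk limit are hypothetical, chosen to keep the example readable.

```python
import pandas as pd

def sketch_partition(df: pd.DataFrame, partition_col: str, max_per_chunk: int):
    """Group rows by partition_col, then split each group into chunks
    of at most max_per_chunk rows (analogous to a small
    partitioning_maximum_per_chunk)."""
    chunks = []
    for key, group in df.groupby(partition_col, sort=True):
        # Slice each partition into fixed-size chunks of rows
        for start in range(0, len(group), max_per_chunk):
            chunks.append((key, group.iloc[start:start + max_per_chunk]))
    return chunks

df = pd.DataFrame({"region": ["a"] * 5 + ["b"] * 2, "value": range(7)})
chunks = sketch_partition(df, "region", max_per_chunk=3)
# Partition "a" (5 rows) yields chunks of 3 and 2 rows; "b" (2 rows) yields one chunk
```

In the real ingestion job the same bounding happens at file level too: once a partition exceeds partitioning_maximum_per_file items, it spills into additional files.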