
From query to map: Exploring GeoParquet Overture Maps with Ibis, DuckDB, and Fused

· 3 min read
Naty Clementi
Sr. Software Engineer - Ibis Project Committer

Naty is a Senior Software Engineer and a contributor to Ibis, the portable Python dataframe library. One of her main contributions was enabling the DuckDB spatial extension for Ibis in 2023.

In this blog post, she shows how to use the DuckDB spatial extension with Ibis to query Overture data. Ibis works by compiling Python expressions into SQL: you write dataframe-like Python code, and Ibis takes care of the SQL. Thanks to Ibis's integration with pandas and GeoPandas, calling to_pandas() is all you need to get the result of an expression as a GeoDataFrame.



Find Naty's UDF code here:

Motivations

Overture Maps is emerging as a global backbone for linked spatial data. The Overture Maps Foundation, founded in 2022, is an open data initiative aiming to introduce the foundation for next-generation map services and applications. Overture offers reliable, easy-to-use, and interoperable open map data. Its structured schema enables consistent data access and seamless integration with other spatial data using unique identifiers.

Overture Maps has multiple datasets you can explore and query. Check their documentation to find out more about them. In this post, we will explore the base theme and some of its infrastructure features.

Have you ever been on a walk as a tourist and wondered where you could find public drinking water, a bench to sit on, or a restroom, without knowing where to look for any of them?

We will see how you can easily get this information from Overture data and visualize it using Ibis and Fused. You'll be able to enter an address and see these key amenities around it.

Getting the data

Overture distributes its datasets as GeoParquet, a cloud-native format that lets you fetch just the chunks of data you need from the official S3 bucket where Overture is published. The DuckDB 1.1 release marked a significant milestone by adding support for GeoParquet files, which lets users query rich geospatial datasets like Overture Maps directly. Thanks to Ibis's support for DuckDB geospatial operations, we can query these datasets from Python with the performance of DuckDB and without writing SQL. Together, these tools offer an accessible and powerful way to explore global geospatial data.

Exploring Overture Maps

Using Ibis with DuckDB as the backend, we can access the Overture Maps data directly from the cloud. These queries fetch only the columns and rows that match the specified conditions, which minimizes how much data is processed locally. You can learn more about it in this DuckDB blog.

Creating a custom UDF

I created a UDF that uses Ibis to read an Overture Maps GeoParquet file stored on S3 and filter it by a given bounding box and by column values. You can access and run it here.


We first establish a connection to a DuckDB database; in this case, it is an in-memory connection. Then we call read_parquet, which returns a table expression. Since our result, t, is a table expression, we can now run queries against the file using Ibis expressions. In this example, we start by filtering on some infrastructure subtypes (pedestrian and water), selecting only a few columns, and limiting our "search" to a bounding box, bbox. Notice that this bbox is the Fused bounding box, not the Overture Maps one.
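As a rough illustration of these steps (not the UDF linked above), the sketch below opens an in-memory DuckDB connection through Ibis, reads the infrastructure type of the base theme, and narrows it to a bounding box. The helper names overture_url and infrastructure_in_bounds, the selected columns, and the plain (xmin, ymin, xmax, ymax) tuple standing in for the Fused bounding box are assumptions made for this example. The release string is left as a parameter because it changes with every Overture release. Depending on your DuckDB setup, you may also need to set the S3 region to us-west-2.

```python
def overture_url(release: str, theme: str = "base", type_: str = "infrastructure") -> str:
    """Build the S3 glob for one theme/type of a given Overture release."""
    return (
        f"s3://overturemaps-us-west-2/release/{release}"
        f"/theme={theme}/type={type_}/*"
    )


def infrastructure_in_bounds(release: str, bounds: tuple[float, float, float, float]):
    """Return a lazy Ibis table expression; rows are fetched only on execution."""
    # Imported here so overture_url() stays usable without Ibis installed.
    import ibis

    # Query box in lon/lat (hypothetical stand-in for the Fused bounding box).
    xmin, ymin, xmax, ymax = bounds

    # In-memory DuckDB connection with the spatial extension loaded,
    # so the GeoParquet geometry column is read as a geometry type.
    con = ibis.duckdb.connect(extensions=["spatial"])
    t = con.read_parquet(overture_url(release), table_name="infrastructure")

    # Overture stores each feature's extent in a struct column named bbox;
    # keep only features whose extent lies inside the query box.
    return t.filter(
        t.subtype.isin(["pedestrian", "water"]),
        t.bbox["xmin"] > xmin,
        t.bbox["ymin"] > ymin,
        t.bbox["xmax"] < xmax,
        t.bbox["ymax"] < ymax,
    ).select("id", "subtype", "class", "geometry")
```

Keeping the URL construction in its own small function makes it easy to point the same query at another theme or type, or at a newer release.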

We then rename the column class to avoid conflicts with the deferred operator (class is a reserved word in Python, so it cannot be used as an attribute name), and finally filter the expression by specific infrastructure classes such as toilets, ATMs, drinking water, and other useful amenities. Up to this point, we only have a table expression, because Ibis has a deferred execution model: it builds up expressions based on what you ask it to do and executes them only on request.
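The rename and class filter can be sketched on an unbound table, which also makes the deferred model visible: the expression is built and compiled to SQL without touching any data. This is a minimal sketch, not the original UDF. The new column name infra_class, the function name, and the exact class spellings in USEFUL_CLASSES are assumptions for illustration. It assumes Ibis is installed with its DuckDB backend.

```python
# Illustrative subset of infrastructure classes (assumed spellings).
USEFUL_CLASSES = ["toilets", "atm", "drinking_water", "bench", "information"]


def useful_infrastructure_sql() -> str:
    """Build the expression on an unbound table and return its DuckDB SQL."""
    # Imported lazily; requires Ibis with the DuckDB backend installed.
    import ibis
    from ibis import _

    # Unbound table: only a schema, with no data and no connection behind it.
    t = ibis.table(
        {"id": "string", "subtype": "string", "class": "string"},
        name="infrastructure",
    )

    # "class" is a Python keyword, so `_.class` cannot be written;
    # rename takes new_name="old_name".
    expr = t.rename(infra_class="class").filter(
        _.infra_class.isin(USEFUL_CLASSES)
    )

    # Compiling to SQL does not execute anything.
    return str(ibis.to_sql(expr, dialect="duckdb"))
```

Printing the returned string is a convenient way to inspect the query that Ibis would send to DuckDB once the expression is executed.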

To show an example of an aggregate, we executed value_counts() and printed the result as a pandas DataFrame. Ibis can execute the table expression against the DuckDB backend and return it as a pandas DataFrame, or as a GeoPandas GeoDataFrame if a geometry column is present, just by calling to_pandas().
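To try the aggregate without network access, the following sketch runs value_counts() on a tiny in-memory table that stands in for the filtered Overture expression. The function name class_counts and the infra_class column are hypothetical. It assumes Ibis with the DuckDB backend and pandas are installed.

```python
def class_counts(classes: list[str]):
    """Count occurrences of each class and return a pandas DataFrame."""
    # Imported lazily; requires Ibis with the DuckDB backend and pandas.
    import ibis

    # Tiny in-memory table standing in for the filtered Overture expression.
    t = ibis.memtable({"infra_class": classes})

    # Still lazy: describes one row per distinct class plus its count.
    counts = t.infra_class.value_counts()

    # to_pandas() triggers execution on the default backend (DuckDB).
    return counts.to_pandas()
```

For example, class_counts(["bench", "bench", "toilets"]) returns a two-row DataFrame with the class in the first column and its count in the second. Row order is not guaranteed unless the expression is sorted explicitly.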

Conclusion

Together, Ibis, DuckDB, and Fused make querying and visualizing geospatial data much easier. They provide an intuitive and powerful toolkit for expressing geospatial queries, performing efficient transformations, and running high-performance analytics with minimal setup.

By leveraging this stack, interacting with vast geospatial datasets like Overture Maps becomes straightforward, efficient, and accessible.

Resources

If you want to learn more about Ibis geospatial capabilities, check some of the geospatial blog posts here.

You might also find these resources useful as you dive into Ibis, DuckDB, and Overture: