
Frequently Asked Questions

General

Who can use Fused?

Fused is for teams looking to increase the speed at which they ship data products and features. It’s for companies looking for a stack that scales with them and the size of their data while shielding teams from hours of burdensome engineering. At a glance:

  • Data scientists find, reuse, and share code snippets.
  • Developers use the code to build responsive apps.
  • Executives use the apps to explore data interactively.
What is a UDF?

User Defined Functions (UDFs) are Python building blocks of geospatial operations that integrate across the stack - with Planetary Computer, Google Earth Engine, BigQuery, Snowflake, DuckDB, and more. They load datasets from cloud sources such as NASA, NOAA, US Census, and Overture. In the Fused community, developers can find, reuse, and share verified code snippets and run them with their UDFs.

How should I think about the way Fused’s tools interface with each other?

You build UDFs with the Python SDK, preview them in the browser with Fused Workbench, and run them anywhere in your stack via the Hosted API.

  • UDF catalog. A community collection of composable building blocks for geospatial operations.
  • Python SDK. Build and run workflows in any Python environment, including Jupyter Notebooks, VSCode, or ETL tools.
  • Fused Workbench. Interactively preview and explore data in the browser and see the effect of code changes instantly.
  • Hosted API. Supercharges UDFs and turns them into live HTTP endpoints that load the result anywhere they’re called.
What should I know when I consider integrating Fused into an existing stack?
  • Fused is the glue layer that integrates data platforms with data tools via a managed serverless API. It enables interoperability between all geospatial datasets and tools in the modern data stack.
  • This means that Fused does not compete with other tools - it creates a common ground for them to speak the same language. It’s an open system that brings together great systems originally designed for specific use cases - data formats and frameworks that historically didn’t talk to each other. Previously, teams looking to scale data science workflows to work with larger datasets needed to translate code and data between frameworks for bespoke “big data” operations - creating silos and friction.
  • Instead, Fused flips the paradigm on its head and processes small pieces of data in parallel. This lets teams use their preferred development frameworks in production. Fused shields teams from the need for specialized “big data” frameworks, which creates new possibilities that weren’t there before.
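The idea of processing many small pieces in parallel, rather than one large job, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Fused's implementation: the toy point data and the `split_bbox`, `process_chunk`, and `run_parallel` helpers are hypothetical, and a local thread pool stands in for Fused's serverless workers.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy dataset of (lon, lat, value) points; a hypothetical stand-in for a
# large partitioned dataset stored in the cloud.
POINTS = [(-122.4, 37.8, 3), (-122.3, 37.7, 5), (-73.9, 40.7, 2), (2.35, 48.85, 7)]


def split_bbox(minx, miny, maxx, maxy, n):
    """Split a bounding box into an n x n grid of smaller boxes (hypothetical helper)."""
    dx = (maxx - minx) / n
    dy = (maxy - miny) / n
    return [
        (minx + i * dx, miny + j * dy, minx + (i + 1) * dx, miny + (j + 1) * dy)
        for i in range(n)
        for j in range(n)
    ]


def process_chunk(box):
    """Run the same lightweight Python logic on one small piece of data."""
    minx, miny, maxx, maxy = box
    # Half-open intervals so a point on a shared edge is counted only once.
    return sum(v for x, y, v in POINTS if minx <= x < maxx and miny <= y < maxy)


def run_parallel(bbox, n=4, workers=8):
    """Fan the chunks out to a worker pool, then combine the partial results."""
    chunks = split_bbox(*bbox, n)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(process_chunk, chunks))


print(run_parallel((-180, -90, 180, 90)))  # prints 17
```

Each chunk is independent, so the same function could run on many small machines; only the final combine step needs all partial results.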
How does Fused compare to other tools for distributed compute? How is it different from DuckDB, Spark, Modal, Ray, Databricks, or Dask?
  • You could think of Fused as what you would get if Python could run on datasets of any size and distributed compute were fun.
  • You can think of Fused like Spark on demand with a hot pool, but without the need to manage or configure infrastructure, without query plans and OOM errors, and without vendor lock-in. Fused’s open source Python SDK avoids these issues from the start, and Fused offers paid tiers for additional compute power when you need it; that compute is serverless and turns off on its own.
Aren’t there already solutions to serve tiles? How is this different from Mapbox, tippecanoe, TiTiler, Protomaps, and Earth Engine?
  • With Fused, companies create interactive apps with backend code that runs on demand and responds to events like parameter updates and viewport panning. This makes Fused a great choice for serving tile-based maps.
  • While Mapbox is great at serving static tiles, static tiles result in read-only map experiences. Companies looking to personalize maps based on users or context can use Fused UDFs to serve dynamic tiles. This unlocks unprecedented dynamism.
Why Python, when there’s spatial SQL?
  • Python is the lingua franca of spatial data science. While you can effectively perform spatial SQL joins and transformations between tables in an external database with PostGIS, at some point you will need to convert that data back to Pandas and NumPy for additional processing and analysis - especially for refined operations on raster arrays.
  • That said, it’s always possible to run SQL right on Fused with Python libraries such as DuckDB.
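As a sketch of mixing SQL and Python in one step, the example below runs a SQL aggregation and then continues the analysis in plain Python. To stay self-contained it uses the standard-library `sqlite3` module as a stand-in for DuckDB; with DuckDB installed, a connection from `duckdb.connect()` plays the same role. The `buildings` table and its columns are hypothetical.

```python
import sqlite3

# In-memory database with a hypothetical table of building footprint areas per county.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE buildings (county TEXT, area_m2 REAL)")
con.executemany(
    "INSERT INTO buildings VALUES (?, ?)",
    [("Alameda", 120.0), ("Alameda", 80.0), ("Marin", 200.0)],
)

# Step 1: aggregate with SQL.
rows = con.execute(
    "SELECT county, SUM(area_m2) AS total FROM buildings GROUP BY county ORDER BY county"
).fetchall()

# Step 2: continue in plain Python, e.g. each county's share of the total area.
grand_total = sum(total for _, total in rows)
shares = {county: total / grand_total for county, total in rows}
print(shares)  # prints {'Alameda': 0.5, 'Marin': 0.5}
```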
How can my team use Fused with a scheduler or orchestrator?
  • Fused brings interoperable workflows, apps, and maps to your preferred stack.
  • With Fused, you can trigger UDFs (which can be chained together) from any tool via HTTP calls. This reduces the need for complex ETL pipelines and shields your team from having to maintain infrastructure.
  • You can easily call UDFs to load their output using your preferred scheduler or orchestrator tools like GitHub Actions, Airflow, Dagster, or Prefect workflows. You can either do it with Python using the SDK or with any tool that can make HTTP requests.
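Because a UDF endpoint is an HTTP URL, an orchestrator task only needs to build the request and call it. The sketch below uses only the standard library; the endpoint URL, its query parameters, and the `build_udf_url` and `call_udf` helpers are hypothetical placeholders, so substitute your UDF's actual endpoint and parameters.

```python
import urllib.parse
import urllib.request

# Hypothetical placeholder; replace with your UDF's real HTTP endpoint.
UDF_ENDPOINT = "https://example.com/udf/my_udf/run"


def build_udf_url(endpoint, params):
    """Append UDF parameters as a query string (parameter names are illustrative)."""
    return f"{endpoint}?{urllib.parse.urlencode(params)}"


def call_udf(endpoint, params, timeout=60):
    """Trigger the UDF over HTTP and return the raw response body.

    A scheduled task (for example in Airflow, Dagster, or Prefect, or a
    script run by a GitHub Actions workflow) could call this function.
    """
    url = build_udf_url(endpoint, params)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


if __name__ == "__main__":
    # Only builds the URL here; call_udf would perform the actual request.
    print(build_udf_url(UDF_ENDPOINT, {"date": "2024-01-01", "format": "csv"}))
```

Keeping the URL construction separate from the network call makes the task easy to test without hitting the endpoint.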
How does Fused pricing and features stack up against other popular map tools?
  • Running workflows at scale typically requires teams to manage infrastructure to host and run backend code, then size, provision, and scale infrastructure that then needs to be monitored for performance and availability. Fused offers an alternative.
  • The open source Fused SDK is a free Python library to work with datasets of any size in local environments. You can search and freely use code snippets from Fused’s public UDF catalog and run them in your local environment.
  • To supercharge that same code with the power of serverless compute, you can check out Fused’s paid tiers. They do not require upfront investment. When requests are made to your UDF’s endpoint, your account is charged a low fee based on the number of requests and their duration. Getting started is easy because there are no new tools or frameworks to learn and you can use the Python libraries you already rely on.
How does Fused run behind the scenes?
  • Fused runs lightweight Python operations on datasets of any size, which empowers teams to build responsive maps, dashboards, and reports. Users’ code runs a) on geo-partitioned datasets, b) in parallel, c) on serverless cloud infrastructure, d) close to the data, and e) with strategic caching.
  • Fused instantly converts user code into workflows and maps in Jupyter notebooks, low-code web apps, the Fused Workbench webapp, ETL tools, or any tool that consumes HTTP API endpoints. When users pass Python code to the backend, the service manages the capacity, scaling, processing, dependencies, and serving needed to run that code, and provides visibility into performance by surfacing real-time metrics and logs.
  • To visualize what’s happening behind the scenes, you can think of Fused as a geospatial OLAP-inspired query engine.
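The caching step can be illustrated with memoization: when the same partition is requested again with the same parameters, the result is served from memory instead of being recomputed. This is a minimal sketch using `functools.lru_cache`; `load_and_process`, its arguments, and the stand-in data are hypothetical, and Fused's actual caching layer is not shown here.

```python
from functools import lru_cache

CALLS = {"count": 0}  # Tracks how often the expensive work actually runs.


@lru_cache(maxsize=1024)
def load_and_process(partition_id, threshold):
    """Hypothetical expensive step: load one partition and filter it.

    The cache key is (partition_id, threshold), so changing either argument
    triggers a fresh computation while exact repeats come from memory.
    """
    CALLS["count"] += 1
    values = range(partition_id * 10, partition_id * 10 + 10)  # stand-in data
    return tuple(v for v in values if v >= threshold)


load_and_process(3, 35)  # computed
load_and_process(3, 35)  # served from cache
load_and_process(3, 38)  # new parameters, computed again
print(CALLS["count"], load_and_process.cache_info().hits)  # prints 2 1
```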
How does Fused enable operations that are not possible with present alternatives?
  • Fused allows people for the first time to easily work with geospatial data and integrate it with modern data tools.
  • Today, the advent of earth observation imagery and sensor networks has produced large-scale geospatial datasets that are migrating to open, cloud-enabled formats. However, the sheer volume of data, the complexity of operations, and the fragmentation of tooling hold back how companies process and present that data to make informed decisions about critical operations.
  • With traditional approaches, the sheer size of the data results in long-running jobs that spend a long time simply loading the entire dataset into a runtime, and the range of different operations calls for specialty tooling that also incurs a cost to move data between tools and back.
  • Instead of loading the entire dataset, Fused brings only the pieces of data relevant to the time and geographical area to be studied. This minimizes the cost and penalties of moving data between tools, and working with small data (in parallel) means it runs on small virtual machines using just Python.
  • This framework’s data architecture makes development environments and apps fast because only the relevant data is loaded, then cached. So it’s not just about speed: this significantly simplifies system architecture and therefore enables operations that weren’t possible before. Performance is critical in mapping apps because neither developers nor end users like to wait, and maps only create value when people use them.
What datasets can I access with Fused today?
  • Developers looking to try out Fused can use the Fused Python library to freely download the Overture buildings and transportation datasets hosted in Source Coop.
  • Fused also hosts partitioned versions of public datasets such as US Census, buildings (Microsoft and OpenStreetMap), NOAA weather, Sentinel & MODIS multiband imagery, Chloris Biomass, flood (Sentinel), DEM, LULC, NAIP, HLS, and other datasets.
  • Fused works with open data that third-party providers host in Amazon S3, such as the Registry of Open Data on AWS.
  • To work with your own datasets, Fused makes it easy to geopartition and load your data into your own S3 bucket.
What does Fused’s GeoParquet partitioning data format enable?
  • It enables teams to develop in production - to run code on data of any scale without the friction of infrastructure.
  • Fused’s ability to efficiently read large datasets comes from strategic partitioning in GeoParquet files. Fused’s GeoParquet format contains metadata that lets the system select only the chunks of data relevant to an operation.
  • Acting on smaller pieces of data means that the compute needed to load it can be distributed across many small machines, and because each operation is lightweight, it can be done with just Python and libraries such as GeoPandas and Xarray that developers already use for geospatial data science. This eliminates the need to translate code to classical “big data” standards such as GeoSpark and Scala. The incumbent alternative was to load entire datasets into memory, convert them into a format that a geo database can ingest, push them to the database, perform an operation, then send the data back to Python. Instead, Fused runs the same Python code over the data with a lightweight worker.
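The chunk-selection idea can be sketched as bounding-box pruning: if each partition records its extent in metadata, a query reads only the partitions whose extent intersects the area of interest. The partition list below is hypothetical and kept in memory; in GeoParquet, this kind of extent information could come from file or row-group metadata, and Fused's actual partition layout is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    path: str    # hypothetical file path
    bbox: tuple  # (minx, miny, maxx, maxy) recorded in metadata


PARTITIONS = [
    Partition("part-0.parquet", (-125.0, 32.0, -114.0, 42.0)),  # roughly California
    Partition("part-1.parquet", (-80.0, 40.0, -71.0, 45.0)),    # roughly New York
    Partition("part-2.parquet", (-10.0, 35.0, 30.0, 60.0)),     # roughly Europe
]


def intersects(a, b):
    """True if two (minx, miny, maxx, maxy) boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def select_partitions(aoi, partitions=PARTITIONS):
    """Return only the partitions that need to be read for this area of interest."""
    return [p.path for p in partitions if intersects(p.bbox, aoi)]


# Area of interest around San Francisco: only one partition needs to be read.
print(select_partitions((-123.0, 37.0, -122.0, 38.0)))  # prints ['part-0.parquet']
```

Only the metadata is consulted to decide what to read, so the cost of a query scales with the area requested rather than with the size of the whole dataset.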
'Quota limit: Number of running instances'

Fused ingest and batch job operations require workers, specified with a quota number. If you encounter this error, please get in touch with the Fused team to increase the quota allotted to your account.

Workbench-specific

How do I create an account to use Workbench?

During the private release period, the Fused team will grant access to pre-registered accounts on a rolling basis.

Fill out this form to get on the waitlist. You'll receive an email when your account is ready to go.

What is the difference between Fused Workbench and fused-py Python SDK?

You can think of Workbench as a sandbox environment where you iteratively work on your UDF code - it uses only a subset of data and gives instant feedback. Once the code is ready and you want to run it at scale across an entire dataset, you can run your UDF with fused-py.

What if I want to add a new dependency to the Python runtime?

Fused simplifies Python dependency management by issuing a single image that contains all dependencies requested by the community. Email Fused to add a dependency to the runtime, or if you’d like a private runtime image issued for your organization.

What is the bbox argument of a Tile UDF?

The bbox argument of a Tile UDF is a GeoDataFrame with a single row, which contains the tile’s boundaries as a Polygon along with its x, y, and z coordinates.
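To make the x, y, and z coordinates concrete, the sketch below computes a tile's longitude/latitude bounds with the standard Web Mercator (XYZ, "slippy map") tiling formula. It assumes Fused tiles follow the common XYZ scheme with y counted from the top, and it returns a plain tuple instead of a GeoDataFrame so it runs without GeoPandas; the helper name `tile_bounds` is hypothetical.

```python
import math


def tile_bounds(x, y, z):
    """Return (west, south, east, north) in degrees for XYZ tile (x, y, z)."""
    n = 2 ** z

    def lon(xt):
        # Tile columns map linearly onto longitude.
        return xt / n * 360.0 - 180.0

    def lat(yt):
        # Inverse Web Mercator projection for tile row yt.
        return math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * yt / n))))

    return lon(x), lat(y + 1), lon(x + 1), lat(y)


# The single zoom-0 tile spans longitudes -180..180 and latitudes of about +/-85.0511.
print(tile_bounds(0, 0, 0))
```

Since the bbox argument already carries the tile's Polygon, a helper like this is mainly useful for understanding or checking those bounds.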