Why UDFs
Fused is a data analytics platform for writing Python user-defined functions (UDFs) and deploying them behind HTTP endpoints and interactive applications. With Fused, you can:
- Read files in cloud storage with UDFs
- Write and share UDFs with ease
- Run UDFs from anywhere with simple HTTP calls
- Scale and parallelize without managing infrastructure
- Create apps that run UDFs
The problem
Taking data science and analytical workflows to production runs into recurring problems, including:
- Difficulty sharing and reproducing notebooks
- Slow iteration cycles in development
- Limited discoverability of data within an organization
- Poor reusability of code snippets across projects
- Friction rewriting prototype code for production
- Overhead of managing workflow infrastructure
- Inefficient data delivery to applications
- Slow performance of analytical apps
Fused UDFs address these issues by standardizing how Python code is written, shared, and run.
What is a UDF?
UDFs are Python functions that apply a specific operation to data and can be called from anywhere that can make an HTTP request. For every UDF, Fused creates an endpoint that runs the function code and returns its output when called. Because every UDF is reachable this way, Fused integrates easily with data applications and can deliver dynamically generated data on demand.
Think of UDFs as versatile building blocks for loading and transforming data across a range of use cases, including geospatial analysis. For example, they can act as virtual datasets, file readers, and workflow tasks.
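To make the endpoint idea concrete, here is a minimal, standard-library-only sketch of a Python function exposed behind a URL. It is not Fused's SDK or implementation: the registry, the `udf` decorator, the `handle_request` router, the `/udf/<name>` path scheme, and the example function `scale_values` are all hypothetical. The router parses a GET-style URL, casts each query parameter using the function's type annotations, calls the function, and returns its output as JSON.

```python
import json
from urllib.parse import parse_qs, urlparse

# Hypothetical registry mapping endpoint names to plain Python functions.
REGISTRY = {}


def udf(func):
    """Register a function so it is reachable at /udf/<function name>."""
    REGISTRY[func.__name__] = func
    return func


@udf
def scale_values(value: float = 1.0, count: int = 3):
    # Apply a simple operation to data and return a JSON-serializable result.
    return {"values": [value * i for i in range(count)]}


def handle_request(url: str) -> str:
    """Route a GET-style URL to the matching function and return JSON text."""
    parsed = urlparse(url)
    name = parsed.path.rsplit("/", 1)[-1]
    func = REGISTRY[name]
    # Query-string values arrive as strings; cast them with the function's annotations.
    hints = func.__annotations__
    kwargs = {
        key: hints.get(key, str)(values[0])
        for key, values in parse_qs(parsed.query).items()
    }
    return json.dumps(func(**kwargs))


print(handle_request("https://example.com/udf/scale_values?value=2.5&count=4"))
# prints {"values": [0.0, 2.5, 5.0, 7.5]}
```

A real deployment would put this routing behind an actual HTTP server and handle errors such as unknown function names or malformed parameters; the sketch leaves that out to keep the request-to-function mapping visible.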
Virtual Datasets
Like database views, UDFs can act as virtual datasets: they compute data when it is requested and serve it behind an HTTP endpoint. The output format, such as TIFF, Parquet, or GeoJSON, is chosen at call time to suit the client application. This eliminates the need to pre-process or transfer datasets ahead of time.
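The sketch below illustrates the virtual-dataset idea under simple assumptions: the hypothetical `points_view` function computes its rows at call time from a small hard-coded toy sample (standing in for a real data source), and `serve` encodes the result in whichever format the caller names. Only CSV and GeoJSON are shown because they need nothing beyond the Python standard library; formats such as Parquet or TIFF would require third-party libraries. None of these names come from Fused.

```python
import csv
import io
import json


def points_view(min_value: float = 0.0):
    """Hypothetical virtual dataset: rows are filtered at call time, not pre-computed."""
    # Toy sample data standing in for a real source.
    source = [
        {"id": 1, "lon": -122.42, "lat": 37.77, "value": 3.0},
        {"id": 2, "lon": -73.99, "lat": 40.73, "value": 7.5},
        {"id": 3, "lon": 2.35, "lat": 48.86, "value": 1.2},
    ]
    return [row for row in source if row["value"] >= min_value]


def to_csv(rows):
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["id", "lon", "lat", "value"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_geojson(rows):
    # Each row becomes a Point feature; coordinates are [longitude, latitude].
    features = [
        {
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [r["lon"], r["lat"]]},
            "properties": {"id": r["id"], "value": r["value"]},
        }
        for r in rows
    ]
    return json.dumps({"type": "FeatureCollection", "features": features})


# Output formats the caller can request by name.
SERIALIZERS = {"csv": to_csv, "geojson": to_geojson}


def serve(fmt: str, **params) -> str:
    """Compute the view on request and encode it in the requested format."""
    return SERIALIZERS[fmt](points_view(**params))
```

For example, `serve("geojson", min_value=2.0)` returns a GeoJSON FeatureCollection with the two points whose `value` is at least 2.0, while `serve("csv")` returns all three rows as CSV text. Keeping serialization separate from the view means a new output format only needs a new entry in `SERIALIZERS`.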
File Readers
UDFs can also open files in various formats, such as Parquet, CSV, and GeoJSON. This gives a standard interface for exploring files in cloud object storage without moving, copying, or transforming entire datasets.
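As a rough illustration of a file-reader UDF, the sketch below picks a reader based on the file extension and returns a short preview of the records. It assumes local file paths; reading directly from cloud object storage would need an extra library, and Parquet support would need a third-party reader, so both are left out. The function names (`read_file`, `read_csv`, `read_geojson`), the `READERS` mapping, and the `limit` parameter are hypothetical.

```python
import csv
import json
from pathlib import Path


def read_csv(path):
    # DictReader yields one dict per row; values stay as strings.
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def read_geojson(path):
    with open(path) as f:
        collection = json.load(f)
    # Flatten each feature into its properties plus its geometry type.
    return [
        {**feat["properties"], "geometry_type": feat["geometry"]["type"]}
        for feat in collection["features"]
    ]


# One entry point, many formats: the extension selects the reader.
READERS = {".csv": read_csv, ".geojson": read_geojson}


def read_file(path: str, limit: int = 5):
    """Hypothetical file-reader UDF: return the first `limit` records of a file."""
    suffix = Path(path).suffix.lower()
    if suffix not in READERS:
        raise ValueError(f"unsupported file type: {suffix}")
    return READERS[suffix](path)[:limit]
```

Because every format plugs into the same `READERS` mapping, callers explore any supported file through one function without converting it first.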
Workflow Tasks
UDFs can serve as reusable tasks that load, process, and write data in analysis pipelines, and they integrate easily with third-party applications. Multiple UDFs can be chained and called in parallel to build complex workflows that run and return data on demand via HTTP requests.
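The chaining and parallel fan-out described above can be sketched locally with the standard library's `concurrent.futures`. In this hypothetical example, `load` and `process` stand in for two UDFs, `pipeline` chains them so that one step's output feeds the next, and `run_parallel` runs the chain for several inputs at once on a thread pool. In Fused the individual steps would be invoked over HTTP; local threads are only a stand-in here.

```python
from concurrent.futures import ThreadPoolExecutor


def load(tile_id: int) -> list[int]:
    """Hypothetical 'load' task: produce raw values for one tile."""
    return [tile_id * 10 + i for i in range(3)]


def process(values: list[int]) -> int:
    """Hypothetical 'process' task: reduce one tile's values to a summary."""
    return sum(values)


def pipeline(tile_id: int) -> tuple[int, int]:
    # Chain the tasks: the output of load() is the input of process().
    return tile_id, process(load(tile_id))


def run_parallel(tile_ids, max_workers=4):
    # Run the chained pipeline for every input concurrently; map() keeps input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(pipeline, tile_ids))


print(run_parallel([1, 2, 3]))
# prints {1: 33, 2: 63, 3: 93}
```

Threads suit this pattern when each step is a remote call, because such calls spend most of their time waiting on the network rather than using the CPU.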