FAQ
General questions
Who is Fused for?
Fused is designed for teams seeking to simplify their workflows and accelerate the creation and delivery of data products. It's ideal for organizations that need a scalable solution to handle growing data sizes while minimizing the time spent on data engineering.
Why Python, when there's spatial SQL?
Python is the go-to language for spatial data science. Although spatial SQL joins and transformations can be efficiently performed using PostGIS in an external database, you may eventually need to convert that data to Pandas and NumPy for further processing and analysis, especially for detailed operations on raster arrays. Additionally, you can run SQL directly on Fused using Python libraries like DuckDB, combining the strengths of both approaches.
What's the benefit of geo partitioning vector tables?
It enables efficient reading of large datasets by strategically partitioning GeoParquet files. Fused's GeoParquet format includes metadata that allows for spatial filtering of any dataset, loading only the chunks relevant to a specific area of interest. This approach reduces memory usage and allows you to work with any size dataset with just Python.
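The idea of loading only the relevant chunks can be sketched in plain Python. This is an illustration of bounding-box filtering in general, not Fused's actual implementation or metadata layout: the `Chunk` class, the file names, and the bounding boxes are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """Hypothetical stand-in for one partition of a geo-partitioned table."""
    path: str
    bbox: tuple  # (minx, miny, maxx, maxy)


def intersects(a, b):
    """Return True if two (minx, miny, maxx, maxy) boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def chunks_for_area(chunks, area):
    """Keep only the chunks whose bounding box overlaps the area of interest."""
    return [c for c in chunks if intersects(c.bbox, area)]


# Made-up partition metadata: only the bounding boxes are inspected,
# so no chunk data has to be read to decide what to load.
chunks = [
    Chunk("part-0.parquet", (-125.0, 32.0, -114.0, 42.0)),
    Chunk("part-1.parquet", (-80.0, 24.0, -70.0, 35.0)),
    Chunk("part-2.parquet", (-10.0, 35.0, 5.0, 45.0)),
]

area_of_interest = (-123.0, 37.0, -121.0, 38.5)
selected = chunks_for_area(chunks, area_of_interest)
print([c.path for c in selected])
```

Only the chunks returned by the filter would then be read into memory, which is why memory usage tracks the size of the area of interest rather than the size of the full dataset.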
When should I ingest a file vs. load it as is?
You should ingest a file if it has a spatial component and you plan to visualize it or use it for downstream analysis. Ingesting allows for more efficient and lightweight repeated access. On the other hand, if the file is small (under 100 MB), fits into memory, and is intended for a one-off operation, you should load it as is. This approach avoids the overhead of ingestion for single-use or infrequent access scenarios.
Which authentication methods do you support?
Fused currently uses Auth0 to support authentication via Google and GitHub.
How do we configure GitHub integration?
To configure the integration, connect your GitHub repository and provide us with the repository name and details. We'll activate it for you.
Is there a way to set environment variables or secrets/API keys?
Save environment variables and secrets to a .env file to make them available to UDFs as environment variables.
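Once a value is exposed as an environment variable, it can be read with Python's standard `os.environ`. The following is a minimal sketch: the variable name `MY_API_KEY` and the helper `get_secret` are hypothetical, and the value is set inline only so the example runs on its own.

```python
import os


def get_secret(name, default=None):
    """Read a secret from the environment, failing loudly if it is missing."""
    value = os.environ.get(name, default)
    if value is None:
        raise KeyError(f"Environment variable {name!r} is not set")
    return value


# Demo only: in practice the value would come from the .env file,
# not from code. MY_API_KEY is a made-up name.
os.environ["MY_API_KEY"] = "example-value"

api_key = get_secret("MY_API_KEY")
```

Failing with a clear error when a variable is missing tends to be easier to debug than passing `None` on to a downstream API call.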
How can I share utility modules between UDFs?
Troubleshooting
Error: Access is not configured for you in the Fused Workbench. Please refresh the page if you think this is an error, or get in touch if you require further help. Cause: Realtime instance not configured.
This error occurs when you run a UDF with an account whose workspace environment does not have a realtime instance configured, which means no worker nodes are available to run the UDF. To resolve this issue, please get in touch with the Fused team to ensure your account is associated with an environment that has a realtime instance.
When troubleshooting this error, it may help to navigate to your account's User Profile page to check whether the account is associated with an environment and a realtime instance.
Error: No such file or directory: '/mnt/cache/'
This error occurs when a UDF attempts to access the /mnt/cache disk when it is not available for the environment. To resolve this issue, please contact the Fused team to ensure that the cache directory is available for your account.
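While waiting for the cache disk to be enabled, a UDF can check for the directory before using it instead of failing outright. This is a sketch using only the Python standard library; the helper `pick_work_dir` is hypothetical, and falling back to the system temporary directory is an assumption about what is acceptable for your workload, not a Fused recommendation.

```python
import os
import tempfile

CACHE_DIR = "/mnt/cache"


def pick_work_dir(preferred=CACHE_DIR):
    """Return the preferred directory if it exists and is writable,
    otherwise fall back to the system temporary directory."""
    if os.path.isdir(preferred) and os.access(preferred, os.W_OK):
        return preferred
    return tempfile.gettempdir()


work_dir = pick_work_dir()
print(f"Writing intermediate files to {work_dir}")
```

Keep in mind that the temporary directory is smaller and short-lived, so this fallback only suits small intermediate files.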
Error: No space left on the device: '/tmp/'
This error occurs when a UDF attempts to write more data than the /tmp directory of the realtime instance can hold. Realtime instances have a limited amount of space available and are ephemeral between runs. Consider writing to the /mnt/cache disk instead.
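To catch this condition before a write fails, you can check the free space on the target directory first. The sketch below uses `shutil.disk_usage` from the Python standard library; the helper `has_room` and the 50 MB figure are illustrative assumptions.

```python
import shutil
import tempfile


def has_room(path, needed_bytes):
    """Return True if the filesystem holding `path` has at least
    `needed_bytes` of free space."""
    return shutil.disk_usage(path).free >= needed_bytes


tmp_dir = tempfile.gettempdir()
needed = 50 * 1024 * 1024  # hypothetical size of the file about to be written

if has_room(tmp_dir, needed):
    print(f"{tmp_dir} has enough free space")
else:
    print(f"{tmp_dir} is too full; consider another location")
```

The check is advisory only, since free space can change between the check and the write.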
Error: Quota limit: Number of running instances
Fused batch jobs, which are initiated with run_remote, require a server quota to be enabled for your account. These include data ingestion jobs. If you encounter this error, please contact the Fused team to request an increase in the quota allotted to your account.