
Streamlining Infrastructure Risk Analysis with Fused

· 2 min read
Jacob Prince-Bieker
Senior ML Engineer @ VIDA

VIDA uses dozens of the latest-generation climate models, collectively part of the Coupled Model Intercomparison Project Phase 6 (CMIP6), to work with the most up-to-date climate information. These models provide a range of possible futures under different emission scenarios, and they also differ in how each one produces its projections. By combining the models into ensembles, we gain higher confidence in the risks and hazards that we derive from them and present to our customers.
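To make the ensemble idea concrete, here is a minimal sketch of combining per-model grids into an ensemble mean and a spread (standard deviation across models). It is not VIDA's pipeline: the function name `ensemble_summary`, the tiny 2x2 grids, and the choice of mean and standard deviation as summary statistics are all assumptions made for illustration.

```python
import numpy as np


def ensemble_summary(model_outputs):
    """Combine per-model grids into an ensemble mean and spread.

    model_outputs: list of 2-D arrays (lat x lon), one per model,
    all on the same grid. Hypothetical helper for illustration only.
    """
    # Stack to shape (model, lat, lon) so statistics run across models
    stacked = np.stack(model_outputs, axis=0)
    mean = np.nanmean(stacked, axis=0)    # ensemble mean per grid cell
    spread = np.nanstd(stacked, axis=0)   # disagreement between models
    return mean, spread


# Three made-up models projecting the same metric on a 2x2 grid
models = [
    np.array([[1.0, 2.0], [3.0, 4.0]]),
    np.array([[2.0, 2.0], [3.0, 6.0]]),
    np.array([[3.0, 2.0], [3.0, 8.0]]),
]
mean, spread = ensemble_summary(models)
```

Cells where the models agree get a spread of zero, while cells where they diverge get a larger spread, which is one simple way to express how much confidence an ensemble value deserves.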

In this blog post, I show how I created a UDF to pre-process and visually inspect the Zarrs we generate as the output from our climate risk models.


Find Jacob's UDF code here.

As the Senior Machine Learning Engineer at VIDA, I work daily with large, multidimensional geospatial datasets to determine the future risks to infrastructure from climate change globally. These risks and hazards help organizations that invest in infrastructure to accurately assess investments, plan for climate-resilient adaptation where needed, and comply with new climate regulations such as the EU Taxonomy.

The engineering challenge to create climate models

At VIDA, we derive dozens of risk metrics from CMIP6 outputs, covering possible hazards such as flooding, drought, tornadoes, and many more. We write datasets to either Zarr or Cloud-Optimized GeoTIFF so that we can efficiently load the data that we need. One of the best ways to sanity-check our outputs and the data we give our customers is to visualize them.

Prior to using Fused, much of this visualization was ad hoc: we used tools such as matplotlib or xarray to generate plots locally and then shared them as image files for each client site. While this works, it adds a lot of friction to the feedback cycle, and to focus on a single site or area, we had to rerun our visualization pipeline.

Creating the UDF

With Fused, we can interactively pre-process and visually inspect the Zarrs that we generate as the output of our models. Being able to just share a link to a UDF that shows the data on an interactive map has greatly sped up exploring the model outputs and sanity-checking the data that we use.
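As a rough illustration of the kind of pre-processing step such a UDF might perform, the sketch below clips a regular latitude/longitude grid to a bounding box so that only the area of interest is returned for display. It deliberately does not use the Fused API and is not the UDF linked above: the function `subset_to_bbox`, the `(min_lon, min_lat, max_lon, max_lat)` bounding-box order, and the sample coordinates are all assumptions for illustration.

```python
import numpy as np


def subset_to_bbox(values, lats, lons, bbox):
    """Return the part of a regular lat/lon grid that falls inside bbox.

    values: 2-D array shaped (len(lats), len(lons))
    bbox:   (min_lon, min_lat, max_lon, max_lat), an assumed convention
    Hypothetical helper; a real UDF would read these arrays from a Zarr store.
    """
    min_lon, min_lat, max_lon, max_lat = bbox
    # Boolean masks selecting the rows and columns inside the box
    lat_mask = (lats >= min_lat) & (lats <= max_lat)
    lon_mask = (lons >= min_lon) & (lons <= max_lon)
    # np.ix_ builds an open mesh so both masks apply at once
    return values[np.ix_(lat_mask, lon_mask)], lats[lat_mask], lons[lon_mask]


# Made-up 4x3 grid standing in for one time step of a model output
lats = np.array([50.0, 51.0, 52.0, 53.0])
lons = np.array([4.0, 5.0, 6.0])
values = np.arange(12, dtype=float).reshape(4, 3)

tile, tile_lats, tile_lons = subset_to_bbox(
    values, lats, lons, bbox=(4.5, 50.5, 6.0, 52.0)
)
```

Because the function depends only on the requested bounding box, focusing on a different site means calling it again with new bounds instead of rerunning a whole plotting pipeline.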


Conclusion

Fused made it a lot easier for me to load, preprocess, and visualize the large datasets I work with and to share them across my team. We create a lot of Zarrs, which we put on the cloud, so our internal workflow of creating datasets, saving them as Zarrs, and sharing the outputs among the team has significantly improved.

We now save engineering hours and maintain simpler workflows because Fused makes it easier to work with data and collaborate than sharing static PNGs with colleagues did. Fused streamlines our internal workflows by enabling us to create Zarr datasets, upload them to Google Cloud, and efficiently build, run, and share climate models along with their outputs.