Beacon, one of the products we’re building at Path and Focus, requires access to weather forecast information for all of Canada. Accessing and parsing forecasting data was challenging, but we came up with a way to process forecast data files in GRIB2 format from the Meteorological Service of Canada (MSC)’s data service. In this post, we cover:
- using MSC Datamart services
- using GRIB2 output from the High Resolution Deterministic Prediction System (HRDPS)
- using Python to parse GRIB2 and connecting it with our core Node.js GraphQL API
MSC provides raw weather, water, climate, and environmental data for public use — see here for a complete list.
HRDPS GRIB2 data
We were specifically interested in accessing numerical forecasting data for British Columbia, namely temperature, relative humidity, precipitation, and wind speed. MSC provides various formats and different prediction models that vary in terms of how far in the future the data is predicted for, geographic location, and resolution.
Of the models that MSC provides, we ended up using the High Resolution Deterministic Prediction System (HRDPS), which covers the majority of North America: it has the highest resolution (2.5 km), covers all of BC (the province we’re currently interested in), and provides a 48-hour forecast. The HRDPS has several domains (e.g. East, Prairies, West); we used the Continental domain grid, as it covers all of BC (and a big chunk of North America).
For forecasts further out into the future, the Regional Deterministic Prediction System may also be useful: it provides forecasts up to 84 hours in the future, though at a coarser 10 km resolution.
Before moving on to how we parsed it, it helps to understand how HRDPS files are organized in Datamart. The HRDPS model runs every 6 hours, so Datamart has directories for the 00, 06, 12, and 18 UTC runs.
Within these directories is a collection of GRIB2 files for every hour up to 48 hours from the model run time. For example, if you are interested in the forecast for tomorrow at 2:00 am UTC (7:00 pm PT today), you can check a forecast from the 06:00 UTC run, 20 hours into the future, by selecting the 020 directory.
Within each of the hourly directories, there is one file for each weather variable. The variables are displayed in the file name.
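As a concrete sketch, a small helper can build the Datamart URL for a given run, forecast hour, and variable. The host, path layout, and filename pattern below are our assumptions about the structure at the time of writing; verify them against the live directory listing before relying on them:

```python
from datetime import datetime, timezone

# NOTE: host, path layout, and filename pattern are assumptions based on the
# Datamart structure described above; check them against the live listing.
BASE = "https://dd.weather.gc.ca/model_hrdps/continental/2.5km"

def hrdps_url(run: datetime, forecast_hour: int, variable: str) -> str:
    """Build a candidate URL for one HRDPS Continental GRIB2 file."""
    run_hh = f"{run.hour:02d}"          # model run directory: 00, 06, 12, or 18
    fhr = f"{forecast_hour:03d}"        # forecast hour directory: 000..048
    stamp = run.strftime("%Y%m%dT%HZ")  # e.g. 20240501T06Z
    name = f"{stamp}_MSC_HRDPS_{variable}_RLatLon0.0225_PT{fhr}H.grib2"
    return f"{BASE}/{run_hh}/{fhr}/{name}"

url = hrdps_url(datetime(2024, 5, 1, 6, tzinfo=timezone.utc), 20, "TMP_AGL-2m")
```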
For a list of variables in the HRDPS model, this variable table may be useful. While it is slightly out of date with the current files, it may help point you in the right direction for what particular GRIB2 file to check out. In our case, the variables we were interested in were:
- Relative Humidity: RH_AGL-2m
- Wind Speed: WIND_AGL-10m
- Temperature: TMP_AGL-2m
- 24-hour precipitation: APCP_SFC
For some variables, it’s important to know the vertical level the data is forecast at. For example, we chose a wind speed forecast taken 10 m above ground level.
Once we narrowed down the variables we needed and determined how to interpret the folder structure, we were able to start collecting and parsing the data.
Parsing GRIB2 files
The request from our core API:
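The exact body isn’t reproduced here, but a minimal version might look like this (the field names and the truncated URL are illustrative: `url` points at the Datamart file to parse, `attribute` is the GRIB2 data variable, and `queries` lists the points of interest):

```json
{
  "url": "https://dd.weather.gc.ca/model_hrdps/continental/...",
  "attribute": "t2m",
  "queries": [
    { "latitude": 49.2827, "longitude": -123.1207 }
  ]
}
```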
The response from the Python microservice:
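A minimal illustrative response, with one value per query point; values come back in the file’s native units, which for temperature is kelvin:

```json
{
  "results": [284.6]
}
```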
We chose the Xarray Python package to open the GRIB2 file, as it is a helpful tool for reading multidimensional arrays such as GRIB files. For Xarray to read GRIB2 files, it also needs the cfgrib package (Xarray provides a general tutorial for this process), and you must have ecCodes installed on your machine.
For each weather type, the data variable name differs. For example, in the file above for air temperature (TMP_AGL-2m in Datamart) the data variable is called t2m; for relative humidity (RH_AGL-2m in Datamart) it is called r2. The data in a GRIB2 file spans the whole continental grid, so to get a specific forecast value, you must query the coordinate (lat, long) location you are interested in.
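With a real file you would open it via the cfgrib engine, e.g. `xr.open_dataset("path/to/file.grib2", engine="cfgrib")`. To show the structure without a download, here is a tiny synthetic stand-in with the same shape: 2-D latitude/longitude coordinate arrays over the grid, and a `t2m` data variable (the coordinate values here are made up):

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for a cfgrib-opened HRDPS file: 2-D lat/lon coordinate
# grids plus a data variable named after the weather type (t2m, r2, ...).
lat = np.array([[49.0, 49.0], [49.1, 49.1]])
lon = np.array([[-123.1, -123.0], [-123.1, -123.0]])
ds = xr.Dataset(
    {"t2m": (("y", "x"), np.array([[284.1, 284.3], [283.9, 284.0]]))},
    coords={"latitude": (("y", "x"), lat), "longitude": (("y", "x"), lon)},
)

# The variable name depends on the file: t2m for TMP_AGL-2m, r2 for RH_AGL-2m.
values = ds["t2m"].values  # temperatures in kelvin
```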
Building the GRIB API Python server
We used Flask to build our microservice. If you’re familiar with Node, Flask is similar to Express.
A basic Flask application scaffold looks like this:
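Something like the following (the route name and payload are illustrative placeholders):

```python
from flask import Flask, jsonify

app = Flask(__name__)

# Endpoint name is illustrative; ours accepts a POST with the GRIB2 file URL
# and a list of coordinate queries, and returns the parsed forecast values.
@app.route("/parse", methods=["POST"])
def parse():
    return jsonify({"results": []})

# To serve locally: app.run(port=5000)
```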
The next step was to write the parse route handler. It is responsible for downloading an HRDPS file and calling our parsing logic.
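Ours isn’t shown verbatim here, but a sketch of that handler might look like the following. The route name and payload fields are illustrative, and query_hrdps is stubbed out since it is covered separately:

```python
import tempfile
import urllib.request

from flask import Flask, jsonify, request

app = Flask(__name__)

def query_hrdps(ds, queries, attribute):
    # Stub: the nearest-point lookup lives in query_hrdps.
    raise NotImplementedError

@app.route("/parse", methods=["POST"])
def parse():
    import xarray as xr  # the cfgrib engine also requires ecCodes on the machine

    body = request.get_json()
    # Download the GRIB2 file named in the request into a temporary file,
    # open it with the cfgrib engine, and run the point queries against it.
    with tempfile.NamedTemporaryFile(suffix=".grib2") as tmp:
        urllib.request.urlretrieve(body["url"], tmp.name)
        ds = xr.open_dataset(tmp.name, engine="cfgrib")
        results = query_hrdps(ds, body["queries"], body["attribute"])
    return jsonify({"results": results})
```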
The query_hrdps function finds and returns the nearest HRDPS value for a given attribute for each query.
In this case, it takes in the hrdps data frame created above, the query parameters from the POST request, and the attribute we care about. In the file above the attribute is t2m, but it may differ based on what type of weather data you are interested in.
Let’s take a look at the query_hrdps function. We use a cdist function to calculate the distance between pairs of latitudes and longitudes. This is necessary because the latitude and longitude in the request object may not exist exactly in the data file. To find a weather value for that location, the best approximation is to use the closest data point to the provided query point.
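A sketch of that lookup, assuming the dataset exposes 2-D latitude/longitude coordinate arrays (as cfgrib does for the HRDPS grid) and that each query is a dict with `latitude`/`longitude` keys (those key names are our own):

```python
import numpy as np
from scipy.spatial.distance import cdist

def query_hrdps(ds, queries, attribute):
    """Return the value of `attribute` at the grid point nearest each query."""
    # Flatten the 2-D coordinate grids into an (n_points, 2) array of lat/lon.
    grid = np.column_stack(
        [ds["latitude"].values.ravel(), ds["longitude"].values.ravel()]
    )
    points = np.array([[q["latitude"], q["longitude"]] for q in queries])

    # cdist gives the distance from every query point to every grid point;
    # argmin picks the closest grid point for each query. Plain Euclidean
    # distance in lat/lon space is a reasonable approximation at this scale.
    nearest = cdist(points, grid).argmin(axis=1)

    flat_values = ds[attribute].values.ravel()
    return [float(flat_values[i]) for i in nearest]
```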
And now we have our microservice!
Collecting GRIB2 files
To retrieve new forecasts in real time, MSC has set up an Advanced Message Queuing Protocol (AMQP) server so that the application you are building can subscribe to new events. These events are notification objects containing a URL from which the new files can be requested, and a developer can target notifications for a specific set of files based on their needs. MSC provides some docs on AMQP to help get started. We used the amqplib npm library to set up a connection in our core API.
This is a general guide of how we set up AMQP, fetching temperature forecasts as an example:
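In outline, it looks something like this. The broker host, credentials, exchange, topic pattern, and the "timestamp baseUrl relativePath" notification body format are assumptions based on MSC’s AMQP docs; double-check them there before use:

```javascript
// Parse a notification body ("<timestamp> <baseUrl> <relPath>", an assumed
// format; see MSC's AMQP docs) into the full URL of the new GRIB2 file.
function notificationToUrl(body) {
  const [, baseUrl, relPath] = body.toString().trim().split(" ");
  return baseUrl.replace(/\/$/, "") + "/" + relPath;
}

// Subscribe to new HRDPS Continental files; host, credentials, exchange,
// queue name, and topic pattern below are illustrative assumptions.
async function subscribe(onUrl) {
  const amqp = require("amqplib"); // required lazily; helpers above work standalone
  const conn = await amqp.connect("amqps://anonymous:anonymous@dd.weather.gc.ca");
  const ch = await conn.createChannel();
  const { queue } = await ch.assertQueue("q_anonymous.hrdps-temp.example", {
    durable: false,
  });
  await ch.bindQueue(queue, "xpublic", "v02.post.model_hrdps.continental.#");
  ch.consume(queue, (msg) => {
    if (msg) {
      onUrl(notificationToUrl(msg.content));
      ch.ack(msg);
    }
  });
}
```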
Then, for each GRIB2 file notification event, we can make a request to our microservice:
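Roughly like so, with Node 18+’s built-in fetch; the endpoint, port, and payload field names are illustrative and must match whatever your microservice expects:

```javascript
// Build the JSON payload for the microservice; field names are illustrative.
function buildParseRequest(fileUrl, queries, attribute = "t2m") {
  return { url: fileUrl, attribute, queries };
}

// POST the payload to the Python microservice and log the forecast values
// (the endpoint and port are illustrative).
async function fetchForecast(fileUrl, queries) {
  const res = await fetch("http://localhost:5000/parse", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildParseRequest(fileUrl, queries)),
  });
  const { results } = await res.json();
  console.log(results);
  return results;
}
```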
This will print the forecasted temperature data at the given latitude and longitude for every hour of each new forecast release!
MSC offers various data formats, forecasting models, and even a GeoMet API service that can also be helpful. Though HRDPS GRIB2 files are a fraction of the data they provide, we hope this can serve as a starting point for accessing public Canadian forecasting data. If you have any feedback, suggestions, or questions, feel free to send us an email.
If you’re curious about one of the ways we’re using forecasting data, check out our latest tool: Beacon. It’s an app built for folks who work in heavily forested areas, and it uses weather data to calculate forest fire danger ratings and the corresponding work restrictions. Not only does Beacon display current and historic danger ratings, it also uses forecasted weather data to calculate what the danger rating may be 48 hours in the future. This helps workers make more informed decisions about whether it’s safe to work and plan for upcoming shifts.
If you’re interested in our work at Path and Focus, or have any feedback, questions, or suggestions, we’d love to hear from you! Get in touch and we’ll see how we can help.