Overview

The Field Compression Laboratory aims to evaluate the impact of lossy compression on the accuracy of meteorological quantities used in numerical weather prediction. The framework currently comprises a Python library (fcpy) and example notebooks, and supports latitude/longitude and Gaussian gridded data in netCDF and GRIB formats.

Prerequisites

Set up

To set up or update the environment with the required dependencies and download the sample data used in the examples, run the following command from the command line:

scripts/conda_init.sh

How to use

Below is a minimal example of how to use fcpy to compare the effects of lossy compression on the relative error of specific humidity q. To set up an experiment in fcpy, you load a GRIB or netCDF dataset and create an fcpy suite that defines a baseline, a list of compressors, and the metrics to compute. To plot the results, you can use the helper methods and functions provided or write your own.

# Note: there is a 30-second wait on import
# because of how the Julia packages are loaded
import matplotlib.pyplot as plt

import fcpy

# Loads data as an xarray Dataset
ds = fcpy.open_dataset("data/cams_q_20191201_v3.nc")
# Only select specific humidity q
ds = ds[["q"]]

# Define the suite. Instead of telling fcpy how many
# bits to iterate through, we let it determine that from
# the bit-information metric of Klöwer et al. (2021).
suite = fcpy.Suite(
    ds=ds,
    baseline=fcpy.Float(bits=32),
    compressors=[
        fcpy.Round(),
        fcpy.Log(fcpy.LinQuantization()),  # <- nested compressor
    ],
    metrics=[fcpy.RelativeError, fcpy.AbsoluteError],
    bits=None,  # <- computes number of bits using Klöwer et al. (2021)'s bit-information
)

# Plot the maximum relative error per bit and compressor combination
suite.lineplot(fcpy.RelativeError, reduction="max")
plt.savefig("sample.png", dpi=300)

For the available options, please refer to the API documentation.
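
To make the quantities in the example above more concrete, here is a minimal NumPy sketch, independent of fcpy, of what linear quantization to a given number of bits and the resulting maximum errors look like. The functions `lin_quantize`, `max_absolute_error` and `max_relative_error` are hypothetical helpers written for this illustration, and the input is synthetic data rather than real specific humidity. The sketch reflects the textbook definition of linear quantization and is not necessarily how fcpy.LinQuantization is implemented.

```python
import numpy as np


def lin_quantize(values, bits):
    """Snap values to 2**bits evenly spaced levels spanning [min, max]."""
    lo, hi = float(values.min()), float(values.max())
    if hi == lo:  # constant field: nothing to quantize
        return values.copy()
    n_steps = 2**bits - 1
    codes = np.round((values - lo) / (hi - lo) * n_steps)  # integer codes 0..n_steps
    return lo + codes / n_steps * (hi - lo)  # decode back to floats


def max_absolute_error(reference, approx):
    return float(np.max(np.abs(approx - reference)))


def max_relative_error(reference, approx):
    nonzero = reference != 0  # relative error is undefined where the reference is zero
    diff = np.abs(approx[nonzero] - reference[nonzero])
    return float(np.max(diff / np.abs(reference[nonzero])))


# Synthetic stand-in data, NOT real specific humidity.
rng = np.random.default_rng(0)
q = rng.uniform(1e-6, 2e-2, size=10_000)

abs_errors = {}
rel_errors = {}
for bits in (8, 12, 16):
    q_hat = lin_quantize(q, bits)
    abs_errors[bits] = max_absolute_error(q, q_hat)
    rel_errors[bits] = max_relative_error(q, q_hat)
    print(f"{bits:2d} bits: max abs = {abs_errors[bits]:.3e}, max rel = {rel_errors[bits]:.3e}")
```

The maximum absolute error is bounded by half the spacing between levels, so it roughly halves with each extra bit. The relative error, by contrast, is dominated by the smallest values in the field, which is one reason a logarithmic transform before quantization can help for quantities that span several orders of magnitude.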

Example notebooks

The easiest way to get started is to run the Jupyter notebooks under notebooks/ with the following command:

scripts/conda_run_notebooks.sh

There you will see two example notebooks named examples-interactive and examples-programmatic. The former shows how to create plots interactively and the latter shows how to create them programmatically.

Command-line interface

The fcpy command-line interface offers an easy way to determine the number of bits required per variable and dimension, reported as a CSV table.

fcpy --input data/cams_q_20191201_v3.nc --vars q --subset lev=0-10

This will create the following CSV output table:

var_name,lev,compressor,bits,sigmas
q,1.0,Round,14.0,0.32247692346572876
q,2.0,Round,14.0,0.4317324161529541
q,3.0,Round,14.0,0.493638277053833
q,4.0,Round,15.0,0.5703839063644409
q,5.0,Round,15.0,0.6555898189544678
q,6.0,Round,16.0,0.774020254611969
q,7.0,Round,17.0,0.8185981512069702
q,8.0,Round,18.0,0.8566567897796631
q,9.0,Round,19.0,0.8976887464523315
q,10.0,Round,17.0,0.9218334555625916
q,1.0,LinQuantization,13.0,0.9965649843215942
q,2.0,LinQuantization,13.0,0.9766294956207275
q,3.0,LinQuantization,12.0,0.966712474822998
q,4.0,LinQuantization,12.0,0.9503071904182434
q,5.0,LinQuantization,12.0,0.9498687982559204
q,6.0,LinQuantization,14.0,0.9693909883499146
q,7.0,LinQuantization,15.0,0.9825363159179688
q,8.0,LinQuantization,15.0,0.9840608239173889
q,9.0,LinQuantization,16.0,1.0056097507476807
q,10.0,LinQuantization,16.0,0.9886303544044495
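
As a small illustration of working with this output, the sketch below summarizes the minimum, maximum and mean number of bits per compressor across levels. It uses only the Python standard library and embeds the rows shown above as a string, since the name of the file written to the output folder is not stated here. In practice you would open that file instead. The variable names are illustrative and not part of fcpy.

```python
import csv
import io
from collections import defaultdict

# Rows copied from the CSV table above; replace with the real file in practice.
CSV_TEXT = """\
var_name,lev,compressor,bits,sigmas
q,1.0,Round,14.0,0.32247692346572876
q,2.0,Round,14.0,0.4317324161529541
q,3.0,Round,14.0,0.493638277053833
q,4.0,Round,15.0,0.5703839063644409
q,5.0,Round,15.0,0.6555898189544678
q,6.0,Round,16.0,0.774020254611969
q,7.0,Round,17.0,0.8185981512069702
q,8.0,Round,18.0,0.8566567897796631
q,9.0,Round,19.0,0.8976887464523315
q,10.0,Round,17.0,0.9218334555625916
q,1.0,LinQuantization,13.0,0.9965649843215942
q,2.0,LinQuantization,13.0,0.9766294956207275
q,3.0,LinQuantization,12.0,0.966712474822998
q,4.0,LinQuantization,12.0,0.9503071904182434
q,5.0,LinQuantization,12.0,0.9498687982559204
q,6.0,LinQuantization,14.0,0.9693909883499146
q,7.0,LinQuantization,15.0,0.9825363159179688
q,8.0,LinQuantization,15.0,0.9840608239173889
q,9.0,LinQuantization,16.0,1.0056097507476807
q,10.0,LinQuantization,16.0,0.9886303544044495
"""

# Group the "bits" column by compressor name.
bits_by_compressor = defaultdict(list)
for row in csv.DictReader(io.StringIO(CSV_TEXT)):
    bits_by_compressor[row["compressor"]].append(float(row["bits"]))

# (min, max, mean) number of bits per compressor across all levels.
summary = {
    name: (min(bits), max(bits), sum(bits) / len(bits))
    for name, bits in bits_by_compressor.items()
}

for name, (lowest, highest, mean) in summary.items():
    print(f"{name}: min={lowest:.0f} max={highest:.0f} mean={mean:.1f} bits")
```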

For more info on how to use the tool, run fcpy --help:

usage: fcpy [-h] --input INPUT [--output OUTPUT] [--dtype {float32}] [--compressors COMPRESSORS [COMPRESSORS ...]] [--vars VARS [VARS ...]]
            [--subset SUBSET [SUBSET ...]]

options:
  -h, --help            show this help message and exit
  --input INPUT         Dataset (.nc or .grib) or MARS request file (.json)
  --output OUTPUT       Output folder
  --dtype {float32}     Convert data to different type
  --compressors COMPRESSORS [COMPRESSORS ...]
                        Example: --compressors Float,LinQuantization Log,Round
  --vars VARS [VARS ...]
                        Variables to process, otherwise all
  --subset SUBSET [SUBSET ...]
                        Variables to subset, e.g. --subset level=0-5