Metaclip: a tool to keep track of the climate data journey #PredictiaPapers

Wednesday, November 25, 2020

From the moment it is sown in the wheat field until it reaches our table as a loaf of bread, a grain of wheat goes through many transformations. The same goes for climate and meteorological data. From the moment they are captured by satellites and in-situ sensors or calculated on supercomputers until they reach a decision-maker, datasets and forecasts go through many transformations and processes. And just as we know that the food in our fridge is safe to eat because there’s a chain of food safety protocols that we can trace, the quality of climate data must be ensured along the whole process. However, this is no easy task. Climate and meteorological datasets are huge, containing thousands of measurements for different variables. All of them undergo quality assessments, modifications and calculations. So today we want to talk about METACLIP, a tool we developed a while ago that allows climate data users to assess the quality, reliability and trustworthiness of the data they are using by tracing where it comes from and what has been done to it. This is what experts call data provenance.

If you want to read the original paper, with all the details about the METACLIP ontologies, please see "The METACLIP semantic provenance framework for climate products".

Spatial plots of RPSS

Before we go into detail, we want you to experience METACLIP. So, download the map above, open the tool, and drag and drop the image onto it. You should see something like the interactive graph below. It contains useful information, such as the actors involved in producing the data, the processes the data has been through and how they all relate to each other. It even records the interpolation method applied to the data. Useful, right?

Graph shown by METACLIP

To develop METACLIP, our colleagues didn’t start from scratch. Climate is not the only field dealing with data provenance, that is, keeping a record of the people, institutions, entities and activities involved in the data journey. Logistics, manufacturing, oil and gas, industry 4.0… The rise of the Internet of Things has made a ton of data available to multiple sectors, which now need ways to ensure the quality and reliability of that data. That’s one of the reasons why the World Wide Web Consortium, the international organisation defining standards for the web, established the Resource Description Framework (RDF), a standard model for data interchange on the Web. METACLIP has its roots in RDF and provides a semantic description for climate products (maps, plots, datasets…). In particular, it offers vocabularies for:

  • Datasource: describes the origin of the input data and the transformations the data has gone through, such as subsetting, aggregation, anomalies, PCA or climate indices. It also establishes the links between the different transformation commands and arguments in each step.
  • Calibration: encodes the metadata describing the statistical adjustments applied to the climate data: the bias adjustment you can apply through Climadjust; downscaling techniques; or other methods such as variance inflation or ensemble recalibration. The calibration vocabulary follows the framework designed by VALUE, a COST Action European initiative to systematically validate and improve downscaling methods in climate research.
  • Verification: establishes the metadata related to the verification of seasonal forecast products, describing the verification measures applied as well as the verification aspect that each measure addresses. In addition, this vocabulary provides a conceptual scheme to define other forms of climate validation.
  • Graphical: describes graphical products, like charts and maps, including a characterization of uncertainty types and how they are communicated.
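
To give a feel for what a semantic provenance description looks like, here is a minimal sketch in plain Python. It stores provenance as subject-predicate-object triples, which is the data model behind RDF, and walks back from a final map to the steps that produced it. The `prov:` terms (`Entity`, `Activity`, `used`, `wasGeneratedBy`, `wasAssociatedWith`) come from the W3C PROV Ontology. Everything else, including the `http://example.org/climate/` namespace, the resource names and the `lineage` helper, is made up for illustration and is not part of the METACLIP vocabularies or its implementation.

```python
# A minimal sketch of provenance stored as RDF-style triples.
# PROV terms are real W3C PROV Ontology terms; every identifier under EX
# is a hypothetical name invented for this example (not METACLIP vocabulary).

PROV = "http://www.w3.org/ns/prov#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/climate/"  # hypothetical namespace

triples = [
    (EX + "rawDataset", RDF_TYPE, PROV + "Entity"),
    (EX + "interpolation", RDF_TYPE, PROV + "Activity"),
    (EX + "interpolation", PROV + "used", EX + "rawDataset"),
    (EX + "griddedDataset", RDF_TYPE, PROV + "Entity"),
    (EX + "griddedDataset", PROV + "wasGeneratedBy", EX + "interpolation"),
    (EX + "plotting", RDF_TYPE, PROV + "Activity"),
    (EX + "plotting", PROV + "used", EX + "griddedDataset"),
    (EX + "plotting", PROV + "wasAssociatedWith", EX + "someInstitution"),
    (EX + "map", RDF_TYPE, PROV + "Entity"),
    (EX + "map", PROV + "wasGeneratedBy", EX + "plotting"),
]


def lineage(product, triples):
    """Walk back from a product to the activities and inputs behind it.

    Returns a list of (entity, activity, inputs) tuples, starting with the
    step that generated `product`.
    """
    generated_by = {s: o for s, p, o in triples if p == PROV + "wasGeneratedBy"}
    used = {}
    for s, p, o in triples:
        if p == PROV + "used":
            used.setdefault(s, []).append(o)

    steps, pending, seen = [], [product], set()
    while pending:
        entity = pending.pop()
        if entity in seen:
            continue
        seen.add(entity)
        activity = generated_by.get(entity)
        if activity is None:
            continue  # an original input: nothing recorded before it
        inputs = used.get(activity, [])
        steps.append((entity, activity, inputs))
        pending.extend(inputs)
    return steps


def short(uri):
    """Strip the example namespace so names are easier to read."""
    return uri[len(EX):] if uri.startswith(EX) else uri


for entity, activity, inputs in lineage(EX + "map", triples):
    print(short(entity), "<-", short(activity), "<-", [short(i) for i in inputs])
```

A real deployment would use an RDF library and the actual vocabularies rather than hand-written tuples, but the principle is the same: each product points back to the activity that generated it, and each activity points to the data it used.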

METACLIP builds on previously existing ontologies and schemas, like the Element Set from the Dublin Core Metadata Initiative, GeoSPARQL or the PROV Ontology. METACLIP is under constant development. In the end, our hope is to establish a metadata standard for climate products. This would guarantee that, just as we know that we can eat safely thanks to food safety protocols, we have a reliable climate data diet. Because it’s in our best interest to be as fit as possible in the fight against climate change.