FAIR Data Principles

April 10, 2026

If you've submitted a data management plan to a funder in the last five years, you've almost certainly encountered the term 'FAIR data.' If you've applied for US federal funding since 2022, you've seen it there too. FAIR principles are now referenced in journal data policies, institutional strategies, and funder requirements across the globe.

But what do they actually mean in practice? And for a climate researcher working with observational datasets, model outputs, and remote sensing products, what does being FAIR require you to do?

FAIR stands for Findable, Accessible, Interoperable, and Reusable. The principles were published in 2016 in Scientific Data and have since become the dominant framework for good research data practice globally. They apply to both data and the software or algorithms used to process it.

The Four Principles

Findable

Data can be found by both humans and machines. It is assigned a persistent identifier (typically a DOI), described with rich metadata, and registered or indexed in a searchable resource.

Accessible

Once found, data can be retrieved using a standardised, open protocol. Access conditions (including any restrictions) are clearly documented and machine-readable.

Interoperable

Data uses formal, shared vocabularies and formats so it can be integrated with other datasets and used with different tools and applications.

Reusable

Data is richly described with accurate, relevant attributes and a clear data usage license, so it can be replicated and combined in different settings.

F — Findable: What It Means in Practice

The FAIR principles specify four sub-requirements for Findability:

  • F1: Data and metadata are assigned a globally unique and persistent identifier (a DOI is the standard for datasets).

  • F2: Data is described with rich metadata — not just a title and author, but scope, methodology, time range, spatial coverage, and variable descriptions.

  • F3: Metadata clearly and explicitly includes the identifier of the data it describes (i.e. the DOI appears in the metadata record, not just on the landing page).

  • F4: Data and/or metadata are registered or indexed in a searchable resource (a data repository such as Panthaion, a domain-specific catalogue, or a metadata index such as DataCite's).

For climate researchers, the practical implication is: don't share data via a lab website URL. Upload it to a repository that provides DOI registration and metadata indexing. A dataset shared as a Google Drive link is not FAIR, even if the data itself is excellent: the link is not a persistent identifier, carries no structured metadata, and is not indexed anywhere a search would find it.
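A repository runs checks like these for you, but they are simple enough to sketch. The example below assumes a metadata record stored as a Python dictionary with hypothetical field names (`identifier`, `temporal_coverage`, `indexed_in` and so on; not any repository's real schema) and tests each Findability sub-requirement. The DOI pattern is deliberately loose and checks syntax only: it cannot tell whether a DOI is registered or resolves.

```python
import re

# Loose DOI syntax check: "10." + 4-9 digit registrant code + "/" + a suffix
# with no whitespace. This validates form only, not registration or resolution.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

# Hypothetical field names standing in for "rich metadata" (F2).
RICH_FIELDS = (
    "title", "creators", "description", "methodology",
    "temporal_coverage", "spatial_coverage", "variables", "keywords",
)


def check_findability(record: dict, data_doi: str) -> dict:
    """Return a pass/fail flag for each Findability sub-requirement."""
    identifier = record.get("identifier", "")
    return {
        # F1: the record carries an identifier with valid DOI syntax.
        "F1": bool(DOI_PATTERN.match(identifier)),
        # F2: every rich-metadata field is present and non-empty.
        "F2": all(record.get(name) for name in RICH_FIELDS),
        # F3: the metadata names the DOI of the data it describes
        # (DOIs are case-insensitive, so compare lowercased).
        "F3": identifier.lower() == data_doi.lower(),
        # F4: the record lists at least one searchable index that holds it.
        "F4": bool(record.get("indexed_in")),
    }


# Usage with a hypothetical record and a placeholder DOI.
record = {
    "identifier": "10.5555/example-sst-v1",
    "title": "Example daily sea surface temperature, 1990-2020",
    "creators": ["A. Researcher"],
    "description": "Gridded daily SST derived from buoy observations.",
    "methodology": "Optimal interpolation onto a 0.25 degree grid.",
    "temporal_coverage": "1990-01-01/2020-12-31",
    "spatial_coverage": "60S-60N, 180W-180E",
    "variables": ["sea_surface_temperature"],
    "keywords": [],  # left empty, so F2 fails
    "indexed_in": ["hypothetical-repository-catalogue"],
}
print(check_findability(record, "10.5555/EXAMPLE-SST-V1"))
# {'F1': True, 'F2': False, 'F3': True, 'F4': True}
```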

A — Accessible: What It Means in Practice

Accessible does not mean free. It means the access conditions are clearly specified and retrievable:

  • A1: Data and metadata are retrievable by their identifier using a standardised communications protocol (e.g. HTTPS). In practice, the DOI resolves to a landing page from which the data can be downloaded or, for restricted data, requested.

  • A1.1: The protocol is open, free, and universally implementable (HTTPS qualifies; a proprietary protocol that requires licensed client software does not).

  • A1.2: The protocol allows for an authentication and authorisation procedure where necessary — so restricted datasets can still be FAIR if access conditions are clearly documented.

  • A2: Metadata remains accessible even if the data itself is no longer available. The landing page and metadata record should persist even after a dataset is withdrawn or embargoed.

FAIR ≠ Open. A restricted dataset — one that requires a data sharing agreement to access, for example — can still be fully FAIR if its metadata is public, its access conditions are clear, and its identifier is persistent.
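Principle A1 is about getting from an identifier to the data and metadata over an open protocol. As an illustration: DOIs resolve through the `https://doi.org/` proxy over HTTPS, and DOIs registered with DataCite also support content negotiation, where an `Accept` header requests a metadata format instead of the HTML landing page. The sketch below builds such a request with Python's standard library but does not send it; the DOI is a placeholder, and the `application/vnd.datacite.datacite+json` media type is taken from DataCite's content negotiation documentation as we understand it, so confirm it there before relying on it.

```python
from urllib.parse import quote
from urllib.request import Request

DOI_RESOLVER = "https://doi.org/"


def landing_page_url(doi: str) -> str:
    """HTTPS URL that a DOI resolves through (A1: open, standard protocol)."""
    # Percent-encode characters that are unsafe in a URL path, keeping "/".
    return DOI_RESOLVER + quote(doi, safe="/")


def metadata_request(doi: str) -> Request:
    """Build, but do not send, a content-negotiation request for metadata."""
    return Request(
        landing_page_url(doi),
        headers={"Accept": "application/vnd.datacite.datacite+json"},
    )


req = metadata_request("10.5555/example-sst-v1")  # placeholder DOI
print(req.full_url)              # https://doi.org/10.5555/example-sst-v1
print(req.get_header("Accept"))  # application/vnd.datacite.datacite+json
```

Passing the request to `urllib.request.urlopen` would perform the lookup. For a restricted dataset the metadata should still come back this way, because only the data itself sits behind authentication (A1.2, A2).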

I — Interoperable: What It Means in Practice

Interoperability is the most technically demanding FAIR principle for climate researchers. It requires:

  • I1: Data and metadata use a formal, accessible, shared, and broadly applicable language for knowledge representation. For climate data, this typically means using standard formats (NetCDF, CSV, Parquet) and controlled vocabularies (the CF standard name table, GCMD keywords).

  • I2: Data and metadata use vocabularies that follow FAIR principles themselves. Variable names should map to a controlled vocabulary — 'sea_surface_temperature' rather than 'sst_val_col3'.

  • I3: Data and metadata include qualified references to other data — for example, linking a derived dataset to the source datasets it was computed from, using their DOIs.

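For climate science specifically, the CF (Climate and Forecast) Conventions are the primary interoperability standard for NetCDF data, gridded data in particular: they define standard variable names, units, and coordinate metadata. ACDD (Attribute Convention for Data Discovery) complements CF by specifying global attributes for discovery, such as title, summary, keywords, and spatial and temporal coverage. GCMD (Global Change Master Directory) keywords are a controlled vocabulary for describing a dataset's science topics and measured variables in discovery metadata, rather than for naming the variables in the file itself.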
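To make I1 and I2 concrete, the sketch below maps a lab's ad hoc column names to CF standard names and attaches the `standard_name` and `units` attributes CF defines for variables. The local names and the `LOCAL_TO_CF` mapping are hypothetical, and the four-entry lookup is a hand-typed excerpt for illustration; the CF Standard Name Table is the authoritative source. CF accepts any units convertible to a name's canonical units, so always using the canonical ones here is a simplification.

```python
# Hand-typed excerpt of CF standard names and their canonical units.
# Check names against the CF Standard Name Table before using them.
CF_CANONICAL_UNITS = {
    "sea_surface_temperature": "K",
    "air_temperature": "K",
    "surface_air_pressure": "Pa",
    "wind_speed": "m s-1",
}

# Hypothetical mapping from a lab's ad hoc column names to CF standard names.
LOCAL_TO_CF = {
    "sst_val_col3": "sea_surface_temperature",
    "t2m": "air_temperature",
    "psfc": "surface_air_pressure",
}


def cf_variable_attributes(columns):
    """Return CF-style attributes keyed by standard name, plus unmapped columns."""
    attributes, unmapped = {}, []
    for column in columns:
        standard_name = LOCAL_TO_CF.get(column)
        if standard_name is None:
            unmapped.append(column)  # needs a human decision, not a guess
            continue
        attributes[standard_name] = {
            "standard_name": standard_name,
            "units": CF_CANONICAL_UNITS[standard_name],
        }
    return attributes, unmapped


attrs, todo = cf_variable_attributes(["sst_val_col3", "t2m", "qc_flag"])
print(attrs["sea_surface_temperature"])  # {'standard_name': 'sea_surface_temperature', 'units': 'K'}
print(todo)                              # ['qc_flag']
```

Unmapped names are returned rather than guessed, because choosing a standard name is a scientific judgement about what was actually measured.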

In practice: use standard file formats, name your variables consistently with established vocabularies, and link to source datasets where yours is derived.
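For I3, repositories that register DOIs through DataCite can record links to source datasets as related identifiers. The sketch below builds that fragment in the shape used by DataCite's JSON metadata (`relatedIdentifiers` entries with `relatedIdentifierType` and `relationType`, here `IsDerivedFrom`), to the best of our understanding of that schema. The helper name and source DOIs are hypothetical placeholders, and how a given repository, Panthaion included, collects this information may differ.

```python
import json


def derived_from(source_dois):
    """Build DataCite-style relatedIdentifiers linking to source datasets (I3)."""
    seen = set()
    related = []
    for doi in source_dois:
        key = doi.strip().lower()  # DOIs are case-insensitive
        if key in seen:
            continue  # avoid listing the same source twice
        seen.add(key)
        related.append({
            "relatedIdentifier": doi.strip(),
            "relatedIdentifierType": "DOI",
            "relationType": "IsDerivedFrom",
        })
    return related


# Placeholder DOIs for two hypothetical source datasets; the third is a duplicate.
sources = ["10.5555/buoy-obs-v2", "10.5555/satellite-sst-v4", "10.5555/BUOY-OBS-V2"]
print(json.dumps({"relatedIdentifiers": derived_from(sources)}, indent=2))
```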

R — Reusable: What It Means in Practice

Reusability closes the loop. A dataset that is findable, accessible, and interoperable is still not fully FAIR if another researcher cannot legally or practically reuse it:

  • R1: Data and metadata are richly described with a plurality of accurate and relevant attributes — enough for a researcher who was not involved in data collection to understand and use the data correctly.

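  • R1.1: Data and metadata are released with a clear and accessible data usage license. CC BY 4.0 is a widely used default for open research data: it permits any use, including commercial use, provided the creators are credited. Some repositories use CC0 1.0 instead, which waives even the attribution requirement.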

  • R1.2: Data and metadata are associated with detailed provenance — how the data was collected, processed, and by whom.

  • R1.3: Data and metadata meet domain-relevant community standards. For climate data, this means conforming to conventions like CF, following relevant ISO standards (such as ISO 19115 for geographic metadata), and including the metadata fields expected by your domain's data infrastructure.
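R1.1 and R1.2 lend themselves to simple automated checks. The sketch below, using only the standard library, flags a license that is not one of two SPDX identifiers (`CC-BY-4.0`, `CC0-1.0`) and any provenance step without a responsible party. The class and field names are hypothetical, not a repository schema, and a dataset under another license may be perfectly reusable; the allow-list only mirrors the open-data case discussed above.

```python
from dataclasses import dataclass, field
from datetime import date

# SPDX identifiers for two licenses widely used for open research data.
OPEN_DATA_LICENSES = {"CC-BY-4.0", "CC0-1.0"}


@dataclass
class ProcessingStep:
    """One entry in a dataset's provenance trail (R1.2): what, who, and when."""
    description: str
    performed_by: str  # a person or organisation; an ORCID iD beats a bare name
    performed_on: date


@dataclass
class ReuseMetadata:
    """Hypothetical container for the fields R1.1 and R1.2 care about."""
    license_id: str
    provenance: list[ProcessingStep] = field(default_factory=list)

    def problems(self) -> list[str]:
        """Return human-readable gaps that would hinder reuse by someone else."""
        issues = []
        if self.license_id not in OPEN_DATA_LICENSES:
            issues.append(f"license {self.license_id!r} is not in this check's open-license list")
        if not self.provenance:
            issues.append("no provenance steps recorded")
        for step in self.provenance:
            if not step.performed_by:
                issues.append(f"step {step.description!r} has no responsible party")
        return issues


meta = ReuseMetadata(
    license_id="CC-BY-4.0",
    provenance=[
        ProcessingStep("Downloaded raw buoy files", "A. Researcher", date(2025, 3, 1)),
        ProcessingStep("Removed spikes above 5 sigma", "", date(2025, 3, 4)),
    ],
)
print(meta.problems())
# ["step 'Removed spikes above 5 sigma' has no responsible party"]
```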

Why FAIR Matters for Climate Science Specifically

Climate science has a multi-decadal data problem. Long time-series are fundamental to understanding climate trends — but only if the datasets from 1980, 1995, and 2020 can actually be combined and compared. FAIR principles address this directly:

  • Interoperable data uses consistent variable names and controlled vocabularies, making it possible to integrate data from different sources and eras with far less manual harmonisation.

  • Reusable data includes provenance records that allow future researchers to understand methodological changes over time — critical when comparing data collected under different protocols.

  • Findable and accessible data means that the measurements collected today will still be discoverable and retrievable in 2050, when they may be the crucial baseline for a new generation of climate researchers.

The IPCC assessment reports depend on decades of climate data sharing that, imperfectly but increasingly, follows the practices the FAIR principles now formalise. Every step toward better FAIR compliance makes the next assessment report more comprehensive and more reliable.

A Practical FAIR Checklist for Climate Datasets

  • Findable: Register a DOI. Write rich metadata including spatial/temporal scope, methodology, variables, and keywords. Upload to an indexed repository.

  • Accessible: Ensure the DOI resolves to a landing page with a direct download link. If the dataset is restricted, document the access conditions clearly in the metadata.

  • Interoperable: Use standard formats (NetCDF, CSV, Parquet). Name variables using CF Conventions or another domain vocabulary. Include DOIs for source datasets in your metadata.

  • Reusable: Assign a license (CC BY 4.0 for open data). Document methodology and provenance fully. Include creator ORCIDs. Follow domain metadata standards.
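The checklist asks for creator ORCIDs. An ORCID iD's final character is a check digit computed with ISO 7064 MOD 11-2, so a mistyped iD can usually be caught offline before submission. The sketch below implements that check; it confirms only that an iD is well formed, not that it is registered or belongs to the named creator. The first test value is a sample iD that appears in ORCID's documentation.

```python
import re

# Four groups of four characters; only the final check character may be "X".
ORCID_PATTERN = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")
ORCID_PREFIX = "https://orcid.org/"


def orcid_check_digit(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an ORCID iD."""
    total = 0
    for digit in base_digits:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """True if the iD has the ORCID layout and a correct check digit."""
    orcid = orcid.strip()
    if orcid.startswith(ORCID_PREFIX):
        orcid = orcid[len(ORCID_PREFIX):]
    if not ORCID_PATTERN.match(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_digit(digits[:15]) == digits[15]


print(is_valid_orcid("0000-0002-1825-0097"))                    # True
print(is_valid_orcid("0000-0002-1825-0098"))                    # False: wrong check digit
print(is_valid_orcid("https://orcid.org/0000-0002-1694-233X"))  # True
```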

Panthaion's QA scoring system evaluates datasets against many of these criteria automatically, giving you a rapid signal of how FAIR-compliant your data is before it reaches reviewers or the wider research community.

Explore and contribute climate datasets at panthaion.org.