What Makes a High QA Score

April 10, 2026

Not all scientific datasets are created equal, and a QA score tells you exactly where a dataset stands. On Panthaion, scores reflect five core dimensions: completeness, consistency, accuracy, metadata quality, and format. A dataset can pass every automated check and still fall short if the documentation is thin or the methodology doesn't hold up to expert review. This post breaks down what each dimension measures and the practical steps that separate a high-scoring dataset from an average one.

If you've browsed datasets on Panthaion, you've noticed that each one carries a QA score, a number between 0 and 100 that appears alongside the dataset title. Some datasets score 92 or 93. Others come in at 63 or 65. A handful haven't been scored yet at all.

What does that number actually mean? How is it calculated? And, if you're a researcher preparing to publish your own data, what do you need to do to achieve a high score?

This post explains the mechanics behind scientific dataset QA scoring, the specific checks that matter most, and the practical steps that separate an excellent dataset from an average one.

A QA score is a standardised, reproducible measure of a dataset's fitness for scientific use. It is not a measure of how interesting the data is, or how significant the findings are. It measures whether the data itself is complete, accurate, consistent, and well-documented enough to be reliably used by other researchers.

The Score Bands

On Panthaion, QA scores fall into four bands:

90–100 (Excellent): Passes all major checks with minimal issues. Suitable for direct use in peer-reviewed research.

75–89 (Good): Solid dataset with minor gaps or documentation issues. Usable with awareness of limitations.

60–74 (Fair): Notable issues in one or more dimensions. Requires careful assessment before use.

Below 60 (Poor): Significant quality problems. Not recommended for primary use without substantial remediation.

The Five Pillars of a High QA Score

QA scoring evaluates datasets across five dimensions. Each contributes to the overall score, and weakness in any one dimension drags the total down significantly.

1. Completeness

Completeness is often the largest single contributor to a score. Automated checks measure:

What percentage of expected rows are present (based on the declared time range and frequency)

The null/missing value rate per column: a column that is 40% null is a serious completeness problem

Whether all declared variables are actually present in the file

To maximise your completeness score: fill gaps where possible using documented interpolation methods, clearly flag any remaining gaps with a consistent null representation (e.g. -9999 or NaN, never blank), and document known gaps in the methodology.
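The row-coverage and null-rate checks above can be sketched in a few lines of Python. This is an illustrative sketch, not Panthaion's actual implementation; the record layout and the -9999 sentinel are assumptions.

```python
from datetime import datetime, timedelta

SENTINEL = -9999.0  # assumed null representation; match whatever your dataset declares

def completeness_report(rows, start, end, step=timedelta(hours=1)):
    """Report row coverage against the declared time range and frequency,
    plus the null rate per column (timestamp excluded)."""
    expected = int((end - start) / step) + 1  # expected number of records
    report = {"row_coverage": len(rows) / expected, "null_rates": {}}
    for col in rows[0]:
        if col == "timestamp":
            continue
        missing = sum(1 for r in rows if r[col] == SENTINEL)
        report["null_rates"][col] = missing / len(rows)
    return report
```

A column whose null rate approaches 0.4 in this report is exactly the kind of completeness problem worth fixing, or at least documenting, before upload.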

2. Consistency

Consistency checks look for internal coherence. Automated tests include:

Unit consistency — are the same units used throughout every column that shares a variable type?

Null encoding consistency — is the same sentinel value used for all missing data, or do some rows use blanks while others use -999?

Temporal consistency — are timestamps monotonically increasing? Are there duplicate timestamps?

Cross-variable plausibility — do related variables agree? (e.g. dew point should not exceed air temperature)

To maximise your consistency score: standardise all units before upload, use a single consistent null encoding throughout, and run basic cross-variable logic checks on your data before submission.
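The temporal and cross-variable checks can be run locally before submission. A minimal sketch follows; the field names (temperature, dew_point) are illustrative assumptions, not a required schema.

```python
def consistency_issues(records):
    """Flag common internal-consistency problems before upload.

    `records` is a list of dicts with at least timestamp, temperature
    and dew_point fields (illustrative names only).
    """
    issues = []
    timestamps = [r["timestamp"] for r in records]
    if any(b <= a for a, b in zip(timestamps, timestamps[1:])):
        issues.append("timestamps are not strictly increasing")
    if len(set(timestamps)) != len(timestamps):
        issues.append("duplicate timestamps present")
    for i, r in enumerate(records):
        # Cross-variable plausibility: dew point cannot exceed air temperature
        if r["dew_point"] > r["temperature"]:
            issues.append(f"row {i}: dew point exceeds air temperature")
    return issues
```

An empty list back from a check like this is a good sign; any issue it reports would also be caught by automated scoring, so it is cheaper to catch it yourself first.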

3. Accuracy Signals

Accuracy is harder to assess automatically than completeness or consistency, but several proxy checks are applied:

Physical plausibility bounds — values are checked against the known physical range for each variable type (e.g. temperatures below absolute zero, precipitation above recorded maxima)

Statistical outlier detection — values more than a defined number of standard deviations from the local mean are flagged for review 

Comparison against reference climatologies where applicable

To maximise your accuracy score: run your own outlier detection before upload, document your instrument calibration procedures, and include uncertainty estimates where available. Expert reviewers will also assess whether your stated methodology is consistent with the data you've provided.
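Running your own outlier detection can combine both proxy checks described above: a hard physical-plausibility bound and a statistical distance from the mean. The sketch below uses assumed bounds and a 4-sigma threshold; tune both to your variable and instrument.

```python
from statistics import mean, stdev

# Assumed plausibility bounds per variable -- adjust to your domain
PLAUSIBLE_BOUNDS = {"temperature_c": (-90.0, 60.0)}

def flag_suspect_values(values, variable, z_threshold=4.0):
    """Return indices that violate physical bounds or sit far from the mean."""
    lo, hi = PLAUSIBLE_BOUNDS[variable]
    flagged = {i for i, v in enumerate(values) if not lo <= v <= hi}
    m, s = mean(values), stdev(values)
    if s > 0:
        flagged |= {i for i, v in enumerate(values)
                    if abs(v - m) / s > z_threshold}
    return sorted(flagged)
```

Flagged values should be reviewed rather than silently dropped: a genuine extreme event and a sensor fault can look identical to a statistical test, which is why the methodology documentation matters.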

4. Metadata Quality

A technically perfect dataset with poor metadata will still score below 90. Metadata checks assess:

Title — is it specific and descriptive?

Description — does it explain what the data contains, its geographic scope, time range, and collection methodology?

Creator — are individual contributors or the responsible organisation named, with ORCID where available?

License — is a clear data license specified?

DOI — is a persistent identifier present?

Tags/keywords — are relevant domain-specific keywords included?

Metadata is the part of QA that is entirely within your control before you upload. A dataset that scores 72 due to metadata gaps can often reach 85+ with an hour of documentation work.
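Because metadata is entirely in your hands, a pre-upload audit can be as simple as checking the fields listed above. The field names here are illustrative, not Panthaion's actual schema.

```python
# Assumed required fields, mirroring the checks described above
REQUIRED_METADATA = ("title", "description", "creator", "license", "doi", "keywords")

def metadata_gaps(meta):
    """Return the required metadata fields that are absent or empty."""
    return [field for field in REQUIRED_METADATA if not meta.get(field)]
```

A result like `["doi"]` tells you exactly which hour of documentation work stands between a 72 and an 85.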

5. Format and Accessibility

The final dimension covers whether the dataset is in a format that enables reuse:

File format — standard formats (CSV, Parquet, NetCDF, GeoJSON) score better than proprietary or binary formats

Column naming — are variable names human-readable and unambiguous?

Schema documentation — is there a data dictionary or README that defines every field?

File size and structure — is the data structured for efficient querying, or is it a single monolithic file that cannot be partially loaded?
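The schema-documentation check above is also easy to automate: verify that every column in the file has an entry in your data dictionary. The dictionary format here (a simple name-to-definition mapping) is an assumption.

```python
def undocumented_columns(columns, data_dictionary):
    """Return column names that have no entry in the data dictionary."""
    return [c for c in columns if c not in data_dictionary]
```

Any name this returns needs either a dictionary entry or a rename before upload.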

Why Expert Review Matters on Top of Automated Scoring

Automated QA scoring is fast and reproducible, but it has limits. A dataset can pass all automated checks and still be scientifically problematic: for example, if the stated methodology doesn't match the data structure, or if the geographic coordinates are implausible for the stated study region.

On Panthaion, datasets whose automated score passes a threshold proceed to expert review: a domain scientist evaluates the scientific integrity of the methodology, the plausibility of the data, and the adequacy of the documentation. This is the difference between a QA score and a quality-assured dataset.

For researchers building on Panthaion data: datasets with both a high automated score and a completed expert review are the gold standard. For researchers publishing to Panthaion: treat the expert review as an opportunity to strengthen your dataset, not just a gate to pass through.

A Pre-Upload Checklist for a High QA Score

Run a null value audit: fix what you can, document what you can't

Standardise units across all columns

Use a single, consistent null encoding (and document which one)

Check timestamps for duplicates and monotonicity

Run basic outlier detection against physical plausibility bounds

Write a complete description covering scope, methodology, and known limitations

Include creator names and ORCIDs

Assign a license (e.g. CC BY 4.0 for open access)

Register a DOI before or during upload

Provide a data dictionary for all field names

Explore verified climate datasets at panthaion.org.