
Data Provenance

Every AtlasCore output is traceable to its government source. This is a core design principle, not an optional feature, and it is driven by the regulatory requirements of AASB S2 reporting and ISSA 5000 assurance.

Why provenance matters

Sustainability consultants and auditors need to answer: Where did this number come from?

AtlasCore answers this at multiple levels:

  1. Per-value — every emission factor includes source table, sheet, row number, and data quality grade
  2. Per-response — every API response includes an evidence_hash for integrity verification
  3. Per-report — every disclosure bundle includes a full five-file evidence trail

Five-file evidence bundles

Every report generates five files regardless of the requested output format:

| File | Purpose |
| --- | --- |
| report.json | Structured disclosure data |
| report.md | Human-readable Markdown report |
| provenance.json | Data lineage: which sources, snapshots, and versions were used |
| checksums.json | SHA-256 hashes of all bundle files |
| report.pdf / report.xlsx | Formatted output for distribution |

This means even if you only requested a PDF, the full provenance chain is preserved alongside it.
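
A bundle can be checked offline by recomputing each file's SHA-256 digest and comparing it with the recorded value. The sketch below assumes that checksums.json is a flat object mapping file names to hex digests; the actual layout is not specified here, so treat `verify_bundle` and `sha256_of` as hypothetical helpers rather than part of AtlasCore.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_bundle(bundle_dir: Path) -> dict[str, bool]:
    """Compare each file's digest with the value recorded in checksums.json.

    ASSUMPTION: checksums.json is a flat {"file name": "hex digest"} object.
    """
    recorded = json.loads((bundle_dir / "checksums.json").read_text())
    results = {}
    for name, expected in recorded.items():
        path = bundle_dir / name
        # A missing file counts as a failed check rather than raising.
        results[name] = path.is_file() and sha256_of(path) == expected
    return results
```

Any entry that maps to `False` points to a file that is missing or has changed since the bundle was generated.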

Evidence hashing

Every API response that returns data includes:

  • evidence_hash in the response body — SHA-256 hash of the canonical inputs
  • X-AtlasCore-Evidence-Hash response header — same hash for programmatic access

If you call the same endpoint with the same inputs and the underlying data hasn't changed, you get the same evidence_hash. This provides deterministic verification: auditors can confirm that the data they reviewed matches the data in the disclosure.
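
As an illustration of why identical inputs yield an identical hash, the sketch below hashes a canonical serialization of the inputs and checks that the body field and the header agree. The canonical form used here (JSON with sorted keys and compact separators) is an assumption, as are the helper names and the example input keys; AtlasCore's actual canonicalization is not described on this page.

```python
import hashlib
import json

HEADER = "X-AtlasCore-Evidence-Hash"  # header name as documented above


def canonical_hash(inputs: dict) -> str:
    """Hash a canonical serialization of the inputs.

    ASSUMPTION: canonical form = JSON with sorted keys and compact separators.
    """
    canonical = json.dumps(inputs, sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def hashes_agree(body: dict, headers: dict) -> bool:
    """True when the body field and the response header carry the same hash."""
    # HTTP header names are case-insensitive, so normalize before the lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    header_value = lowered.get(HEADER.lower())
    return header_value is not None and header_value == body.get("evidence_hash")


# Key order does not affect the result (input keys here are hypothetical).
first = canonical_hash({"state": "NSW", "year": 2024})
second = canonical_hash({"year": 2024, "state": "NSW"})
print(first == second)
```

An auditor can store the hash seen at review time and later compare it, as a plain string, with the hash attached to the published disclosure.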

Version stability

AtlasCore maintains version stability through:

  • Snapshot versioning — every data extraction creates a numbered snapshot. The latest policy always uses the most recent successful snapshot.
  • Factor set editions — NGA factors are versioned by edition (e.g. au_nga_2024 for the 2023-24 workbook). Superseded editions are preserved, not deleted.
  • Amendment tracking — when a factor set is re-ingested (correction or restatement), the amendment is recorded with type and diff counts.
  • Deterministic outputs — same persisted inputs always produce the same outputs. No live API calls at query time.
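
The snapshot rule above can be sketched as a pure function over persisted records, which is also what makes outputs repeatable. The `Snapshot` type, its fields, and `resolve_latest` are hypothetical names chosen for illustration, not AtlasCore's internal model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Snapshot:
    number: int      # hypothetical field: sequential snapshot number
    succeeded: bool  # hypothetical field: whether the extraction completed


def resolve_latest(snapshots: list[Snapshot]) -> Optional[Snapshot]:
    """Return the highest-numbered successful snapshot, or None if there is none."""
    good = [s for s in snapshots if s.succeeded]
    return max(good, key=lambda s: s.number, default=None)


# Snapshot 3 failed, so resolution falls back to snapshot 2.
history = [Snapshot(1, True), Snapshot(2, True), Snapshot(3, False)]
print(resolve_latest(history))
```

Because the function reads only stored records and makes no network calls, repeating the query against the same history returns the same snapshot.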

Provenance in API responses

Emission factor responses include structured provenance:

{
  "source_document_title": "Australian National Greenhouse Accounts Factors",
  "source_document_url": "https://www.dcceew.gov.au/...",
  "source_table": "Table 1",
  "source_sheet": "Table 1",
  "source_row_number": 4,
  "data_quality_grade": "A",
  "evidence_hash": "sha256:..."
}
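
A client might load this object into a typed record so that a missing field fails loudly instead of silently dropping lineage. The field names below are taken from the example above; the `Provenance` class, `parse_provenance`, and the citation format are illustrative assumptions, and the sketch assumes the object contains exactly these seven keys.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Provenance:
    source_document_title: str
    source_document_url: str
    source_table: str
    source_sheet: str
    source_row_number: int
    data_quality_grade: str
    evidence_hash: str

    def citation(self) -> str:
        """One-line, human-readable pointer to the source row."""
        return (
            f"{self.source_document_title}, {self.source_table} "
            f"(sheet '{self.source_sheet}', row {self.source_row_number}), "
            f"grade {self.data_quality_grade}"
        )


def parse_provenance(raw: str) -> Provenance:
    """Build a Provenance from a JSON string.

    Raises TypeError if a key is missing or an unexpected key is present.
    """
    return Provenance(**json.loads(raw))
```

Calling `citation()` on the parsed record gives a short string suitable for a footnote in working papers.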

Company climate profiles include data_as_of timestamps and evidence hashes that trace back to the specific emission, grid intensity, and climate data snapshots used.