Layer 1 module
Citation verification
For each cited reference, we resolve the DOI, fetch the cited paper's metadata from Semantic Scholar and OpenCitations, and surface broken DOIs, retracted citations, and the self-citation rate on the report.
methodology only
aggregate stats forthcoming
sources: S2 + OpenCitations
Methodology
The module parses the reference list out of the manuscript, attempts to resolve each entry to a DOI, and then queries two complementary sources:
- Semantic Scholar — paper-level metadata, citation count, fields of study, and forward / backward citation graph.
- OpenCitations — independent open citation index used to cross-check the S2 view and to fill gaps where S2 has no record.
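A minimal sketch of the DOI-extraction step, assuming each reference arrives as a raw string and that a DOI, when present, appears verbatim in it. The regex is modeled on the pattern Crossref recommends for modern DOIs and will miss some older ones. `extract_doi` is a hypothetical name, not the module's actual parser, and the Semantic Scholar and OpenCitations lookups that follow extraction are not shown.

```python
import re
from typing import Optional

# Approximate DOI pattern: "10." + a 4-9 digit registrant code + "/" + suffix.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[-._;()/:a-z0-9]+", re.IGNORECASE)


def extract_doi(reference: str) -> Optional[str]:
    """Return a normalized DOI found in a raw reference string, or None.

    None means no DOI could be extracted, so the entry has nothing to look up
    at either source.
    """
    match = DOI_PATTERN.search(reference)
    if match is None:
        return None
    # Sentence punctuation often trails a DOI in a citation string.
    doi = match.group(0).rstrip(".,;:")
    # Drop a trailing ")" only when it closes a parenthesis opened outside the
    # DOI, e.g. "(doi:10.1000/xyz123)".
    while doi.endswith(")") and doi.count(")") > doi.count("("):
        doi = doi[:-1].rstrip(".,;:")
    # DOIs are case-insensitive; lowercase them for stable lookups and dedup.
    return doi.lower()
```

Lowercasing at extraction time means the same work cited with different capitalization maps to one lookup key.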
From the resolved view, the module produces three signals on the report:
- Broken DOIs — references whose extracted DOI resolves at neither source. The cause may be a typo in the DOI, a withdrawn paper, or a mis-extracted reference.
- Retracted citations — references that resolve to papers flagged as retracted (Retraction Watch / Crossref retraction metadata). Citing a retracted paper is not automatically wrong, but it deserves a flag.
- Self-citation rate — fraction of references with at least one author in common with the manuscript's author list. Shown as a figure on the report; not used as a grade input.
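The three signals can be sketched as follows, assuming each reference has already been looked up and carries the set of sources that returned a record for it. `Reference`, `status` and `citation_signals` are hypothetical names. Author matching is reduced to exact comparison of pre-normalized names, which real author disambiguation would not permit, and the self-citation denominator is assumed to be all parsed references.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Reference:
    # Hypothetical per-reference record; field names are illustrative.
    doi: Optional[str]  # None when no DOI could be extracted from the entry
    found_in: Set[str] = field(default_factory=set)  # e.g. {"s2", "oc"}
    retracted: bool = False  # from retraction metadata once resolved
    authors: Set[str] = field(default_factory=set)  # pre-normalized names


def status(ref: Reference) -> str:
    if ref.doi is None:
        return "unresolved"  # parsing failed, so there was nothing to look up
    if not ref.found_in:
        return "broken"  # a DOI was extracted, but neither source knows it
    return "resolved"


def citation_signals(refs: List[Reference], manuscript_authors: Set[str]) -> dict:
    statuses = [status(r) for r in refs]
    # A reference counts as a self-citation if it shares at least one author
    # with the manuscript.
    self_cited = sum(1 for r in refs if r.authors & manuscript_authors)
    return {
        "broken": statuses.count("broken"),
        "unresolved": statuses.count("unresolved"),
        # Retraction status is only known for references that resolved.
        "retracted": sum(
            1 for r, s in zip(refs, statuses) if s == "resolved" and r.retracted
        ),
        # Descriptive figure only; not a grade input.
        "self_citation_rate": self_cited / len(refs) if refs else 0.0,
    }
```

Keeping "unresolved" and "broken" as separate statuses mirrors the distinction the report draws between a parsing failure and a DOI that no source recognizes.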
Results
We do not yet publish aggregate statistics here. Per-paper citation-verification output is visible on every assessment that ran the module. The aggregate numbers this page still owes:
- Median resolution rate for the reference list (share of references resolvable to a DOI).
- Distribution of self-citation rates across the production corpus, by field.
- Total retracted citations flagged; share of papers with at least one such flag.
Why we have not published numbers yet. Aggregating these honestly requires deduplicating the repeat runs produced by the daily backfill cron (which is still re-running citation verification for older assessments) and stratifying by field. We will publish once the backfill stabilises.
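A sketch of how the three aggregates could be computed once the backfill settles, under stated assumptions: each verification run is stored as a hypothetical `Run` record, backfill re-runs are deduplicated by keeping only the latest run per assessment, and timestamps are ISO-8601 strings in one time zone so they compare correctly as strings. This does not reflect the production schema and produces no numbers of its own.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import median
from typing import Dict, List


@dataclass
class Run:
    # Hypothetical record of one citation-verification run on one assessment.
    assessment_id: str
    run_at: str  # ISO-8601 timestamp, same time zone for every run
    field_of_study: str
    n_references: int
    n_resolved: int  # references resolvable to a DOI
    n_retracted: int
    self_citation_rate: float


def latest_runs(runs: List[Run]) -> List[Run]:
    """Keep the most recent run per assessment so backfill re-runs count once."""
    latest: Dict[str, Run] = {}
    for run in runs:
        current = latest.get(run.assessment_id)
        if current is None or run.run_at > current.run_at:
            latest[run.assessment_id] = run
    return list(latest.values())


def aggregate(runs: List[Run]) -> dict:
    papers = latest_runs(runs)
    resolution = [p.n_resolved / p.n_references for p in papers if p.n_references]
    by_field: Dict[str, List[float]] = defaultdict(list)
    for p in papers:
        by_field[p.field_of_study].append(p.self_citation_rate)
    flagged = sum(1 for p in papers if p.n_retracted > 0)
    return {
        "median_resolution_rate": median(resolution) if resolution else None,
        "total_retracted_flags": sum(p.n_retracted for p in papers),
        "share_with_retracted": flagged / len(papers) if papers else None,
        # Full per-field distribution rather than a single summary statistic.
        "self_citation_by_field": {f: sorted(v) for f, v in by_field.items()},
    }
```

Deduplicating before aggregating matters because an assessment that the backfill has re-run would otherwise contribute to every statistic twice.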
Caveats — what this doesn't measure
- The module flags citations to retracted papers. It does not judge whether a given citation is appropriate (e.g. citing a retracted paper to discuss its retraction is fine).
- Reference parsing is imperfect. Mangled references from which no DOI can be extracted are marked "unresolved", not "broken". The report keeps the two apart, but this remains a noisy signal.
- Self-citation rate is a descriptor, not a grade input. We report it because it is useful context, not because high values are automatically a problem.
- OpenCitations and Semantic Scholar disagree on roughly 1–3% of citations. We mark such cases as "single-source" rather than fabricate a tiebreak.
- Citations to non-DOI sources (preprint URLs, books, datasets, software releases) are out of scope for retraction checking.
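One way to mark disagreements without a tiebreak is sketched here, assuming each source's view is reduced to a set of (citing DOI, cited DOI) pairs with lowercased DOIs. `label_edges` and `disagreement_rate` are hypothetical helpers, and treating a disagreement as an edge present in only one index is an assumption about what the 1–3% figure counts.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]  # (citing DOI, cited DOI), both lowercased


def label_edges(s2_edges: Set[Edge], oc_edges: Set[Edge]) -> Dict[Edge, str]:
    """Label each citation edge by whether both indexes report it.

    Edges reported by only one source are marked "single-source"; no attempt
    is made to decide which source is right.
    """
    return {
        edge: "both" if (edge in s2_edges and edge in oc_edges) else "single-source"
        for edge in s2_edges | oc_edges
    }


def disagreement_rate(labels: Dict[Edge, str]) -> float:
    """Fraction of all observed edges that only one source reports."""
    if not labels:
        return 0.0
    return sum(1 for v in labels.values() if v == "single-source") / len(labels)
```

Labeling rather than resolving keeps the report honest: a reader sees that only one index vouches for the edge, and nothing is silently dropped or promoted.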