Preprints.ai
AI reviewer ensemble

Calibration corpus & review-12b

The corpus holds 9,279 reviews assembled from three sources: eLife, preprints.ai's own self-generated reviews, and PREreview. It is used to train a calibration model that maps the 9-agent panel output to a final grade. The model is in training and not yet wired into the production pipeline.

9,279 reviews | CC-BY 4.0 (PREreview slice) | review-12b: in training

eLife reviews: 6,411 (peer-reviewed, published)
preprints.ai reviews: 1,570 (self-generated, used for consistency only)
PREreview: 1,298 (harvested from Zenodo, CC-BY 4.0)

Pipeline today

The currently shipping production pipeline is fully described on the methodology page. In summary: 9 specialist agents read the paper, an Opus advisor arbitrates borderline cases, and a deterministic lookup converts the panel's evidence-strength and significance labels into the integrity letter and novelty number on the report.

The deterministic lookup was originally calibrated against 800 historical eLife peer reviews used as a gold standard.
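
As a rough illustration of what "deterministic lookup" means here, the sketch below maps a pair of panel labels to a grade through fixed tables, with no randomness and no learned weights. The label vocabularies, the letter and number values, the `Grade` type, and the `lookup_grade` function are all hypothetical placeholders rather than the production tables, and the sketch assumes that evidence strength alone sets the integrity letter and significance alone sets the novelty number.

```python
from typing import NamedTuple


class Grade(NamedTuple):
    integrity: str  # letter shown on the report
    novelty: int    # number shown on the report


# Placeholder vocabularies and values, NOT taken from the production lookup.
INTEGRITY_BY_EVIDENCE = {"weak": "C", "moderate": "B", "strong": "A"}
NOVELTY_BY_SIGNIFICANCE = {"low": 1, "medium": 2, "high": 3}


def lookup_grade(evidence_strength: str, significance: str) -> Grade:
    """Map a pair of panel labels to a grade using fixed tables only."""
    try:
        return Grade(
            integrity=INTEGRITY_BY_EVIDENCE[evidence_strength],
            novelty=NOVELTY_BY_SIGNIFICANCE[significance],
        )
    except KeyError as exc:
        # Fail loudly on labels outside the known vocabulary.
        raise ValueError(f"unknown panel label: {exc.args[0]!r}") from exc


grade = lookup_grade("strong", "low")
print(grade.integrity, grade.novelty)
```

Because the mapping is a pure table lookup, the same pair of labels always yields the same grade, which is the property a learned calibration layer would have to be compared against.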

What review-12b is

review-12b is an in-training calibration model whose only job will be to map the structured output of the 9-agent panel to the same grade scale, learned end-to-end against the 9,279-review corpus rather than via the hand-tuned lookup. It is a shape-matching layer, not a replacement reviewer.

Status

review-12b is not yet integrated into the live grade pipeline. Until it is, every grade you see on a report is produced by the deterministic lookup described on the methodology page. We will publish the diff metrics (agreement, calibration error, and grade-shift histogram) on this page once the model is serving real traffic.

Why we have not shipped it yet. A calibration model that disagrees with the deterministic lookup must justify itself before it ships. The disagreement set has not yet been audited end-to-end by a human reviewer. We would rather miss a launch date than ship a model whose disagreements we cannot explain.
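
To make two of the diff metrics concrete, here is a minimal sketch of how agreement and a grade-shift histogram could be computed from paired grades, along with the disagreement set a human reviewer would audit. The `GRADE_ORDER` scale, the `diff_metrics` function, and the sample grades are assumptions made for illustration. Calibration error is left out because it depends on how the model expresses confidence, which this page does not specify.

```python
from collections import Counter

# Hypothetical ordered grade scale, best grade first.
GRADE_ORDER = ["A", "B", "C", "D"]


def diff_metrics(lookup_grades, model_grades):
    """Compare lookup grades with model grades for the same papers.

    Returns (agreement rate, histogram of grade shifts, indices of disagreements).
    A shift of +1 means the model graded one step lower on the scale than the lookup.
    """
    if len(lookup_grades) != len(model_grades):
        raise ValueError("grade lists must have the same length")
    if not lookup_grades:
        raise ValueError("need at least one graded paper")

    rank = {grade: i for i, grade in enumerate(GRADE_ORDER)}
    shifts = [rank[m] - rank[l] for l, m in zip(lookup_grades, model_grades)]

    agreement = sum(s == 0 for s in shifts) / len(shifts)
    histogram = Counter(shifts)
    disagreements = [i for i, s in enumerate(shifts) if s != 0]
    return agreement, histogram, disagreements


# Made-up grades for four papers, purely for illustration.
lookup = ["A", "B", "B", "C"]
model = ["A", "B", "C", "A"]
agreement, histogram, disagreements = diff_metrics(lookup, model)
print(agreement, dict(histogram), disagreements)
```

The list of disagreement indices is the part that matters for the audit described above: each entry is a paper where the two graders differ and the difference still needs an explanation.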

Caveats — what this doesn't measure

Code & attribution

Pipeline orchestration: agents/agentic_review.py. PREreview reviews are reproduced under CC-BY 4.0; eLife public reviews are reproduced under their open licensing.