Multi-agent AI peer review

Quality signal
for preprints.

Nine specialist agents review every preprint independently, then deliberate to produce a consensus grade on evidence strength and significance.

Papers assessed across preprints indexed from bioRxiv and medRxiv.

Machine-generated indicators that assist, but do not replace, expert peer review.

Live pipeline

Auto-refreshes every 30 seconds

Live
Indexed
Assessed
In queue
Full text
Integrity
Letter · A–E
A exceptional · B compelling · C solid · D incomplete · E inadequate
Novelty
Number · 1–5
5 landmark · 4 fundamental · 3 important · 2 valuable · 1 useful
Recent assessments
View all →
Grade Paper Source Assessed

How it works

Four layers. Nine agents. One consensus grade.

01
Layer 1

Deterministic checks

Paper-mill detection, statcheck p-value verification, GRIM tests, data availability, retraction cross-referencing — before any LLM runs.
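The GRIM test mentioned above is simple enough to sketch: it checks whether a reported mean of integer-valued data (e.g. Likert responses) is arithmetically possible given the sample size. The function name and rounding convention below are illustrative, not the pipeline's actual implementation.

```python
def grim_consistent(mean: float, n: int, decimals: int = 2) -> bool:
    """GRIM check: a mean of n integer-valued observations must equal
    some integer total divided by n. Returns True if any candidate
    integer total rounds to the reported mean."""
    target = round(mean, decimals)
    base = int(mean * n)  # nearest candidate integer totals
    return any(round(total / n, decimals) == target for total in (base, base + 1))
```

For example, with n = 10 a reported mean of 3.4 is achievable (total 34), but 3.37 cannot arise from ten integer values, so it would be flagged.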

02
Layer 2

Nine-agent review

Four integrity agents (methodologist, statistician, ethics, validity) plus five domain agents review independently.

03
Layer 3

Deterministic grading

Each agent's evidence_strength and significance labels map to an A–E grade via a rule-based lookup calibrated to eLife reviews.
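A minimal sketch of such a lookup, assuming the evidence-strength vocabulary shown in the grading legend (exceptional through inadequate); the majority-vote aggregation and conservative tie-break here are hypothetical, not the pipeline's published rule.

```python
from collections import Counter

# Assumed label-to-grade lookup, following the A–E legend above.
EVIDENCE_TO_GRADE = {
    "exceptional": "A",
    "compelling": "B",
    "solid": "C",
    "incomplete": "D",
    "inadequate": "E",
}

def consensus_grade(agent_labels: list[str]) -> str:
    """Map each agent's evidence_strength label to a letter, then take
    the modal grade. Ties break toward the later (more conservative)
    letter — an illustrative choice."""
    grades = [EVIDENCE_TO_GRADE[label] for label in agent_labels]
    counts = Counter(grades)
    return max(counts.items(), key=lambda kv: (kv[1], kv[0]))[0]
```

With five agents reporting "compelling" and four "solid", the consensus grade is B; a 4–4 split resolves to the lower letter.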

04
Layer 4

Opus arbitration

Borderline or low-agreement cases are arbitrated by Claude Opus with the full agent panel as context.
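The low-agreement routing condition might look like the following sketch. The 70% agreement threshold is an illustrative assumption, not the pipeline's published parameter.

```python
from collections import Counter

def needs_arbitration(grades: list[str], min_agreement: float = 0.7) -> bool:
    """Route to the arbiter when fewer than `min_agreement` of the agent
    panel agrees on the modal grade (threshold is a placeholder value)."""
    top_count = Counter(grades).most_common(1)[0][1]
    return top_count / len(grades) < min_agreement
```

A unanimous nine-agent panel passes straight through; a 5–4 split (about 56% agreement) would be escalated to arbitration.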