Documentation

How Preprints.ai assesses research quality using a two-layer AI system

Introduction

Preprints.ai provides automated quality signals for academic preprints. With over 10,000 preprints posted weekly across bioRxiv, medRxiv, and arXiv, researchers need help identifying which papers deserve their attention.

Our system combines fast automated checks (detecting paper mills, statistical errors, missing data) with deep multi-agent peer review (5+ specialized AI reviewers analyzing methodology, statistics, reproducibility, and domain-specific standards).

What We Assess

We evaluate methodological integrity and novelty—not whether findings are "true". A high grade means good scientific practices were followed. A low grade means methodological concerns warrant caution.

The A5–E1 Grade System

Every paper receives a two-part grade: Integrity letter (A–E) + Novelty number (1–5)

|   | 5  | 4  | 3  | 2  | 1  |
|---|----|----|----|----|----|
| A | A5 | A4 | A3 | A2 | A1 |
| B | B5 | B4 | B3 | B2 | B1 |
| C | C5 | C4 | C3 | C2 | C1 |
| D | D5 | D4 | D3 | D2 | D1 |
| E | E5 | E4 | E3 | E2 | E1 |

Integrity Grades (A–E)

| Grade | Score     | Meaning                   |
|-------|-----------|---------------------------|
| A     | ≥0.85     | Exemplary methodology     |
| B     | 0.70–0.84 | Solid with minor concerns |
| C     | 0.55–0.69 | Adequate but notable gaps |
| D     | 0.40–0.54 | Significant concerns      |
| E     | <0.40     | Critical issues           |

Novelty Grades (1–5)

| Grade | Score     | Meaning                                |
|-------|-----------|----------------------------------------|
| 5     | ≥0.85     | Highly novel, potentially field-changing |
| 4     | 0.70–0.84 | Novel contribution                     |
| 3     | 0.55–0.69 | Incremental advance                    |
| 2     | 0.40–0.54 | Confirmatory                           |
| 1     | <0.40     | Limited novelty                        |
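The two score tables pin the grade down completely: a paper's letter and number are simple threshold lookups. A minimal sketch (the function names are ours, not part of the product):

```python
def integrity_letter(score: float) -> str:
    # Integrity bands from the A–E table.
    if score >= 0.85:
        return "A"
    if score >= 0.70:
        return "B"
    if score >= 0.55:
        return "C"
    if score >= 0.40:
        return "D"
    return "E"

def novelty_number(score: float) -> int:
    # Novelty bands from the 1–5 table.
    if score >= 0.85:
        return 5
    if score >= 0.70:
        return 4
    if score >= 0.55:
        return 3
    if score >= 0.40:
        return 2
    return 1

def combined_grade(integrity: float, novelty: float) -> str:
    return f"{integrity_letter(integrity)}{novelty_number(novelty)}"

print(combined_grade(0.78, 0.62))  # → B3
```

So an integrity score of 0.78 with a novelty score of 0.62 yields the combined grade B3.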

Assessment Pipeline

Papers flow through a two-layer system combining fast automated checks with deep AI review.

1. Paper Ingestion: extract metadata and full text from bioRxiv/medRxiv
2. Layer 1 (Automated Checks): paper mill detection, statistical verification, trust markers (~2 seconds)
3. Layer 2 (Agentic Peer Review): 5 specialized agents + domain expert review in parallel (~12 seconds)
4. Consensus Synthesis: weighted aggregation, agreement analysis, final grade

Layer 1: Automated Checks

11 automated checks run in parallel before AI review (~2 seconds total).

| Check | Detects | Impact |
|-------|---------|--------|
| Paper Mill Detection | Tortured phrases, SCIgen/Mathgen signatures, LLM artifacts | → E grade cap |
| Statistical Verification | Inconsistent p-values (statcheck recalculation) | −0.15 penalty |
| Fabrication Detection | GRIM test, Benford's law, terminal digit analysis | → E grade cap |
| Trust Markers | ORCID, ethics statement, COI, funding | ±0.05 |
| Open Data (ODDPub) | Data/code availability, accession numbers | +0.02 bonus |
| Sample Size Consistency | N-value mismatches between Methods and Results | −0.05 warning |
| Reference Verification | Retracted papers, citejacked journals | −0.03 to −0.10 |
| Reproducibility Checklist | CONSORT/ARRIVE/MIQE items present | Informs agents |
| Image Forensics (ELIS) | Duplicate images, manipulation signs | −0.15 to −0.20 |
| Language Detection | Machine translation artifacts | Warning flag |
| Adversarial Sanitizer | Prompt injection attempts | Security |
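Among the fabrication checks, the GRIM test is simple enough to sketch: when every data point is a whole number (ages, Likert responses, counts), the reported mean times N must land on an integer. A toy version of the idea (not our production detector):

```python
import math

def grim_consistent(reported_mean: float, n: int, decimals: int = 2) -> bool:
    # GRIM: with integer-valued items, mean * n must be a whole number.
    # Accept the mean if some integer total rounds back to it at the
    # reported precision.
    total = reported_mean * n
    for candidate in (math.floor(total), math.ceil(total)):
        if round(candidate / n, decimals) == round(reported_mean, decimals):
            return True
    return False

print(grim_consistent(3.48, 25))  # True: an integer total of 87 gives 3.48
print(grim_consistent(3.47, 25))  # False: no integer total of 25 items yields 3.47
```

A failed GRIM check means the reported mean is arithmetically impossible for the stated sample size, which is why it feeds the E grade cap rather than a mere penalty.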

Layer 2: Agentic Peer Review

Six specialized AI agents review each paper in parallel.

The 5 Core Agents

  • Methodologist: experimental design, controls, sample sizes
  • Statistician: statistical validity, effect sizes, corrections
  • Domain Expert: field-specific standards (CONSORT, ARRIVE, etc.)
  • Reproducibility: protocol detail, data/code availability
  • Ethics: ethics approval, COI, transparency

Integrity Score Calculation

Weighted consensus of agent assessments plus Layer 1 adjustments:

| Component | Weight |
|-----------|--------|
| Methodologist | 25% |
| Statistician | 25% |
| Reproducibility | 25% |
| Ethics & Transparency | 15% |
| Domain Expert | 10% |

Layer 1 Adjustments
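The Impact column in the Layer 1 checks table supplies these adjustments: penalties and bonuses shift the weighted consensus, and critical failures cap the grade at E. A hedged sketch of how they could compose (the composition order and the clamping are our assumptions, not a documented formula):

```python
def apply_layer1(consensus: float, penalties: list[float],
                 bonuses: list[float], critical_failure: bool) -> float:
    # Shift the weighted consensus by Layer 1 penalties/bonuses,
    # clamp to [0, 1], then cap to the E band on critical failures.
    score = consensus + sum(bonuses) - sum(penalties)
    score = max(0.0, min(1.0, score))
    if critical_failure:           # e.g. paper mill content, image manipulation
        score = min(score, 0.39)   # forces an E letter (<0.40)
    return score

# A 0.80 consensus with a statcheck penalty (−0.15) and an open-data
# bonus (+0.02) lands in the C band:
print(round(apply_layer1(0.80, penalties=[0.15], bonuses=[0.02],
                         critical_failure=False), 2))  # → 0.67
```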

Novelty Score Calculation

Weighted more heavily toward domain expertise:

| Component | Weight |
|-----------|--------|
| Domain Expert | 40% |
| Methodologist | 20% |
| Statistician | 15% |
| Ethics | 15% |
| Reproducibility | 10% |
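Both calculations are plain weighted averages of the per-agent scores. A sketch using the weights from the two tables (the agent scores here are made up for illustration):

```python
INTEGRITY_WEIGHTS = {"methodologist": 0.25, "statistician": 0.25,
                     "reproducibility": 0.25, "ethics": 0.15,
                     "domain_expert": 0.10}
NOVELTY_WEIGHTS = {"domain_expert": 0.40, "methodologist": 0.20,
                   "statistician": 0.15, "ethics": 0.15,
                   "reproducibility": 0.10}

def weighted_consensus(agent_scores: dict[str, float],
                       weights: dict[str, float]) -> float:
    # Weighted average; the weights in each table sum to 1.
    return sum(weights[agent] * agent_scores[agent] for agent in weights)

scores = {"methodologist": 0.80, "statistician": 0.75, "reproducibility": 0.70,
          "ethics": 0.90, "domain_expert": 0.60}   # hypothetical agent outputs
print(f"{weighted_consensus(scores, INTEGRITY_WEIGHTS):.4f}")  # → 0.7575
```

Note how the same agent outputs produce different aggregate scores under the two weightings: the novelty weighting leans on the Domain Expert, the integrity weighting on the methodological trio.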

Consensus & Agreement

| Agreement | Interpretation |
|-----------|----------------|
| ≥85% | High confidence |
| 70–84% | Good agreement |
| 60–69% | Moderate disagreement |
| <60% | Significant disagreement (flagged) |
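The interpretation bands are fixed thresholds, so they reduce to a lookup (agreement expressed as a fraction; the function name is illustrative):

```python
def interpret_agreement(agreement: float) -> str:
    # Bands from the agreement table, with agreement as a fraction.
    if agreement >= 0.85:
        return "High confidence"
    if agreement >= 0.70:
        return "Good agreement"
    if agreement >= 0.60:
        return "Moderate disagreement"
    return "Significant disagreement (flagged)"

print(interpret_agreement(0.78))  # → Good agreement
```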

27 Domain Expert Configurations

Each bioRxiv category has specialized expertise:

| Category | Key Standards |
|----------|---------------|
| Clinical Trials | CONSORT, pre-registration, ITT |
| Neuroscience | ARRIVE, optogenetic controls |
| Genomics | MINSEQE, GEO/SRA deposition |
| Cancer Biology | STR authentication, PDX models |
| Bioinformatics | Benchmarking, code availability |
| Epidemiology | STROBE, DAGs, E-values |

+ 21 more categories

Integrated Tools

Paper Mill Detection (PPS)

Statistical Verification

Reported: t(24) = 2.50, p = 0.02
Recalculated: p = 0.0196
Status: ✓ Consistent
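The recalculation can be reproduced from the reported statistic alone: a t of 2.50 with 24 degrees of freedom determines the two-tailed p. A self-contained sketch that integrates the Student-t density numerically (statcheck itself is an R package; this only shows the underlying arithmetic):

```python
import math

def t_pdf(x: float, df: int) -> float:
    # Student-t probability density function.
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1.0 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t: float, df: int, upper: float = 60.0, steps: int = 20000) -> float:
    # Trapezoid-rule integration of the upper tail, doubled for two tails.
    t = abs(t)
    h = (upper - t) / steps
    area = 0.5 * (t_pdf(t, df) + t_pdf(upper, df))
    for i in range(1, steps):
        area += t_pdf(t + i * h, df)
    return 2.0 * area * h

p = two_tailed_p(2.50, 24)
print(f"{p:.4f}")  # close to the recalculated value above; well below .05
```

A reported p of .02 is consistent with this statistic; a reported p of .01 would be flagged and would trigger the −0.15 penalty.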

Image Forensics (ELIS)

Named after Elisabeth Bik. Detects:

Domain Context (OpenAlex)

We enrich assessments with real literature context:

Validation & Ground Truth

We track our predictions against outcomes to ensure accuracy:

Retraction Monitoring

Papers we grade are monitored via CrossRef and Retraction Watch. We track:

Publication Outcomes

We track where preprints end up:

Calibration Dataset

We maintain a set of papers with known ground truth:

Critical Failures

Automatic E Grade
  • Paper mill content detected
  • Image manipulation
  • GRIM/SPRITE violations
  • Tautological claims

API Reference

Base URL: https://api.preprints.ai/v1

GET /grade/{doi}

GET /v1/grade/10.1101/2024.01.15.123456

{
  "grade": "B3",
  "integrity": { "score": 0.78, "letter": "B" },
  "novelty": { "score": 0.62, "number": 3 },
  "confidence": 0.85,
  "agreement_score": 0.78
}
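Fetching a cached grade is a single unauthenticated GET. A minimal Python client sketch using only the standard library (the helper name is ours):

```python
import json
import urllib.request

BASE_URL = "https://api.preprints.ai/v1"

def fetch_grade(doi: str) -> dict:
    # GET the cached grade for a DOI and return the parsed JSON body.
    url = f"{BASE_URL}/grade/{doi}"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

# Request URL for the example DOI:
print(f"{BASE_URL}/grade/10.1101/2024.01.15.123456")
```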

POST /assess

POST /v1/assess
{ "doi": "10.1101/2024.01.15.123456" }

Response: { "status": "queued" }

Rate Limits

| Endpoint | Limit |
|----------|-------|
| GET /grade/* | 100/minute |
| POST /assess | 10/minute |
| POST /v1/assess (partner) | 60/hour |

Partner API

The Partner API enables external platforms like OpenAccess.ai to submit manuscripts for automated peer review. Partner reviews are stored separately from bioRxiv assessments and include additional provenance auditing for AI-generated research.

Authentication

All partner endpoints require an X-API-Key header with a valid partner key.

Endpoint 1: Submit for Assessment

POST /v1/assess
Content-Type: application/json
X-API-Key: {partner_key}

{
  "manuscript_content": "Full text (markdown/plain/JATS)",
  "metadata": {
    "title": "Paper title",
    "abstract": "Abstract text",
    "authors": [
      {"name": "Human Author", "orcid": "0000-...", "is_ai_system": false},
      {"name": "Claude (Anthropic)", "is_ai_system": true}
    ],
    "subject_area": "Biology",
    "ai_system": "Claude (Anthropic)"
  },
  "provenance": {
    "model_id": "claude-sonnet-4-5-20250929",
    "databases_queried": ["PubMed", "Semantic Scholar"],
    "generation_date": "2026-02-16",
    "total_compute_hours": 0.5
  },
  "callback_url": "https://yourapp.com/webhook",
  "callback_secret": "your_hmac_secret",
  "submission_ref": "your-internal-id",
  "assessment_config": {
    "include_provenance_audit": true,
    "include_reproducibility": true,
    "reviewer_count": 8
  }
}

→ 202 Accepted
{
  "assessment_id": "ps_abc123",
  "status": "pending",
  "estimated_completion_seconds": 300
}
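The submission above can be built with the standard library alone. A sketch (the payload is abbreviated to two fields, and the API key is a placeholder):

```python
import json
import urllib.request

def build_assess_request(payload: dict, api_key: str) -> urllib.request.Request:
    # Build the POST /v1/assess request with the partner key header.
    return urllib.request.Request(
        "https://api.preprints.ai/v1/assess",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )

req = build_assess_request(
    {"manuscript_content": "...", "metadata": {"title": "Paper title"}},
    api_key="pk_example",  # placeholder, not a real partner key
)
print(req.full_url)
```

Sending the request with `urllib.request.urlopen(req)` should return the 202 Accepted body shown above, including the `assessment_id` to poll or await via webhook.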

Endpoint 2: Webhook Callback

When complete, we POST to your callback_url with:
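Whatever the exact payload, the callback_secret you supplied at submission lets you authenticate deliveries. Assuming the signature is an HMAC-SHA256 hex digest of the raw request body carried in a header (the header name and signing scheme are assumptions here, not documented), verification looks like:

```python
import hashlib
import hmac

def verify_webhook(raw_body: bytes, signature: str, secret: str) -> bool:
    # Recompute HMAC-SHA256 over the raw body with your callback_secret
    # and compare against the received signature in constant time.
    expected = hmac.new(secret.encode(), raw_body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)

body = b'{"assessment_id": "ps_abc123", "status": "completed"}'
sig = hmac.new(b"your_hmac_secret", body, hashlib.sha256).hexdigest()
print(verify_webhook(body, sig, "your_hmac_secret"))  # → True
```

Always verify against the raw bytes of the request body, not a re-serialized copy, since any whitespace or key-order change alters the digest.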

Endpoint 3: Poll Status

GET /v1/assess/{assessment_id}
X-API-Key: {partner_key}

→ Returns full assessment when status = "completed"

Endpoint 4: Reassessment

POST /v1/assess/{previous_id}/reassess
X-API-Key: {partner_key}

{
  "manuscript_content": "Updated v2 text...",
  "version": 2,
  "version_note": "Addressed reviewer concerns"
}

Endpoint 5: Public Report

GET /assessment/{assessment_id}
→ Redirects to the interactive report page

Provenance Audit

When include_provenance_audit is true, the assessment includes: