Evidence wall
What we measure, and what we don't
Every credibility claim on preprints.ai gets its own page — with a denominator, a sample size, a source-code link, and a list of things it does not measure. No marketing claims, no cherry-picked screenshots.
When we don't yet have data, the page says so plainly. When an external benchmark is referenced, it is attributed to its source. This wall exists because trust in AI-assisted peer review must be earned, page by page.
01
Layer 1 audit modules
Layer 1 audit pipeline
18 deterministic modules run before any LLM sees the paper. Coverage rises with the daily backfill cron.
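As a sketch of the shape such a layer can take (the registry pattern, module names, and types below are illustrative, not our production code):

```python
from typing import Callable

Finding = dict  # e.g. {"module": ..., "severity": ..., "detail": ...}
Module = Callable[[dict], list[Finding]]  # parsed paper -> findings

MODULES: dict[str, Module] = {}

def register(name: str):
    """Add a deterministic check to the Layer 1 registry."""
    def wrap(fn: Module) -> Module:
        MODULES[name] = fn
        return fn
    return wrap

@register("doi_syntax")
def doi_syntax(paper: dict) -> list[Finding]:
    # Toy check: every DOI should start with the "10." directory prefix.
    return [{"module": "doi_syntax", "detail": d}
            for d in paper.get("dois", []) if not d.startswith("10.")]

def run_layer1(paper: dict) -> dict[str, list[Finding]]:
    # Every registered module runs in a fixed order, with no model calls;
    # the LLM ensemble only ever sees the paper plus these findings.
    return {name: fn(paper) for name, fn in sorted(MODULES.items())}
```

The point of the registry is determinism: the same paper always yields the same findings, and the full set of findings exists before any model is invoked.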
Hidden-prompt detection
Rendering-level scan for white-on-white text, sub-pixel fonts, and off-page coordinates that hide instructions from humans.
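A minimal sketch of these three checks using PyMuPDF; the thresholds and the white-background assumption are illustrative, not our production values.

```python
# Flags text spans that are invisible to human readers: pure-white text,
# sub-pixel font sizes, and spans placed outside the page rectangle.
import fitz  # PyMuPDF

WHITE = 0xFFFFFF
MIN_VISIBLE_PT = 1.0  # spans below ~1pt are effectively invisible

def suspicious_spans(pdf_path: str):
    doc = fitz.open(pdf_path)
    for page in doc:
        page_rect = page.rect
        for block in page.get_text("dict")["blocks"]:
            for line in block.get("lines", []):
                for span in line["spans"]:
                    text = span["text"].strip()
                    if not text:
                        continue
                    bbox = fitz.Rect(span["bbox"])
                    if span["color"] == WHITE:  # white-on-white (assumes white page)
                        yield page.number, "white_text", text
                    elif span["size"] < MIN_VISIBLE_PT:  # sub-pixel font
                        yield page.number, "tiny_font", text
                    elif not page_rect.intersects(bbox):  # off-page coordinates
                        yield page.number, "off_page", text
```

A production scan also has to handle near-white colours and coloured backgrounds; this sketch shows only the exact-match case.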
Paper-mill signals
Template, fingerprint, and metadata patterns associated with paper-mill output. The page documents methodology and caveats; quantitative results are pending a labelled corpus.
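One hypothetical way independent signals of this kind could be combined into a single score; every signal name and weight below is invented for illustration, not a pattern we actually use.

```python
from dataclasses import dataclass

@dataclass
class Signal:
    name: str
    fired: bool
    weight: float

def paper_mill_score(signals: list[Signal]) -> float:
    """Weighted fraction of fired signals, in [0, 1]."""
    total = sum(s.weight for s in signals)
    return sum(s.weight for s in signals if s.fired) / total if total else 0.0

score = paper_mill_score([
    Signal("template_reuse", True, 2.0),          # known layout fingerprint
    Signal("freemail_corresponding", True, 1.0),  # non-institutional email
    Signal("metadata_burst", False, 1.5),         # many papers, same creator tool
])
```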
Image forensics
Duplicate-region detection and band-shift heuristics inspired by ELIS. Runs only when full text is available.
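An illustrative version of the duplicate-region idea: tile the figure, perceptually hash each tile, and flag distant tiles whose hashes nearly collide. The tile size and Hamming threshold are assumptions.

```python
from itertools import combinations
from PIL import Image
import imagehash

def duplicate_regions(path: str, tile: int = 64, max_dist: int = 2):
    img = Image.open(path).convert("L")
    w, h = img.size
    tiles = {}
    for y in range(0, h - tile + 1, tile):
        for x in range(0, w - tile + 1, tile):
            tiles[(x, y)] = imagehash.phash(img.crop((x, y, x + tile, y + tile)))
    hits = []
    for (a, ha), (b, hb) in combinations(tiles.items(), 2):
        # skip touching tiles; natural images repeat locally
        if abs(a[0] - b[0]) <= tile and abs(a[1] - b[1]) <= tile:
            continue
        if ha - hb <= max_dist:  # imagehash overloads '-' as Hamming distance
            hits.append((a, b))
    return hits
```

A real detector would first mask low-variance tiles (flat backgrounds collide trivially) and account for flips and rescaling; the sketch shows only the core comparison.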
Citation verification
Cross-checks the reference list against Semantic Scholar and OpenCitations. Flags broken DOIs, retracted citations, and self-citation rates.
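A trimmed sketch of the per-reference checks, using the public Semantic Scholar Graph API and the doi.org resolver (the OpenCitations cross-check is omitted here). The locally maintained retraction set and the name-based author matching are simplifying assumptions.

```python
import requests

S2_PAPER = "https://api.semanticscholar.org/graph/v1/paper/DOI:{doi}"

def check_reference(doi: str, citing_authors: set[str],
                    retracted_dois: set[str]) -> dict:
    flags = {}
    # Broken DOI: the resolver returns an error for unregistered DOIs.
    head = requests.head(f"https://doi.org/{doi}", allow_redirects=True,
                         timeout=10)
    flags["broken_doi"] = head.status_code >= 400
    # Retracted citation, against a locally maintained retraction list.
    flags["retracted"] = doi.lower() in retracted_dois
    # Self-citation: any author shared with the citing paper.
    meta = requests.get(S2_PAPER.format(doi=doi),
                        params={"fields": "authors"}, timeout=10)
    if meta.ok:
        ref_authors = {a["name"] for a in meta.json().get("authors", [])}
        flags["self_citation"] = bool(ref_authors & citing_authors)
    return flags
```

The self-citation rate for a paper is then the flagged fraction of its reference list; matching authors by name rather than identifier is the weakest link in this sketch.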
02
AI reviewer ensemble
Calibration corpus & review-12b
9,279 training examples (eLife + preprints.ai + PREreview). The calibration model is in training and not yet wired into the production grading pipeline.
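For readers unfamiliar with the term: calibration here means fitting a monotone map from raw ensemble scores to observed outcomes on held-out labelled examples. Isotonic regression and the toy data below are our illustration, not necessarily the production model.

```python
from sklearn.isotonic import IsotonicRegression

raw_scores = [0.20, 0.40, 0.60, 0.80]  # raw ensemble outputs, held-out set
outcomes   = [0,    0,    1,    1   ]  # labelled ground truth
calibrator = IsotonicRegression(out_of_bounds="clip").fit(raw_scores, outcomes)
print(calibrator.predict([0.55]))      # calibrated score for a new paper
```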
Honest methodology limits
What the pipeline cannot detect: fabricated data, misread figures, and cases where reviewer agreement reflects consistency rather than truth.
03
External grounding
04
Quality measurement
05
Case studies
Retracted papers we would have caught
Retrospective analysis of retracted preprints scored by the live pipeline. Page forthcoming once the labelled set is finalised.
Prompt-injection attacks caught in production
Real adversarial PDFs flagged by the hidden-prompt detector, with redactions. Page forthcoming once cloud-path coverage stabilises.