Preprints.ai
Layer 1 audit modules

The 18 modules that run before any LLM does

Layer 1 is the deterministic pre-flight check. It runs on every paper, takes a second or two, and surfaces concrete, verifiable signals — not opinions. The reviewer agents see the Layer 1 report alongside the manuscript text.

18 modules · ~60% paper coverage · ~2,510 assessments

Modules

Module · What it checks
paper_mill_detection · Template patterns, fingerprint matches, and metadata signals associated with paper-mill output.
hidden_prompt_detector · Rendering-level prompt-injection scan: white-on-white text, sub-pixel fonts, off-page coordinates.
adversarial_sanitizer · Text-level injection-pattern detection, run after extraction. Sister module to hidden_prompt_detector.
ai_content_detection · Heuristic markers of LLM-authored prose (perplexity, burstiness, characteristic phrasings).
dataseer_unified · DataSeer-style parse for data and code availability statements.
dataseer_integration · API integration with DataSeer for richer data-statement classification.
elis_unified · ELIS-style image duplicate-region detection where full text is available.
image_forensics · Band-shift, splice and duplication heuristics for figure panels.
fabrication_detector · Pattern-based flags for implausible numerical sequences (rounded p-values, suspicious last digits).
language_detector · Manuscript language detection. Used to gate later modules that assume English text.
open_data_detection · ODDPub-inspired detection of open-data and open-code statements.
pdf_quality · Structural PDF checks (broken tables, OCR errors, missing fonts).
pdf_quality_v2 · Successor PDF parser with more robust column / figure handling.
reference_verification · Resolves the reference list against Semantic Scholar and OpenCitations; flags broken DOIs.
reproducibility_checklist · Field-specific checklist coverage (ARRIVE, CONSORT, MIQE, etc.) extracted from manuscript text.
rigor_reporting · Reporting rigour: blinding, randomisation, sample-size justification, exclusion criteria.
sample_size_consistency · Cross-section consistency of sample sizes between abstract, methods, and results.
software_citation · Detection and resolution of software citations (versioned packages, DOIs for code).
statcheck · Recomputation of reported test statistics and p-values; flags inconsistencies (statcheck-style).
statistical_consistency · GRIM and related sanity checks on reported means and proportions.
statistical_verification · Cross-check of degrees of freedom, t/F values, and effect-size reporting.
trust_markers · Aggregator: writes a binary checklist of trust markers (ethics, COI, funding, pre-registration) into the report.
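The statistical_consistency entry above names the GRIM test. As a concrete illustration of what such a check does, here is a minimal GRIM sketch (an assumption-laden simplification, not the production module): a mean of integer data with sample size n must be a multiple of 1/n, up to rounding at the reported precision.

```python
def grim_consistent(mean: float, n: int, decimals: int = 2) -> bool:
    """GRIM sketch: can a mean reported to `decimals` places arise
    from `n` integer-valued observations?

    With integer data the true mean is total/n for some integer total,
    so the reported mean must round-match one of those multiples.
    """
    total = round(mean * n)  # nearest candidate integer sum
    # Check the neighbours too, to absorb floating-point edge cases.
    for t in (total - 1, total, total + 1):
        if round(t / n, decimals) == round(mean, decimals):
            return True
    return False
```

For example, a reported mean of 5.19 with n = 28 fails the check (the nearest achievable means are 5.18 and 5.21), while 5.18 passes. Note that Python's `round` uses banker's rounding, so exact-half cases may differ from how authors rounded.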

The pipeline ships with 18 enabled modules — additional files in the directory are auxiliary helpers, integration adapters, or experimental flags. The registry in checks/registry.py is the source of truth.
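To make the enabled/auxiliary distinction concrete, here is a hypothetical sketch of what a registry like checks/registry.py might look like. The module names come from the table above; the ModuleSpec structure, the `enabled` flag, and the helper function are assumptions, not the actual API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModuleSpec:
    # Hypothetical registry entry; field names are illustrative only.
    name: str
    enabled: bool = True
    experimental: bool = False

REGISTRY = [
    ModuleSpec("hidden_prompt_detector"),
    ModuleSpec("adversarial_sanitizer"),
    ModuleSpec("statcheck"),
    ModuleSpec("pdf_quality_v2"),
    # Auxiliary files in the directory would be registered but disabled:
    ModuleSpec("experimental_flag_example", enabled=False, experimental=True),
]

def enabled_modules(registry: list[ModuleSpec]) -> list[str]:
    """Return the names of modules that actually run in the pipeline."""
    return [m.name for m in registry if m.enabled]
```

Under this sketch, the "18 enabled modules" figure would simply be `len(enabled_modules(REGISTRY))` over the full registry.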

Coverage

Layer 1 had run on roughly 60% of papers in the production corpus at the time of writing. The 40% gap is dominated by papers ingested before the Layer 1 pipeline shipped (pre-Apr-13). A backfill cron at 06:07 UTC processes up to 500 of those papers per night, in priority order by reader traffic.
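The nightly selection can be sketched as follows. This is a guess at the logic from the description above; the paper records, the `reader_traffic` field, and the function name are hypothetical.

```python
def pick_backfill_batch(papers: list[dict], limit: int = 500) -> list[dict]:
    """Select papers still missing a Layer 1 report, highest reader
    traffic first, capped at `limit` per nightly run (a sketch)."""
    pending = [p for p in papers if p.get("layer1_report") is None]
    pending.sort(key=lambda p: p.get("reader_traffic", 0), reverse=True)
    return pending[:limit]
```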

What "coverage" means here: the share of current assessments whose layer1_report field is populated. Assessments where Layer 1 fired but found no flags still count as covered.
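That definition reduces to a one-liner. A minimal sketch, assuming assessments are dicts with a layer1_report key (the actual schema is not shown on this page):

```python
def coverage(assessments: list[dict]) -> float:
    """Share of assessments whose layer1_report is populated.

    A report with zero flags is still truthy (e.g. {"flags": []}),
    so clean papers count as covered; only a missing/None report
    counts against coverage.
    """
    if not assessments:
        return 0.0
    covered = sum(1 for a in assessments if a.get("layer1_report"))
    return covered / len(assessments)
```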

Caveats — what this doesn't measure

Code

Module directory: checks/layer1/ · registry: checks/registry.py.