The 18 modules that run before any LLM does
Layer 1 is the deterministic pre-flight check. It runs on every paper, takes a second or two, and surfaces concrete, verifiable signals — not opinions. The reviewer agents see the Layer 1 report alongside the manuscript text.
Modules
| Module | What it checks |
|---|---|
| paper_mill_detection | Template patterns, fingerprint matches, and metadata signals associated with paper-mill output. |
| hidden_prompt_detector | Rendering-level prompt-injection scan: white-on-white text, sub-pixel fonts, off-page coordinates. |
| adversarial_sanitizer | Text-level injection-pattern detection, run after extraction. Sister module to hidden_prompt_detector. |
| ai_content_detection | Heuristic markers of LLM-authored prose (perplexity, burstiness, characteristic phrasings). |
| dataseer_unified | DataSeer-style parse for data and code availability statements. |
| dataseer_integration | API integration with DataSeer for richer data-statement classification. |
| elis_unified | ELIS-style image duplicate-region detection where full text is available. |
| image_forensics | Band-shift, splice and duplication heuristics for figure panels. |
| fabrication_detector | Pattern-based flags for implausible numerical sequences (rounded p-values, suspicious last digits). |
| language_detector | Manuscript language detection. Used to gate later modules that assume English text. |
| open_data_detection | ODDPub-inspired detection of open-data and open-code statements. |
| pdf_quality | Structural PDF checks (broken tables, OCR errors, missing fonts). |
| pdf_quality_v2 | Successor PDF parser with more robust column / figure handling. |
| reference_verification | Resolves the reference list against Semantic Scholar and OpenCitations; flags broken DOIs. |
| reproducibility_checklist | Field-specific checklist coverage (ARRIVE, CONSORT, MIQE, etc.) extracted from manuscript text. |
| rigor_reporting | Reporting rigour: blinding, randomisation, sample-size justification, exclusion criteria. |
| sample_size_consistency | Cross-section consistency of sample sizes between abstract, methods, and results. |
| software_citation | Detection and resolution of software citations (versioned packages, DOIs for code). |
| statcheck | Recomputation of reported test statistics and p-values; flags inconsistencies (statcheck-style). |
| statistical_consistency | GRIM and related sanity checks on reported means and proportions. |
| statistical_verification | Cross-check of degrees of freedom, t/F values, and effect-size reporting. |
| trust_markers | Aggregator: writes a binary checklist of trust markers (ethics, COI, funding, pre-registration) into the report. |
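To make the flavor of these checks concrete, here is a minimal sketch of the GRIM test that statistical_consistency-style modules build on. This is an illustration of the technique, not the module's actual code: a mean of n integer-valued responses must equal k/n for some integer k, so a reported mean that cannot round-trip through any nearby k/n is flagged.

```python
import math

def grim_consistent(reported_mean: float, n: int, decimals: int = 2) -> bool:
    """GRIM test: a mean of n integer-valued items must be k/n for some
    integer k. Check whether either integer nearest reported_mean * n
    reproduces the reported mean at the stated precision."""
    candidates = {math.floor(reported_mean * n), math.ceil(reported_mean * n)}
    target = round(reported_mean, decimals)
    return any(round(k / n, decimals) == target for k in candidates)

# A mean of 3.48 over n=25 is reachable (87/25 = 3.48),
# but no integer total over n=28 yields a mean of 5.19.
print(grim_consistent(3.48, 25))  # True
print(grim_consistent(5.19, 28))  # False
```

Checking both the floor and ceiling of `reported_mean * n` sidesteps floating-point wobble in the product; a production check would also need to handle reported rounding conventions (truncation vs. half-up) before flagging.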
The pipeline ships with 18 enabled modules — additional files in the directory are auxiliary helpers, integration adapters, or experimental flags. The registry in checks/registry.py is the source of truth.
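A registry as the single source of truth typically looks something like the following sketch. The names, decorator, and structure here are assumptions for illustration, not the actual contents of checks/registry.py:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class CheckModule:
    name: str
    enabled: bool
    run: Callable[[str], dict]  # manuscript text -> findings

REGISTRY: dict[str, CheckModule] = {}

def register(name: str, enabled: bool = True):
    """Decorator that records a check function in the registry."""
    def decorator(fn):
        REGISTRY[name] = CheckModule(name, enabled, fn)
        return fn
    return decorator

@register("language_detector")
def language_detector(text: str) -> dict:
    # Trivial placeholder heuristic, for illustration only.
    return {"language": "en" if text.isascii() else "unknown"}

def enabled_modules() -> list[str]:
    return [m.name for m in REGISTRY.values() if m.enabled]
```

A registry like this lets the pipeline enumerate, enable, and disable modules in one place, which is why auxiliary files in the directory do not count toward the enabled-module total.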
Coverage
Layer 1 had run on roughly 60% of papers in the production corpus at the time of writing. The remaining 40% is dominated by papers ingested before the Layer 1 pipeline shipped (pre-Apr-13). A nightly backfill cron at 06:07 UTC processes up to 500 of those papers, in priority order by reader traffic.
A paper counts as covered when its layer1_report field is populated. Assessments where Layer 1 fired but found no flags still count as covered.
Caveats — what this doesn't measure
- Most modules are heuristic and can produce false positives. Layer 1 findings are surfaced as flags, not blockers — the reviewer agents and human readers weigh them.
- Image forensics requires full PDF figures. For papers ingested via abstract-only metadata, several modules either skip or run in a degraded mode.
- The statcheck-style modules can only audit reported statistics in a recoverable format. Tables embedded as images or stats described only in prose are out of reach.
- Coverage of non-English manuscripts is lower; language_detector gates several downstream modules to English text only.
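The language-gating caveat can be sketched as a simple filter. The set of English-only modules below is a hypothetical example, not the pipeline's actual configuration:

```python
# Hypothetical gating rule: these module names are assumptions for illustration.
ENGLISH_ONLY = {"statcheck", "rigor_reporting", "reproducibility_checklist"}

def modules_to_run(all_modules: list[str], detected_language: str) -> list[str]:
    """Drop English-only modules when the detected language is not English."""
    if detected_language == "en":
        return list(all_modules)
    return [m for m in all_modules if m not in ENGLISH_ONLY]

print(modules_to_run(["statcheck", "image_forensics"], "fr"))
# ['image_forensics']
```

Skipped modules simply produce no findings for that paper; they are not reported as failures.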
Code
Module directory: checks/layer1/ · registry: checks/registry.py.