The 18 modules that run before any LLM does
Layer 1 is the deterministic pre-flight check. It runs on every paper, takes a second or two, and surfaces concrete, verifiable signals — not opinions. The reviewer agents see the Layer 1 report alongside the manuscript text.
Modules
| Module | What it checks |
|---|---|
| paper_mill_detection | Template patterns, fingerprint matches, and metadata signals associated with paper-mill output. |
| hidden_prompt_detector | Rendering-level prompt-injection scan: white-on-white text, sub-pixel fonts, off-page coordinates. |
| adversarial_sanitizer | Text-level injection-pattern detection, run after extraction. Sister module to hidden_prompt_detector. |
| ai_content_detection | Heuristic markers of LLM-authored prose (perplexity, burstiness, characteristic phrasings). |
| dataseer_unified | DataSeer-style parse for data and code availability statements. |
| dataseer_integration | API integration with DataSeer for richer data-statement classification. |
| elis_unified | ELIS-style image duplicate-region detection where full text is available. |
| image_forensics | Band-shift, splice and duplication heuristics for figure panels. |
| fabrication_detector | Pattern-based flags for implausible numerical sequences (rounded p-values, suspicious last digits). |
| language_detector | Manuscript language detection. Used to gate later modules that assume English text. |
| open_data_detection | ODDPub-inspired detection of open-data and open-code statements. |
| pdf_quality | Structural PDF checks (broken tables, OCR errors, missing fonts). |
| pdf_quality_v2 | Successor PDF parser with more robust column / figure handling. |
| reference_verification | Resolves the reference list against Semantic Scholar and OpenCitations; flags broken DOIs. |
| reproducibility_checklist | Field-specific checklist coverage (ARRIVE, CONSORT, MIQE, etc.) extracted from manuscript text. |
| rigor_reporting | Reporting rigour: blinding, randomisation, sample-size justification, exclusion criteria. |
| sample_size_consistency | Cross-section consistency of sample sizes between abstract, methods, and results. |
| software_citation | Detection and resolution of software citations (versioned packages, DOIs for code). |
| statcheck | Recomputation of reported test statistics and p-values; flags inconsistencies (statcheck-style). |
| statistical_consistency | GRIM and related sanity checks on reported means and proportions. |
| statistical_verification | Cross-check of degrees of freedom, t/F values, and effect-size reporting. |
| trust_markers | Aggregator: writes a binary checklist of trust markers (ethics, COI, funding, pre-registration) into the report. |
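To make the flavor of these checks concrete, here is a minimal sketch of the GRIM test that statistical_consistency-style modules build on. This is an illustration of the technique, not the module's actual code: a mean of n integer-valued responses must equal k/n for some integer k, so a reported mean that cannot round-trip through any nearby k/n is flagged.

```python
import math

def grim_consistent(reported_mean: float, n: int, decimals: int = 2) -> bool:
    """GRIM test: a mean of n integer-valued items must be k/n for some
    integer k. Check whether either integer nearest reported_mean * n
    reproduces the reported mean at the stated precision."""
    candidates = {math.floor(reported_mean * n), math.ceil(reported_mean * n)}
    target = round(reported_mean, decimals)
    return any(round(k / n, decimals) == target for k in candidates)

# A mean of 3.48 over n=25 is reachable (87/25 = 3.48),
# but no integer total over n=28 yields a mean of 5.19.
print(grim_consistent(3.48, 25))  # True
print(grim_consistent(5.19, 28))  # False
```

Checking both the floor and ceiling of `reported_mean * n` sidesteps floating-point wobble in the product; a production check would also need to handle reported rounding conventions (truncation vs. half-up) before flagging.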
The pipeline ships with 18 enabled modules — additional files in the directory are auxiliary helpers, integration adapters, or experimental flags. The registry in checks/registry.py is the source of truth.
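A registry as the single source of truth typically looks something like the following sketch. The names, decorator, and structure here are assumptions for illustration, not the actual contents of checks/registry.py:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class CheckModule:
    name: str
    enabled: bool
    run: Callable[[str], dict]  # manuscript text -> findings

REGISTRY: dict[str, CheckModule] = {}

def register(name: str, enabled: bool = True):
    """Decorator that records a check function in the registry."""
    def decorator(fn):
        REGISTRY[name] = CheckModule(name, enabled, fn)
        return fn
    return decorator

@register("language_detector")
def language_detector(text: str) -> dict:
    # Trivial placeholder heuristic, for illustration only.
    return {"language": "en" if text.isascii() else "unknown"}

def enabled_modules() -> list[str]:
    return [m.name for m in REGISTRY.values() if m.enabled]
```

A registry like this lets the pipeline enumerate, enable, and disable modules in one place, which is why auxiliary files in the directory do not count toward the enabled-module total.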
Coverage
Layer 1 had run on roughly 60% of papers in the production corpus at the time of writing. The remaining 40% is dominated by papers ingested before the Layer 1 pipeline shipped (pre-Apr-13). A nightly backfill cron at 06:07 UTC processes up to 500 of those papers, in priority order by reader traffic.
A paper counts as covered when its layer1_report field is populated. Assessments where Layer 1 fired but found no flags still count as covered.
Caveats — what this doesn't measure
- Most modules are heuristic and can produce false positives. Layer 1 findings are surfaced as flags, not blockers — the reviewer agents and human readers weigh them.
- Image forensics requires full PDF figures. For papers ingested via abstract-only metadata, several modules either skip or run in a degraded mode.
- The statcheck-style modules can only audit reported statistics in a recoverable format. Tables embedded as images or stats described only in prose are out of reach.
- Coverage of non-English manuscripts is lower; language_detector gates several downstream modules to English text only.
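The language-gating caveat can be sketched as a simple filter. The set of English-only modules below is a hypothetical example, not the pipeline's actual configuration:

```python
# Hypothetical gating rule: these module names are assumptions for illustration.
ENGLISH_ONLY = {"statcheck", "rigor_reporting", "reproducibility_checklist"}

def modules_to_run(all_modules: list[str], detected_language: str) -> list[str]:
    """Drop English-only modules when the detected language is not English."""
    if detected_language == "en":
        return list(all_modules)
    return [m for m in all_modules if m not in ENGLISH_ONLY]

print(modules_to_run(["statcheck", "image_forensics"], "fr"))
# ['image_forensics']
```

Skipped modules simply produce no findings for that paper; they are not reported as failures.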
Code
Module directory: checks/layer1/ · registry: checks/registry.py.