Preprints.ai
Layer 1 module

Paper-mill detection

Heuristic detection of template, fingerprint, and metadata patterns associated with paper-mill output. We describe how it works, what it cannot do, and where the gaps in our evaluation are.

methodology only · labelled corpus forthcoming

Methodology

The module flags papers whose surface features cluster with patterns characteristic of paper-mill output. Signals include:

Results

We do not yet publish a precision or recall figure for this module. The reason is straightforward: there is no widely agreed, publicly available labelled corpus of academic paper-mill preprints to evaluate against. Constructing one with sufficient coverage and an honest negative class is non-trivial work that we have not finished.

What we will do, when that work is finished, is publish:

Why no number yet. Publishing a precision figure off an unlabelled production stream would be self-graded homework. We would rather leave this page empty of metrics than seed it with a number that cannot be defended.
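For readers unfamiliar with the two metrics, the sketch below shows how precision and recall would be computed once a labelled corpus exists. It is an illustration under stated assumptions, not the module's evaluation code: the names `Counts`, `confusion_counts`, `precision`, and `recall` are hypothetical, and the labels and flags in the usage example are made-up toy values, not results.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Counts:
    """Confusion-matrix counts for a binary flag (hypothetical helper type)."""
    tp: int  # labelled paper-mill, flagged
    fp: int  # labelled legitimate, flagged
    fn: int  # labelled paper-mill, not flagged
    tn: int  # labelled legitimate, not flagged


def confusion_counts(labels: Sequence[bool], flags: Sequence[bool]) -> Counts:
    """Compare ground-truth labels with module flags, item by item."""
    if len(labels) != len(flags):
        raise ValueError("labels and flags must have the same length")
    tp = fp = fn = tn = 0
    for is_mill, flagged in zip(labels, flags):
        if is_mill and flagged:
            tp += 1
        elif not is_mill and flagged:
            fp += 1
        elif is_mill and not flagged:
            fn += 1
        else:
            tn += 1
    return Counts(tp=tp, fp=fp, fn=fn, tn=tn)


def precision(c: Counts) -> Optional[float]:
    """Share of flagged papers that are truly paper-mill; None if nothing was flagged."""
    flagged = c.tp + c.fp
    return c.tp / flagged if flagged else None


def recall(c: Counts) -> Optional[float]:
    """Share of true paper-mill papers that were flagged; None if the corpus has no positives."""
    positives = c.tp + c.fn
    return c.tp / positives if positives else None


# Toy usage with made-up values (not real evaluation data).
toy_labels = [True, True, False, False, True]
toy_flags = [True, False, True, False, True]
toy_counts = confusion_counts(toy_labels, toy_flags)
```

Both functions return `None` rather than zero when the denominator is empty, so an undefined metric is not mistaken for a bad one. The negative class matters here: without honestly labelled legitimate papers, the `fp` count, and therefore precision, cannot be estimated.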

Caveats — what this doesn't measure

Code

Module: checks/layer1/paper_mill_detection.py · related fabrication heuristics: checks/layer1/fabrication_detector.py.