Preprints.ai
External grounding

Literature context

For every paper we score, we pull the most-related published work from Semantic Scholar and surface plausible missing citations. When the upstream API rate-limits us, we mark the gap explicitly rather than synthesise context from thin air.
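A minimal sketch of that retry-then-mark-gap policy follows. It assumes exponential backoff on HTTP 429. The function name `fetch_or_mark_gap`, the retry count, and the delays are hypothetical and are not the worker's actual code. The HTTP call is passed in as a callable, so the logic can be exercised without network access.

```python
import time
from typing import Callable


def fetch_or_mark_gap(
    fetch: Callable[[], tuple[int, dict]],
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch(); on HTTP 429, back off exponentially and retry.

    Hypothetical helper. If every attempt is rate-limited, it returns an
    explicit gap marker instead of any placeholder context, so the report
    can state that literature context is missing.
    """
    for attempt in range(max_retries + 1):
        status, body = fetch()
        if status == 200:
            return {"status": "ok", "data": body}
        if status != 429:
            # Failures other than rate limiting are reported, not retried.
            return {"status": "error", "http_status": status}
        if attempt < max_retries:
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s with the defaults
    # Retries exhausted: mark the gap rather than synthesise context.
    return {"status": "gap", "reason": "rate_limited"}
```

Injecting `sleep` keeps tests fast. In production the default `time.sleep` applies.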

~58% coverage · Source: Semantic Scholar API · 429 rate-limit gaps marked on report
Assessments with literature_context: ~58% (rolling, last load)
Total assessments: ~2,510 (production, current)
Backfill cron: 06:07 UTC, capped at 500 per run
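The backfill cron runs once a day at 06:07 UTC and handles at most 500 assessments per run. The sketch below shows one way such a batch could be selected. The `Assessment` record, its field names, and the oldest-first ordering are assumptions for illustration. In a standard five-field crontab, 06:07 UTC daily is written `7 6 * * *`.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

BACKFILL_CAP = 500  # per-run cap stated on this page
# Assumed schedule in standard crontab syntax (06:07 UTC daily): 7 6 * * *


@dataclass
class Assessment:
    # Hypothetical record shape; the real schema may differ.
    id: str
    created_at: datetime
    literature_context: Optional[dict] = None


def select_backfill_batch(
    assessments: list[Assessment], cap: int = BACKFILL_CAP
) -> list[Assessment]:
    """Return up to `cap` assessments still missing literature_context.

    Oldest-first ordering is an assumption, chosen so that long-standing
    gaps are retried before newer ones.
    """
    missing = [a for a in assessments if a.literature_context is None]
    missing.sort(key=lambda a: a.created_at)
    return missing[:cap]
```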

Methodology

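When a paper is queued for assessment, the worker calls Semantic Scholar's /paper/search and /paper/{id}/references endpoints to assemble three pieces of context for the reviewer agents: the most-related published works, the plausible missing citations among them, and the paper's own references resolved against the S2 corpus.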

Example output from a recent assessment in the daily run: 10 related works fetched, 2 potentially missing citations flagged, 17 references resolved against the S2 corpus, 3 references unresolved. The DOI for that example is 10.1101/2024.06.14.598985.
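The sketch below shows how the two endpoint calls could feed a summary line like the one above. It assumes the public Graph API base URL `https://api.semanticscholar.org/graph/v1` and its JSON response shapes: results arrive in `data` arrays, and each reference is nested under `citedPaper`, with a null `paperId` when S2 cannot match it. The helper names and the missing-citation heuristic (related works the paper does not cite) are hypothetical, and the production filter is likely stricter.

```python
from urllib.parse import quote, urlencode

# Assumed public Graph API base; the worker's exact query parameters may differ.
S2_GRAPH = "https://api.semanticscholar.org/graph/v1"


def search_url(query: str, limit: int = 10) -> str:
    """URL for /paper/search, e.g. using the preprint's title as the query."""
    params = urlencode({"query": query, "limit": limit, "fields": "title"})
    return f"{S2_GRAPH}/paper/search?{params}"


def references_url(doi: str) -> str:
    """URL for /paper/{id}/references, addressing the paper by DOI."""
    paper_id = quote(f"DOI:{doi}", safe=":/")
    return f"{S2_GRAPH}/paper/{paper_id}/references?fields=title"


def summarize(related: list[dict], references: list[dict]) -> str:
    """Build a one-line summary from the parsed `data` arrays.

    related:    search results, e.g. {"paperId": "...", "title": "..."}
    references: reference items, e.g. {"citedPaper": {"paperId": "..."}}
    """
    cited_ids = set()
    unresolved = 0
    for ref in references:
        paper_id = (ref.get("citedPaper") or {}).get("paperId")
        if paper_id:
            cited_ids.add(paper_id)
        else:
            unresolved += 1  # S2 could not match this reference to its corpus
    resolved = len(references) - unresolved
    # Hypothetical heuristic: any related work the paper does not cite is
    # flagged as a potentially missing citation.
    missing = [p for p in related if p.get("paperId") not in cited_ids]
    return (
        f"{len(related)} related works fetched, "
        f"{len(missing)} potentially missing citations flagged, "
        f"{resolved} references resolved against S2 corpus, "
        f"{unresolved} references unresolved"
    )
```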

Honest coverage

The Semantic Scholar API rate-limits unauthenticated callers and returns 429 Too Many Requests when our worker sends requests in bursts. Two consequences follow:

What this number means. 58.2% is the share of current assessments that have any literature_context, regardless of how complete that context is. We do not yet break down coverage by completeness tier (full, partial, missing-citations-only).
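Under that definition, the metric reduces to a simple share. The following is a minimal sketch that assumes each assessment exposes `literature_context` as either a dict or `None`. This representation is hypothetical. Partial or even empty contexts still count as covered, which is why the figure says nothing about completeness.

```python
from typing import Optional


def coverage_share(contexts: list[Optional[dict]]) -> float:
    """Share of assessments with any literature_context, complete or not."""
    if not contexts:
        return 0.0
    with_context = sum(1 for c in contexts if c is not None)
    return with_context / len(contexts)


# Formatting as a percentage with one decimal place, as on this page:
# f"{coverage_share(contexts):.1%}"
```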

Caveats — what this doesn't measure

Code & attribution

The S2 client and worker integration live in agents/. The Semantic Scholar API and citation graph are the property of the Allen Institute for AI and are used in accordance with its terms.