
How do you keep an MCP server's output reproducible when the upstream metadata is mutable?
If an agent generates a 200-citation literature review today, can it produce identical output a week from now? Most citation tools can't promise that, and getting mine to was more work than I expected.
Background: I built a citation MCP server — resolves DOI, PMID, PMCID, ISBN, arXiv, ADS, WHO IRIS; formats in Vancouver, AMA, APA, IEEE, CSE plus 10k CSL styles; exports BibTeX, RIS, CSV, EndNote XML. The tool-call shape was the easy part. Reproducibility is the part I keep coming back to.
The reason most generators can't do it: Crossref's metadata is mutable - titles get re-cased, author names corrected, abstracts back-filled, ORCID IDs added. The fallback chain used when an identifier isn't in the primary source isn't documented anywhere I could find, so you don't actually know whether your citation came from Crossref, DataCite, or doi.org's content negotiation. CSL style files update without semver. "Best-effort" fields silently appear or disappear between runs. For a chat-style tool that's fine. For an agent producing a bibliography that someone has to defend in peer review or a clinical audit, it's a correctness problem: a reviewer can't reproduce what got submitted.
What I ended up doing was emitting an x-scholar-transform-version header on every response (currently 2026-05-04). I bump it whenever normalisation, formatter output, or the resolver chain changes in a way that would alter the byte-for-byte output for the same input. Agents that care about reproducibility pin against it.
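On the agent side, pinning is just a header check. A minimal sketch in TypeScript, assuming a hypothetical /format endpoint and query shape (the header name and current value are the real ones):

```typescript
const PINNED_VERSION = "2026-05-04";

async function formatCitation(doi: string): Promise<string> {
  // Hypothetical endpoint and query parameters; substitute whatever tool call
  // your MCP client actually issues.
  const res = await fetch(
    `https://scholar.example/format?doi=${encodeURIComponent(doi)}&style=vancouver`
  );
  const version = res.headers.get("x-scholar-transform-version");
  if (version !== PINNED_VERSION) {
    // Fail loudly instead of silently accepting output that may no longer be
    // byte-identical to the run the bibliography was built with.
    throw new Error(
      `transform version drift: expected ${PINNED_VERSION}, got ${version}`
    );
  }
  return res.text();
}
```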
The actual resolver chain is published at /.well-known/sources.json — primary plus fallback hosts per identifier type, mirrored against the live code. DOI is Crossref then DataCite then doi.org; ISBN is OpenLibrary then Google Books; PMID and PMCID are NCBI; arXiv, ADS, WHO IRIS are direct. The chain is fixed-order, first non-empty wins. No quality scoring, no "best of N." Quality scoring across sources is great for chat but a nightmare for reproducibility because the scoring inputs themselves drift.
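In code terms, the resolution step is roughly the following sketch: the chain mirrors what sources.json publishes, while the fetchFrom helper is a stand-in for the real per-source lookups, not the actual implementation.

```typescript
// Fixed-order chain per identifier type; first non-empty result wins,
// no scoring across sources.
const RESOLVER_CHAIN: Record<string, string[]> = {
  doi:        ["crossref", "datacite", "doi.org"],
  isbn:       ["openlibrary", "google-books"],
  pmid:       ["ncbi"],
  pmcid:      ["ncbi"],
  arxiv:      ["arxiv"],
  ads:        ["ads"],
  "who-iris": ["who-iris"],
};

async function resolve(
  idType: string,
  id: string,
  fetchFrom: (source: string, id: string) => Promise<Record<string, unknown> | null>
): Promise<Record<string, unknown> | null> {
  for (const source of RESOLVER_CHAIN[idType] ?? []) {
    const record = await fetchFrom(source, id);
    if (record !== null) return record; // first non-empty wins
  }
  return null;
}
```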
There's also a /verification page with a copy-paste curl kit so anyone can spot-check the determinism claim and the provenance headers without taking my word for it. That one was a direct response to evaluator feedback that determinism claims aren't worth much unless they're independently verifiable.
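The kit itself is curl, but the check it performs boils down to something like this, sketched in TypeScript to keep the examples in one language; the URL is a placeholder for whichever formatted-citation request you want to verify:

```typescript
// Request the same formatted citation twice and compare the bodies
// byte-for-byte, printing the transform version alongside the result.
async function spotCheck(url: string): Promise<boolean> {
  const [a, b] = await Promise.all([fetch(url), fetch(url)]);
  const [bodyA, bodyB] = await Promise.all([a.text(), b.text()]);
  console.log("transform version:", a.headers.get("x-scholar-transform-version"));
  console.log("byte-identical:", bodyA === bodyB);
  return bodyA === bodyB;
}
```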
Honest about what this doesn't fix: CSL styles can drift across engine versions; transform_version covers the engine, but only if you actually pin to it. It doesn't help if a retracted paper gets silently corrected upstream, but that's the point: a bibliography should reproduce what you submitted, not what's true today. Retraction status lives behind a separate endpoint with no determinism promise. The server itself is a thin MCP shim over a hosted REST API, so if you need fully local, this isn't it.
The package is scholar-sidekick-mcp on npm. Genuinely curious how other people are handling drift in MCP servers that wrap mutable upstream data; it feels under-discussed given how much downstream agent output depends on it.