Reproducibility · Results · Verify

Verify our discharge numbers yourself.

Every routine on /results/ has a discharge percentage next to it. This kit lets you re-run the proof yourself and confirm the numbers. ~5 minutes per routine on a modern laptop with gnatprove installed.

What's in the kit

The two-command verification

tar xzf dark-factory-verifier.tar.gz
cd dark-factory-verifier && make all

The make all target runs gnatprove --level=1 on every routine and writes a summary to actual-results.txt. Compare against expected-results.txt with diff. Modulo prover wall-time, the two files should be identical.

What this kit doesn't include

The factory itself — the translator, the spec expander, the body-fill stage, the algorithms catalogue, the per-stage prompts, the prover-feedback loops. These are how the output gets produced. The factory is the moat; the output is what you can verify.

Said plainly: anyone with gnatprove can confirm what our tool produced. The question of whether we have a tool that produced it is a separate one, answered by the cumulative weight of everything on /results/ — fifteen-plus routines, gnatmake-clean SPARK, discharge numbers anyone can re-derive, and the erratum we published when our own audit caught a stub-shaped body in one of them.

What the unproved residue means

Across the kit, the unproved obligations fall into three classes: floating-point overflow checks at theoretical extremes (closed by tighter Pre conditions); proof gaps from unaxiomatised library functions like Exp (closed by pragma Assume); and inductive ghost arguments at --level=1 (closed by re-running at --level=2). None are body-correctness failures. See the README for the per-class breakdown.

Found a discrepancy?

If your actual-results.txt doesn't match expected-results.txt we want to know. Email tony.gair@thedarkfactory.co.uk with your output. The publication contract works only if the re-runs match.