Field notes · 24 May 2026

Old code, new code, the numbers.

Show-your-working from today's factory run.

Per-routine discharge across factory output versions. v0.1 = April baseline. v0.2 = today's run. Phase 3 routines have only a v0.2 — they were not in the April set.
0% 50% 75% 100% Phase 1 — BLAS ddot 97.8 93.7 daxpy 96.1 94.9 dscal 97.4 88.4 dnrm2 98.1 92.1 dgemv 99.2 88.5 dger 99.0 88.8 dgemm 99.0 82.4 Phase 1 — LAPACK dpotrf 99.2 88.8 dgeqrf 99.2 87.5 dgetrf 99.3 93.6 Phase 2 — FFT cfftf 96.2 92.7 Phase 3 — inference kernels (no v0.1) int8_matmul 96.0 int8_conv2d 92.6 conv2d_fp64 89.2 max_pool 90.6 relu 94.3 layernorm 94.3 v0.1 (April baseline) v0.2 (today)
Read this as:

Four routines below. The Fortran code that went into the factory, the Ada code that came out, and the score the proof checker came back with. Numbers came down from the April figures because the test the proof checker is now asked to apply is stricter; the code cannot pass the test by accident.

Dot product

349 things to prove · 22 not proved · 93.7% proved automatically

In April this routine scored 97.8% under a less demanding test. The test moved.

Matrix-matrix multiplication

449 things to prove · 79 not proved · 82.4% proved automatically

The hardest test in the set; the score reflects that. Closing the remaining 79 is bounded engineering work, not anything wrong with what the code computes.

8-bit integer matrix multiplication (the kind that runs inside any modern AI chip)

559 things to prove · 24 not proved · 96.0% proved automatically

There is no published “reference int8 matmul in Fortran” anywhere — the operation comes from the AI era and Fortran's reference libraries pre-date the AI era by decades. The Fortran here was written from scratch as a textbook implementation of the standard 2018 quantisation paper. The factory was then pointed at that Fortran.

Layer normalisation (a standard step in every transformer model)

405 things to prove · 23 not proved · 94.3% proved automatically

Same situation as the int8 case — no published Fortran for this operation exists, so the reference was hand-written from the standard 2016 paper.

Full set of seventeen routines at /results/. The newer AI-era routines on the page (int8 quantised primitives, the floating-point inference layers) had no real published Fortran to start from; their reference implementations were written textbook-style and committed to the factory's input alongside the real Netlib code. Everything the proof checker is asked, and everything it gets right, is on the page.

Four routines below. The Fortran that went into the factory, the SPARK-Ada that came out, the discharge number the proof checker came back with. Numbers came down from the April figures because the question the prover is asked is stricter now; bodies cannot discharge by accident.

ddot — double-precision dot product

349 obligations · 22 unproved · 93.7% discharged

In April this same routine measured 97.8% under a different test. The test moved.

dgemm — general matrix-matrix product

449 obligations · 79 unproved · 82.4% discharged

The hardest test in the set; the score reflects that. The 79 unproved are bounded engineering work, not unsoundness.

int8_matmul — symmetric per-tensor int8 × int8 → int32 matrix product

559 obligations · 24 unproved · 96.0% discharged

There is no Netlib “reference int8 matmul in Fortran” — quantised inference primitives postdate the Netlib codebases by decades. The Fortran here was written as a textbook implementation of Jacob et al., Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference, arXiv:1712.05877 (2018). The factory was then pointed at that Fortran.

layernorm — layer normalisation, single row

405 obligations · 23 unproved · 94.3% discharged

Same input pattern as int8 matmul — no published Fortran layernorm exists, so the reference was hand-written from Ba, Kiros, Hinton, Layer Normalization, arXiv:1607.06450 (2016).

Full set of seventeen routines at /results/. The Phase 3 inference kernels had no published Fortran to start from; their reference implementations were written textbook-style and committed to the factory's input alongside the real Netlib code. Everything the prover is asked, and everything it discharges, is on the page.