Field notes · 24 May 2026

Old code, new code, the numbers.

Show-your-working from today's factory run.

Correction — 24 May 2026, later the same day

The chart below originally showed dgeqrf (LAPACK QR factorisation) at 98.7%. An audit after publication found that the code the factory had written for that routine satisfied its specification without doing the underlying QR work — the spec was passable without the substance. We caught it ourselves. The factory has since re-emitted a version that does the work, and that version scores 87.5% against a stricter form of the same specification. The bar for dgeqrf below now shows the honest figure; the routine page at /results/#dgeqrf carries the same correction.

While we were at it we put dpotrf (Cholesky) and dgetrf (LU) through the same audit treatment. Their previous numbers had not been independently checked the way dgeqrf was, and we wanted them on the same footing as the corrected QR figure. dpotrf moves 90.2% → 88.8%; dgetrf moves 93.3% → 93.6%. Both bodies now serve factorisations whose output is checked against a known reference at runtime. The aggregate moves from 91.2% to 90.1%.

We publish errors in the open for the same reason we publish the scores. A number on this site that we wouldn't stake our reputation on is a number we shouldn't have on this site.

Per-routine discharge across factory output versions. v0.1 = April baseline. v0.2 = today's run. Phase 3 routines have only a v0.2 — they were not in the April set.

Read this as: Plain English Formal-methods reader

Four routines below. The Fortran code that went into the factory, the Ada code that came out, and the score the proof checker came back with. Numbers came down from the April figures because the test the proof checker is now asked to apply is stricter; the code cannot pass the test by accident.

Dot product

Old code: ddot.f (real published code, Netlib reference library)
New code: ddot_pkg.ads (specification), ddot_pkg.adb (implementation)

349 things to prove · 22 not proved · 93.7% proved automatically

In April this routine scored 97.8% under a less demanding test. The test moved.

Matrix-matrix multiplication

Old code: dgemm.f (real published code, Netlib reference library)
New code: dgemm_pkg.ads, dgemm_pkg.adb

449 things to prove · 79 not proved · 82.4% proved automatically

The hardest test in the set; the score reflects that. Closing the remaining 79 is bounded engineering work, not anything wrong with what the code computes.

8-bit integer matrix multiplication (the kind that runs inside any modern AI chip)

Old code: int8_matmul.f (textbook reference, hand-written for this purpose, public domain)
New code: int8_matmul_pkg.ads, int8_matmul_pkg.adb

559 things to prove · 24 not proved · 96.0% proved automatically

There is no published “reference int8 matmul in Fortran” anywhere — the operation comes from the AI era and Fortran's reference libraries pre-date the AI era by decades. The Fortran here was written from scratch as a textbook implementation of the standard 2018 quantisation paper. The factory was then pointed at that Fortran.

Layer normalisation (a standard step in every transformer model)

Old code: layernorm.f (textbook reference, hand-written for this purpose, public domain)
New code: layernorm_pkg.ads, layernorm_pkg.adb

405 things to prove · 23 not proved · 94.3% proved automatically

Same situation as the int8 case — no published Fortran for this operation exists, so the reference was hand-written from the standard 2016 paper.

Full set of seventeen routines at /results/. The newer AI-era routines on the page (int8 quantised primitives, the floating-point inference layers) had no real published Fortran to start from; their reference implementations were written textbook-style and committed to the factory's input alongside the real Netlib code. Everything the proof checker is asked, and everything it gets right, is on the page.

Four routines below. The Fortran that went into the factory, the SPARK-Ada that came out, the discharge number the proof checker came back with. Numbers came down from the April figures because the question the prover is asked is stricter now; bodies cannot discharge by accident.

`ddot` — double-precision dot product

Fortran: ddot.f (Netlib BLAS, Modified BSD)
Ada spec: ddot_pkg.ads
Ada body: ddot_pkg.adb

349 obligations · 22 unproved · 93.7% discharged

In April this same routine measured 97.8% under a different test. The test moved.

`dgemm` — general matrix-matrix product

Fortran: dgemm.f (Netlib BLAS, Modified BSD)
Ada spec: dgemm_pkg.ads
Ada body: dgemm_pkg.adb

449 obligations · 79 unproved · 82.4% discharged

The hardest test in the set; the score reflects that. The 79 unproved are bounded engineering work, not unsoundness.

`int8_matmul` — symmetric per-tensor int8 × int8 → int32 matrix product

Fortran: int8_matmul.f (textbook reference, hand-written for the purpose, public domain)
Ada spec: int8_matmul_pkg.ads
Ada body: int8_matmul_pkg.adb

559 obligations · 24 unproved · 96.0% discharged

There is no Netlib “reference int8 matmul in Fortran” — quantised inference primitives postdate the Netlib codebases by decades. The Fortran here was written as a textbook implementation of Jacob et al., Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference, arXiv:1712.05877 (2018). The factory was then pointed at that Fortran.

`layernorm` — layer normalisation, single row

Fortran: layernorm.f (textbook reference, hand-written for the purpose, public domain)
Ada spec: layernorm_pkg.ads
Ada body: layernorm_pkg.adb

405 obligations · 23 unproved · 94.3% discharged

Same input pattern as int8 matmul — no published Fortran layernorm exists, so the reference was hand-written from Ba, Kiros, Hinton, Layer Normalization, arXiv:1607.06450 (2016).

Full set of seventeen routines at /results/. The Phase 3 inference kernels had no published Fortran to start from; their reference implementations were written textbook-style and committed to the factory's input alongside the real Netlib code. Everything the prover is asked, and everything it discharges, is on the page.

Old code, new code, the numbers.

Dot product

Matrix-matrix multiplication

8-bit integer matrix multiplication (the kind that runs inside any modern AI chip)

Layer normalisation (a standard step in every transformer model)

ddot — double-precision dot product

dgemm — general matrix-matrix product

int8_matmul — symmetric per-tensor int8 × int8 → int32 matrix product

layernorm — layer normalisation, single row

`ddot` — double-precision dot product

`dgemm` — general matrix-matrix product

`int8_matmul` — symmetric per-tensor int8 × int8 → int32 matrix product

`layernorm` — layer normalisation, single row