Phase 5: SpectralQuant compression — SpectralQuantCodec, topology-coherent quantisation, and SpectralQuantProjector

**Parent macro-issue:** #12
**Depends on:** #17 (Phase 4)
**Overlaps:** Phase 4 (Weeks 7–10)

## Goal

Leverage the quadrance-exact, rationally-wired Laplacian eigenbasis as a **transform coding frame** for corpus-topology-coherent embedding compression. The eigenmodes are now reproducible across runs and platforms (Phase 4), which makes the spectral basis a stable codec dictionary.

---

## Design: SpectralQuant as transform coding

SpectralQuant treats the bottom-`d` eigenvectors of `L_F` as an **orthonormal transform** analogous to a DCT basis, but adapted to the corpus manifold topology. Items that are spectrally smooth (low λ / low Rayleigh quotient) have most of their energy in the low-frequency modes and compress aggressively. Items that are rough (high λ) retain energy across many modes and require more bits.

This is **automatically aligned with the epiplexity decomposition**: structural information (low-frequency modes, learnable) compresses; random information (high-frequency modes, irreducible) does not. The lossless limit at `b=32`, `d=F` reproduces the epiplexity compression ratio observed on CVE (38.4×).
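The "spectrally smooth" notion above can be made concrete with the Rayleigh quotient `x^T L x / x^T x`. The following is a minimal sketch, assuming a small dense symmetric Laplacian stored as rows; the function name `rayleigh_quotient` and the dense layout are illustrative assumptions, not the ArrowSpace API.

```rust
/// Rayleigh quotient x^T L x / x^T x for a dense symmetric matrix `l`
/// stored row by row. Hypothetical helper for illustration only.
/// Returns None for the zero vector, where the quotient is undefined.
pub fn rayleigh_quotient(l: &[Vec<f32>], x: &[f32]) -> Option<f32> {
    let norm_sq: f32 = x.iter().map(|v| v * v).sum();
    if norm_sq == 0.0 {
        return None;
    }
    // x^T (L x), accumulated one row of L at a time.
    let xt_l_x: f32 = l
        .iter()
        .zip(x)
        .map(|(row, xi)| xi * row.iter().zip(x).map(|(a, xj)| a * xj).sum::<f32>())
        .sum();
    Some(xt_l_x / norm_sq)
}
```

On the two-node path graph Laplacian `[[1, -1], [-1, 1]]`, the constant vector `[1, 1]` has quotient 0 (perfectly smooth) and the alternating vector `[1, -1]` has quotient 2 (maximally rough).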

---

## New crate: `surfface-codec`

Create `crates/surfface-codec/` with:

```
surfface-codec/
  src/
    lib.rs
    codec.rs          # SpectralQuantCodec
    quantise.rs       # Lloyd-Max scalar quantisation
    entropy.rs        # Arithmetic coder using H_T budget
    projector.rs      # SpectralQuantProjector (IndexScorePhase)
  benches/
    codec_bench.rs
  tests/
    round_trip.rs
    biochain.rs
```

---

## `SpectralQuantCodec`

```rust
pub struct SpectralQuantCodec {
    /// Bottom-d eigenvectors of L_F: shape [F x d]
    pub basis: Array2<f32>,
    /// Per-mode variance sigma_i^2(Phi_i): shape [d]
    pub mode_variance: Array1<f32>,
    /// Per-mode quantisation step sizes (Lloyd-Max optimal): shape [d]
    pub step_sizes: Array1<f32>,
    /// Entropy model: H_T estimate per mode
    pub entropy_model: EntropyModel,
    /// WiringMetric used to build this codec (must be Quadrance*)
    pub wiring_metric: WiringMetric,
    pub d: usize,    // number of spectral modes retained
    pub version: u32,
}

impl SpectralQuantCodec {
    /// Build codec from an existing ArrowSpace index.
    /// Uses the already-computed L_F eigenvectors — no extra eigendecomposition.
    pub fn from_index(index: &ArrowSpaceIndex, d: usize) -> Result<Self>;

    /// Compress item x to b bits per spectral coefficient.
    /// Returns arithmetic-coded byte vector.
    pub fn encode(&self, x: &[f32], b: usize) -> Vec<u8>;

    /// Reconstruct from compressed bytes.
    pub fn decode(&self, bytes: &[u8]) -> Vec<f32>;

    /// Theoretical compression ratio at b bits/coefficient.
    /// ratio = (F * 32) / (d * b + entropy_overhead)
    pub fn compression_ratio(&self, b: usize) -> f64;

    /// Reconstruction RMSE on a held-out sample.
    pub fn eval_rmse(&self, sample: &[Vec<f32>], b: usize) -> f32;
}
```
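As a worked instance of the ratio formula in the `compression_ratio` doc comment, here is a free-function sketch. The parameter `overhead_bits` is a hypothetical stand-in for `entropy_overhead`, whose exact definition this issue does not fix.

```rust
/// Theoretical compression ratio for an F-dimensional f32 embedding stored
/// as d quantised coefficients of b bits each:
///     ratio = (F * 32) / (d * b + overhead_bits)
/// `overhead_bits` is an assumed per-item overhead (headers, coder state).
/// Returns infinity if the coded size is zero.
pub fn compression_ratio(f: usize, d: usize, b: usize, overhead_bits: usize) -> f64 {
    let raw_bits = (f * 32) as f64;
    let coded_bits = (d * b + overhead_bits) as f64;
    raw_bits / coded_bits
}
```

With `F = 384` and zero overhead, both `d=64, b=8` and `d=128, b=4` cost 512 bits per item, giving a ratio of 24×. Any per-item overhead reduces that figure.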

---

## Forward and inverse transform

**Forward (encode):**

```
x_tilde = Phi^T x          // [d] spectral coefficients
x_tilde_q = quantise(x_tilde, step_sizes, b)   // Lloyd-Max
bytes = arithmetic_encode(x_tilde_q, entropy_model)
```

**Inverse (decode):**

```
x_tilde_q = arithmetic_decode(bytes, entropy_model)
x_hat = Phi * x_tilde_q    // [F] reconstructed embedding
```

Reconstruction error: `||x - x_hat||^2 = Q(x, x_hat)` — expressible as a quadrance, and available as a score.
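The transform pair and the quadrance score can be sketched without quantisation or entropy coding. This is a minimal illustration using plain `Vec<f32>` rather than `Array2<f32>`; the function names and the "one `Vec` per mode" layout are assumptions made to keep the example dependency-free.

```rust
/// Forward transform x_tilde = Phi^T x.
/// `modes` holds the d basis vectors (columns of Phi), each of length F.
pub fn forward(modes: &[Vec<f32>], x: &[f32]) -> Vec<f32> {
    modes
        .iter()
        .map(|phi| phi.iter().zip(x).map(|(p, v)| p * v).sum::<f32>())
        .collect()
}

/// Inverse transform x_hat = Phi * x_tilde, where `f` is the embedding length F.
pub fn inverse(modes: &[Vec<f32>], coeffs: &[f32], f: usize) -> Vec<f32> {
    let mut x_hat = vec![0.0f32; f];
    for (phi, c) in modes.iter().zip(coeffs) {
        for (out, p) in x_hat.iter_mut().zip(phi) {
            *out += c * p;
        }
    }
    x_hat
}

/// Quadrance Q(a, b) = ||a - b||^2 (squared Euclidean distance).
pub fn quadrance(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(u, v)| (u - v) * (u - v)).sum()
}
```

With the first two standard basis vectors of R^3 as modes, `x = [1, 2, 3]` maps to coefficients `[1, 2]`, reconstructs to `[1, 2, 0]`, and has quadrance 9: exactly the energy in the discarded third direction.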

---

## Lloyd–Max quantisation per mode

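Each spectral coefficient `x_tilde_i` is modelled as Gaussian with variance `sigma_i^2 = mode_variance[i]`. As a first approximation to the Lloyd–Max quantiser, use a uniform step size for `b` bits that spans `±sqrt(3) * sigma_i` (the support of a uniform distribution with variance `sigma_i^2`) with `2^b` levels: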

```
delta_i = 2 * sigma_i * sqrt(3) / (2^b - 1)   // uniform approximation
```

For non-Gaussian modes, refine the reconstruction levels with 5 iterations of the Lloyd–Max algorithm on a calibration sample (1 % of the corpus).
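A minimal sketch of both steps follows: the uniform step size from the formula above, a clamped uniform quantiser built on it, and a one-dimensional Lloyd–Max refinement. All function names are hypothetical, and the sketch assumes `1 <= b <= 16`; it is not the planned `quantise.rs` implementation.

```rust
/// Uniform-approximation step size: delta = 2 * sigma * sqrt(3) / (2^b - 1).
/// Assumes 1 <= b <= 16 (b = 0 would divide by zero).
pub fn step_size(sigma: f32, b: u32) -> f32 {
    let levels = (1u64 << b) as f32;
    2.0 * sigma * 3.0f32.sqrt() / (levels - 1.0)
}

/// Map x to an index in [0, 2^b - 1]; out-of-range values are clamped.
pub fn quantise(x: f32, delta: f32, b: u32) -> u32 {
    let max_index = ((1u64 << b) - 1) as f32;
    let idx = (x / delta + max_index / 2.0).round();
    idx.clamp(0.0, max_index) as u32
}

/// Reconstruction value for a quantiser index.
pub fn dequantise(idx: u32, delta: f32, b: u32) -> f32 {
    let max_index = ((1u64 << b) - 1) as f32;
    (idx as f32 - max_index / 2.0) * delta
}

/// One-dimensional Lloyd-Max refinement: alternate nearest-level assignment
/// and centroid update for `iters` rounds. Empty cells keep their old level.
pub fn lloyd_max(samples: &[f32], mut levels: Vec<f32>, iters: usize) -> Vec<f32> {
    for _ in 0..iters {
        let mut sum = vec![0.0f64; levels.len()];
        let mut count = vec![0usize; levels.len()];
        for &s in samples {
            // Find the nearest reconstruction level for this sample.
            let mut best = 0;
            for (k, &l) in levels.iter().enumerate() {
                if (s - l).abs() < (s - levels[best]).abs() {
                    best = k;
                }
            }
            sum[best] += s as f64;
            count[best] += 1;
        }
        // Move each level to the centroid of the samples assigned to it.
        for k in 0..levels.len() {
            if count[k] > 0 {
                levels[k] = (sum[k] / count[k] as f64) as f32;
            }
        }
    }
    levels
}
```

For in-range inputs the round-trip error of `quantise`/`dequantise` is at most `delta / 2`. `lloyd_max` can be seeded with the uniform levels and run for the 5 iterations mentioned above.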

---

## Entropy coding

Use the epiplexity `H_T` estimate as the per-item entropy budget:

- Items with low `H_T` (structurally regular, high epiplexity compression) get a tighter arithmetic code.
- Items with high `H_T` (high randomness) get more bits allocated.

This is the direct computational realisation of the epiplexity MDL criterion applied to compression.
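This issue does not define how `H_T` is computed, so the sketch below substitutes the empirical Shannon entropy of a quantised symbol stream as a stand-in. It shows only the general relationship an arithmetic coder exploits: a lower-entropy stream needs fewer coded bits. Both function names are hypothetical.

```rust
use std::collections::HashMap;

/// Empirical Shannon entropy, in bits per symbol, of a quantised stream.
/// Stand-in for the H_T estimate, which this sketch does not implement.
pub fn empirical_entropy_bits(symbols: &[u32]) -> f64 {
    if symbols.is_empty() {
        return 0.0;
    }
    let mut counts: HashMap<u32, usize> = HashMap::new();
    for &s in symbols {
        *counts.entry(s).or_insert(0) += 1;
    }
    let n = symbols.len() as f64;
    counts
        .values()
        .map(|&c| {
            let p = c as f64 / n;
            -p * p.log2()
        })
        .sum()
}

/// Approximate size in bytes under an ideal entropy coder matched to the
/// stream's empirical distribution (ignores model and termination overhead).
pub fn ideal_coded_bytes(symbols: &[u32]) -> usize {
    let bits = empirical_entropy_bits(symbols) * symbols.len() as f64;
    (bits / 8.0).ceil() as usize
}
```

A stream with a single repeated symbol has zero entropy and needs essentially no payload, while 16 symbols split evenly between two values need about 16 bits (2 bytes).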

---

## `SpectralQuantProjector`

Slots into `IndexScorePhase` alongside `TauModeScore`:

```rust
pub struct SpectralQuantProjector {
    pub codec: Arc<SpectralQuantCodec>,
    pub b: usize,  // bits per coefficient for reconstruction error scoring
}

impl IndexScoreProjector for SpectralQuantProjector {
    /// Returns Q(x, x_hat) = ||x - decode(encode(x, b))||^2
    /// High score = item is structurally anomalous (hard to compress spectrally).
    fn project(&self, item: &EmbeddingVector, ctx: &IndexContext) -> f32;
    fn name(&self) -> &'static str { "spectral_quant_error" }
}
```
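Because entropy coding is lossless, the score can be computed from the transform and quantisation steps alone. The sketch below does that with a simplified, hypothetical trait `ScoreProjector` in place of `IndexScoreProjector` (whose `EmbeddingVector` and `IndexContext` types are not shown in this issue). It also rounds to the nearest multiple of the step size without clamping to a `b`-bit range, which is a simplifying assumption.

```rust
/// Hypothetical, simplified stand-in for the `IndexScoreProjector` trait.
pub trait ScoreProjector {
    fn project(&self, item: &[f32]) -> f32;
    fn name(&self) -> &'static str;
}

/// Illustrative projector: scores an item by its quantised
/// spectral reconstruction error.
pub struct QuantErrorProjector {
    /// d orthonormal modes, each of length F (columns of Phi).
    pub modes: Vec<Vec<f32>>,
    /// Per-mode quantisation step sizes, length d.
    pub step_sizes: Vec<f32>,
}

impl ScoreProjector for QuantErrorProjector {
    /// Returns ||x - x_hat||^2 where x_hat is rebuilt from quantised coefficients.
    fn project(&self, item: &[f32]) -> f32 {
        let mut x_hat = vec![0.0f32; item.len()];
        for (phi, &delta) in self.modes.iter().zip(&self.step_sizes) {
            // Spectral coefficient for this mode.
            let c: f32 = phi.iter().zip(item).map(|(p, v)| p * v).sum();
            // Round to the nearest multiple of delta (no range clamping here).
            let c_q = (c / delta).round() * delta;
            for (out, p) in x_hat.iter_mut().zip(phi) {
                *out += c_q * p;
            }
        }
        item.iter().zip(&x_hat).map(|(u, v)| (u - v) * (u - v)).sum()
    }

    fn name(&self) -> &'static str {
        "spectral_quant_error"
    }
}
```

An item lying in the span of the retained modes with coefficients on the quantisation grid scores 0. Any component outside that span contributes its full squared magnitude to the score.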

---

## Target benchmarks (CVE corpus, N = 313 841, F = 384)

| Config | Compression ratio | Reconstruction RMSE | Notes |
|---|---|---|---|
| f32 baseline (floating-point wiring) | < 6× | — | existing |
| `QuadranceGaussian`, d=64, b=8 | ≥ 8× | < 0.01 | target |
| `QuadranceGaussian`, d=128, b=4 | ≥ 12× | < 0.05 | target |
| `QuadranceRational`, d=64, b=8 | ≥ 8× | < 0.01, bit-exact | BioChain target |

---

## Tests

- [ ] **Round-trip:** `decode(encode(x, b))` has RMSE < `eval_rmse` threshold for all `b ∈ {4, 8, 16}`.
- [ ] **Monotonicity:** RMSE strictly decreases as `b` increases.
- [ ] **BioChain bit-exact:** `QuadranceRational` encode → serialise → deserialise → decode produces bit-identical output on x86-64 and Apple Silicon.
- [ ] **Score distribution:** `SpectralQuantProjector` scores on CVE are positively correlated with `TauModeScore` λ (Spearman ρ > 0.3).
- [ ] **`compression_ratio` consistency:** `compression_ratio(32)` ≈ 38.4× (matching observed epiplexity ratio on CVE).
- [ ] **`from_index` no-op:** Building `SpectralQuantCodec` from an existing index adds < 2 s on CVE (reuses already-computed eigenvectors).
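
For the score-distribution check, Spearman ρ can be computed as the Pearson correlation of average ranks. The helper below is a dependency-free sketch with hypothetical names; a test in `tests/` could equally use an existing statistics crate.

```rust
/// Average ranks (1-based); tied values share the mean of their positions.
fn ranks(v: &[f32]) -> Vec<f64> {
    let mut idx: Vec<usize> = (0..v.len()).collect();
    idx.sort_by(|&a, &b| v[a].total_cmp(&v[b]));
    let mut r = vec![0.0f64; v.len()];
    let mut i = 0;
    while i < idx.len() {
        // Extend j over the run of values tied with position i.
        let mut j = i;
        while j + 1 < idx.len() && v[idx[j + 1]] == v[idx[i]] {
            j += 1;
        }
        let avg = (i + j) as f64 / 2.0 + 1.0;
        for k in i..=j {
            r[idx[k]] = avg;
        }
        i = j + 1;
    }
    r
}

/// Spearman rho: Pearson correlation of the two rank vectors.
/// Returns NaN if either input is constant or empty.
pub fn spearman(a: &[f32], b: &[f32]) -> f64 {
    assert_eq!(a.len(), b.len());
    let (ra, rb) = (ranks(a), ranks(b));
    let n = ra.len() as f64;
    let ma = ra.iter().sum::<f64>() / n;
    let mb = rb.iter().sum::<f64>() / n;
    let (mut cov, mut va, mut vb) = (0.0f64, 0.0f64, 0.0f64);
    for (x, y) in ra.iter().zip(&rb) {
        cov += (x - ma) * (y - mb);
        va += (x - ma).powi(2);
        vb += (y - mb).powi(2);
    }
    cov / (va * vb).sqrt()
}
```

The test would then assert `spearman(&quant_scores, &tau_lambdas) > 0.3`, where both vectors are assumed to be collected over the same CVE items in the same order.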

---

## Acceptance criteria

- [ ] `surfface-codec` crate compiles and passes all tests.
- [ ] `SpectralQuantCodec::from_index()` operational on CVE with `d=64`.
- [ ] Benchmark targets met (see table above).
- [ ] `SpectralQuantProjector` registered in `IndexScorePhase`.
- [ ] BioChain bit-exact CI job passing (x86-64 + Apple Silicon).
- [ ] Codec serialises to `<index_name>.sqcodec.bin` alongside the main index.
- [ ] All previous phase regression tests still pass.

