Dynamical Models of AI Governability (blog post + basin explorer)#154

Draft
davidoj wants to merge 71 commits into master from dynamical_models

@davidoj davidoj commented Jun 11, 2026

Contributor

What this is

A new blog post — Dynamical Models of AI Governability — plus an interactive companion app (the cooperativeness basin explorer, served at /basin-explorer/).

The post applies a small ODE model to the question of whether the AI workforce that builds future AI ends up mostly cooperative or mostly uncooperative. It asks where the basin boundary between a managed-endemic outcome and takeover lies; what current misbehaviour/suppression evidence says about which side we're on (two named calibrations, Broad and Strict, currently land on opposite sides); whether there is a fire alarm on the bad path; and what observables would tell us we're on the good one. The app exposes the model's full parameter space, with presets matching the post's calibrations.

⚠️ Draft status — under review

I've used a lot of AI assistance in developing the model and writing the post, and I'm working through all the items I need to review.

Also included (affect the live site — could be cherry-picked separately)

  • 29ace12 fixes a theme bug (classList += clobbering .post-content styling on any mathjax post with top-level display math — paragraph spacing was collapsing to a wall of text).
  • f3c3337 removes the polyfill.io script from the mathjax partial (domain was compromised in a 2024 supply-chain attack; the ES6 polyfills are unnecessary).

🤖 Generated with Claude Code

davidoj and others added 30 commits June 10, 2026 11:52
Copied from claude-scratch/basin-explorer (canonical model implementation),
excluding node_modules and dist. Original left in place for David to retire.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Vite base /basin-explorer/, outDir static/basin-explorer; Hugo serves it
as-is with no new site dependencies. Built assets committed.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Reads query string on load (clamped to slider ranges), writes back via
history.replaceState whenever state changes; only non-default values encoded.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
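
The URL-state round trip described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the app's code: the `SLIDERS` table, its ranges and defaults, and the function names are all hypothetical. Only the behaviour (clamp on read, encode non-defaults only, write back via `history.replaceState`) comes from the commit message.

```javascript
// Hypothetical slider specs; names, ranges and defaults are illustrative only.
const SLIDERS = {
  l: { min: 0, max: 2, def: 0.4 },
  delta: { min: 0, max: 1, def: 0.7 },
  k_cu: { min: 0, max: 1, def: 0.05 },
};

const clamp = (x, lo, hi) => Math.min(hi, Math.max(lo, x));

// Parse a query string into a full state object, clamping to slider ranges
// and falling back to the default for missing, empty or non-numeric values.
function decodeState(search) {
  const params = new URLSearchParams(search);
  const state = {};
  for (const [key, { min, max, def }] of Object.entries(SLIDERS)) {
    const raw = params.get(key);
    const value = raw === null || raw === "" ? NaN : Number(raw);
    state[key] = Number.isFinite(value) ? clamp(value, min, max) : def;
  }
  return state;
}

// Serialise only the values that differ from their defaults.
function encodeState(state) {
  const params = new URLSearchParams();
  for (const [key, { def }] of Object.entries(SLIDERS)) {
    if (state[key] !== def) params.set(key, String(state[key]));
  }
  return params.toString();
}

// Browser-side write-back; guarded so the sketch also runs outside a browser.
function syncUrl(state) {
  if (typeof window === "undefined") return;
  const qs = encodeState(state);
  window.history.replaceState(null, "", qs ? `?${qs}` : window.location.pathname);
}
```

Encoding only non-default values keeps shared links short and means a later change of defaults does not silently alter what an old minimal link shows, at the cost of links being default-relative.
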
alpha = 1 + l - k_uu was labelled as the k_cu->0 reduction of Condition 1
but omitted the O*(0) factor on l. Badge now shows l*O*(0) vs k_uu-1,
matching the actual boundary-stability condition.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Zero level set of a swappable margin function (BASIN_BOUNDARY.margin):
delta=1 condition l >= k_cu(a+c) + b(1+c) + 2*sqrt(k_cu*b*(a+c)(1+c)) for
b = k_cu+k_uu-1 > 0 (reduces to l >= k_cu(sqrt(a+c)+sqrt(1+c))^2 at k_uu=1),
basin always exists for b <= 0. Verified against numeric root scan (~5e-4).
Swap margin() for the delta-general condition in Phase B.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
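
As a rough illustration of the delta=1 margin in the commit above, the sketch below evaluates the stated inequality and checks the stated reduction at k_uu=1. The sign convention (margin >= 0 means a basin exists), the `Infinity` return for b <= 0, and the function names are assumptions; the symbols l, k_cu, k_uu, a and c are taken from the commit message without further definition.

```javascript
// delta = 1 basin margin, following the inequality quoted in the commit:
//   l >= k_cu(a+c) + b(1+c) + 2*sqrt(k_cu*b*(a+c)(1+c)),  b = k_cu + k_uu - 1 > 0.
// Assumed convention: margin >= 0 means the basin exists.
function marginDelta1({ l, k_cu, k_uu, a, c }) {
  const b = k_cu + k_uu - 1;
  if (b <= 0) return Infinity; // the commit states the basin always exists here
  const threshold =
    k_cu * (a + c) + b * (1 + c) + 2 * Math.sqrt(k_cu * b * (a + c) * (1 + c));
  return l - threshold;
}

// Reduced threshold at k_uu = 1, where b = k_cu and the expression
// collapses to a perfect square.
function thresholdAtKuu1({ k_cu, a, c }) {
  return k_cu * (Math.sqrt(a + c) + Math.sqrt(1 + c)) ** 2;
}
```

With k_uu = 1 the two forms should agree to rounding error for any positive k_cu, a and c, which is a cheap regression check to run alongside a numeric root scan.
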
URL state, horizon slider, corrected basin badge, analytic boundary overlay.
Verified in headless Chromium against the standalone static build (15/15
checks; evidence in _scratch/review/app-test-evidence/).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
… presets

- delta-general dynamics: F_c gains (1-delta)*l*O*Q; G = (1-Q)+eta+(k_uu-delta*l*O)Q
- fixed-point scan and outcome-map boundary use the delta-general quadratic
  (A_delta = b(a+c) - (1-delta)l) with closed-form threshold l* = min(l_A, max(P, l+))
- new delta slider (default 0.7, flagged as fresh filtering-fraction estimate)
- l slider updated to corrected calibration (default 0.4; Petri trend identifies l*O)
- Broad / Strict / AI-2027 preset buttons; k_hu step 0.005 so Strict is representable
- docs panel rewritten for delta (Condition 2 = A_delta > 0; C1 no longer necessary)
- regression: delta=1 reproduces pre-change app bit-for-bit (22/22 checks)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…endix

- F_c gains the redirected flow (1-delta)*l*O*q_u; F = q_c+q_h+(k_uu-delta*l*O)q_u
- destruction-vs-redirection prose and mechanism table (filtering/retraining/control)
- Condition 1 qualified (necessary only at delta=1); Condition 2 delta-general
  (k_uu+k_cu-1 > (1-delta)l/(a+c), exactly A_delta > 0)
- headline closed-form basin threshold: l*O* >= 4*delta*k_cu (delta>=1/2),
  k_cu/(1-delta) (delta<=1/2); four-to-one rule at delta=1
- long-run fixed-points appendix reworked to the delta-general quadratic
  (A_delta = b(a+c)-(1-delta)l); endpoint-stability intuition updated
- delta estimation passage (filtering-fraction heuristic, central 0.7, range 0.3-1.0)
  and summary-table row; AI 2027 1/16th passage restated delta-generally (2^(-4*delta^2))
- new idealization bullets (per-action vs pool suppression, dilution terms,
  constant-l stabilisation, pools as persistent behaviours, redirection at par)
- AI control limitation note with class-level-discovery signpost for O

Derivations verified in _scratch/review/scripts/verify-delta-fixed-points.js (12/12).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
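
The two-branch headline threshold in the commit above is easy to misread, so here is a small sketch of it as stated at this point in the history (later commits in this PR replace it under gating). The function name is hypothetical.

```javascript
// Minimum suppression product l*O* for a cooperative basin, as stated in the
// commit: 4*delta*k_cu for delta >= 1/2, and k_cu/(1-delta) for delta <= 1/2.
// The two branches meet at delta = 1/2, where both give 2*k_cu.
function requiredProduct(delta, k_cu) {
  return delta >= 0.5 ? 4 * delta * k_cu : k_cu / (1 - delta);
}
```

At delta = 1 this is the four-to-one rule: with k_cu = 0.05 the required product is 0.2, which is exactly l*O* = 0.4 * 0.5 for the shared structural parameters quoted elsewhere in this PR.
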
- conversion now reads l*O(0) ~ T_auto/T_half; central l = 0.4 (was 0.2),
  working range 0.2-1.0 (previous 0.1-0.5 scaled by 1/O(0) = 2)
- both identification caveats added: net-of-inflow biases down, falling-k_hu
  confound biases up; double-duty note on the observed-misbehaviour evidence
- summary-table row updated

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
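
The recalibration in the commit above is a single division: the trend identifies the product l*O(0), so l is recovered by dividing by O(0). A sketch, with a hypothetical function name:

```javascript
// The trend identifies the product l*O(0); dividing by O(0) recovers l.
function fixRateFromProduct(product, O0) {
  return product / O0;
}

// Previous range and central value, read as products and rescaled at O(0) = 0.5.
const rescaled = [0.1, 0.2, 0.5].map((p) => fixRateFromProduct(p, 0.5));
console.log(rescaled); // [0.2, 0.4, 1]
```
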
- up-front double-duty statement: observed-misbehaviour evidence sets q_u0 AND
  k_hu (they converge during the human era); only the k_hu reading does work
- Broad = behavioural rates at face value (k's = q_u0 = 0.05); Strict =
  reproduction-relevant (ii') rates ~10x lower (0.005); all structural
  parameters shared (l = 0.4, delta = 0.7, k_uu = 1, O* = O(0) = 0.5)
- parameter table and link to app presets

Verified in _scratch/review/scripts/calibration-verdicts.js (12/12 + 8/8).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…rdicts

- 'uncooperative dominant' single verdict (computed at the mis-identified
  l = 0.2, delta = 1) replaced by the verdict table over Broad/Strict x delta
- Broad: genuinely marginal (monostable q_u* ~ 0.32 at delta = 0.7; exact
  knife edge at delta = 1); Strict: cooperative dominant robustly (q_u* ~ 2.5%)
- headline: the decisive uncertainty is how much observed misbehaviour is
  reproduction-relevant, not any structural parameter
- David's authority caveats and gradual-takeover prose retained

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
- lagged-but-unconfounded measurement: re-auditing archived systems with
  current tools reaches near-ceiling observability (0.86-0.97 at 2-3 sigma lag
  vs <= 0.67 contemporaneous); retain-development-logs recommendation
- the lagged audit directly adjudicates Broad vs Strict (~5% vs ~0.5%
  reproduction-relevant share of archived dev labour)
- thresholds: good outcomes are endemic management, not eradication; the model
  locates plateaus and margins but cannot set safe levels; decision sketch
- per-calibration falsifiable predictions from RK4 trajectories: observed
  misbehaviour rises on BOTH good paths (replaces the raw note's 'probably not
  self-stabilizing' conjecture, which the central calibrations contradict);
  shared near-term observability hump prediction
- confidently-good vs not-bad criteria; Strict passes, Broad does not
- cross-link to EleutherAI reward-hacking indicators post as a k_hu/l indicator

Trajectories: _scratch/review/scripts/good-path-trajectories.js.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
- hook: live AI-builds-AI evidence (Codex builds Codex, Claude builds Claude,
  automated alignment researchers) moved from the estimation section to the intro
- model section: explanation in words first (q_u' and F_u remain inline),
  full equation block demoted to a compact 'Model reference' subsection
- 'Long-run model behaviour' retitled as the basin-picture section (already
  carries the headline inequality)
- 'Implications and illustrative scenarios' wrapper removed; 'The default path'
  promoted to a top-level section
- AI 2027 appendix promoted to a body section after the default path, with the
  'similar concerns mask different risk models' point promoted to its thesis
- calibration compressed: load-bearing judgements and named-calibrations table
  stay in body; source-by-source literature detail moved to a new
  'calibration evidence in detail' appendix
- terminology note (uncooperative vs misaligned) and prominent app links added

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
- duplicated intro clause removed; citation placeholders replaced with real
  links (AI Futures model, Eth & Davidson software intelligence explosion,
  Davidson-Epoch interactive takeoff model)
- related-work paragraph (Davidson takeoff, Christiano 'What failure looks
  like' Pt 1 as the default path informally formalised, Hubinger deceptive
  alignment as the bistable regime)
- typos: possibilities, substantial, robust, elicitation, transition,
  aggressive, representative, technology, 'tendency to instil', c_M
- AI 2027 list renumbered (6 -> 8 skip removed); RepliBench paren closed and
  rephrased (negative results are data, not absence of data)
- '## Appendices' header added; all in-text anchors verified to resolve;
  explicit link to the fixed-points appendix from the basin section
- front-matter description filled (draft: true and date retained)
- AI usage note updated to reflect the delta/calibration/app workflow honestly

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
- figures_ai2027.py: delta-general model core (F_c redirected flow, G, G0,
  delta-general r-quadratic classifier); defaults l = 0.4, delta = 0.7
- validation extended: delta=1/l=0.2 reproduces the pre-delta JS ground truth
  exactly, and delta-general verdicts match calibration-verdicts.js (Broad
  q_u* = 0.323, Strict 0.0255, AI 2027 escape, threshold l* = 0.28)
- fig 1 regenerated at l = 0.4, delta = 0.7 (k_cu = 0.9 trajectory)
- fig 2 boundary now k_cu = l*O*/(4*delta); AI 2027 annotation updated
  (needs l ~ 2.5, ~6x central, even at O* = 0.99)
- post captions updated to the delta-general boundary formula and parameters

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
- enable goldmark passthrough for block \[ ... \] delimiters only (existing
  posts use $$ + escaped-underscore style and are untouched; full-site md5
  diff confirms only this post and reward-hacking-indicators change)
- reward-hacking-indicators: two escaped \[..\] bracket literals would now
  pass through to MathJax; switched to plain brackets — rendered page verified
  byte-identical to the pre-change build
- join bare '=' / '-' lines inside display blocks (goldmark setext headings
  were swallowing ~25 equations and polluting the ToC)
- inline math: ^* -> ^{\ast} (38 sites) so markdown emphasis cannot eat
  asterisks; set notation via \lbrace/\rbrace
- verified in headless Chromium: 591 MathJax containers, 0 typeset errors,
  0 console errors; figures load; ToC and anchors clean

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
All seven confirmed text discrepancies from the adversarial verification
report (_scratch/review/verification-report.md rows 1-7):

1. Strict good-path bullet: Q crosses 1% at sigma=3.94 and reaches ~1.4%
   by sigma=6; reworded "stays below about 1% (sigma 4-6)".
2. Retro-observability ranges scoped to sigma <~ 7; on Broad lag-2 drifts
   to ~0.83 by sigma=12 (asymptote ~0.79) - "near-ceiling across the whole
   series" qualified to Strict.
3. Verdict table Strict delta=0.7 attractor cell 0.026 -> 0.025.
4. "q_u* ~ 0.61 for delta < 0.5" attached to delta=0.3; span 0.53-0.88
   stated.
5. Summary table q_u0 central aligned to ~5% (was ~5-15%).
6. "Observability roughly flat after an early hump" corrected: settles
   ~0.49 on Strict, erodes 0.66 -> 0.34 on Broad; conclusion (observed
   rate climbs) retained.
7. 1/16th-progress and 2^(-4 delta^2) passages reframed explicitly as a
   per-doubling cost heuristic (the literal model exits its validity
   envelope, F < 0, in that regime).

Conditions C1/C2 text (report item 12) was verified already delta-general
in the current post; no change needed.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Expands the bare per-calibration verdicts into a complete default-path
exploration, parallel in depth to the AI 2027 section:

- Act structure: human era (q_u pinned at the k_hu anchor; initial
  conditions forgotten, citing the q_u0 0.1%-99% sweep), handoff at
  sigma = ln(eta0) ~ 1.6 (parity is not an event in the model), and
  per-calibration post-handoff dynamics with sigma AND calendar
  landmarks (Broad: 10% @ sigma~4/3y, 20% @ ~10/7y, attractor 0.32 over
  decades with O eroding 0.66 -> 0.34; Strict: 1% @ 3.94/2.8y, settling
  at the endemic 2.5% with O ~ 0.49).
- Broad framed as genuinely marginal, never a clean verdict: exact
  delta=1 knife edge (l O* = 0.2 = 4 k_cu; l = 0.396 vs 0.404 flips the
  asymptotic outcome while observables differ by <= 0.15 pp for ~40 y),
  plus a newly flagged proximity: the takeover endpoint stabilises at
  exactly delta = 0.75, just above the central 0.7.
- Fire-alarm question answered from runs: on the bad path (Broad,
  l = 0.2, delta = 0.7) no observable is discontinuous - observed rate
  peaks at 12.7% near sigma~28 (~20 y) then declines, max movement
  ~2 pp/year, and is within 1.05-1.2x of the good path through sigma=8.
  Far-from-boundary regimes (AI 2027) do self-announce.
- Rescue analysis by attractor LOCATION (basin existence is not a
  good-outcome proxy at delta<1), provisional good threshold
  q_u* <= 0.1: k_cu <= 0.019 or l >= 1.08 (both 2.7x moves) rescue
  alone; O* = 1 tops out at q_u* ~ 0.14 and delta = 0 at 0.25 - mapped
  to alignment-training / control-correction / interpretability /
  mechanism-mix intervention classes.
- Observed-vs-true divergence noted as an emergent, unforced result
  (not the flagship), contrasted with AI 2027's discrete transition.

Figures generated by _scratch/review/scripts/default-path-figs.py,
which self-validates against the independent verification numbers
(11/11) before drawing. Existing verdict table/text absorbed, not
duplicated. Hugo build verified clean.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Delete model.mjs, analytic.mjs, scenarios.mjs, transients.mjs and
explore_ai2027.mjs: all five predate the destruction-fraction (delta)
extension (no delta, default l=0.2, old quadratic), and analytic.mjs
asserts a delta=1-only claim ("regime boundary independent of l") that
is false for delta<1. They were not part of the vite build but shipped
in the repo and contradicted the published model. Deleting rather than
updating: they were one-off exploration scripts fully superseded by the
in-app engine (src/BasinExplorer.jsx, canonical) and the self-validating
delta-general port in figures_ai2027.py.

Add a README noting the canonical math engine and the removal, and fix
figures_ai2027.py docstring references to the deleted files (its
validation values are pinned in-file; it still passes PORT VALID and
reproduces both AI 2027 figures unchanged).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Implement the David-approved production-gated model (derivation audit
Part A, G1-G12): observability-gated suppression lO now intercepts the
leakage inflow into F_u as well as removing from the q_u stock, with the
same delta split of the full intercepted flow lO*L.

- makeDeriv/computeG/calibratedRates: gated F_u, F_c, G (leakage is no
  longer a pure transfer; destroyed intercepted leakage subtracts from G,
  including in the calibration-time G0)
- findSteadyStatesR: gated fixed-point function
  f(r) = k_cu + b*r - lO*(r)(k_cu+r)(1+(1-delta)r)
- BASIN_BOUNDARY: gated lstar (u/V/N discriminant form, three-branch at
  k_uu=1, a=1); new ERADICATION_BOUNDARY overlay (lO*(0) = 1) on the
  outcome map; exact-zero crossings now traced
- classifyBasin/badge: new eradication regime (q_u -> 0 reachable when
  lO*(0) >= 1), incl. bistable eradication/takeover and the two-root
  eradication-basin + high-attractor case
- presets: identity-based leakage defaults (Broad trend-adjusted
  k = 0.0407, Strict steady-state k = 0.00542, AI 2027 k_hu follows
  Broad); k-slider steps 0.005 -> 0.0001; app defaults stay = Broad
- URL schema versioned (v=2); unversioned pre-gating links load defaults
- docs panel + hover cards: gated equations, workaround/pipeline
  rationale, gated C1, C2-unchanged note, gated quadratic and threshold,
  eradication bullet, q_h <-> eta notation mapping

Verified: _scratch/review/scripts/gating-app-regression.js (38/38; the
gate-zeroed source reproduces the previous model bit-for-bit, and presets
reproduce the derivation fixed points to 3 decimals) and
gating-browser-drive.mjs (27/27 in headless Chromium); report in
_scratch/review/app-test-report-gating.md.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
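
A fixed-point scan of the kind `findSteadyStatesR` performs can be sketched as a sign-change search plus bisection. This is not the app's implementation: the grid bounds, the function names, and in particular the constant O*(r) used in the usage note are placeholders, since the commit does not give the form of O*(r). Only the expression for f(r) and b = k_cu + k_uu - 1 come from the commit messages.

```javascript
// Gated fixed-point function from the commit message. O*(r) is passed in as
// a function because its form is not specified here.
function makeF({ k_cu, k_uu, l, delta, Ostar }) {
  const b = k_cu + k_uu - 1;
  return (r) => k_cu + b * r - l * Ostar(r) * (k_cu + r) * (1 + (1 - delta) * r);
}

// Refine a bracketed sign change by bisection.
function bisect(f, lo, hi, iters = 80) {
  let fLo = f(lo);
  for (let i = 0; i < iters; i++) {
    const mid = 0.5 * (lo + hi);
    const fMid = f(mid);
    if (fMid === 0) return mid;
    if (fLo * fMid < 0) {
      hi = mid;
    } else {
      lo = mid;
      fLo = fMid;
    }
  }
  return 0.5 * (lo + hi);
}

// Scan a log-spaced grid for sign changes and refine each bracket.
// Roots of even multiplicity (tangencies) are not detected by this scan.
function findRoots(f, rMin = 1e-6, rMax = 1e3, n = 400) {
  const roots = [];
  const step = Math.log(rMax / rMin) / (n - 1);
  let lo = rMin;
  let fLo = f(lo);
  for (let i = 1; i < n; i++) {
    const hi = rMin * Math.exp(i * step);
    const fHi = f(hi);
    if (fLo === 0) roots.push(lo);
    else if (fLo * fHi < 0) roots.push(bisect(f, lo, hi));
    lo = hi;
    fLo = fHi;
  }
  return roots;
}
```

For example, `findRoots(makeF({ k_cu: 0.0407, k_uu: 1, l: 0.4, delta: 0.7, Ostar: () => 0.5 }))` brackets a single root under the constant-observability placeholder. That value should not be compared with the post's attractors, which use the model's own O*(r).
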
vite build of the gated BasinExplorer.jsx; bundle index-BTR035jY.js ->
index-DTN_euC9.js. Browser-verified against this exact build (27/27,
_scratch/review/app-test-report-gating.md).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Pre-existing working-tree edits found at session start: drops the
l = 0.2 sensitivity paragraph from the default-path verdicts, softens
the retro-audit prose, and rewords the lagged-development-data
paragraph. Committed as-is before the production-gating edit queue.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Suppression now intercepts the leakage inflow as well as self-reproduction
(F_u = (1-lO)(k_cu q_c + k_hu q_h) + (k_uu - lO)q_u; F_c redirects (1-d)lO L;
F = q_c + q_h + k_uu q_u - d lO L). Changes, per derivation-audit Part A:

- dynamics-in-words and model reference: gated equations, L defined, beta=1
  prevention-efficacy extension noted, leakage no longer cancels in F
- idealizations: gating motivation cross-ref, stock-vs-flow asymmetry
- basin picture: gated Condition 1 (bracketed form, G8), C2 unchanged note,
  eradication regime bullet (G9), three-branch closed-form threshold (G6);
  delta=1 rule becomes 4 k_cu (1-k_cu)
- default path: identity-calibrated defaults (Broad k=0.0407 trend-adjusted,
  Strict k=0.00542 steady-state); Broad dips 5%->4.3% then drifts to a fifth
  (q*=0.205, margin 1.79x); Strict plateau 2.2%; knife-edge coincidence
  retired, boundary-adjacent framing rebuilt; fire alarm moved to the gated
  bad path (50% at sigma~53) and new knife pair 0.222/0.226; rescue levers
  restated - monitoring alone can now rescue (O*>=0.80, product 0.32)
- AI 2027: gated B=0-branch arithmetic (lO* ~ 1.4-1.8, half ungated),
  1/16th and 2^{-4 delta^2} heuristics replaced, eradication-regime caveat
- good-path: eradication scope condition on 'management, not eradication'
- appendix: gated g(r), g'(0), convexity condition, quadratic B/C, C<=0
  case, u/V/N threshold, endpoint and k_cu=0 gating-unchanged notes

All numbers from verify-gated-fixed-points.js (332 checks),
gated-calibrations.js and gated-landmarks.js; prose claims headers in
_scratch/review/drafts/gating-post-prose.md.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
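
The gated production accounting quoted in the commit above can be checked numerically. In the sketch below, the expressions for F_u and F come from the commit message; the definition L = k_cu q_c + k_hu q_h + q_u and the base form of F_c are inferred so that F_u + F_c = F, and should be read as assumptions rather than as the post's definitions. The stock values in the usage note are hypothetical.

```javascript
// Gated production flows (d is the destruction fraction delta).
function gatedFlows({ q_c, q_h, q_u, k_cu, k_hu, k_uu, l, O, delta }) {
  const leak = k_cu * q_c + k_hu * q_h; // leakage inflow into the uncooperative pool
  const L = leak + q_u;                 // ASSUMED: total flow that suppression can intercept
  const lO = l * O;                     // observability-gated suppression rate

  // From the commit: suppression gates both leakage and self-reproduction.
  const F_u = (1 - lO) * leak + (k_uu - lO) * q_u;
  // ASSUMED base form: non-leaked cooperative and human output, plus the
  // redirected share (1 - delta) of the intercepted flow.
  const F_c = (1 - k_cu) * q_c + (1 - k_hu) * q_h + (1 - delta) * lO * L;
  // From the commit: total production net of the destroyed share.
  const F = q_c + q_h + k_uu * q_u - delta * lO * L;

  return { F_u, F_c, F, L };
}
```

Calling `gatedFlows({ q_c: 0.6, q_h: 0.35, q_u: 0.05, k_cu: 0.0407, k_hu: 0.0407, k_uu: 1, l: 0.4, O: 0.5, delta: 0.7 })` should return flows with F_u + F_c equal to F up to rounding, which is the sense in which leakage no longer cancels in F: the destroyed share of intercepted leakage is subtracted from total production.
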
- Estimating parameters: the level->leakage identity displayed and
  explained (F0 convention noted); three-value table (naive 0.050 /
  steady-state 0.055 / trend-adjusted 0.041 for Broad; Strict 0.0054);
  defaults justified (Broad trend-adjusted for coherence with its falling
  identifying series; Strict steady-state - the trend is not licensed for
  the (ii') pool); net-of-inflow caveat absorbed into the identity; the
  same series now flagged as doing triple duty (level, trend, rate);
  delta-independence of the ell estimate restated under gating
- Two-readings passage: the falling observed series is ambiguous between
  benign (q_u falling 0.22/sigma) and alarming (O falling 0.5->0.23 with
  flat truth, or 0.5->0.11 with rising truth); separation requires an
  independent O-trend instrument; tied to the good-path dichotomy
- Summary table: k_hu/k_cu rows carry the identity anchors; new trend row g
- Named calibrations: double duty -> triple duty; symmetric Broad/Strict
  framing (measured proxy vs explicit guesswork, no headline hierarchy);
  parameter table and new variant-verdict table (incl. delta=1 spread)
- Dual observability-calibration routes: when to use O*-prior vs
  T_M,1/2 / T_E,2 rate-judgement route (text + appendix intro)

Numbers from gated-calibrations.js; cf. drafts/calibrations-v2.md.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
… scope

- Good path: the foregrounded near-term catch-rate prediction is replaced
  by the medium-term dichotomy (re-verified under gating across
  O* in [0.15, 0.5], landmarks-v2 s5): either observed misbehaviour rises
  by sigma ~ 8, or observability itself is collapsing (O 0.5 -> 0.13 in
  the one non-rising cell); near-term direction explicitly O-sensitive
- O-robustness tags appended to the good-path section: robust (dichotomy,
  no-fire-alarm, ell*O*(0) product structure incl. the gating-modified
  monitoring-rescue claim) vs O-sensitive (basin verdicts, near-term
  observed direction, crossing times)
- Capability-gating scope statement: single-pool limitation named in the
  idealizations (non-reproducing (i') misbehaviour has nowhere to live;
  today's rates may calibrate observability and (i')-leakage, Broad's
  defence is the proxy hypothesis); sigma-clock starts where (ii')
  becomes capability-possible; k_uu(sigma) ramp noted, not implemented;
  calibration scope note tied to why Strict declines the trend
- Two-pool extension promoted to lead future-work item in the
  neglect/subversion appendix (subsumes suppression-channel and control
  questions)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
- O-transient demoted from shared near-term prediction to calibration
  diagnostic: hump size measures the human monitoring subsidy (q_h0,
  a_AI/H, both weakly anchored; peak 0.66 at a_AI/H=1 vs ~0.53 at 10,
  long-run O* untouched); stagnating catch rates degenerate, falling
  human-era catch rate is the alarm
- Path-prediction bullets re-fixed to gated defaults (Broad dip to 4.3%,
  back through 5% at ~2 y, 10% at ~5 y, endemic fifth, observed ~3.5% at
  sigma 4; Strict 1% at sigma 4.9, plateau 2.2%) and given an explicit
  series-binding rule: predictions bind (ii')-targeted instruments;
  Broad additionally binds the broad series via its proxy hypothesis,
  Strict predicts the broad series will NOT track q_u
- Confounding paragraph tied to the two-readings ambiguity arithmetic
- Retro-observability numbers updated (Broad lag-2 drift to ~0.85 by
  sigma 12, asymptote ~0.83)
- Confidently-good paragraph: 2:1 -> 4:1, 1.4x -> 1.8x, margin caveat
- Indicators: reward-hacking indicator labelled as binding the broad
  series; the two instrument gaps named ((ii')-targeted audits and an
  O-trend instrument: fixed-protocol monitorability testbeds, lagged
  dev-log audits)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
- figures_ai2027.py ported to the gated engine (gated F_u/F_c/G, gated
  c_M calibration, gated quadratic classifier with eradication regime,
  three-branch lstar port); self-tests re-baselined by design against
  gated fixtures (Broad 0.2053 / Strict 0.0222 / l* 0.2241 / d=1 rule
  0.38 / B=0 branch 2.835 & 3.6 / eradication trajectory / AI 2027
  sigma@50% = 3.02, observed peak 23.6%) - all pass
- ai2027-observability-cannot-save.png: gated boundary inverted by
  bisection (panel a), basin boundary clipped at the new eradication
  line with AI 2027 marker moved to it (panel b); caption rewritten
  (old k_cu = lO*/(4 delta) formula retired)
- ai2027-high-leakage-run.png: gated trajectory (k_hu follows the Broad
  default 0.0407), observed peak ~24%
- default-path figures regenerated from the gated default-path-figs.py
  (validates 14/14 against gated-landmarks fixtures): dip annotation,
  attractor 0.21 / 0.022, rescue panels with reversed O* verdict and
  attractor-climb caveat; captions updated
- AI 2027 policy-contrast sentence: required product 5-9x central

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
davidoj and others added 2 commits June 11, 2026 15:22
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Strategy-stealing citation on the equal-reproduction assumption; E
redefined as a stock of failure surfaces with named channels; the
constant-l/l_k idealization recast as exchange-rate assumptions (the
coverage race is O's dynamic, not an l assumption); redirection quality
discounts and lags folded into delta_eff = 1 - v(1-delta) exactly;
draft-history self-references removed or recast as timeless statements
throughout (eradication retraction parentheticals, re-centring talk,
earlier-version comparisons); inline-math backslash escapes doubled so
Goldmark passes them to MathJax intact; hook paragraph trimmed and
instrumental-convergence wording tightened (David).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

netlify Bot commented Jun 11, 2026

Deploy Preview for gilded-faun-da8b4f ready!

Name Link
🔨 Latest commit 1206403
🔍 Latest deploy log https://app.netlify.com/projects/gilded-faun-da8b4f/deploys/6a2c0fa0b76eef00082bacb2
😎 Deploy Preview https://deploy-preview-154--gilded-faun-da8b4f.netlify.app

netlify Bot commented Jun 11, 2026

Deploy Preview for pedantic-newton-e8d424 ready!

Name Link
🔨 Latest commit 1206403
🔍 Latest deploy log https://app.netlify.com/projects/pedantic-newton-e8d424/deploys/6a2c0fa29126860008674807
😎 Deploy Preview https://deploy-preview-154--pedantic-newton-e8d424.netlify.app

davidoj and others added 26 commits June 11, 2026 15:39
The Goldmark passthrough config in config-blog.toml requires Hugo >= 0.122;
the pinned 0.109.0 ignores it silently, so inline/display math would be
markdown-mangled on the deployed blog. 0.127.0 matches local development.
Both site configs verified to build clean under 0.127.0.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Four verdict-level results up front (evidence brackets the basin
boundary; bad paths are quiet; reported misbehaviour trends are
instrument-relative; rescues are multi-fold) so readers get the meat
before the model machinery. Wording by David from the reviewed draft;
dichotomy texture pending v4 re-verification as flagged in the PR.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Implements the David-specified v4 model change in the basin explorer:
passive opacity becomes a per-doubling constant (dE_passive = c0*dA)
instead of a production share; only active evasion remains a labour
share. Downstream of the one-term change:

- Long-run O*(r) is now implicit (minus root of cdP*O^2 - D*O + 1 = 0);
  fixed-point scan root-finds g with the implicit O* on a log grid with
  validity-envelope edge bisection (the envelope can clip real roots in
  deep-suppression corners).
- Basin boundary overlay and lstar are numeric (-min g over the
  cooperative-side validity interval); v3's closed-form threshold chain
  is superseded, the delta=1 closed form survives as a cross-check.
- c0 calibration switches to the O*-dial pin at the long-run
  rare-uncooperative point: c0/cM = (1-O*)/(O*(1-delta*lk*kcu*O*)),
  re-solved per parameter cell (badge O*(0) == the dial exactly).
- Docs panel: new e' equation, per-doubling c0 reading + rationale, the
  dropped production-share variant named in one sentence, implicit-O*
  quadratic + cubic Phi fixed points, v4 endemic floor with the -c*d*k
  term and its validity bound, two-product claim removed (falsified by
  v4), C1/C2 displays updated.
- URL schema bumped to v=4 (hard fallback: v<4 links load defaults).
- Presets unchanged (identity values provably identical under v4).

Verification: _scratch/review/scripts/v4/v4-app-regression.js (61/61;
pinned bit-for-bit reconstruction of 6539bf1 when the two v4 edits are
reverted; c0=0 reproduces v3 exactly; fixtures from calibrations-v4 /
landmarks-v4) and v4-browser-drive.mjs (28/28).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
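
Selecting the minus root of the implicit-observability quadratic is numerically delicate when the leading coefficient is small, so a cancellation-free form is worth sketching. Here A stands in for the commit's cdP and D for its D; both are treated as opaque positive coefficients, and the function name is hypothetical.

```javascript
// Minus root of A*O^2 - D*O + 1 = 0.
// The textbook form (D - sqrt(D^2 - 4A)) / (2A) loses precision as A -> 0;
// multiplying through by the conjugate gives the equivalent
// 2 / (D + sqrt(D^2 - 4A)), which tends smoothly to 1/D in that limit.
// Assumes D > 0. Returns NaN when the discriminant is negative.
function minusRoot(A, D) {
  const disc = D * D - 4 * A;
  if (disc < 0) return NaN;
  return 2 / (D + Math.sqrt(disc));
}
```

The A = 0 limit is the linear case O = 1/D, which this form returns without a special case.
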
vite build of the v4 app source (asset index-DhUrkXvT.js replaces
index-DrxafwgL.js). Browser-verified against the built bundle on a
local server: 28/28 checks (badges, dial-pinned c0 readouts, URL v=4
hard fallback, endemic floor, O*-dial rescue, growth-peg docs, outcome
map with the numeric boundary, zero console errors). Evidence in
_scratch/review/app-test-evidence/v4-*.png and
_scratch/review/app-test-report-v4.md.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Passive opacity becomes a per-doubling constant instead of a production
share: e' = c_0 + a_E/M c_M q_u / F - e (F_E := c_0 F + a_E/M c_M q_u).
Rewrites the model reference, dynamics-in-words and idealizations (the
peg named as a specification choice, with the dropped production-share
variant and its consequences stated); the basin picture and fixed-points
appendix move to the implicit long-run observability (per-q_u quadratic,
minus root) and the cubic fixed-point equation, with the new floor
formula and its validity condition, numeric threshold + delta=1 closed
form, and re-derived C1/C2; the sigma-clock and observability-calibration
appendices re-derive e' and the T_E,2 route; c_0 is now back-solved from
a long-run observability dial per parameter cell. AI usage note updated.

Derivations verified by _scratch/review/scripts/v4/check_algebra_v4.py
(36/36); audit trail in _scratch/review/derivation-audit.md Part A''.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
All 12 verdict cells keep their outcome (Broad monostable takeover at
margins 0.48-0.89x, Strict endemic 4.4-4.9% at 4.6-6.6x); transients
move: observability is flat (~0.5) through the human era and handoff
(no hump, no subsidy), crossings shift ~0.5-1 sigma earlier (Strict 1%
at sigma~3.9, ~2.8y), observed-rate path 2.5->3.8->7.0->9.2% peaking
12.8% near sigma~33, knife pairs 0.308/0.316 (<=0.19pp), retro-O bands
0.86-0.88 / 0.94-0.95 vs contemporaneous <=0.5, rescue levers k<=0.016
/ l>=0.76 / dial>=0.88 ("every rescue >= 3.5x"), dichotomy 19/20 with
the same exception cell. The O-hump calibration diagnostic becomes a
dial-trend diagnostic; the two-product robustness clause is rescoped
to directly recomputed claims. AI 2027: delta=0.7 threshold l*~3.9-4.0
(product ~2.0, 20x central, ~1/4 per doubling, appearance 44-54%);
at delta=1 no fix rate yields a basin under the dial convention
(convention stated). Two-readings gains the model-side dial gloss.

Numbers from _scratch/review/scripts/v4/{calibrations,landmarks}_v4.py;
fixture digests in _scratch/review/drafts/{calibrations,landmarks}-v4.md.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
figures_ai2027.py ported to the v4 engine (per-doubling passive opacity,
per-cell dial pin, implicit minus-root observability, numeric basin
boundary with the delta=1 closed form as cross-check); its self-tests
re-baselined to the v4 fixtures (Broad l* 0.3123, AI-2027 l* 3.926/3.995
at delta=0.7 and NO BASIN at delta=1, tolerable k_cu 0.0395->0.1078,
l* 1.674 at the 0.99 dial, trajectory 50% at sigma~3.1 with obs peak
~18%) — 12/12 pass. default-path figures regenerated from the v4
default-path-figs.py (validated 21/21 against landmarks-v4/
calibrations-v4): observability panels now flat through the handoff,
crossings and rescue thresholds re-annotated (l*=0.312 appearance at
q*~0.58, good-attractor l>=0.76, dial>=0.88, floor 14% at l_k=0.5).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
- Opening act: add the why-human-labour-is-not-an-observability-subsidy
  paragraph (worked m*/e* flow-balance two-liner; the rejected
  production-share variant named with its six-fold dilution).
- Handoff: nothing happens even in observability — O crosses parity at
  0.493 from 0.500, ranges verified against the v4 runs.
- Good-path dial-trend paragraph: rising/stagnating/falling catch-rate
  readings map onto the dial one-for-one; falling = evasion gaining on
  coverage now; the v3 a_AI/H stagnation degeneracy is gone.
- Calibration routes + observability appendix: the two routes nearly
  coincide at the central calibration (flat-trend c = 1.01 vs pin 1.02);
  stagnating catch rates are now consistent with flow balance today.

Numbers verified against _scratch/review/scripts/v4 this session;
drafts/v4-prose.md carries the claims header.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…k anchor

- Rescue section: one-frame paragraph on the lk gate reserve — the
  identity anchors effective seeding today at (1 - lk*O0)*k = 0.031
  under both lk readings; the reserve diverges under O movement in both
  directions, explaining both the worse calibrated margin at higher lk
  (0.64x vs 0.89x) and the monitoring-rescue split.
- Two destinies: one-sentence extension of the entanglement note to lk
  (higher lk, higher inferred k, worse margin) with a forward pointer.
- Named measurements: third measurement added — the recurrence audit of
  escaped misbehaviour, estimating O(1-lk)/(1-lk*O) (zero at lk=1, ~1/3
  at lk=0.5 and O=0.5), the first proposed empirical anchor for lk;
  parameter table and Building-the-indicators list updated to match.
- Timeless-voice fix in Named calibrations (drop 'as this version of
  the post does').
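
A minimal sketch of the two closed-form quantities named in the bullets above, for readers who want to check the arithmetic. The function names are hypothetical and not taken from the repository. The inputs O0 = 0.5 and lk in {0.5, 1.0} are assumptions read off the figures quoted in these messages, not confirmed model settings.

```python
def recurrence_fraction(O: float, lk: float) -> float:
    """Quantity the recurrence audit is described as estimating:
    O * (1 - lk) / (1 - lk * O)."""
    return O * (1.0 - lk) / (1.0 - lk * O)


def inferred_k(anchor: float, O0: float, lk: float) -> float:
    """Back-solve k from the anchoring identity (1 - lk * O0) * k = anchor."""
    return anchor / (1.0 - lk * O0)


if __name__ == "__main__":
    # Assumed inputs: O = O0 = 0.5, and lk readings of 0.5 and 1.0.
    for lk in (0.5, 1.0):
        print(
            f"lk={lk}: recurrence fraction={recurrence_fraction(0.5, lk):.3f}, "
            f"inferred k={inferred_k(0.031, 0.5, lk):.4f}"
        )
```

Under these assumed inputs the recurrence fraction is zero at lk=1 and one third at lk=0.5, matching the values quoted above, and the inferred k rises with lk, which is the direction the entanglement note describes.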

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Names each David-specified refinement (delta, production gating, bounded
lk, growth-pegged passive opacity), credits the delta heuristic and the
two-lineage verification scheme explicitly, and keeps revision
provenance confined to this note per the post's convention.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
- Escape the thin-space in the gate-reserve inline math (Goldmark was
  eating the single backslash, rendering a literal comma).
- Reword the gate-reserve frame's opening (the margin pattern lives in
  the verdict table, not the rescue section) and the handoff sentence
  ('passes through the handoff' instead of the ambiguous 'crosses
  parity' for an O value).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Verdict-table one-character fixes (l* 0.41 at delta=1; attractor 0.68 at
delta=0.3); widen the benign-Strict-corners example; Strict prediction
bullet rescoped to fifty-fold growth; O*'(0) slope claim scoped to
working ranges; sigma~23 rounding. Verdict-gap headline recast as
bracket-the-boundary (the reproduction-relevance input straddles the
basin boundary; no near-boundary-tuning claim). Fixed-instrument
flip-side added to the instrument theory (declining fixed-harness
series ~guaranteed at nonzero fix rate; the series is an l-meter) with
a matching reflex note in Building the indicators, and the exec
summary's lagged-timeseries sentence anchor-linked.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…ionale

Pools defined over deployed labour rather than weights (constructive
misuse as a present-day human-assisted member of A_u); strategy-stealing
reframed as long-run convergence; RepliBench scoped as weights-level with
the assemblage-level qualification; the horizon-independence form behind
Strict's discount, with its two asymmetries. No parameter changes.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…ument

Development labour acts on successors by construction, so independence
of misbehaviour and setting yields the broad rate, not a discount; the
low reading rests on evidenced negative correlation (rates fall sharply
under naturalisation). Broad restated as zero correlation with setting;
Strict's 10x as a judgement about how far the naturalisation trend
extends.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Body gains the rarely-farsighted defeater (matching the summary) with its
empty-lagged-audit signature, and both body and summary state the
semi-positive reading of a quiet signal (occasional findings + flat trend
+ an explaining signature, else evidence about observability). Dangling
dichotomy references removed or relabelled; definition and typo fixes.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Occasional reproduction-relevant findings rule out the rarely-farsighted
defeater, gradual-not-sharp declines rule out a suppression surge, and a
series that includes misuse rules out mispointed instruments — once all
three hold, a flat series raises concern (rise coming, or observability
eroding) rather than providing reassurance.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Exec summary claim 3 is now two sentences plus the verified bend claim
and a link to What each path predicts; defeater detail lives in the body
(which already carried it). Body gains the verified bend statement
(20/20 cells for true share and observed rate, including shallow-form
survival in the O-collapse exception cell) and the two practical
corollaries (misuse-inclusive measures; bend-or-investigate reading).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The definition now lives in the state space (low plateau, wide margin,
inside the valid regime — with the low bar and validity bar explicitly
related but doing different jobs) and the measurement story follows
separately, instead of a checklist mixing the two. Jargon translated;
verdicts and numbers unchanged. Also fixes the near-term observability
paragraph's M-vs-E slip and typo.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Confidently good now means: low plateau, wide margin, and both holding
across the plausible ranges of every input. Verified at Strict leakage:
central is good (4.7%, 4.6-6.6x) but the one-percent plateau is out of
reach everywhere (best ~1.5%), the ten-times bar is reachable only in
the friendliest corners (10-20x), and the low-observability corner
loses the good outcome entirely. Plain-language wording throughout.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Confidence = survives the sweep; Strict fails on exactly one axis —
long-run observability within plausible guesses — surviving every other
input. Bars and corner detail dropped per David's direction.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Seeing a third of misbehaviour long-run (vs half today) lifts Strict's
plateau from ~4.7% to ~9-12%; verified by the wiggle-sweep ladder.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Claim 1 is now the central-parametrization headline with the four-step
intuitive mechanism and an explicit uncertainty paragraph; lead-in and
cross-references adjusted to match the one-scenario framing.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>