Roadmap

v0.5.0 (Current, 2026-04-17) — Confidence-aware LLM Guardrails

The yuragi.guardrails subpackage lands. yuragi is no longer just a measurement library; it is now a guardrail platform with audit logging, a multi-agent runtime, and Git-like cognitive snapshots. Origin: ported from the previously private dacm project. See CHANGELOG.md for the full list of changes and KNOWN_LIMITATIONS.md (G1–G7) for acknowledged tradeoffs.

Highlights

v0.6.0 (planned)

NATS DeadLetter detection via consumer-info polling
on_malformed returns ACK | NAK | TERM
NATS prefetch / max_in_flight config
Background-queue AuditSink for high-throughput deployments
PII / jailbreak / prompt-injection detectors (Presidio bridge)
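As an illustration of the planned on_malformed return values, here is a minimal sketch of a handler that maps a malformed message to one of three dispositions. The `MalformedAction` enum, the handler signature, and the retry policy are assumptions made for this example, not the yuragi API; only the ACK / NAK / TERM names come from the item above.

```python
from enum import Enum


class MalformedAction(Enum):
    """Hypothetical dispositions mirroring the ACK | NAK | TERM roadmap item."""

    ACK = "ack"    # acknowledge: treat the message as handled
    NAK = "nak"    # negative-acknowledge: ask the broker to redeliver
    TERM = "term"  # terminate: stop redelivery of this message


def on_malformed(payload: bytes, delivery_count: int, max_retries: int = 3) -> MalformedAction:
    """Pick a disposition for a message that failed to parse (assumed policy)."""
    if not payload:
        # An empty payload carries nothing to process, so acknowledge it.
        return MalformedAction.ACK
    try:
        payload.decode("utf-8")
    except UnicodeDecodeError:
        # Bytes that are not valid UTF-8 will not improve on retry.
        return MalformedAction.TERM
    if delivery_count >= max_retries:
        # Retry budget exhausted: stop redelivering.
        return MalformedAction.TERM
    # Possibly a truncated or transient payload: request redelivery.
    return MalformedAction.NAK
```

Returning an explicit value keeps the retry decision in caller-supplied code rather than hard-coding it in the consumer loop.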

v0.7.0 (planned)

Streaming-token guardrails (early-stop on confidence drop)
Optional executor-backed AuditSink (run_in_executor)
Snapshot benchmark against the ≤ 1 s / 10 k-agent target
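The early-stop idea behind streaming-token guardrails can be sketched as a generator that stops emitting tokens once a rolling confidence estimate drops too low. This is only a sketch under stated assumptions: the function name, the `(text, logprob)` input shape, the rolling-mean window, and the threshold are all hypothetical and not taken from yuragi.

```python
import math
from collections import deque
from typing import Iterable, Iterator, Tuple


def stream_with_early_stop(
    tokens: Iterable[Tuple[str, float]],
    min_confidence: float = 0.5,
    window: int = 3,
) -> Iterator[str]:
    """Yield token text until the rolling mean token probability drops below min_confidence."""
    recent = deque(maxlen=window)  # probabilities of the last `window` tokens
    for text, logprob in tokens:
        recent.append(math.exp(logprob))  # convert log-probability to probability
        if len(recent) == window and sum(recent) / window < min_confidence:
            return  # confidence dropped: stop before emitting this token
        yield text


# Usage: three confident tokens followed by a run of low-confidence ones.
stream = [
    ("a", math.log(0.9)),
    ("b", math.log(0.9)),
    ("c", math.log(0.9)),
    ("d", math.log(0.1)),
    ("e", math.log(0.1)),
    ("f", math.log(0.9)),
]
emitted = list(stream_with_early_stop(stream))
```

A rolling window is used here so that a single low-probability token does not end the stream on its own; other aggregation choices are equally plausible.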

v0.4.0 (2026-04-12) — Production Applications & Open-Box Analysis

Core Confidence Measurement

Production Applications (5 modules + 5 CLI commands)

yuragi check — CI/CD fragility regression detection with GitHub Actions workflow
yuragi route — Fragility-aware multi-model routing (3 strategies)
yuragi guard — Abstention system for high-stakes domains (5 domain presets)
yuragi recommend — Model selection based on fragility profiles
yuragi red-team — Automated vulnerability discovery via perturbation probing
applications/ Python API for all 5 use cases
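To make the abstention idea behind yuragi guard concrete, here is a minimal sketch that refuses to answer when a confidence score falls below a per-domain threshold. The domain names, threshold values, and the `guard` function are hypothetical; this roadmap does not list the five actual presets or the real Python API.

```python
from dataclasses import dataclass

# Hypothetical thresholds for illustration; not the library's presets.
DOMAIN_THRESHOLDS = {"medical": 0.9, "legal": 0.85, "general": 0.6}


@dataclass(frozen=True)
class GuardDecision:
    answer: bool  # True if the model's answer should be returned
    reason: str   # short explanation suitable for logging


def guard(confidence: float, domain: str = "general") -> GuardDecision:
    """Abstain when confidence is below the threshold assumed for the domain."""
    threshold = DOMAIN_THRESHOLDS[domain]
    if confidence < threshold:
        return GuardDecision(
            False, f"confidence {confidence:.2f} below {domain} threshold {threshold:.2f}"
        )
    return GuardDecision(True, "confidence meets threshold")
```

With these assumed values, a confidence of 0.8 would be answered in the "general" domain but withheld in the "medical" domain.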

Cerebras & Provider Support

Cerebras API integration (direct logprobs via _complete_with_logprobs_direct())
Thinking-model detection (Qwen3.5 and LFM2.5-thinking fall back to sampling)
Provider detection for Groq, Together AI, Fireworks AI, OpenRouter

Metrics & Analysis

Adaptive fragility metrics (adaptive_fragility, maladaptive_fragility, adversarial_fragility, fragility_ratio)
Perturbation semantic classification (SemanticClass enum: PRESERVING/MODIFYING/ADVERSARIAL)
Phase transition experiment (experiments/phase_transition.py) — 11-step graduated prefix ladder
Hallucination prediction experiment (benchmarks/hallucination_experiment.py)
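The perturbation classification above can be pictured with a small sketch that groups per-perturbation confidence shifts by semantic class. The three member names come from the item above; the enum values, the `mean_shift_by_class` helper, and the input shape are assumptions for illustration and do not reproduce how yuragi computes its fragility metrics.

```python
from collections import defaultdict
from enum import Enum
from statistics import mean
from typing import Dict, Iterable, Tuple


class SemanticClass(Enum):
    PRESERVING = "preserving"    # perturbation keeps the meaning intact
    MODIFYING = "modifying"      # perturbation changes the meaning
    ADVERSARIAL = "adversarial"  # perturbation is designed to mislead


def mean_shift_by_class(
    results: Iterable[Tuple[SemanticClass, float]],
) -> Dict[SemanticClass, float]:
    """Average the observed confidence shift within each semantic class."""
    grouped = defaultdict(list)
    for semantic_class, shift in results:
        grouped[semantic_class].append(shift)
    return {cls: mean(shifts) for cls, shifts in grouped.items()}


# Usage with made-up shift values.
summary = mean_shift_by_class(
    [
        (SemanticClass.PRESERVING, 0.25),
        (SemanticClass.PRESERVING, 0.75),
        (SemanticClass.ADVERSARIAL, 1.0),
    ]
)
```

Separating shifts by class is what lets a large response to a meaning-changing perturbation be read differently from a large response to a meaning-preserving one.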

White-Box Experiments

whitebox_design.py — 5 whitebox experiments (Exp 1–5): layer entropy propagation, causal tracing, attention shift, representation geometry, SAE feature decomposition (Pythia-410m, Qwen2-1.5B)

Theory & Papers

theory.md Sections 12–20 (NN interpretation, PIRI framework, phase transition, cross-species fragility, adaptive fragility)
ICML 2026 MI Workshop paper (paper/icml2026_mi/main.tex)
JOSS paper (paper.md + paper.bib)

Tests & Infrastructure

1320+ tests passing / ruff + bandit clean
Gradio demo (demo/app.py) for HuggingFace Spaces / local use
GitHub Actions reusable workflow (.github/workflows/yuragi-check.yml)

v0.5.0 (Next) — Validation & Ecosystem

v0.2.0 (2026-04-11, Released) — Empirical Reproducibility & Academic-Grade Foundation

v0.1.0 (2026-04-10, Released) — Full Fragility Analysis
