AI Engineering Arsenal

Production-grade AI engineering skills, audits, workflows, benchmarks, and evaluation frameworks.

The open-source AI engineering framework for architecture reviews, security audits, startup validation, competitor analysis, AI systems design, SEO audits, AI code review, technical due diligence, and technical decision-making.

Built for developers, founders, CTOs, technical teams, and AI builders.

Developed by the LetsCookTech Open Source Team.

Why use AI Engineering Arsenal instead of asking an AI directly?

Default AI answers are often plausible but hard to trust: they skip evidence, confuse guesses with facts, miss release risks, and leave no test path. AI Engineering Arsenal gives an assistant an operational contract: what to inspect, what to prove, what to refuse, how to verify, and what artifact to hand to a human.

The reputation this project is designed to earn:

This framework catches things normal AI misses.

AI Engineering Arsenal is currently a library of cross-model operating skills. The long-term direction is an AI engineering operating layer for routing, evaluation, policy, and lifecycle. The current repository is intentionally honest about what exists today.

What it helps with

AI code review
AI security-audit workflows
Architecture review and system design
Startup validation
Competitor analysis
Technical due diligence
Engineering playbooks and engineering workflows
AI evaluation and benchmark design
AI CTO operating rhythms
SaaS, Supabase, Next.js, RAG, and AI agent decision-making

Flagship production reviewers

These are the category-defining Arsenal skills. Start here if you want practical value instead of another prompt collection.

| Reviewer | What it catches | Best for |
| --- | --- | --- |
| `nextjs-production-architecture-reviewer` | Next.js architecture, App Router, Server Actions, performance, SEO, AI-search, security, Vercel cost, Supabase integration, and deployment risks. | Next.js SaaS, AI platforms, ecommerce, dashboards, blogs, marketplaces, agency sites. |
| `supabase-production-auditor` | RLS bypass, service-role misuse, weak auth, storage exposure, Realtime fan-out, database growth, cost, backups, and production-readiness gaps. | Supabase SaaS, AI apps, mobile apps, internal tools, marketplaces, learning platforms. |
| `ai-agent-architecture-reviewer` | Planning failures, memory risks, tool abuse, MCP risks, prompt injection, hallucination gaps, cost fan-out, observability, and reliability issues. | AI agents, copilots, workflow agents, browser agents, coding agents, research agents, multi-agent systems. |

Each flagship reviewer includes a benchmark rubric and comparison template under benchmarks/ so future claims can be proven with baseline-vs-framework outputs.

See the difference

| Without a playbook | With an Arsenal playbook |
| --- | --- |
| "Add authentication and validate inputs." | Maps assets and trust boundaries; reports evidence, preconditions, impact, remediation, regression tests, confidence, and review gaps. |
| "Use a queue and a database." | Compares designs, records assumptions and trade-offs, specifies timeouts/retries/rollback, and names the test that validates the decision. |
| "Build an AI SaaS." | Produces an acceptance contract plus tenancy, authorization, AI-evaluation, cost-cap, migration, observability, release, and rollback gates. |

Read a safe, concrete finding from the synthetic tenant-review case study. It demonstrates an evidence-linked result; it does not claim a benchmark win.
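
The fields named in the "With an Arsenal playbook" column can be pictured as a structured record. The following is a minimal Python sketch, not the project's actual schema: the `Finding` class, its field names, the confidence labels, and the `is_reportable` rule are all assumptions made for illustration, and the example values are synthetic.

```python
from dataclasses import dataclass, field


@dataclass
class Finding:
    """Hypothetical shape of one evidence-linked finding (not the project's schema)."""

    title: str
    evidence: list[str]          # files or config excerpts that were actually inspected
    preconditions: list[str]     # what must be true for the issue to occur
    impact: str
    remediation: str
    regression_tests: list[str]  # tests that would fail if the issue returned
    confidence: str              # assumed labels: "confirmed", "likely", "unverified"
    review_gaps: list[str] = field(default_factory=list)

    def is_reportable(self) -> bool:
        # Assumed rule: report a finding as confirmed only when it cites
        # evidence and names at least one regression test.
        return (
            self.confidence == "confirmed"
            and bool(self.evidence)
            and bool(self.regression_tests)
        )


finding = Finding(
    title="Invoice endpoint does not filter by tenant",
    evidence=["api/invoices/route.ts (synthetic example)"],
    preconditions=["Caller is authenticated as any tenant"],
    impact="One tenant could read another tenant's invoices.",
    remediation="Filter by the caller's tenant ID on the server.",
    regression_tests=["Tenant A requesting tenant B's invoice gets a 404"],
    confidence="confirmed",
    review_gaps=["Deployment configuration was not provided"],
)
print(finding.is_reportable())
```

Keeping `review_gaps` on the record makes what was not inspected as visible as what was.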

Start with these four

| Playbook | Use it when | Proof path |
| --- | --- | --- |
| `security-auditor` | You need an authorized code, API, infra, or release-risk review. | Case study · Rubric |
| `startup-validator` | You need to test whether a product should exist before building it. | Case study · Rubric |
| `competitor-analyzer` | You need positioning based on evidence rather than a feature grid. | Case study · Rubric |
| `cto-operating-system` | You need a focused operating plan from engineering signals. | Case study · Rubric |

Use a playbook

Copy a skill folder into your agent's skills directory, or attach its SKILL.md to the task. Example:

Use $security-auditor to review this authorized SaaS API. Scope: /api/invoices.
Evidence: repository files and deployment configuration attached.
Return only confirmed findings, review gaps, safe remediation, and verification tests.

Works as portable Markdown with Codex, Claude Code/Projects, ChatGPT, Gemini, Cursor, Windsurf, Cline, Roo Code, Aider, and agent SDKs. See compatibility.
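
Copying a skill folder can be scripted. The sketch below is one possible approach in Python, under two assumptions that the README does not confirm: that each skill lives at `skills/<name>/SKILL.md`, and that your agent reads skills from a directory you pass in. The function name `install_skill` is hypothetical.

```python
import shutil
from pathlib import Path


def install_skill(repo_root: Path, skill_name: str, agent_skills_dir: Path) -> Path:
    """Copy skills/<skill_name> into the agent's skills directory.

    Assumes the layout skills/<skill_name>/SKILL.md; adjust if the repository differs.
    """
    source = repo_root / "skills" / skill_name
    if not (source / "SKILL.md").is_file():
        raise FileNotFoundError(f"No SKILL.md found in {source}")
    destination = agent_skills_dir / skill_name
    # dirs_exist_ok=True lets a re-run refresh an already installed copy.
    shutil.copytree(source, destination, dirs_exist_ok=True)
    return destination
```

For example, `install_skill(Path("."), "security-auditor", Path.home() / ".my-agent" / "skills")` would copy the folder into a placeholder location; replace that path with wherever your agent actually loads skills from.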

First wave

nextjs-production-architecture-reviewer · supabase-production-auditor · ai-agent-architecture-reviewer · security-auditor · startup-validator · competitor-analyzer · system-architect · database-architect · technical-debt-hunter · ai-search-optimizer · seo-auditor · cost-explosion-detector · cto-operating-system · production-ai-saas-builder

Searchable guides

These pages are written for GitHub, Google, and AI-search discoverability while staying useful to developers:

Evidence, not marketing

AI Engineering Arsenal does not claim that a playbook finds more issues, saves money, or outperforms a model until a reproducible result is published. Each benchmark holds model/version, tools, temperature, budget, inputs, rubric, and evaluator constant across the baseline and playbook runs, and records its limitations. Read the benchmark protocol.
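
To illustrate what a controlled comparison means in practice, here is a minimal Python sketch that checks a baseline run and a playbook run for settings that drifted. It is not the benchmark protocol itself: `RunConfig`, its field names, the unit of `budget`, and `uncontrolled_differences` are assumptions chosen for the example.

```python
from dataclasses import dataclass, replace

# Settings assumed to stay identical between the two runs of a comparison.
CONTROLLED_FIELDS = (
    "model", "model_version", "tools", "temperature",
    "budget", "inputs_id", "rubric_id", "evaluator",
)


@dataclass(frozen=True)
class RunConfig:
    model: str
    model_version: str
    tools: tuple[str, ...]
    temperature: float
    budget: float        # unit is an assumption (for example, a cost or token cap)
    inputs_id: str
    rubric_id: str
    evaluator: str
    uses_playbook: bool  # the one variable that is allowed to differ


def uncontrolled_differences(baseline: RunConfig, playbook: RunConfig) -> list[str]:
    """Return the controlled fields whose values differ between the two runs."""
    return [
        name for name in CONTROLLED_FIELDS
        if getattr(baseline, name) != getattr(playbook, name)
    ]


baseline = RunConfig(
    model="example-model", model_version="v1", tools=("file_read",),
    temperature=0.0, budget=5.0, inputs_id="fixture-001",
    rubric_id="rubric-001", evaluator="human-reviewer", uses_playbook=False,
)
drifted = replace(baseline, uses_playbook=True, temperature=0.7)
print(uncontrolled_differences(baseline, drifted))  # ['temperature']
```

An empty list means the two runs differ only in whether the playbook was used, which is the precondition for comparing their outputs.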

Trust system

AI Engineering Arsenal has a repository-level system for improving itself instead of only adding more skills:

| System | Purpose |
| --- | --- |
| Repository audit | Finds weak assets, filler risk, missing proof, and deletion candidates. |
| Arsenal constitution | Defines the laws every contribution must follow. |
| AI CTO operating model | Standardizes input, research, verification, risk review, decision, and quality review. |
| Evaluation standard | Scores outputs across accuracy, evidence, verification, actionability, security, and user value. |
| Red-team framework | Attacks outputs before users trust them. |
| Benchmark lab | Defines the proof artifacts required before performance claims. |
| Self-evolution roadmap | Moves the project toward a proof engine, runtime adapters, and Open Source AI CTO workflows. |
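
As a rough illustration of scoring an output across the six dimensions the evaluation standard names, the Python sketch below averages per-dimension scores. The 0 to 5 scale, the equal weighting, and the function name `overall_score` are assumptions; the actual standard may score differently.

```python
DIMENSIONS = (
    "accuracy", "evidence", "verification",
    "actionability", "security", "user_value",
)
MAX_SCORE = 5  # assumed scale; the README does not specify one


def overall_score(scores: dict[str, int]) -> float:
    """Average the six dimension scores after validating them."""
    missing = [name for name in DIMENSIONS if name not in scores]
    if missing:
        raise ValueError(f"Missing dimensions: {missing}")
    for name in DIMENSIONS:
        if not 0 <= scores[name] <= MAX_SCORE:
            raise ValueError(f"{name} must be between 0 and {MAX_SCORE}")
    # Equal weighting is an assumption made to keep the sketch simple.
    return sum(scores[name] for name in DIMENSIONS) / len(DIMENSIONS)


example = {
    "accuracy": 4, "evidence": 5, "verification": 3,
    "actionability": 4, "security": 5, "user_value": 3,
}
print(overall_score(example))  # 4.0
```

Rejecting a score sheet with a missing dimension keeps a strong result in one area from hiding an unscored one.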

FAQ

Is this just a prompt repository?

No. A prompt repository optimizes for copyable text. AI Engineering Arsenal optimizes for evidence, verification, failure detection, benchmarks, and repeatable engineering decisions.

Is this tied to one AI model?

No. The playbooks are Markdown-first and model-portable. They are designed for Claude, ChatGPT, Gemini, Codex, Cursor, Windsurf, Cline, Roo Code, Aider, OpenAI Agents, Anthropic agents, and future AI systems.

Does it already prove benchmark superiority?

Not yet. The repository includes rubrics, synthetic case studies, and a benchmark protocol. Public benchmark wins should only be claimed after raw baseline and framework outputs are published.

Contribute a useful playbook

A contribution needs a recurring decision problem, an evidence/verification contract, safety boundaries, and a sanitized evaluation case. Generic personas and untested prompt collections do not qualify. Start with CONTRIBUTING.md.
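
The four requirements can be read as a pre-submission checklist. The Python sketch below shows one way to check a draft against them; the key names and the `missing_parts` helper are hypothetical and are not part of CONTRIBUTING.md.

```python
# Hypothetical keys for the four parts a contribution needs.
REQUIRED_PARTS = (
    "decision_problem",
    "verification_contract",
    "safety_boundaries",
    "evaluation_case",
)


def missing_parts(contribution: dict[str, str]) -> list[str]:
    """List the required parts that are absent or blank."""
    return [
        part for part in REQUIRED_PARTS
        if not contribution.get(part, "").strip()
    ]


draft = {
    "decision_problem": "Should this service adopt a queue?",
    "verification_contract": "Name the load test that validates the decision.",
    "safety_boundaries": "",
}
print(missing_parts(draft))  # ['safety_boundaries', 'evaluation_case']
```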

Repository map

| Path | Purpose |
| --- | --- |
| `skills/` | Portable operating playbooks. |
| `case-studies/` | Safe, concrete demonstrations of the output standard. |
| `benchmarks/` | Per-playbook rubrics and reproducibility protocol. |
| `evals/` | Versioned task fixtures for baseline-versus-playbook runs. |
| `docs/` | Product thesis, compatibility, launch, and publishing guidance. |
| `templates/` | Proof-pack templates for graduating skills into trusted assets. |

Status

v0.1.0 is a foundation release. Case studies are synthetic demonstrations; public benchmark results are not yet published. That distinction is intentional.

License

MIT. See LICENSE.
