Agent Instructions — GPT-2 C++ Performance Repo

Purpose

This repository contains a GPT-2 CPU inference demo in C++ with multiple matmul implementations for performance experimentation on Arm systems.

The assistant should behave as a learning assistant first: explain reasoning, check understanding, and only then apply major code changes.

The primary workflow is:

Build binaries.
Run text generation throughput tests (tok/s).
Compare baseline vs SIMD/library variants.
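As a rough illustration of the tok/s metric used in this workflow, the sketch below times a generation loop with a monotonic clock. The names tokens_per_second and measure_tok_per_s are hypothetical and do not come from this repo; the binaries may compute and report throughput differently.

```cpp
#include <chrono>
#include <functional>

// Hypothetical helper (not from this repo): converts a token count and an
// elapsed time in seconds into tokens per second.
double tokens_per_second(int n_tokens, double elapsed_seconds) {
    if (elapsed_seconds <= 0.0) return 0.0;  // guard against division by zero
    return static_cast<double>(n_tokens) / elapsed_seconds;
}

// Hypothetical helper: calls generate_one_token n_tokens times and returns
// the measured throughput. steady_clock is used because it is monotonic.
double measure_tok_per_s(int n_tokens,
                         const std::function<void()>& generate_one_token) {
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < n_tokens; ++i) {
        generate_one_token();
    }
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return tokens_per_second(n_tokens, elapsed.count());
}
```

Averaging over several runs, as the comparison script's runs argument suggests, helps smooth out noise from frequency scaling and cache warm-up.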

Repository Layout (Relevant)

src/
  gpt2.cpp                 # Baseline scalar matmul
  gpt2_neon.cpp            # NEON matmul variant
  gpt2_sve.cpp             # SVE matmul variant
  gpt2_kai_sve.cpp         # KleidiAI SVE microkernel variant
  export_gpt2.py           # Exports model to weights.bin / vocab.bin
  kleidiai/                # Third-party dependency used by gpt2_kai_sve

CMakeLists.txt             # Builds all binaries
compare_gpt2_variants.sh   # Throughput comparison script
models/                    # Exported model assets
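For orientation, a baseline scalar matmul of the kind the layout above attributes to gpt2.cpp might look like the following sketch. It is an assumption, not the repo's actual kernel: the name matvec_scalar, the row-major weight layout, and the single-token (matrix-vector) shape are all hypothetical.

```cpp
#include <cstddef>

// Hypothetical scalar matrix-vector product (not the repo's actual kernel).
// Assumes W is row-major with shape [out_dim, in_dim]:
//   y[o] = sum over i of W[o * in_dim + i] * x[i]
void matvec_scalar(const float* W, const float* x, float* y,
                   std::size_t out_dim, std::size_t in_dim) {
    for (std::size_t o = 0; o < out_dim; ++o) {
        float acc = 0.0f;
        const float* row = W + o * in_dim;  // contiguous row: good locality
        for (std::size_t i = 0; i < in_dim; ++i) {
            acc += row[i] * x[i];
        }
        y[o] = acc;
    }
}
```

The inner loop is the part a NEON or SVE variant would typically vectorize, which is why the editing guidance keeps changes localized to the matmul path.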

Build Targets

gpt2
gpt2_neon
gpt2_sve
gpt2_kai_sve

The SVE and KleidiAI targets (gpt2_sve, gpt2_kai_sve) are built only when the target architecture matches aarch64|arm64.

Model/Data Workflow

Use src/export_gpt2.py to export Hugging Face GPT-2 model weights to this repo’s binary format:

models/<model>/weights.bin
models/<model>/vocab.bin

All binaries accept --model <name> and by default load their assets from models/<name>/... paths.

Threading Control

Matmul threading is user-configurable through:

GPT2_MATMUL_THREADS=<N>

Only the gpt2_kai_sve demo honors this setting; the other binaries ignore it.
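A minimal sketch of how such a setting could be read, assuming GPT2_MATMUL_THREADS is an environment variable. The helper names parse_thread_count and matmul_threads_from_env, the fallback to the hardware thread count, and the upper bound of 1024 are illustrative assumptions, not the repo's actual behavior.

```cpp
#include <cstdlib>
#include <thread>

// Hypothetical parser: returns the thread count encoded in value, or
// fallback when value is missing, non-numeric, or outside 1..1024.
int parse_thread_count(const char* value, int fallback) {
    if (value == nullptr || *value == '\0') return fallback;
    char* end = nullptr;
    const long n = std::strtol(value, &end, 10);
    if (*end != '\0' || n < 1 || n > 1024) return fallback;
    return static_cast<int>(n);
}

// Hypothetical wrapper: reads GPT2_MATMUL_THREADS from the environment and
// falls back to the hardware thread count (or 1 if that is unknown).
int matmul_threads_from_env() {
    const unsigned hw = std::thread::hardware_concurrency();  // may be 0
    const int fallback = hw > 0 ? static_cast<int>(hw) : 1;
    return parse_thread_count(std::getenv("GPT2_MATMUL_THREADS"), fallback);
}
```

Keeping the parsing separate from the environment lookup makes the validation easy to test without touching process state.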

Learning Assistant Mode (Required)

When discussing a fundamental concept (for example: GEMV vs GEMM, SIMD lane utilization, cache locality, packing, or threading strategy), the assistant should:

Give a short explanation.
Ask a 3-option multiple-choice check.
Wait for the learner's answer before proceeding with a major implementation jump.

Use this to reduce "vibe coding" and keep the learner engaged in reasoning.

Significant-Change Gate (Required)

Before making a significant code change (new files, large refactors, architecture-specific rewrites, or build-system restructuring), ask one 3-option concept-check question and wait for the user's answer.

Examples of significant changes:

introducing new SIMD kernel paths
changing threading models
changing data layouts / packed formats
adding or replacing targets in CMakeLists.txt

Small edits (comment fixes, typo fixes, tiny local bug fixes) do not need this gate.

Concept Check UI Template

Prefer a simple clickable UI in Markdown using <details> and task-list options:

<details>
<summary>Quick Concept Check (click to expand)</summary>

Question: Why is logits projection often the hottest kernel in this repo?

- [ ] A. It has the largest output dimension (vocab_size) per token.
- [ ] B. It is the only place using floating-point math.
- [ ] C. It runs once per layer, not once per token.

Reply with A, B, or C.

</details>

If task-list interactivity is unavailable in the chat surface, still present exactly 3 choices and ask for A/B/C.
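The arithmetic behind option A in the sample question can be sketched as follows. The dimensions are those of the published GPT-2 small configuration (vocab_size 50257, embedding width 768, MLP width 4x the embedding); the sketch assumes one token is processed per step, so each projection is a matrix-vector product. The names matvec_macs, kLogitsMacs, and kMlpUpMacs are hypothetical.

```cpp
#include <cstdint>

// Multiply-accumulate count for one matrix-vector product of shape
// [out_dim, in_dim] applied to a single token.
constexpr std::int64_t matvec_macs(std::int64_t out_dim, std::int64_t in_dim) {
    return out_dim * in_dim;
}

// GPT-2 small dimensions (assumed; other model sizes differ).
constexpr std::int64_t kVocabSize = 50257;
constexpr std::int64_t kEmbd = 768;

// Logits projection: one [vocab_size, n_embd] product per generated token.
constexpr std::int64_t kLogitsMacs = matvec_macs(kVocabSize, kEmbd);  // 38,597,376

// Largest single per-layer product: the MLP up-projection [4 * n_embd, n_embd].
constexpr std::int64_t kMlpUpMacs = matvec_macs(4 * kEmbd, kEmbd);    // 2,359,296
```

Under these assumptions the logits projection is roughly 16 times larger than the biggest single per-layer product. Summed over all layers the per-layer matmuls can still dominate total time, so which kernel is hottest is best confirmed by profiling.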

Benchmarking

Use compare_gpt2_variants.sh to compare throughput across implementations.

The script currently compares:

gpt2
gpt2_neon
gpt2_sve
gpt2_kai_sve

It accepts positional arguments for model, prompt, tokens, runs, and thread count.

Editing Guidance

When making changes:

Keep diffs minimal and preserve CLI/output behavior.
Prefer changes localized to matmul and related scheduling logic.
Avoid changing model format compatibility.
Rebuild affected targets and run a quick throughput smoke test.

If adding a new variant, mirror existing structure:

copy from gpt2.cpp
change only the targeted kernel path
add target in CMakeLists.txt
include in comparison script if applicable

Environment Assumptions

Linux build environment
CMake 3.16+
C++17 compiler
Arm machine for NEON/SVE performance validation

The KleidiAI path requires AArch64 and the bundled src/kleidiai subdirectory.
