AGENTS.md

Architecture Map

Apache Texera: Scala/sbt backend services + the Amber workflow execution engine, an Angular UI, and the agent service. JVM modules wired in build.sbt.

Area	Path	Detail
Workflow execution engine (Amber)	`amber/`	amber/README.md
Backend services	`config-service/`, `access-control-service/`, `file-service/`, `computing-unit-managing-service/`, `workflow-compiling-service/`	`build.sbt`
Shared Scala libs	`common/` (`auth`, `config`, `dao`, `workflow-core`, `workflow-operator`, `pybuilder`)	`build.sbt`
Frontend (Angular)	`frontend/`	frontend/README.md
Agent service (Bun/TS, LLM agents)	`agent-service/`	`agent-service/package.json`
Pyright language service	`pyright-language-service/`	pyright-language-service/README.md
Deploy scripts / Dockerfiles	`bin/`	README / k8s / single-node
DDL, sbt plugins	`sql/`, `project/`	files therein

Amber breakdown

Path	Role
`amber/src/main/scala`	Pekko actors, scheduler, reconfiguration, fault tolerance, gRPC/proto
`amber/src/main/python/pyamber`	Python engine (`pyamber`) — bridge to the Scala engine
`amber/src/main/python/pytexera`	Python operator SDK exposed to UDFs

Where Things Live

Topic	Source of truth
Contribution / PR / lint / format / testing / license header	CONTRIBUTING.md
Reporting security issues	SECURITY.md
PR template	.github/PULL_REQUEST_TEMPLATE
Issue templates	bug / task / feature
License-header coverage; vendored `workflow-operator`	.licenserc.yaml; project/AddMetaInfLicenseFiles.scala
Local single-node / k8s deploy	single-node, k8s

If a topic is above, read that file instead of asking here.

Agent-Specific Rules

Scope and safety

Narrowly scoped changes. No unrelated rewrites or cross-service moves.
git status --short before editing; don't revert unrelated dirty files.
Never commit secrets / local config / build output / caches / binaries (python_udf.conf, .env, target/, dist/, .pytest_cache/, .ruff_cache/, logs).

Develop in a worktree

Leave texera/ on main. One worktree per PR, branched off a freshly fetched upstream/main.

texera/                      # stays on main, never dirty
texera-worktrees/<branch>/   # one worktree per PR

Reset to upstream/main at start; git log upstream/main..HEAD should contain only this PR's commits before pushing; remove the worktree after merge.

Environment

Component	Version
Java	JDK 17
Scala	2.13
Python	3.12
Node	24

One Python venv shared across worktrees, sibling of the texera checkout:

<workspace>/
├── texera/                   # main checkout
├── texera-worktrees/<br>/    # per-PR worktrees
└── venv312/                  # shared Python 3.12 venv

python3.12 -m venv ../venv312 && source ../venv312/bin/activate
pip install -r amber/requirements.txt -r amber/operator-requirements.txt
# For pytest or running bin/python-proto-gen.sh, also install dev deps:
pip install -r amber/dev-requirements.txt

Tests that spawn Python workers need an interpreter path. Edit python.path in udf.conf or export UDF_PYTHON_PATH="$(pwd)/../venv312/bin/python" (env var overrides). Without it, sbt Python-integration tests fail to launch a worker.

.jvmopts holds every --add-opens flag Texera needs for JDK 17+, with each group annotated by its upstream source (Kryo, Apache Arrow, Apache Pekko). sbt's launcher and the .run/ configs read it automatically; for raw java launches, pass it as an argfile: java @.jvmopts -jar …. If a future library version or a new code path triggers an InaccessibleObjectException, add the open to .jvmopts. project/JdkOptions.scala will propagates the changed options to forked test JVMs, sbt-native-packager dist launchers, and IntelliJ.

Branch and commit naming

Short, Conventional Commits, same shape for branch and commit subject.

Kind	Branch	Commit
Feature	`feat/agent-workflow-edit`	`feat(agent-service): enable workflow edit`
Bug fix	`fix/marker-replay`	`fix(amber): marker replay during reconfiguration`
Tests	`test/pyamber-handlers`	`test(pyamber): add handler unit tests`
Chore	`chore/angular-21`	`chore(deps): upgrade frontend to Angular 21`
CI	`ci/cache-action-bump`	`ci: bump coursier/cache-action to v8.1.0`

Both ≤ ~60 chars. For code changes, if you use a scope, use the module name (amber, pyamber, frontend, agent-service, file-service, …) — not amber-python. Use chore(deps): ... for dependency-only updates, and ci: ... for CI-only changes. No Co-authored-by: trailer for the repo owner.

Issues and PRs

Issue-first; both stay short.

issue (template + Type)  ->  PR (Closes #N, template)  ->  review  ->  merge

Every change starts as an issue (minor typo / docs excepted). File against apache/texera, never a fork.
Pick the right template and set the GitHub Issue Type explicitly (Bug / Task / Feature); the template's type: frontmatter doesn't always apply on creation.
Reference the issue: Closes #N (or Fixes / Resolves, or "related to").
Issue titles are plain prose; never use the Conventional Commits format (type(scope): ...) — that prefix is for commit and PR titles only.
Task issues match task-template.yaml exactly.
Prefer tables and small ASCII diagrams over long bullets. Don't restate the diff or the template.

For bugs, lead with root cause and a before -> after sketch:

Before:  reconfiguration -> replay marker -> worker hangs
After:   reconfiguration -> replay marker -> resume from checkpoint

Frontend PRs: any visible UI change requires screenshots / GIF, before / after side by side. For purely visual fixes that's the primary verification under "How was this PR tested?"; interactive flows also list manual steps (click path, browser, viewport).

Tests come first

TDD. Write the test before the source change.

write/adjust test (red)  ->  edit source (green)  ->  refactor

Situation	Order
New feature / behavior change	Failing test, then implement.
Bug fix	Regression test reproducing the bug, then fix.
Code with no tests	Characterization tests pin current behavior first; only then change source.
Refactor (no behavior change)	Tests stay green throughout — no assertion edits.

Every test must cover:

Both directions: positive (valid → expected) and negative (invalid / error → specific failure mode).
Edge cases: empty / null / zero / max / boundary, unicode, concurrency/order, missing or malformed config.
Don't assume valid. External input (user / API / file / message) must be tested with bad input.

Don't claim "tested" without commands. Paste the exact sbt testOnly / pytest / yarn test:ci / bun test invocation under "How was this PR tested?".

CI labels & gating

CI runs are selected by PR labels, not by file diff.

diff -> pr-labeler -> labels on PR -> required-checks maps labels to stacks -> CI runs

Path → label rules: .github/labeler.yml
Label → stacks (LABEL_STACKS, source of truth): .github/workflows/required-checks.yml. Read it directly; don't duplicate the mapping here.
Need extra coverage the diff doesn't imply (e.g. a common/ change you suspect breaks the frontend)? Add the relevant label manually.
Empty stack union (docs-only / dev-only / dependencies / feature / fix / refactor / release/* only) skips every build stack on purpose.
release/* labels select backport targets; removing one cancels that backport.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

AGENTS.md

Architecture Map

Amber breakdown

Where Things Live

Agent-Specific Rules

Scope and safety

Develop in a worktree

Environment

Branch and commit naming

Issues and PRs

Tests come first

CI labels & gating

FilesExpand file tree

AGENTS.md

Latest commit

History

AGENTS.md

File metadata and controls

AGENTS.md

Architecture Map

Amber breakdown

Where Things Live

Agent-Specific Rules

Scope and safety

Develop in a worktree

Environment

Branch and commit naming

Issues and PRs

Tests come first

CI labels & gating