Apache Texera: Scala/sbt backend services + the Amber workflow execution
engine, an Angular UI, and the agent service. JVM modules wired in
build.sbt.
| Area | Path | Detail |
|---|---|---|
| Workflow execution engine (Amber) | amber/ |
amber/README.md |
| Backend services | config-service/, access-control-service/, file-service/, computing-unit-managing-service/, workflow-compiling-service/ |
build.sbt |
| Shared Scala libs | common/ (auth, config, dao, workflow-core, workflow-operator, pybuilder) |
build.sbt |
| Frontend (Angular) | frontend/ |
frontend/README.md |
| Agent service (Bun/TS, LLM agents) | agent-service/ |
agent-service/package.json |
| Pyright language service | pyright-language-service/ |
pyright-language-service/README.md |
| Deploy scripts / Dockerfiles | bin/ |
README / k8s / single-node |
| DDL, sbt plugins | sql/, project/ |
files therein |
| Path | Role |
|---|---|
amber/src/main/scala |
Pekko actors, scheduler, reconfiguration, fault tolerance, gRPC/proto |
amber/src/main/python/pyamber |
Python engine (pyamber) — bridge to the Scala engine |
amber/src/main/python/pytexera |
Python operator SDK exposed to UDFs |
| Topic | Source of truth |
|---|---|
| Contribution / PR / lint / format / testing / license header | CONTRIBUTING.md |
| Reporting security issues | SECURITY.md |
| PR template | .github/PULL_REQUEST_TEMPLATE |
| Issue templates | bug / task / feature |
License-header coverage; vendored workflow-operator |
.licenserc.yaml; project/AddMetaInfLicenseFiles.scala |
| Local single-node / k8s deploy | single-node, k8s |
If a topic is above, read that file instead of asking here.
- Narrowly scoped changes. No unrelated rewrites or cross-service moves.
git status --shortbefore editing; don't revert unrelated dirty files.- Never commit secrets / local config / build output / caches / binaries
(
python_udf.conf,.env,target/,dist/,.pytest_cache/,.ruff_cache/, logs).
Leave texera/ on main. One worktree per PR, branched off a freshly
fetched upstream/main.
texera/ # stays on main, never dirty
texera-worktrees/<branch>/ # one worktree per PR
Reset to upstream/main at start; git log upstream/main..HEAD should
contain only this PR's commits before pushing; remove the worktree after
merge.
| Component | Version |
|---|---|
| Java | JDK 17 |
| Scala | 2.13 |
| Python | 3.12 |
| Node | 24 |
One Python venv shared across worktrees, sibling of the texera checkout:
<workspace>/
├── texera/ # main checkout
├── texera-worktrees/<br>/ # per-PR worktrees
└── venv312/ # shared Python 3.12 venv
python3.12 -m venv ../venv312 && source ../venv312/bin/activate
pip install -r amber/requirements.txt -r amber/operator-requirements.txt
# For pytest or running bin/python-proto-gen.sh, also install dev deps:
pip install -r amber/dev-requirements.txtTests that spawn Python workers need an interpreter path. Edit python.path
in udf.conf or
export UDF_PYTHON_PATH="$(pwd)/../venv312/bin/python" (env var overrides).
Without it, sbt Python-integration tests fail to launch a worker.
.jvmopts holds every --add-opens flag Texera needs for
JDK 17+, with each group annotated by its upstream source (Kryo,
Apache Arrow, Apache Pekko). sbt's launcher and the .run/
configs read it automatically; for raw java launches, pass it as an
argfile: java @.jvmopts -jar …. If a future library version or a new
code path triggers an InaccessibleObjectException, add the open to
.jvmopts. project/JdkOptions.scala
will propagates the changed options to forked test JVMs, sbt-native-packager dist launchers,
and IntelliJ.
Short, Conventional Commits, same shape for branch and commit subject.
| Kind | Branch | Commit |
|---|---|---|
| Feature | feat/agent-workflow-edit |
feat(agent-service): enable workflow edit |
| Bug fix | fix/marker-replay |
fix(amber): marker replay during reconfiguration |
| Tests | test/pyamber-handlers |
test(pyamber): add handler unit tests |
| Chore | chore/angular-21 |
chore(deps): upgrade frontend to Angular 21 |
| CI | ci/cache-action-bump |
ci: bump coursier/cache-action to v8.1.0 |
Both ≤ ~60 chars. For code changes, if you use a scope, use the module name
(amber, pyamber, frontend, agent-service, file-service, …) — not
amber-python. Use chore(deps): ... for dependency-only updates, and
ci: ... for CI-only changes. No Co-authored-by: trailer for the repo
owner.
Issue-first; both stay short.
issue (template + Type) -> PR (Closes #N, template) -> review -> merge
- Every change starts as an issue (minor typo / docs excepted). File against
apache/texera, never a fork. - Pick the right template and set the GitHub Issue Type explicitly
(
Bug/Task/Feature); the template'stype:frontmatter doesn't always apply on creation. - Reference the issue:
Closes #N(orFixes/Resolves, or "related to"). - Issue titles are plain prose; never use the Conventional Commits
format (
type(scope): ...) — that prefix is for commit and PR titles only. - Task issues match
task-template.yamlexactly. - Prefer tables and small ASCII diagrams over long bullets. Don't restate the diff or the template.
- For bugs, lead with root cause and a before -> after sketch:
Before: reconfiguration -> replay marker -> worker hangs After: reconfiguration -> replay marker -> resume from checkpoint - Frontend PRs: any visible UI change requires screenshots / GIF, before / after side by side. For purely visual fixes that's the primary verification under "How was this PR tested?"; interactive flows also list manual steps (click path, browser, viewport).
TDD. Write the test before the source change.
write/adjust test (red) -> edit source (green) -> refactor
| Situation | Order |
|---|---|
| New feature / behavior change | Failing test, then implement. |
| Bug fix | Regression test reproducing the bug, then fix. |
| Code with no tests | Characterization tests pin current behavior first; only then change source. |
| Refactor (no behavior change) | Tests stay green throughout — no assertion edits. |
Every test must cover:
- Both directions: positive (valid → expected) and negative (invalid / error → specific failure mode).
- Edge cases: empty / null / zero / max / boundary, unicode, concurrency/order, missing or malformed config.
- Don't assume valid. External input (user / API / file / message) must be tested with bad input.
Don't claim "tested" without commands. Paste the exact sbt testOnly /
pytest / yarn test:ci / bun test invocation under "How was this PR
tested?".
CI runs are selected by PR labels, not by file diff.
diff -> pr-labeler -> labels on PR -> required-checks maps labels to stacks -> CI runs
- Path → label rules:
.github/labeler.yml - Label → stacks (
LABEL_STACKS, source of truth):.github/workflows/required-checks.yml. Read it directly; don't duplicate the mapping here. - Need extra coverage the diff doesn't imply (e.g. a
common/change you suspect breaks the frontend)? Add the relevant label manually. - Empty stack union (docs-only / dev-only /
dependencies/feature/fix/refactor/release/*only) skips every build stack on purpose. release/*labels select backport targets; removing one cancels that backport.