Commit 36efeb9
soluwalana, github-advanced-security[bot], and anubhutivyas authored
feat: add customizer as a plugin - AALGO-215 (#13)
* feat: plugin-contributed authorization policy merge

  Add AuthzContribution discovery from nemo.authz entry points, NemoService classes, and customization contributors; merge into OPA bundle at runtime and via auth-tools sync-plugins.

  Pass Authorization and other SDK default headers through submit_remote so protected routes receive credentials.

  Extend test_authz with a contributor example and coverage for authz discovery plus authenticated submit.

* feat: add customizer as a plugin - AIRCORE-350

  - Respect the NMP_PLATFORM_SEED_AUTH_ENABLED=false env var
  - Auth is required for local seed, update the default for local env
  - Fix bug with submit missing credentials
  - Add functionality to allow plugins to update authz routes: uv run python services/core/auth/scripts/auth-tools.py sync-plugins
  - Add skill for customization
  - Add sizing guidance to the SKILL

* feat(customizer): add unsloth backend plugin

* Cleanup and PR comment addressing

  - Merge unsloth docker bake into top level docker bake
  - Potential fix for pull request finding 'CodeQL / Empty except'
  - Fix remaining tests
  - Update e2e conftest
  - Fix code rabbit changes
  - Code rabbit is incorrect, ctx is definitely required
  - Cleanup the image and registries in docker-bake.hcl
  - Add both platforms to the workspace steps
  - Identify sub-plugins auth in the customizer router
  - Reduce build time on base
  - Regenerate stainless
  - Fix python types
  - Remove dead code
  - Fix skill to not reference run for unsloth
  - Fix bug with validation path on unsloth
  - Fix unsloth build caching
  - Fix _pickle.PicklingError: Can't pickle <class 'trl.trainer.sft_config.SFTConfig'>: it's not the same object as trl.trainer.sft_config.SFTConfig
  - Add progress callback to unsloth
  - Force the agent to avoid 2>&1
  - General scripts shouldn't be tied to automodel

* fix accidental changes

* lint fix

* get_qualified_image moved

* Fix startup for e2e tests

* Don't leak devops configurations

* everything is fine

* wait_for_model_entity() is conflating the IGW model entity cache and the virtual model in the IGW cache; separate them

  Fix the issue with AUTOMODEL and UNSLOTH image selection: use the platform image_registry unless overridden

* Resolve discrepancy between unsloth and automodel metric reporting

* test fixes

* PR Comment fixes

* Try to resolve flaky IGW test only occurring under xdist

* Fix tests

* Fix tests

* Remove unused bake variable, pin unsloth

* Fix missing mkdocs.yaml in dockerfile

* update platform-workspace comments after Fern docs migration

* Resolve python type merge conflict with VLLM PR

* Allow lint-fix to report on whether the fix was successful afterwards, fix 503s from storage when using HF repos

* lint fix

* Remove type hint fixes

* Address Aaron's comments:

  - Fix discovery of the SDK from contributors instead of hardcoding the SDK contributions in the router
  - Delete services/customizer dead code

  To fix in the refactor:

  - Opinionated shared parameters
  - Integrations object
  - Epoch behavior
  - Steps behavior
  - All Weights / Full training type discrepancies

* Update test cases to supply required sdk contributions

  More usable output in the skill

---------

Signed-off-by: Sam Oluwalana <soluwalana@nvidia.com>
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Co-authored-by: anubhutiv <anubhutiv@nvidia.com>
1 parent 39b6c19 commit 36efeb9

450 files changed

Lines changed: 21284 additions & 29402 deletions

.cursor/rules/nemo-platform.mdc

Lines changed: 1 addition & 2 deletions
@@ -37,9 +37,8 @@ User-facing skills in `packages/nemo_platform_ext/src/nemo_platform_ext/skills/`
 - `nemo-try-agent`: test a deployed agent or chat with a model.
 - `nemo-status`: read-only health dashboard. Run this before assuming the platform is up.
 - `nemo-teardown`: guided shutdown with confirmation.
-- `nemo-fine-tune`: fine-tuning. Not yet available; the skill tells the user this honestly instead of letting you improvise.

-Plugin-owned skills under `plugins/*/src/*/skills/` handle their own routing for guardrails, evaluations, optimization, data designer, anonymizer, and auditor.
+Plugin-owned skills under `plugins/*/src/*/skills/` handle their own routing for customization, guardrails, evaluations, optimization, data designer, anonymizer, and auditor.

 ## Sandboxed environments

.dockerignore

Lines changed: 0 additions & 1 deletion
@@ -2,7 +2,6 @@ docker-bake.hcl
 .venv
 .ruff_cache
 Dockerfile.bake
-services/customizer/tests
 **/Dockerfile*
 .dockerignore
 **/__pycache__

.gitattributes

Lines changed: 0 additions & 44 deletions
@@ -36,50 +36,6 @@ third_party/requirements*.txt linguist-generated
 docker/locks/**/uv.lock linguist-generated
 documentation/docs/audit/_snippets/output/*.jsonl filter=lfs diff=lfs merge=lfs -text
 documentation/docs/generate-synthetic-data/images/* filter=lfs diff=lfs merge=lfs -text
-services/customizer/tests/testdata/grpo/workbench/training.jsonl filter=lfs diff=lfs merge=lfs -text
-services/customizer/tests/testdata/grpo/comp_coding/training.jsonl filter=lfs diff=lfs merge=lfs -text
-services/customizer/tests/testdata/grpo/instruction_following/training.jsonl filter=lfs diff=lfs merge=lfs -text
-services/customizer/tests/testdata/grpo/multineedle/validation.jsonl filter=lfs diff=lfs merge=lfs -text
-services/customizer/tests/testdata/grpo/python_math_exec/training.jsonl filter=lfs diff=lfs merge=lfs -text
-services/customizer/tests/testdata/grpo/mcqa/training.jsonl filter=lfs diff=lfs merge=lfs -text
-services/customizer/tests/testdata/grpo/comp_coding/validation.jsonl filter=lfs diff=lfs merge=lfs -text
-services/customizer/tests/testdata/grpo/google_search/training.jsonl filter=lfs diff=lfs merge=lfs -text
-services/customizer/tests/testdata/grpo/google_search/validation.jsonl filter=lfs diff=lfs merge=lfs -text
-services/customizer/tests/testdata/grpo/instruction_following/validation.jsonl filter=lfs diff=lfs merge=lfs -text
-services/customizer/tests/testdata/grpo/library_judge_math/training.jsonl filter=lfs diff=lfs merge=lfs -text
-services/customizer/tests/testdata/grpo/library_judge_math/validation.jsonl filter=lfs diff=lfs merge=lfs -text
-services/customizer/tests/testdata/grpo/multiverse_math_hard/training.jsonl filter=lfs diff=lfs merge=lfs -text
-services/customizer/tests/testdata/grpo/workbench/validation.jsonl filter=lfs diff=lfs merge=lfs -text
-services/customizer/tests/testdata/grpo/mcqa/validation.jsonl filter=lfs diff=lfs merge=lfs -text
-services/customizer/tests/testdata/grpo/multineedle/training.jsonl filter=lfs diff=lfs merge=lfs -text
-services/customizer/tests/testdata/grpo/multiverse_math_hard/validation.jsonl filter=lfs diff=lfs merge=lfs -text
-services/customizer/tests/testdata/grpo/python_math_exec/validation.jsonl filter=lfs diff=lfs merge=lfs -text
-services/customizer/tests/tasks/file_io/data/files_to_upload/nested_2/__0_0.distcp filter=lfs diff=lfs merge=lfs -text
-services/customizer/tests/python/training/nemo/data/mp_rank_00_customization.nemo filter=lfs diff=lfs merge=lfs -text
-services/customizer/tests/python/training/nemo/data/customization.nemo filter=lfs diff=lfs merge=lfs -text
-services/customizer/tests/python/training/nemo/data/gpt2b_tp1_lora.nemo filter=lfs diff=lfs merge=lfs -text
-services/customizer/tests/python/training/nemo/data/gpt8b_tp4_lora.nemo filter=lfs diff=lfs merge=lfs -text
-services/customizer/tests/python/training/nemo/data/expected_llmservice_peft_lora/model_weights.ckpt filter=lfs diff=lfs merge=lfs -text
-services/customizer/tests/python/training/nemo/data/expected_llmservice_peft_lora_tp4/model_weights.ckpt filter=lfs diff=lfs merge=lfs -text
-services/customizer/tests/python/data/gpt_126m.nemo filter=lfs diff=lfs merge=lfs -text
-services/customizer/tests/testdata/e2e-eval/email-composition-train/training/training_file.jsonl filter=lfs diff=lfs merge=lfs -text
-services/customizer/tests/testdata/e2e-eval/email-composition-train/validation/validation_file.jsonl filter=lfs diff=lfs merge=lfs -text
-services/customizer/tests/testdata/e2e-eval/email-composition-eval/email_eval_ms_test.json filter=lfs diff=lfs merge=lfs -text
-services/customizer/tests/testdata/e2e-eval/email-composition-eval/email_eval_ms_test_sft.json filter=lfs diff=lfs merge=lfs -text
-services/customizer/tests/testdata/gpt-sft-chat-dataset/e21a501b3cc14174835d787ced1583e2_tokenizer.model filter=lfs diff=lfs merge=lfs -text
-services/customizer/tests/testdata/gpt-sft-chat-dataset/llama2_tokenizer.model filter=lfs diff=lfs merge=lfs -text
-services/customizer/tests/testdata/gpt-sft-chat-dataset/merges.txt filter=lfs diff=lfs merge=lfs -text
-services/customizer/tests/testdata/gpt-sft-chat-dataset/tokenizer.model filter=lfs diff=lfs merge=lfs -text
-services/customizer/tests/testdata/gpt-sft-chat-dataset/vocab.json filter=lfs diff=lfs merge=lfs -text
-services/customizer/tests/testdata/e2e-eval/email-composition-convo/training/training_file.jsonl filter=lfs diff=lfs merge=lfs -text
-services/customizer/tests/testdata/e2e-eval/email-composition-convo/validation/validation_file.jsonl filter=lfs diff=lfs merge=lfs -text
-services/customizer/tests/testdata/tool-calling/xlam_openai_format.jsonl filter=lfs diff=lfs merge=lfs -text
-services/customizer/tests/testdata/tool-calling/training.jsonl filter=lfs diff=lfs merge=lfs -text
-services/customizer/tests/testdata/tool-calling/validation.jsonl filter=lfs diff=lfs merge=lfs -text
-services/customizer/tests/testdata/tool-calling/testing.jsonl filter=lfs diff=lfs merge=lfs -text
-services/customizer/tests/testdata/embedding/training/training.jsonl filter=lfs diff=lfs merge=lfs -text
-services/customizer/tests/testdata/embedding/testing.jsonl filter=lfs diff=lfs merge=lfs -text
-services/customizer/tests/testdata/embedding/validation/validation.jsonl filter=lfs diff=lfs merge=lfs -text
 # Files maintained by external garak project
 packages/garak_api/garakapi/_config.py linguist-generated
 packages/garak_api/garakapi/_plugins.py linguist-generated

.github/trufflehog-exclude.txt

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+# Newline-separated regexes for paths TruffleHog should skip.
+# uv.lock contains many sha256 hex digests that false-positive as SentryToken.
+uv\.lock
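
To show which paths a pattern like `uv\.lock` would skip, here is a minimal sketch that treats each non-comment line as an unanchored regex. It uses Python's `re` module as a stand-in for the scanner's own regex engine, and it assumes `#` lines are ignored, as the file's header comments suggest. The helper names are hypothetical.

```python
import re

# Contents of the exclude file as added in this commit.
EXCLUDE_FILE_TEXT = r"""
# Newline-separated regexes for paths TruffleHog should skip.
# uv.lock contains many sha256 hex digests that false-positive as SentryToken.
uv\.lock
"""


def load_exclude_patterns(text: str) -> list[re.Pattern[str]]:
    """Compile every non-blank, non-comment line as a regex."""
    patterns = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        patterns.append(re.compile(stripped))
    return patterns


def is_excluded(path: str, patterns: list[re.Pattern[str]]) -> bool:
    """Return True if any pattern matches anywhere in the path."""
    return any(p.search(path) for p in patterns)


patterns = load_exclude_patterns(EXCLUDE_FILE_TEXT)
print(is_excluded("docker/locks/core/uv.lock", patterns))
print(is_excluded("pyproject.toml", patterns))
```

Because the pattern is unanchored, it matches a `uv.lock` file at any depth in the tree. The escaped dot keeps it from matching names such as `uv_lock.py`.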

.github/workflows/ci.yaml

Lines changed: 1 addition & 0 deletions
@@ -303,6 +303,7 @@ jobs:
 env:
   PYTHON_VERSION: ${{ matrix.python-version }}
   NMP_DATA_DIR: ${{ runner.temp }}/nemo-data
+  NMP_AUTH_ENABLED: "false"
   _TYPER_FORCE_DISABLE_TERMINAL: "1"
 run: |
   set -euo pipefail

.github/workflows/security.yaml

Lines changed: 1 addition & 0 deletions
@@ -38,6 +38,7 @@ jobs:
   with:
     path: ./
     version: 3.95.3
+    extra_args: --exclude-paths=.github/trufflehog-exclude.txt

 - name: Scan Results Status
   if: ${{ github.event_name != 'merge_group' && steps.trufflehog.outcome == 'failure' }}

.pre-commit-config.yaml

Lines changed: 0 additions & 1 deletion
@@ -38,7 +38,6 @@ repos:
 packages/nmp_common/src/nmp_common/api/.*|

 # Individual microservices
-services/customizer/src/customizer/api/v1/.*|
 services/evaluator/src/evaluator/api/.*|
 services/guardrails/src/guardrails/api/.*|
 services/core/infrastructure/jobs/src/jobs/api/.*|

AGENTS.md

Lines changed: 1 addition & 2 deletions
@@ -33,9 +33,8 @@ User-facing skills in `packages/nemo_platform_ext/src/nemo_platform_ext/skills/`
 - `nemo-try-agent`: test a deployed agent or chat with a model.
 - `nemo-status`: read-only health dashboard.
 - `nemo-teardown`: guided shutdown with confirmation.
-- `nemo-fine-tune`: fine-tuning. Not yet available; the skill tells the user it's not shipped instead of improvising with another training library.

-Plugin-owned skills under `plugins/*/src/*/skills/` handle guardrails, evaluations, optimization, data designer, anonymizer, and auditor.
+Plugin-owned skills under `plugins/*/src/*/skills/` handle their own routing for customization, guardrails, evaluations, optimization, data designer, anonymizer, and auditor.

 ### Working in a sandboxed environment

CLAUDE.md

Lines changed: 1 addition & 2 deletions
@@ -33,9 +33,8 @@ User-facing skills in `packages/nemo_platform_ext/src/nemo_platform_ext/skills/`
 - `nemo-try-agent`: test a deployed agent or chat with a model.
 - `nemo-status`: read-only health dashboard. Run this before assuming the platform is up.
 - `nemo-teardown`: guided shutdown with confirmation.
-- `nemo-fine-tune`: fine-tuning. Not yet available; the skill tells the user it's not shipped instead of letting the agent improvise with another training library.

-Plugin-owned skills live under `plugins/*/src/*/skills/` and handle their own routing for guardrails, evaluations, optimization, data designer, anonymizer, and auditor.
+Plugin-owned skills live under `plugins/*/src/*/skills/` and handle their own routing for customization, guardrails, evaluations, optimization, data designer, anonymizer, and auditor.

 ### Working in a sandboxed coding-agent environment

Makefile

Lines changed: 8 additions & 6 deletions
@@ -228,9 +228,11 @@ check-copyright-headers:
 lint: ## Run all linters (licenses, openapi, config docs, python style/types/sdk, vendored SDK, CLI, auth config)
 	bash tools/lint/lint-all.sh

+LINT_FIX_VERIFY ?= 0
+
 .PHONY: lint-fix
-lint-fix: ## Auto-fix lint issues in dependency order (openapi → stainless → style → cli → vendor → licenses → config-docs)
-	bash tools/lint/lint-fix.sh
+lint-fix: ## Auto-fix lint issues (set LINT_FIX_VERIFY=1 to also run CI lint checks)
+	LINT_FIX_VERIFY=$(LINT_FIX_VERIFY) bash tools/lint/lint-fix.sh

 .PHONY: vendor
 vendor: ## Vendor packages into the SDK and generate wrapper metadata
@@ -477,10 +479,10 @@ test-e2e-kubernetes-gpu: ## Run GPU e2e tests against Kubernetes (requires GPU n
 	@echo "Running GPU e2e tests with Kubernetes with feature gpu enabled..."
 	uv run --frozen pytest e2e --kubernetes --feature gpu -v --junitxml=report-kubernetes-gpu.xml

-.PHONY: test-e2e-kubernetes-gpu-customizer
-test-e2e-kubernetes-gpu-customizer: ## Run GPU customizer e2e tests against Kubernetes (requires GPU nodes; set NMP_E2E_CLUSTER_URL)
-	@echo "Running GPU customizer e2e tests with Kubernetes..."
-	uv run --frozen pytest e2e/test_customizer.py --kubernetes --feature gpu --feature customizer --log-cli-level=INFO -v --junitxml=report-kubernetes-gpu-customizer.xml
+.PHONY: test-e2e-kubernetes-gpu-automodel
+test-e2e-kubernetes-gpu-automodel: ## Run GPU automodel customization e2e tests against Kubernetes (requires GPU nodes; set NMP_E2E_CLUSTER_URL)
+	@echo "Running GPU automodel customization e2e tests with Kubernetes..."
+	uv run --frozen pytest tests/agentic-use/customizer-lora-job-cli/tests/test_outputs.py --kubernetes --feature gpu --log-cli-level=INFO -v --junitxml=report-kubernetes-gpu-automodel.xml

 .PHONY: benchmark-guardrails
 benchmark-guardrails: ## Run nemo-guardrails IGW benchmark sweep (set BENCHMARK_ARGS for extra flags)
