feat(scan): add GCG injection scenario generator #2561

Open
kevinmessiaen wants to merge 5 commits into main from feat/scan-gcg-generator

@kevinmessiaen

Member

Port the lidar GCG Injection probe into a giskard-scan generator. It subclasses HuggingFaceDatasetScenarioGenerator (defaulting to the HarmBench dataset) to inherit per-language subset handling, then fans each loaded prompt out into one scenario per adversarial suffix (prompt × suffix cross-product). Each variant is renamed with a GCG prefix plus the suffix index and tagged gcg-suffix:<index>; that tag is appended to the dataset's own tags rather than replacing them.
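The fan-out described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `ToyScenario` and `fan_out` are hypothetical stand-ins for the giskard-scan types, and the exact name format and the single-space join between prompt and suffix are assumptions.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ToyScenario:
    """Hypothetical stand-in for a loaded scenario (not the giskard-scan type)."""

    name: str
    prompt: str
    tags: tuple[str, ...] = ()


def fan_out(base: list[ToyScenario], suffixes: list[str]) -> list[ToyScenario]:
    """Return one variant per (scenario, suffix) pair, i.e. the cross-product."""
    return [
        replace(
            scenario,
            # Assumed naming scheme: "GCG" prefix plus the suffix index.
            name=f"GCG {scenario.name} #{index}",
            # Assumed join: a single space between prompt and suffix.
            prompt=f"{scenario.prompt} {suffix}",
            # Append to the existing tags instead of replacing them.
            tags=(*scenario.tags, f"gcg-suffix:{index}"),
        )
        for scenario in base
        for index, suffix in enumerate(suffixes)
    ]
```

With 2 base scenarios and 3 suffixes this yields 6 variants, each keeping its original tags and gaining one gcg-suffix:<index> tag.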

Description

Related Issue

Type of Change

  • 📚 Examples / docs / tutorials / dependencies update
  • 🔧 Bug fix (non-breaking change which fixes an issue)
  • 🥂 Improvement (non-breaking change which improves an existing feature)
  • 🚀 New feature (non-breaking change which adds functionality)
  • 💥 Breaking change (fix or feature that would cause existing functionality to change)
  • 🔐 Security fix

Coding agents

Autonomous agents with no human in the loop must read AUTONOMOUS.md before opening a PR.

PR title: agent-opened PRs must end the title with 🤖🤖🤖🤖 (exactly four robot emojis). Do not omit — that suffix is how the expedited agent PR workflow picks up the PR.

Checklist

  • I've read the CODE_OF_CONDUCT.md document.
  • I've read the CONTRIBUTING.md guide.
  • I've written tests for all new methods and classes that I created.
  • I've written the docstring in NumPy format for all the methods and classes that I created or modified.
  • I've updated uv.lock by running uv lock (only applicable when pyproject.toml has been
    modified)

kevinmessiaen and others added 2 commits June 24, 2026 14:58
Register giskardai/harmbench-scenarios with an LLMJudge prompt bundled in
giskard-scan and document MIT attribution for commercial use.

Co-authored-by: Cursor <cursoragent@cursor.com>
Port the lidar GCG Injection probe into a giskard-scan generator. It
subclasses HuggingFaceDatasetScenarioGenerator (defaulting to the
HarmBench dataset) to inherit per-language subset handling, then fans
each loaded prompt out into one scenario per adversarial suffix
(prompt x suffix cross-product). Each variant is renamed with a GCG
prefix + suffix index and tagged gcg-suffix:<index>, appended to the
dataset's own tags rather than replacing them.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
kevinmessiaen requested a review from pierlj June 24, 2026 10:16

gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request introduces the GCGInjectionScenarioGenerator to generate Greedy Coordinate Gradient (GCG) injection attack scenarios by appending adversarial suffixes to harmful prompts. It also registers this generator in the vulnerability suite and adds comprehensive unit tests. The review feedback suggests ensuring a space separator between the prompt and the GCG suffix to prevent token merging, which could break the adversarial attack, and updating the corresponding unit tests to reflect this change.
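The separator suggestion can be illustrated with a small sketch. `append_suffix` is a hypothetical helper, not a function from this PR, and the prompt and suffix strings are harmless placeholders. Whether a missing space actually changes tokenization depends on the target model's tokenizer, so treat this as a sketch of the string handling only.

```python
def append_suffix(prompt: str, suffix: str) -> str:
    """Join prompt and suffix with one space so the two do not run together."""
    return f"{prompt} {suffix}"


prompt = "Example request."
suffix = "xx yy !!"

# Plain concatenation glues the last prompt character to the first suffix character.
glued = prompt + suffix
# The separated form keeps the suffix as its own whitespace-delimited span.
separated = append_suffix(prompt, suffix)
```

Here `glued` is "Example request.xx yy !!" while `separated` is "Example request. xx yy !!".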

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread libs/giskard-scan/src/giskard/scan/generators/gcg.py
Comment thread libs/giskard-scan/tests/generators/test_gcg.py Outdated
        self, description: str, languages: list[str]
    ) -> list[Scenario[Any, Any, Trace[Any, Any]]]:
        base_scenarios = super().load_scenarios(description, languages)
        return [

Member

Maybe the cross product with all suffixes is too much. I would simply pick one suffix at random for each scenario. That keeps the number of generated scenarios unchanged; otherwise, you could ask for max_scenarios of 10 and get more than 100 instead.
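The size concern can be made concrete with a short sketch. The count of 11 suffixes is a hypothetical figure chosen only so that the product exceeds 100, and `pick_random_suffixes` is an illustrative helper, not code from this PR.

```python
import random


def pick_random_suffixes(
    prompts: list[str], suffixes: list[str], seed: int = 0
) -> list[str]:
    """Attach one randomly chosen suffix to each prompt (output size == input size)."""
    rng = random.Random(seed)  # seeded so repeated runs give the same picks
    return [f"{prompt} {rng.choice(suffixes)}" for prompt in prompts]


prompts = [f"prompt-{i}" for i in range(10)]   # e.g. max_scenarios = 10
suffixes = [f"suffix-{j}" for j in range(11)]  # hypothetical suffix count

cross_product_size = len(prompts) * len(suffixes)  # 10 * 11 = 110 scenarios
sampled = pick_random_suffixes(prompts, suffixes)  # still 10 scenarios
```

Random selection keeps the scenario count equal to the number of loaded prompts, at the cost of not exercising every suffix against every prompt.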

pierlj left a comment

Member

One change is required regarding the number of scenarios generated. Otherwise, it looks fine.

Base automatically changed from feat/harmbench-dataset to main June 25, 2026 03:04
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
# Conflicts:
#	libs/giskard-scan/src/giskard/scan/vulnerability.py
Rotate suffixes by scenario index instead of cross-product fan-out so
max_scenarios matches the base dataset size.

Co-authored-by: Cursor <cursoragent@cursor.com>
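A minimal sketch of rotating suffixes by scenario index, assuming "rotate" means a modulo over the suffix list (the commit message does not spell this out). `rotate_suffixes` is a hypothetical helper, not the function in this PR.

```python
def rotate_suffixes(prompts: list[str], suffixes: list[str]) -> list[tuple[str, int]]:
    """Pair each prompt with exactly one suffix, cycling through the suffix list.

    Returns (prompt_with_suffix, suffix_index) tuples, one per input prompt.
    """
    if not suffixes:
        raise ValueError("at least one suffix is required")
    result = []
    for scenario_index, prompt in enumerate(prompts):
        # Assumed rotation rule: scenario i uses suffix i modulo the suffix count.
        suffix_index = scenario_index % len(suffixes)
        result.append((f"{prompt} {suffixes[suffix_index]}", suffix_index))
    return result


rotated = rotate_suffixes(["p0", "p1", "p2", "p3", "p4"], ["s0", "s1", "s2"])
```

Five prompts and three suffixes give five scenarios using suffix indices 0, 1, 2, 0, 1, so the output size matches the base dataset and the assignment is deterministic, unlike a random pick.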
2 participants