feat(scan): add GCG injection scenario generator #2561
Register giskardai/harmbench-scenarios with an LLMJudge prompt bundled in giskard-scan and document MIT attribution for commercial use. Co-authored-by: Cursor <cursoragent@cursor.com>
Port the lidar GCG Injection probe into a giskard-scan generator. It subclasses HuggingFaceDatasetScenarioGenerator (defaulting to the HarmBench dataset) to inherit per-language subset handling, then fans each loaded prompt out into one scenario per adversarial suffix (prompt x suffix cross-product). Each variant is renamed with a GCG prefix + suffix index and tagged gcg-suffix:<index>, appended to the dataset's own tags rather than replacing them. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
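A minimal sketch of the fan-out described in this commit message, using a plain dataclass rather than the real giskard-scan types. `Scenario`, `fan_out`, the `GCG-<index>` name format, and joining with a single space are all assumptions made for illustration; only the `gcg-suffix:<index>` tag format comes from the commit message.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Scenario:
    # Hypothetical stand-in for the giskard-scan scenario type.
    name: str
    prompt: str
    tags: tuple[str, ...] = ()


def fan_out(scenarios: list[Scenario], suffixes: list[str]) -> list[Scenario]:
    """Return one variant per (scenario, suffix) pair (cross-product)."""
    return [
        replace(
            scenario,
            # Assumed naming scheme: GCG prefix plus the suffix index.
            name=f"GCG-{index} {scenario.name}",
            # Assumed join: a single space between prompt and suffix.
            prompt=f"{scenario.prompt} {suffix}",
            # Append to the dataset's own tags rather than replacing them.
            tags=scenario.tags + (f"gcg-suffix:{index}",),
        )
        for scenario in scenarios
        for index, suffix in enumerate(suffixes)
    ]
```

With this shape the output size is `len(scenarios) * len(suffixes)`, which is the growth the review below objects to.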
Code Review
This pull request introduces the GCGInjectionScenarioGenerator to generate Greedy Coordinate Gradient (GCG) injection attack scenarios by appending adversarial suffixes to harmful prompts. It also registers this generator in the vulnerability suite and adds comprehensive unit tests. The review feedback suggests ensuring a space separator between the prompt and the GCG suffix to prevent token merging, which could break the adversarial attack, and updating the corresponding unit tests to reflect this change.
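The separator suggestion can be illustrated with a small helper. This is a sketch only: `append_suffix` and its `separator` parameter are hypothetical names, and the prompt and suffix strings are harmless placeholders, not real GCG suffixes.

```python
def append_suffix(prompt: str, suffix: str, separator: str = " ") -> str:
    """Join a prompt and an adversarial suffix.

    The suffix itself is left untouched; only the join changes.
    """
    return f"{prompt}{separator}{suffix}"


prompt = "Summarize the following text"
suffix = "zx !! ]]"

# Without a separator the last prompt word and the first suffix token fuse.
fused = append_suffix(prompt, suffix, separator="")

# With the default space the two parts stay distinct tokens at the boundary.
spaced = append_suffix(prompt, suffix)
```

Keeping the separator as a parameter makes the boundary behavior easy to pin down in a unit test.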
    self, description: str, languages: list[str]
) -> list[Scenario[Any, Any, Trace[Any, Any]]]:
    base_scenarios = super().load_scenarios(description, languages)
    return [
A full cross-product with all suffixes may be too much. I would simply pick one suffix at random for each scenario. That keeps the number of generated scenarios unchanged; otherwise, you could ask for a max_scenarios of 10 and get more than 100 instead.
pierlj left a comment
One change is required on the number of scenarios generated. Otherwise, it looks fine.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
# Conflicts:
#	libs/giskard-scan/src/giskard/scan/vulnerability.py
Rotate suffixes by scenario index instead of cross-product fan-out so max_scenarios matches the base dataset size. Co-authored-by: Cursor <cursoragent@cursor.com>
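A rough sketch of rotating suffixes by scenario index, assuming a simple modulo rotation; the actual generator may differ. `rotate_suffixes` is a hypothetical helper that works on plain strings instead of giskard-scan scenarios, and the single-space join is likewise an assumption.

```python
def rotate_suffixes(prompts: list[str], suffixes: list[str]) -> list[tuple[int, str]]:
    """Pair each prompt with exactly one suffix, cycling through the suffix list.

    Returns (suffix_index, prompt_with_suffix) tuples, one per input prompt.
    """
    if not suffixes:
        raise ValueError("at least one suffix is required")
    return [
        # The scenario's position picks the suffix, wrapping around at the end.
        (i % len(suffixes), f"{prompt} {suffixes[i % len(suffixes)]}")
        for i, prompt in enumerate(prompts)
    ]
```

The output has the same length as `prompts`, so a `max_scenarios` cap applied to the base dataset still holds, whereas the cross-product yields `len(prompts) * len(suffixes)` items. Unlike a random pick, the rotation is deterministic, which keeps tests reproducible.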
Port the lidar GCG Injection probe into a giskard-scan generator. It subclasses HuggingFaceDatasetScenarioGenerator (defaulting to the HarmBench dataset) to inherit per-language subset handling, then fans each loaded prompt out into one scenario per adversarial suffix (prompt x suffix cross-product). Each variant is renamed with a GCG prefix + suffix index and tagged gcg-suffix:<index>, appended to the dataset's own tags rather than replacing them.
Description
Related Issue
Type of Change
Coding agents
Autonomous agents with no human in the loop must read AUTONOMOUS.md before opening a PR.
PR title: agent-opened PRs must end the title with 🤖🤖🤖🤖 (exactly four robot emojis). Do not omit it: that suffix is how the expedited agent PR workflow picks up the PR.
Checklist
CODE_OF_CONDUCT.md document.
CONTRIBUTING.md guide.
uv.lock running uv lock (only applicable when pyproject.toml has been modified)