feat(typologies): add structuring (smurfing) pattern generator #17

Open
AllanSevilla05 wants to merge 4 commits into SantanderAI:main from AllanSevilla05:feat/structuring-typology

@AllanSevilla05

Summary

Adds StructuringGenerator — a second fraud typology implementing
BSA/FinCEN structuring (smurfing) patterns, where multiple smurf
accounts each send sub-$10,000 amounts to a single coordinator to
avoid Currency Transaction Report (CTR) filing.

The repo previously only generated cyclic ring patterns. Structuring
is the most commonly filed SAR typology and has a structurally
distinct graph signature (fan-in star vs cycle), which stresses
different subgraph detection algorithms.
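
To make the difference in graph signature concrete, here is a minimal sketch that is not taken from the repo. It builds a toy cycle and a toy fan-in star as plain edge lists and compares the largest in-degree in each. The helper names (`cycle_edges`, `fan_in_edges`, `max_in_degree`) and account labels are hypothetical.

```python
from collections import Counter


def cycle_edges(accounts):
    """Ring pattern: each account pays the next, the last pays the first."""
    n = len(accounts)
    return [(accounts[i], accounts[(i + 1) % n]) for i in range(n)]


def fan_in_edges(smurfs, coordinator):
    """Structuring pattern: every smurf pays the same coordinator."""
    return [(smurf, coordinator) for smurf in smurfs]


def max_in_degree(edges):
    """Largest number of incoming edges on any single account."""
    return max(Counter(dst for _, dst in edges).values())


ring = cycle_edges(["A", "B", "C", "D"])
star = fan_in_edges(["S1", "S2", "S3", "S4"], "COORD")

# In a cycle every node has in-degree 1; in a fan-in star the
# coordinator's in-degree equals the number of smurfs.
print(max_in_degree(ring), max_in_degree(star))  # prints: 1 4
```

A detector that only searches for closed loops would find nothing in `star`, which is why the two typologies exercise different algorithms.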

Changes

  • typologies.py — StructuringGenerator dataclass with
    STRUCTURING_DESCRIPTIONS (8 realistic smurfing descriptions)
  • config.py — three new fields: num_structuring_patterns,
    structuring_smurfs_range (default 3–10 smurfs),
    structuring_amount_range (default $8,000–$9,900 sub-threshold)
  • generator.py — wired into Phase 3; tx IDs chain from ring
    generator's next_tx_id to avoid collisions
  • verify.py — dispatches on pattern_type so fan-in star
    patterns verify correctly alongside cycle rings
  • tests/test_generator.py — 8 new tests in
    TestStructuringGenerator
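
For reviewers who want a feel for the shape of these changes without opening the diff, the following is a rough sketch under stated assumptions, not the code in this PR. `StructuringSketch`, `verify_pattern`, the dictionary keys, and the `"ring"` / `"structuring"` labels are all hypothetical stand-ins. It shows the three ideas from the list: config-driven ranges, transaction IDs that continue from a supplied starting ID, and verification that dispatches on `pattern_type`.

```python
import random
from dataclasses import dataclass


@dataclass
class StructuringSketch:
    """Hypothetical stand-in for StructuringGenerator; not the PR's code."""

    smurfs_range: tuple = (3, 10)             # mirrors structuring_smurfs_range
    amount_range: tuple = (8_000.0, 9_900.0)  # mirrors structuring_amount_range
    seed: int = 0

    def generate(self, accounts, num_patterns, start_tx_id):
        """Return (transactions, next_tx_id) for fan-in star patterns."""
        rng = random.Random(self.seed)
        txs, tx_id = [], start_tx_id
        for pattern_id in range(num_patterns):
            # Cap the smurf count so a tiny account pool still works.
            k = min(rng.randint(*self.smurfs_range), len(accounts) - 1)
            coordinator, *smurfs = rng.sample(accounts, k + 1)
            for src in smurfs:
                txs.append({
                    "tx_id": tx_id,
                    "src": src,
                    "dst": coordinator,
                    "amount": round(rng.uniform(*self.amount_range), 2),
                    "pattern_type": "structuring",
                    "pattern_id": pattern_id,
                })
                tx_id += 1
        return txs, tx_id


def verify_pattern(txs):
    """Dispatch on pattern_type: a star has one sink, a ring closes on itself."""
    kind = txs[0]["pattern_type"]
    if kind == "structuring":
        return len({t["dst"] for t in txs}) == 1
    if kind == "ring":
        return all(a["dst"] == b["src"] for a, b in zip(txs, txs[1:] + txs[:1]))
    raise ValueError(f"unknown pattern_type: {kind}")


accounts = [f"ACC{i}" for i in range(20)]
ring_next_tx_id = 1_000  # pretend the ring generator stopped here
txs, next_id = StructuringSketch().generate(accounts, 2, ring_next_tx_id)
```

Returning the next free ID alongside the transactions lets a later generator keep chaining in the same way, so no two phases need to know about each other's ID ranges.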

Test results

50 passed, 0 failed — coverage 97.93%

Domain note

Structuring thresholds modelled on 31 U.S.C. § 5324 and FinCEN
guidance. Default amount range ($8,000–$9,900) reflects common
real-world smurfing behaviour documented in SARs.
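
As a quick worked check of what these defaults imply (a sketch only; the variable names are illustrative and the $10,000 threshold is the figure quoted in the Summary), each individual transfer stays below the threshold while the coordinator's total does not:

```python
CTR_THRESHOLD = 10_000.00

smurfs_range = (3, 10)               # default structuring_smurfs_range
amount_range = (8_000.00, 9_900.00)  # default structuring_amount_range

# Smallest and largest totals a single coordinator can receive.
min_total = smurfs_range[0] * amount_range[0]
max_total = smurfs_range[1] * amount_range[1]

# No single transfer reaches the threshold, but every pattern's total does.
assert amount_range[1] < CTR_THRESHOLD
assert min_total > CTR_THRESHOLD

print(min_total, max_total)  # prints: 24000.0 99000.0
```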

AllanSevilla05 requested review from a team as code owners June 25, 2026 04:31

github-actions Bot commented Jun 25, 2026

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

@AllanSevilla05

Author

I have read the CLA Document and I hereby sign the CLA

github-actions Bot added a commit that referenced this pull request Jun 25, 2026
Adds StructuringGenerator — a fan-in star typology where multiple
smurf accounts each send sub-threshold amounts to a single coordinator,
modelling BSA/FinCEN structuring (smurfing) patterns.

Changes:
- typologies.py: add StructuringGenerator and STRUCTURING_DESCRIPTIONS
- config.py: add num_structuring_patterns, structuring_smurfs_range,
structuring_amount_range with auto-scaling defaults
- generator.py: wire StructuringGenerator into Phase 3 pipeline
- verify.py: dispatch verification on pattern_type so structuring
fan-in patterns are validated correctly alongside cycle rings
- test_generator.py: 8 new tests in TestStructuringGenerator
AllanSevilla05 force-pushed the feat/structuring-typology branch from 6740f55 to 67d2191 June 25, 2026 04:54
Comment thread tests/test_generator.py Outdated

    def test_transaction_count_matches_smurfs(self, tmp_dir):

Comment thread tests/test_generator.py Outdated
        )
        assert n_tx == num_patterns * fixed_smurfs

    def test_amounts_are_sub_threshold(self, tmp_dir):

Comment thread tests/test_generator.py Outdated
        amounts = [float(r["amount"]) for r in reader]
        assert all(a < 10_000.00 for a in amounts), "Found amount >= CTR threshold"

    def test_all_transactions_fan_into_coordinator(self, tmp_dir):

Comment thread tests/test_generator.py Outdated
            f"src {src} not a registered smurf of coordinator {dst}"
        )

    def test_tx_ids_do_not_collide_with_start(self, tmp_dir):

Comment thread tests/test_generator.py Outdated
        assert min(ids) == start
        assert next_id == start + len(ids)

    def test_neptune_format(self, tmp_dir):

Comment thread tests/test_generator.py Outdated
        headers = next(csv.reader(fh))
        assert "embedding" not in headers

    def test_tiny_account_pool(self, tmp_dir):

opensource-SantanderAI left a comment

Contributor

Thanks @AllanSevilla05 — the structuring/smurfing typology itself is a great, well-grounded addition (fan-in star vs cycle, FinCEN/31 U.S.C. § 5324 thresholds), and we'd like to merge it. Two things need fixing first:

1. Lint (Lint & format & type-check is failing on ruff check .) — all auto-fixable:

  • src/gen_fraud_graph/generator.py:6 — import block unsorted (I001).
  • Trailing whitespace (W291): generator.py:19, typologies.py:31 (also a typo there: #Description specififc → # Description specific).
  • Please run ruff check --fix . && ruff format . (or black .) and push.

2. Duplicated test methods (the blocker we care most about). CodeQL flagged 6 "variable defined multiple times" alerts, and they're correct: every test in TestStructuringGenerator is defined twice, so the first copy of each is shadowed and never runs:

  • test_transaction_count_matches_smurfs — lines 265 and 371
  • test_amounts_are_sub_threshold — 282 and 388
  • test_all_transactions_fan_into_coordinator — 297 and 403
  • test_tx_ids_do_not_collide_with_start — 328 and 434
  • test_neptune_format — 344 and 450
  • test_tiny_account_pool — 360 and 466

This looks like an accidental copy/paste (or a bad merge). Please remove the duplicate definitions so each test exists once and actually executes, then re-confirm the pass/coverage numbers. If the two copies differ, keep the intended one.
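
If it helps to check this locally, here is a small sketch (not part of the repo; `duplicated_methods` and the inline `SOURCE` sample are made up for illustration) that uses Python's standard `ast` module to list methods defined more than once in the same class. In a class body, a later `def` with the same name rebinds that name, so only the last copy is collected and run.

```python
import ast

SOURCE = '''
class TestStructuringGenerator:
    def test_neptune_format(self):
        assert False, "first copy: never runs"

    def test_neptune_format(self):
        assert True
'''


def duplicated_methods(source):
    """Map 'Class.method' to the line numbers of every repeated definition."""
    dupes = {}
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.ClassDef):
            continue
        lines = {}
        # Only direct children of the class body count as its methods.
        for item in node.body:
            if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)):
                lines.setdefault(item.name, []).append(item.lineno)
        for name, linenos in lines.items():
            if len(linenos) > 1:
                dupes[f"{node.name}.{name}"] = linenos
    return dupes


print(duplicated_methods(SOURCE))
# prints: {'TestStructuringGenerator.test_neptune_format': [3, 6]}
```

To run it against the real file, one option is to pass the text of tests/test_generator.py in place of `SOURCE`; an empty result would mean each test exists once.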

Once ruff is green and the duplicate tests are removed, ping us and we'll merge. Thanks!

@AllanSevilla05

Author

@opensource-SantanderAI Edits have been made. Ruff check green and duplicates removed.
