feat(typologies): add structuring (smurfing) pattern generator #17
AllanSevilla05 wants to merge 4 commits into
All contributors have signed the CLA ✍️ ✅
I have read the CLA Document and I hereby sign the CLA
Adds StructuringGenerator, a fan-in star typology where multiple smurf accounts each send sub-threshold amounts to a single coordinator, modelling BSA/FinCEN structuring (smurfing) patterns.
Changes:
- typologies.py: add StructuringGenerator and STRUCTURING_DESCRIPTIONS
- config.py: add num_structuring_patterns, structuring_smurfs_range, structuring_amount_range with auto-scaling defaults
- generator.py: wire StructuringGenerator into Phase 3 pipeline
- verify.py: dispatch verification on pattern_type so structuring fan-in patterns are validated correctly alongside cycle rings
- test_generator.py: 8 new tests in TestStructuringGenerator
6740f55 to 67d2191
def test_transaction_count_matches_smurfs(self, tmp_dir):

)
assert n_tx == num_patterns * fixed_smurfs

def test_amounts_are_sub_threshold(self, tmp_dir):

amounts = [float(r["amount"]) for r in reader]
assert all(a < 10_000.00 for a in amounts), "Found amount >= CTR threshold"

def test_all_transactions_fan_into_coordinator(self, tmp_dir):

f"src {src} not a registered smurf of coordinator {dst}"
)

def test_tx_ids_do_not_collide_with_start(self, tmp_dir):

assert min(ids) == start
assert next_id == start + len(ids)

def test_neptune_format(self, tmp_dir):

headers = next(csv.reader(fh))
assert "embedding" not in headers

def test_tiny_account_pool(self, tmp_dir):
opensource-SantanderAI left a comment
Thanks @AllanSevilla05 — the structuring/smurfing typology itself is a great, well-grounded addition (fan-in star vs cycle, FinCEN/31 U.S.C. § 5324 thresholds), and we'd like to merge it. Two things need fixing first:
1. Lint (Lint & format & type-check is failing on "ruff check ."); all auto-fixable:
- src/gen_fraud_graph/generator.py:6 has an unsorted import block (I001).
- Trailing whitespace (W291) at generator.py:19 and typologies.py:31 (also a typo there: "#Description specififc" → "# Description specific").
- Please run "ruff check --fix . && ruff format ." (or "black .") and push.
2. Duplicated test methods (the blocker we care most about). CodeQL flagged 6 "variable defined multiple times" alerts, and they're correct: every test in TestStructuringGenerator is defined twice, so the first copy of each is shadowed and never runs:
- test_transaction_count_matches_smurfs: lines 265 and 371
- test_amounts_are_sub_threshold: lines 282 and 388
- test_all_transactions_fan_into_coordinator: lines 297 and 403
- test_tx_ids_do_not_collide_with_start: lines 328 and 434
- test_neptune_format: lines 344 and 450
- test_tiny_account_pool: lines 360 and 466
This looks like an accidental copy/paste (or a bad merge). Please remove the duplicate definitions so each test exists once and actually executes, then re-confirm the pass/coverage numbers. If the two copies differ, keep the intended one.
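For anyone unsure why the first copy never runs: a Python class body executes top to bottom, so a second def with the same name simply rebinds the attribute. The sketch below uses a made-up class (TestExample is not from this PR) to show the effect.

```python
import inspect


class TestExample:
    """Hypothetical test class; the names are illustrative only."""

    def test_amounts(self):
        return "first definition"

    def test_amounts(self):  # same name: rebinds the class attribute
        return "second definition"


# Only one function named test_amounts survives on the class, so a test
# runner that collects test_* methods will only ever see the second one.
test_names = [
    name
    for name, _ in inspect.getmembers(TestExample, inspect.isfunction)
    if name.startswith("test_")
]
result = TestExample().test_amounts()
print(test_names, result)
```

This is why the duplicated copies can hide a failing or outdated assertion: the shadowed body is never collected, so it neither passes nor fails.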
Once ruff is green and the duplicate tests are removed, ping us and we'll merge. Thanks!
@opensource-SantanderAI Edits have been made. Ruff check is green and the duplicates are removed.
Summary
Adds StructuringGenerator, a second fraud typology implementing
BSA/FinCEN structuring (smurfing) patterns, where multiple smurf
accounts each send sub-$10,000 amounts to a single coordinator to
avoid Currency Transaction Report (CTR) filing.
The repo previously only generated cyclic ring patterns. Structuring
is the most commonly filed SAR typology and has a structurally
distinct graph signature (fan-in star vs cycle), which stresses
different subgraph detection algorithms.
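To make the "fan-in star vs cycle" distinction concrete, here is a minimal sketch that builds both shapes as plain edge lists and compares per-node (in-degree, out-degree). The helper names and account labels are illustrative assumptions, not code from this repo.

```python
from collections import Counter


def cycle_edges(accounts):
    """Ring: each account pays the next one, and the last pays the first."""
    n = len(accounts)
    return [(accounts[i], accounts[(i + 1) % n]) for i in range(n)]


def fan_in_edges(smurfs, coordinator):
    """Star: every smurf pays the same coordinator."""
    return [(smurf, coordinator) for smurf in smurfs]


def degree_profile(edges):
    """Map each node to its (in_degree, out_degree)."""
    in_deg = Counter(dst for _, dst in edges)
    out_deg = Counter(src for src, _ in edges)
    nodes = set(in_deg) | set(out_deg)
    return {node: (in_deg[node], out_deg[node]) for node in nodes}


ring = degree_profile(cycle_edges(["A", "B", "C", "D"]))
star = degree_profile(fan_in_edges(["S1", "S2", "S3"], "COORD"))
print(ring)  # every node has in-degree 1 and out-degree 1
print(star)  # the coordinator has in-degree 3; each smurf has out-degree 1
```

In a ring every node looks the same locally, while in a star one node concentrates all incoming edges, which is why a detector tuned for cycles will not necessarily pick up structuring.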
Changes
- typologies.py: StructuringGenerator dataclass with STRUCTURING_DESCRIPTIONS (8 realistic smurfing descriptions)
- config.py: three new fields: num_structuring_patterns, structuring_smurfs_range (default 3–10 smurfs), structuring_amount_range (default $8,000–$9,900 sub-threshold)
- generator.py: wired into Phase 3; tx IDs chain from ring generator's next_tx_id to avoid collisions
- verify.py: dispatches on pattern_type so fan-in star patterns verify correctly alongside cycle rings
- tests/test_generator.py: 8 new tests in TestStructuringGenerator
Test results
50 passed, 0 failed — coverage 97.93%
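As a rough illustration of the verify.py change listed above, the sketch below dispatches on a pattern_type field and applies a shape-specific check. The dict layout, the "cycle" and "structuring" labels, and all function names are assumptions for illustration; the actual verify.py may be organised differently.

```python
def verify_cycle(edges):
    """True if the edges form one closed ring visiting each node once."""
    if not edges:
        return False
    nxt = dict(edges)
    if len(nxt) != len(edges):  # some source appears more than once
        return False
    start = node = edges[0][0]
    seen = set()
    for _ in range(len(edges)):
        if node in seen or node not in nxt:
            return False
        seen.add(node)
        node = nxt[node]
    return node == start


def verify_fan_in(edges, coordinator):
    """True if distinct smurfs all pay the coordinator and it pays no one."""
    sources = [src for src, _ in edges]
    return (
        bool(edges)
        and all(dst == coordinator for _, dst in edges)
        and coordinator not in sources
        and len(set(sources)) == len(sources)
    )


def verify_pattern(pattern):
    """Dispatch on the (assumed) pattern_type label."""
    kind = pattern["pattern_type"]
    if kind == "cycle":
        return verify_cycle(pattern["edges"])
    if kind == "structuring":
        return verify_fan_in(pattern["edges"], pattern["coordinator"])
    raise ValueError(f"unknown pattern_type: {kind!r}")


ring_ok = verify_pattern(
    {"pattern_type": "cycle", "edges": [("A", "B"), ("B", "C"), ("C", "A")]}
)
star_edges = [("S1", "K"), ("S2", "K"), ("S3", "K")]
star_ok = verify_pattern(
    {"pattern_type": "structuring", "edges": star_edges, "coordinator": "K"}
)
# A star checked with the cycle rule fails, which is why dispatch is needed.
star_as_cycle = verify_pattern({"pattern_type": "cycle", "edges": star_edges})
```

Raising on an unknown label, rather than silently passing, keeps a future typology from being reported as verified before a check exists for it.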
Domain note
Structuring thresholds modelled on 31 U.S.C. § 5324 and FinCEN
guidance. Default amount range ($8,000–$9,900) reflects common
real-world smurfing behaviour documented in SARs.
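A minimal sketch of how one such pattern could be generated under these defaults, assuming uniform sampling within the amount range and sequential transaction IDs chained from a caller-supplied start. The function name, field names, and uniform distribution are assumptions here, not the PR's actual implementation.

```python
import random

CTR_THRESHOLD = 10_000.00  # reporting threshold the pattern stays below


def make_structuring_pattern(coordinator, smurfs, amount_range, start_tx_id, rng):
    """Return (transactions, next_tx_id) for one fan-in pattern.

    Hypothetical helper: one transaction per smurf, each below the threshold.
    """
    low, high = amount_range
    if not (0 < low <= high < CTR_THRESHOLD):
        raise ValueError("amount_range must lie strictly below the CTR threshold")
    transactions = [
        {
            "tx_id": start_tx_id + offset,
            "src": smurf,
            "dst": coordinator,
            "amount": round(rng.uniform(low, high), 2),
        }
        for offset, smurf in enumerate(smurfs)
    ]
    # Hand the next free ID back so a later generator can continue from it.
    return transactions, start_tx_id + len(transactions)


rng = random.Random(0)  # seeded for reproducibility
txs, next_id = make_structuring_pattern(
    coordinator="ACC-COORD",
    smurfs=["ACC-1", "ACC-2", "ACC-3"],
    amount_range=(8_000.00, 9_900.00),
    start_tx_id=500,
    rng=rng,
)
for tx in txs:
    print(tx)
```

Returning the next free ID mirrors the property the tests check (the minimum ID equals the start, and the returned ID equals start plus the number of transactions), so ring and structuring generators can share one ID space without collisions.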