This document describes the internal architecture of the demo data generator.
People.AI-Data-Generator/
├── src/demo_gen/            # Main source code
│   ├── __init__.py
│   ├── cli.py               # CLI interface (Click)
│   ├── config.py            # Configuration schema (Pydantic)
│   ├── logger.py            # Structured logging (JSONL + summary)
│   ├── state_store.py       # SQLite idempotency tracker
│   ├── sf_client.py         # Salesforce API wrapper
│   ├── activity_planner.py  # Deterministic activity generation
│   ├── content_gen.py       # LLM content generation
│   ├── scorecard_client.py  # Scorecard creation & population
│   └── runner.py            # Main orchestration logic
├── tests/                   # Unit tests
├── runs/                    # Generated run logs (gitignored)
├── demo.example.yaml        # Example configuration
├── .env.example             # Example environment variables
├── pyproject.toml           # Package metadata
├── setup.py                 # Package setup
├── requirements.txt         # Dependencies
├── Makefile                 # Convenience commands
├── README.md                # User documentation
├── QUICKSTART.md            # Getting started guide
└── ARCHITECTURE.md          # This file
CLI Layer (cli.py)
Purpose: Command-line interface and user interaction
Commands:
- run - Full pipeline execution
- dry-run - Preview without changes
- status - View run summary
- reset - Cleanup run data
- smoke - Single-opp validation
Dependencies: runner.py, config.py, logger.py
Configuration (config.py)
Purpose: Load, validate, and resolve configuration
Key Classes:
- DemoGenConfig - Main configuration schema (Pydantic)
- ResolvedConfig - Runtime-resolved config with run metadata
- Various sub-configs for each section
Validation:
- HTTPS URLs required
- Coverage targets between 0 and 1
- Required fields present
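The three validation rules can be illustrated with a minimal standard-library sketch. The project itself uses Pydantic for this; the dataclass below is only a stand-in, and the class name and field names (`SketchConfig`, `instance_url`, `coverage_target`) are assumptions, not the real schema.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class SketchConfig:
    """Hypothetical stand-in for DemoGenConfig; field names are assumptions."""

    instance_url: str
    coverage_target: float

    def __post_init__(self) -> None:
        # Required fields present: reject empty values early.
        if not self.instance_url:
            raise ValueError("instance_url is required")
        # HTTPS URLs required.
        if urlparse(self.instance_url).scheme != "https":
            raise ValueError("instance_url must use https")
        # Coverage targets must fall between 0 and 1 inclusive.
        if not 0.0 <= self.coverage_target <= 1.0:
            raise ValueError("coverage_target must be between 0 and 1")
```

Constructing `SketchConfig("http://example.com", 0.5)` raises `ValueError`, so a bad config fails before any API call is made.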
Orchestration (runner.py)
Purpose: Main pipeline coordination
Flow:
- Query opportunities (via sf_client)
- For each opportunity:
- Plan activities (via activity_planner)
- Generate content (via content_gen, optional)
- Create Salesforce records (via sf_client)
- Create scorecards (via scorecard_client)
- Log events (via logger)
- Track state (via state_store, optional)
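The flow above can be sketched as a single loop over opportunities. This is a simplified illustration, not the actual runner.py: every collaborator is passed in as a plain callable, all parameter names are hypothetical, and state tracking is left out for brevity.

```python
from typing import Callable, Iterable, Optional


def run_pipeline(
    opportunities: Iterable[str],
    plan: Callable[[str], list],
    generate: Optional[Callable[[dict], Optional[str]]],
    create_record: Callable[[str, dict], str],
    create_scorecard: Callable[[str], str],
    log: Callable[[str, dict], None],
) -> int:
    """Walk each opportunity through plan, content, records, and scorecard."""
    created = 0
    for opp_id in opportunities:
        for activity in plan(opp_id):
            # Content generation is optional; skip it when no generator is given.
            if generate is not None:
                activity = {**activity, "body": generate(activity)}
            record_id = create_record(opp_id, activity)
            log("activity_created", {"opp_id": opp_id, "record_id": record_id})
            created += 1
        scorecard_id = create_scorecard(opp_id)
        log("scorecard_created", {"opp_id": opp_id, "scorecard_id": scorecard_id})
    return created
```

Passing the collaborators in as arguments keeps the loop easy to exercise with fakes, which is one way a dry run or unit test could drive it.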
Methods:
- run() - Main pipeline
- smoke_test() - Single-opp test
- cleanup_run() - Reset helper
Salesforce Client (sf_client.py)
Purpose: Abstract Salesforce API interactions
Operations:
- Query opportunities (SOQL)
- Create Events (meetings)
- Create Tasks (emails)
- Query Contacts
- Tag records (for cleanup)
- Delete by run ID
Authentication: OAuth via simple-salesforce library
Activity Planner (activity_planner.py)
Purpose: Generate deterministic activity plans
Key Features:
- Seeded random generation
- Configurable min/max counts
- Past/future time windows
- Realistic subject lines
- Participant role assignment
Output: ActivityPlan with lists of PlannedMeeting and PlannedEmail
Content Generator (content_gen.py)
Purpose: Generate realistic text via LLM
Capabilities:
- Meeting notes (agenda/summary based on timing)
- Email bodies
- Scorecard answers
Fallback: Returns None if LLM fails; caller uses heuristics
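A minimal sketch of the caller side of this fallback contract, assuming the LLM is exposed as a plain callable. The function names and the heuristic wording are hypothetical.

```python
from typing import Callable, Optional


def heuristic_notes(subject: str) -> str:
    """Template-based fallback text (hypothetical wording)."""
    return f"Discussed: {subject}. Next steps to be confirmed."


def meeting_notes(subject: str, llm: Optional[Callable[[str], str]]) -> str:
    """Return LLM text when available, otherwise the heuristic version."""
    generated: Optional[str] = None
    if llm is not None:
        try:
            generated = llm(subject)
        except Exception:
            # Mirror the documented contract: a failed LLM call yields None.
            generated = None
    return generated if generated else heuristic_notes(subject)
```

With `llm=None` (LLM disabled) or an LLM that raises, the caller still gets usable text, so record creation never blocks on content generation.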
Scorecard Client (scorecard_client.py)
Purpose: Create and populate scorecards
Templates:
- MEDDICC (currently)
- Extensible for others
Modes:
- heuristic - Rule-based answers
- llm - AI-generated answers
- hybrid - LLM with heuristic fallback
Scoring: Coverage + confidence weighted average
State Store (state_store.py)
Purpose: Track created records for idempotency
Schema:
- opportunities - Selected opps
- activities - Created meetings/emails (by signature)
- scorecards - Created scorecards
- scorecard_answers - Populated answers
Signature: MD5 hash of (type, timestamp, subject)
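The signature check can be sketched with `hashlib` and an in-memory SQLite table. The `|` separator, the single-column table layout, and the function names are assumptions made for illustration; the real state_store.py may differ.

```python
import hashlib
import sqlite3


def activity_signature(kind: str, timestamp: str, subject: str) -> str:
    """MD5 hex digest over (type, timestamp, subject); the separator is an assumption."""
    raw = "|".join((kind, timestamp, subject))
    return hashlib.md5(raw.encode("utf-8")).hexdigest()


def record_once(conn: sqlite3.Connection, signature: str) -> bool:
    """Insert the signature; return True only if it was not already stored."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO activities (signature) VALUES (?)", (signature,)
    )
    conn.commit()
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE activities (signature TEXT PRIMARY KEY)")
sig = activity_signature("meeting", "2024-01-15T10:00:00Z", "Discovery call")
first = record_once(conn, sig)   # new signature, so the caller should create the record
second = record_once(conn, sig)  # already seen, so the caller should skip it
```

Relying on the primary-key constraint makes the check and the insert a single statement, so a rerun cannot slip a duplicate in between them.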
Logger (logger.py)
Purpose: Structured event and error logging
Outputs:
- events.jsonl - One JSON per action
- errors.jsonl - One JSON per error
- run.json - Run metadata
- summary.json - Final statistics
Classes:
- DemoGenLogger - Full logging
- DryRunLogger - No-op for dry runs
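The split between a full logger and a no-op variant could look roughly like the sketch below. The class names `JsonlLogger` and `NoOpLogger` and the `event()` method are hypothetical stand-ins for DemoGenLogger and DryRunLogger, whose real interfaces are not shown in this document.

```python
import json
import tempfile
from pathlib import Path


class JsonlLogger:
    """Hypothetical stand-in for DemoGenLogger: appends one JSON object per line."""

    def __init__(self, run_dir: Path) -> None:
        self.events_path = run_dir / "events.jsonl"

    def event(self, name: str, **fields) -> None:
        with self.events_path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps({"event": name, **fields}) + "\n")


class NoOpLogger:
    """Hypothetical stand-in for DryRunLogger: same interface, writes nothing."""

    def event(self, name: str, **fields) -> None:
        pass


run_dir = Path(tempfile.mkdtemp())
logger = JsonlLogger(run_dir)
logger.event("meeting_created", opp_id="006A")
logger.event("email_created", opp_id="006A")
lines = (run_dir / "events.jsonl").read_text(encoding="utf-8").splitlines()
```

Because both classes share one interface, the rest of the pipeline does not need to know whether it is in a dry run.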
                  ┌─────────────┐
                  │     CLI     │
                  │  (cli.py)   │
                  └──────┬──────┘
                         │
                         ▼
                  ┌─────────────┐
                  │   Config    │
                  │ (config.py) │
                  └──────┬──────┘
                         │
                         ▼
                  ┌─────────────┐
                  │   Runner    │◄───────┐
                  │ (runner.py) │        │
                  └──────┬──────┘        │
                         │               │
       ┌─────────────────┼───────────────┼─────────────┐
       │                 │               │             │
       ▼                 ▼               ▼             ▼
┌──────────────┐  ┌──────────────┐  ┌──────────┐  ┌──────────┐
│  SF Client   │  │   Activity   │  │ Content  │  │Scorecard │
│ (sf_client)  │  │   Planner    │  │   Gen    │  │  Client  │
└──────┬───────┘  └──────┬───────┘  └────┬─────┘  └────┬─────┘
       │                 │               │             │
       │                 └───────────────┼─────────────┘
       │                                 │
       ▼                                 ▼
┌──────────────┐                  ┌──────────────┐
│  Salesforce  │                  │  OpenAI API  │
│     API      │                  │              │
└──────────────┘                  └──────────────┘
Logging (logger.py) ──► events.jsonl, summary.json
State (state_store.py) ──► state.sqlite
Why: Reproducibility and debugging
- Same seed → same activity plan
- Predictable demo environments
- Easy to verify what changed
How: Seeded random number generator keyed by seed:opp_id
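A minimal sketch of seeding per opportunity, assuming the key is the literal string `seed:opp_id`. The function name and the returned fields are hypothetical; only the keying scheme comes from the description above.

```python
import random


def plan_counts(seed: int, opp_id: str, min_count: int, max_count: int) -> dict:
    """Derive per-opportunity activity counts from a seed:opp_id key."""
    # A string seed is hashed deterministically by random.Random,
    # so the same key yields the same sequence on every run.
    rng = random.Random(f"{seed}:{opp_id}")
    return {
        "meetings": rng.randint(min_count, max_count),
        "emails": rng.randint(min_count, max_count),
    }
```

Using a dedicated `random.Random` instance per opportunity, rather than the module-level generator, means the plan for one opportunity does not depend on how many others were processed before it.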
Why: Safe reruns without duplicates
Options:
- External state (SQLite) - No Salesforce changes needed
- Tag-based - Requires custom field, enables cleanup
Trade-off: External state is more compatible; tag-based is more powerful
Why: Preview before execution, validate config
Implementation: dry_run flag propagates through all clients
- SF client returns mock data
- No writes to Salesforce or state DB
- Logger is no-op variant
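How a dry_run flag might short-circuit writes inside a client, as a hedged sketch. `SketchSalesforceClient`, its `create_event()` method, and the ID formats are hypothetical; the real sf_client.py wraps simple-salesforce and is not reproduced here.

```python
class SketchSalesforceClient:
    """Hypothetical client showing how a dry_run flag can short-circuit writes."""

    def __init__(self, dry_run: bool = False) -> None:
        self.dry_run = dry_run
        self.writes = []  # stands in for real API calls

    def create_event(self, opp_id: str, subject: str) -> str:
        if self.dry_run:
            # Return a mock ID without touching the API.
            return f"DRYRUN-{opp_id}"
        self.writes.append((opp_id, subject))
        return f"EVT-{len(self.writes)}"
```

Checking the flag inside each write method, rather than in the runner, keeps the orchestration code identical between a preview and a real run.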
Why: Cost control, offline operation, reliability
Fallback: Heuristic content if the LLM fails or is disabled
- Still creates activities with subjects
- Uses template-based scorecard answers
Why: Debugging, auditing, analytics
Format: JSONL for easy parsing, streaming
- One event per line
- Greppable, jq-friendly
- Summary JSON for quick stats
- Create class inheriting ScorecardTemplate in scorecard_client.py
- Define questions with IDs and categories
- Add to _load_templates() method
- Optionally add heuristic answers
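The steps for adding a template can be sketched as follows, using BANT as the example. The base class shown here is a hypothetical stand-in: the real ScorecardTemplate in scorecard_client.py may have different fields, and `load_templates()` is written as a standalone function rather than the actual `_load_templates()` method. The question wording is illustrative only.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Question:
    id: str
    category: str
    text: str


@dataclass
class ScorecardTemplate:
    """Hypothetical base class; the real one in scorecard_client.py may differ."""

    name: str = "base"
    questions: List[Question] = field(default_factory=list)


def _bant_questions() -> List[Question]:
    # Each question carries a stable ID and a category.
    return [
        Question("bant_budget", "Budget", "Is budget allocated for this purchase?"),
        Question("bant_authority", "Authority", "Who signs off on the decision?"),
        Question("bant_need", "Need", "What problem is the buyer solving?"),
        Question("bant_timeline", "Timeline", "When does the buyer need a solution?"),
    ]


@dataclass
class BantTemplate(ScorecardTemplate):
    name: str = "BANT"
    questions: List[Question] = field(default_factory=_bant_questions)


def load_templates() -> Dict[str, ScorecardTemplate]:
    """Standalone stand-in for _load_templates(): a registry keyed by template name."""
    return {t.name: t for t in (BantTemplate(),)}
```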
- Extend ActivityPlan dataclass in activity_planner.py
- Add generation logic to create_plan()
- Add creation method to sf_client.py
- Add processing to _create_activities() in runner.py
Replace or extend ContentGenerator:
- Could add Azure OpenAI support
- Could integrate with other LLMs
- Could use templates with variable substitution
- Config validation: Ensure constraints enforced
- Deterministic generation: Same seed → same output
- Bounds checking: Min/max respected
- Smoke test: demo-gen smoke validates end-to-end
- Dry run: Preview without side effects
- Idempotency: Run twice, verify no duplicates
- Run smoke test with single opp
- Verify in Salesforce
- Wait for People.ai ingestion
- Confirm activities appear
- Then run full generation
- click - CLI framework
- pyyaml - Config parsing
- pydantic - Schema validation
- simple-salesforce - SF API client
- openai - LLM integration
- python-dotenv - Environment loading
- rich - Terminal formatting
- pytest - Testing
- black - Code formatting
- ruff - Linting
- Configurable via --concurrency flag
- Default: 5 parallel API calls
- Implemented with ThreadPoolExecutor at opportunity level
- Future: finer-grained concurrency for per-activity writes
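Opportunity-level concurrency with ThreadPoolExecutor can be sketched as below. `process_opportunity` is a placeholder for the real per-opportunity work, and both function names are hypothetical; only the executor choice and the default of 5 come from the notes above.

```python
from concurrent.futures import ThreadPoolExecutor


def process_opportunity(opp_id: str) -> str:
    """Placeholder for the per-opportunity work (plan, create, score)."""
    return f"done:{opp_id}"


def run_concurrently(opp_ids: list, concurrency: int = 5) -> list:
    """Process opportunities in parallel; results keep the input order."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(process_opportunity, opp_ids))
```

Threads suit this workload because the time is spent waiting on network calls, and `pool.map` returns results in input order, which keeps logs and summaries stable across runs.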
- Salesforce: 15K API calls/day (typical)
- OpenAI: Depends on tier
- Safety: --max-opps hard cap
Current design handles:
- 100-200 opportunities comfortably
- 5-20 activities per opp
- Total: ~2000 API calls per run
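As a rough check on these figures, the sketch below multiplies the ranges out. It assumes one API write per activity and ignores queries and scorecard calls, so it is a lower-bound estimate rather than a measured number; the function name is hypothetical.

```python
def estimate_calls(opps: int, activities_per_opp: int) -> int:
    """Rough count: one write per activity (queries and scorecards ignored)."""
    return opps * activities_per_opp


low = estimate_calls(100, 5)     # smallest run in the stated range
high = estimate_calls(200, 20)   # largest run in the stated range
```

Under that assumption the range is 500 to 4000 writes, so the ~2000 figure sits near the middle, and even the largest run stays well under a 15K daily Salesforce limit.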
For larger scale:
- Batch API calls
- Extend concurrency below the opportunity level (per-activity writes)
- Add progress bars
- Consider async/await pattern
- Never commit .env to git
- Use environment variables only
- Support OAuth (preferred) and JWT
Minimal required:
- Read: Opportunity, Account, Contact
- Create: Event, Task
- API Enabled
- API key via environment only
- No PII sent to OpenAI
- Opportunity names/stages only (demo data)
- Progress bars (rich.progress)
- People.ai API verification
- More scorecard templates (BANT, MEDDPICC)
- Custom activity subjects via config
- Web UI for configuration
- Scheduled runs (cron-style)
- Diff mode (show what changed)
- Export/import state
- Cloud deployment (Lambda/Cloud Run)
- Contact auto-generation
- Account hierarchy support
- Multi-org orchestration
- Reporting dashboard
- Slack notifications