Performance Tuning Guide

This guide explains how to optimize the performance of Atom's advanced skill execution features, including package installation, skill loading, marketplace search, and workflow execution.

Performance Targets
Package Installation
Skill Loading
Marketplace Performance
Workflow Performance
Benchmarking
Production Configuration
Monitoring
Troubleshooting
Optimization Checklist

Performance Targets

Operation Targets

| Operation | Target | Typical | Measurement Scenario |
|-----------|--------|---------|----------------------|
| Package Installation | < 5 seconds | 3-5s | Small packages (requests, lodash) |
| Skill Loading | < 1 second | 0.5-1s | Dynamic import from file |
| Hot-Reload | < 1 second | 0.3-0.8s | File change to reload |
| Marketplace Search | < 100ms | 20-50ms | With pagination |
| Workflow Validation | < 50ms | 10-30ms | DAG validation |
| Dependency Resolution | < 500ms | 100-300ms | Conflict detection |

Regression Thresholds

A performance regression is flagged when an operation takes more than 1.5x its baseline duration:

REGRESSION_THRESHOLD = 1.5  # 50% slower triggers alert

Example:

Baseline: Package installation = 4 seconds
Regression threshold: 4 * 1.5 = 6 seconds
Alert triggered if: Installation takes > 6 seconds
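
The threshold arithmetic above can be sketched in a few lines. This is only an illustration, assuming durations are measured in seconds; `is_regression` is a hypothetical helper name and not part of Atom's API.

```python
REGRESSION_THRESHOLD = 1.5  # same value as above: 50% slower than baseline

def is_regression(baseline_s: float, current_s: float,
                  threshold: float = REGRESSION_THRESHOLD) -> bool:
    """Return True when the current duration exceeds threshold x baseline."""
    return current_s > baseline_s * threshold

# With a 4-second baseline the limit is 6 seconds.
print(is_regression(4.0, 5.9))  # below the limit
print(is_regression(4.0, 6.5))  # above the limit
```

The comparison is strict, so a run that takes exactly 6 seconds does not trigger an alert, matching the "> 6 seconds" wording above.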

Performance Budgets

Allocate time budgets for workflow operations:

PERFORMANCE_BUDGETS = {
    "package_install": 5.0,      # 5 seconds
    "skill_load": 1.0,           # 1 second
    "hot_reload": 1.0,           # 1 second
    "marketplace_search": 0.1,   # 100ms
    "workflow_validation": 0.05  # 50ms
}
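
One way to put these budgets to use is to compare measured durations against them and report the overruns. The sketch below assumes measurements arrive as a plain dict of seconds per operation; `over_budget` is a hypothetical helper, not an existing Atom function.

```python
PERFORMANCE_BUDGETS = {
    "package_install": 5.0,
    "skill_load": 1.0,
    "hot_reload": 1.0,
    "marketplace_search": 0.1,
    "workflow_validation": 0.05,
}

def over_budget(measured: dict[str, float],
                budgets: dict[str, float] = PERFORMANCE_BUDGETS) -> dict[str, float]:
    """Return {operation: seconds over budget} for each operation that overran.

    Operations without a budget entry are ignored rather than treated as failures.
    """
    return {
        op: round(duration - budgets[op], 6)
        for op, duration in measured.items()
        if op in budgets and duration > budgets[op]
    }

print(over_budget({"skill_load": 1.4, "marketplace_search": 0.05}))
```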

Package Installation

Optimize Installation Time

Use Small Packages

Prefer minimal dependencies to reduce installation time:

# GOOD: Small, focused package
packages = ["requests==2.28.0"]  # ~5MB

# AVOID: Large packages with many transitive dependencies
packages = ["tensorflow==2.12.0"]  # ~500MB (use only if needed)

Package Size Impact:

Small (<10MB): 1-3 seconds installation
Medium (10-50MB): 3-10 seconds installation
Large (>50MB): 10-60 seconds installation

Enable Image Caching

The auto-installer caches built images by default:

from core.auto_installer_service import AutoInstallerService

installer = AutoInstallerService(db)

# Check cache hit
result = await installer.install_dependencies(
    skill_id="my-skill",
    packages=["pandas==2.0.0"],
    package_type="python",
    agent_id="my-agent"
)

if result.get("cached"):
    print("Used cached image - no rebuild needed")
    # Installation: <1 second (cached)
else:
    print("Building new image")
    # Installation: 3-5 seconds (new build)

Cache Benefits:

First installation: 3-5 seconds
Cached installation: <1 second
5-10x speedup for repeat installations

Batch Installations

Install multiple skills at once:

# BAD: Sequential installations (slow)
for skill_id in ["skill1", "skill2", "skill3"]:
    await installer.install_dependencies(
        skill_id=skill_id,
        packages=["pandas==2.0.0"],
        package_type="python",
        agent_id="my-agent"
    )
# Total time: 3 * 5s = 15 seconds

# GOOD: Batch installation (fast)
result = await installer.batch_install(
    installations=[
        {"skill_id": "skill1", "packages": ["pandas==2.0.0"], "package_type": "python"},
        {"skill_id": "skill2", "packages": ["numpy>=1.24.0"], "package_type": "python"},
        {"skill_id": "skill3", "packages": ["requests==2.28.0"], "package_type": "python"}
    ],
    agent_id="my-agent"
)
# Total time: 5-7 seconds (parallel builds)

Performance Gains:

Sequential: 3 skills * 5s = 15 seconds
Batch: 5-7 seconds (2-3x faster)
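
The gain comes from running builds concurrently instead of one after another. How `batch_install` does this internally is not shown here, so the sketch below only illustrates the general pattern with `asyncio.gather` and a semaphore; `fake_install` is a stand-in that sleeps rather than building anything, and all names are hypothetical.

```python
import asyncio
import time

async def fake_install(skill_id: str, build_seconds: float) -> str:
    """Stand-in for a real image build; it only sleeps for build_seconds."""
    await asyncio.sleep(build_seconds)
    return skill_id

async def install_sequential(skill_ids, build_seconds):
    """Run installs one after another (total time grows with the skill count)."""
    return [await fake_install(s, build_seconds) for s in skill_ids]

async def install_concurrent(skill_ids, build_seconds, max_concurrent=3):
    """Run installs concurrently, capped at max_concurrent simultaneous builds."""
    limit = asyncio.Semaphore(max_concurrent)

    async def guarded(skill_id):
        async with limit:
            return await fake_install(skill_id, build_seconds)

    # gather preserves input order in its result list
    return await asyncio.gather(*(guarded(s) for s in skill_ids))

def timed(coro_factory):
    """Run a coroutine to completion and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro_factory())
    return result, time.perf_counter() - start
```

The semaphore plays the same role as a concurrency cap such as `MAX_CONCURRENT_INSTALLATIONS`: raising it shortens wall-clock time until CPU, disk, or network becomes the bottleneck.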

Troubleshooting Slow Installation

Problem: Installation takes > 5 seconds

Diagnosis:

# Check package size
pip show pandas

# Check download speed
time pip download pandas==2.0.0

# Check Docker build time
docker build -t test-build .

Solutions:

Check package size:

# Large packages take longer
pip show --files pandas | wc -l  # File count
du -sh ~/.cache/pip  # Cache size

Verify network speed:

# Slow network slows downloads
speedtest-cli
ping pypi.org

Reduce number of dependencies:

# Before: 10 packages (slow)
packages = ["pandas", "numpy", "scipy", "matplotlib", "seaborn", ...]

# After: 3 packages (fast)
packages = ["pandas", "numpy", "requests"]

Use local package mirror:

# Configure pip to use mirror
pip config set global.index-url https://mirror.example.com/pypi/simple/

Problem: Docker build hangs

Diagnosis:

# Check Docker daemon
docker ps
docker info

# Check build logs
docker logs <container_id>

Solutions:

Check Docker daemon:

# Restart Docker if hung
sudo systemctl restart docker  # Linux
# macOS: Restart Docker Desktop

Increase Docker resources:

# Docker Desktop: Settings > Resources
# Increase CPU and memory allocation

Use --no-cache for debugging:

# Disable cache to debug slow layers
docker build --no-cache -t test-build .

Skill Loading

Optimize Load Time

Preload Frequently Used Skills

Load skills on startup for faster access:

from core.skill_dynamic_loader import get_global_loader

loader = get_global_loader()

# Preload common skills on startup
COMMON_SKILLS = [
    ("http_get", "/path/to/http_get.py"),
    ("json_parse", "/path/to/json_parse.py"),
    ("database_insert", "/path/to/database_insert.py")
]

for skill_name, skill_path in COMMON_SKILLS:
    loader.load_skill(skill_name, skill_path)
    print(f"Preloaded: {skill_name}")

# Now skills are ready for instant use
module = loader.get_skill("http_get")  # <1ms (cached)

Performance Impact:

First load: 500-1000ms
Preloaded access: <1ms (1000x faster)

Enable File Monitoring (Development)

Watchdog-based hot-reload for development:

from core.skill_dynamic_loader import SkillDynamicLoader

# Development: Enable monitoring
loader = SkillDynamicLoader(
    skills_dir="/path/to/skills",
    enable_monitoring=True  # Watch for file changes
)

# File changes trigger automatic reload
# Detected change in http_get.py, reloading...

Performance:

File detection: <100ms (watchdog)
Reload time: 300-800ms
No server restart required

Disable Monitoring (Production)

Disable monitoring for performance:

# Production: Disable monitoring for performance
loader = SkillDynamicLoader(
    skills_dir="/path/to/skills",
    enable_monitoring=False  # Disabled in production
)

# Skills still loadable, just no auto-reload
module = loader.load_skill("http_get", "/path/to/http_get.py")

Production Best Practices:

Disable file monitoring (reduces CPU usage)
Use preloaded skills (faster access)
Restart service to update skills (safer than hot-reload)

Module Cache Management

The loader automatically clears sys.modules on reload:

# GOOD: Uses proper cache clearing
loader.reload_skill("my_skill")  # Clears sys.modules first

# BAD: Manual reload that bypasses the loader
import importlib
import sys
importlib.reload(sys.modules["my_skill"])  # Stale code!

Cache Clearing Flow:

def reload_skill(self, skill_name: str):
    """Hot-reload skill without service restart."""
    # Step 1: Clear module cache (prevents stale code)
    if skill_name in sys.modules:
        del sys.modules[skill_name]

    # Step 2: Reload from file path
    skill_path = self.loaded_skills[skill_name]['path']
    return self.load_skill(skill_name, skill_path)  # same (name, path) order as elsewhere

Why Clear Cache?

Python caches imported modules in sys.modules
Importing again without clearing the cache returns the old module, so old code keeps executing
Stale imports cause bugs and inconsistencies
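
The effect of `sys.modules` can be reproduced with a throwaway module. The sketch below is independent of Atom's loader: it writes a temporary module named `demo_skill_mod` (a hypothetical name), imports it, changes the file, and shows that a second import still returns the old module until the `sys.modules` entry is removed. Bytecode writing is switched off for the duration so that `.pyc` caching does not interfere with the demonstration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

def demo_stale_import():
    """Return the VERSION seen on the first, repeated, and post-clear import."""
    previous_flag = sys.dont_write_bytecode
    sys.dont_write_bytecode = True  # keep the demo independent of .pyc files
    with tempfile.TemporaryDirectory() as tmp:
        module_file = Path(tmp) / "demo_skill_mod.py"
        module_file.write_text("VERSION = 'old'\n")
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            first = importlib.import_module("demo_skill_mod").VERSION

            # Change the source on disk; sys.modules still holds the old module.
            module_file.write_text("VERSION = 'new-code'\n")
            stale = importlib.import_module("demo_skill_mod").VERSION

            # Drop the cached module so the next import re-executes the file.
            del sys.modules["demo_skill_mod"]
            importlib.invalidate_caches()
            fresh = importlib.import_module("demo_skill_mod").VERSION
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("demo_skill_mod", None)
            sys.dont_write_bytecode = previous_flag
    return first, stale, fresh

print(demo_stale_import())
```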

Troubleshooting Slow Loading

Problem: Skill loading takes > 1 second

Diagnosis:

# Check file size
ls -lh /path/to/skill.py

# Check import dependencies
python -X importtime -c "import skill"  # prints per-module import times to stderr

# Check disk I/O
iostat -x 1

Solutions:

Check file size:

# Large files take longer to load
wc -l /path/to/skill.py  # Line count

Reduce import dependencies:

# BAD: Many imports (slow)
import pandas
import numpy
import scipy
import matplotlib
import seaborn

# GOOD: Lazy imports (fast)
def execute():
    import pandas  # Import only when needed
    return pandas.DataFrame()

Use bytecode caching:

# Python automatically caches .pyc files
# Ensure __pycache__ directory is writable
ls -la __pycache__/

Problem: Hot-reload not working

Diagnosis:

# Check watchdog installation
pip show watchdog

# Check file permissions
ls -la /path/to/skills/

# Test file monitoring
watchmedo log --recursive /path/to/skills  # watchdog's CLI: pip install "watchdog[watchmedo]"

Solutions:

Install watchdog:
```
pip install watchdog
```

Check file permissions:

# Ensure read access to skill directory
chmod +r /path/to/skills/*.py

Verify monitoring enabled:

# Check enable_monitoring flag
loader = SkillDynamicLoader(
    skills_dir="/path/to/skills",
    enable_monitoring=True  # Must be True
)

Marketplace Performance

Optimize Search

Use Specific Queries

# GOOD: Specific category (fast)
/marketplace/skills?category=data
# Time: ~20ms (indexed query)

# SLOWER: Full-text search
/marketplace/skills?query=data
# Time: ~50ms (text search)

Query Performance:

Category filter: ~20ms (indexed)
Type filter: ~20ms (indexed)
Full-text search: ~50ms (text matching)
Combined filters: ~40ms (index + text)

Pagination

# GOOD: Reasonable page size
/marketplace/skills?page=1&page_size=20
# Time: ~20ms (single query)

# AVOID: Too large page size
/marketplace/skills?page=1&page_size=1000
# Time: ~100ms (large result set)

Page Size Guidelines:

Default: 20 items (recommended)
Maximum: 100 items (API limit)
Optimal: 20-50 items (balance performance vs. UX)
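
On the server side, these guidelines amount to clamping the requested page size. A minimal sketch, assuming the default of 20 and the limit of 100 listed above; `clamp_page_size` is a hypothetical helper, and the real marketplace API may reject oversized values instead of clamping them.

```python
from typing import Optional

DEFAULT_PAGE_SIZE = 20   # recommended default from the guidelines above
MAX_PAGE_SIZE = 100      # API limit from the guidelines above

def clamp_page_size(requested: Optional[int] = None) -> int:
    """Return a page size within [1, MAX_PAGE_SIZE], defaulting when omitted."""
    if requested is None:
        return DEFAULT_PAGE_SIZE
    return max(1, min(requested, MAX_PAGE_SIZE))

print(clamp_page_size(), clamp_page_size(1000), clamp_page_size(50))
```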

Caching

Application-level caching for popular queries:

import asyncio
from datetime import datetime, timedelta

cache = {}

async def get_cached_skills(query: str):
    """Get skills from cache or database."""
    cache_key = f"skills:{query}"

    # Check cache
    if cache_key in cache:
        cached_data, timestamp = cache[cache_key]
        age = datetime.now() - timestamp

        if age < timedelta(minutes=5):
            print("Cache hit")
            return cached_data

    # Cache miss - fetch from database
    print("Cache miss")
    result = await fetch_skills_from_db(query)

    # Store in cache
    cache[cache_key] = (result, datetime.now())

    return result

Cache Strategy:

TTL: 5 minutes
Cache key: skills:{query}:{category}:{page}
Invalidate: On skill import/update
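
The strategy above (composite key, 5-minute TTL, invalidation on import/update) can be sketched as a small in-memory cache. This is an illustrative sketch rather than Atom's implementation: `SkillQueryCache` and its methods are hypothetical names, and the clock is injectable so expiry can be tested without waiting.

```python
import time

class SkillQueryCache:
    """In-memory TTL cache keyed by query, category, and page."""

    def __init__(self, ttl_seconds: float = 300.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (value, stored_at)

    @staticmethod
    def make_key(query: str, category: str, page: int) -> str:
        """Build the composite key skills:{query}:{category}:{page}."""
        return f"skills:{query}:{category}:{page}"

    def get(self, key: str):
        """Return the cached value, or None if missing or expired."""
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, stored_at = entry
        if self._clock() - stored_at >= self._ttl:
            del self._entries[key]  # expired: drop it so memory is reclaimed
            return None
        return value

    def put(self, key: str, value) -> None:
        self._entries[key] = (value, self._clock())

    def invalidate_all(self) -> None:
        """Call after a skill import or update so stale listings are not served."""
        self._entries.clear()
```

Including the category and page in the key matters: with a key built from the query alone, page 2 of a search would be served the cached contents of page 1.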

Troubleshooting Slow Queries

Problem: Marketplace search > 100ms

Diagnosis:

# Check query plan
EXPLAIN ANALYZE SELECT * FROM skill_executions WHERE ...

# Check database indexes
\d skill_executions

# Check database connection
psql -c "SELECT version();"

Solutions:

Add database indexes:

-- Index on skill_id
CREATE INDEX idx_skill_id ON skill_executions(skill_id);

-- Index on category
CREATE INDEX idx_category ON skill_executions((input_params->'skill_metadata'->>'category'));

-- Index on status
CREATE INDEX idx_status ON skill_executions(status);

Enable query result caching:

# Cache query results for 5-15 minutes
cache_ttl = 300  # 5 minutes

Reduce page_size:

# Use smaller page sizes
/marketplace/skills?page=1&page_size=10  # Instead of 100

Use specific filters:

# Instead of full-text search
/marketplace/skills?query=data

# Use category filter
/marketplace/skills?category=data

Workflow Performance

Optimize DAG Structure

Minimize Depth

# GOOD: Shallow workflow (parallelizable)
steps = [
    SkillStep("a", "task", {}, []),
    SkillStep("b", "task", {}, []),
    SkillStep("c", "merge", {}, ["a", "b"])
]
# Execution time: 2 * task_time (parallel)

# BAD: Deep chain (sequential)
steps = [
    SkillStep(f"step{i}", "task", {}, [f"step{i-1}"] if i > 0 else [])
    for i in range(10)
]
# Execution time: 10 * task_time (sequential)

Performance Impact:

Shallow: Parallel execution = faster
Deep: Sequential execution = slower
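
With unlimited parallelism and equal step durations, the lower bound on execution time is the length of the longest dependency chain (the critical path). The sketch below computes that length from a plain mapping of step name to dependency names; it assumes the workflow has already been validated as a DAG, and `critical_path_length` is a hypothetical helper rather than part of Atom.

```python
from functools import lru_cache

def critical_path_length(dependencies: dict[str, list[str]]) -> int:
    """Return the number of steps on the longest dependency chain.

    Assumes an acyclic graph in which every dependency is also a key.
    """
    @lru_cache(maxsize=None)
    def depth(step: str) -> int:
        # A step finishes one level after its deepest dependency.
        return 1 + max((depth(dep) for dep in dependencies[step]), default=0)

    return max((depth(step) for step in dependencies), default=0)

shallow = {"a": [], "b": [], "c": ["a", "b"]}
chain = {f"step{i}": ([f"step{i-1}"] if i > 0 else []) for i in range(10)}
print(critical_path_length(shallow), critical_path_length(chain))
```

For the two workflows above this gives 2 and 10 levels, matching the `2 * task_time` and `10 * task_time` estimates.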

Parallel Branches

# GOOD: Independent steps execute in parallel
steps = [
    SkillStep("start", "data_fetch", {}, []),
    SkillStep("branch1", "process_a", {}, ["start"]),
    SkillStep("branch2", "process_b", {}, ["start"]),
    SkillStep("branch3", "process_c", {}, ["start"])
]
# Execution time: 2 * task_time

# BAD: Sequential dependencies
steps = [
    SkillStep("branch1", "process_a", {}, []),
    SkillStep("branch2", "process_b", {}, ["branch1"]),
    SkillStep("branch3", "process_c", {}, ["branch2"])
]
# Execution time: 3 * task_time

Speedup:

Parallel: 1.5-2x faster (for independent steps)

Reduce Skill Execution Time

Optimize Individual Skills

# GOOD: Efficient algorithm
def process_data(data):
    return [x * 2 for x in data]  # O(n)

# BAD: Inefficient algorithm
def process_data(data):
    result = []
    for i, x in enumerate(data):
        result.append(data[i] * 2)  # O(n) but slower
    return result

Optimization Tips:

Prefer built-ins and comprehensions (map, filter, list comprehensions) over manual index loops
Minimize I/O operations
Cache external API calls
Use appropriate data structures

Set Appropriate Timeouts

steps = [
    SkillStep("quick", "fast_task", {}, [], timeout_seconds=10),
    SkillStep("slow", "heavy_task", {}, [], timeout_seconds=300)
]

Timeout Guidelines:

Quick operations: 10-30 seconds
Heavy computation: 300-600 seconds (5-10 minutes)
External API calls: 30-60 seconds

Workflow Caching

Cache workflow results for repeated executions:

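import hashlib
import json

_workflow_cache = {}  # (workflow hash, agent_id) -> result

def get_workflow_hash(steps):
    """Generate a stable hash of the workflow definition for caching."""
    steps_json = json.dumps([s.__dict__ for s in steps], sort_keys=True)
    return hashlib.sha256(steps_json.encode()).hexdigest()

async def execute_cached_workflow(steps, agent_id):
    """Execute a workflow, reusing the result of an identical earlier run."""
    # functools.lru_cache is not suitable here: on an async function it would
    # cache the one-shot coroutine object, so a plain dict stores the results.
    cache_key = (get_workflow_hash(steps), agent_id)
    if cache_key not in _workflow_cache:
        _workflow_cache[cache_key] = await execute_workflow(steps, agent_id)
    return _workflow_cache[cache_key]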

Cache Benefits:

Repeated workflows: Instant results
Idempotent workflows: Safe to cache
TTL: 1-60 minutes (configurable)

Benchmarking

Run Benchmarks

# Run all performance benchmarks
pytest backend/tests/test_performance_benchmarks.py --benchmark-only

# Run specific benchmark group
pytest backend/tests/test_performance_benchmarks.py --benchmark-only -k "package_install"

# Generate benchmark report
pytest backend/tests/test_performance_benchmarks.py --benchmark-only --benchmark-json=benchmark.json

# Compare against baseline
pytest backend/tests/test_performance_benchmarks.py --benchmark-only --benchmark-compare

Benchmark Results

Example output:

--------------------------------------------------------------------------------------------------
Name (time in ms)                          Min       Max      Mean    StdDev    Median     Rounds
--------------------------------------------------------------------------------------------------
test_package_install_small               3200.5   4500.2   3800.1    450.3    3750.2         10
test_package_install_cached                 50.2     80.1     65.3     12.4      62.1         10
test_skill_load_first                     850.3   1200.5    950.2    150.4    920.3         10
test_skill_load_cached                      0.8      1.2      1.0      0.2       0.9         10
test_marketplace_search                   25.3     45.2     32.1      8.5      30.2         10
test_workflow_validation                  15.2     35.4     22.3      7.2      20.1         10
test_dependency_resolution               150.3    280.5    190.2     45.3    180.2         10
--------------------------------------------------------------------------------------------------

Compare Against Baselines

from core.performance_monitor import get_monitor

monitor = get_monitor()

# Check for regression
result = monitor.check_regression(
    operation="package_install",
    current_duration=6.0  # 6 seconds
)

if result["regression"]:
    print(f"REGRESSION: {result['percent_change']:.1f}% slower than baseline")
    print(f"Baseline: {result['baseline']:.2f}s")
    print(f"Current: {result['current']:.2f}s")
else:
    print("Performance OK")

Regression Detection:

Threshold: 1.5x baseline (50% slower)
Alert: Triggers when threshold exceeded
Action: Investigate performance degradation

Production Configuration

Environment Variables

# .env.production

# Disable hot-reload in production
SKILL_HOT_RELOAD_ENABLED=false

# Increase timeout for heavy workflows
WORKFLOW_TIMEOUT_SECONDS=600

# Limit concurrent installations
MAX_CONCURRENT_INSTALLATIONS=3

# Enable image caching
DOCKER_IMAGE_CACHE_ENABLED=true

# Marketplace cache TTL (seconds)
MARKETPLACE_CACHE_TTL=300

# Workflow cache TTL (seconds)
WORKFLOW_CACHE_TTL=60
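
How the service reads these variables is not shown here. The sketch below is one way to load them into a typed settings object at startup; `PerformanceSettings` and `load_settings` are hypothetical names, and the fallback defaults simply mirror the production values above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

def _as_bool(value: str) -> bool:
    """Interpret common truthy spellings; anything else is False."""
    return value.strip().lower() in {"1", "true", "yes", "on"}

@dataclass(frozen=True)
class PerformanceSettings:
    hot_reload_enabled: bool
    workflow_timeout_seconds: int
    max_concurrent_installations: int
    image_cache_enabled: bool
    marketplace_cache_ttl: int
    workflow_cache_ttl: int

def load_settings(env: Optional[Mapping[str, str]] = None) -> PerformanceSettings:
    """Build settings from the environment (or any mapping, for testing)."""
    env = os.environ if env is None else env
    return PerformanceSettings(
        hot_reload_enabled=_as_bool(env.get("SKILL_HOT_RELOAD_ENABLED", "false")),
        workflow_timeout_seconds=int(env.get("WORKFLOW_TIMEOUT_SECONDS", "600")),
        max_concurrent_installations=int(env.get("MAX_CONCURRENT_INSTALLATIONS", "3")),
        image_cache_enabled=_as_bool(env.get("DOCKER_IMAGE_CACHE_ENABLED", "true")),
        marketplace_cache_ttl=int(env.get("MARKETPLACE_CACHE_TTL", "300")),
        workflow_cache_ttl=int(env.get("WORKFLOW_CACHE_TTL", "60")),
    )
```

Parsing once at startup means a malformed value such as `WORKFLOW_TIMEOUT_SECONDS=ten` fails immediately with a `ValueError` instead of surfacing in the middle of a workflow.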

Resource Limits

# Set Docker resource limits for skill execution
installer.install_packages(
    skill_id="my-skill",
    requirements=["pandas==2.0.0"],
    memory_limit="512m",  # Limit memory
    cpu_limit=1.0         # Limit CPU (1 core)
)

Resource Guidelines:

Small skills: 256m memory, 0.5 CPU
Medium skills: 512m memory, 1.0 CPU
Large skills: 1024m memory, 2.0 CPU

Connection Pooling

# Database connection pool
from sqlalchemy import create_engine
from sqlalchemy.pool import QueuePool

engine = create_engine(
    DATABASE_URL,
    poolclass=QueuePool,
    pool_size=20,        # Persistent connections kept in the pool
    max_overflow=10,     # Extra connections allowed under load (30 total)
    pool_timeout=30,     # Seconds to wait for a free connection
    pool_recycle=3600    # Recycle connections after 1 hour
)

Monitoring

Track Performance Metrics

from core.performance_monitor import measure_performance

with measure_performance("my_operation") as timer:
    # Do work
    result = expensive_operation()

if timer.duration > PERFORMANCE_TARGETS["my_operation"]:
    logger.warning(f"Operation slow: {timer.duration:.2f}s")
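
The implementation of `measure_performance` is not shown in this guide. For readers who want the same pattern without the monitor, here is a minimal sketch of a timing context manager built on `time.perf_counter`; `Timer` and `measure` are hypothetical names.

```python
import time
from contextlib import contextmanager

class Timer:
    """Holds the elapsed time once the timed block has finished."""

    def __init__(self, name: str):
        self.name = name
        self.duration = 0.0  # seconds; set when the block exits

@contextmanager
def measure(name: str):
    """Time the enclosed block, recording the duration even if it raises."""
    timer = Timer(name)
    start = time.perf_counter()
    try:
        yield timer
    finally:
        timer.duration = time.perf_counter() - start

with measure("example_sleep") as timer:
    time.sleep(0.02)
print(f"{timer.name}: {timer.duration:.3f}s")
```

`perf_counter` is used rather than `time.time` because it is monotonic and is not affected by system clock adjustments.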

Review Performance Reports

monitor = get_monitor()
summary = monitor.get_summary()

for op, stats in summary["operations"].items():
    print(f"{op}:")
    print(f"  Avg: {stats['avg']:.3f}s")
    print(f"  Min: {stats['min']:.3f}s")
    print(f"  Max: {stats['max']:.3f}s")
    print(f"  P50: {stats['p50']:.3f}s")
    print(f"  P95: {stats['p95']:.3f}s")
    print(f"  P99: {stats['p99']:.3f}s")

Output:

package_install:
  Avg: 4.250s
  Min: 3.200s
  Max: 5.100s
  P50: 4.100s
  P95: 5.000s
  P99: 5.080s

skill_load:
  Avg: 0.650s
  Min: 0.500s
  Max: 1.200s
  P50: 0.600s
  P95: 1.100s
  P99: 1.180s
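
The P50/P95/P99 figures are percentiles of the recorded durations. The method the monitor uses to compute them is not stated here; the sketch below uses the simple nearest-rank definition as an assumption, and `percentile` is a hypothetical helper.

```python
import math

def percentile(samples, pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of
    all samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

durations = [3.2, 4.1, 5.1]
print(percentile(durations, 50), percentile(durations, 95))
```

With few samples the high percentiles collapse onto the maximum, so P95 and P99 only become informative once a reasonable number of measurements has been collected.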

Alert on Performance Degradation

def check_performance_alerts():
    """Check for performance regressions."""
    monitor = get_monitor()
    summary = monitor.get_summary()

    alerts = []

    for op, stats in summary["operations"].items():
        # Check if P99 exceeds 1.5x baseline
        baseline = PERFORMANCE_TARGETS.get(op, 1.0)
        p99 = stats['p99']

        if p99 > baseline * 1.5:
            alerts.append({
                "operation": op,
                "severity": "HIGH",
                "message": f"P99 ({p99:.2f}s) exceeds 1.5x baseline ({baseline:.2f}s)"
            })

    return alerts

Troubleshooting

Performance Degradation

Problem: Operations getting slower over time

Diagnosis:

# Check memory usage
docker stats

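# Check database query performance (requires the pg_stat_statements extension;
# the column is named total_time on PostgreSQL 12 and earlier)
psql -c "SELECT * FROM pg_stat_statements ORDER BY total_exec_time DESC LIMIT 10;"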

# Check disk I/O
iostat -x 1

# Check CPU usage
top

Solutions:

Check for memory leaks:

# Restart service if memory usage increases
systemctl restart atom-api

Clear module cache:

from core.skill_dynamic_loader import get_global_loader
loader = get_global_loader()
loader.clear_cache()

Review Docker image storage:

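# Remove all unused images, stopped containers, and build cache
# (unused cached skill images will need to be rebuilt afterwards)
docker system prune -a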

Check database query performance:

-- Add indexes if needed
CREATE INDEX CONCURRENTLY idx_performance ON skill_executions(skill_id);

-- Vacuum database
VACUUM ANALYZE skill_executions;

High Memory Usage

Problem: Memory usage increases with skill loading

Diagnosis:

# Check memory usage
docker stats

# Check system-wide memory (requires psutil)
python -c "import psutil; print(psutil.virtual_memory())"

# Check for memory leaks
python -m memory_profiler script.py

Solutions:

Unload unused skills:
```
loader.unload_skill("unused_skill")
```

Reduce module cache size:

loader = SkillDynamicLoader(
    cache_size=100  # Limit cache to 100 skills
)

Use process isolation:

# Run skills in separate processes
uvicorn main:app --workers 4

Monitor with memory profiler:

pip install memory_profiler
python -m memory_profiler script.py

Slow Marketplace Queries

Problem: Marketplace search > 100ms

Diagnosis:

# Check query plan
EXPLAIN ANALYZE SELECT * FROM skill_executions WHERE skill_id LIKE '%data%';

# Check database indexes
psql -c "\d skill_executions"

# Check database connections
psql -c "SELECT count(*) FROM pg_stat_activity;"

Solutions:

Add database indexes:

CREATE INDEX idx_skill_name ON skill_executions(skill_id);
CREATE INDEX idx_category ON skill_executions((input_params->'skill_metadata'->>'category'));

Enable query result caching:

# Cache results for 5-15 minutes
cache_ttl = 300

Reduce page_size:

/marketplace/skills?page=1&page_size=10  # Instead of 100

Use specific filters:

/marketplace/skills?category=data  # Instead of ?query=data

Optimization Checklist

Pre-Deployment

Run performance benchmarks
Check for regressions against baseline
Verify all targets met (<5s install, <1s load, <100ms search)
Test with production data volume
Validate caching strategy
Review database query plans
Check resource limits (CPU, memory)

Production Monitoring

Enable performance metrics collection
Set up performance alerts (1.5x baseline)
Monitor P95/P99 latencies
Track error rates
Review performance reports weekly
Investigate regressions within 24 hours

Optimization Opportunities

Enable image caching for packages
Preload frequently used skills
Use batch installations
Implement query result caching
Add database indexes
Optimize DAG structure (minimize depth)
Use parallel execution for independent steps
Set appropriate timeouts
Implement workflow result caching

Summary

Performance optimization requires:

✅ Baseline Metrics: Establish performance targets
✅ Benchmarking: Regular performance testing
✅ Monitoring: Track metrics over time
✅ Alerting: Detect regressions early
✅ Optimization: Continuous improvement

Quick Wins:

Enable image caching (5-10x faster repeat installations)
Preload common skills (1000x faster access)
Use batch installations (2-3x faster)
Add database indexes (2-5x faster queries)
Minimize workflow depth (1.5-2x faster execution)

Next Steps:

Run performance benchmarks
Establish baselines
Set up monitoring and alerting
Optimize based on findings
Review and iterate regularly

See Also:

Advanced Skill Execution - Phase 60 overview
Skill Composition Patterns - Workflow optimization
Supply Chain Security - Security testing (Plan 60-06)

Last Updated: February 19, 2026
