This guide explains how to optimize the performance of Atom's advanced skill execution features, including package installation, skill loading, marketplace search, and workflow execution.
- Performance Targets
- Package Installation
- Skill Loading
- Marketplace Performance
- Workflow Performance
- Benchmarking
- Production Configuration
- Monitoring
- Troubleshooting
- Optimization Checklist
| Operation | Target | Typical | Measurement Scope |
|---|---|---|---|
| Package Installation | < 5 seconds | 3-5s | Small packages (requests, lodash) |
| Skill Loading | < 1 second | 0.5-1s | Dynamic import from file |
| Hot-Reload | < 1 second | 0.3-0.8s | File change to reload |
| Marketplace Search | < 100ms | 20-50ms | With pagination |
| Workflow Validation | < 50ms | 10-30ms | DAG validation |
| Dependency Resolution | < 500ms | 100-300ms | Conflict detection |
A performance regression is flagged when an operation takes more than 1.5x its baseline:
REGRESSION_THRESHOLD = 1.5  # 50% slower triggers an alert
Example:
- Baseline: Package installation = 4 seconds
- Regression threshold: 4 * 1.5 = 6 seconds
- Alert triggered if: Installation takes > 6 seconds
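The threshold rule can be written as a small predicate. This sketch assumes a strict greater-than comparison, as in the example above. `is_regression` and `percent_change` are hypothetical helpers and are not the performance monitor's API:

```python
REGRESSION_THRESHOLD = 1.5  # 50% slower than baseline triggers an alert


def is_regression(baseline_s: float, current_s: float,
                  threshold: float = REGRESSION_THRESHOLD) -> bool:
    """True when current_s is strictly above baseline_s * threshold."""
    if baseline_s <= 0:
        raise ValueError("baseline must be positive")
    return current_s > baseline_s * threshold


def percent_change(baseline_s: float, current_s: float) -> float:
    """Percentage difference from baseline (positive means slower)."""
    return (current_s - baseline_s) / baseline_s * 100.0


# Worked example from above: 4 s baseline -> 6 s threshold
print(is_regression(4.0, 6.0))   # False: 6 s is at, not above, the threshold
print(is_regression(4.0, 6.5))   # True
print(percent_change(4.0, 6.5))  # 62.5
```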
Allocate a time budget to each operation:
PERFORMANCE_BUDGETS = {
"package_install": 5.0, # 5 seconds
"skill_load": 1.0, # 1 second
"hot_reload": 1.0, # 1 second
"marketplace_search": 0.1, # 100ms
"workflow_validation": 0.05 # 50ms
}
Prefer minimal dependencies to reduce installation time:
# GOOD: Small, focused package
packages = ["requests==2.28.0"] # ~5MB
# AVOID: Large packages with many transitive dependencies
packages = ["tensorflow==2.12.0"]  # ~500MB (use only if needed)
Package Size Impact:
- Small (<10MB): 1-3 seconds installation
- Medium (10-50MB): 3-10 seconds installation
- Large (>50MB): 10-60 seconds installation
The auto-installer caches built Docker images by default:
from core.auto_installer_service import AutoInstallerService
installer = AutoInstallerService(db)
# Check cache hit
result = await installer.install_dependencies(
skill_id="my-skill",
packages=["pandas==2.0.0"],
package_type="python",
agent_id="my-agent"
)
if result.get("cached"):
    print("Used cached image - no rebuild needed")
    # Installation: <1 second (cached)
else:
    print("Building new image")
    # Installation: 3-5 seconds (new build)
Cache Benefits:
- First installation: 3-5 seconds
- Cached installation: <1 second
- 5-10x speedup for repeat installations
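The guide does not say how the auto-installer decides whether an image is already cached. One plausible approach is to hash the normalized dependency set, so that identical requirements always map to the same image tag. The sketch below shows this as a hypothetical scheme. `image_cache_key` and the tag format are illustrative and are not Atom's API:

```python
import hashlib
import json


def image_cache_key(packages: list, package_type: str) -> str:
    """Derive a deterministic image tag from a dependency set (hypothetical scheme)."""
    # Sort the specs so that listing order does not change the key
    payload = json.dumps({"type": package_type, "packages": sorted(packages)})
    digest = hashlib.sha256(payload.encode()).hexdigest()[:16]
    return f"skill-deps-{package_type}:{digest}"


a = image_cache_key(["pandas==2.0.0", "numpy>=1.24.0"], "python")
b = image_cache_key(["numpy>=1.24.0", "pandas==2.0.0"], "python")
print(a == b)  # True: same dependency set, same tag, so a cache hit
```

Pinning exact versions (`==`) makes a key like this more meaningful. A range such as `>=1.24.0` can resolve to different versions over time while producing the same key.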
Install multiple skills at once:
# BAD: Sequential installations (slow)
for skill_id in ["skill1", "skill2", "skill3"]:
    await installer.install_dependencies(
        skill_id=skill_id,
        packages=["pandas==2.0.0"],
        package_type="python",
        agent_id="my-agent"
    )
# Total time: 3 * 5s = 15 seconds
# GOOD: Batch installation (fast)
result = await installer.batch_install(
installations=[
{"skill_id": "skill1", "packages": ["pandas==2.0.0"], "package_type": "python"},
{"skill_id": "skill2", "packages": ["numpy>=1.24.0"], "package_type": "python"},
{"skill_id": "skill3", "packages": ["requests==2.28.0"], "package_type": "python"}
],
agent_id="my-agent"
)
# Total time: 5-7 seconds (parallel builds)
Performance Gains:
- Sequential: 3 skills * 5s = 15 seconds
- Batch: 5-7 seconds (2-3x faster)
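A batch install is, in effect, concurrent work with a concurrency cap. The sketch below shows that pattern using `asyncio.gather` and an `asyncio.Semaphore`. Note the following assumptions:

- `install_one` is a hypothetical stand-in, simulated with `asyncio.sleep`. It is not `AutoInstallerService`.
- The default cap of 3 mirrors `MAX_CONCURRENT_INSTALLATIONS` in the production config.

```python
import asyncio


async def install_one(skill_id: str, packages: list) -> dict:
    """Hypothetical stand-in for one skill's dependency install."""
    await asyncio.sleep(0.05)  # simulate build time
    return {"skill_id": skill_id, "packages": packages, "success": True}


async def batch_install(installations: list, max_concurrent: int = 3) -> dict:
    """Install all skills concurrently, with at most max_concurrent in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)
    active = 0
    peak = 0

    async def guarded(spec: dict) -> dict:
        nonlocal active, peak
        async with semaphore:
            active += 1
            peak = max(peak, active)
            try:
                return await install_one(spec["skill_id"], spec["packages"])
            finally:
                active -= 1

    # gather returns results in the same order as the inputs
    results = await asyncio.gather(*(guarded(spec) for spec in installations))
    return {"results": list(results), "peak_concurrency": peak}


specs = [{"skill_id": f"skill{i}", "packages": ["requests==2.28.0"]} for i in range(5)]
summary = asyncio.run(batch_install(specs))
print(summary["peak_concurrency"])  # 3
```

Capping concurrency keeps parallel builds from competing for the same CPU, disk, and network. Without a cap, heavy parallel builds can erase the speedup.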
Diagnosis:
# Check package size
pip show pandas
# Check download speed
time pip download pandas==2.0.0
# Check Docker build time
docker build -t test-build .
Solutions:
-
Check package size:
# Large packages take longer
pip show --files pandas | wc -l  # File count
du -sh ~/.cache/pip  # Cache size
-
Verify network speed:
# A slow network slows downloads
speedtest-cli
ping pypi.org
-
Reduce number of dependencies:
# Before: 10 packages (slow)
packages = ["pandas", "numpy", "scipy", "matplotlib", "seaborn", ...]
# After: 3 packages (fast)
packages = ["pandas", "numpy", "requests"]
-
Use local package mirror:
# Configure pip to use a mirror
pip config set global.index-url https://mirror.example.com/pypi/simple/
Diagnosis:
# Check Docker daemon
docker ps
docker info
# Check container logs
docker logs <container_id>
Solutions:
-
Check Docker daemon:
# Restart Docker if hung
sudo systemctl restart docker  # Linux
# macOS: Restart Docker Desktop
-
Increase Docker resources:
# Docker Desktop: Settings > Resources
# Increase CPU and memory allocation
-
Use --no-cache for debugging:
# Disable cache to debug slow layers
docker build --no-cache -t test-build .
Load skills on startup for faster access:
from core.skill_dynamic_loader import get_global_loader
loader = get_global_loader()
# Preload common skills on startup
COMMON_SKILLS = [
("http_get", "/path/to/http_get.py"),
("json_parse", "/path/to/json_parse.py"),
("database_insert", "/path/to/database_insert.py")
]
for skill_name, skill_path in COMMON_SKILLS:
    loader.load_skill(skill_name, skill_path)
    print(f"Preloaded: {skill_name}")
# Now skills are ready for instant use
module = loader.get_skill("http_get")  # <1ms (cached)
Performance Impact:
- First load: 500-1000ms
- Preloaded access: <1ms (1000x faster)
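Preloading comes down to paying the import cost once and serving later lookups from an in-memory map. The following is a minimal standard-library sketch of that idea using `importlib.util.spec_from_file_location`. It is not Atom's `SkillDynamicLoader`. `PreloadingSkillCache` and the demo skill file are hypothetical:

```python
import importlib.util
import os
import sys
import tempfile
from types import ModuleType


class PreloadingSkillCache:
    """Import each skill file once; serve later lookups from a dict (hypothetical)."""

    def __init__(self) -> None:
        self._modules = {}

    def load_skill(self, skill_name: str, skill_path: str) -> ModuleType:
        # Slow path: locate, compile, and execute the file
        spec = importlib.util.spec_from_file_location(skill_name, skill_path)
        if spec is None or spec.loader is None:
            raise ImportError(f"cannot load {skill_path}")
        module = importlib.util.module_from_spec(spec)
        sys.modules[skill_name] = module  # register before executing, as import does
        try:
            spec.loader.exec_module(module)
        except BaseException:
            sys.modules.pop(skill_name, None)  # don't leave a half-initialized module
            raise
        self._modules[skill_name] = module
        return module

    def get_skill(self, skill_name: str) -> ModuleType:
        # Fast path: a dict lookup, no file I/O or compilation
        return self._modules[skill_name]


# Usage: preload a throwaway skill file, then fetch it from the cache
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "http_get_demo.py")
    with open(path, "w") as f:
        f.write("def execute():\n    return 'ok'\n")
    cache = PreloadingSkillCache()
    cache.load_skill("http_get_demo", path)

print(cache.get_skill("http_get_demo").execute())  # ok
```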
Watchdog-based hot-reload for development:
from core.skill_dynamic_loader import SkillDynamicLoader
# Development: Enable monitoring
loader = SkillDynamicLoader(
skills_dir="/path/to/skills",
enable_monitoring=True # Watch for file changes
)
# File changes trigger automatic reload
# Detected change in http_get.py, reloading...
Performance:
- File detection: <100ms (watchdog)
- Reload time: 300-800ms
- No server restart required
Disable monitoring for performance:
# Production: Disable monitoring for performance
loader = SkillDynamicLoader(
skills_dir="/path/to/skills",
enable_monitoring=False # Disabled in production
)
# Skills still loadable, just no auto-reload
module = loader.load_skill("http_get", "/path/to/http_get.py")
Production Best Practices:
- Disable file monitoring (reduces CPU usage)
- Use preloaded skills (faster access)
- Restart service to update skills (safer than hot-reload)
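You can keep file monitoring off in production without changing code between environments. One way is to derive `enable_monitoring` from the `SKILL_HOT_RELOAD_ENABLED` variable used in the production `.env`. This sketch assumes your startup code reads that variable. `hot_reload_enabled` is a hypothetical helper, and it defaults to off so that a missing variable is safe:

```python
import os
from typing import Mapping


def hot_reload_enabled(environ: Mapping[str, str] = os.environ) -> bool:
    """True only for an explicit truthy SKILL_HOT_RELOAD_ENABLED; unset means off."""
    value = environ.get("SKILL_HOT_RELOAD_ENABLED", "false")
    return value.strip().lower() in {"1", "true", "yes", "on"}


# Hypothetical wiring at startup:
# loader = SkillDynamicLoader(skills_dir="/path/to/skills",
#                             enable_monitoring=hot_reload_enabled())
print(hot_reload_enabled({"SKILL_HOT_RELOAD_ENABLED": "false"}))  # False
```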
The loader automatically clears sys.modules on reload:
# GOOD: Uses proper cache clearing
loader.reload_skill("my_skill") # Clears sys.modules first
# BAD: Manual reload without cache clearing
import importlib
import sys
importlib.reload(sys.modules["my_skill"])  # Bypasses the loader's cache clearing; risks stale code
Cache Clearing Flow:
def reload_skill(self, skill_name: str):
    """Hot-reload skill without service restart."""
    # Step 1: Clear module cache (prevents stale code); requires `import sys` at module level
    if skill_name in sys.modules:
        del sys.modules[skill_name]
    # Step 2: Reload from the stored file path (load_skill takes the name, then the path)
    skill_path = self.loaded_skills[skill_name]['path']
    return self.load_skill(skill_name, skill_path)
Why Clear Cache?
- Python caches imported modules in sys.modules
- Reloading without clearing the cache causes old code to execute
- Stale imports cause bugs and inconsistencies
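The stale-module behavior can be reproduced with the standard library alone. This self-contained sketch uses a made-up module name, `stale_demo_skill`. It imports a file and then edits it, showing that `importlib.import_module` keeps returning the cached object until its `sys.modules` entry is deleted:

```python
import importlib
import sys
import tempfile
from pathlib import Path


def demo_stale_module_cache() -> tuple:
    """Return (first import, re-import after edit, import after clearing cache)."""
    name = "stale_demo_skill"
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / f"{name}.py"
        path.write_text("VERSION = 1\n")
        sys.path.insert(0, tmp)
        try:
            first = importlib.import_module(name).VERSION
            # Edit the file; the new length also invalidates any cached .pyc
            path.write_text("VERSION = 20\n")
            importlib.invalidate_caches()
            # sys.modules still holds the old module, so nothing is re-read
            cached = importlib.import_module(name).VERSION
            # Clear the cache entry, then import again: the file is re-executed
            del sys.modules[name]
            fresh = importlib.import_module(name).VERSION
        finally:
            sys.path.remove(tmp)
            sys.modules.pop(name, None)
    return first, cached, fresh


print(demo_stale_module_cache())  # (1, 1, 20)
```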
Diagnosis:
# Check file size
ls -lh /path/to/skill.py
# Check import time per module
python -X importtime -c "import skill"
# Check disk I/O
iostat -x 1
Solutions:
-
Check file size:
# Large files take longer to load
wc -l /path/to/skill.py  # Line count
-
Reduce import dependencies:
# BAD: Many top-level imports (slow)
import pandas
import numpy
import scipy
import matplotlib
import seaborn
# GOOD: Lazy imports (fast)
def execute():
    import pandas  # Import only when needed
    return pandas.DataFrame()
-
Use bytecode caching:
# Python automatically caches .pyc files
# Ensure the __pycache__ directory is writable
ls -la __pycache__/
Diagnosis:
# Check watchdog installation
pip show watchdog
# Check file permissions
ls -la /path/to/skills/
# Confirm the watchdog observer can be imported
python -c "from watchdog.observers import Observer; print(Observer)"
Solutions:
-
Install watchdog:
pip install watchdog
-
Check file permissions:
# Ensure read access to skill directory
chmod +r /path/to/skills/*.py
-
Verify monitoring enabled:
# Check enable_monitoring flag
loader = SkillDynamicLoader(
    skills_dir="/path/to/skills",
    enable_monitoring=True  # Must be True
)
# GOOD: Specific category (fast)
/marketplace/skills?category=data
# Time: ~20ms (indexed query)
# SLOWER: Full-text search
/marketplace/skills?query=data
# Time: ~50ms (text search)
Query Performance:
- Category filter: ~20ms (indexed)
- Type filter: ~20ms (indexed)
- Full-text search: ~50ms (text matching)
- Combined filters: ~40ms (index + text)
# GOOD: Reasonable page size
/marketplace/skills?page=1&page_size=20
# Time: ~20ms (single query)
# AVOID: Too large page size
/marketplace/skills?page=1&page_size=1000
# Time: ~100ms (large result set)
Page Size Guidelines:
- Default: 20 items (recommended)
- Maximum: 100 items (API limit)
- Optimal: 20-50 items (balance performance vs. UX)
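A client-side guard can keep requests within these limits. This sketch makes the following assumptions:

- It uses the query parameters shown above: `page`, `page_size`, `category`, and `query`.
- It treats 100 as a hard cap.
- `marketplace_search_url` is a hypothetical helper.

```python
from typing import Optional
from urllib.parse import urlencode

DEFAULT_PAGE_SIZE = 20
MAX_PAGE_SIZE = 100  # stated API limit


def marketplace_search_url(page: int = 1, page_size: int = DEFAULT_PAGE_SIZE,
                           category: Optional[str] = None,
                           query: Optional[str] = None) -> str:
    """Build a /marketplace/skills URL with page >= 1 and 1 <= page_size <= 100."""
    params = {"page": max(page, 1),
              "page_size": min(max(page_size, 1), MAX_PAGE_SIZE)}
    if category:  # indexed filter: cheaper than full-text search
        params["category"] = category
    if query:  # full-text search: only when a category isn't enough
        params["query"] = query
    return "/marketplace/skills?" + urlencode(params)


print(marketplace_search_url(page_size=1000))
# /marketplace/skills?page=1&page_size=100
```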
Application-level caching for popular queries:
from datetime import datetime, timedelta

cache = {}
CACHE_TTL = timedelta(minutes=5)

async def get_cached_skills(query: str):
    """Get skills from cache or database."""
    cache_key = f"skills:{query}"
    # Check cache
    if cache_key in cache:
        cached_data, timestamp = cache[cache_key]
        age = datetime.now() - timestamp
        if age < CACHE_TTL:
            print("Cache hit")
            return cached_data
    # Cache miss - fetch from database (fetch_skills_from_db is your data-access coroutine)
    print("Cache miss")
    result = await fetch_skills_from_db(query)
    # Store in cache
    cache[cache_key] = (result, datetime.now())
    return result
Cache Strategy:
- TTL: 5 minutes
- Cache key: skills:{query}:{category}:{page}
- Invalidate: On skill import/update
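The caching snippet above keys only on the query and never invalidates. The sketch below follows the stated strategy instead: a 5-minute TTL, `skills:{query}:{category}:{page}` keys, and full invalidation on skill import or update. `MarketplaceCache` is hypothetical. The injectable clock exists only to make expiry testable:

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class MarketplaceCache:
    """In-process TTL cache for marketplace listings (hypothetical)."""

    def __init__(self, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock  # monotonic: unaffected by wall-clock changes
        self._entries: Dict[str, Tuple[Any, float]] = {}

    @staticmethod
    def key(query: str, category: Optional[str], page: int) -> str:
        return f"skills:{query}:{category or ''}:{page}"

    def get(self, key: str) -> Optional[Any]:
        """Return the cached value, or None if missing or expired."""
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, stored_at = entry
        if self.clock() - stored_at >= self.ttl:
            del self._entries[key]  # expired: drop it so the caller refetches
            return None
        return value

    def set(self, key: str, value: Any) -> None:
        self._entries[key] = (value, self.clock())

    def invalidate_all(self) -> None:
        """Call after any skill import/update so stale listings are not served."""
        self._entries.clear()


cache = MarketplaceCache(ttl_seconds=300)
key = MarketplaceCache.key("data", "analytics", 1)
cache.set(key, ["skill-a"])
print(cache.get(key))  # ['skill-a']
cache.invalidate_all()  # e.g. right after a skill import
print(cache.get(key))  # None
```

This cache lives in a single process. When the API runs with several worker processes, each worker holds its own copy, so invalidation in one worker does not reach the others. A shared store avoids that problem.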
Diagnosis:
# Check query plan
EXPLAIN ANALYZE SELECT * FROM skill_executions WHERE ...
# Check table indexes (psql: \d lists the table and its indexes)
\d skill_executions
# Check database connection
psql -c "SELECT version();"
Solutions:
-
Add database indexes:
-- Index on skill_id
CREATE INDEX idx_skill_id ON skill_executions(skill_id);
-- Index on category (expression index; input_params must be json/jsonb,
-- and queries must filter on the same expression to use it)
CREATE INDEX idx_category ON skill_executions((input_params->'skill_metadata'->>'category'));
-- Index on status
CREATE INDEX idx_status ON skill_executions(status);
-
Enable query result caching:
# Cache query results for 5-15 minutes
cache_ttl = 300  # 5 minutes
-
Reduce page_size:
# Use smaller page sizes
/marketplace/skills?page=1&page_size=10  # Instead of 100
-
Use specific filters:
# Instead of full-text search
/marketplace/skills?query=data
# Use category filter
/marketplace/skills?category=data
# GOOD: Shallow workflow (parallelizable)
steps = [
SkillStep("a", "task", {}, []),
SkillStep("b", "task", {}, []),
SkillStep("c", "merge", {}, ["a", "b"])
]
# Execution time: 2 * task_time (parallel)
# BAD: Deep chain (sequential)
steps = [
SkillStep(f"step{i}", "task", {}, [f"step{i-1}"] if i > 0 else [])
for i in range(10)
]
# Execution time: 10 * task_time (sequential)
Performance Impact:
- Shallow: Parallel execution = faster
- Deep: Sequential execution = slower
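The execution-time comments above follow from the workflow's critical path. If parallelism is unlimited and every step takes one `task_time`, total time equals the number of steps on the longest dependency chain. The sketch below computes this with the standard library's `graphlib`. For brevity it represents steps as `(id, depends_on)` pairs rather than `SkillStep` objects, which is an assumption:

```python
from graphlib import TopologicalSorter  # Python 3.9+


def critical_path_length(steps):
    """Longest dependency chain, counting each step as one task_time.

    steps: iterable of (step_id, depends_on) pairs, like SkillStep's id and deps.
    Raises graphlib.CycleError if the steps do not form a DAG.
    """
    graph = {step_id: set(deps) for step_id, deps in steps}
    finish = {}
    for step_id in TopologicalSorter(graph).static_order():
        # A step can start only after its slowest dependency has finished
        finish[step_id] = 1 + max((finish[d] for d in graph.get(step_id, ())), default=0)
    return max(finish.values(), default=0)


shallow = [("a", []), ("b", []), ("c", ["a", "b"])]
deep = [(f"step{i}", [f"step{i-1}"] if i > 0 else []) for i in range(10)]
print(critical_path_length(shallow))  # 2
print(critical_path_length(deep))     # 10
```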
# GOOD: Independent steps execute in parallel
steps = [
SkillStep("start", "data_fetch", {}, []),
SkillStep("branch1", "process_a", {}, ["start"]),
SkillStep("branch2", "process_b", {}, ["start"]),
SkillStep("branch3", "process_c", {}, ["start"])
]
# Execution time: 2 * task_time
# BAD: Sequential dependencies
steps = [
SkillStep("branch1", "process_a", {}, []),
SkillStep("branch2", "process_b", {}, ["branch1"]),
SkillStep("branch3", "process_c", {}, ["branch2"])
]
# Execution time: 3 * task_time
Speedup:
- Parallel: 1.5-2x faster (for independent steps)
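To see why independent branches overlap, consider a dependency-driven executor written with asyncio. Each step waits on one `asyncio.Event` per dependency and then runs. Note the following about this sketch:

- It is not Atom's workflow engine.
- Steps are simulated with `asyncio.sleep`.
- The function returns peak concurrency, so the difference between the two layouts is observable.

```python
import asyncio


async def run_dag(steps: dict, task_seconds: float = 0.01) -> int:
    """Run each step once all its dependencies finish; return peak concurrency.

    steps maps step_id -> list of step_ids it depends on.
    """
    done = {step_id: asyncio.Event() for step_id in steps}
    running = 0
    peak = 0

    async def run_step(step_id: str) -> None:
        nonlocal running, peak
        for dep in steps[step_id]:
            await done[dep].wait()  # block until each dependency has finished
        running += 1
        peak = max(peak, running)
        await asyncio.sleep(task_seconds)  # placeholder for the real skill call
        running -= 1
        done[step_id].set()

    await asyncio.gather(*(run_step(s) for s in steps))
    return peak


fan_out = {"start": [], "branch1": ["start"], "branch2": ["start"], "branch3": ["start"]}
chain = {"branch1": [], "branch2": ["branch1"], "branch3": ["branch2"]}
print(asyncio.run(run_dag(fan_out)))  # 3: the three branches overlap
print(asyncio.run(run_dag(chain)))    # 1: strictly one step at a time
```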
# GOOD: Idiomatic list comprehension
def process_data(data):
    return [x * 2 for x in data]  # O(n)
# SLOWER: Manual loop with redundant indexing
def process_data(data):
    result = []
    for i, x in enumerate(data):
        result.append(data[i] * 2)  # Still O(n), but more per-item overhead
    return result
Optimization Tips:
- Use built-in functions (map, filter, list comprehensions)
- Minimize I/O operations
- Cache external API calls
- Use appropriate data structures
steps = [
SkillStep("quick", "fast_task", {}, [], timeout_seconds=10),
SkillStep("slow", "heavy_task", {}, [], timeout_seconds=300)
]
Timeout Guidelines:
- Quick operations: 10-30 seconds
- Heavy computation: 300-600 seconds (5-10 minutes)
- External API calls: 30-60 seconds
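If steps run as coroutines, a per-step `timeout_seconds` can be enforced with `asyncio.wait_for`, which cancels the step when the limit is hit. This sketch assumes an async executor. `run_with_timeout` and the demo steps are hypothetical:

```python
import asyncio


async def run_with_timeout(step_coro, timeout_seconds: float) -> dict:
    """Await a step; report a timeout instead of raising."""
    try:
        result = await asyncio.wait_for(step_coro, timeout=timeout_seconds)
        return {"status": "completed", "result": result}
    except asyncio.TimeoutError:
        # wait_for has already cancelled the step at this point
        return {"status": "timeout", "timeout_seconds": timeout_seconds}


async def quick_task():
    return "done"


async def heavy_task():
    await asyncio.sleep(1.0)  # pretend this is a long computation
    return "done"


async def main():
    ok = await run_with_timeout(quick_task(), timeout_seconds=0.5)
    late = await run_with_timeout(heavy_task(), timeout_seconds=0.05)
    return ok, late


ok, late = asyncio.run(main())
print(ok["status"], late["status"])  # completed timeout
```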
Cache workflow results for repeated executions:
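import hashlib
import json
import time
WORKFLOW_CACHE_TTL = 60  # seconds
_workflow_cache = {}  # (workflow_hash, agent_id) -> (result, stored_at)
def get_workflow_hash(steps):
    """Generate a stable hash for workflow caching."""
    steps_json = json.dumps([s.__dict__ for s in steps], sort_keys=True, default=str)
    return hashlib.sha256(steps_json.encode()).hexdigest()
async def execute_cached_workflow(steps, agent_id):
    """Execute workflow with a TTL cache (lru_cache would cache a single-use coroutine and has no TTL)."""
    key = (get_workflow_hash(steps), agent_id)
    cached = _workflow_cache.get(key)
    if cached is not None and time.monotonic() - cached[1] < WORKFLOW_CACHE_TTL:
        return cached[0]
    result = await execute_workflow(steps, agent_id)
    _workflow_cache[key] = (result, time.monotonic())
    return result
Cache Benefits: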
- Repeated workflows: Instant results
- Idempotent workflows: Safe to cache
- TTL: 1-60 minutes (configurable)
# Run all performance benchmarks
pytest backend/tests/test_performance_benchmarks.py --benchmark-only
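# Run a subset of benchmarks by test name (e.g. all test_package_install_* tests)
pytest backend/tests/test_performance_benchmarks.py --benchmark-only -k "package_install"
# Generate benchmark report
pytest backend/tests/test_performance_benchmarks.py --benchmark-only --benchmark-json=benchmark.json
# Save a run as the baseline, then compare a later run against the latest saved run
pytest backend/tests/test_performance_benchmarks.py --benchmark-only --benchmark-autosave
pytest backend/tests/test_performance_benchmarks.py --benchmark-only --benchmark-compare
Example output: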
--------------------------------------------------------------------------------------------------
Name (time in ms) Min Max Mean StdDev Median Rounds
--------------------------------------------------------------------------------------------------
test_package_install_small 3200.5 4500.2 3800.1 450.3 3750.2 10
test_package_install_cached 50.2 80.1 65.3 12.4 62.1 10
test_skill_load_first 850.3 1200.5 950.2 150.4 920.3 10
test_skill_load_cached 0.8 1.2 1.0 0.2 0.9 10
test_marketplace_search 25.3 45.2 32.1 8.5 30.2 10
test_workflow_validation 15.2 35.4 22.3 7.2 20.1 10
test_dependency_resolution 150.3 280.5 190.2 45.3 180.2 10
--------------------------------------------------------------------------------------------------
from core.performance_monitor import get_monitor
monitor = get_monitor()
# Check for regression
result = monitor.check_regression(
operation="package_install",
current_duration=6.0 # 6 seconds
)
if result["regression"]:
    print(f"REGRESSION: {result['percent_change']:.1f}% slower than baseline")
    print(f"Baseline: {result['baseline']:.2f}s")
    print(f"Current: {result['current']:.2f}s")
else:
    print("Performance OK")
Regression Detection:
- Threshold: 1.5x baseline (50% slower)
- Alert: Triggers when threshold exceeded
- Action: Investigate performance degradation
# .env.production
# Disable hot-reload in production
SKILL_HOT_RELOAD_ENABLED=false
# Increase timeout for heavy workflows
WORKFLOW_TIMEOUT_SECONDS=600
# Limit concurrent installations
MAX_CONCURRENT_INSTALLATIONS=3
# Enable image caching
DOCKER_IMAGE_CACHE_ENABLED=true
# Marketplace cache TTL (seconds)
MARKETPLACE_CACHE_TTL=300
# Workflow cache TTL (seconds)
WORKFLOW_CACHE_TTL=60
# Set Docker resource limits for skill execution
installer.install_packages(
skill_id="my-skill",
requirements=["pandas==2.0.0"],
memory_limit="512m", # Limit memory
cpu_limit=1.0 # Limit CPU (1 core)
)
Resource Guidelines:
- Small skills: 256m memory, 0.5 CPU
- Medium skills: 512m memory, 1.0 CPU
- Large skills: 1024m memory, 2.0 CPU
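The guide does not show how the installer applies these limits internally. For reference, the equivalent `docker run` flags are `--memory` and `--cpus`. This sketch maps the tiers above onto them. `RESOURCE_TIERS`, `docker_resource_flags`, and the image name are hypothetical:

```python
RESOURCE_TIERS = {
    # tier: (memory limit, CPU cores), taken from the guidelines above
    "small": ("256m", 0.5),
    "medium": ("512m", 1.0),
    "large": ("1024m", 2.0),
}


def docker_resource_flags(tier: str) -> list:
    """Return `docker run` flags that cap memory and CPU for a skill container."""
    memory, cpus = RESOURCE_TIERS[tier]
    return ["--memory", memory, "--cpus", str(cpus)]


cmd = ["docker", "run", "--rm", *docker_resource_flags("medium"), "skill-image"]
print(" ".join(cmd))  # docker run --rm --memory 512m --cpus 1.0 skill-image
```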
# Database connection pool
from sqlalchemy import create_engine
from sqlalchemy.pool import QueuePool
engine = create_engine(
    DATABASE_URL,
    poolclass=QueuePool,
    pool_size=20,  # Connections kept open in the pool
    max_overflow=10,  # Extra connections allowed under load (up to 30 total)
    pool_timeout=30,  # Seconds to wait for a free connection
    pool_recycle=3600  # Recycle connections after 1 hour
)
from core.performance_monitor import measure_performance
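import logging
logger = logging.getLogger(__name__)
# PERFORMANCE_TARGETS maps operation names to target durations in seconds
with measure_performance("my_operation") as timer:
    # Do work
    result = expensive_operation()
if timer.duration > PERFORMANCE_TARGETS["my_operation"]:
    logger.warning(f"Operation slow: {timer.duration:.2f}s")
monitor = get_monitor()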
summary = monitor.get_summary()
for op, stats in summary["operations"].items():
    print(f"{op}:")
    print(f" Avg: {stats['avg']:.3f}s")
    print(f" Min: {stats['min']:.3f}s")
    print(f" Max: {stats['max']:.3f}s")
    print(f" P50: {stats['p50']:.3f}s")
    print(f" P95: {stats['p95']:.3f}s")
    print(f" P99: {stats['p99']:.3f}s")
Output:
package_install:
Avg: 4.250s
Min: 3.200s
Max: 5.100s
P50: 4.100s
P95: 5.000s
P99: 5.080s
skill_load:
Avg: 0.650s
Min: 0.500s
Max: 1.200s
P50: 0.600s
P95: 1.100s
P99: 1.180s
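If you need the same summary from raw samples, the standard library can compute it. This sketch uses `statistics.quantiles` with its default `'exclusive'` method. A monitor that interpolates differently may report slightly different percentiles. `summarize` is a hypothetical helper:

```python
import statistics


def summarize(durations: list) -> dict:
    """Avg/min/max and P50/P95/P99 of durations (seconds)."""
    if len(durations) < 2:
        raise ValueError("need at least two samples")
    # n=100 yields the 99 cut points P1..P99; index k-1 is the k-th percentile
    cuts = statistics.quantiles(durations, n=100)
    return {
        "avg": statistics.fmean(durations),
        "min": min(durations),
        "max": max(durations),
        "p50": cuts[49],
        "p95": cuts[94],
        "p99": cuts[98],
    }


samples = [float(i) for i in range(1, 101)]  # 1.0 .. 100.0
print(summarize(samples))
```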
def check_performance_alerts():
    """Check for performance regressions."""
    monitor = get_monitor()
    summary = monitor.get_summary()
    alerts = []
    for op, stats in summary["operations"].items():
        # Check if P99 exceeds 1.5x baseline
        baseline = PERFORMANCE_TARGETS.get(op, 1.0)
        p99 = stats['p99']
        if p99 > baseline * 1.5:
            alerts.append({
                "operation": op,
                "severity": "HIGH",
                "message": f"P99 ({p99:.2f}s) exceeds 1.5x baseline ({baseline:.2f}s)"
            })
    return alerts
Problem: Operations getting slower over time
Diagnosis:
# Check memory usage
docker stats
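# Check database query performance (requires the pg_stat_statements extension;
# the column is named total_time on PostgreSQL 12 and earlier)
psql -c "SELECT * FROM pg_stat_statements ORDER BY total_exec_time DESC LIMIT 10;"
# Check disk I/O
iostat -x 1
# Check CPU usage
top
Solutions: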
-
Check for memory leaks:
# Restart service if memory usage keeps increasing
systemctl restart atom-api
-
Clear module cache:
from core.skill_dynamic_loader import get_global_loader
loader = get_global_loader()
loader.clear_cache()
-
Review Docker image storage:
# Remove dangling images only
docker image prune
# Aggressive: also removes stopped containers, unused networks, and all unused
# images, including cached skill images (forces rebuilds)
docker system prune -a
-
Check database query performance:
-- Add indexes if needed
CREATE INDEX CONCURRENTLY idx_performance ON skill_executions(skill_id);
-- Vacuum database
VACUUM ANALYZE skill_executions;
Problem: Memory usage increases with skill loading
Diagnosis:
# Check memory usage
docker stats
# Check overall system memory
python -c "import psutil; print(psutil.virtual_memory())"
# Check for memory leaks (memory_profiler reports functions decorated with @profile)
python -m memory_profiler script.py
Solutions:
-
Unload unused skills:
loader.unload_skill("unused_skill")
-
Reduce module cache size:
loader = SkillDynamicLoader(
    cache_size=100  # Limit cache to 100 skills
)
-
Use process isolation:
# Run the API across multiple worker processes
uvicorn main:app --workers 4
-
Monitor with memory profiler:
pip install memory_profiler
python -m memory_profiler script.py
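For a standard-library-only check of whether loading skills grows Python-level memory, `tracemalloc` can bracket the suspect code. Note these limits of the sketch:

- It only sees allocations made through Python's allocators.
- It stops tracing on exit, including tracing that was started elsewhere.
- `traced_memory` is a hypothetical helper.

```python
import tracemalloc
from contextlib import contextmanager


@contextmanager
def traced_memory(label: str):
    """Yield a dict filled with current/peak traced bytes when the block exits."""
    stats = {}
    tracemalloc.start()
    try:
        yield stats
    finally:
        current, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()  # note: also stops tracing started elsewhere
        stats.update(current=current, peak=peak)
        print(f"{label}: current={current / 1e6:.2f} MB, peak={peak / 1e6:.2f} MB")


with traced_memory("load skills") as stats:
    buffers = [bytearray(1_000_000)]  # stand-in for loading skill modules
```

If `current` keeps rising across repeated load/unload cycles, the unload path is probably leaving references behind, for example in `sys.modules` or a loader's own cache.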
Problem: Marketplace search > 100ms
Diagnosis:
# Check query plan (a leading-wildcard LIKE such as '%data%' cannot use a B-tree index)
EXPLAIN ANALYZE SELECT * FROM skill_executions WHERE skill_id LIKE '%data%';
# Check table indexes (\d lists the table and its indexes)
psql -c "\d skill_executions"
# Check database connections
psql -c "SELECT count(*) FROM pg_stat_activity;"
Solutions:
-
Add database indexes:
CREATE INDEX idx_skill_name ON skill_executions(skill_id);
CREATE INDEX idx_category ON skill_executions((input_params->'skill_metadata'->>'category'));
-
Enable query result caching:
# Cache results for 5-15 minutes
cache_ttl = 300
-
Reduce page_size:
/marketplace/skills?page=1&page_size=10 # Instead of 100
-
Use specific filters:
/marketplace/skills?category=data # Instead of ?query=data
- Run performance benchmarks
- Check for regressions against baseline
- Verify all targets met (<5s install, <1s load, <100ms search)
- Test with production data volume
- Validate caching strategy
- Review database query plans
- Check resource limits (CPU, memory)
- Enable performance metrics collection
- Set up performance alerts (1.5x baseline)
- Monitor P95/P99 latencies
- Track error rates
- Review performance reports weekly
- Investigate regressions within 24 hours
- Enable image caching for packages
- Preload frequently used skills
- Use batch installations
- Implement query result caching
- Add database indexes
- Optimize DAG structure (minimize depth)
- Use parallel execution for independent steps
- Set appropriate timeouts
- Implement workflow result caching
Performance optimization requires:
- ✅ Baseline Metrics: Establish performance targets
- ✅ Benchmarking: Regular performance testing
- ✅ Monitoring: Track metrics over time
- ✅ Alerting: Detect regressions early
- ✅ Optimization: Continuous improvement
Quick Wins:
- Enable image caching (5-10x faster repeat installations)
- Preload common skills (1000x faster access)
- Use batch installations (2-3x faster)
- Add database indexes (2-5x faster queries)
- Minimize workflow depth (1.5-2x faster execution)
Next Steps:
- Run performance benchmarks
- Establish baselines
- Set up monitoring and alerting
- Optimize based on findings
- Review and iterate regularly
See Also:
- Advanced Skill Execution - Phase 60 overview
- Skill Composition Patterns - Workflow optimization
- Supply Chain Security - Security testing (Plan 60-06)
Last Updated: February 19, 2026