Episodes now include lightweight references to canvas presentations and user feedback, enabling agents to retrieve enriched context for decision-making. This integration follows a metadata-only linkage pattern where episodes store ID references and fetch full records on demand.
Key Principle: Canvas and feedback context are ALWAYS fetched during every episode recall, not just for canvas-specific tasks. This ensures agents have complete context for ALL decision-making.
Canvas and feedback integration is enhanced with the POMDP (Partially Observable Markov Decision Process) memory framework from Phase 1:
Canvas as POMDP Observations:
from core.memory.pomdp_memory_framework import POMDPMemoryFramework
pomdp = POMDPMemoryFramework()
# Canvas interactions form observation space
pomdp.define_observation_space(
    states=[
        "canvas_presented",
        "canvas_closed_quickly",  # < 5 seconds
        "canvas_closed_slowly",   # > 5 seconds
        "canvas_form_submitted",
        "canvas_error_shown"
    ],
    observation_rewards={
        "canvas_closed_quickly": -0.2,  # User didn't engage
        "canvas_closed_slowly": 0.1,    # User reviewed content
        "canvas_form_submitted": 0.5,   # Positive outcome
        "thumbs_up_feedback": 0.3,      # Explicit approval
        "thumbs_down_feedback": -0.3    # Explicit disapproval
    }
)

Experience-Driven Graduation:
- Canvas feedback contributes to quality-weighted episode scoring
- Graduation criteria now consider canvas success rates (20% improvement)
- Intervention rate includes canvas-based corrections
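
To make quality-weighted scoring concrete, here is a minimal sketch of a weighted sum over per-episode component scores. The weights are the ones used in this document's graduation example; the function name `quality_score`, the assumption that each component lies in [0, 1], and the normalization by total weight are illustrative and may differ from what `AgentGraduationService` actually does.

```python
from typing import Mapping

# Weights taken from this document's graduation example.
DEFAULT_WEIGHTS = {
    "task_completion": 0.4,
    "canvas_success": 0.3,
    "feedback_quality": 0.2,
    "consistency": 0.1,
}


def quality_score(
    components: Mapping[str, float],
    weights: Mapping[str, float] = DEFAULT_WEIGHTS,
) -> float:
    """Hypothetical helper: weighted average of component scores in [0, 1]."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    # Components missing from the input count as 0.0 (an assumption).
    weighted = sum(w * components.get(name, 0.0) for name, w in weights.items())
    return weighted / total_weight
```

Under these assumptions, an episode with perfect task completion and feedback quality but only half of its canvases engaged scores lower than a fully successful one, which is the effect the canvas weight is meant to produce.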
Memory Consolidation:
- Canvas summaries processed during offline consolidation (inspired by human sleep)
- Canvas patterns extracted for replay in critical episodes
- Canvas interaction "forgetting curve" for stale memories
Example:
# Canvas feedback influences graduation
from core.agent_graduation_service import AgentGraduationService
graduation = AgentGraduationService()
# Quality-weighted episode scoring
episode_score = graduation.calculate_quality_score(
    episode=episode,
    weights={
        "task_completion": 0.4,
        "canvas_success": 0.3,  # Canvas engagement rates
        "feedback_quality": 0.2,
        "consistency": 0.1
    }
)
# Intervention rate includes canvas corrections
intervention_rate = graduation.calculate_intervention_rate(
    episodes=agent_episodes,
    correction_sources=[
        "chat_corrections",
        "canvas_corrections",  # NEW: Canvas feedback
        "workflow_corrections"
    ]
)
)

┌─────────────────────────────────────────────────────────────────┐
│                        EPISODE CREATION                         │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Agent Execution Session                                        │
│     │                                                           │
│     ├─── Chat Messages ────────┐                                │
│     ├─── Agent Executions ─────┤                                │
│     ├─── Canvas Actions ───────┼──► CanvasAudit Table           │
│     │    (present, submit,     │    (Separate Storage)          │
│     │     close, update)       │                                │
│     │                          │                                │
│     └─── User Feedback ────────┼──► AgentFeedback Table         │
│          (ratings, corrections)│    (Separate Storage)          │
│                                │                                │
│                                ▼                                │
│                        Episode Creation                         │
│                                │                                │
│                                ├─── Episode Model               │
│                                │    ├── execution_ids: [...]    │
│                                │    ├── canvas_ids: [...]       │
│                                │    └── feedback_ids: [...]     │
│                                │    (Metadata-Only Linkage)     │
│                                │                                │
│                                └─── Backlinks                   │
│                                     ├── CanvasAudit.episode_id  │
│                                     └── AgentFeedback.episode_id│
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│                        EPISODE RETRIEVAL                        │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Agent Decision-Making (recall_experiences)                     │
│     │                                                           │
│     └──► Retrieve Episodes                                      │
│          │                                                      │
│          ├─── Episode Data                                      │
│          ├─── Episode Segments                                  │
│          ├─── ALWAYS Fetch Canvas Context ───► CanvasAudit      │
│          │      (by canvas_ids)                Table            │
│          │                                                      │
│          └─── ALWAYS Fetch Feedback Context ──► AgentFeedback   │
│                 (by feedback_ids)               Table           │
│                                                                 │
│                 ▼                                               │
│          Enriched Episode                                       │
│          ├── canvas_context: [...]     (What was presented)     │
│          └── feedback_context: [...]   (How user responded)     │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
- Separation of Concerns: Canvas and feedback records remain in their own tables
- Metadata-Only Linkage: Episodes store only ID references, not full data
- Always Fetch: Canvas/feedback context retrieved on every episode recall
- Efficient Storage: No data duplication, ~100 bytes per episode
- Flexible Retrieval: Fetch by ID, filter by type, weight by feedback
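
The metadata-only linkage and always-fetch principles above can be sketched with plain dictionaries standing in for the CanvasAudit and AgentFeedback tables. The function name `enrich_episode`, the in-memory lookup tables, and the choice to silently skip IDs with no matching record are assumptions made for illustration, not the retrieval service's actual behavior.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def enrich_episode(
    episode: Record,
    canvas_table: Dict[str, Record],
    feedback_table: Dict[str, Record],
) -> Record:
    """Hypothetical helper: resolve an episode's ID lists into full records."""
    canvas_ids: List[str] = episode.get("canvas_ids") or []
    feedback_ids: List[str] = episode.get("feedback_ids") or []
    enriched = dict(episode)  # shallow copy; the stored episode keeps only IDs
    # IDs with no matching record are skipped rather than raising (assumption).
    enriched["canvas_context"] = [
        canvas_table[i] for i in canvas_ids if i in canvas_table
    ]
    enriched["feedback_context"] = [
        feedback_table[i] for i in feedback_ids if i in feedback_table
    ]
    return enriched
```

Because the episode row holds only short ID strings, older episodes with empty lists pass through the same path and simply come back with empty context lists.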
from sqlalchemy import Column, Float, ForeignKey, Integer, JSON, String

# Base is the project's existing declarative base.
class Episode(Base):
    # ... existing fields ...
    # Canvas linkage (NEW - Feb 2026)
    canvas_ids = Column(JSON, default=list)  # List of CanvasAudit IDs
    canvas_action_count = Column(Integer, default=0)  # Total canvas actions
    # Feedback linkage (NEW - Feb 2026)
    feedback_ids = Column(JSON, default=list)  # List of AgentFeedback IDs
    aggregate_feedback_score = Column(Float, nullable=True)  # -1.0 to 1.0

class CanvasAudit(Base):
    # ... existing fields ...
    episode_id = Column(String, ForeignKey("episodes.id"), nullable=True, index=True)  # NEW

class AgentFeedback(Base):
    # ... existing fields ...
    episode_id = Column(String, ForeignKey("episodes.id"), nullable=True, index=True)  # NEW

Episodes track all canvas interactions with full audit trail:
Supported Canvas Types:
- Built-in types (7): generic, docs, email, sheets, orchestration, terminal, coding
- Custom components: User-created HTML/CSS/JS components (tracked via canvas_type="generic" + component info in audit_metadata)
Canvas Actions: present, close, submit, update, execute
Example:
# Episode with canvas context
episode = {
    "id": "ep_123",
    "title": "Sales Data Analysis",
    "canvas_ids": ["canvas_abc", "canvas_def"],
    "canvas_action_count": 3,
    "canvas_context": [
        {
            "id": "canvas_abc",
            "canvas_type": "sheets",
            "component_type": "table",
            "action": "present",
            "created_at": "2026-02-04T10:00:00Z",
            "metadata": {"rows": 50, "columns": 5}
        },
        {
            "id": "canvas_def",
            "canvas_type": "charts",
            "component_type": "line_chart",
            "action": "present",
            "created_at": "2026-02-04T10:01:00Z",
            "metadata": {"data_points": 100}
        }
    ]
}

Episodes aggregate user feedback scores for retrieval weighting:
Feedback Scoring:
- thumbs_up: +1.0
- thumbs_down: -1.0
- rating (1-5): converted to a -1.0 to 1.0 scale (1 → -1.0, 2 → -0.5, 3 → 0.0, 4 → 0.5, 5 → 1.0)
Aggregate Score: Average of all feedback scores for the episode
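
The scoring rules above reduce to a small mapping plus an average. The sketch below is one way to write it; the function names are hypothetical, the linear formula `(rating - 3) / 2` is simply the expression that reproduces the listed 1-5 conversion, and returning `None` when an episode has no feedback is an assumption based on the score column being nullable.

```python
from typing import Any, Dict, Iterable, Optional


def feedback_to_score(feedback: Dict[str, Any]) -> float:
    """Map one feedback record onto the -1.0 to 1.0 scale."""
    feedback_type = feedback["feedback_type"]
    if feedback_type == "thumbs_up":
        return 1.0
    if feedback_type == "thumbs_down":
        return -1.0
    if feedback_type == "rating":
        rating = feedback["rating"]
        if not 1 <= rating <= 5:
            raise ValueError(f"rating must be between 1 and 5, got {rating}")
        # 1 -> -1.0, 2 -> -0.5, 3 -> 0.0, 4 -> 0.5, 5 -> 1.0
        return (rating - 3) / 2.0
    raise ValueError(f"unknown feedback_type: {feedback_type!r}")


def aggregate_feedback_score(items: Iterable[Dict[str, Any]]) -> Optional[float]:
    """Average of all feedback scores, or None when there is no feedback."""
    scores = [feedback_to_score(item) for item in items]
    if not scores:
        return None
    return sum(scores) / len(scores)
```

For example, a thumbs_up combined with a rating of 4 averages to 0.75, while a thumbs_up combined with a rating of 5 averages to 1.0.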
Example:
# Episode with feedback context
episode = {
    "id": "ep_123",
    "feedback_ids": ["feedback_1", "feedback_2"],
    "aggregate_feedback_score": 1.0,  # Average of thumbs_up (+1.0) and rating 5 (+1.0)
    "feedback_context": [
        {
            "id": "feedback_1",
            "feedback_type": "thumbs_up",
            "created_at": "2026-02-04T10:05:00Z"
        },
        {
            "id": "feedback_2",
            "feedback_type": "rating",
            "rating": 5,
            "created_at": "2026-02-04T10:06:00Z"
        }
    ]
}

Default Behavior: Canvas and feedback context are always included, so agents get complete context.
GET /api/episodes/{episode_id}/retrieve?include_canvas=true&include_feedback=true

Response:
{
    "episode": {
        "id": "ep_123",
        "title": "Sales Analysis",
        "canvas_ids": ["canvas_abc"],
        "feedback_ids": ["feedback_xyz"]
    },
    "segments": [...],
    "canvas_context": [
        {
            "canvas_type": "sheets",
            "component_type": "table",
            "action": "present",
            "created_at": "2026-02-04T10:00:00Z",
            "metadata": {...}
        }
    ],
    "feedback_context": [
        {
            "feedback_type": "thumbs_up",
            "rating": 5,
            "corrections": [...]
        }
    ]
}

Performance: ~50ms overhead per episode (fetching canvas/feedback records)
Retrieve episodes filtered by canvas type and action:
POST /api/episodes/retrieve/by-canvas-type
{
    "agent_id": "agent_123",
    "canvas_type": "sheets",
    "action": "present",
    "time_range": "30d",
    "limit": 10
}

Use Cases:
- "Show me episodes where I presented spreadsheets"
- "Find all episodes with form submissions"
- "Episodes where I presented charts"
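
As an illustration of what the canvas-type filter selects, the sketch below applies the same canvas_type, action, and limit criteria to already-enriched episodes held in memory. The function name is hypothetical, the time_range filter is left out, and the real endpoint presumably runs this as a database query rather than a Python loop.

```python
from typing import Any, Dict, List, Optional


def filter_episodes_by_canvas(
    episodes: List[Dict[str, Any]],
    canvas_type: str,
    action: Optional[str] = None,
    limit: int = 10,
) -> List[Dict[str, Any]]:
    """Keep episodes with at least one canvas record matching type and action."""
    matches = []
    for episode in episodes:
        for canvas in episode.get("canvas_context", []):
            type_ok = canvas.get("canvas_type") == canvas_type
            action_ok = action is None or canvas.get("action") == action
            if type_ok and action_ok:
                matches.append(episode)
                break  # one matching canvas is enough for this episode
    return matches[:limit]
```

Calling it with `canvas_type="sheets"` and `action="present"` corresponds to the first use case above.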
Contextual retrieval applies feedback-based boosting:
Boosting Rules:
- Canvas interactions: +0.1 relevance boost
- Positive feedback (>0): +0.2 relevance boost
- Negative feedback (<0): -0.3 relevance penalty
- Neutral feedback (~0): No adjustment
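
The boosting rules can be written as a small pure function. In this sketch the function name, the `neutral_band` tolerance used to decide what counts as "~0", and the clamping of the result to [0, 1] are all assumptions; the rules above only fix the boost and penalty sizes.

```python
from typing import Optional


def boosted_relevance(
    base_score: float,
    has_canvas: bool,
    aggregate_feedback_score: Optional[float],
    neutral_band: float = 0.05,  # assumed width of the "neutral" zone around 0
) -> float:
    """Apply canvas and feedback adjustments to a base relevance score."""
    score = base_score
    if has_canvas:
        score += 0.1  # canvas interactions
    if aggregate_feedback_score is not None:
        if aggregate_feedback_score > neutral_band:
            score += 0.2  # positive feedback
        elif aggregate_feedback_score < -neutral_band:
            score -= 0.3  # negative feedback
    # Clamping keeps scores comparable across episodes (assumption).
    return max(0.0, min(1.0, score))
```

The penalty for negative feedback is larger than the boost for positive feedback, so a single poorly received episode drops in ranking faster than a well received one rises.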
Example:
# Episode with canvas and positive feedback
base_score = 0.7       # From temporal + semantic matching
canvas_boost = 0.1     # Has canvas interactions
feedback_boost = 0.2   # Positive feedback
final_score = base_score + canvas_boost + feedback_boost  # ≈ 1.0 (boosted score)

Agents automatically retrieve canvas and feedback context when recalling episodes:
# In agent_world_model.py
result = await recall_experiences(agent, current_task)
# result["episodes"] now includes:
# - canvas_context: What canvases were presented
# - feedback_context: How users responded
# Agent uses this to:
# - Choose appropriate canvas types
# - Avoid mistakes from past negative feedback
# - Replicate successful presentation patterns

Critical: Context enrichment happens for EVERY episode recall, not just canvas-specific tasks.
Task: "Show me sales data"
Agent recalls:
- Episode 456: Presented line chart, user closed after 3s
- Episode 789: Presented bar chart, user engaged 30s, gave thumbs up
Decision: Present bar chart (proven successful)
# Agent reasoning from enriched episodes
avoid_canvas_types = []
successful_patterns = []
for episode in recalled_episodes:
    canvas_context = episode["canvas_context"]
    feedback_context = episode["feedback_context"]
    # Canvas types the user closed are candidates to avoid
    for canvas in canvas_context:
        if canvas["action"] == "close":
            avoid_canvas_types.append(canvas["canvas_type"])
    # A thumbs up marks the canvas types presented in this episode as successful
    if any(f.get("feedback_type") == "thumbs_up" for f in feedback_context):
        successful_patterns.extend(
            c["canvas_type"] for c in canvas_context if c["action"] == "present"
        )

Task: "Help user file taxes"
Agent recalls:
- Episode 123: Presented tax form, user filled partially
- Form submission captured in episode feedback
Decision: Pre-fill form with previous answers, ask for remaining fields
# Episode shows partial form submission
episode = {
    "canvas_context": [
        {
            "canvas_type": "generic",
            "component_type": "form",
            "action": "submit",
            "metadata": {
                "form_fields": ["name", "address", "income"],
                "completed": ["name", "address"]
            }
        }
    ],
    "feedback_context": [
        {
            "corrections": "I stopped because I didn't have my income info"
        }
    ]
}
# Agent action: Present form with name/address pre-filled

Task: "Create pricing proposal"
Agent recalls:
- Episode 999: Pricing table had errors, user corrected in feedback
aggregate_feedback_score: -0.5 (negative)
Decision: Double-check pricing calculations, mention learning from past error
# Negative feedback triggers caution
score = episode["aggregate_feedback_score"]  # May be None when no feedback exists
if score is not None and score < 0:
    for feedback in episode["feedback_context"]:
        if feedback.get("corrections"):
            # User provided corrections
            agent_reasoning = f"""
            Based on previous feedback (Episode {episode['id']}):
            User correction: {feedback['corrections']}
            I will double-check my calculations to avoid this error.
            """

Task: "Analyze user engagement data"
Agent recalls:
- Episodes show user engagement patterns:
- Sheets: User closes quickly (2s avg)
- Charts: User engages longer (45s avg)
- Positive feedback on line charts specifically
Decision: Present line chart instead of spreadsheet
# Extract insights from canvas context
insights = agent._extract_canvas_insights(episodes)
# insights["user_interaction_patterns"]:
# {
# "closes_quickly": ["sheets", "markdown"],
# "engages": ["charts", "forms"],
# "submits": ["forms"]
# }
# Agent chooses "charts" over "sheets"

GET /api/episodes/{episode_id}/retrieve?include_canvas=true&include_feedback=true

Parameters:
- episode_id (str): Episode ID
- agent_id (str): Agent ID
- include_canvas (bool): Include canvas context (default: true)
- include_feedback (bool): Include feedback context (default: true)
Response: Enriched episode with canvas_context and feedback_context
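
For client code, the retrieval URL can be assembled with the standard library. This is only a sketch: the helper name and the base URL in the usage note are placeholders, and agent_id is left out because the endpoint description does not show where it is passed.

```python
from urllib.parse import quote, urlencode


def build_retrieve_url(
    base_url: str,
    episode_id: str,
    include_canvas: bool = True,
    include_feedback: bool = True,
) -> str:
    """Build the enriched-episode retrieval URL for a given episode."""
    query = urlencode(
        {
            # Booleans are sent as lowercase strings, matching the documented URL.
            "include_canvas": str(include_canvas).lower(),
            "include_feedback": str(include_feedback).lower(),
        }
    )
    path = f"/api/episodes/{quote(episode_id, safe='')}/retrieve"
    return f"{base_url.rstrip('/')}{path}?{query}"
```

With a placeholder base URL of `http://localhost:8000` and episode `ep_123`, this yields `http://localhost:8000/api/episodes/ep_123/retrieve?include_canvas=true&include_feedback=true`.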
POST /api/episodes/retrieve/by-canvas-type
{
    "agent_id": "agent_123",
    "canvas_type": "sheets",
    "action": "present",
    "time_range": "30d",
    "limit": 10
}

Parameters:
- agent_id (str): Agent ID
- canvas_type (str): Canvas type (sheets, charts, generic, etc.)
- action (str, optional): Action filter (present, submit, close, etc.)
- time_range (str): Time range (1d, 7d, 30d, 90d)
- limit (int): Max results
Response: Filtered episodes with canvas type info
POST /api/episodes/{episode_id}/feedback/submit
{
    "feedback_type": "rating",
    "rating": 5,
    "corrections": "Great work on the charts"
}

Parameters:
- episode_id (str): Episode ID
- feedback_type (str): thumbs_up, thumbs_down, rating
- rating (int, optional): 1-5 for rating type
- corrections (str, optional): User corrections
Response: Feedback ID and updated aggregate score
GET /api/episodes/{episode_id}/feedback/list

Response: All feedback for the episode

GET /api/episodes/analytics/feedback-episodes?agent_id=agent_123&min_feedback_score=0.5

Parameters:
- agent_id (str): Agent ID
- min_feedback_score (float): Minimum score (-1.0 to 1.0)
- time_range (str): Time range (default: 30d)
- limit (int): Max results (default: 10)
Response: Episodes with high feedback scores
- Richer Context: Agents see what they presented AND how users reacted
- Presentation Learning: "User closes markdown quickly → try charts instead"
- Feedback Integration: Positive feedback boosts episode relevance
- Workflow Continuity: Form submissions become part of episode memory
- Graduation Validation: Canvas success rates inform promotion decisions
- Efficient Storage: Metadata-only linkage avoids data duplication
- Flexible Retrieval: Fetch by ID, filter by type, weight by feedback
Storage Overhead: ~100 bytes per episode (JSON arrays)
| Metric | Target | Current |
|---|---|---|
| Sequential retrieval | <100ms | ~50-70ms |
| Canvas context fetch | <50ms | ~20-30ms |
| Feedback context fetch | <50ms | ~10-20ms |
| Total overhead | <100ms | ✅ Met |
Index Usage:
- ix_canvas_audit_episode_id: Backlink lookups
- ix_agent_feedback_episode_id: Backlink lookups
- ix_episodes_agent_canvas: Composite index for canvas queries
Existing Episodes: Empty arrays (canvas_ids = [], feedback_ids = [])
New Episodes: Automatically populated with canvas/feedback context
Backfill (Optional):
# Link historical records via session_id
for episode in old_episodes:
    canvases = db.query(CanvasAudit).filter(
        CanvasAudit.session_id == episode.session_id
    ).all()
    episode.canvas_ids = [c.id for c in canvases]
    episode.canvas_action_count = len(canvases)
db.commit()  # Persist the backfilled links

# Apply migration
alembic upgrade head
# Verify new columns
sqlite3 atom_dev.db ".schema episodes" | grep -E "canvas_ids|feedback_ids|aggregate_feedback_score"
# Verify indexes
sqlite3 atom_dev.db ".indexes episodes" | grep -E "canvas|feedback"

# Run integration tests
PYTHONPATH=/Users/rushiparikh/projects/atom/backend pytest tests/test_canvas_feedback_episode_integration.py -v
# Run with coverage
pytest tests/test_canvas_feedback_episode_integration.py --cov=core.episode_segmentation_service --cov=core.episode_retrieval_service --cov-report=html
# Expected: All tests passing (25+ tests)

- Week 1: Database migration, model changes
- Week 2: Service modifications (segmentation, retrieval)
- Week 3: Agent integration, API endpoints
- Week 4: Monitoring, optimization, documentation
- All new fields are nullable or have defaults
- Existing episodes work (empty arrays)
- No breaking API changes
- Feature flags available: CANVAS_EPISODE_INTEGRATION_ENABLED
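
A feature flag like this is typically read from the environment at the point where canvas and feedback linkage would run. The sketch below assumes the flag is an environment variable, that it defaults to enabled, and that the usual truthy strings are accepted; none of these details are confirmed here, and the helper name is hypothetical.

```python
import os
from typing import Mapping, Optional

FLAG_NAME = "CANVAS_EPISODE_INTEGRATION_ENABLED"
TRUTHY = {"1", "true", "yes", "on"}


def canvas_episode_integration_enabled(
    env: Optional[Mapping[str, str]] = None,
) -> bool:
    """Return True when the integration flag is enabled (default: enabled)."""
    source = os.environ if env is None else env
    raw = source.get(FLAG_NAME, "true")  # default value is an assumption
    return raw.strip().lower() in TRUTHY
```

Passing a plain dict as `env` makes the check easy to exercise in tests without touching the process environment.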
- Migration runs without errors
- 100% of existing episodes still queryable
- New episodes populate canvas/feedback fields
- Sequential retrieval includes canvas/feedback context
- Canvas type filtering works correctly
- Feedback-weighted retrieval boosts relevant episodes
- Agents use enriched context in decision-making
- Performance overhead <100ms per retrieval
- Documentation complete and accurate
Check:
# Verify canvas audits exist for session
canvases = db.query(CanvasAudit).filter(
CanvasAudit.session_id == session_id
).all()
print(f"Found {len(canvases)} canvas events")
# Verify episode was created after feature deployment
episode = db.query(Episode).filter(Episode.id == episode_id).first()
print(f"Canvas IDs: {episode.canvas_ids}")

Check:
# Verify feedback records are linked to the episode
feedback = db.query(AgentFeedback).filter(
    AgentFeedback.episode_id == episode_id
).all()
print(f"Found {len(feedback)} linked feedback records")
# Check if feedback was created after episode
episode = db.query(Episode).filter(Episode.id == episode_id).first()
for f in feedback:
    print(f"Feedback created: {f.created_at}, Episode created: {episode.created_at}")

Check:
# Verify indexes exist
import sqlite3
conn = sqlite3.connect('atom_dev.db')
cursor = conn.cursor()
# Check episode indexes
cursor.execute("SELECT name FROM sqlite_master WHERE type='index' AND tbl_name='episodes'")
indexes = cursor.fetchall()
print("Episode indexes:", indexes)
# Should include: ix_episodes_agent_canvas