diff --git a/CHAIBOT_QUICKSTART.md b/CHAIBOT_QUICKSTART.md
new file mode 100644
index 0000000000000..faed5e930c216
--- /dev/null
+++ b/CHAIBOT_QUICKSTART.md
@@ -0,0 +1,218 @@
+# Chaibot Quick Start Guide
+
+## What is Chaibot?
+
+An AI-powered Slack workflow that automatically triages test failures in #opp-discussion and posts analysis in threads.
+
+## Files Created
+
+```
+core-services/ci-chat-bot/
+├── triage-config.yaml          # Main config (source of truth)
+└── CHAIBOT.md                  # Quick reference
+
+clusters/app.ci/ci-chat-bot/
+├── chaibot-configmap.yaml      # Kubernetes ConfigMap
+└── chaibot-deployment-patch.yaml  # Deployment updates + alerts
+
+core-services/ci-secret-bootstrap/
+└── chaibot-secret-config.yaml  # Secret management
+
+docs/
+└── chaibot-test-failure-triage.md  # Full documentation
+```
+
+## How to Deploy
+
+### Step 1: Get Credentials
+
+```bash
+# 1. Get OpenAI API key from https://platform.openai.com/api-keys
+
+# 2. Get Slack channel ID:
+#    - Right-click #opp-discussion in Slack
+#    - View channel details
+#    - Copy Channel ID (looks like C01234ABCD)
+
+# 3. Update clusters/app.ci/ci-chat-bot/chaibot-configmap.yaml
+#    Set channel_id under monitored_channels to the actual ID
+```
+
+### Step 2: Configure Secrets
+
+```bash
+# Add to core-services/ci-secret-bootstrap/_config.yaml:
+# (Follow pattern in chaibot-secret-config.yaml)
+
+# Store OpenAI key in Vault
+# Reference: https://docs.ci.openshift.org/docs/how-tos/adding-a-new-secret-to-ci/
+```
+
+### Step 3: Update ci-chat-bot Deployment
+
+Edit `clusters/app.ci/ci-chat-bot/ci-chat-bot.yaml` and add:
+
+```yaml
+# Add to spec.template.spec.volumes:
+- name: triage-config
+  configMap:
+    name: ci-chat-bot-triage-config
+- name: chaibot-secrets
+  secret:
+    secretName: ci-chat-bot-chaibot-secrets
+
+# Add to spec.template.spec.containers[name=bot].volumeMounts:
+- name: triage-config
+  mountPath: /etc/triage-config
+  readOnly: true
+- name: chaibot-secrets
+  mountPath: /etc/chaibot-secrets
+  readOnly: true
+
+# Add to spec.template.spec.containers[name=bot].env:
+- name: CHAIBOT_ENABLED
+  value: "true"
+- name: OPENAI_API_KEY
+  valueFrom:
+    secretKeyRef:
+      name: ci-chat-bot-chaibot-secrets
+      key: openai-api-key
+
+# Append to the shell command in spec.template.spec.containers[name=bot].args (add " \" after the current last flag):
+--enable-triage=true \
+--triage-config-path=/etc/triage-config/triage-config.yaml
+```
+
+### Step 4: Deploy
+
+```bash
+# From openshift/release repo root:
+make update
+
+# Apply ConfigMap
+oc apply -f clusters/app.ci/ci-chat-bot/chaibot-configmap.yaml
+
+# Apply updated deployment (after editing)
+oc apply -f clusters/app.ci/ci-chat-bot/ci-chat-bot.yaml
+
+# Watch rollout
+oc rollout status deployment/ci-chat-bot -n ci
+```
+
+### Step 5: Test
+
+```
+# In Slack #opp-discussion:
+Post a message with a Prow job URL:
+
+"This job failed: https://prow.ci.openshift.org/view/gs/origin-ci-test/..."
+
+# Chaibot should respond in thread within 30-60 seconds
+```
+
+## Example Output
+
+```
+Chaibot [BOT]: :cloud: Test Failure Analysis
+
+Job: pull-ci-openshift-installer-master-e2e-aws
+Status: Failed after 2h 15m
+Root Cause: Infrastructure - AWS EC2 Capacity (85% confidence)
+
+Analysis:
+Cluster provisioning failed due to AWS InsufficientInstanceCapacity error.
+
+Evidence:
+Error: creating EC2 Instance: InsufficientInstanceCapacity (us-east-1c)
+
+Historical:
+8 similar failures in last 24h (transient AWS issue)
+
+Recommendations:
+1. Retest - likely to succeed on retry
+2. Check AWS Service Health Dashboard
+
+Classification: Transient Infrastructure Issue
+```
+
+## Configuration
+
+Edit `core-services/ci-chat-bot/triage-config.yaml`:
+
+```yaml
+# Add channels
+monitored_channels:
+  - name: "opp-discussion"
+    channel_id: "C01234567"
+
+# Adjust AI settings
+analysis:
+  ai_provider: "openai"
+  model: "gpt-4"  # or "gpt-3.5-turbo" for lower cost
+
+# Rate limiting
+rate_limiting:
+  max_analyses_per_hour: 100
+```
+
+## Monitoring
+
+```bash
+# Check logs
+oc logs -n ci deployment/ci-chat-bot -c bot | grep chaibot
+
+# View metrics (this service address resolves only from inside the cluster)
+curl http://ci-chat-bot.ci.svc:9090/metrics | grep chaibot
+
+# Grafana dashboard (open in a browser):
+#   https://grafana.ci.openshift.org/d/chaibot/
+```
+
+## Troubleshooting
+
+**Not responding?**
+```bash
+oc get pods -n ci -l app=ci-chat-bot
+oc logs -n ci -l app=ci-chat-bot -c bot --tail=50
+```
+
+**Wrong channel ID?**
+```bash
+oc get configmap ci-chat-bot-triage-config -n ci -o yaml
+# Update and reapply
+```
+
+**API errors?**
+```bash
+# Check secret exists
+oc get secret ci-chat-bot-chaibot-secrets -n ci
+
+# View metrics for errors
+curl http://ci-chat-bot.ci.svc:9090/metrics | grep chaibot_api_errors
+```
+
+## Cost
+
+- GPT-4: ~$0.03/analysis (~$90/month at 100 failures/day)
+- GPT-3.5-turbo: ~$0.003/analysis (~$9/month at 100 failures/day)
+
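+Rate limiting bounds the worst case, but the default `max_analyses_per_hour: 100` still allows up to 2,400 analyses/day, well above the 100/day assumed here, so monitor usage as well.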
+
+## Support
+
+- **Questions**: #forum-ocp-testplatform
+- **ci-chat-bot team**: #forum-ocp-crt
+- **Full docs**: docs/chaibot-test-failure-triage.md
+- **Issues**: https://github.com/openshift/ci-tools/issues
+
+## Important Note
+
+⚠️ This configuration requires **code implementation** in the ci-tools repo (openshift/ci-tools cmd/ci-chat-bot) to function. The configs are ready, but the bot logic needs development to:
+
+1. Parse triage-config.yaml
+2. Listen to Slack events
+3. Fetch job logs from GCS
+4. Call OpenAI API
+5. Format and post responses
+
+See `docs/chaibot-test-failure-triage.md` for implementation details.
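
Step 3 of the implementation list above (fetching job logs from GCS) depends on turning a Prow job URL into a storage path. The following is a minimal Python sketch of that mapping, not the ci-tools implementation, which would be written in Go. It assumes the usual Prow convention that everything after `/view/gs/` is the bucket name followed by the object prefix. The function name `prow_url_to_gcs` and the example job path `logs/some-job/123` are hypothetical.

```python
from urllib.parse import urlparse

# Path prefix used by Prow job view pages (taken from prow_job_patterns in the config).
VIEW_PREFIX = "/view/gs/"


def prow_url_to_gcs(url: str, artifact: str = "build-log.txt") -> str:
    """Map a Prow job view URL to the gs:// path of one artifact.

    Assumption: the URL path after /view/gs/ is "<bucket>/<object prefix>".
    """
    path = urlparse(url).path
    if not path.startswith(VIEW_PREFIX):
        raise ValueError(f"not a Prow job view URL: {url}")
    bucket_and_prefix = path[len(VIEW_PREFIX):].strip("/")
    if "/" not in bucket_and_prefix:
        raise ValueError(f"URL has no job path after the bucket: {url}")
    return f"gs://{bucket_and_prefix}/{artifact}"


if __name__ == "__main__":
    # Hypothetical job path, for illustration only.
    print(prow_url_to_gcs(
        "https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/some-job/123"
    ))
```

URLs that match the other configured patterns (for example `https://prow.ci.openshift.org/?pr=`) do not identify a single job run, so this sketch rejects them instead of guessing a path.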
diff --git a/DEPLOY_CHAIBOT.md b/DEPLOY_CHAIBOT.md
new file mode 100644
index 0000000000000..e98e79beb907c
--- /dev/null
+++ b/DEPLOY_CHAIBOT.md
@@ -0,0 +1,406 @@
+# Chaibot Deployment Guide
+
+## Status
+
+✅ Configuration files created  
+✅ ci-chat-bot deployment updated with Chaibot volumes and mounts  
+⚠️ Requires: Slack channel ID and OpenAI API key  
+⚠️ Requires: Code implementation in openshift/ci-tools  
+
+## What's Ready
+
+All configuration and deployment files are prepared:
+
+```
+✓ core-services/ci-chat-bot/triage-config.yaml
+✓ clusters/app.ci/ci-chat-bot/chaibot-configmap.yaml
+✓ clusters/app.ci/ci-chat-bot/chaibot-deployment-patch.yaml
+✓ clusters/app.ci/ci-chat-bot/ci-chat-bot.yaml (UPDATED)
+✓ docs/chaibot-test-failure-triage.md
+```
+
+## Prerequisites
+
+### 1. Get Slack Channel ID for #opp-discussion
+
+In Slack:
+1. Right-click `#opp-discussion` channel
+2. Select "View channel details"
+3. Scroll down in the About section
+4. Copy the Channel ID (format: `C` followed by alphanumeric, e.g., `C01234ABCD`)
+
+### 2. Get OpenAI API Key
+
+Option A - New Key:
+1. Go to https://platform.openai.com/api-keys
+2. Create new secret key
+3. Copy the key (starts with `sk-`)
+4. **Important**: Save it securely - you can't view it again
+
+Option B - Use Existing:
+If your organization already has a key in Vault, confirm the path.
+
+### 3. Verify Cluster Access
+
+```bash
+# Login to app.ci cluster
+oc login https://api.ci.l2s4.p1.openshiftapps.com:6443
+
+# Verify access to ci namespace
+oc get pods -n ci
+```
+
+## Deployment Steps
+
+### Step 1: Update Slack Channel ID
+
+```bash
+# Edit the ConfigMap with the actual channel ID
+vi clusters/app.ci/ci-chat-bot/chaibot-configmap.yaml
+
+# Find the channel_id line under monitored_channels:
+#   channel_id: "..."
+# Set it to the actual ID, for example:
+#   channel_id: "C01234ABCD"
+```
+
+### Step 2: Create Secret for OpenAI API Key
+
+**Option A: Via `oc` (for testing/dev)**
+
+```bash
+# Create the secret directly
+oc create secret generic ci-chat-bot-chaibot-secrets \
+  --from-literal=openai-api-key="sk-YOUR-ACTUAL-KEY-HERE" \
+  -n ci \
+  --dry-run=client -o yaml | oc apply -f -
+
+# Verify
+oc get secret ci-chat-bot-chaibot-secrets -n ci
+```
+
+**Option B: Via ci-secret-bootstrap (for production)**
+
+```bash
+# 1. Store the key in Vault (ask DPTP team for path)
+
+# 2. Add to core-services/ci-secret-bootstrap/_config.yaml:
+- from:
+    openai-api-key:
+      path: <vault-path-to-key>
+  to:
+    - cluster: app.ci
+      namespace: ci
+      name: ci-chat-bot-chaibot-secrets
+
+# 3. Submit PR to openshift/release
+# 4. After merge, ci-secret-bootstrap will sync the secret
+```
+
+### Step 3: Apply ConfigMap
+
+```bash
+# Apply the Chaibot configuration ConfigMap
+oc apply -f clusters/app.ci/ci-chat-bot/chaibot-configmap.yaml
+
+# Verify
+oc get configmap ci-chat-bot-triage-config -n ci -o yaml
+```
+
+### Step 4: Apply Prometheus Alerts
+
+```bash
+# Extract and apply just the PrometheusRule from the patch file
+cat > /tmp/chaibot-alerts.yaml << 'EOF'
+apiVersion: monitoring.coreos.com/v1
+kind: PrometheusRule
+metadata:
+  name: chaibot-alerts
+  namespace: ci
+spec:
+  groups:
+    - name: chaibot
+      interval: 30s
+      rules:
+        - alert: ChaibotHighErrorRate
+          expr: |
+            rate(chaibot_api_errors_total[5m]) > 0.1
+          for: 10m
+          labels:
+            severity: warning
+            team: test-platform
+          annotations:
+            summary: "Chaibot experiencing high error rate"
+            description: "Chaibot has {{ $value }} errors per second over the last 5 minutes."
+
+        - alert: ChaibotAnalysisTimeout
+          expr: |
+            histogram_quantile(0.95, rate(chaibot_analysis_duration_seconds_bucket[5m])) > 120
+          for: 15m
+          labels:
+            severity: warning
+            team: test-platform
+          annotations:
+            summary: "Chaibot analysis taking too long"
+            description: "95th percentile analysis duration is {{ $value }}s, exceeding 120s timeout."
+
+        - alert: ChaibotDown
+          expr: |
+            up{job="ci-chat-bot"} == 0
+          for: 5m
+          labels:
+            severity: critical
+            team: test-platform
+          annotations:
+            summary: "Chaibot service is down"
+            description: "ci-chat-bot service (including Chaibot) has been down for 5 minutes."
+EOF
+
+oc apply -f /tmp/chaibot-alerts.yaml
+
+# Verify
+oc get prometheusrule chaibot-alerts -n ci
+```
+
+### Step 5: Deploy Updated ci-chat-bot
+
+```bash
+# The deployment YAML has already been updated with:
+# - Chaibot volumes (triage-config, chaibot-secrets)
+# - Volume mounts in bot container
+# - Environment variables (CHAIBOT_ENABLED, OPENAI_API_KEY)
+# - Command args (--enable-triage, --triage-config-path)
+
+# Review the changes
+git diff clusters/app.ci/ci-chat-bot/ci-chat-bot.yaml
+
+# Apply the updated deployment
+oc apply -f clusters/app.ci/ci-chat-bot/ci-chat-bot.yaml
+
+# Watch the rollout (this will restart the pod)
+oc rollout status deployment/ci-chat-bot -n ci --timeout=5m
+```
+
+### Step 6: Verify Deployment
+
+```bash
+# Check pod status
+oc get pods -n ci -l app=ci-chat-bot
+
+# Check logs for Chaibot initialization
+oc logs -n ci deployment/ci-chat-bot -c bot --tail=100 | grep -i chaibot
+
+# Once the triage code is implemented, expect lines similar to (exact wording may differ):
+# INFO: Chaibot triage enabled
+# INFO: Monitoring channels: [opp-discussion]
+# INFO: AI provider: openai (model: gpt-4)
+```
+
+### Step 7: Test Functionality
+
+**Method 1: Post a test message in Slack**
+
+In `#opp-discussion`:
+```
+Test failure: https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/12345/...
+```
+
+Wait 30-60 seconds for Chaibot to respond in thread.
+
+**Method 2: Check metrics**
+
+```bash
+# Port-forward to access metrics
+oc port-forward -n ci deployment/ci-chat-bot 9090:9090 &
+
+# Query metrics
+curl http://localhost:9090/metrics | grep chaibot
+
+# Look for:
+# chaibot_messages_processed_total
+# chaibot_failures_detected_total
+# chaibot_analyses_completed_total
+```
+
+**Method 3: Monitor logs**
+
+```bash
+# Follow logs for Chaibot activity
+oc logs -n ci deployment/ci-chat-bot -c bot -f | grep -i "chaibot\|triage"
+```
+
+## Troubleshooting
+
+### Pod won't start
+
+```bash
+# Check events
+oc describe pod -n ci -l app=ci-chat-bot
+
+# Common issues:
+# - Missing secret: ci-chat-bot-chaibot-secrets
+# - Missing configmap: ci-chat-bot-triage-config
+# - Invalid YAML syntax in configmap
+```
+
+### Chaibot not responding in Slack
+
+```bash
+# 1. Check if feature is enabled
+oc exec -n ci deployment/ci-chat-bot -c bot -- env | grep CHAIBOT_ENABLED
+# Should output: CHAIBOT_ENABLED=true
+
+# 2. Check config is mounted
+oc exec -n ci deployment/ci-chat-bot -c bot -- cat /etc/triage-config/triage-config.yaml
+
+# 3. Check for errors in logs
+oc logs -n ci deployment/ci-chat-bot -c bot --tail=200 | grep -i error
+
+# 4. Verify channel ID is correct
+oc get configmap ci-chat-bot-triage-config -n ci -o jsonpath='{.data.triage-config\.yaml}' | grep channel_id
+```
+
+### OpenAI API errors
+
+```bash
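+# Check that the API key is set without printing its value
+oc exec -n ci deployment/ci-chat-bot -c bot -- sh -c 'test -n "$OPENAI_API_KEY" && echo "OPENAI_API_KEY is set"'
+# Should output: OPENAI_API_KEY is set
+
+# Check API error counts (requires the port-forward from Step 7)
+curl http://localhost:9090/metrics | grep chaibot_api_errors_total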
+
+# Common issues:
+# - Invalid API key
+# - Rate limit exceeded
+# - No credits remaining in OpenAI account
+```
+
+### Wrong Slack channel
+
+```bash
+# Update the channel ID in ConfigMap
+oc edit configmap ci-chat-bot-triage-config -n ci
+
+# Find the channel_id line and update it
+# Save and exit
+
+# Restart the deployment to pick up changes
+oc rollout restart deployment/ci-chat-bot -n ci
+```
+
+## Important Notes
+
+### 1. Code Implementation Required
+
+⚠️ **CRITICAL**: This deployment assumes the ci-chat-bot binary in the container already has Chaibot support. The code needs to be implemented in the `openshift/ci-tools` repository (`cmd/ci-chat-bot`).
+
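+If the code doesn't exist yet, do not assume the bot will silently ignore `--enable-triage` and `--triage-config-path`: Go's standard flag parsers reject undefined flags by default, so the container may exit at startup. Verify that the deployed image defines these flags before applying the deployment.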
+
+To check if Chaibot code exists:
+```bash
+# Check the container image source
+# Look in https://github.com/openshift/ci-tools/tree/master/cmd/ci-chat-bot
+# Search for "triage" or "chaibot" functionality
+```
+
+### 2. Slack App Permissions
+
+Ensure the ci-chat-bot Slack app has these OAuth scopes:
+- `channels:history` - Read channel messages
+- `channels:read` - View channel info
+- `chat:write` - Post messages
+- `files:read` - Read files shared in channels (e.g. attached logs)
+- `reactions:write` - Add reactions
+
+And subscribed to these events:
+- `message.channels`
+- `app_mention`
+
+Check/update at: https://api.slack.com/apps (find ci-chat-bot app)
+
+### 3. Cost Management
+
+Monitor OpenAI API usage to control costs:
+
+```bash
+# Check number of analyses (requires the port-forward from Step 7)
+curl http://localhost:9090/metrics | grep chaibot_analyses_completed_total
+
+# At $0.03 per analysis (GPT-4):
+# 100/day = $3/day = ~$90/month
+# 
+# To reduce costs:
+# - Use GPT-3.5-turbo (~$0.003/analysis)
+# - Adjust rate_limiting in config
+# - Increase cooldown_seconds
+```
+
+Edit ConfigMap to switch models:
+```yaml
+analysis:
+  model: "gpt-3.5-turbo"  # Change from "gpt-4"
+```
+
+### 4. Production Readiness Checklist
+
+Before enabling in production:
+
+- [ ] OpenAI API key stored in Vault (not hardcoded)
+- [ ] Correct Slack channel ID configured
+- [ ] Slack app permissions verified
+- [ ] PrometheusRules deployed and alerting configured
+- [ ] Grafana dashboard created
+- [ ] Rate limits tuned appropriately
+- [ ] Cost monitoring set up
+- [ ] Team trained on Chaibot usage
+- [ ] Runbook created for oncall
+- [ ] Code implementation verified in ci-tools
+
+## Rollback
+
+If you need to disable Chaibot:
+
+```bash
+# Method 1: Disable via environment variable
+oc set env deployment/ci-chat-bot CHAIBOT_ENABLED=false -n ci
+
+# Method 2: Remove volumes and mounts
+# Revert clusters/app.ci/ci-chat-bot/ci-chat-bot.yaml to previous version
+git checkout HEAD~1 -- clusters/app.ci/ci-chat-bot/ci-chat-bot.yaml
+oc apply -f clusters/app.ci/ci-chat-bot/ci-chat-bot.yaml
+
+# Method 3 (last resort): Delete the ConfigMap. Pods that mount it cannot start until the triage-config volume is removed (see Method 2)
+oc delete configmap ci-chat-bot-triage-config -n ci
+```
+
+## Next Steps
+
+After successful deployment:
+
+1. **Monitor initial performance**
+   - Watch metrics and logs for 24-48 hours
+   - Collect feedback from #opp-discussion users
+
+2. **Tune configuration**
+   - Adjust confidence thresholds based on accuracy
+   - Add/remove failure patterns
+   - Optimize AI prompts
+
+3. **Expand coverage**
+   - Add more monitored channels
+   - Create team-specific configurations
+   - Integrate with retester for auto-retry
+
+4. **Documentation**
+   - Update team wiki with usage examples
+   - Create runbook for DPTP oncall
+   - Add to CI documentation site
+
+## Support
+
+- **Documentation**: `docs/chaibot-test-failure-triage.md`
+- **Quick Reference**: `core-services/ci-chat-bot/CHAIBOT.md`
+- **Questions**: #forum-ocp-testplatform
+- **ci-chat-bot team**: #forum-ocp-crt
+- **Issues**: https://github.com/openshift/ci-tools/issues
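
The cost figures in the Cost Management section of the deployment guide above follow from simple multiplication, and the same arithmetic gives the worst case allowed by the rate limit. The sketch below is a hedged illustration in Python. The per-analysis prices are the approximate values quoted in the guide, not measured costs, and the function names are hypothetical.

```python
def monthly_cost(cost_per_analysis: float, analyses_per_day: int, days: int = 30) -> float:
    """Estimated spend over a month of `days` days at a steady daily volume."""
    return cost_per_analysis * analyses_per_day * days


def daily_ceiling(cost_per_analysis: float, max_analyses_per_hour: int) -> float:
    """Worst-case daily spend if the hourly rate limit is reached every hour."""
    return cost_per_analysis * max_analyses_per_hour * 24


if __name__ == "__main__":
    # Prices are the approximate per-analysis figures quoted in the guide.
    print(f"GPT-4, 100/day:         ${monthly_cost(0.03, 100):.2f}/month")
    print(f"GPT-3.5-turbo, 100/day: ${monthly_cost(0.003, 100):.2f}/month")
    print(f"GPT-4, limit 100/hour:  ${daily_ceiling(0.03, 100):.2f}/day ceiling")
```

The ceiling is the number to compare against a budget when tuning `max_analyses_per_hour`, since the 100/day figure is only an expected volume.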
diff --git a/OWNERS_ALIASES b/OWNERS_ALIASES
index 5f21b57be933f..c361db28d8531 100644
--- a/OWNERS_ALIASES
+++ b/OWNERS_ALIASES
@@ -332,6 +332,7 @@ aliases:
   - danalanerh
   - sg-rh
   - etirta
+  - chaclark1974
   openstack-k8s-operators-approvers:
   - abays
   - arxcruz
diff --git a/clusters/app.ci/ci-chat-bot/chaibot-configmap.yaml b/clusters/app.ci/ci-chat-bot/chaibot-configmap.yaml
new file mode 100644
index 0000000000000..c86dc8a4f7dda
--- /dev/null
+++ b/clusters/app.ci/ci-chat-bot/chaibot-configmap.yaml
@@ -0,0 +1,135 @@
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  name: ci-chat-bot-triage-config
+  namespace: ci
+data:
+  triage-config.yaml: |
+    # Chaibot Test Failure Triage Configuration
+    enabled: true
+
+    monitored_channels:
+      - name: "opp-discussion"
+        channel_id: "C04TMLC6DRV"
+        auto_respond: true
+        response_mode: "thread"
+
+    failure_detection:
+      prow_job_patterns:
+        - "https://prow.ci.openshift.org/view/gs/"
+        - "https://prow.ci.openshift.org/?pr="
+        - "https://deck-internal-ci.apps.ci.l2s4.p1.openshiftapps.com/"
+
+      failure_keywords:
+        - "test failed"
+        - "job failed"
+        - "failure"
+        - "flaky"
+        - "regression"
+
+      require_job_url: false
+
+    analysis:
+      timeout: 120
+      ai_provider: "openai"
+      model: "gpt-4"
+
+      analyze_components:
+        - job_metadata
+        - failure_logs
+        - historical_data
+        - infrastructure
+        - known_issues
+
+      failure_categories:
+        infrastructure:
+          patterns:
+            - "InsufficientInstanceCapacity"
+            - "RequestLimitExceeded"
+            - "could not create instance"
+            - "timeout waiting for"
+          confidence_threshold: 0.7
+
+        flaky_test:
+          patterns:
+            - "race condition"
+            - "intermittent"
+            - "timeout.*eventually"
+          confidence_threshold: 0.6
+
+        product_bug:
+          patterns:
+            - "panic:"
+            - "nil pointer"
+            - "assertion failed"
+          confidence_threshold: 0.8
+
+        configuration:
+          patterns:
+            - "missing environment"
+            - "invalid configuration"
+            - "secret.*not found"
+          confidence_threshold: 0.75
+
+    response:
+      include_sections:
+        - summary
+        - root_cause
+        - evidence
+        - historical
+        - recommendations
+        - related_issues
+
+      use_emojis: true
+      emoji_map:
+        infrastructure: ":cloud:"
+        flaky_test: ":game_die:"
+        product_bug: ":bug:"
+        configuration: ":wrench:"
+        unknown: ":question:"
+
+      include_actions:
+        - label: "View Full Logs"
+          action: "open_url"
+        - label: "Mark Flaky"
+          action: "mark_flaky"
+
+    integrations:
+      sippy:
+        enabled: true
+        base_url: "https://sippy.dptools.openshift.org"
+        lookback_days: 7
+        min_occurrences: 2
+
+      jira:
+        enabled: true
+        endpoint: "https://redhat.atlassian.net"
+        search_projects:
+          - "OCPBUGS"
+          - "DPTP"
+        max_results: 5
+
+      prow:
+        enabled: true
+        gcs_bucket: "gs://origin-ci-test"
+        max_log_size_mb: 50
+        fetch_artifacts:
+          - "build-log.txt"
+          - "junit*.xml"
+
+      ai_api:
+        enabled: true
+        secret_name: "ci-chat-bot-chaibot-secrets"
+        secret_namespace: "ci"
+        rate_limit_rpm: 50
+
+    rate_limiting:
+      max_analyses_per_hour: 100
+      max_analyses_per_user_per_hour: 10
+      max_concurrent_analyses: 5
+      cooldown_seconds: 30
+
+    monitoring:
+      metrics_enabled: true
+      metrics_port: 9090
+      log_level: "info"
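
The `failure_categories` block in the ConfigMap above lists patterns per category but does not say how they are applied. The following Python sketch shows one plausible reading, under these assumptions: each pattern is a case-sensitive regular expression (entries such as `timeout.*eventually` suggest regex), it is searched anywhere in the log text, and a category counts as a candidate when at least one pattern matches. How `confidence_threshold` is used is not specified in the config, so the sketch leaves it out. The function name `match_categories` is hypothetical.

```python
import re

# Patterns copied from failure_categories in the ConfigMap above.
FAILURE_CATEGORIES = {
    "infrastructure": [
        "InsufficientInstanceCapacity",
        "RequestLimitExceeded",
        "could not create instance",
        "timeout waiting for",
    ],
    "flaky_test": ["race condition", "intermittent", "timeout.*eventually"],
    "product_bug": ["panic:", "nil pointer", "assertion failed"],
    "configuration": [
        "missing environment",
        "invalid configuration",
        "secret.*not found",
    ],
}


def match_categories(log_text: str) -> dict[str, list[str]]:
    """Return {category: [patterns that matched]} for categories with a hit.

    Assumption: patterns are regexes searched anywhere in the text; "." does
    not cross newlines, so "timeout.*eventually" must match within one line.
    """
    hits: dict[str, list[str]] = {}
    for category, patterns in FAILURE_CATEGORIES.items():
        matched = [p for p in patterns if re.search(p, log_text)]
        if matched:
            hits[category] = matched
    return hits


if __name__ == "__main__":
    line = "Error: creating EC2 Instance: InsufficientInstanceCapacity (us-east-1c)"
    print(match_categories(line))
```

A log can match several categories at once, so a real implementation would still need a rule for choosing among them, which is presumably where the AI analysis and the thresholds come in.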
diff --git a/clusters/app.ci/ci-chat-bot/chaibot-deployment-patch.yaml b/clusters/app.ci/ci-chat-bot/chaibot-deployment-patch.yaml
new file mode 100644
index 0000000000000..2b5202b6abd85
--- /dev/null
+++ b/clusters/app.ci/ci-chat-bot/chaibot-deployment-patch.yaml
@@ -0,0 +1,109 @@
+# This file contains the necessary additions to ci-chat-bot deployment
+# to enable Chaibot test failure triage functionality
+#
+# Apply these changes to clusters/app.ci/ci-chat-bot/ci-chat-bot.yaml
+
+---
+# Additional volume for triage config
+# Add to spec.template.spec.volumes in the Deployment:
+
+# - name: triage-config
+#   configMap:
+#     name: ci-chat-bot-triage-config
+
+# - name: chaibot-secrets
+#   secret:
+#     secretName: ci-chat-bot-chaibot-secrets
+#     items:
+#       - key: openai-api-key
+#         path: openai-api-key
+
+---
+# Additional volumeMount for the bot container
+# Add to spec.template.spec.containers[name=bot].volumeMounts:
+
+# - name: triage-config
+#   mountPath: /etc/triage-config
+#   readOnly: true
+
+# - name: chaibot-secrets
+#   mountPath: /etc/chaibot-secrets
+#   readOnly: true
+
+---
+# Additional command-line arguments
+# Add to spec.template.spec.containers[name=bot].args:
+
+# --enable-triage=true \
+# --triage-config-path=/etc/triage-config/triage-config.yaml
+
+---
+# Additional environment variables
+# Add to spec.template.spec.containers[name=bot].env:
+
+# - name: CHAIBOT_ENABLED
+#   value: "true"
+# - name: OPENAI_API_KEY
+#   valueFrom:
+#     secretKeyRef:
+#       name: ci-chat-bot-chaibot-secrets
+#       key: openai-api-key
+
+---
+apiVersion: v1
+kind: Secret
+metadata:
+  name: ci-chat-bot-chaibot-secrets
+  namespace: ci
+type: Opaque
+stringData:
+  # Placeholder only: applying this as-is would overwrite the real key.
+  # In production this Secret should be managed via ci-secret-bootstrap
+  openai-api-key: "REPLACE_WITH_ACTUAL_KEY"
+---
+# PrometheusRule with Chaibot alerts. No ServiceMonitor change is needed:
+# the existing ServiceMonitor already scrapes port 9090,
+# so chaibot metrics will be automatically collected
+apiVersion: monitoring.coreos.com/v1
+kind: PrometheusRule
+metadata:
+  name: chaibot-alerts
+  namespace: ci
+spec:
+  groups:
+    - name: chaibot
+      interval: 30s
+      rules:
+        - alert: ChaibotHighErrorRate
+          expr: |
+            rate(chaibot_api_errors_total[5m]) > 0.1
+          for: 10m
+          labels:
+            severity: warning
+            team: test-platform
+          annotations:
+            summary: "Chaibot experiencing high error rate"
+            description: "Chaibot has {{ $value }} errors per second over the last 5 minutes."
+            runbook_url: "https://github.com/openshift/release/blob/main/docs/dptp-triage-sop/chaibot.md"
+
+        - alert: ChaibotAnalysisTimeout
+          expr: |
+            histogram_quantile(0.95, rate(chaibot_analysis_duration_seconds_bucket[5m])) > 120
+          for: 15m
+          labels:
+            severity: warning
+            team: test-platform
+          annotations:
+            summary: "Chaibot analysis taking too long"
+            description: "95th percentile analysis duration is {{ $value }}s, exceeding 120s timeout."
+
+        - alert: ChaibotDown
+          expr: |
+            up{job="ci-chat-bot"} == 0
+          for: 5m
+          labels:
+            severity: critical
+            team: test-platform
+          annotations:
+            summary: "Chaibot service is down"
+            description: "ci-chat-bot service (including Chaibot) has been down for 5 minutes."
diff --git a/clusters/app.ci/ci-chat-bot/ci-chat-bot.yaml b/clusters/app.ci/ci-chat-bot/ci-chat-bot.yaml
index f0564d67a3a69..70ac802160464 100644
--- a/clusters/app.ci/ci-chat-bot/ci-chat-bot.yaml
+++ b/clusters/app.ci/ci-chat-bot/ci-chat-bot.yaml
@@ -284,6 +284,15 @@ spec:
                 path: rosa-admin-ocm
         - name: runtimedir
           emptyDir: { }
+        - name: triage-config
+          configMap:
+            name: ci-chat-bot-triage-config
+        - name: chaibot-secrets
+          secret:
+            secretName: ci-chat-bot-chaibot-secrets
+            items:
+              - key: openai-api-key
+                path: openai-api-key
       initContainers:
         - name: git-sync-init
           command:
@@ -371,6 +380,12 @@ spec:
               readOnly: true
             - mountPath: /runtimedir
               name: runtimedir
+            - name: triage-config
+              mountPath: /etc/triage-config
+              readOnly: true
+            - name: chaibot-secrets
+              mountPath: /etc/chaibot-secrets
+              readOnly: true
           env:
             - name: BOT_TOKEN
               valueFrom:
@@ -410,6 +425,13 @@ spec:
               value: us-east-1
             - name: XDG_RUNTIME_DIR
               value: /runtimedir
+            - name: CHAIBOT_ENABLED
+              value: "true"
+            - name: OPENAI_API_KEY
+              valueFrom:
+                secretKeyRef:
+                  name: ci-chat-bot-chaibot-secrets
+                  key: openai-api-key
           command: ["/bin/sh"]
           args:
             - -c
@@ -432,4 +454,6 @@ spec:
               --rosa-cluster-limit=30 \\
               --rosa-subnetlist-path=/etc/subnet-ids/rosa-subnet-ids \\
               --rosa-oidcConfigId-path=/etc/oidc-config-id/rosa-oidc-config-id \\
-              --rosa-billingAccount-path=/etc/billing-account-id/rosa-billing-account-id
+              --rosa-billingAccount-path=/etc/billing-account-id/rosa-billing-account-id \\
+              --enable-triage=true \\
+              --triage-config-path=/etc/triage-config/triage-config.yaml
diff --git a/core-services/ci-chat-bot/CHAIBOT.md b/core-services/ci-chat-bot/CHAIBOT.md
new file mode 100644
index 0000000000000..4d890d9a1b0e6
--- /dev/null
+++ b/core-services/ci-chat-bot/CHAIBOT.md
@@ -0,0 +1,259 @@
+# Chaibot Test Failure Triage Extension
+
+This directory contains configuration for the **Chaibot** test failure triage feature, an AI-powered extension to ci-chat-bot.
+
+## What is Chaibot?
+
+Chaibot automatically monitors Slack channels (like `#opp-discussion`) for test failure messages, analyzes the failures using AI, and posts detailed triage analysis in threads.
+
+## Files
+
+- `triage-config.yaml` - Main configuration for Chaibot (source of truth)
+- `workflows-config.yaml` - Cluster provisioning workflows (existing ci-chat-bot config)
+
+## Quick Start
+
+### 1. Prerequisites
+
+- OpenAI API key (stored in ci-secret-bootstrap)
+- Slack channel ID for #opp-discussion
+- ci-chat-bot deployment with Chaibot support (requires ci-tools update)
+
+### 2. Configuration
+
+The `triage-config.yaml` file is mounted as a ConfigMap in the ci-chat-bot deployment.
+
+To enable Chaibot:
+
+```yaml
+enabled: true
+
+monitored_channels:
+  - name: "opp-discussion"
+    channel_id: "YOUR_CHANNEL_ID"  # Get from Slack
+    auto_respond: true
+```
+
+### 3. Get Slack Channel ID
+
+In Slack:
+1. Right-click the `#opp-discussion` channel
+2. Select "View channel details"
+3. Copy the Channel ID from the About section
+4. Update `channel_id` in `triage-config.yaml`
+
+### 4. Deploy
+
+After your change merges, the postsubmit applies the configuration automatically. To regenerate and apply it manually:
+
+```bash
+# Regenerate configs (run from the openshift/release repo root)
+make update
+
+# Apply ConfigMap (done automatically by postsubmit)
+oc apply -f clusters/app.ci/ci-chat-bot/chaibot-configmap.yaml
+```
+
+## How It Works
+
+1. **Detection**: Monitors configured Slack channels for messages containing:
+   - Prow job URLs
+   - Failure keywords ("test failed", "job failed", etc.)
+
+2. **Analysis**: When a failure is detected:
+   - Fetches job logs from GCS
+   - Analyzes with AI (OpenAI GPT-4)
+   - Categorizes failure (infrastructure, flaky, bug, config)
+   - Searches Sippy for historical patterns
+   - Looks up related JIRA issues
+
+3. **Response**: Posts analysis in thread:
+   - Root cause with confidence level
+   - Evidence from logs
+   - Historical context
+   - Actionable recommendations
+
+## Example
+
+User posts in #opp-discussion:
+```
+Job failed again: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/...
+```
+
+Chaibot responds in thread:
+```
+:cloud: Test Failure Analysis
+
+Root Cause: Infrastructure - AWS Capacity (85% confidence)
+Analysis: Instance launch failed due to InsufficientInstanceCapacity in us-east-1c
+Evidence: "Error: creating EC2 Instance: InsufficientInstanceCapacity..."
+Historical: 8 similar failures in last 24h (transient issue)
+Recommendation: Retest - likely to succeed
+
+Classification: Transient Infrastructure Issue
+```
+
+## Configuration Options
+
+### Monitored Channels
+
+Add or remove channels:
+
+```yaml
+monitored_channels:
+  - name: "opp-discussion"
+    channel_id: "C01234567"
+    auto_respond: true      # Auto-analyze or require @mention
+    response_mode: "thread" # thread, channel, or dm
+```
+
+### Analysis Settings
+
+Adjust AI provider and timeout:
+
+```yaml
+analysis:
+  timeout: 120          # seconds
+  ai_provider: "openai" # openai or anthropic
+  model: "gpt-4"        # or gpt-3.5-turbo for lower cost
+```
+
+### Failure Categories
+
+Customize or add categories:
+
+```yaml
+failure_categories:
+  infrastructure:
+    patterns:
+      - "InsufficientInstanceCapacity"
+      - "RequestLimitExceeded"
+    confidence_threshold: 0.7
+```
+
+### Rate Limiting
+
+Prevent abuse:
+
+```yaml
+rate_limiting:
+  max_analyses_per_hour: 100
+  max_analyses_per_user_per_hour: 10
+  cooldown_seconds: 30  # Min time between analyses for same job
+```
+
+## Integrations
+
+### Sippy
+
+Enabled by default, shows historical failure patterns:
+
+```yaml
+integrations:
+  sippy:
+    enabled: true
+    base_url: "https://sippy.dptools.openshift.org"
+    lookback_days: 7
+```
+
+### JIRA
+
+Searches for related issues:
+
+```yaml
+integrations:
+  jira:
+    enabled: true
+    endpoint: "https://redhat.atlassian.net"
+    search_projects: ["OCPBUGS", "DPTP"]
+```
+
+### OpenAI
+
+AI analysis requires API key:
+
+```yaml
+integrations:
+  ai_api:
+    enabled: true
+    secret_name: "ci-chat-bot-chaibot-secrets"
+    secret_namespace: "ci"
+    rate_limit_rpm: 50
+```
+
+## Monitoring
+
+Metrics exposed on port 9090 (same as ci-chat-bot):
+
+- `chaibot_messages_processed_total`
+- `chaibot_failures_detected_total`
+- `chaibot_analyses_completed_total`
+- `chaibot_analysis_duration_seconds`
+- `chaibot_api_errors_total`
+
+Alerts configured in `clusters/app.ci/ci-chat-bot/chaibot-deployment-patch.yaml`
+
+## Troubleshooting
+
+### Chaibot not responding
+
+```bash
+# Check if enabled
+oc get configmap ci-chat-bot-triage-config -n ci -o yaml | grep enabled
+
+# Check logs
+oc logs -n ci deployment/ci-chat-bot -c bot | grep -i chaibot
+
+# Verify secrets exist
+oc get secret ci-chat-bot-chaibot-secrets -n ci
+```
+
+### Analysis timeout
+
+- Check `max_log_size_mb` - reduce if logs are too large
+- Increase `analysis.timeout` value
+- Check OpenAI API status
+
+### Wrong analysis
+
+- Review and tune `failure_categories` patterns
+- Adjust `confidence_threshold` values
+- Update AI prompts in `ai_prompts` section
+
+## Cost Management
+
+OpenAI API costs (approximate):
+- GPT-4: ~$0.03 per analysis
+- GPT-3.5-turbo: ~$0.003 per analysis
+
+At 100 analyses/day:
+- GPT-4: ~$90/month
+- GPT-3.5-turbo: ~$9/month
+
+Control costs with:
+- Rate limiting
+- Cooldown periods
+- Switching to GPT-3.5-turbo
+
+## Development
+
+To add new features:
+
+1. Update `triage-config.yaml` schema
+2. Implement in [openshift/ci-tools](https://github.com/openshift/ci-tools) cmd/ci-chat-bot
+3. Add tests
+4. Update this documentation
+
+## Support
+
+- Questions: `#forum-ocp-testplatform`
+- ci-chat-bot team: `#forum-ocp-crt`
+- Issues: https://github.com/openshift/ci-tools/issues
+- Docs: https://docs.ci.openshift.org/tools/chaibot/
+
+## Related
+
+- [ci-chat-bot README](README.md) - Cluster provisioning workflows
+- [Chaibot Full Documentation](../../docs/chaibot-test-failure-triage.md)
+- [Sippy](https://sippy.dptools.openshift.org/) - Test analysis platform
+- [ci-tools](https://github.com/openshift/ci-tools) - Source code
diff --git a/core-services/ci-chat-bot/triage-config.yaml b/core-services/ci-chat-bot/triage-config.yaml
new file mode 100644
index 0000000000000..584fabcb7d29d
--- /dev/null
+++ b/core-services/ci-chat-bot/triage-config.yaml
@@ -0,0 +1,205 @@
+# Chaibot Test Failure Triage Configuration
+# This config enables ci-chat-bot to monitor Slack channels for test failures
+# and provide automated triage analysis
+
+# Feature flag to enable/disable triage functionality
+enabled: true
+
+# Slack channels to monitor for test failures
+monitored_channels:
+  - name: "opp-discussion"
+    channel_id: "C01234567"  # Replace with actual channel ID
+    auto_respond: true
+    response_mode: "thread"  # Options: thread, channel, dm
+
+  # Additional channels can be added
+  # - name: "forum-ocp-testplatform"
+  #   channel_id: "C98765432"
+  #   auto_respond: false  # Require @mention to trigger
+
+# Patterns to detect test failure messages
+failure_detection:
+  # URL patterns that indicate Prow job failures
+  prow_job_patterns:
+    - "https://prow.ci.openshift.org/view/gs/"
+    - "https://prow.ci.openshift.org/?pr="
+    - "https://deck-internal-ci.apps.ci.l2s4.p1.openshiftapps.com/"
+
+  # Keywords that indicate test failures
+  failure_keywords:
+    - "test failed"
+    - "job failed"
+    - "failure"
+    - "test timeout"
+    - "flaky test"
+    - "regression"
+    - "broken test"
+
+  # Message must contain job URL OR (keyword + context)
+  require_job_url: false
+
+# Analysis configuration
+analysis:
+  # Maximum time to spend analyzing a single failure (seconds)
+  timeout: 120
+
+  # AI provider configuration
+  ai_provider: "openai"  # Options: openai, anthropic, none
+  model: "gpt-4"
+
+  # What to analyze
+  analyze_components:
+    - job_metadata        # Job name, duration, timestamp
+    - failure_logs        # Pod logs, junit output
+    - historical_data     # Sippy integration for past failures
+    - infrastructure      # Cloud provider issues, cluster state
+    - known_issues        # JIRA search for similar failures
+
+  # Categorization rules
+  failure_categories:
+    infrastructure:
+      patterns:
+        - "InsufficientInstanceCapacity"
+        - "RequestLimitExceeded"
+        - "could not create instance"
+        - "timeout waiting for"
+        - "connection refused"
+      confidence_threshold: 0.7
+
+    flaky_test:
+      patterns:
+        - "race condition"
+        - "intermittent"
+        - "sometimes fails"
+        - "timeout.*eventually"
+      confidence_threshold: 0.6
+
+    product_bug:
+      patterns:
+        - "panic:"
+        - "nil pointer"
+        - "assertion failed"
+        - "unexpected error"
+      confidence_threshold: 0.8
+
+    configuration:
+      patterns:
+        - "missing environment"
+        - "invalid configuration"
+        - "could not find image"
+        - "secret.*not found"
+      confidence_threshold: 0.75
+
+# Response formatting
+response:
+  # Template for Slack message response
+  include_sections:
+    - summary           # Brief one-line summary
+    - root_cause        # Identified root cause with confidence
+    - evidence          # Key log excerpts and patterns
+    - historical        # Similar past failures from Sippy
+    - recommendations   # Suggested actions
+    - related_issues    # JIRA issues or documentation
+
+  # Emoji indicators for quick visual parsing
+  use_emojis: true
+  emoji_map:
+    infrastructure: ":cloud:"
+    flaky_test: ":game_die:"
+    product_bug: ":bug:"
+    configuration: ":wrench:"
+    unknown: ":question:"
+
+  # Add interactive buttons
+  include_actions:
+    - label: "View Full Logs"
+      action: "open_url"
+    - label: "Retest"
+      action: "trigger_retest"
+    - label: "Create JIRA"
+      action: "create_issue"
+    - label: "Mark Flaky"
+      action: "mark_flaky"
+
+# Integration settings
+integrations:
+  # Sippy integration for historical failure data
+  sippy:
+    enabled: true
+    base_url: "https://sippy.dptools.openshift.org"
+    lookback_days: 7
+    min_occurrences: 2  # Minimum failures to show pattern
+
+  # JIRA integration for known issues
+  jira:
+    enabled: true
+    endpoint: "https://redhat.atlassian.net"
+    search_projects:
+      - "OCPBUGS"
+      - "DPTP"
+    max_results: 5
+
+  # Prow/GCS access for log fetching
+  prow:
+    enabled: true
+    gcs_bucket: "gs://origin-ci-test"
+    max_log_size_mb: 50
+    fetch_artifacts:
+      - "build-log.txt"
+      - "junit*.xml"
+      - "e2e-events*.json"
+
+  # OpenAI/Anthropic API
+  ai_api:
+    enabled: true
+    secret_name: "ci-chat-bot-chaibot-secrets"  # Kubernetes secret
+    secret_namespace: "ci"
+    rate_limit_rpm: 50  # Requests per minute
+
+# Rate limiting and abuse prevention
+rate_limiting:
+  max_analyses_per_hour: 100
+  max_analyses_per_user_per_hour: 10
+  max_concurrent_analyses: 5
+  cooldown_seconds: 30  # Min time between analyses for same job
+
+# Monitoring and observability
+monitoring:
+  metrics_enabled: true
+  metrics_port: 9090
+  log_level: "info"  # Options: debug, info, warn, error
+
+  # Prometheus metrics to export
+  metrics:
+    - chaibot_messages_processed_total
+    - chaibot_failures_detected_total
+    - chaibot_analyses_completed_total
+    - chaibot_analysis_duration_seconds
+    - chaibot_api_errors_total
+    - chaibot_category_detections_total
+
+# Prompt template for AI analysis
+ai_prompts:
+  system_prompt: |
+    You are Chaibot, an expert CI/CD test failure analyst for OpenShift.
+    Analyze test failures and provide concise, actionable triage information.
+    Focus on root cause identification and practical next steps.
+    Categorize failures as: infrastructure, flaky_test, product_bug, or configuration.
+    Be confident but acknowledge uncertainty when appropriate.
+
+  analysis_prompt: |
+    Analyze this OpenShift CI test failure:
+
+    Job: {job_name}
+    Status: {status}
+    Duration: {duration}
+
+    Error Logs:
+    {error_excerpt}
+
+    Provide:
+    1. Root Cause (category + confidence %)
+    2. Brief Analysis (2-3 sentences)
+    3. Key Evidence (specific log excerpts)
+    4. Recommendations (numbered action items)
+    5. Classification (transient vs persistent issue)
diff --git a/core-services/ci-secret-bootstrap/_config.yaml b/core-services/ci-secret-bootstrap/_config.yaml
index fcdf93992551f..d0a44ade19371 100644
--- a/core-services/ci-secret-bootstrap/_config.yaml
+++ b/core-services/ci-secret-bootstrap/_config.yaml
@@ -4218,6 +4218,14 @@ secret_configs:
   - cluster: core-ci
     name: pj-rehearse
     namespace: ci
+- from:
+    openai-api-key:
+      field: openai-api-key
+      path: selfservice/cspi-qe/chaibot-openai-key
+  to:
+  - cluster: app.ci
+    name: ci-chat-bot-chaibot-secrets
+    namespace: ci
 - from:
     sa.ci-chat-bot.build01.config:
       field: sa.ci-chat-bot.build01.config
diff --git a/core-services/ci-secret-bootstrap/chaibot-secret-config.yaml b/core-services/ci-secret-bootstrap/chaibot-secret-config.yaml
new file mode 100644
index 0000000000000..4433c1658ac5c
--- /dev/null
+++ b/core-services/ci-secret-bootstrap/chaibot-secret-config.yaml
@@ -0,0 +1,44 @@
+# Secret configuration for Chaibot
+# This file should be added to ci-secret-bootstrap configuration
+# to manage the OpenAI API key securely. The Slack channel ID is not a secret;
+# it is set in the triage-config ConfigMap (see item 2 below).
+
+# Add this entry to core-services/ci-secret-bootstrap/_config.yaml:
+
+# - from:
+#     openai-api-key:
+#       field: openai-api-key
+#       path: <path-to-vault>
+#   to:
+#     - cluster: app.ci
+#       namespace: ci
+#       name: ci-chat-bot-chaibot-secrets
+
+---
+# Instructions for setting up secrets:
+#
+# 1. OpenAI API Key:
+#    - Obtain API key from OpenAI dashboard (https://platform.openai.com/api-keys)
+#    - Store in Vault at the appropriate path
+#    - Reference: https://docs.ci.openshift.org/docs/how-tos/adding-a-new-secret-to-ci/
+#
+# 2. Slack Channel ID for #opp-discussion:
+#    - In Slack, right-click the #opp-discussion channel
+#    - Select "View channel details"
+#    - Copy the Channel ID from the bottom of the modal
+#    - Update the channel_id in the triage-config.yaml ConfigMap
+#
+# 3. Alternative AI Providers (optional):
+#    - For Anthropic Claude: Store ANTHROPIC_API_KEY
+#    - Update triage-config.yaml ai_provider to "anthropic"
+#
+# 4. Required Slack App Permissions:
+#    The Slack app whose credentials are stored in the ci-chat-bot-slack-app
+#    secret needs these OAuth scopes:
+#    - channels:history (read messages in public channels)
+#    - channels:read (view basic channel info)
+#    - chat:write (post messages)
+#    - files:read (access uploaded failure logs)
+#    - reactions:write (add emoji reactions to indicate processing)
+#
+# 5. Slack Event Subscriptions:
+#    Subscribe to these events in the Slack App configuration:
+#    - message.channels (receive messages from monitored channels)
+#    - app_mention (respond when @chaibot is mentioned)
diff --git a/docs/chaibot-test-failure-triage.md b/docs/chaibot-test-failure-triage.md
new file mode 100644
index 0000000000000..a6a3eaf01dffe
--- /dev/null
+++ b/docs/chaibot-test-failure-triage.md
@@ -0,0 +1,433 @@
+# Chaibot - Automated Test Failure Triage for Slack
+
+## Overview
+
+Chaibot is an AI-powered extension to the ci-chat-bot service that automatically triages and analyzes OpenShift CI test failures posted in Slack channels. It provides intelligent root cause analysis and actionable recommendations directly in Slack threads.
+
+## Features
+
+### Automatic Detection
+- Monitors configured Slack channels (e.g., `#opp-discussion`)
+- Detects Prow job failure messages and URLs
+- Identifies test failure keywords and patterns
+
+### Intelligent Analysis
+- Fetches job logs and artifacts from GCS
+- Analyzes failure patterns using AI (OpenAI GPT-4 or Anthropic Claude)
+- Categorizes failures into:
+  - **Infrastructure Issues**: Cloud provider capacity, networking, timeouts
+  - **Flaky Tests**: Race conditions, intermittent failures
+  - **Product Bugs**: Panics, assertion failures, regressions
+  - **Configuration Issues**: Missing secrets, invalid config
+
+### Historical Context
+- Integrates with Sippy to show historical failure patterns
+- Searches JIRA for related known issues
+- Identifies if this is a new or recurring failure
+
+### Actionable Output
+- Posts analysis in Slack threads to keep channels clean
+- Provides confidence levels for root cause identification
+- Suggests specific next steps (retest, file bug, investigate)
+- Includes direct links to logs, Sippy, and related issues
+
+## Example Output
+
+When a test failure is posted in `#opp-discussion`:
+
+```
+User: The e2e-aws job is failing again 😞
+https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/12345/pull-ci-openshift-installer-master-e2e-aws/678901
+
+---
+
+Chaibot [BOT]: :mag: Analyzing failure... (typically takes 30-60 seconds)
+
+[30 seconds later]
+
+Chaibot [BOT]: :cloud: **Test Failure Analysis**
+
+**Job:** `pull-ci-openshift-installer-master-e2e-aws`
+**Status:** Failed after 2h 15m
+**Root Cause:** Infrastructure - AWS EC2 Capacity (Confidence: 85%)
+
+**Analysis:**
+Test failed during cluster provisioning when attempting to launch EC2 instances in us-east-1c.
+AWS returned "InsufficientInstanceCapacity" error after multiple retry attempts over 45 minutes.
+
+**Evidence:**
+    Error: creating EC2 Instance (i-0a1b2c3d4e5f): InsufficientInstanceCapacity
+    status code: 500, request id: xyz-123
+
+**Historical Pattern:**
+This failure has occurred 8 times in the last 24 hours across multiple jobs, all in us-east-1c AZ.
+Sippy shows this as a known transient infrastructure issue.
+
+**Recommendations:**
+1. ✅ **Retest** - This is a transient AWS issue, likely to succeed on retry
+2. 📊 Check AWS Service Health Dashboard for us-east-1 incidents
+3. 🔔 If failures persist >6 hours, escalate to infrastructure team
+
+**Related:**
+- <https://sippy.dptools.openshift.org/|Sippy Dashboard>
+- <https://issues.redhat.com/browse/DPTP-5678|DPTP-5678>: Similar AWS capacity issues
+
+**Classification:** Transient Infrastructure (Not a product bug)
+
+[Buttons: View Full Logs | Retest | Create JIRA | Mark Flaky]
+```
+
+## Setup and Configuration
+
+### Prerequisites
+
+1. **OpenAI API Key** (or Anthropic API Key)
+   - Required for AI-powered analysis
+   - Store securely via ci-secret-bootstrap
+
+2. **Slack Channel ID**
+   - Get the channel ID for `#opp-discussion`
+   - Update in ConfigMap configuration
+
+3. **ci-chat-bot Deployment**
+   - Chaibot runs as part of ci-chat-bot service
+   - Requires deployment update to enable
+
+### Installation Steps
+
+#### 1. Configure Secrets
+
+Add OpenAI API key via ci-secret-bootstrap:
+
+```bash
+# Edit core-services/ci-secret-bootstrap/_config.yaml
+# Add entry for chaibot-openai-key pointing to vault path
+```
+
+#### 2. Get Slack Channel ID
+
+```bash
+# In Slack:
+# Right-click #opp-discussion → View channel details → Copy Channel ID
+# Update clusters/app.ci/ci-chat-bot/chaibot-configmap.yaml
+```
+
+#### 3. Deploy Configuration
+
+```bash
+# Create the ConfigMap
+oc apply -f clusters/app.ci/ci-chat-bot/chaibot-configmap.yaml
+
+# Create the secrets
+# (Managed via ci-secret-bootstrap after PR merge)
+
+# Update ci-chat-bot deployment
+# Edit clusters/app.ci/ci-chat-bot/ci-chat-bot.yaml
+# Add volumes, volumeMounts, and env vars from chaibot-deployment-patch.yaml
+```
+
+#### 4. Update ci-chat-bot Deployment
+
+Apply the changes from `chaibot-deployment-patch.yaml`:
+
+```yaml
+# Add to volumes:
+- name: triage-config
+  configMap:
+    name: ci-chat-bot-triage-config
+
+- name: chaibot-secrets
+  secret:
+    secretName: ci-chat-bot-chaibot-secrets
+
+# Add to volumeMounts (bot container):
+- name: triage-config
+  mountPath: /etc/triage-config
+  readOnly: true
+
+- name: chaibot-secrets
+  mountPath: /etc/chaibot-secrets
+  readOnly: true
+
+# Add to env (bot container):
+- name: CHAIBOT_ENABLED
+  value: "true"
+- name: OPENAI_API_KEY
+  valueFrom:
+    secretKeyRef:
+      name: ci-chat-bot-chaibot-secrets
+      key: openai-api-key
+
+# Add to args (bot container):
+--enable-triage=true \
+--triage-config-path=/etc/triage-config/triage-config.yaml \
+```
+
+#### 5. Configure Slack App Permissions
+
+Ensure the ci-chat-bot Slack app has these OAuth scopes:
+
+- `channels:history` - Read messages in public channels
+- `channels:read` - View channel information
+- `chat:write` - Post messages and replies
+- `files:read` - Access uploaded logs
+- `reactions:write` - Add reactions to indicate processing
+
+Subscribe to these events:
+- `message.channels` - Receive channel messages
+- `app_mention` - Respond to @chaibot mentions
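+
+If the Slack app is managed through an app manifest (an assumption; the same scopes can be set in the Slack app settings UI), the scopes and events above map to the manifest fragment below. Other manifest fields, such as the event request URL or Socket Mode settings, are omitted:
+
+```yaml
+oauth_config:
+  scopes:
+    bot:
+      - channels:history
+      - channels:read
+      - chat:write
+      - files:read
+      - reactions:write
+
+settings:
+  event_subscriptions:
+    bot_events:
+      - message.channels
+      - app_mention
+```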
+
+#### 6. Deploy and Verify
+
+```bash
+# Apply changes
+make update
+oc apply -f clusters/app.ci/ci-chat-bot/ci-chat-bot.yaml
+
+# Watch deployment
+oc rollout status deployment/ci-chat-bot -n ci
+
+# Check logs
+oc logs -f deployment/ci-chat-bot -n ci -c bot | grep -i chaibot
+
+# Test in Slack
+# Post a test failure message in #opp-discussion with a Prow URL
+```
+
+## Usage
+
+### Automatic Triggering
+
+Chaibot automatically responds to messages in monitored channels that contain:
+- Prow job URLs (`https://prow.ci.openshift.org/view/gs/...`)
+- Failure keywords (for example, "test failed" or "flaky test") plus surrounding context
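+
+These triggers correspond to the `failure_detection` section of `triage-config.yaml`. An abridged excerpt of the shipped configuration; the inline comment on `require_job_url` is our reading of the config, not confirmed bot behavior:
+
+```yaml
+failure_detection:
+  prow_job_patterns:
+    - "https://prow.ci.openshift.org/view/gs/"
+  failure_keywords:
+    - "test failed"
+    - "flaky test"
+  # false: a keyword match with context can trigger analysis without a job URL
+  require_job_url: false
+```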
+
+### Manual Triggering
+
+Mention `@chaibot analyze` with a job URL:
+
+```
+@chaibot analyze https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/...
+```
+
+### Response Modes
+
+**Thread Mode (default):**
+- Posts analysis in a thread reply
+- Keeps channels clean and organized
+
+**Reaction Mode:**
+- Adds 👀 emoji when processing starts
+- Adds ✅ when complete, ❌ if failed
+
+## Configuration
+
+### Adding Channels
+
+Edit `clusters/app.ci/ci-chat-bot/chaibot-configmap.yaml`:
+
+```yaml
+monitored_channels:
+  - name: "opp-discussion"
+    channel_id: "C01234567"
+    auto_respond: true
+    response_mode: "thread"
+
+  - name: "forum-ocp-testplatform"  # Add new channel
+    channel_id: "C98765432"
+    auto_respond: false  # Require @mention
+    response_mode: "thread"
+```
+
+### Adjusting Analysis
+
+**Timeout:**
+```yaml
+analysis:
+  timeout: 120  # seconds
+```
+
+**AI Model:**
+```yaml
+analysis:
+  ai_provider: "openai"  # or "anthropic"
+  model: "gpt-4"         # or "claude-3-opus-20240229"
+```
+
+**Failure Categories:**
+```yaml
+failure_categories:
+  custom_category:
+    patterns:
+      - "specific error pattern"
+      - "another pattern"
+    confidence_threshold: 0.75
+```
+
+### Rate Limiting
+
+```yaml
+rate_limiting:
+  max_analyses_per_hour: 100
+  max_analyses_per_user_per_hour: 10
+  cooldown_seconds: 30
+```
+
+## Monitoring
+
+### Metrics
+
+Chaibot exposes Prometheus metrics on port 9090:
+
+- `chaibot_messages_processed_total` - Messages evaluated
+- `chaibot_failures_detected_total` - Failures identified
+- `chaibot_analyses_completed_total` - Analyses finished
+- `chaibot_analysis_duration_seconds` - Analysis latency
+- `chaibot_api_errors_total` - API errors (Slack, OpenAI, etc.)
+- `chaibot_category_detections_total{category="..."}` - Failure categories
+
+### Alerts
+
+PrometheusRules are configured for:
+- High error rate (>10% over 5 minutes)
+- Analysis timeouts (>120 seconds)
+- Service down
+
+View alerts: https://prometheus.ci.openshift.org/
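+
+The actual rules live in `chaibot-deployment-patch.yaml` and are not reproduced here. As a rough sketch of what the high-error-rate rule could look like, the fragment below assumes that "error rate" means API errors per processed message; the rule name, alert name, and severity label are hypothetical:
+
+```yaml
+apiVersion: monitoring.coreos.com/v1
+kind: PrometheusRule
+metadata:
+  name: chaibot-alerts                 # hypothetical name
+  namespace: ci
+spec:
+  groups:
+    - name: chaibot
+      rules:
+        - alert: ChaibotHighErrorRate  # hypothetical alert name
+          # Assumption: error rate = API errors / messages processed
+          expr: |
+            sum(rate(chaibot_api_errors_total[5m]))
+              / sum(rate(chaibot_messages_processed_total[5m])) > 0.10
+          for: 5m
+          labels:
+            severity: warning
+          annotations:
+            summary: Chaibot API error rate is above 10%
+```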
+
+### Dashboards
+
+Grafana dashboard: https://grafana.ci.openshift.org/d/chaibot/
+
+## Troubleshooting
+
+### Chaibot Not Responding
+
+1. **Check service status:**
+   ```bash
+   oc get pods -n ci -l app=ci-chat-bot
+   oc logs -n ci -l app=ci-chat-bot -c bot --tail=100 | grep -i chaibot
+   ```
+
+2. **Verify configuration:**
+   ```bash
+   oc get configmap ci-chat-bot-triage-config -n ci -o yaml
+   ```
+
+3. **Check secrets:**
+   ```bash
+   oc get secret ci-chat-bot-chaibot-secrets -n ci
+   ```
+
+4. **Review metrics:**
+   ```bash
+   curl http://ci-chat-bot.ci.svc:9090/metrics | grep chaibot
+   ```
+
+### Analysis Timeout
+
+If analyses are timing out:
+- Check `chaibot_analysis_duration_seconds` metric
+- Increase `analysis.timeout` in the config
+- Reduce `max_log_size_mb` if log fetching is slow
+- Check OpenAI API rate limits
+
+### Inaccurate Analysis
+
+- Review AI prompts in `triage-config.yaml`
+- Adjust confidence thresholds for categories
+- Add more specific patterns to failure categories
+- Consider switching AI models or providers
+
+### Rate Limiting Issues
+
+- Check `chaibot_api_errors_total{reason="rate_limit"}`
+- Increase OpenAI rate limits
+- Adjust `rate_limiting.max_analyses_per_hour`
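+
+Two separate limits in `triage-config.yaml` bound request volume: `rate_limiting` caps how many analyses Chaibot starts, while `integrations.ai_api.rate_limit_rpm` caps requests to the AI provider. The fragment below shows both with the values from the shipped config; how the bot behaves when either limit is reached is not specified here:
+
+```yaml
+rate_limiting:
+  max_analyses_per_hour: 100
+  max_analyses_per_user_per_hour: 10
+  max_concurrent_analyses: 5
+  cooldown_seconds: 30
+
+integrations:
+  ai_api:
+    rate_limit_rpm: 50   # keep at or below the limit granted by the provider
+```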
+
+## Cost Considerations
+
+### OpenAI API Usage
+
+Estimated costs (GPT-4):
+- ~$0.03 per analysis (rough estimate; actual cost scales with input and output token counts and current pricing)
+- 100 analyses/day = ~$3/day = ~$90/month
+- Adjust by configuring rate limits
+
+### Optimization
+
+- Use GPT-3.5-turbo for lower cost (~$0.003/analysis)
+- Limit `max_log_size_mb` to reduce input tokens
+- Configure cooldown to prevent duplicate analyses
+- Set per-user rate limits
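+
+Taken together, these optimizations might look like the fragment below. The specific values are illustrative assumptions, and `gpt-3.5-turbo` is assumed to be a model identifier the bot accepts:
+
+```yaml
+analysis:
+  model: "gpt-3.5-turbo"
+
+integrations:
+  prow:
+    max_log_size_mb: 20                # shipped config: 50
+
+rate_limiting:
+  max_analyses_per_user_per_hour: 5    # shipped config: 10
+  cooldown_seconds: 120                # shipped config: 30
+```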
+
+## Security
+
+### API Keys
+- Never commit API keys to git
+- Use ci-secret-bootstrap and Vault
+- Rotate keys regularly
+
+### Log Access
+- Chaibot has read access to GCS buckets
+- Only fetches publicly accessible job artifacts
+- Does not access private/embargoed job logs
+
+### Slack Permissions
+- Only monitors configured public channels
+- Cannot read DMs or private channels
+- Rate limited to prevent abuse
+
+## Development
+
+### Local Testing
+
+```bash
+# Clone ci-tools repo
+git clone https://github.com/openshift/ci-tools
+cd ci-tools/cmd/ci-chat-bot
+
+# Add chaibot feature flag
+# Implement triage module
+
+# Run locally
+go run . \
+  --triage-config-path=/path/to/triage-config.yaml \
+  --enable-triage=true \
+  --dry-run
+```
+
+### Adding Features
+
+1. Update `triage-config.yaml` schema
+2. Implement in ci-tools codebase
+3. Add tests
+4. Update documentation
+5. Submit PR to openshift/ci-tools
+
+## Support
+
+### Documentation
+- This guide: https://docs.ci.openshift.org/tools/chaibot/
+- ci-chat-bot docs: https://docs.ci.openshift.org/architecture/ci-chat-bot/
+
+### Slack Channels
+- `#forum-ocp-testplatform` - Ask questions
+- `#forum-ocp-crt` - ci-chat-bot team
+
+### Issues
+- Report bugs: https://github.com/openshift/ci-tools/issues
+- Feature requests: same tracker, labeled `chaibot`
+
+## Roadmap
+
+Planned features:
+- [ ] Multi-turn conversation for deep analysis
+- [ ] Automatic JIRA ticket creation for bugs
+- [ ] Integration with retester for auto-retry
+- [ ] Flaky test database population
+- [ ] Weekly failure summary reports
+- [ ] Support for analyzing multiple jobs in one request
+- [ ] Custom analysis templates per team