Add Chaibot test failure triage workflow to ci-chat-bot #80476
chaclark1974 wants to merge 3 commits into
This PR adds Chaibot, an AI-powered Slack workflow that automatically triages and analyzes test failures posted in designated Slack channels.

## Overview

Chaibot extends the existing ci-chat-bot service to monitor Slack channels (initially #opp-discussion) for test failure messages, analyze failures using OpenAI GPT-4, and post detailed triage analysis in threads.

## What's Added

### Configuration Files

- `core-services/ci-chat-bot/triage-config.yaml` - Main Chaibot configuration
- `clusters/app.ci/ci-chat-bot/chaibot-configmap.yaml` - Kubernetes ConfigMap
- `clusters/app.ci/ci-chat-bot/chaibot-deployment-patch.yaml` - Prometheus alerts
- `core-services/ci-secret-bootstrap/chaibot-secret-config.yaml` - Secret config guide

### Deployment Changes

- `clusters/app.ci/ci-chat-bot/ci-chat-bot.yaml` - Updated with:
  - Chaibot triage-config and secrets volumes
  - CHAIBOT_ENABLED and OPENAI_API_KEY environment variables
  - --enable-triage command line argument

### Documentation

- `docs/chaibot-test-failure-triage.md` - Comprehensive user/admin guide
- `core-services/ci-chat-bot/CHAIBOT.md` - Quick reference
- `CHAIBOT_QUICKSTART.md` - Quick start guide
- `DEPLOY_CHAIBOT.md` - Deployment instructions

## Features

- **Automatic Detection**: Monitors channels for Prow job failures
- **AI Analysis**: Uses OpenAI to categorize failures (infrastructure, flaky, bug, config)
- **Historical Context**: Integrates with Sippy for past failure patterns
- **JIRA Integration**: Searches for related known issues
- **Actionable Output**: Posts analysis with recommendations in Slack threads

## Example Output

When a failure is posted, Chaibot responds with:

- Root cause identification (with confidence %)
- Evidence from logs
- Historical failure patterns
- Specific recommendations
- Links to Sippy, logs, and related JIRA issues

## Configuration Required

Before this can function, the following must be configured:

1. **Slack Channel ID**: Update `chaibot-configmap.yaml` with actual channel ID for #opp-discussion
2. **OpenAI API Key**: Add to ci-secret-bootstrap (see `chaibot-secret-config.yaml`)
3. **Slack App Permissions**: Ensure ci-chat-bot app has required OAuth scopes

## Implementation Note

⚠️ This PR provides the complete configuration and deployment manifests, but requires code implementation in openshift/ci-tools (cmd/ci-chat-bot) to actually process the configuration and perform analysis.

Without the code implementation, the deployment will succeed but Chaibot will not respond to messages (the --enable-triage flag will be ignored).

## Cost Estimate

- GPT-4: ~$0.03/analysis (~$90/month at 100 failures/day)
- GPT-3.5-turbo: ~$0.003/analysis (~$9/month at 100 failures/day)
- Rate limiting configured to prevent cost overruns

## Testing

After deployment:

1. Update ConfigMap with actual Slack channel ID
2. Configure OpenAI API key secret
3. Post test failure message with Prow URL in #opp-discussion
4. Verify Chaibot responds in thread within 60 seconds

## Related

- Extends existing ci-chat-bot service
- Integrates with Sippy for historical data
- Complements retester for automated failure handling

/cc @openshift/test-platform
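The monthly figures in the Cost Estimate follow from simple arithmetic. A minimal sketch for reproducing them, assuming a 30-day month (the per-analysis prices are the approximate ones quoted above; the function name `monthly_cost` is hypothetical and not part of ci-chat-bot):

```python
def monthly_cost(per_analysis_usd: float, failures_per_day: int, days: int = 30) -> float:
    """Estimate monthly spend: price per analysis x analyses per day x days.

    Assumes one analysis per posted failure and a 30-day month by default.
    """
    return round(per_analysis_usd * failures_per_day * days, 2)


# Approximate per-analysis prices quoted in the PR description.
gpt4_monthly = monthly_cost(0.03, 100)
gpt35_monthly = monthly_cost(0.003, 100)
print(f"GPT-4: ${gpt4_monthly:.2f}/month, GPT-3.5-turbo: ${gpt35_monthly:.2f}/month")
```

Changing `failures_per_day` gives a quick check of how the estimate scales if more channels are monitored.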
Add Vault sync configuration for the Chaibot OpenAI API key stored in selfservice/cspi-qe/chaibot-openai-key.

This configures ci-secret-bootstrap to automatically sync the key from Vault to the ci-chat-bot-chaibot-secrets Kubernetes secret in the ci namespace on the app.ci cluster.

Vault path: selfservice/cspi-qe/chaibot-openai-key
Target secret: ci-chat-bot-chaibot-secrets (ci namespace, app.ci cluster)
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: chaclark1974
Caution: Review failed. An error occurred during the review process. Please try again later.

Walkthrough

This PR adds Chaibot, an AI-powered Slack workflow that automatically triages OpenShift CI test failures. It includes quick-start and deployment documentation, triage configuration files, Kubernetes manifest updates, secrets bootstrap configuration, and comprehensive operational guides across the repository.

Changes: Chaibot Test Failure Triage Feature
/retest
/retest
@chaclark1974: The following test failed, say
Summary by CodeRabbit
This PR introduces Chaibot, an AI-powered Slack workflow extension to OpenShift CI's `ci-chat-bot` service that automatically triages Prow test failures. The feature monitors designated Slack channels (initially `#opp-discussion`) for test failure messages, analyzes them using OpenAI's language models, and posts detailed triage analyses directly in Slack threads.

Key Additions

Configuration & Deployment:

- `triage-config.yaml`: Core Chaibot configuration defining monitored channels, failure detection patterns, AI analysis parameters, failure categorization rules (infrastructure, flaky tests, bugs, configuration), integrations (Sippy for historical context, JIRA for known issues, Prow for logs), rate limiting, and metrics settings
- A Kubernetes ConfigMap (`chaibot-configmap.yaml`) embedding the triage configuration, an updated deployment manifest (`ci-chat-bot.yaml`) adding volumes, environment variables, and CLI flags to enable triage, and Prometheus alert rules for monitoring Chaibot health
- Secret bootstrap configuration that syncs the OpenAI API key into the `ci-chat-bot-chaibot-secrets` secret in the `ci` namespace

Documentation:

- Quick-start and deployment guides (`CHAIBOT_QUICKSTART.md`, `DEPLOY_CHAIBOT.md`): Step-by-step instructions for deploying Chaibot, including credential setup, secret configuration, manifest application, and validation
- Reference guides (`core-services/ci-chat-bot/CHAIBOT.md`, `docs/chaibot-test-failure-triage.md`): Detailed configuration reference, integration guidance, monitoring/metrics setup, troubleshooting, cost estimation, and security best practices

Important Implementation Note

The manifests and configuration are fully provided and deployment-ready, but the runtime implementation in the `openshift/ci-tools` repository (specifically `cmd/ci-chat-bot`) must be completed separately for Chaibot to function. Without that code, the deployment will not enable triage behavior despite the configuration being in place.

Infrastructure Impact
This adds a new operational capability to the OpenShift CI tooling for test failure analysis, reducing manual triage work and providing engineers with rapid, AI-assisted insight into test failures in Slack.
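
Since the runtime code in `openshift/ci-tools` is not part of this PR, the failure detection step is only described, not shown. As a rough illustration of what detecting a Prow link in a Slack message could look like, here is a hedged sketch. The URL pattern, the function name `extract_prow_urls`, and the sample bucket and job names are assumptions for illustration and do not come from `triage-config.yaml`:

```python
import re

# Assumption: failure messages link to job pages under this Prow host and path.
# Slack wraps links as <url> or <url|label>, so stop at whitespace, '>' or '|'.
PROW_URL_RE = re.compile(r"https://prow\.ci\.openshift\.org/view/gs/[^\s>|]+")


def extract_prow_urls(message_text: str) -> list[str]:
    """Return every Prow job URL found in the raw Slack message text."""
    return PROW_URL_RE.findall(message_text)


# Hypothetical message; the bucket and job names are placeholders.
sample = (
    "Job failed: "
    "<https://prow.ci.openshift.org/view/gs/example-bucket/logs/example-job/123|example-job>"
)
print(extract_prow_urls(sample))
```

A message with no matching link yields an empty list, which a handler could treat as "nothing to triage".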