Summary
When the config watcher thread detects a new runtime.sls_logtail_path in the configuration file, it fetches the ECS owner account ID via the metadata endpoint (100.100.100.200). If that fetch fails (network timeout, endpoint unreachable), the thread calls std::process::exit(1), immediately terminating the entire agentsight process without running destructors.
This skips Drop impls, potentially losing buffered SQLite WAL data and in-flight log records.
Bug Location
src/agentsight/src/unified.rs, lines 930-937:
let uid = crate::genai::instance_id::get_owner_account_id();
if uid.is_empty() {
    log::error!("Config watcher: SLS activation requested but uid fetch failed. Terminating process.");
    std::process::exit(1); // kills process from background thread
}
Inconsistency
The main initialization path (lines 297-303 and 321-327) performs the exact same check but uses anyhow::bail!() for graceful error propagation. Same check, different behavior.
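To make the difference concrete, here is a minimal sketch of why error propagation is the safer of the two paths. It is not the agentsight code: `FlushOnDrop` and `activate_gracefully` are hypothetical names, and a plain `Result<(), String>` stands in for `anyhow::bail!()` so the example needs only the standard library. Returning an error unwinds the scope normally, so the guard's `Drop` impl runs; `std::process::exit` is documented to run no destructors on any thread.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for a resource that flushes buffered data on drop
/// (for example a database handle or a log writer).
struct FlushOnDrop {
    flushed: Arc<AtomicBool>,
}

impl Drop for FlushOnDrop {
    fn drop(&mut self) {
        // A real implementation would flush its buffers here; this sketch
        // only records that the destructor ran.
        self.flushed.store(true, Ordering::SeqCst);
    }
}

/// Mirrors the shape of the uid check, but reports failure through the
/// return value instead of calling std::process::exit(1).
fn activate_gracefully(uid: &str, flushed: Arc<AtomicBool>) -> Result<(), String> {
    let _guard = FlushOnDrop { flushed };
    if uid.is_empty() {
        // Early return: `_guard` goes out of scope, so Drop runs.
        return Err("SLS activation requested but uid fetch failed".to_string());
    }
    Ok(())
}
```

Calling `activate_gracefully("", flag)` returns an `Err` and leaves the flag set, showing that cleanup ran before the error reached the caller. With `std::process::exit(1)` in place of the early return, the flag would never be set.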
How to Reproduce on ECS
# 1. Create config WITHOUT sls_logtail_path
echo '{"rules":[],"enable_ssl":false}' > /tmp/as_cfg.json
# 2. Start agentsight
RUST_LOG=info nohup agentsight trace --config /tmp/as_cfg.json > /tmp/test.log 2>&1 &
APID=$!; sleep 3
# 3. Block metadata endpoint
iptables -A OUTPUT -d 100.100.100.200 -j DROP
# 4. Add SLS path to trigger config watcher
echo '{"rules":[],"enable_ssl":false,"runtime":{"sls_logtail_path":"/tmp/test_logtail"}}' > /tmp/as_cfg.json
# 5. Wait and check
sleep 8
kill -0 $APID 2>/dev/null && echo "alive" || echo "DEAD"
# Expected: DEAD
# 6. Restore
iptables -D OUTPUT -d 100.100.100.200 -j DROP
Reproduced
Tested on ECS (kernel 6.6.102+). Process confirmed dead after process::exit(1).
Suggested Fix
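Replace process::exit(1) with log::error!() followed by continue, so the watcher thread keeps running and the activation is retried on the next config change. The log message should also drop the "Terminating process." wording. This is consistent with the existing error handling pattern at lines 911-916.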
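A rough sketch of what the log-and-continue behavior could look like follows. It is an illustration under assumptions, not a patch against unified.rs: `handle_config_changes` is a hypothetical function, the `fetch_uid` closure stands in for `get_owner_account_id()`, the slice of paths stands in for successive config-change events, and `eprintln!` replaces `log::error!` so the example has no external dependencies.

```rust
/// Hypothetical watcher loop: each entry in `sls_paths` represents one config
/// change that requests SLS activation. Returns the activations that succeeded.
fn handle_config_changes<F>(sls_paths: &[&str], mut fetch_uid: F) -> Vec<String>
where
    F: FnMut() -> String,
{
    let mut activated = Vec::new();
    for path in sls_paths {
        let uid = fetch_uid();
        if uid.is_empty() {
            // Log and skip this change instead of terminating the process;
            // the next config change gets another attempt.
            eprintln!(
                "Config watcher: SLS activation requested for {path} but uid fetch failed; \
                 will retry on next config change."
            );
            continue;
        }
        activated.push(format!("{path}:{uid}"));
    }
    activated
}
```

If the first fetch fails and the second succeeds, only the second path is activated and the loop never exits the process, which is the behavior the fix aims for.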