
[sight] bug(sight): config watcher calls process::exit(1) on transient metadata failure #782

@jfeng18


Summary

When the config watcher thread detects a new runtime.sls_logtail_path in the configuration file, it fetches the ECS owner account ID via the metadata endpoint (100.100.100.200). If that fetch fails (network timeout, endpoint unreachable), the thread calls std::process::exit(1), immediately terminating the entire agentsight process without running destructors.

Because process::exit never unwinds the stack, no Drop impl runs on any thread, so buffered SQLite WAL data and in-flight log records may be lost.

Bug Location

src/agentsight/src/unified.rs, lines 930-937:

let uid = crate::genai::instance_id::get_owner_account_id();
if uid.is_empty() {
    log::error!("Config watcher: SLS activation requested but uid fetch failed. Terminating process.");
    std::process::exit(1);  // kills process from background thread
}

Inconsistency

The main initialization path (lines 297-303 and 321-327) performs the exact same check but uses anyhow::bail!() for graceful error propagation. Same check, different behavior.

How to Reproduce on ECS

# 1. Create config WITHOUT sls_logtail_path
echo '{"rules":[],"enable_ssl":false}' > /tmp/as_cfg.json

# 2. Start agentsight
RUST_LOG=info nohup agentsight trace --config /tmp/as_cfg.json > /tmp/test.log 2>&1 &
APID=$!; sleep 3

# 3. Block metadata endpoint
iptables -A OUTPUT -d 100.100.100.200 -j DROP

# 4. Add SLS path to trigger config watcher
echo '{"rules":[],"enable_ssl":false,"runtime":{"sls_logtail_path":"/tmp/test_logtail"}}' > /tmp/as_cfg.json

# 5. Wait and check
sleep 8
kill -0 $APID 2>/dev/null && echo "alive" || echo "DEAD"
# Observed with the bug: DEAD (with a fix in place this should print "alive")

# 6. Restore
iptables -D OUTPUT -d 100.100.100.200 -j DROP

Reproduced

Tested on ECS (kernel 6.6.102+). Process confirmed dead after process::exit(1).

Suggested Fix

Replace process::exit(1) with log::error!() followed by continue, so the watcher retries on the next config change. This is consistent with the existing error handling pattern at lines 911-916.
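
A minimal sketch of the log-and-continue shape, not the actual patch. It assumes the watcher handles one config-change event per loop iteration. `run_watcher`, `Activation`, and the `fetch_uid` closure are hypothetical names introduced for illustration: `fetch_uid` stands in for `crate::genai::instance_id::get_owner_account_id()`, and `eprintln!` stands in for `log::error!` so the example needs only the standard library.

```rust
/// Outcome of handling one config-change event (hypothetical type).
#[derive(Debug, PartialEq)]
enum Activation {
    /// SLS activation went ahead with this owner account ID.
    Enabled(String),
    /// The uid fetch failed; the watcher stays alive and retries later.
    Deferred,
}

/// Simulated watcher loop: one iteration per config-change event.
/// `fetch_uid` returns an empty string on failure, mirroring the
/// `uid.is_empty()` check in the snippet from unified.rs.
fn run_watcher<F: FnMut() -> String>(events: usize, mut fetch_uid: F) -> Vec<Activation> {
    let mut outcomes = Vec::new();
    for _ in 0..events {
        let uid = fetch_uid();
        if uid.is_empty() {
            // Log and skip this event instead of calling std::process::exit(1).
            eprintln!(
                "Config watcher: SLS activation requested but uid fetch failed; \
                 will retry on next config change."
            );
            outcomes.push(Activation::Deferred);
            continue;
        }
        outcomes.push(Activation::Enabled(uid));
    }
    outcomes
}
```

With this shape a transient metadata failure costs one skipped activation rather than the whole process, and the next config change gets another attempt at the fetch.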
