Remove code 0 latency metrics from latency average calculations#71

Open: meecethereese wants to merge 4 commits into main from maufe/cleanup-latency
@meecethereese meecethereese commented Jun 10, 2026

Contributor

This PR updates our latency metrics to report latency only for requests that received a response from the server. It does this by reusing the aggregation logic from the merge script and filtering out code 0 latency data. It also adds success count and success rate metrics.

Mauricio Ferrari and others added 4 commits June 10, 2026 15:56
…_rate

Extract per-second bucketing logic into modules/vegeta/aggregate.awk
shared between the live runner and merge.sh.

Latency percentiles now ignore status_code=0 records (vegeta's marker
for transport-level failures: connection refused, EOF, timeout) so
percentiles reflect service response time rather than failure-detection
time. Adds successful_count and success_rate fields so consumers can
tell whether p99 was computed over many samples or a handful of
survivors. rps, code.hist, and byte sums still count every record, so
sum(code.hist[*]) == rps remains an invariant.
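
As a rough illustration of the behavior this commit message describes, the Python sketch below buckets records per second, computes latency percentiles over non-zero status codes only, and still counts every record in rps and code.hist. It is not the contents of modules/vegeta/aggregate.awk, which is not shown here. The `(timestamp_ns, status_code, latency_ns)` tuple layout, the `aggregate` and `percentile` helpers, the nearest-rank percentile method, the p50/p95/p99 field names, and the definition of success_rate as successful_count divided by rps are all assumptions made for the example.

```python
import math
from collections import defaultdict


def percentile(sorted_vals, p):
    """Nearest-rank percentile (assumed method); 0 for an empty sample."""
    if not sorted_vals:
        return 0
    rank = max(1, math.ceil(p / 100 * len(sorted_vals)))
    return sorted_vals[rank - 1]


def aggregate(records):
    """Group (timestamp_ns, status_code, latency_ns) tuples into 1-second buckets.

    Hypothetical helper: field names follow the commit message where it
    gives them (rps, code.hist, successful_count, success_rate).
    """
    buckets = defaultdict(list)
    for ts_ns, code, latency_ns in records:
        buckets[ts_ns // 1_000_000_000].append((code, latency_ns))

    out = {}
    for second, rows in sorted(buckets.items()):
        # Every record counts toward the histogram, including code 0.
        hist = defaultdict(int)
        for code, _ in rows:
            hist[str(code)] += 1

        # Only records that received a response feed the percentiles.
        ok = sorted(lat for code, lat in rows if code != 0)

        out[second] = {
            "rps": len(rows),
            "code": {"hist": dict(hist)},
            "successful_count": len(ok),
            "success_rate": len(ok) / len(rows),  # assumed definition
            "latency": {
                "p50": percentile(ok, 50),
                "p95": percentile(ok, 95),
                "p99": percentile(ok, 99),
            },
        }
    return out
```

Because the histogram is built before the code-0 filter is applied, the sum of its values equals rps for every bucket in this sketch, which mirrors the invariant stated above.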

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

modules/vegeta/run/run.sh now pipes vegeta encode --to csv through
gawk -f aggregate.awk, the same aggregator merge.sh uses. Live and
merged JSON outputs now share one schema, and the live path inherits
the code-0 latency filter plus the new successful_count and
success_rate fields.

Code histogram keys are now exact status codes (e.g. "0", "200",
"503") rather than jaggr's binned families ("100"..."500").

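The difference between exact status-code keys and binned families can be sketched in Python as follows. This is only an illustration of the two keying schemes: the `exact_hist` and `family_hist` helpers are hypothetical, the family binning is a generic round-down-to-hundreds rule rather than jaggr's actual implementation, and how the old pipeline keyed code 0 is not stated in the source.

```python
from collections import Counter


def exact_hist(codes):
    """Key the histogram by the exact status code, as the new schema does."""
    return dict(Counter(str(c) for c in codes))


def family_hist(codes):
    """Key the histogram by status family ("200", "500", ...).

    Assumption: a plain round-down to the nearest hundred. Code 0 lands
    under "0" here, which may not match the previous behavior.
    """
    return dict(Counter(str(c // 100 * 100) for c in codes))


codes = [200, 201, 503, 0]
print(exact_hist(codes))
print(family_hist(codes))
```

With exact keys, a consumer can tell a 201 from a 200 and a 503 from a 500, detail that the family view collapses.
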
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

New synthetic-data test in modules/vegeta/test/test.sh exercises three
buckets: mixed (5 successes + 5 code-0 timeouts), all successes, all
failures. Asserts that p99 in the mixed bucket excludes the 30s code-0
timeouts, that successful_count and success_rate match expected values,
that the all-failure bucket emits latency:{0,0,0} without crashing, and
that the live and merge paths produce byte-identical output for in-order
input.
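
The mixed-bucket and all-failure assertions can be pictured with a small Python sketch. It does not reproduce modules/vegeta/test/test.sh: the `p99`, `filtered`, and `unfiltered` helpers, the nearest-rank method, the 120 ms success latency, and the choice of 10 records in the all-failure bucket are assumptions picked for the example. Only the 5 + 5 mix and the 30 s timeout come from the description above.

```python
import math

SUCCESS_NS = 120_000_000      # hypothetical 120 ms response
TIMEOUT_NS = 30_000_000_000   # 30 s code-0 timeout, as in the test description


def p99(latencies_ns):
    """Nearest-rank p99 (assumed method); 0 when there are no samples."""
    if not latencies_ns:
        return 0
    ordered = sorted(latencies_ns)
    return ordered[math.ceil(0.99 * len(ordered)) - 1]


def unfiltered(rows):
    """Old behavior: every record's latency feeds the percentile."""
    return [lat for _, lat in rows]


def filtered(rows):
    """New behavior: drop status_code == 0 before computing percentiles."""
    return [lat for code, lat in rows if code != 0]


# Mixed bucket: 5 successes plus 5 code-0 timeouts.
mixed = [(200, SUCCESS_NS)] * 5 + [(0, TIMEOUT_NS)] * 5
# All-failure bucket: nothing survives the filter.
all_fail = [(0, TIMEOUT_NS)] * 10

print(p99(unfiltered(mixed)), p99(filtered(mixed)), p99(filtered(all_fail)))
```

In this sketch the unfiltered p99 of the mixed bucket is dominated by the timeout value, while the filtered p99 reflects only the responses that came back, and the all-failure bucket falls back to zero instead of failing on an empty sample.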

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The Data flow for results section now describes the gawk-based
aggregator pipeline (replacing the previous jaggr step), the unified
live/merge schema, and the latency-filter behavior.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@meecethereese meecethereese requested a review from a team as a code owner June 10, 2026 21:12