Avoid std::exit() from a worker thread in fastq_mergepairs #635

Merged
torognes merged 2 commits into torognes:dev from trognes:fix/fastq-mergepairs-worker-exit
Jun 23, 2026

trognes commented Jun 23, 2026

Problem

fastq_mergepairs processes read pairs with a pool of worker threads. Two
error paths reachable from a worker called std::exit() directly:

  • get_qual() (reached from process() in the worker pool) on a FASTQ
    quality value outside [fastq_qmin, fastq_qmax], and
  • the "More forward reads than reverse reads" check in read_pair().

Calling std::exit() from a worker is unsafe while sibling workers are still
running: exit() flushes and closes the shared output streams and runs static
destructors concurrently with threads that are still writing to those streams.
That is a data race on stdio/global teardown. It can corrupt libc state and
crash, and it can also drop the fatal message before it reaches the --log
file.

The crash is timing-dependent, so it is intermittent and platform-dependent: it
was observed as an intermittent Illegal instruction (core dumped) on FreeBSD
(clang libc), while the same race happens to be benign on glibc/Linux.

Fix

Make the worker error path cooperative instead of exiting in place:

  • A worker that detects the error records the first occurrence (reason and
    value) and sets an std::atomic<bool> abort flag.
  • process(), the chunk-processing loop, and pair_worker() observe the flag
    and unwind; finished_all is set under the lock with a notify_all() so any
    parked worker wakes and exits.
  • After the worker pool has joined, pair_all() reports the recorded error and
    calls exit() on the main thread. Teardown is then single-threaded, and the
    message reliably reaches both stderr and the --log file.
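
The steps above can be sketched roughly as follows. This is a minimal illustration of the cooperative-abort pattern, not the code in this PR: `Pool`, `request_abort()`, `worker()`, `run_pool()` and `Result` are hypothetical names, the work queue of plain integers stands in for chunks of read pairs, and the sketch returns a result instead of calling exit() so that it stays easy to test.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// All names here are hypothetical; this is not the vsearch source.
struct Pool {
  std::mutex mtx;                       // guards work, finished_all, reason, value
  std::condition_variable cv;           // workers park here when no work is queued
  std::deque<int> work;                 // pending quality values
  bool finished_all = false;            // no more work will be handed out
  std::atomic<bool> abort_flag{false};  // cheap to poll from inside loops
  std::string reason;                   // first error only
  int value = 0;
};

// Record the first error, then wake every parked worker so it can unwind.
void request_abort(Pool& p, const std::string& why, int v) {
  {
    std::lock_guard<std::mutex> lock(p.mtx);
    if (!p.abort_flag.load()) {         // later errors do not overwrite the first
      p.reason = why;
      p.value = v;
      p.abort_flag.store(true);
    }
    p.finished_all = true;              // set under the lock, as in the PR text
  }
  p.cv.notify_all();
}

void worker(Pool& p, int qmin, int qmax) {
  for (;;) {
    int q;
    {
      std::unique_lock<std::mutex> lock(p.mtx);
      p.cv.wait(lock, [&] { return p.finished_all || !p.work.empty(); });
      if (p.abort_flag.load() || p.work.empty()) return;  // unwind, never exit()
      q = p.work.front();
      p.work.pop_front();
    }
    if (q < qmin || q > qmax) {
      request_abort(p, "quality value out of range", q);
      return;
    }
  }
}

struct Result { bool failed; std::string reason; int value; };

Result run_pool(const std::vector<int>& quals, int qmin, int qmax, unsigned threads) {
  assert(threads > 0);
  Pool p;
  std::vector<std::thread> pool;
  for (unsigned i = 0; i < threads; ++i)
    pool.emplace_back(worker, std::ref(p), qmin, qmax);
  for (int q : quals) {
    if (p.abort_flag.load()) break;     // stop feeding work after an error
    { std::lock_guard<std::mutex> lock(p.mtx); p.work.push_back(q); }
    p.cv.notify_one();
  }
  { std::lock_guard<std::mutex> lock(p.mtx); p.finished_all = true; }
  p.cv.notify_all();
  for (auto& t : pool) t.join();
  // Only here, with every worker joined, would a caller print the message
  // and call std::exit(); teardown is then single-threaded.
  return Result{p.abort_flag.load(), p.reason, p.value};
}
```

In this sketch a caller would check `failed` after `run_pool()` returns and only then write the fatal message and call `std::exit(EXIT_FAILURE)` from the main thread. Keeping the error record behind the same mutex as `finished_all` is one simple way to get "first error wins" semantics; the actual change may organise its state differently.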

Output messages and exit status are unchanged.

Testing

  • Functional: quality values below fastq_qmin or above fastq_qmax still print
    the correct fatal message and exit non-zero; the --log message is now
    reliably present; the forward/reverse read-count mismatch is still reported;
    normal merging is unchanged.
  • Stress: repeated multi-threaded runs over large inputs with an out-of-range
    value injected mid-stream, and a concurrent read-count-mismatch case. All
    runs exited cleanly, with no crash or hang.
  • A ThreadSanitizer build reported no data races on the abort or normal paths.

claude added 2 commits June 23, 2026 16:47
fastq_mergepairs runs a pool of worker threads. On an out-of-range FASTQ
quality value, get_qual() (reached from process() via the worker pool) and
the "More forward reads than reverse reads" check in read_pair() called
std::exit() directly from a worker. exit() flushes and closes the shared
output streams and runs static destructors while sibling workers are still
writing to those streams, a data race that intermittently corrupts libc
state and crashes (observed as an "Illegal instruction" core dump on
FreeBSD CI; benign on glibc/Linux, hence intermittent and platform-
dependent). It can also drop the fatal message before it reaches the
--log file.

Make the error path cooperative instead: a worker records the first error
and sets an atomic abort flag; every worker then unwinds its loop, and
pair_all() reports the error and exits from the main thread after all
workers have joined. Teardown is then single-threaded and the message
reliably reaches stderr and the log file.

Verified with a multi-threaded stress loop of the offending inputs and a
ThreadSanitizer build (no races on the abort or normal paths).

The comments describing why the chunk mutex/condition variable are locals
still referenced the old hazard of a worker calling std::exit() while
siblings wait on the condition variable. That can no longer happen: with
the cooperative abort, exit() occurs only on the main thread after the
worker pool has joined. Reword the comments to reflect the current design.
torognes merged commit 4aa3efe into torognes:dev Jun 23, 2026
9 checks passed
trognes deleted the fix/fastq-mergepairs-worker-exit branch June 24, 2026 11:32