armv7-m/armv8-m: honor SP modifications on exception return #19123

Open
cshung wants to merge 1 commit into apache:master from cshung:arm-sp-context-restore

cshung commented Jun 11, 2026

Summary

On ARMv7-M and ARMv8-M, the exception return path in arm_exception.S writes PSP/MSP from the HW exception frame pointer (r0), ignoring any software modification to REG_R13 in the saved register context. This means signal handlers that adjust SP (e.g., a managed runtime unwinding past a trampoline frame) have their SP change silently discarded.

Root cause: Hardware determines the final SP from the physical location of the HW exception frame (PSP + frame_size), not from the software-saved SP value. The existing code does msr psp, r0 where r0 points to the HW frame, so the restored SP is always r0 + frame_size regardless of what regs[REG_R13] says.

Fix: Add optional HW frame relocation (CONFIG_ARMV7M_SP_CONTEXT_RESTORE / CONFIG_ARMV8M_SP_CONTEXT_RESTORE, both default n) that compares the desired SP with the implied SP. If they differ, the HW exception frame is copied to (desired_SP - frame_size) so hardware exception return produces the correct final SP.

The relocation:

  • Handles both copy directions (memmove semantics) for overlapping regions
  • Determines frame size at runtime from EXC_RETURN bit 4 (32 bytes standard vs 104 bytes with FPU)
  • Has zero cost when disabled (default) and minimal fast-path cost when enabled (cmp + beq on match)
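
The relocation described above can be sketched in host-side C. This is only an illustration of the idea, not the assembly in the PR: the names `hw_frame_size`, `relocate_hw_frame`, and the constants are hypothetical, the stack is modeled as a plain byte buffer, and any alignment padding the hardware may insert on exception entry is ignored.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical names for illustration; values follow the PR description. */
#define EXC_RETURN_STD_CONTEXT (1u << 4) /* bit 4 set: no FP state stacked */
#define HW_FRAME_SIZE_BASIC    32u       /* r0-r3, r12, lr, pc, xPSR */
#define HW_FRAME_SIZE_FPU      104u      /* basic frame plus FP state */

/* Size of the frame the hardware pops on exception return. */
static uint32_t hw_frame_size(uint32_t exc_return)
{
  return (exc_return & EXC_RETURN_STD_CONTEXT) ? HW_FRAME_SIZE_BASIC
                                               : HW_FRAME_SIZE_FPU;
}

/* Return the address to load into PSP/MSP so that the hardware-computed
 * final SP (frame address + frame size) equals desired_sp.
 */
static uint8_t *relocate_hw_frame(uint8_t *frame, uint8_t *desired_sp,
                                  uint32_t exc_return)
{
  uint32_t size = hw_frame_size(exc_return);

  /* Fast path: the implied SP already matches, nothing to copy. */
  if (frame + size == desired_sp)
    {
      return frame;
    }

  /* Slow path: slide the frame. memmove handles both directions, which
   * matters when SP changed by less than the frame size and the source
   * and destination overlap.
   */
  uint8_t *dest = desired_sp - size;
  memmove(dest, frame, size);
  return dest;
}
```

In this model, a handler that asks for SP += 4 on a standard frame moves the frame up by 4 bytes, so the hardware pop lands exactly on the requested SP.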

Companion test PR: apache/nuttx-apps#3536

Impact

  • No impact when disabled (default n): zero code change in the binary
  • When enabled: adds a compare-and-branch on every exception return (taken only when SP was modified). The relocation loop runs only in the rare case where a signal handler modified SP.
  • Users affected: managed runtimes (e.g., .NET NativeAOT) that need to adjust SP during signal/fault handling on Cortex-M
  • Architectures: ARMv7-M and ARMv8-M only

Testing

Host: Ubuntu 22.04 x86_64, arm-none-eabi-gcc 13.3, QEMU 8.2.2 (via Docker)

Targets tested:

  • lm3s6965-ek:qemu-flat (ARMv7-M, Cortex-M3) with CONFIG_ARMV7M_SP_CONTEXT_RESTORE=y
  • mps2-an521:nsh (ARMv8-M, Cortex-M33) with CONFIG_ARMV8M_SP_CONTEXT_RESTORE=y

Test procedure:

  1. Verified bug exists (RED): without fix, test reports popped value = 2, FAIL
  2. Applied fix (GREEN): test reports popped value = 1, PASS
  3. Tested both slide directions (SP increase and SP decrease)
  4. Ran ostest to verify no regressions
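
The RED/GREEN expectation (popped value 2 without the fix, 1 with it) can be reproduced with a small host-side model. This is a sketch under assumptions, not the companion test itself: `model_thread`, `run_scenario`, and the `honor_sp` flag are hypothetical, and the signal delivery is reduced to a choice of which SP the thread resumes with.

```c
#include <stdint.h>

#define STACK_WORDS 16

/* Hypothetical model of a thread with a full-descending stack. */
struct model_thread
{
  uint32_t stack[STACK_WORDS];
  uint32_t *sp;
};

static void push(struct model_thread *t, uint32_t v)
{
  *--t->sp = v;
}

static uint32_t pop(struct model_thread *t)
{
  return *t->sp++;
}

/* push 1, push 2, then a signal arrives and the handler requests
 * SP += 4 (one word). honor_sp selects whether the return path applies
 * the handler's SP (fix enabled) or discards it (existing behavior).
 */
static uint32_t run_scenario(int honor_sp)
{
  struct model_thread t;

  t.sp = &t.stack[STACK_WORDS];
  push(&t, 1);
  push(&t, 2);

  uint32_t *saved_sp = t.sp;           /* SP when the signal arrives */
  uint32_t *desired_sp = saved_sp + 1; /* handler: SP += 4 bytes */

  t.sp = honor_sp ? desired_sp : saved_sp;
  return pop(&t);
}
```

Here `run_scenario(0)` returns 2, the failing case in step 1, and `run_scenario(1)` returns 1, the passing case in step 2.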

Test log (ARMv7-M, lm3s6965evb):

nsh> sig_sp_test
sig_sp_test: Signal SP restore test
sig_sp_test: push 1, push 2, alarm, handler SP+=4, pop => 1
sig_sp_test: handler - PC=0x00022770 SP=0x2000a9a8
sig_sp_test: handler - new SP=0x2000a9ac PC=0x000226ac
sig_sp_test: popped value = 1 (expected 1)
sig_sp_test: PASS

Test log (ARMv8-M, mps2-an521):

nsh> sig_sp_test
sig_sp_test: Signal SP restore test
sig_sp_test: push 1, push 2, alarm, handler SP+=4, pop => 1
sig_sp_test: handler - PC=0x1002b2b8 SP=0x38007b08
sig_sp_test: handler - new SP=0x38007b0c PC=0x1002b1fc
sig_sp_test: popped value = 1 (expected 1)
sig_sp_test: PASS

On ARMv7-M and ARMv8-M, the exception return path writes PSP/MSP
from the HW exception frame pointer, ignoring software modifications
to REG_R13 in the saved register context. This means signal handlers
that adjust SP (e.g., a managed runtime unwinding past a trampoline
frame) have their SP change silently discarded.

Add CONFIG_ARMV7M_SP_CONTEXT_RESTORE and CONFIG_ARMV8M_SP_CONTEXT_RESTORE
(both default n) that, when enabled, compare the desired SP with the
implied SP after exception return. If they differ, relocate the HW
exception frame so that the hardware-computed final SP matches the
desired value.

The relocation handles both copy directions (memmove semantics) for
overlapping regions when SP changes by less than the frame size.
Frame size is determined at runtime from EXC_RETURN bit 4 to correctly
handle both standard (32-byte) and FPU-extended (104-byte) frames.

Signed-off-by: Andrew Au <cshung@gmail.com>
github-actions bot added the labels Arch: arm (issues related to ARM (32-bit) architecture) and Size: M (the size of the change in this PR is medium) on Jun 12, 2026
@xiaoxiang781216

Contributor

> Summary
>
> On ARMv7-M and ARMv8-M, the exception return path in arm_exception.S writes PSP/MSP from the HW exception frame pointer (r0), ignoring any software modification to REG_R13 in the saved register context. This means signal handlers that adjust SP (e.g., a managed runtime unwinding past a trampoline frame) have their SP change silently discarded.
>
> Root cause: Hardware determines the final SP from the physical location of the HW exception frame (PSP + frame_size), not from the software-saved SP value. The existing code does msr psp, r0 where r0 points to the HW frame, so the restored SP is always r0 + frame_size regardless of what regs[REG_R13] says.

Could you point to the part of the POSIX spec that describes this behavior?

> Fix: Add optional HW frame relocation (CONFIG_ARMV7M_SP_CONTEXT_RESTORE / CONFIG_ARMV8M_SP_CONTEXT_RESTORE, both default n) that compares the desired SP with the implied SP. If they differ, the HW exception frame is copied to (desired_SP - frame_size) so hardware exception return produces the correct final SP.

Interrupt handling becomes much slower when CONFIG_ARMVxM_SP_CONTEXT_RESTORE is enabled.

@cshung

cshung commented Jun 12, 2026

Author

Thanks for the feedback on the performance concern. I wanted to share some context about the trade-offs here:

1. The behavior is opt-in (default=n)

This is the most important point. Users who don't enable this configuration pay zero overhead. Only users who need POSIX-correct signal handler behavior (like managed runtimes) choose to enable it, and they're choosing correctness over marginal performance.

2. The fast path is minimal overhead

The fast path check (comparing register values + a little math + branch) happens on every exception, but it's just ~10-20 cycles. For 99.99%+ of exceptions where SP is not modified, we branch back to the existing behavior with no copy overhead.

3. The slow path is vanishingly rare

The expensive copy only happens when an exception occurs right inside a trampoline (such as interface dispatch with multiple inheritance). JITted code, async/await, and coroutines are not trampolines. This is an extreme edge case.

4. Correctness vs performance trade-off

If managed runtime developers accept GC pause costs (which can be 10-100ms, orders of magnitude slower than my 10-20 cycle fast path), then accepting a small cost for correct signal handler behavior is reasonable. The OS is supposed to abstract hardware away - when hardware doesn't restore SP correctly, the OS must compensate.

5. POSIX expectation

POSIX sigaltstack() allows signal handlers to use alternate stacks, and the implicit contract is that when a signal handler returns, the thread resumes with the context the handler established. Without this fix, SP changes are silently discarded, which breaks managed runtimes.

Can you share actual measurements of how much slower? What's the baseline interrupt latency, and what percentage slower are you seeing? I'd like to understand the real impact.
