armv7-m/armv8-m: honor SP modifications on exception return #19123
cshung wants to merge 1 commit into
On ARMv7-M and ARMv8-M, the exception return path writes PSP/MSP from the HW exception frame pointer, ignoring software modifications to REG_R13 in the saved register context. This means signal handlers that adjust SP (e.g., a managed runtime unwinding past a trampoline frame) have their SP change silently discarded.

Add CONFIG_ARMV7M_SP_CONTEXT_RESTORE and CONFIG_ARMV8M_SP_CONTEXT_RESTORE (both default n) that, when enabled, compare the desired SP with the implied SP after exception return. If they differ, relocate the HW exception frame so that the hardware-computed final SP matches the desired value.

The relocation handles both copy directions (memmove semantics) for overlapping regions when SP changes by less than the frame size. Frame size is determined at runtime from EXC_RETURN bit 4 to correctly handle both standard (32-byte) and FPU-extended (104-byte) frames.

Signed-off-by: Andrew Au <cshung@gmail.com>
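The frame-size selection mentioned in the commit message can be sketched in C as below. This is only an illustration: the PR's exception return path lives in assembly, the macro and function names here are hypothetical, and the sketch ignores any stack-alignment padding the hardware may have inserted on exception entry. It relies on the architectural meaning of EXC_RETURN bit 4 (set for a standard frame, clear for an FPU-extended frame).

```c
#include <stdint.h>

/* EXC_RETURN bit 4: 1 = standard frame, 0 = FPU-extended frame. */
#define EXC_RETURN_STD_CONTEXT (1u << 4)

/* Frame sizes quoted in the commit message. */
#define HW_FRAME_SIZE_STD 32u   /* r0-r3, r12, lr, pc, xpsr */
#define HW_FRAME_SIZE_FPU 104u  /* standard frame + s0-s15, fpscr, reserved */

/* Hypothetical helper (not from the PR): size in bytes of the HW
 * exception frame implied by a given EXC_RETURN value.
 */
static uint32_t hw_frame_size(uint32_t exc_return)
{
  return (exc_return & EXC_RETURN_STD_CONTEXT) ? HW_FRAME_SIZE_STD
                                               : HW_FRAME_SIZE_FPU;
}
```

For example, `hw_frame_size(0xfffffffd)` (bit 4 set) yields 32, while `hw_frame_size(0xffffffed)` (bit 4 clear) yields 104.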
Could you point to the part of the POSIX spec that describes this behavior?
Interrupt handling becomes much slower when CONFIG_ARMVxM_SP_CONTEXT_RESTORE is enabled.
Thanks for the feedback on the performance concern. I wanted to share some context about the trade-offs here:

1. The behavior is opt-in (default=n)
This is the most important point. Users who don't enable this configuration pay zero overhead. Only users who need POSIX-correct signal handler behavior (like managed runtimes) choose to enable it, and they're choosing correctness over marginal performance.

2. The fast path is minimal overhead
When the option is enabled, the fast path check (comparing register values + a little math + branch) happens on every exception, but it's just ~10-20 cycles. For the 99.99%+ of exceptions where SP is not modified, we branch back to the existing behavior with no copy overhead.

3. The slow path is vanishingly rare
The expensive copy only happens when an exception occurs right inside a trampoline (like interface dispatch with multiple inheritance). JITted code, async/await, and coroutines are not trampolines. This is an extreme edge case.

4. Correctness vs performance trade-off
If managed runtime developers accept GC pause costs (which can be 10-100ms, orders of magnitude slower than my 10-20 cycle fast path), then accepting a small cost for correct signal handler behavior is reasonable. The OS is supposed to abstract hardware away; when hardware doesn't restore SP correctly, the OS must compensate.

5. POSIX expectation
POSIX

Can you share actual measurements of how much slower? What's the baseline interrupt latency, and what percentage slower are you seeing? I'd like to understand the real impact.
Summary
On ARMv7-M and ARMv8-M, the exception return path in `arm_exception.S` writes PSP/MSP from the HW exception frame pointer (`r0`), ignoring any software modification to `REG_R13` in the saved register context. This means signal handlers that adjust SP (e.g., a managed runtime unwinding past a trampoline frame) have their SP change silently discarded.

Root cause: Hardware determines the final SP from the physical location of the HW exception frame (`PSP + frame_size`), not from the software-saved SP value. The existing code does `msr psp, r0` where `r0` points to the HW frame, so the restored SP is always `r0 + frame_size` regardless of what `regs[REG_R13]` says.

Fix: Add optional HW frame relocation (`CONFIG_ARMV7M_SP_CONTEXT_RESTORE` / `CONFIG_ARMV8M_SP_CONTEXT_RESTORE`, both default `n`) that compares the desired SP with the implied SP. If they differ, the HW exception frame is copied to `(desired_SP - frame_size)` so hardware exception return produces the correct final SP.

The relocation:
Companion test PR: apache/nuttx-apps#3536
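The compare-and-relocate step described in the summary can be modelled in plain C as follows. This is a host-side sketch under stated assumptions, not the PR's code: the real logic runs in the assembly exception return path, and `frame_size_from_exc_return` and `relocate_hw_frame` are hypothetical names. The frame sizes and the use of memmove semantics for overlapping regions come from the commit message.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: 32-byte standard frame when EXC_RETURN bit 4 is
 * set, 104-byte FPU-extended frame when it is clear.
 */
static size_t frame_size_from_exc_return(uint32_t exc_return)
{
  return (exc_return & (1u << 4)) ? 32u : 104u;
}

/* Hypothetical helper: return the address the HW frame must occupy so
 * that (frame + frame_size) equals desired_sp, moving it if necessary.
 */
static uint8_t *relocate_hw_frame(uint8_t *frame, uintptr_t desired_sp,
                                  uint32_t exc_return)
{
  size_t size = frame_size_from_exc_return(exc_return);
  uintptr_t implied_sp = (uintptr_t)frame + size;

  /* Fast path: SP was not modified, keep the frame where it is. */
  if (desired_sp == implied_sp)
    {
      return frame;
    }

  /* Slow path: place the frame just below the desired SP. memmove is
   * safe in both directions when the old and new regions overlap,
   * i.e. when SP changed by less than the frame size.
   */
  uint8_t *new_frame = (uint8_t *)(desired_sp - size);
  memmove(new_frame, frame, size);
  return new_frame;
}
```

In this model the returned pointer plays the role of the value that would be written to PSP/MSP, so the hardware-computed final SP (`new_frame + size`) equals the desired SP.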
Impact
n): zero code change in the binary

Testing
Host: Ubuntu 22.04 x86_64, arm-none-eabi-gcc 13.3, QEMU 8.2.2 (via Docker)
Targets tested:
`lm3s6965-ek:qemu-flat` (ARMv7-M, Cortex-M3) with `CONFIG_ARMV7M_SP_CONTEXT_RESTORE=y`
`mps2-an521:nsh` (ARMv8-M, Cortex-M33) with `CONFIG_ARMV8M_SP_CONTEXT_RESTORE=y`

Test procedure:
popped value = 2, FAIL
popped value = 1, PASS

Test log (ARMv7-M, lm3s6965evb):
Test log (ARMv8-M, mps2-an521):