Fix loss of root/meta blkno after self-invalidation; defer invalidation processing under page locks#792
Open
Serge-sudo wants to merge 1 commit into
Open
Conversation
Contributor
There was a problem hiding this comment.
Pull request overview
This PR addresses a correctness issue in OrioleDB descriptor invalidation where self-invalidation could recreate B-tree descriptors mid-operation and reset metaPageBlkno / rootPageBlkno, leading to assertion failures. It also aims to prevent unsafe invalidation handling while Oriole page locks are held by deferring descriptor invalidation work until a safe point.
Changes:
- Adds a backend-local queue to defer OrioleDB descriptor invalidations while Oriole page locks are held, and flushes the queue via a custom invalidation hook.
- Ensures recreated table/index descriptors re-adopt shared-memory B-tree root/meta blkno state when available (
o_btree_try_use_shmem()after recreation). - Registers
ReceiveCustomInvalMessage_hookto drive deferred invalidation processing.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
src/tableam/descr.c |
Adds deferred descriptor invalidation queue/hook and refreshes B-tree root/meta blkno after descriptor recreation. |
src/orioledb.c |
Registers the custom invalidation hook (ReceiveCustomInvalMessage_hook). |
include/tableam/descr.h |
Exposes orioledb_deferred_inval_message_hook() for hook registration. |
include/orioledb.h |
Includes utils/inval.h and exposes prev_ReceiveCustomInvalMessage_hook. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
Comment on lines
989
to
+1028
| o_invalidate_descrs(Oid datoid, Oid reloid, Oid relfilenode) | ||
| { | ||
| DeferredDescrInvalidation *deferred; | ||
| MemoryContext oldcontext; | ||
|
|
||
| /* | ||
| * If we currently hold page locks, recreating descriptors may attempt to | ||
| * read sys-tree pages and violate lock invariants. Defer processing to | ||
| * the next invalidation callback when locks are released. | ||
| */ | ||
| if (have_locked_pages()) | ||
| { | ||
| oldcontext = MemoryContextSwitchTo(TopMemoryContext); | ||
| deferred = palloc(sizeof(DeferredDescrInvalidation)); | ||
| deferred->datoid = datoid; | ||
| deferred->reloid = reloid; | ||
| deferred->relfilenode = relfilenode; | ||
| deferred_descr_invals = lappend(deferred_descr_invals, deferred); | ||
| /* | ||
| * Each entry is freed during the next lock-free processing pass; list | ||
| * cells and the list container are also freed there. All of this | ||
| * lives in TopMemoryContext so it survives across callbacks and | ||
| * transaction boundaries. | ||
| */ | ||
| MemoryContextSwitchTo(oldcontext); | ||
| return; | ||
| } | ||
|
|
||
| /* | ||
| * Process any previously deferred invalidations first to maintain correct order. | ||
| * This queue most certainly should be empty | ||
| * (as hook is called before processing any invalidation), | ||
| * but we need to be sure that no invalidations are left unprocessed | ||
| * before we handle the current one. | ||
| */ | ||
| orioledb_deferred_inval_message_hook(); | ||
|
|
||
| /* Handle the current invalidation. */ | ||
| o_invalidate_descrs_internal(datoid, reloid, relfilenode); | ||
| } |
Comment on lines
+959
to
+966
| /* call prev hook if any */ | ||
| if (prev_ReceiveCustomInvalMessage_hook) | ||
| prev_ReceiveCustomInvalMessage_hook(); | ||
|
|
||
| /* Can't safely process invalidations while holding locks, so defer it. */ | ||
| if (have_locked_pages()) | ||
| return; | ||
|
|
Comment on lines
+1017
to
+1025
| /* | ||
| * Process any previously deferred invalidations first to maintain correct order. | ||
| * This queue most certainly should be empty | ||
| * (as hook is called before processing any invalidation), | ||
| * but we need to be sure that no invalidations are left unprocessed | ||
| * before we handle the current one. | ||
| */ | ||
| orioledb_deferred_inval_message_hook(); | ||
|
|
254d401 to
6b708a7
Compare
…on processing under page locks Fix an issue where B-tree descriptor state (`metaPageBlkno` and `rootPageBlkno`) could be lost after processing an invalidation message within the same transaction. Previously, after `o_btree_load_shmem()` initialized the descriptor, a invalidation message could be processed before the descriptor was used. This caused the descriptor to be recreated, resetting `metaPageBlkno` and `rootPageBlkno` to `InvalidBlockNumber`, which led to assertion failures (e.g. in `btree_ctid_get_and_inc`). While investigating, another problem was identified: invalidation message processing may occur while holding page locks. Invalidation handlers access system trees, and if required pages are evicted, they trigger page loads. This leads to `load_page()` being called while locks are held, hitting assertions that forbid holding locks during page load. To address both issues: * Ensure descriptor state is not lost due to mid-operation self-invalidation * Introduce deferred invalidation handling: * Skip invalidation message processing when it is unsafe (e.g. while holding page locks) * Process pending invalidation messages at the next safe point This prevents descriptor corruption and avoids unsafe page loads under locks.
6b708a7 to
3a4f5c8
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fix an issue where B-tree descriptor state (
metaPageBlknoandrootPageBlkno) could be lost after processing an invalidation message within the same transaction.Previously, after
o_btree_load_shmem()initialized the descriptor, a invalidation message could be processed before the descriptor was used. This caused the descriptor to be recreated, resettingmetaPageBlknoandrootPageBlknotoInvalidBlockNumber, which led to assertion failures (e.g. inbtree_ctid_get_and_inc).While investigating, another problem was identified: invalidation message processing may occur while holding page locks. Invalidation handlers access system trees, and if required pages are evicted, they trigger page loads. This leads to
load_page()being called while locks are held, hitting assertions that forbid holding locks during page load.To address both issues:
Ensure descriptor state is not lost due to mid-operation self-invalidation
Introduce deferred invalidation handling:
This prevents descriptor corruption and avoids unsafe page loads under locks.
uplink-> orioledb/postgres#55
fix for #789