Skip to content

Fix loss of root/meta blkno after self-invalidation; defer invalidation processing under page locks#792

Open
Serge-sudo wants to merge 1 commit into
orioledb:mainfrom
Serge-sudo:fixed-invalid-meta-blocknum
Open

Fix loss of root/meta blkno after self-invalidation; defer invalidation processing under page locks#792
Serge-sudo wants to merge 1 commit into
orioledb:mainfrom
Serge-sudo:fixed-invalid-meta-blocknum

Conversation

@Serge-sudo

@Serge-sudo Serge-sudo commented Mar 19, 2026

Copy link
Copy Markdown
Contributor

Fix an issue where B-tree descriptor state (metaPageBlkno and rootPageBlkno) could be lost after processing an invalidation message within the same transaction.

Previously, after o_btree_load_shmem() initialized the descriptor, a invalidation message could be processed before the descriptor was used. This caused the descriptor to be recreated, resetting metaPageBlkno and rootPageBlkno to InvalidBlockNumber, which led to assertion failures (e.g. in btree_ctid_get_and_inc).

While investigating, another problem was identified: invalidation message processing may occur while holding page locks. Invalidation handlers access system trees, and if required pages are evicted, they trigger page loads. This leads to load_page() being called while locks are held, hitting assertions that forbid holding locks during page load.

To address both issues:

  • Ensure descriptor state is not lost due to mid-operation self-invalidation

  • Introduce deferred invalidation handling:

    • Skip invalidation message processing when it is unsafe (e.g. while holding page locks)
    • Process pending invalidation messages at the next safe point

This prevents descriptor corruption and avoids unsafe page loads under locks.

uplink-> orioledb/postgres#55

fix for #789

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses a correctness issue in OrioleDB descriptor invalidation where self-invalidation could recreate B-tree descriptors mid-operation and reset metaPageBlkno / rootPageBlkno, leading to assertion failures. It also aims to prevent unsafe invalidation handling while Oriole page locks are held by deferring descriptor invalidation work until a safe point.

Changes:

  • Adds a backend-local queue to defer OrioleDB descriptor invalidations while Oriole page locks are held, and flushes the queue via a custom invalidation hook.
  • Ensures recreated table/index descriptors re-adopt shared-memory B-tree root/meta blkno state when available (o_btree_try_use_shmem() after recreation).
  • Registers ReceiveCustomInvalMessage_hook to drive deferred invalidation processing.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

File Description
src/tableam/descr.c Adds deferred descriptor invalidation queue/hook and refreshes B-tree root/meta blkno after descriptor recreation.
src/orioledb.c Registers the custom invalidation hook (ReceiveCustomInvalMessage_hook).
include/tableam/descr.h Exposes orioledb_deferred_inval_message_hook() for hook registration.
include/orioledb.h Includes utils/inval.h and exposes prev_ReceiveCustomInvalMessage_hook.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Comment thread src/tableam/descr.c
Comment on lines 989 to +1028
o_invalidate_descrs(Oid datoid, Oid reloid, Oid relfilenode)
{
DeferredDescrInvalidation *deferred;
MemoryContext oldcontext;

/*
* If we currently hold page locks, recreating descriptors may attempt to
* read sys-tree pages and violate lock invariants. Defer processing to
* the next invalidation callback when locks are released.
*/
if (have_locked_pages())
{
oldcontext = MemoryContextSwitchTo(TopMemoryContext);
deferred = palloc(sizeof(DeferredDescrInvalidation));
deferred->datoid = datoid;
deferred->reloid = reloid;
deferred->relfilenode = relfilenode;
deferred_descr_invals = lappend(deferred_descr_invals, deferred);
/*
* Each entry is freed during the next lock-free processing pass; list
* cells and the list container are also freed there. All of this
* lives in TopMemoryContext so it survives across callbacks and
* transaction boundaries.
*/
MemoryContextSwitchTo(oldcontext);
return;
}

/*
* Process any previously deferred invalidations first to maintain correct order.
* This queue most certainly should be empty
* (as hook is called before processing any invalidation),
* but we need to be sure that no invalidations are left unprocessed
* before we handle the current one.
*/
orioledb_deferred_inval_message_hook();

/* Handle the current invalidation. */
o_invalidate_descrs_internal(datoid, reloid, relfilenode);
}
Comment thread src/tableam/descr.c
Comment on lines +959 to +966
/* call prev hook if any */
if (prev_ReceiveCustomInvalMessage_hook)
prev_ReceiveCustomInvalMessage_hook();

/* Can't safely process invalidations while holding locks, so defer it. */
if (have_locked_pages())
return;

Comment thread src/tableam/descr.c
Comment on lines +1017 to +1025
/*
* Process any previously deferred invalidations first to maintain correct order.
* This queue most certainly should be empty
* (as hook is called before processing any invalidation),
* but we need to be sure that no invalidations are left unprocessed
* before we handle the current one.
*/
orioledb_deferred_inval_message_hook();

@Serge-sudo Serge-sudo force-pushed the fixed-invalid-meta-blocknum branch from 254d401 to 6b708a7 Compare March 19, 2026 14:59
…on processing under page locks

Fix an issue where B-tree descriptor state (`metaPageBlkno` and `rootPageBlkno`) could be lost after processing an invalidation message within the same transaction.

Previously, after `o_btree_load_shmem()` initialized the descriptor, a invalidation message could be processed before the descriptor was used. This caused the descriptor to be recreated, resetting `metaPageBlkno` and `rootPageBlkno` to `InvalidBlockNumber`, which led to assertion failures (e.g. in `btree_ctid_get_and_inc`).

While investigating, another problem was identified: invalidation message processing may occur while holding page locks. Invalidation handlers access system trees, and if required pages are evicted, they trigger page loads. This leads to `load_page()` being called while locks are held, hitting assertions that forbid holding locks during page load.

To address both issues:

* Ensure descriptor state is not lost due to mid-operation self-invalidation
* Introduce deferred invalidation handling:

  * Skip invalidation message processing when it is unsafe (e.g. while holding page locks)
  * Process pending invalidation messages at the next safe point

This prevents descriptor corruption and avoids unsafe page loads under locks.
@Serge-sudo Serge-sudo force-pushed the fixed-invalid-meta-blocknum branch from 6b708a7 to 3a4f5c8 Compare March 19, 2026 14:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants