
Detect broken partitions #478

Draft
otselnik wants to merge 8 commits into yandex:main from otselnik:detect-broken-partitions/refactor-and-reporting

otselnik (Contributor) commented Jun 2, 2026

Summary by Sourcery

Extend broken part detection in object storage to classify and optionally repair parts, and add utilities to regenerate structural files for recoverable parts.

New Features:

  • Add a --restore-recoverable flag to detect-broken-partitions to automatically detach, repair, and reattach recoverable parts while detaching unrecoverable ones.
  • Introduce part file classification logic to categorize missing files by recoverability for MergeTree parts.
  • Add utilities to regenerate simple structural files (e.g., columns.txt, count.txt, default_compression_codec.txt, metadata_version.txt) from ClickHouse metadata and rewrite corresponding S3 disk metadata.
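As a rough illustration of what regenerating the simple structural files involves, here is a minimal Python sketch. It assumes the inputs come from system.columns (name, type) and system.parts (rows, default_compression_codec, metadata_version), and that columns.txt follows the ClickHouse NamesAndTypesList text layout; the exact whitespace of each file is an assumption. render_columns_txt and render_simple_files are hypothetical names, not the functions added in this PR.

```python
from typing import Dict, List, Tuple


def render_columns_txt(columns: List[Tuple[str, str]]) -> str:
    """Render columns.txt from (name, type) pairs, e.g. rows of system.columns.

    Assumes the NamesAndTypesList text layout; names containing backquotes
    would need escaping, which this sketch does not do.
    """
    lines = ["columns format version: 1", f"{len(columns)} columns:"]
    lines += [f"`{name}` {type_name}" for name, type_name in columns]
    return "\n".join(lines) + "\n"


def render_simple_files(
    columns: List[Tuple[str, str]],
    rows: int,
    codec: str,
    metadata_version: int,
) -> Dict[str, str]:
    """Return {file name: contents} for the simple structural files of one part."""
    return {
        "columns.txt": render_columns_txt(columns),
        # Single values written without a trailing newline (assumption).
        "count.txt": str(rows),
        "default_compression_codec.txt": codec,
        "metadata_version.txt": str(metadata_version),
    }
```

For example, render_simple_files([("id", "UInt64")], 42, "CODEC(LZ4)", 3) yields the four file bodies, which a recovery step would then upload as new blobs.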

Enhancements:

  • Optimize S3 checks for part files by batching key existence lookups and reducing repeated queries to ClickHouse metadata.
  • Improve logging and reporting for broken parts, including per-part recovery results and clearer debug output.

sourcery-ai bot (Contributor) commented Jun 2, 2026

Reviewer's Guide

Refactors and extends detect_broken_partitions to batch-check S3 objects, classify missing files per-part, and optionally recover or detach parts, introducing reusable utilities for part file classification and recovery plus metadata serialization helpers.

File-Level Changes

Redesigned broken-part detection flow to batch S3 checks, classify broken parts, and support recover-or-detach actions per part/partition.
  • Replaced per-file list_objects-based S3 existence checks with a three-pass algorithm: collect all part files and S3 keys, batch-check existence via head_object, and then classify each part based on missing files.
  • Introduced per-part status aggregation using a classifier that distinguishes recoverable, partially-recoverable, and unrecoverable missing files, factoring in part type and critical columns.
  • Replaced the single --reattach flag with two CLI options, --restore-recoverable and --detach, which can be used together; the control flow now attempts recovery first and then, if requested, detaches unrecoverable partitions.
  • Reworked partition repair logic to track affected partitions, print them once, and add detailed logging of broken parts and recovery outcomes.
Files: ch_tools/chadmin/cli/data_store_group.py
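The three-pass flow described above can be sketched as follows. This is a hypothetical helper, not the code in data_store_group.py: it assumes part files have already been mapped to the S3 keys they point at, and it takes the existence check as a callable so any client can be plugged in.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Set


def find_missing_files(
    part_files: Dict[str, Dict[str, List[str]]],
    key_exists: Callable[[str], bool],
    max_workers: int = 16,
) -> Dict[str, List[str]]:
    """Map each part to the sorted names of its files that reference a missing S3 object.

    part_files maps part name -> {file name -> [S3 keys the file points at]}.
    """
    # Pass 1: collect every distinct S3 key referenced by any part file.
    all_keys: Set[str] = set()
    for files in part_files.values():
        for keys in files.values():
            all_keys.update(keys)

    # Pass 2: check each distinct key exactly once, in parallel.
    ordered = sorted(all_keys)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        exists = dict(zip(ordered, pool.map(key_exists, ordered)))

    # Pass 3: a file is broken if any of its blobs is absent; group by part.
    missing: Dict[str, List[str]] = {}
    for part, files in part_files.items():
        broken = sorted(
            name for name, keys in files.items() if not all(exists[k] for k in keys)
        )
        if broken:
            missing[part] = broken
    return missing
```

With boto3, key_exists could wrap s3.head_object(Bucket=..., Key=...) and return False when it raises a ClientError with a 404 status. Deduplicating keys in pass 1 is what turns one lookup per file reference into one lookup per distinct object.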
Added a part recovery subsystem that can regenerate specific structural files for detached parts and upload them to S3 under new keys, updating local disk metadata accordingly.
  • Defined PartRecoveryContext to gather the system.parts, system.tables and system.columns metadata required for a part, plus a helper that builds it from ClickHouse queries.
  • Implemented generation of simple structural files (columns.txt, count.txt, default_compression_codec.txt, metadata_version.txt) using existing structural_files helpers.
  • Implemented S3 upload flow that always writes regenerated blobs under fresh keys and rewrites local S3 metadata files to point at the new objects.
  • Implemented per-part recovery procedure that detaches the part, finds its detached directory, regenerates or unlinks missing files as appropriate, and then attempts to reattach the part, returning a structured PartRecoveryResult.
Files: ch_tools/chadmin/cli/data_store_group.py, ch_tools/chadmin/internal/object_storage/part_recovery.py, ch_tools/chadmin/internal/object_storage/structural_files.py
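The per-part recovery procedure can be outlined like this. It is a sketch under stated assumptions: execute, regenerate and unlink are hypothetical callbacks (locating the detached directory and uploading regenerated blobs are hidden behind them), and RecoveryResult is a simplified stand-in for PartRecoveryResult. The DETACH PART and ATTACH PART statements are standard ClickHouse ALTER syntax.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class RecoveryResult:
    """Hypothetical, simplified stand-in for PartRecoveryResult."""

    part_name: str
    regenerated: List[str] = field(default_factory=list)
    removed: List[str] = field(default_factory=list)
    attached: bool = False
    error: Optional[str] = None


def recover_part(
    database: str,
    table: str,
    part_name: str,
    missing_files: List[str],
    execute: Callable[[str], None],   # runs one SQL statement
    regenerate: Callable[[str], bool],  # True if the file was rebuilt and re-pointed
    unlink: Callable[[str], None],    # drops the file from the detached part
) -> RecoveryResult:
    result = RecoveryResult(part_name)
    target = f"`{database}`.`{table}`"
    # A failed DETACH propagates: nothing has been modified yet.
    execute(f"ALTER TABLE {target} DETACH PART '{part_name}'")
    for name in missing_files:
        if regenerate(name):
            result.regenerated.append(name)
        else:
            unlink(name)
            result.removed.append(name)
    try:
        execute(f"ALTER TABLE {target} ATTACH PART '{part_name}'")
        result.attached = True
    except Exception as exc:  # the part stays detached; report why
        result.error = str(exc)
    return result
```

Returning a result object instead of raising on a failed ATTACH lets the caller report every part's outcome in one pass, which matches the per-part reporting described in the summary.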
Introduced file-level classification utilities to determine how missing files affect a part and whether recovery is safe.
  • Added classify_file to categorize a missing file as recoverable, partially-recoverable, or unrecoverable based on filename patterns, part type, and whether the affected column participates in PARTITION/ORDER BY.
  • Added classify_part to aggregate multiple missing-file statuses into a single part status using a worst-wins ranking.
  • Captured critical columns for each table by parsing sorting_key and partition_key expressions from system.tables, and used this in classification to treat key-column data files as unrecoverable.
Files: ch_tools/chadmin/internal/object_storage/part_file_classifier.py, ch_tools/chadmin/cli/data_store_group.py
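A simplified version of the classification logic might look like this. The rules in classify_file are illustrative assumptions only (the PR's actual filename patterns are not listed here), and critical_columns uses a naive regular expression that keeps identifiers not followed by "(" so that function names such as toYYYYMM are skipped. The worst-wins aggregation in classify_part follows the description above.

```python
import re
from enum import IntEnum
from typing import Iterable, Set


class FileStatus(IntEnum):
    """Ordered so that a higher value means a worse outcome."""

    RECOVERABLE = 0
    PARTIALLY_RECOVERABLE = 1
    UNRECOVERABLE = 2


SIMPLE_FILES = {"columns.txt", "count.txt", "default_compression_codec.txt", "metadata_version.txt"}
# Identifiers not followed by "(": skips function names such as toYYYYMM.
_IDENTIFIER = re.compile(r"\b([A-Za-z_]\w*)\b(?!\s*\()")


def critical_columns(sorting_key: str, partition_key: str) -> Set[str]:
    """Naively extract column names used in ORDER BY / PARTITION BY expressions."""
    return set(_IDENTIFIER.findall(f"{sorting_key} {partition_key}"))


def classify_file(filename: str, part_type: str, critical: Set[str]) -> FileStatus:
    """Illustrative rules only; not the PR's actual patterns."""
    if filename in SIMPLE_FILES:
        return FileStatus.RECOVERABLE  # can be regenerated from system tables
    if part_type == "Wide" and filename.endswith((".bin", ".mrk2", ".cmrk2")):
        column = filename.split(".", 1)[0]
        # Losing a key column breaks ordering/partitioning; other columns are assumed salvageable.
        return FileStatus.UNRECOVERABLE if column in critical else FileStatus.PARTIALLY_RECOVERABLE
    return FileStatus.UNRECOVERABLE  # e.g. Compact parts keep all columns in one data file


def classify_part(statuses: Iterable[FileStatus]) -> FileStatus:
    """Worst wins: a part is only as recoverable as its least recoverable missing file."""
    return max(statuses, default=FileStatus.RECOVERABLE)
```

Using an IntEnum makes worst-wins a plain max(), so adding a new status only requires placing it at the right rank.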
Extended S3 metadata utilities to support safe round-trip serialization and atomic rewrite of on-disk metadata files.
  • Added a to_string method to S3ObjectLocalMetaData that serializes metadata into the ClickHouse disk metadata text format understood by the existing parser.
  • Added to_file to atomically write the serialized metadata to disk via a temporary file and os.replace, used by the recovery flow when updating file pointers.
Files: ch_tools/chadmin/internal/object_storage/s3_object_metadata.py
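The serialize-then-atomically-replace pattern can be sketched as below. LocalMetadata is a hypothetical stand-in for S3ObjectLocalMetaData, and the layout produced by to_string (version, object count and total size, one size/key line per object, ref count, read-only flag) is an assumption about the version 3 ClickHouse disk metadata format; check it against the existing parser before relying on it.

```python
import contextlib
import os
import tempfile
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class LocalMetadata:
    """Hypothetical stand-in for S3ObjectLocalMetaData."""

    objects: List[Tuple[int, str]] = field(default_factory=list)  # (size in bytes, object key)
    ref_count: int = 0
    read_only: bool = False
    version: int = 3

    def to_string(self) -> str:
        # Assumed layout: version, "count<TAB>total_size", one "size<TAB>key" line per
        # object, ref count, read-only flag. Keys are assumed to need no escaping.
        total = sum(size for size, _ in self.objects)
        lines = [str(self.version), f"{len(self.objects)}\t{total}"]
        lines += [f"{size}\t{key}" for size, key in self.objects]
        lines += [str(self.ref_count), "1" if self.read_only else "0"]
        return "\n".join(lines) + "\n"

    def to_file(self, path: str) -> None:
        # Write to a temp file in the same directory, fsync it, then rename over the target.
        # Note: mkstemp creates the file with mode 0o600; copy the original mode if it matters.
        directory = os.path.dirname(os.path.abspath(path))
        fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
        try:
            with os.fdopen(fd, "w") as tmp:
                tmp.write(self.to_string())
                tmp.flush()
                os.fsync(tmp.fileno())
            os.replace(tmp_path, path)
        except BaseException:
            # Clean up the temp file if anything failed before the rename.
            with contextlib.suppress(FileNotFoundError):
                os.unlink(tmp_path)
            raise
```

Creating the temporary file in the same directory keeps it on the same filesystem, so os.replace is an atomic rename and a concurrent reader sees either the old metadata file or the new one, never a half-written file.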
