Add part recover-broken command for recovering parts from s3#470
Draft
otselnik wants to merge 5 commits into
Draft
Add part recover-broken command for recovering parts from s3#470otselnik wants to merge 5 commits into
otselnik wants to merge 5 commits into
Conversation
Contributor
Reviewer's GuideAdds a new chadmin Sequence diagram for the new part recover-broken commandsequenceDiagram
actor Admin
participant part_group_recover_broken_command as recover_broken_part_command
participant Recover as recover_broken_part
participant S3 as S3Service
participant CH as ClickhouseServer
Admin->>part_group_recover_broken_command: chadmin part recover-broken
part_group_recover_broken_command->>Recover: recover_broken_part(client,disk_conf,part_path,output_tsv,...)
note over Recover: Scan part and collect blob keys
Recover->>Recover: scan_blob_keys(part_path,disk_conf)
Recover->>S3: check_blobs_parallel(disk_conf,all_blob_keys)
S3-->>Recover: blob_status
Recover->>Recover: classify(part_path,blob_status,disk_conf)
alt dry_run
Recover-->>part_group_recover_broken_command: RecoveryReport (dry-run)
else perform recovery
Recover->>Recover: download_part_files(disk_conf,part_files,assembled_dir)
Recover->>Recover: _copy_local_files(assembled_dir,part_files)
Recover->>Recover: _reconstruct_files(assembled_dir,part_files,...)
Recover->>CH: run_server_recovery(client,part_path,assembled_dir,...)
CH-->>Recover: rows,tsv_labels
Recover->>Recover: build_report(part_path,part_files,rows,tsv_columns)
Recover-->>part_group_recover_broken_command: RecoveryReport
end
part_group_recover_broken_command-->>Admin: TSV output file & optional report
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
Contributor
There was a problem hiding this comment.
Hey - I've found 3 issues
Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments
### Comment 1
<location path="ch_tools/chadmin/internal/part_recovery/server_runner.py" line_range="503-507" />
<code_context>
+ # Use streaming HTTP response to avoid loading all data into memory
+ response = client.query(sql, stream=True)
+
+ rows = 0
+ with output_path.open("wb") as fh:
+ for chunk in response.iter_content(chunk_size=65536):
+ if chunk:
+ fh.write(chunk)
+ rows += chunk.count(b"\n")
+
</code_context>
<issue_to_address>
**suggestion:** Row counting via newline bytes can undercount the last line and is slightly misleading.
Because `rows` is incremented with `chunk.count(b"\n")`, it will miss the final line when the file doesn't end with a newline and will count empty trailing lines as rows. Since this is only for reporting, consider either making it clear this is an approximate count (e.g. renaming to `approx_rows`) or obtaining the exact row count from ClickHouse (e.g. via a `COUNT()` query) so the reported value matches the logical row count.
Suggested implementation:
```python
Returns the approximate number of rows written (based on newline-delimited output).
```
```python
# Use streaming HTTP response to avoid loading all data into memory
response = client.query(sql, stream=True)
# Approximate row count based on newline-delimited output; may undercount the final line
# if the file does not end with a newline and will include trailing empty lines.
approx_rows = 0
with output_path.open("wb") as fh:
for chunk in response.iter_content(chunk_size=65536):
if chunk:
fh.write(chunk)
approx_rows += chunk.count(b"\n")
```
```python
return approx_rows
```
</issue_to_address>
### Comment 2
<location path="ch_tools/chadmin/internal/part_recovery/server_runner.py" line_range="346-103" />
<code_context>
+
+ client = clickhouse_client(ctx)
+
+ try:
+ report = recover_broken_part(
+ client=client,
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Chown logic may miss AttributeError on platforms without os.lchown.
`place_part_in_detached` currently catches only `PermissionError` and `OSError` around `os.lchown`, but on platforms/Python builds without `os.lchown`, an `AttributeError` may be raised instead. Please also handle `AttributeError` (or guard with `hasattr(os, "lchown")`) to prevent crashes in those environments.
</issue_to_address>
### Comment 3
<location path="ch_tools/chadmin/internal/part_recovery/classify.py" line_range="207-212" />
<code_context>
+
+ client = clickhouse_client(ctx)
+
+ try:
+ report = recover_broken_part(
+ client=client,
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Swallowing all metadata parsing errors in scan_blob_keys can hide genuine issues.
`scan_blob_keys` currently catches and ignores `ValueError`, `IndexError`, and `OSError` from `S3ObjectLocalMetaData.from_file`, treating any parse or I/O failure as "no blobs". This can mask real metadata or filesystem issues. Please consider at least emitting a debug log (including the filename and exception) on parse failure to aid diagnosis while preserving the current control flow.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
Contributor
|
Integration with |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary by Sourcery
Introduce a CLI command and internal tooling to recover broken ClickHouse MergeTree parts stored on S3 and export their data to TSV, including health checks, reconstruction logic, and integration tests.
New Features:
part recover-brokencommand to repair detached MergeTree parts using S3 metadata and output recovered data as TSV.Enhancements:
Tests: