fix: update request to return raw data instead of encoded text#421
fix: update request to return raw data instead of encoded text#421hudcap wants to merge 4 commits into
Conversation
When the response is decoded, any bytes that cannot be converted to text are replaced with the unicode replacement character, and are therefore lost forever.
📝 WalkthroughWalkthroughReads fetch response body as an ArrayBuffer immediately (preserving original bytes), concurrently reads and prefix-strips text for JSON parsing, and updates tests to assert exact binary content bytes. ChangesResponse Content Handling
🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (2)
pydoll/constants.py (2)
255-257: ⚡ Quick winClone and dual-read strategy correctly preserves binary content.
Reading
arrayBuffer()from the cloned response ensures the raw bytes are captured before any decoding, which is exactly what's needed to fix the binary corruption issue described in#419. The original response'stext()method leverages the browser's native charset detection.⚡ Optional: parallelize the two reads for faster response handling
- const responseClone = response.clone(); - const content = await responseClone.arrayBuffer(); - let text = await response.text(); + const responseClone = response.clone(); + const [content, text] = await Promise.all([ + responseClone.arrayBuffer(), + response.text() + ]);🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pydoll/constants.py` around lines 255 - 257, The current dual-read uses sequential awaits on responseClone.arrayBuffer() then response.text(), which is correct but slower; change to run them in parallel by creating the clone (responseClone) and then await Promise.all([responseClone.arrayBuffer(), response.text()]) so you populate content and text concurrently (keep using responseClone for arrayBuffer() to preserve raw bytes and response.text() for decoded text).
276-279: ⚡ Quick winClarify the comment to reflect what changed.
The comment "For non-JSON, keep original text handling" could be misinterpreted. The actual change here is that the previous code was re-encoding
textintocontent(causing binary corruption), and that re-encode step has been removed. The text from line 257 is simply kept as-is, andcontentis already set from the arrayBuffer read on line 256.📝 Suggested comment clarification
} else { - // For non-JSON, keep original text handling + // Non-JSON: text is already decoded; content already captured as raw bytes jsonData = null; }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pydoll/constants.py` around lines 276 - 279, Update the comment next to the jsonData branch to explicitly state that we removed the prior re-encoding of text into content (which caused binary corruption); note that content is now assigned from the arrayBuffer read earlier and text from the upstream variable is preserved as-is instead of being re-encoded. Reference the variables jsonData, text, content, and the arrayBuffer read so it's clear the change: no re-encoding occurs and original text is kept while content comes from the arrayBuffer.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@pydoll/constants.py`:
- Around line 255-257: The current dual-read uses sequential awaits on
responseClone.arrayBuffer() then response.text(), which is correct but slower;
change to run them in parallel by creating the clone (responseClone) and then
await Promise.all([responseClone.arrayBuffer(), response.text()]) so you
populate content and text concurrently (keep using responseClone for
arrayBuffer() to preserve raw bytes and response.text() for decoded text).
- Around line 276-279: Update the comment next to the jsonData branch to
explicitly state that we removed the prior re-encoding of text into content
(which caused binary corruption); note that content is now assigned from the
arrayBuffer read earlier and text from the upstream variable is preserved as-is
instead of being re-encoded. Reference the variables jsonData, text, content,
and the arrayBuffer read so it's clear the change: no re-encoding occurs and
original text is kept while content comes from the arrayBuffer.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 316a1aa8-60ee-4403-bf5d-f95467ad64d8
📒 Files selected for processing (1)
pydoll/constants.py
|
It's failing this line: I think that test is incorrect (it should be checking |
|
I guess there's no point in the PR if the tests are failing. I'll just change everything and await review |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
pydoll/constants.py (1)
274-280: 💤 Low valueConsider clarifying the comments.
The comments on lines 275 and 279 state "Return raw bytes," but raw bytes are always returned in the
contentfield regardless of these code paths. The comments might be clearer if they explicitly stated what changes in these branches, e.g., "jsonData will be null" or "JSON parsing skipped; caller should use content field."📝 Suggested comment refinement
try { jsonData = JSON.parse(text); text = JSON.stringify(jsonData); }} catch (e) {{ - // Return raw bytes if parsing fails + // JSON parsing failed; jsonData will be null jsonData = null; }} }} else {{ - // For non-JSON, return raw bytes + // Non-JSON response; jsonData will be null jsonData = null; }}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pydoll/constants.py` around lines 274 - 280, The comments in the try/catch and else branches that say "Return raw bytes" are misleading because the code always returns raw bytes in the content field; update those comments to clearly state that jsonData is set to null and that JSON parsing is being skipped so callers should use the content field (e.g., replace "Return raw bytes" with "jsonData will be null — JSON parsing failed; caller should use content bytes" in the catch block, and "JSON parsing skipped; caller should use content bytes" in the non-JSON else branch). Ensure the comments reference the jsonData variable and the content field near the catch/else around the JSON parsing logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@pydoll/constants.py`:
- Around line 274-280: The comments in the try/catch and else branches that say
"Return raw bytes" are misleading because the code always returns raw bytes in
the content field; update those comments to clearly state that jsonData is set
to null and that JSON parsing is being skipped so callers should use the content
field (e.g., replace "Return raw bytes" with "jsonData will be null — JSON
parsing failed; caller should use content bytes" in the catch block, and "JSON
parsing skipped; caller should use content bytes" in the non-JSON else branch).
Ensure the comments reference the jsonData variable and the content field near
the catch/else around the JSON parsing logic.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 6c9e6d4e-0ab7-492d-bf4b-62f96315fbf2
📒 Files selected for processing (2)
pydoll/constants.pytests/integration/test_tab_request_integration.py
|
Sorry, I'm out of my depth. Tests seem to have passed but there's a public key error. No idea what to do about that |
Bug Fix Pull Request
Related Issue(s)
Fixes #419
Bug Description
tab.requestsometimes returns corrupted binary content, with the Unicode replacement character mixed into the bytes.Root Cause
What's being returned is an encoded version of the decoded bytes, and the decoded bytes can lose data when the bytes have no known associated character.
Solution
Return the original bytes instead of the encoded text.
Verification Steps
Testing Checklist
Impact
Backwards Compatibility
Checklist before requesting a review
This change is so minimal, I don't think most of these are relevant. I don't think I could possibly have messed up any of the below, but if you really want me to, I'll double-check.
poetry run task lintand fixed any issuespoetry run task testand all tests passSummary by CodeRabbit