Story 2451: AI-Assisted Description for Link Posts type#2477
Story 2451: AI-Assisted Description for Link Posts type#2477javiercoronadonarvaez wants to merge 41 commits into
Conversation
f72233e to
e57a705
Compare
eda4415 to
ebd76ef
Compare
- Add description box alongside 'Saving' + Saving Icon section
- Does not display anything if the user hasn't started typing - Display Saving text + Icon while user types and display Saved text + Icon once he's finished typing
- Corroborate auto generate description works with wysiwyg component
- Fix in accordance to GitHub Actions failure
…ated - Add 'Hold on! We are generating a description for your content, it may take a few seconds' as placeholder in description box
…ter post has been created
…ry.summary - Persist user-typed summary - Skip background regeneration when set
…task keeps retries
- Auto-Generate button disabled unless a properly composed URL is input
… background automatic generation
…ded Link is invalid - 'Auto-Generate only works for .html links from cppalliance.org' message displayed only when Link does not follow this pattern
16566d2 to
671b8a7
Compare
- Truncation to comply with 1000 character limit from Design templates - Optimize prompt
- Update pre commit config file to match black's version in requirements (26.1.0)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
# Conflicts: # templates/news/v3/create.html
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds AI-assisted description generation: safe link fetching/extraction, centralized summarization (sync + Celery wrapper), new backend endpoints, form/template/JS wiring for create-post UI with draft persistence, and supporting dependency/settings and maintenance edits. ChangesNews description generation and persistence
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related issues
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
- Remove cppappliance custom parsing strategy and replace with trafilatura
There was a problem hiding this comment.
Actionable comments posted: 9
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
frontend/wysiwyg-editor.js (1)
824-827:⚠️ Potential issue | 🟠 Major | ⚡ Quick winMarkdown-mode edits are not propagated to the host state.
Line 824 updates
state.markdownText, but Line 916 only emitswysiwyg-updateon ProseMirror updates. In Markdown mode, the host page’s draft persistence/save indicator/counter can become stale (Line 908 also reads characters from editor state only).Proposed fix
mdTextarea.addEventListener("input", () => { state.markdownText = mdTextarea.value; debouncedMdPreview(); + dispatchState(false); }); @@ + const currentCharacters = () => + state.mode === "markdown" + ? state.markdownText.length + : editor.state.doc.textContent.length; + const dispatchState = (programmatic) => { editorEl.dispatchEvent( new CustomEvent("wysiwyg-update", { detail: { id: textareaId, - characters: editor.state.doc.textContent.length, + characters: currentCharacters(), value: currentValue(), programmatic: !!programmatic, }, bubbles: true, }), ); };Also applies to: 903-909, 916-917
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@frontend/wysiwyg-editor.js` around lines 824 - 827, The markdown input handler for mdTextarea currently sets state.markdownText and calls debouncedMdPreview but does not notify the host that the document changed; update the mdTextarea "input" listener (the block that sets state.markdownText and calls debouncedMdPreview) to also emit the same wysiwyg-update event used by the ProseMirror change handler, including the current markdownText and computed character/count payload the host expects (the same shape used where wysiwyg-update is emitted for ProseMirror updates), so the host-side draft persistence/indicator and character counter stay in sync; ensure the emitted payload mirrors what the existing ProseMirror emitter builds (character count and document text) and reuse the same emit function or helper if one exists.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@libraries/forms.py`:
- Around line 371-383: The tuple unpacking from mailinglist_wordcloud_task.get()
includes an unused first element; change the variable name mailinglist_words to
_mailinglist_words (or just _) in the unpacking expression inside the function
where mailinglist_wordcloud_task.get() is called so the unused value is explicit
and linter warnings are resolved (leave mailinglist_wordcloud_base64 and
mailinglist_wordcloud_top_words unchanged).
- Around line 442-444: The call to determine_versions in generate_context
currently unpacks three values into report_before_release, prior_version,
version but report_before_release is never used; change the unpack to use a
throwaway name (e.g. _report_before_release or _) for the first element while
keeping prior_version and version so lint stops flagging an unused
binding—locate the determine_versions call inside generate_context and replace
report_before_release with the chosen underscore variable.
In `@news/helpers.py`:
- Around line 73-77: The loop in safe_get currently downloads the full response
body (requests.get(..., allow_redirects=False)) which can exhaust memory; change
to request with stream=True inside safe_get, read the response body
incrementally (e.g., via resp.iter_content) and enforce a hard byte cap (e.g.,
MAX_RESPONSE_BYTES) across reads, raising an error or closing the connection if
the cap is exceeded; ensure you still check _url_host_is_safe(url) and handle
redirects via max_redirects/timeout and propagate or wrap errors with
UnsafeURLError or a new ResponseTooLargeError as appropriate.
In `@news/tasks.py`:
- Line 180: The log at logger.info currently includes the raw fetched markup
(markup[:100]) which can leak third-party content/PII and create
high-cardinality noise; update the log in news/tasks.py to remove the markup
snippet and instead log only safe metadata such as entry.external_url (or a
derived safe identifier like its hostname) and/or the markup length (e.g.,
len(markup)) so you preserve useful telemetry without exposing content.
- Around line 99-104: The code currently returns a falsy summary value which can
be an empty string and then get persisted, potentially clearing an existing
Entry.summary; change the falsy-check so that when summary is empty or invalid
you return None (not the empty summary) to signal “do not save” to the saver
task, and add a brief logger.debug or logger.warning with context (e.g., include
response metadata) before returning; update the branch that checks summary (the
variable summary from response.choices[0].message.content) to return None on
falsy values so the persistence path can skip saving.
In `@news/views.py`:
- Around line 588-627: The generate_description view is currently
unauthenticated and exposes a paid LLM endpoint; wrap it with Django's
login_required decorator so authentication runs before the POST check by adding
`@login_required` as the outer decorator above generate_description (i.e., ensure
`@login_required` appears above `@require_POST`), and consider adding a
rate-limiting decorator or middleware around generate_description to prevent
abuse and limit calls/timeouts.
- Around line 634-697: Add the missing authentication decorator to the
generate_link_description view: update the function decorated with `@require_POST`
in news/views.py by placing Django's `@login_required` decorator (from
django.contrib.auth.decorators import login_required) immediately above the view
so the symbol generate_link_description is only callable by authenticated users;
ensure the import for login_required is added if not present and keep
`@login_required` above `@require_POST` to enforce login before method check.
In `@requirements.txt`:
- Around line 321-329: Update the vulnerable pinned packages: change the lxml
pin (currently lxml[html-clean]==6.0.2) to at least lxml>=6.1.0 and bump urllib3
(currently urllib3==2.6.3) to at least urllib3>=2.7.0, then regenerate the
flattened requirements file from the input pins using the repo's compile command
(e.g., run the same pip-compile/uv pip compile command used elsewhere to produce
./requirements.txt) so the new versions and resolved transitive pins are written
to requirements.txt.
In `@templates/news/v3/create.html`:
- Around line 96-111: The textarea with id "field-post_description"
(x-model="description", `@input`="onDescriptionEdited") is left with
name="description" even when hidden for link/video posts, causing duplicate form
keys; update the template to only submit the write-up value when active — for
example bind the name or disabled attribute to the write-up state (use
:name="isWriteUp ? 'description' : null" or :disabled="!isWriteUp") or remove
the static name and conditionally set it when isWriteUp is true so the hidden
textarea does not send a stale value.
---
Outside diff comments:
In `@frontend/wysiwyg-editor.js`:
- Around line 824-827: The markdown input handler for mdTextarea currently sets
state.markdownText and calls debouncedMdPreview but does not notify the host
that the document changed; update the mdTextarea "input" listener (the block
that sets state.markdownText and calls debouncedMdPreview) to also emit the same
wysiwyg-update event used by the ProseMirror change handler, including the
current markdownText and computed character/count payload the host expects (the
same shape used where wysiwyg-update is emitted for ProseMirror updates), so the
host-side draft persistence/indicator and character counter stay in sync; ensure
the emitted payload mirrors what the existing ProseMirror emitter builds
(character count and document text) and reuse the same emit function or helper
if one exists.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: fae02e15-aa7c-46e5-b45b-c37eb300a3f9
📒 Files selected for processing (45)
.pre-commit-config.yamlak/views.pyconfig/settings.pyconfig/urls.pycore/context_processors.pycore/tests/test_managers.pycore/tests/test_tasks.pycore/views.pyfrontend/wysiwyg-editor.jsgunicorn.conf.pylibraries/admin.pylibraries/forms.pylibraries/tests/fixtures.pylibraries/tests/test_commands.pylibraries/tests/test_tasks.pymailing_list/management/commands/sync_mailinglist_stats.pymailing_list/tasks.pymarketing/tests.pynews/acl.pynews/constants.pynews/forms.pynews/helpers.pynews/tasks.pynews/tests/test_forms.pynews/urls.pynews/views.pyrequirements.inrequirements.txtslack/management/commands/clear_slack_activity.pyslack/management/commands/fetch_slack_activity.pystatic/css/v3/create-post-page.cssstatic/js/v3/wysiwyg-editor.jstemplates/includes/icon.htmltemplates/news/v3/create.htmltemplates/v3/includes/_field_textarea.htmlusers/forms.pyusers/tests/test_urls.pyusers/tests/test_views.pyusers/urls.pyusers/views.pyversions/releases.pyversions/tasks.pyversions/tests/test_ai_tasks.pyversions/tests/test_api.pyversions/views.py
💤 Files with no reviewable changes (18)
- gunicorn.conf.py
- users/forms.py
- slack/management/commands/clear_slack_activity.py
- ak/views.py
- users/tests/test_views.py
- core/context_processors.py
- core/tests/test_managers.py
- mailing_list/tasks.py
- core/tests/test_tasks.py
- news/acl.py
- libraries/tests/test_commands.py
- slack/management/commands/fetch_slack_activity.py
- users/urls.py
- versions/releases.py
- users/tests/test_urls.py
- news/urls.py
- versions/tests/test_api.py
- versions/tests/test_ai_tasks.py
| mailinglist_contributor_release_count, mailinglist_contributor_new_count = ( | ||
| mailing_list_contributors_task.get() | ||
| ) | ||
| (mailinglist_post_stats, total_mailinglist_count) = ( | ||
| mailing_list_stats_task.get() | ||
| ) | ||
| (commit_contributors_release_count, commit_contributors_new_count) = ( | ||
| mailinglist_post_stats, total_mailinglist_count = mailing_list_stats_task.get() | ||
| commit_contributors_release_count, commit_contributors_new_count = ( | ||
| commit_contributors_task.get() | ||
| ) | ||
| ( | ||
| mailinglist_words, | ||
| mailinglist_wordcloud_base64, | ||
| mailinglist_wordcloud_top_words, | ||
| ) = mailinglist_wordcloud_task.get() | ||
| (search_wordcloud_base64, search_wordcloud_top_words, search_stats) = ( | ||
| search_wordcloud_base64, search_wordcloud_top_words, search_stats = ( |
There was a problem hiding this comment.
Unused unpacked value in task result handling.
mailinglist_words is unpacked but never used. Rename it to _mailinglist_words (or _) to make intent explicit and satisfy lint rules.
🧰 Tools
🪛 Ruff (0.15.15)
[warning] 379-379: Unpacked variable mailinglist_words is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@libraries/forms.py` around lines 371 - 383, The tuple unpacking from
mailinglist_wordcloud_task.get() includes an unused first element; change the
variable name mailinglist_words to _mailinglist_words (or just _) in the
unpacking expression inside the function where mailinglist_wordcloud_task.get()
is called so the unused value is explicit and linter warnings are resolved
(leave mailinglist_wordcloud_base64 and mailinglist_wordcloud_top_words
unchanged).
Source: Linters/SAST tools
There was a problem hiding this comment.
out of scope for this PR. They're pre-existing lint, unrelated to anything changed.
There was a problem hiding this comment.
Understood, thanks for the context. Would you like me to open a follow-up GitHub issue to track the cleanup of the unused mailinglist_words variable so it doesn't get lost?
(ᴖ̥̥̥̥ ‸ ᴖ̥̥̥̥)
| report_before_release, prior_version, version = determine_versions( | ||
| report_configuration.version | ||
| ) |
There was a problem hiding this comment.
Unused value from determine_versions unpacking.
report_before_release is assigned but unused in generate_context. Replace with _report_before_release (or _) to avoid dead bindings and keep lint clean.
🧰 Tools
🪛 Ruff (0.15.15)
[warning] 442-442: Unpacked variable report_before_release is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@libraries/forms.py` around lines 442 - 444, The call to determine_versions in
generate_context currently unpacks three values into report_before_release,
prior_version, version but report_before_release is never used; change the
unpack to use a throwaway name (e.g. _report_before_release or _) for the first
element while keeping prior_version and version so lint stops flagging an unused
binding—locate the determine_versions call inside generate_context and replace
report_before_release with the chosen underscore variable.
Source: Linters/SAST tools
There was a problem hiding this comment.
out of scope for this PR. They're pre-existing lint, unrelated to anything changed.
There was a problem hiding this comment.
Understood, @javiercoronadonarvaez — the formatting change here is purely cosmetic (parens removal) and the RUF059 warning on report_before_release is indeed pre-existing. I'll dismiss this comment.
Would you like me to open a follow-up GitHub issue to track fixing the unused report_before_release binding in generate_context (and the similar one in get_stats) so it doesn't get lost?
(´• ω •`) 🐇
| @require_POST | ||
| def generate_description(request): | ||
| """Generate an AI description from submitted content (synchronous). | ||
|
|
||
| Backs the "Auto-Generate Description" button on the v3 create-post page. | ||
| Runs the summarization model inline and returns the result as JSON so the | ||
| browser can drop it into the Description field. | ||
|
|
||
| NOTE: intentionally not login-gated yet (local testing). This endpoint calls | ||
| a paid LLM, so add @login_required (and rate limiting) before it ships. | ||
| """ | ||
| title = request.POST.get("title", "").strip() | ||
| content = request.POST.get("content", "").strip() | ||
|
|
||
| if not content: | ||
| return JsonResponse({"error": "Add some content first."}, status=400) | ||
|
|
||
| try: | ||
| # Call the plain helper | ||
| summary = generate_summary( | ||
| content, | ||
| title, | ||
| settings.SUMMARIZATION_MODEL, | ||
| DESCRIPTION_SUMMARY_MAX_LENGTH, | ||
| timeout=30, | ||
| ) | ||
| except Exception: | ||
| logger.exception("generate_description: summarization failed") | ||
| return JsonResponse( | ||
| {"error": "Could not generate a description. Please try again."}, | ||
| status=502, | ||
| ) | ||
|
|
||
| if not summary: | ||
| return JsonResponse( | ||
| {"error": "Could not generate a description. Please try again."}, | ||
| status=502, | ||
| ) | ||
|
|
||
| return JsonResponse({"description": summary.strip()}) |
There was a problem hiding this comment.
Missing authentication exposes paid LLM endpoint to abuse.
The docstring notes this endpoint should have @login_required before shipping, but it's being merged without authentication. Unauthenticated access to an endpoint that calls paid LLM APIs creates:
- Cost exposure from repeated calls
- Resource exhaustion (30s timeout ties up workers)
Add @login_required before the @require_POST decorator (order matters: @login_required should be outer so it runs first). Consider rate limiting as a follow-up.
🛡️ Proposed fix
+from django.contrib.auth.decorators import login_required
+
+@login_required
`@require_POST`
def generate_description(request):🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@news/views.py` around lines 588 - 627, The generate_description view is
currently unauthenticated and exposes a paid LLM endpoint; wrap it with Django's
login_required decorator so authentication runs before the POST check by adding
`@login_required` as the outer decorator above generate_description (i.e., ensure
`@login_required` appears above `@require_POST`), and consider adding a
rate-limiting decorator or middleware around generate_description to prevent
abuse and limit calls/timeouts.
There was a problem hiding this comment.
Out of scope for this ticket
There was a problem hiding this comment.
Understood, @javiercoronadonarvaez. The docstring already captures the intent (add @login_required (and rate limiting) before it ships), so the risk is documented in-code.
Would you like me to open a follow-up GitHub issue to track adding authentication and rate limiting to both generate_description and generate_link_description before they ship to production?
| @require_POST | ||
| def generate_link_description(request): | ||
| """Fetch the linked page, extract its main text, and summarize it. | ||
|
|
||
| Works for any public http(s) URL (trafilatura isolates the main article and | ||
| falls back to a visible-text dump for unusual templates). The fetch goes | ||
| through ``safe_get``, which blocks SSRF to internal/private hosts and | ||
| re-validates redirect targets. | ||
|
|
||
| Failure modes return separate JSON errors: | ||
| - URL is missing, malformed, or points at a non-public host (400). | ||
| - Fetch failed or no readable text could be extracted (502, "couldn't | ||
| read that link"). | ||
| - Summarization failed or returned empty (502, "couldn't generate"). | ||
|
|
||
| NOTE: intentionally not login-gated yet (matches `generate_description`). | ||
| Add @login_required + rate-limiting before this ships. | ||
| """ | ||
| url = request.POST.get("url", "").strip() | ||
| if not url: | ||
| return JsonResponse({"error": _LINK_INVALID_ERROR}, status=400) | ||
|
|
||
| try: | ||
| resp = safe_get(url, timeout=10) | ||
| resp.raise_for_status() | ||
| except UnsafeURLError: | ||
| logger.warning("generate_link_description: blocked unsafe url", url=url) | ||
| return JsonResponse({"error": _LINK_INVALID_ERROR}, status=400) | ||
| except requests.RequestException: | ||
| logger.exception("generate_link_description: fetch failed", url=url) | ||
| return JsonResponse({"error": _LINK_FETCH_ERROR}, status=502) | ||
|
|
||
| title, body = extract_article(resp.text, url=url) | ||
| if not body: | ||
| # Page fetched OK but no readable text could be isolated. | ||
| logger.warning("generate_link_description: extraction empty", url=url) | ||
| return JsonResponse({"error": _LINK_FETCH_ERROR}, status=502) | ||
|
|
||
| # Feed the extracted body through the same summarizer used by the Blog/News | ||
| # path — synchronously, with a real timeout so a hung upstream doesn't tie | ||
| # up a web worker (autoretry_for on the Celery task is a no-op when called | ||
| # inline; see news/tasks.py). | ||
| try: | ||
| summary = generate_summary( | ||
| body, | ||
| title, | ||
| settings.SUMMARIZATION_MODEL, | ||
| DESCRIPTION_SUMMARY_MAX_LENGTH, | ||
| timeout=30, | ||
| ) | ||
| except Exception: | ||
| logger.exception("generate_link_description: summarization failed", url=url) | ||
| return JsonResponse( | ||
| {"error": "Could not generate a description. Please try again."}, | ||
| status=502, | ||
| ) | ||
|
|
||
| if not summary: | ||
| return JsonResponse( | ||
| {"error": "Could not generate a description. Please try again."}, | ||
| status=502, | ||
| ) | ||
|
|
||
| return JsonResponse({"description": summary.strip()}) |
There was a problem hiding this comment.
Same authentication gap as generate_description.
This endpoint also lacks @login_required despite calling the paid LLM API. Additionally, the total potential worker tie-up is ~40s (10s fetch + 30s summarization), making unauthenticated abuse more impactful.
Apply the same fix: add @login_required before @require_POST.
🛡️ Proposed fix
+@login_required
`@require_POST`
def generate_link_description(request):🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@news/views.py` around lines 634 - 697, Add the missing authentication
decorator to the generate_link_description view: update the function decorated
with `@require_POST` in news/views.py by placing Django's `@login_required`
decorator (from django.contrib.auth.decorators import login_required)
immediately above the view so the symbol generate_link_description is only
callable by authenticated users; ensure the import for login_required is added
if not present and keep `@login_required` above `@require_POST` to enforce login
before method check.
There was a problem hiding this comment.
Out of scope for this ticket
There was a problem hiding this comment.
Understood, @javiercoronadonarvaez — the docstring already notes this as a known gap to address before shipping. Would you like me to open a follow-up GitHub issue to track adding @login_required and rate-limiting to both generate_description and generate_link_description?
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
requirements.in (1)
78-78:⚠️ Potential issue | 🟠 Major | ⚡ Quick winPin trafilatura to a specific version.
The trafilatura dependency lacks a version constraint, which creates security and reproducibility risks. Since this library parses arbitrary external web content (a high-risk surface), unpinned versions can silently introduce vulnerabilities or breaking changes.
📌 Recommended version constraint
# Main-content extraction from arbitrary web pages (Link post Auto-Generate) -trafilatura +trafilatura>=2.0.0,<3.0.0 # pin major version for stabilityAdjust the specific version range based on current testing. At minimum, pin the major version to prevent breaking changes.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@requirements.in` at line 78, Replace the unpinned dependency entry "trafilatura" in requirements.in with a specific, constrained version to ensure reproducible and secure installs; update the line that currently contains trafilatura to either an exact pin (e.g., trafilatura==<stable-version>) or a bounded range that pins the major version (e.g., trafilatura>=<min>,<max>) after verifying compatibility in tests.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@requirements.in`:
- Line 78: Replace the unpinned dependency entry "trafilatura" in
requirements.in with a specific, constrained version to ensure reproducible and
secure installs; update the line that currently contains trafilatura to either
an exact pin (e.g., trafilatura==<stable-version>) or a bounded range that pins
the major version (e.g., trafilatura>=<min>,<max>) after verifying compatibility
in tests.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: d7fa9b30-4af4-476d-bd5e-d78a1d41e795
📒 Files selected for processing (2)
requirements.inrequirements.txt
✅ Files skipped from review due to trivial changes (1)
- requirements.txt
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
news/helpers.py (1)
74-77:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winDocstring claims auth + rate-limiting that don't exist yet.
The docstring states the DNS-rebinding gap is "acceptable here given the endpoint is also gated by auth + rate limiting", but per the PR notes the endpoint is currently not login-gated or rate-limited. This documentation/implementation mismatch could mislead future maintainers about the actual threat model.
Update the comment to reflect reality, or add a TODO referencing the planned auth/rate-limiting work.
Suggested fix
- Residual gap: DNS rebinding (host resolves public at check time, private at - connect time) is not closed — that needs pinning the validated IP into the - connection. Acceptable here given the endpoint is also gated by auth + rate - limiting; revisit if the threat model tightens. + Residual gap: DNS rebinding (host resolves public at check time, private at + connect time) is not closed — that needs pinning the validated IP into the + connection. TODO: add auth + rate-limiting to the calling endpoints before + production use; revisit if the threat model tightens.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@news/helpers.py` around lines 74 - 77, The comment in news/helpers.py that says the DNS-rebinding gap is "acceptable here given the endpoint is also gated by auth + rate limiting" is inaccurate because auth/rate-limiting aren't implemented; update that docstring/comment to reflect reality by removing or revising the claim and either (a) stating explicitly that the endpoint is currently not login-gated or rate-limited and documenting the current risk, or (b) adding a TODO with the planned auth/rate-limiting ticket reference and a short note that the DNS-pin mitigation should be added when that work lands; locate the existing DNS-rebinding text (the lines mentioning "Residual gap: DNS rebinding...") and edit it accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@news/helpers.py`:
- Around line 74-77: The comment in news/helpers.py that says the DNS-rebinding
gap is "acceptable here given the endpoint is also gated by auth + rate
limiting" is inaccurate because auth/rate-limiting aren't implemented; update
that docstring/comment to reflect reality by removing or revising the claim and
either (a) stating explicitly that the endpoint is currently not login-gated or
rate-limited and documenting the current risk, or (b) adding a TODO with the
planned auth/rate-limiting ticket reference and a short note that the DNS-pin
mitigation should be added when that work lands; locate the existing
DNS-rebinding text (the lines mentioning "Residual gap: DNS rebinding...") and
edit it accordingly.
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
templates/news/v3/create.html (3)
236-243:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winUse the same http(s) rule for submit validation.
linkUrlValidalready treats onlyhttp:/https:as valid, butprepareSubmit()accepts any absolute URL thatnew URL()can parse. With native validation disabled, schemes likeftp:still pass submit-time checks even though the rest of this form treats them as invalid.Suggested fix
} else { - try { new URL(url); } catch (_) { + try { + const parsed = new URL(url); + if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') { + throw new Error('unsupported scheme'); + } + } catch (_) { this.errors.external_url = 'Please enter a valid URL.'; } }Also applies to: 580-586
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@templates/news/v3/create.html` around lines 236 - 243, linkUrlValid enforces only http/https but prepareSubmit currently accepts any absolute URL; update prepareSubmit to reuse the same validation logic. In prepareSubmit (the submit handler), instead of only trying new URL(externalUrl), call or replicate linkUrlValid's checks: trim externalUrl, ensure it's non-empty, parse with new URL(), and require parsed.protocol === 'http:' || parsed.protocol === 'https:'; if it fails, treat the URL as invalid and block submission. Reference: linkUrlValid, prepareSubmit, externalUrl.
564-615:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winCancel pending autosave timers before clearing drafts.
If the user submits within the 800ms debounce window, the scheduled
save*Draft()callbacks can run afterclear*Draft()and recreate stale drafts inlocalStorage. That makes old content reappear on the next visit even though the submit path was successful.Suggested fix
} else { // Validation passed; the form will POST — drop the saved drafts. // (On a server-side reject, the description re-initializes from // `form.summary.value` so the user's text stays visible.) + clearTimeout(this._titleSaveTimer); + clearTimeout(this._externalUrlSaveTimer); + clearTimeout(this._contentSaveTimer); + clearTimeout(this._descriptionSaveTimer); + clearTimeout(this._linkDescriptionSaveTimer); this.clearTitleDraft(); this.clearExternalUrlDraft(); this.clearContentDraft(); this.clearDescriptionDraft(); this.clearLinkDescriptionDraft();🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@templates/news/v3/create.html` around lines 564 - 615, prepareSubmit currently clears drafts (clearTitleDraft, clearExternalUrlDraft, clearContentDraft, clearDescriptionDraft, clearLinkDescriptionDraft) but does not cancel pending debounced autosave callbacks, allowing save*Draft timers to run after clearing and recreate stale drafts; before calling the clear*Draft methods, cancel any pending autosave timers (e.g. clearTimeout on whatever timeout IDs the debounced save handlers store, or call a new this.cancelPendingAutosaveTimers() helper that clears those timeout IDs and/or aborts in-flight saves), and if such timeout IDs or a cancel helper do not exist, add them to the corresponding saveTitleDraft/saveContentDraft/saveExternalUrlDraft/saveDescriptionDraft/saveLinkDescriptionDraft implementations so prepareSubmit can reliably cancel outstanding autosave callbacks before clearing drafts.
138-165:⚠️ Potential issue | 🟠 Major | ⚡ Quick winPersist video descriptions too.
This textarea is active for both
videoandlink, but the draft helpers only save/restore whenthis.isLinkis true. For video posts the field shows a “Saved” state without ever writing tolocalStorage, so the description is lost on refresh.Suggested fix
- // ── Link description draft: localStorage, keyed per post type ───────── + // ── Link/Video description draft: localStorage, keyed per post type ─── linkDescriptionDraftKey() { return 'boost:createPost:linkDescriptionDraft:' + (this.postType || ''); }, @@ saveLinkDescriptionDraft() { - if (!this.isLink) return; + if (!this.isLinkType) return; try { if (this.linkDescription) localStorage.setItem(this.linkDescriptionDraftKey(), this.linkDescription); else localStorage.removeItem(this.linkDescriptionDraftKey()); } catch (_) {} }, restoreLinkDescriptionDraft({ allowEmpty = false } = {}) { - if (!this.isLink) { + if (!this.isLinkType) { if (allowEmpty) this.linkDescriptionSaveStatus = ''; return; }Also applies to: 460-498
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@templates/news/v3/create.html` around lines 138 - 165, The description textarea is used for both video and link posts (x-model="linkDescription") but the draft save/restore logic and "saved" indicator only run when this.isLink, so video descriptions never get written to localStorage; update the draft helpers to run for the same condition that shows the textarea (use isLinkType or include isVideo) and/or rename/refactor onLinkDescriptionEdited to a generic onDescriptionEdited that saves/restores linkDescription for both video and link posts; also adjust any checks around generateLinkDescription() usage so only the auto-generate button remains link-only but persistence uses the unified condition (isLinkType || isVideo or a single isDescriptionVisible flag) to ensure video descriptions are persisted and restored.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@templates/news/v3/create.html`:
- Around line 236-243: linkUrlValid enforces only http/https but prepareSubmit
currently accepts any absolute URL; update prepareSubmit to reuse the same
validation logic. In prepareSubmit (the submit handler), instead of only trying
new URL(externalUrl), call or replicate linkUrlValid's checks: trim externalUrl,
ensure it's non-empty, parse with new URL(), and require parsed.protocol ===
'http:' || parsed.protocol === 'https:'; if it fails, treat the URL as invalid
and block submission. Reference: linkUrlValid, prepareSubmit, externalUrl.
- Around line 564-615: prepareSubmit currently clears drafts (clearTitleDraft,
clearExternalUrlDraft, clearContentDraft, clearDescriptionDraft,
clearLinkDescriptionDraft) but does not cancel pending debounced autosave
callbacks, allowing save*Draft timers to run after clearing and recreate stale
drafts; before calling the clear*Draft methods, cancel any pending autosave
timers (e.g. clearTimeout on whatever timeout IDs the debounced save handlers
store, or call a new this.cancelPendingAutosaveTimers() helper that clears those
timeout IDs and/or aborts in-flight saves), and if such timeout IDs or a cancel
helper do not exist, add them to the corresponding
saveTitleDraft/saveContentDraft/saveExternalUrlDraft/saveDescriptionDraft/saveLinkDescriptionDraft
implementations so prepareSubmit can reliably cancel outstanding autosave
callbacks before clearing drafts.
- Around line 138-165: The description textarea is used for both video and link
posts (x-model="linkDescription") but the draft save/restore logic and "saved"
indicator only run when this.isLink, so video descriptions never get written to
localStorage; update the draft helpers to run for the same condition that shows
the textarea (use isLinkType or include isVideo) and/or rename/refactor
onLinkDescriptionEdited to a generic onDescriptionEdited that saves/restores
linkDescription for both video and link posts; also adjust any checks around
generateLinkDescription() usage so only the auto-generate button remains
link-only but persistence uses the unified condition (isLinkType || isVideo or a
single isDescriptionVisible flag) to ensure video descriptions are persisted and
restored.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 7e11b967-ab55-4876-ab66-5cd84e231215
📒 Files selected for processing (2)
news/tasks.pytemplates/news/v3/create.html
🚧 Files skipped from review as they are similar to previous changes (1)
- news/tasks.py
Issue: #2451
Summary & Context
Adds an Auto-Generate Description flow to the Link post type for any public web URL: the backend fetches the page, extracts the main article text (title + body) with trafilatura, and runs it through the same summarizer used by Blog/News so the user gets an editable draft instead of a blank Description field.
localhost:8000/news/add(select Link as the post type)Changes
POST /v3/news/generate-link-description/that accepts any public http(s) URL, fetches the page (10s timeout) through an SSRF-guarded helper, extracts title + body, and returns a summarized description as JSON. Three distinct failure modes (invalid/non-public URL → 400, fetch/extraction failure → 502, summarization failure → 502) map to separate inline messages.extract_articleinnews/helpers.py: trafilatura isolates the main article and strips boilerplate (navigation, footers, comments, ads), keeping the summarizer focused and cheaper than feeding it the whole page. Falls back to a naive visible-text dump (extract_content) when trafilatura can't parse a page, and returns("", "")only when even the fallback is empty. (Replaces the previous cppalliance.org-onlyextract_cppalliance_postCSS selectors.)safe_getinnews/helpers.py, used by both the endpoint and the backgroundset_summary_for_link_entrytask: rejects non-http(s) schemes and hosts resolving to loopback/private/link-local/reserved IPs (e.g.127.0.0.1,10.x,169.254.169.254), and follows redirects manually so each hop's host is re-validated.set_summary_for_link_entrytask now uses trafilatura extraction + the SSRF guard, and skips gracefully (logs and returns) when no readable text can be extracted.descriptionfield is persisted toLink.summaryon submit (same path Blog/News already use), so the generated text is available immediately on the post page without waiting for thesummary_dispatchertask.safe_get, which blocks loopback/private/link-local/reserved targets (including the cloud-metadata address169.254.169.254) and re-validates each redirect hop. Known residual gap: DNS rebinding (a host that resolves to a public IP at check time and a private one at connect time) is not closed — documented insafe_get; fully closing it would require pinning the validated IP into the actual connection.generate_descriptionendpoint — flagged in aNOTE:innews/views.py). Both should get@login_required+ throttling before this is exposed externally — more important now that the fetch target is fully user-controlled.developincludes that PR's work; review against the parent branch (javiercoronarv/2428-ai-assisted-desc-blog-news) for a clean view of the changes unique to this PR.Screenshots
Persistence and Functionality
Functionality.Persistence.mov
Dimensionality and Dark/Light Modes
Dimensionality.DarkLightMode.mov
Self-review Checklist
Frontend
Summary by CodeRabbit
New Features
Improvements
Chores