Support Gateways, Simplify subdomain logic and allow just using --site or DD_SITE (breaking: removes --subdomain)#576
Support Gateways, Simplify subdomain logic and allow just using --site or DD_SITE (breaking: removes --subdomain)#576platinummonkey wants to merge 10 commits into
--site or DD_SITE (breaking: removes --subdomain)#576Conversation
Fixes issue #574 and cleans up the SAML/SSO flag surface: - Remove the `--subdomain` login-only flag. Users now pass the full vanity host directly: `pup auth login --site mycompany.datadoghq.com` (or `DD_SITE=mycompany.datadoghq.com pup ...` for API calls). - Introduce site classification: `KNOWN_SITES` enumerates the canonical Datadog endpoints (datadoghq.com, us3/us5/ap1/ap2, datadoghq.eu, ddog-gov.com, datad0g.com staging). Canonical sites keep their `api.`/`app.` prefix behavior unchanged. Everything else is treated as a **literal host** used verbatim for both API requests and OAuth flows. - `DcrClient` now derives `api_host`/`auth_host` via the shared free functions (`api_host_for`/`auth_host_for`) so literal hosts are honored in DCR registration, token exchange, and the authorize URL — fixing the token-refresh path through a gateway too. - `make_dd_config` routes literal hosts through the SDK's mock-style template (server_index=1, `{protocol}://{name}`) so the SDK targets the host verbatim without prepending `api.`. Canonical sites use index 2 (no enum restriction, same `https://api.{site}` template). - `set_site_explicit` and `from_env` validate the normalized site with `validate_site`, which rejects URL-smuggling chars (`/`, `#`, `@`, etc.) while permitting DNS hostnames with an optional `:port` suffix. - Fix TROUBLESHOOTING.md: document `DD_SITE` as the override mechanism (with literal-host and port examples); note `DD_HOST` is not recognized. Behavior changes: - Vanity `mycompany.datadoghq.com` was routed to `api.mycompany.datadoghq.com`; now used verbatim (app. and api. are the same Datadog infrastructure). - The old `--site datad0g.com --subdomain dd` becomes `--site dd.datad0g.com`. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- Use cfg.auth_host() in acp.rs and bits.rs instead of hardcoding
format!("https://app.{}", cfg.site"), so vanity/literal hosts are
respected by the ACP and Bits commands.
- Fix set_site_explicit: validate raw input before normalize_site so
an empty --site "" is rejected rather than silently falling through to
"datadoghq.com" via normalize_site's empty-string fallback.
- Add explanatory comment on api_base_url documenting the intentional
asymmetry with api_host for PUP_MOCK_SERVER (full URL vs. bare host).
- Add tests: set_site_explicit rejects smuggling values and empty string
(site/site_explicit unchanged on failure); from_env rejects invalid
DD_SITE; DcrClient api_host/auth_host fields verified for canonical,
vanity, gateway, and staging sites.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ble bleed
Address issues found in the second review pass:
- Move validate_site to cover all site sources in from_env (explicit,
session-derived, and default) instead of only the explicit DD_SITE/
config-file branch — prevents a tampered sessions file from routing
to an attacker-controlled host without any validation.
- Add validate_site with warn-and-skip in apply_org_override so
session-derived sites are also checked when --org resolves a site
from the session registry.
- Fix literal-host make_dd_config branch: remove the "site" server
variable (set by Configuration::new() from DD_SITE env) before
switching to server_index=1 so it cannot bleed into the
{protocol}://{name} template. New test assertions confirmed the bug.
- Update OAUTH2.md: replace stale --subdomain example with --site
literal-host form.
- Add from_params comment explaining the trusted browser caller context
(why validate_site is intentionally omitted there).
- Add is_canonical_site comment documenting the oncall substring
convention and its effect on prefix handling.
- Add tests: whitespace-only DD_SITE behavior, site variable absence
for both literal-host make_dd_config paths.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… test
- Change apply_org_override to Result<()>: bail on invalid session site
rather than warn-and-skip, matching from_env's behavior and preventing
silent routing to the wrong (or attacker-controlled) endpoint when the
sessions file is tampered. Update all callers (main.rs, extensions/mod.rs)
with ? propagation; apply_to also becomes Result<()>.
- Reject whitespace-only DD_SITE in from_env (consistent with set_site_explicit
which already trims-and-rejects): without this, DD_SITE=" " would silently
normalize to "datadoghq.com" with site_explicit=true, blocking --org from
correcting the site. Update test from "falls back" to "rejects".
- Add apply_org_override negative test: saves a malformed session site and
asserts the function errors and leaves cfg.site unchanged.
- Add test_make_dd_config_literal_host_dd_site_env_does_not_bleed: sets
DD_SITE in the environment before calling make_dd_config with a literal
host, confirming the server_variables.remove("site") actually fires (the
two prior tests called remove_var first, making the assertion vacuous).
- Add debug_assert on server_variables.remove("site") return value to catch
a future SDK version that stops pre-populating this key.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
🐑 PR Shepherd is maintaining this PRI watch your PR and automatically fix CI failures, rebase your branch, handle flaky tests, and push it to the merge queue when it's ready. More about what I do → Guide To pause me on this PR, add the |
This comment has been minimized.
This comment has been minimized.
--site or DD_SITE (breaking: removes --subdomain)
CI clippy (Rust 1.95) flags `get("site").is_none()` in the make_dd_config
tests. Replace with `!contains_key("site")` as the lint suggests; no
behavior change.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
I'm working on reviewing this. So far I ran and tested it locally, and everything works, but it doesn't send all requests to the custom site, which I think was a main goal (besides simplification). We'll need to update README.md:260-263 which still reflects the old way. I'm looking into if there is a good way to get all traffic to go through the specified host. |
|
Key question seems to be what you asked in #574, so let's wait a bit to see? Independent of that particular user request, this does seem to simplify things, so I'm also OK with shipping this and following up about what they really want. This seems to work fine for regular Datadog subdomains (custom.datadoghq.com), but I'm not so sure about fully custom proxies (datadog-proxy.company.internal) Which one(s) are we trying to support... |
Simplify the site/host model so one recorded `site` (from --site/DD_SITE)
drives everything, with derivation kept only where Datadog's topology
genuinely requires it:
- make_dd_config: replace the three-way server_index branch (mock /
literal-host index 1 with the server_variables.remove("site") guard /
canonical index 2) with a single path that routes the SDK at
cfg.api_base_url() — split on "://" into protocol+name, always
server_index=1. api_base_url() already encapsulates the mock override,
api.{site} derivation for canonical sites, and verbatim literal hosts.
Drop the now-dead Config::is_literal_host().
- Fix gateway login site: resolve_effective_site() keeps the user's
--site/DD_SITE host for verbatim hosts (vanity/gateway/oncall) instead
of adopting the OAuth callback's reported backend region, so the
exchanged token, saved session, and DCR credential all stay keyed to
the single host the user pinned. Canonical Datadog sites still adopt
the callback region (datadoghq.com -> us3.datadoghq.com).
- Extract the shared uses_datadog_subdomains() predicate so api_host_for,
auth_host_for, and resolve_effective_site classify hosts identically —
fixing an oncall inconsistency and preventing future drift.
Tests rewritten to assert the resolved host (api.datadoghq.com,
verbatim gateway/vanity/oncall) rather than SDK internals, plus a
scheme-less mock case and resolve_effective_site coverage for canonical,
gateway, vanity, and oncall inputs.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Harden a pre-existing trust gap in the login flow: the OAuth callback's `domain` (attacker-influenceable via the stdin-paste fallback) was used as the token-exchange host and storage key with no validation. On a canonical-site login a tampered `domain=evil.com` would send the auth code, PKCE verifier, and DCR client credentials to that host. resolve_effective_site now adopts the callback domain only when it is a Datadog-owned host (new config::is_datadog_owned_host), otherwise falling back to the user's already-validated pinned site. The check is a suffix-based allowlist over the Datadog parent domains (*.datadoghq.com, *.datadoghq.eu, *.ddog-gov.com, *.datad0g.com) so new regions keep working without a code change. A plain syntactic check (validate_site) was insufficient — a well-formed evil.com passes it. Tests: is_datadog_owned_host (parent + subdomain of each base owned; foreign/look-alike/smuggling rejected); resolve_effective_site rejects a well-formed foreign callback domain, not just malformed ones. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The suffix-only ownership check was bypassable: a callback domain like
`evil.com/x.datadoghq.com` survives strip_suffix("datadoghq.com") as
"evil.com/x." (ends with '.'), so it passed the allowlist — yet parses as
the URL host `evil.com` with the rest as path, re-opening the exact
credential-exfiltration vector. Folded validate_site into
is_datadog_owned_host as a mandatory first gate so the check is
self-contained: anything that isn't a bare DNS hostname (contains /, @,
#, whitespace, schemes) is rejected before the suffix allowlist runs.
Tests: added the smuggling-prefix payloads (evil.com/x.datadoghq.com,
foo@evil.com/.datadoghq.com, evil.com#x.datadoghq.com) to both the
is_datadog_owned_host and resolve_effective_site rejection cases.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
this PR is designed around |
Summary
Removes the
--subdomainlogin flag and makes a singlesite(from--site/DD_SITE) the one host pup records. Canonical Datadog sites still deriveapi./app.subdomains; everything else (vanity domains, custom gateways) is used verbatim for API requests and OAuth. Fixes #574.Changes
--subdomain; pass the full host directly (pup auth login --site mycompany.datadoghq.com).KNOWN_SITESmarks canonical Datadog endpoints (which keepapi./app.derivation); anything else is a literal host used verbatim (src/config.rs,src/commands/auth.rs,src/main.rs).make_dd_configroutes the SDK atcfg.api_base_url()for every case (mock, canonical, literal), replacing the prior three-wayserver_indexbranch (src/client.rs).uses_datadog_subdomains()is the one predicate behindapi_host_for/auth_host_forand login site resolution, so the API host, auth host, and token-exchange host can't diverge (src/config.rs).domainis ignored so the token, session, and DCR credential stay keyed to the user's host. For canonical sites the callback region is adopted only if it's a Datadog-owned host (*.datadoghq.com,*.datadoghq.eu,*.ddog-gov.com,*.datad0g.com) — closing a pre-existing gap where an attacker-influenceable callback could redirect the code exchange (src/commands/auth.rs,src/config.rs).validate_site(rejects URL-smuggling, allows a DNS host + optional:port); docs updated for theDD_SITEliteral-host form (docs/OAUTH2.md,docs/TROUBLESHOOTING.md).Behavior changes
mycompany.datadoghq.comis now used verbatim instead of being routed toapi.mycompany.datadoghq.com.--site datadoghq.com --subdomain mycompanybecomes--site mycompany.datadoghq.com.Testing
cargo test --bin pup -- --test-threads=1(1321 pass),cargo clippy -- -D warnings,cargo fmt --checkCloses #574
🤖 Generated with Claude Code