ci: install python3-dev so PyCall can find libpython #64

Draft
ChrisRackauckas-Claude wants to merge 2 commits into SciML:master from ChrisRackauckas-Claude:fix-pycall-libpython-ci
@ChrisRackauckas-Claude ChrisRackauckas-Claude commented Jun 15, 2026

Contributor

Status: diagnosis update — python3-dev is correct but does NOT make CI green

Investigation (see ChrisRackauckas/InternalJunk#52 for full evidence) shows the real blocker is the runner fleet, not python3-dev.

Real root cause

The centralized grouped-tests.yml@v1 / downgrade.yml@v1 provision the Python stack via apt-packages, which the reusable workflow installs with sudo apt-get update && sudo apt-get install. The ubuntu-latest label is answered by BOTH:

  • ephemeral autoscaled ...-cxnps-runner-* runners (which have passwordless sudo) and arctic1-* runners: apt works on both, and
  • persistent demeter4-* runners, which have no passwordless sudo, so the apt step dies with:
    sudo: a terminal is required to read the password ...
    sudo: a password is required
    ##[error]Process completed with exit code 1
    

So whether a leg passes is a non-deterministic dice-roll on runner assignment. On this PR head, all 3 failing legs (Downgrade Core, QA julia 1, QA lts) landed on demeter4-12/demeter4-24 and died at sudo apt-get; the passing Core legs landed on arctic1-6 / a cxnps ephemeral runner. Master 073f2cd is red for the same reason.
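
The split between the two runner classes could be detected before the apt step with a preflight probe. The following is a minimal sketch, not part of this PR or of the centralized workflows; the function name `probe_sudo` is hypothetical. It relies on `sudo -n`, which never prompts and exits non-zero when a password would be required.

```shell
#!/usr/bin/env bash
# Hypothetical preflight probe (not part of grouped-tests.yml or downgrade.yml).
# `sudo -n` is non-interactive: it fails instead of asking for a password.
probe_sudo() {
  # With no arguments, probe the real sudo; passing a command lets a stub
  # stand in for sudo when trying the function out.
  if [ "$#" -eq 0 ]; then
    set -- sudo -n true
  fi
  if "$@" >/dev/null 2>&1; then
    echo "passwordless-sudo"
  else
    echo "no-passwordless-sudo"
  fi
}

# Report which class of runner this job landed on.
probe_sudo
```

Run as an early job step, this would report the runner class up front, so a failure reads as a runner-assignment problem rather than a package problem.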

The container: alternative is also broken on the persistent runners: those runner users have no Docker socket access (permission denied ... unix:///var/run/docker.sock). FEniCS.jl (which uses container:) is red on every recent run for exactly this reason.
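
A similar check can tell whether the runner user can reach the Docker socket at all. This is a sketch under the assumption that the daemon listens on the default `/var/run/docker.sock` path quoted in the error above; the function name `docker_socket_status` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify access to the Docker daemon socket.
docker_socket_status() {
  # $1: socket path; defaults to the path from the error message above.
  sock="${1:-/var/run/docker.sock}"
  if [ ! -S "$sock" ]; then
    # Path is absent or is not a Unix socket.
    echo "missing"
  elif [ -r "$sock" ] && [ -w "$sock" ]; then
    # Talking to the daemon needs read and write access to the socket.
    echo "accessible"
  else
    echo "permission-denied"
  fi
}

docker_socket_status
```

On the persistent runners described above, this would be expected to print `permission-denied`, matching the `container:` failure.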

The pre-migration CI sidestepped all of this with PYTHON: '' on buildpkg → PyCall built against Conda.jl's Python → pyimport_conda provisioned scipy in userspace via Conda (no sudo, no docker). The @v1 reusable workflows expose no env / build-env / PYTHON hook, so that userspace route is not expressible by a caller. (And on Julia 1.10 the Conda route additionally hits a libstdc++ CXXABI_1.3.15 clash with current conda-forge scipy.)
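
For reference, the userspace route amounts to building PyCall with an empty `PYTHON` variable. The sketch below is illustrative only: the helper name `build_pycall_with_conda` is hypothetical, and it assumes `julia` is on `PATH` and the package project is the current directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper reproducing the pre-migration build environment.
# $1 (optional): the Julia executable; defaults to `julia` on PATH.
build_pycall_with_conda() {
  # An empty PYTHON makes PyCall's build use Conda.jl's private Python
  # instead of the system python3, so no sudo or Docker is involved.
  PYTHON='' "${1:-julia}" --project=. -e 'using Pkg; Pkg.build("PyCall")'
}

# Usage (not executed here, since it needs a Julia installation):
#   build_pycall_with_conda
```

The point is only that the setting is a single environment variable on the build step, which the @v1 workflows currently give a caller no way to pass.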

What this PR does

Keeps the legitimate python3-dev addition (it fixes the "Couldn't find libpython" build error on the legs that can run apt, and is a no-op where already present). It does not, on its own, make CI deterministically green — that requires a fleet/centralized-workflow fix (see below). Not faking a pass.
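
Whether a runner already has a libpython shared object on disk can be checked directly. This is a rough sketch: the function name `find_libpython` is hypothetical, and the default search directories are an assumption about typical Debian/Ubuntu layouts rather than anything PyCall itself consults.

```shell
#!/usr/bin/env bash
# Hypothetical helper: look for a libpython3 shared object on disk.
find_libpython() {
  # Default directories are an assumption; pass explicit ones to override.
  if [ "$#" -eq 0 ]; then
    set -- /usr/lib /usr/lib64 /usr/local/lib
  fi
  # Print the first match, or nothing if no shared object is found.
  find "$@" -name 'libpython3*.so*' 2>/dev/null | head -n 1
}

if [ -n "$(find_libpython)" ]; then
  echo "libpython found: $(find_libpython)"
else
  echo "no libpython shared object found"
fi
```

An empty result before installing python3-dev and a path afterwards would confirm the package supplies what the build step was missing.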

Correct fix (out of this repo's config-only scope)

Either/both, tracked in InternalJunk#52:

  1. Give persistent demeter*/arctic* runners passwordless sudo OR Docker-socket access.
  2. Add a build-env hook to the centralized tests.yml/downgrade.yml (e.g. set PYTHON: '' on buildpkg, optionally prepend the conda libstdcxx for Julia ≤1.10) so Python-stack packages can use the sudo/docker-free userspace-Conda provisioning the pre-migration CI relied on.

Local verification (this machine)

  • PYTHON='' + Conda route: using SciPyDiffEq (full __init__/pyimport_conda codepath) succeeds on Julia 1.12, sudo-free/docker-free. On Julia 1.10 it fails with conda-forge scipy 1.17 needing CXXABI_1.3.15 (Julia 1.10's bundled libstdc++ tops out at 1.3.14); LD_PRELOAD of the conda env's libstdcxx fixes it, but there's no workflow hook to set it.
  • apt route (debian scipy 1.11.4) imports scipy.integrate cleanly on Julia 1.10 (no CXXABI issue); verified end-to-end (using SciPyDiffEq + buildpkg) inside ubuntu:24.04 and cimg/base:current containers. Its only failure mode is sudo on demeter.
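
The CXXABI mismatch can be checked without loading anything, by testing whether a given libstdc++ defines the required version tag. This is a sketch only: the function name `has_cxxabi` is hypothetical, and the library path in the usage comment is a placeholder. It relies on version names being stored as NUL-terminated strings inside the shared object.

```shell
#!/usr/bin/env bash
# Hypothetical helper: does this libstdc++ define a given CXXABI version?
# $1: path to a libstdc++ shared object; $2: version, e.g. 1.3.15
has_cxxabi() {
  # Split the binary on NUL bytes so each embedded string is its own line,
  # then require an exact whole-line match (so 1.3.1 does not match 1.3.14).
  tr '\0' '\n' < "$1" | grep -a -F -x -q "CXXABI_$2"
}

# Usage (the path is a placeholder, not a real location):
#   has_cxxabi /path/to/libstdc++.so.6 1.3.15 && echo ok || echo "too old"
```

Run against Julia 1.10's bundled libstdc++ and against the conda environment's copy, this should fail for the former and succeed for the latter if the diagnosis above holds.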

Please ignore until reviewed by @ChrisRackauckas.

🤖 Generated with Claude Code

ChrisRackauckas and others added 2 commits June 15, 2026 10:46
The grouped-tests.yml migration dropped the previous CI's
`PYTHON: ''` build env, so PyCall now builds against the runner's
system python3 instead of its bundled Conda python. On the Ubuntu
Noble runners the system python3 (3.12) has no libpython shared
object on disk unless a dev package pulls it in, so the buildpkg
step fails with "Couldn't find libpython" and every Core/QA job
errors before tests run.

Add python3-dev to apt-packages: it transitively installs the
libpython that PyCall's build dlopens, alongside the existing
python3-scipy. Verified locally: with PYTHON=python3, PyCall builds
against system python3 and the QA group (Aqua + JET) passes 12/12.

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Verify the scoped GitHub-hosted runner routing (SciML/.github #97) now
sends this repo's apt-packages/container legs to ubuntu-24.04.

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

2 participants