[PowerPC][AIX] Support #pragma comment copyright for AIX #6

Closed
tonykuttai wants to merge 1779 commits into main from tvarghese/pragma-copyright-comment-update

@tonykuttai

Owner

[PowerPC][AIX] Support #pragma comment copyright for AIX

  • Emit !aix.copyright.comment from Clang for the pragma.
  • Lower it in LLVM to a TU-local string + llvm.used + !implicit.ref.
  • Add module-import and backend relocation tests.
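
At the source level, the feature might be used roughly as in the sketch below. The pragma spelling follows the IBM XL convention and is an assumption here, as are the copyright string and the `buildTag` function, which is purely illustrative. The `_AIX` guard keeps the snippet inert on other targets.

```cpp
// Hypothetical translation unit carrying a copyright comment on AIX.
// The pragma spelling is assumed from the IBM XL convention.
#if defined(_AIX)
#pragma comment(copyright, "(C) Example Corp. 2026")
#endif

// Illustrative function so the translation unit contains some code.
int buildTag() { return 6; }
```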

[Clang][Modules] Fix -Wunused-variable (llvm#196577)

Mark some variables [[maybe_unused]] and inline others that do not have
side effects to avoid -Wunused-variable in non-assert builds.
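
A minimal sketch of the two techniques, using hypothetical names: a variable that is read only inside `assert()` becomes unused once `NDEBUG` is defined, so it is either marked `[[maybe_unused]]` or, when computing it has no side effects, folded into the assertion itself.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: Count is read only by the assert, so a build with
// NDEBUG defined would otherwise warn about an unused variable.
int sumAll(const std::vector<int> &Values) {
  [[maybe_unused]] const std::size_t Count = Values.size();
  assert(Count > 0 && "expected at least one value");

  // Alternative: inline a side-effect-free expression into the assert.
  assert(!Values.empty() && "expected at least one value");

  int Sum = 0;
  for (int V : Values)
    Sum += V;
  return Sum;
}
```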

[AArch64][GlobalISel] Legalize F64 to BF16 fptruncates (llvm#196077)

This two-step expansion of the bf16 fptrunc needs to be careful to
avoid double-rounding errors. On AArch64 we can apparently first convert
with fcvtxn, which performs round-to-odd, and then apply a standard fp
truncate to bf16 so that the final rounding is done correctly. This
reuses the existing lowering added for vector operations.
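
The double-rounding hazard can be illustrated with a small software emulation. This is only a sketch: the helper names are hypothetical, NaN handling is omitted, and round-to-odd is emulated with bit manipulation rather than with the hardware instruction.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Reinterpret a float as its IEEE-754 bit pattern.
static std::uint32_t bitsOf(float F) {
  std::uint32_t B;
  std::memcpy(&B, &F, sizeof(B));
  return B;
}

// f32 -> bf16 with round-to-nearest-even (NaN handling omitted).
static std::uint16_t f32ToBF16(float F) {
  std::uint32_t B = bitsOf(F);
  B += 0x7FFFu + ((B >> 16) & 1u);
  return static_cast<std::uint16_t>(B >> 16);
}

// f64 -> f32 with round-to-odd, emulated in software: truncate toward
// zero, then force the lowest mantissa bit to 1 if the result is inexact.
static float f64ToF32RoundToOdd(double D) {
  float F = static_cast<float>(D); // default rounding: nearest-even
  if (std::isnan(D) || static_cast<double>(F) == D)
    return F; // exact (or NaN): nothing to adjust
  std::uint32_t B = bitsOf(F);
  if (std::fabs(static_cast<double>(F)) > std::fabs(D))
    --B;   // step the magnitude back toward zero
  B |= 1u; // sticky bit: record that the value was inexact
  std::memcpy(&F, &B, sizeof(F));
  return F;
}

// Naive two-step conversion: both steps round to nearest-even.
std::uint16_t naiveF64ToBF16(double D) {
  return f32ToBF16(static_cast<float>(D));
}

// Two-step conversion with round-to-odd in the first step.
std::uint16_t safeF64ToBF16(double D) {
  return f32ToBF16(f64ToF32RoundToOdd(D));
}
```

For D = 1 + 2^-8 + 2^-30, which lies just above the midpoint between two adjacent bf16 values, the naive path first rounds to exactly the midpoint and then ties to even (0x3F80), while the round-to-odd path yields the correctly rounded 0x3F81.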

[SLP][NFC]Add a test with the revectorization of the struct-returning intrinsics

Reviewers:

Pull Request: llvm#196581

[AMDGPU] Add missing CMake link component (llvm#196579)

The issue was triggered by llvm#196547.

[SLP] Vectorize struct-returning intrinsics

Allow SLP to combine calls across lanes that return a literal struct
(llvm.sincos, llvm.*.with.overflow, llvm.frexp, ...) into a single
call returning a struct of vectors, by widening each element type of
{T, T, ...} to its vector form via VectorTypeUtils and emitting
extractvalue + extractelement for external uses.

Reviewers: hiraditya, bababuck, RKSimon

Pull Request: llvm#195521

[PowerPC][NFC]Refactor EmitInstrWithCustomInserter (llvm#196114)

Currently PPCTargetLowering::EmitInstrWithCustomInserter() uses a large
if/else-if structure. Update to use switch and
move ATOMIC_CMP_SWAP and SELECT code to helper functions for better
readability and maintenance.

[AMDGPU] Pre-commit unit test for RP tracking reset/advance inconsistencies fix (llvm#196098)

This adds a new AMDGPU unit test file for testing the behavior of
GCNRPTracker and its related classes. The two tests showcase confusing
return-value and behavioral semantics for variants of the advance and
reset functions, which will be clarified in a follow-up commit.

Revert "[SLP] Vectorize struct-returning intrinsics"

This reverts commit b0c6df7 to fix
buildbots https://lab.llvm.org/buildbot/#/builders/52/builds/17118

Reviewers:

Pull Request: llvm#196591

[OFFLOAD][L0] Fix incorrect values in the Level Zero cached header (llvm#196587)

The current ZE_STRUCTURE_TYPE_DEVICE_IP_VERSION_EXT and
ZE_STRUCTURE_TYPE_RELAXED_ALLOCATION_LIMITS_EXP_DESC values are
incorrect as seen here:
* https://github.com/oneapi-src/level-zero/blob/0f246f6edf90d56604f00f83b41d783dc6a9394e/include/ze_api.h#L318
* https://github.com/oneapi-src/level-zero/blob/0f246f6edf90d56604f00f83b41d783dc6a9394e/include/ze_api.h#L324

clang: Consolidate -aux-triple handling (llvm#196551)

All of the offload languages were essentially doing the
same thing, with overcomplicated conditions that depended on
the language.

[flang][OpenACC] support collapse on unstructured acc.loop (llvm#196174)

PR llvm#164992 added unstructured-loop support to OpenACC lowering (no
bounds on acc.loop, IVs privatized, body emitted as explicit cf), but it
didn't cover the collapse(N) case. Compiling

  !$acc parallel loop collapse(2)
  do j = 1, n
    do i = 1, n
      if (i == jdiag) then
        a(i,j) = 0.0d0
        cycle
      end if
      a(i,j) = real(i + j, 8)
    end do
  end do

asserted in MLIR's runRegionDCE: "Assertion `mightHaveTerminator()'
failed".

Root cause: visitLoopControl unconditionally marked every inner DO of a
collapsed nest via markDoConstructAsCollapsed. genFIR(DoConstruct) then
read that marking and skipped the inner DO's loop machinery on the
assumption that the parent acc.loop iterates and supplies the IV via a
block argument. That assumption holds for the structured case, but not
for the unstructured case added in llvm#164992. Skipping it left the
PFT-pre-allocated scaffold blocks (pre-header, header, exit) without
terminators.

Fix: add a markInnerCollapsed parameter (default true) to
visitLoopControl, and pass false from privatizeInductionVariables (the
unstructured case of buildACCLoopOp).

Assisted-by: AI

[flang][OpenMP] Fix component-level initializer in declare reduction (llvm#195751)

When a declare reduction initializer uses a component assignment such as
initializer(omp_priv%member = 0), the lowering would store the scalar
RHS value (i32) directly to the whole derived-type reference, causing a
FIR verification error: 'fir.store' op store value type must match memory reference type.

The root cause is that MakeEvaluateExpr extracts only the RHS expression
from the AssignmentStmt, discarding the LHS component information. The
lowering callback then returns this scalar value, which gets stored to the
wrong type.

Fix this by mirroring the approach already used for combiner expressions:
pass the parser-level OmpStylizedInstance to processInitializer so the
callback can access the typed assignment and lower the full assignment
(both LHS and RHS), correctly handling component designators, function
calls on the RHS, and user-defined assignment.

Fixes llvm#184927 (with-initializer part; the without-initializer case
remains unsupported).

Assisted-by: Claude Opus 4.6.

Co-authored-by: Matt P. Dziubinski <matt-p.dziubinski@hpe.com>

[compiler-rt][profile][NFC] Introduce INSTR_PROF_INSTRUMENT_GPU_FUNC macro (llvm#196538)

Add a macro INSTR_PROF_INSTRUMENT_GPU_FUNC for the name of the GPU
profiling function __llvm_profile_instrument_gpu (added in llvm#187136),
following the same pattern as INSTR_PROF_VALUE_PROF_MEMOP_FUNC. Use the
macro in both the declaration in InstrProfiling.h and the definition in
InstrProfilingPlatformGPU.c.

This prepares the upcoming HIP/AMDGPU offload PGO patch (llvm#177665) to use
the same macro when calling this function.

[AMDGPU] Add subtarget features for MAD NC and 64-bit MIN/MAX instructions (llvm#196326)

[InstCombine][NFC] Replace buildAssumeFromKnowledge with CreateAlignmentAssumption (llvm#196254)

[DWARFLinker] Emit .debug_names entries for DW_TAG_template_alias (llvm#196440)

The tag was missing from the accelerator-records saver's switch, so
template alias DIEs were skipped and --verify-dwarf=output rejected the
result.

[lldb] Fix TestPtrauthBRKc47xX16Invalid.py (llvm#196408)

LLDB correctly detects the pointer authentication failure.

[lldb] Remove __iter/len__ from SBTypeEnumMember (llvm#196610)

SBTypeEnumMember doesn't have a GetSize and
GetTypeEnumMemberAtIndex, so having __iter__ and __len__ doesn't
make sense. These are on SBTypeEnumMemberList. From the docstrings, it
looks like the extensions were copied from said type.

[CIR] Implement CoawaitExpr for ComplexType (llvm#194027)

Implement CoawaitExpr support for ComplexType

Issue llvm#192331

[cir] fix IR dump comments from llvm#195198 (llvm#196605)

[VPlan] Unify inner and outer loop paths (NFCI). (llvm#192868)

Combine the logic of tryToBuildVPlanWithVPRecipes and
tryToBuildVPlan, as well as planInVPlanNativePath and plan.

This unifies the code paths to construct plans for both inner and outer
loop vectorization, and removes some duplication. It also ensures we run
almost the same VPlan-transformations in both modes. Currently a few
code paths need to be guarded with a check for whether we are dealing
with an inner or an outer loop.

PR: llvm#192868

[gn] port 2e2d90b (llvm#196618)

[gn build] Port 3fe311f (llvm#196619)

[gn build] Port c507e20 (llvm#196620)

[gn build] Port e6efa1a (llvm#196621)

[gn build] Port ebb9a79 (llvm#196622)

[gn] port 7e74c78 (llvm#196624)

[clang][deps] Use ModuleDepCollector for Make output (llvm#182063)

The dependency scanner works significantly differently depending on what
kind of output it's asked to produce. The Make output format has been
using the regular Clang dependency collection mechanism since it was
first implemented. This means the implementation works very differently
to the rest of the scanner and isn't able to turn implicit module
command lines into Makefiles using explicit modules.

This PR unifies the two implementations, using ModuleDepCollector even
for Make output. Emitting explicit module builds into Makefiles will
come in a later PR.

[libc++] Remove _LIBCPP_HIDE_FROM_ABI from <__utility/pair.h> (llvm#196508)

This is a follow-up to llvm#193045. This only drops _LIBCPP_HIDE_FROM_ABI
in a small part of the code base to make sure everything works as
expected. Once this has been in trunk for a while and there aren't any
problems, there will be larger follow-up patches to remove
_LIBCPP_HIDE_FROM_ABI throughout the code base.

[mlir][core] Restore dropped printIR behavior. (llvm#196628)

Restore the check for module scope, which was dropped in llvm#195198.

[VPlan] Fix cyclic phi type inference in early outer loop plans. (llvm#196634)

For phis check if any of the operands are VPIRValues or we already have
cached types. If so, return them.

This fixes a verification stack overflow in the VPlan outer loop path
after llvm#192868.

[DWARFLinker] Deduplicate .debug_frame CIEs across LinkContexts (llvm#195393)

Each LinkContext held its own EmittedCIEs map, so linking the same
object twice (or two objects with identical CIEs) produced one CIE per
LinkContext instead of one shared CIE. Hoist the registry to linker
scope and split emission into three phases so contexts can emit their
frames concurrently while still sharing one deduplicated CIE pool:

  1. Scan (parallel, during link). scanFrameData() records the unique CIEs
    referenced by retained FDEs, in first-reference order, into
    FrameScanResult::CIEs. scanAndUnloadInput() chains the scan in front of
    the existing input-unload so the DWARFContext can be released before the
    post-link emit pass.

  2. Merge (serial, after link completes). registerCIEs() walks each
    context's scanned CIEs in ObjectContexts order and try_emplaces them
    into the linker-wide CIERegistry. The first LinkContext to reference a
    CIE becomes its owner and reserves a local offset in its own
    .debug_frame section; later contexts only learn the owner's section and
    offset.

  3. Emit (parallel). emitDebugFrame() writes each context's owned CIEs
    followed by its FDEs into its own SectionDescriptor. FDE CIE_pointers
    are recorded as DebugOffsetPatches against the owner's section; the
    existing patch resolver rebinds them to OwnerStartOffset + LocalOffset
    when global offsets are assigned. Each task writes only to its own
    section, so no locking is needed.

Output is fully deterministic: ownership assignment, per-context CIE
order, FDE order within a section, and section concatenation order all
depend only on the input, not on thread scheduling. A context's CIEs may
now appear after FDEs (from other contexts) that reference them — DWARF
allows this, and cross-context FDE -> CIE pointers resolve correctly via
the patch mechanism.
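
The serial merge phase can be sketched as follows. This is a simplified model under stated assumptions, not the DWARFLinker code: CIEs are modeled as plain strings, and `Owner` and `registerCIEsSketch` are hypothetical names. The point it illustrates is that ownership depends only on the order of the contexts and of the CIEs within each context.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Which context owns a CIE, and where it sits in that context's section.
struct Owner {
  std::size_t Context;
  std::size_t LocalOffset;
};

// Walk contexts in a fixed order; the first context to reference a CIE
// becomes its owner and reserves space for it in its own section.
std::map<std::string, Owner> registerCIEsSketch(
    const std::vector<std::vector<std::string>> &ScannedCIEs) {
  std::map<std::string, Owner> Registry;
  for (std::size_t Ctx = 0; Ctx < ScannedCIEs.size(); ++Ctx) {
    std::size_t NextOffset = 0;
    for (const std::string &CIE : ScannedCIEs[Ctx]) {
      auto Inserted = Registry.emplace(CIE, Owner{Ctx, NextOffset});
      if (Inserted.second)
        NextOffset += CIE.size(); // only owned CIEs take up local space
    }
  }
  return Registry;
}
```

Because the walk is serial and ordered, running it twice on the same input always yields the same owners and offsets, regardless of how the parallel scan and emit phases were scheduled.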

AMDGPU/GlobalISel: RegBankLegalize rules for cluster_load_b32/b64/b128 (llvm#196186)

AMDGPU/GlobalISel: RegBankLegalize rules for cvt fp8 e5m3 intrinsics (llvm#196369)

[libc] Skip targets with unavailable __ONLY flags (llvm#196637)

When SKIP_FLAG_EXPANSION strips a flag that has the __ONLY modifier,
remove_duplicated_flags drops the flag from the list. This leaves
expand_flags_for_target with an empty flag list, causing it to create a
plain (non-flag) target. The __ONLY semantics, "only build this target
with the flag active", are silently violated.

On x86-64 CI runners without FMA, this results in cosf_float_test and
sinf_float_test being built and linked without FMA. The sincosf
algorithm was tuned assuming fused multiply-add precision, so the
unfused x*y+z fallback exceeds the 3.5 ULP tolerance (57 ULP for cosf,
12 ULP for sinf).

Added an early return in add_target_with_flags: if any flag with the
__ONLY modifier would be skipped, the target is not generated.

Assisted-by: Automated tooling, human reviewed.

[DWARFLinker] Don't duplicate classes with in-class static decls (llvm#196442)

An in-class static declaration was forced to PlainDwarf placement and
cascaded that up to its enclosing class. If the class was already in the
type table via the out-of-line definition's specification, it ended up
with Both placement and cloneDIE emitted two copies. Keep in-class
static declarations in the type table so they stay with their enclosing
type.

[libc] Disable -march=native in CI to fix sccache poisoning (llvm#196560)

-march=native is incompatible with shared build caches because sccache
treats it as a literal string. Object files compiled on one CPU model
get silently served to runners with a different CPU, causing SIGILL
crashes in the opt_host memory tests.

Made LIBC_COMPILE_OPTIONS_NATIVE a CMake cache variable so CI can
override it. Both overlay and fullbuild workflows now pass
-DLIBC_COMPILE_OPTIONS_NATIVE="" to disable -march=native. Local
developer builds are unaffected and still default to -march=native.

Reverted the per-CPU cache key approach from llvm#196477 in favour of this
fix, which addresses the root cause.

Bumped sccache key versions (v2) in both workflows to invalidate the
poisoned caches.

Assisted-by: Automated tooling, human reviewed.

[lldb] Add lldb.summary and lldb.synthetic decorators (llvm#195351)

Adds two new decorators, @lldb.summary and @lldb.synthetic,
analogous to the existing @lldb.command decorator.

@lldb.summary("MyType")
def MyType_summary(valobj, _):
    return "summary string"

@lldb.synthetic("MyContainer")
class MyContainerSynthetic:
    def __init__(self, valobj, _): ...

These decorators result in type summary add and type synthetic add
commands being run.

An additional motivation: these decorators will make it straightforward
to invoke the Python-to-LLDB formatter bytecode compiler
(formatter_bytecode.Compiler), which currently requires command-line
flags to know how to register formatters. With these decorators, the
registration metadata is associated directly with the implementing
function or class.

See the docstrings and formatters.py test fixture for usage examples.

Assisted-by: claude

[X86] combine-add.ll - regenerate to show missing add asm comments (llvm#196647)

[lld][WebAssembly] Remove the experimental warning for PIC/dynamic linking (llvm#196566)

The current dynamic linking support has been used for several years now
in both emscripten and wasi-sdk and is documented at
https://github.com/WebAssembly/tool-conventions/blob/main/DynamicLinking.md.
We did/do have plans to develop another version of the dynamic
linking ABI that doesn't use a global symbol namespace, and that can
still happen, but the current API is clearly production worthy
regardless of future plans.

This change removes the linker warning and the corresponding
--experimental-pic flag.

If we still want to make breaking changes to the dylink format, we can
rename the dylink.1 section (which already contains a version number).

This change leads the way for enabling shared libraries by default in
emscripten.

[flang][cuda] Widen stream argument to i64 in stream intrinsic lowering (llvm#196650)

genCUDASetDefaultStream and genCUDAStreamDestroy build their runtime
call with an i64 stream parameter but pass the actual argument
straight through, so a smaller-kind actual (e.g. the literal 0 in
cudaforSetDefaultStream(0)) produces an ill-typed fir.call:

error: 'llvm.call' op operand type mismatch for operand 0: 'i32' != 'i64'

Insert a fir.convert to i64 before the call, matching what
genCUDASetDefaultStreamArray already does.

[mlir][AMDGPU] Add, unify verification of memref index counts (llvm#196657)

This PR verifies that, on operations that have
%memref[%idx0, %idx1, ...] arguments, the number of indices matches
the rank of the memref being passed in.

While we're here, fixes capitalization for certain verification error
messages.

Assisted-by: Codex 5.5 (handled much of the implementation)

[lldb] Handle SIGINT via the MainLoop signal thread (on POSIX) (llvm#195959)

The driver's async SIGINT handler called
SBDebugger::DispatchInputInterrupt directly. That is not
async-signal-safe and can lead to a crash.

Register SIGINT with the existing signal-thread MainLoop instead so
DispatchInputInterrupt runs in normal thread context. The Windows path
is unchanged and keeps the legacy async handler.

While DispatchInputInterrupt runs, the callback temporarily installs
SIG_DFL so a second Ctrl-C still hard-terminates the process, preserving
the escape hatch users rely on when the debugger is unresponsive.

Moving SIGINT off the main thread means a Ctrl-C no longer interrupts
blocking syscalls there (e.g. a Python REPL waiting on input or
sleeping), so Python never observes the queued interrupt and
KeyboardInterrupt is not raised. To restore that behavior, after
dispatching the interrupt the callback re-raises SIGINT on the main
thread via pthread_kill; the resulting EINTR lets Python pick up the
pending interrupt. A skip flag suppresses the re-entry that this
self-send produces. Because the callback only ever runs on the signal
thread, the flag and the captured main-thread id live in the lambda's
captures and need no synchronization.

rdar://158218595

[BOLT][NFCI] Consolidate DataReader::setEntryCounts (llvm#196411)

FuncBranchData/BinaryFunction exec/external entry counts are set
in multiple places in DataReader:

  • FBD: in parse and appendFrom,
  • BF: in preprocessProfile and matchProfileData.

Consolidate to setEntryCounts called from readProfile.
Drop explicit counters, compute them from FBD::EntryData.

Test Plan: NFCI

[DirectX] Not print invalid root signature definitions. (llvm#196444)

This patch adds a check to the root signature printing pass that makes
sure we have a valid root signature before printing starts. This is
required after llvm#194858 changed reportError to not stop after
emitting the first error.

Fix: llvm#196430

[clang][deps] Move ScanningOutputFormat out of the library (llvm#196631)

Basing behavior of the dependency scanner on the final output format is
a leaky abstraction. Instead, we should aim to introduce proper feature
flags.

[RISCV] Use the nhs.lea.h/w/d instead of nhs.lea.h/w/d.ze with Sh1AddPat. (llvm#196660)

The srliw already took care of zeroing the upper bits. Using the non-.ze
form is consistent with the Zba version of this pattern.

Revert "[BOLT] Fix EH data encoding checks in relocateEHFrameSection (llvm#195691)" (llvm#196672)

This reverts commit 7ab26d7.

There is a test failure in bolt-tests::exceptions-split-strip.test.

[mlir][tensor] Enhance pattern to fold extract_slice(insert_slice) (llvm#195045)

Extend the DropRedundantRankExpansionOnExtractSliceOfInsertSlice pattern
to support cases where the expanded dimensions are a subset of the
dropped dimensions, rather than requiring them to be exactly equal.
For example:

%inserted_slice = tensor.insert_slice %src into %dest[0, 0, 0, 0] [1, 1, 128, 480] [1, 1, 1, 1] :
        tensor<128x480xf32> into tensor<1x1x128x480xf32>
%extracted_slice = tensor.extract_slice %inserted_slice[0, 0, 0, 0] [1, 1, 123, 1] [1, 1, 1, 1] :
        tensor<1x1x128x480xf32> to tensor<123xf32>

can be folded into:

%extracted_slice = tensor.extract_slice %src[0, 0] [123, 1] [1, 1] :
        tensor<128x480xf32> to tensor<123xf32>

[CodeGen] Use unique_ptr for FunctionInfo to prevent memory leaks (llvm#196603)

Raw pointer return from FunctionInfo::create caused leaks in callers
like computeABIInfoUsingLib, breaking BPF tests on ASan bots.
Using std::unique_ptr enforces automatic cleanup.
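
The ownership change can be sketched with a stand-in type. `InfoSketch` and `liveAfterScope` are hypothetical and do not mirror the real FunctionInfo interface; the live-instance counter exists only to make a leak observable.

```cpp
#include <memory>

// Hypothetical stand-in for a heap-allocated info object. Live counts the
// instances that currently exist, so a leak would show up as Live > 0.
struct InfoSketch {
  static int Live;
  InfoSketch() { ++Live; }
  ~InfoSketch() { --Live; }

  // Returning std::unique_ptr makes ownership explicit: the object is
  // destroyed when the caller's handle goes out of scope.
  static std::unique_ptr<InfoSketch> create() {
    return std::unique_ptr<InfoSketch>(new InfoSketch());
  }
};
int InfoSketch::Live = 0;

// Creates an object in an inner scope and reports how many remain alive.
int liveAfterScope() {
  {
    std::unique_ptr<InfoSketch> Info = InfoSketch::create();
    (void)Info;
  }
  return InfoSketch::Live;
}
```

With a raw-pointer return, the same caller would have to remember a matching `delete` on every exit path.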

Fixes leak from llvm#194460.
Buildbot: https://lab.llvm.org/buildbot/#/builders/52/builds/17090

Assisted-by: Gemini

[CIR][RISCV] Support zksh builtin codegen (llvm#196463)

[lldb] Fix CommandObjects that don't set a return status (llvm#196588)

Several CommandObject subclasses had DoExecute paths that returned
without ever calling SetStatus on the CommandReturnObject. The status
was silently left at its initial eReturnStatusStarted value, which made
Succeeded() report false for what were really successful commands and
left CommandReturnObject in an undefined state.

[AMDGPU] Support atomic load and store for vector float types (v2f16, v2i16, v4i16, v4f16, v2f32) (llvm#192904)

Add support for atomic load and store on <2 x half>, <4 x half>, and
<2 x float> vector types in the AMDGPU backend.

These types are promoted to equivalently sized integer types before
instruction selection:
<2 x half> -> i32
<4 x half> -> i64
<2 x i16> -> i32
<4 x i16> -> i64
<2 x float> -> i64

Revert "[lldb] Handle SIGINT via the MainLoop signal thread (on POSIX)" (llvm#196684)

Reverts llvm#195959 because it caused
TestIOHandlerCompletion.py to fail in CI (GreenDragon).

[Clang] Do not eat SFINAE diagnostics for explicit template arguments (llvm#139066)

Instead of merely suggesting the template arguments are invalid, we now
provide an explanation of why the explicit template argument is invalid.

[Utils] Fix duplicate DomTree updates in SplitIndirectBrCriticalEdges (llvm#196475)

SplitIndirectBrCriticalEdges generates DomTree Insert/Delete pairs for
each predecessor in OtherPreds. However, OtherPreds can contain
duplicate entries when a conditional branch has both targets pointing to
the same block (e.g., br i1 %c, label %X, label %X). This produces
duplicate DomTree updates for the same edge, triggering the assertion
std::abs(NumInsertions) <= 1 && "Unbalanced operations!" in
LegalizeUpdates.

Fix by tracking which source blocks have already had DomTree updates
emitted, and skipping duplicates.
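
A minimal sketch of the deduplication idea, with blocks modeled as strings and a hypothetical `edgesToInsert` helper (the real code works on BasicBlock pointers and DomTree update objects):

```cpp
#include <set>
#include <string>
#include <utility>
#include <vector>

// An edge in the CFG, modeled as (source block, destination block).
using Edge = std::pair<std::string, std::string>;

// Emit one insert update per distinct predecessor, even if the same
// predecessor appears several times (e.g. both branch targets are equal).
std::vector<Edge> edgesToInsert(const std::vector<std::string> &OtherPreds,
                                const std::string &NewBlock) {
  std::set<std::string> Seen;
  std::vector<Edge> Updates;
  for (const std::string &Pred : OtherPreds)
    if (Seen.insert(Pred).second) // true only the first time Pred is seen
      Updates.emplace_back(Pred, NewBlock);
  return Updates;
}
```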

[CIR][CUDA][NVPTX] Set ptx_kernel calling convention on CUDA kernels (llvm#195382)

Related: llvm#179278,
llvm#175871

More target attributes, such as NoInline on kernels, CUDALaunchBoundsAttr,
CUDAGridConstantAttr param attrs, and nvvm.annotations for surface/texture
VarDecls, are deferred to later patches.

[DAGTypeLegalizer] Add missing BR_CC handler for soft-promoted half operands (llvm#196214)

SoftPromoteHalfOperand had no case for ISD::BR_CC, causing a crash
when a half-typed fcmp result fed directly into a conditional branch.
All other comparison-related nodes (SETCC, SELECT_CC) were already
handled. Add SoftPromoteHalfOp_BR_CC following the same pattern as
SoftPromoteHalfOp_SELECT_CC.

Fixes llvm#195562


Co-authored-by: Tony Varghese <tony.varghese@ibm.com>

[RISCV][GISel] Add test coverage for the srliw+shXadd patterns. NFC (llvm#196676)

GISel isn't canonicalizing the shift pair to an AND the same way
SelectionDAG does so the patterns weren't firing. Add more directed
tests that use an And explicitly.

[clang][AMDGPU] Reject malformed target IDs with empty components (llvm#196140)

Fixes llvm#196078

An extra colon in -mcpu (e.g. gfx900::xnack+) produced an empty
feature component and triggered an assertion in StringRef::back().

Return std::nullopt for malformed target IDs instead.
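
The validation can be sketched as below. `splitTargetIDSketch` is a hypothetical helper, not the clang function, and the rule that every feature ends in `+` or `-` is inferred from the `xnack+` example rather than taken from the patch.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Split "processor:feature+:feature-" on ':' and reject malformed input.
// An empty component (as produced by "::") yields std::nullopt before any
// code gets a chance to inspect its last character.
std::optional<std::vector<std::string>>
splitTargetIDSketch(const std::string &ID) {
  std::vector<std::string> Parts;
  std::size_t Start = 0;
  while (true) {
    const std::size_t Colon = ID.find(':', Start);
    const std::string Part =
        ID.substr(Start, Colon == std::string::npos ? std::string::npos
                                                    : Colon - Start);
    if (Part.empty())
      return std::nullopt;
    Parts.push_back(Part);
    if (Colon == std::string::npos)
      break;
    Start = Colon + 1;
  }
  // Assumed rule: each feature after the processor ends in '+' or '-'.
  for (std::size_t I = 1; I < Parts.size(); ++I) {
    const char Last = Parts[I].back(); // safe: Parts[I] is non-empty
    if (Last != '+' && Last != '-')
      return std::nullopt;
  }
  return Parts;
}
```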

[AArch64][GlobalISel] Enable BF16 legalization for fadd and friends. (llvm#196081)

This enables bf16 promotion for the following operations in GISel,
promoting them to f32 and truncating the result back:
G_FADD, G_FSUB, G_FMUL, G_FDIV, G_FMA, G_FSQRT, G_FMAXNUM, G_FMINNUM,
G_FMAXIMUM, G_FMINIMUM, G_FCEIL, G_FFLOOR, G_FRINT, G_FNEARBYINT,
G_INTRINSIC_TRUNC, G_INTRINSIC_ROUND, G_INTRINSIC_ROUNDEVEN

[AArch64][NFC] Remove unused TRI member from class (llvm#184363)

I’ve removed the TRI member and its initialization, leaving only MRI and
TII as the stored pointers.


Co-authored-by: Benjamin Maxwell <benjamin.maxwell@arm.com>

[ObjectYAML][NFC] Extract BBAddrMap YAML types into shared namespace (llvm#196019)

Move BBAddrMapEntry and PGOAnalysisMapEntry out of namespace ELFYAML
into a new format-agnostic namespace BBAddrMapYAML so that COFF
YAML support can reuse the same schema and MappingTraits.

[clang] Update cxx_dr_status.html (llvm#196702)

Updates from 2026-05-08 CWG telecon.

[clang-tidy] Avoid use-nodiscard false positives for class templates (llvm#196661)

Do not suggest adding [[nodiscard]] to functions returning a class
template specialization whose primary template is already marked
[[nodiscard]].

Class template specializations do not carry the [[nodiscard]]
attribute on their own declarations, so modernize-use-nodiscard
previously missed this case and emitted redundant diagnostics for return
types such as:

template <class T>
struct [[nodiscard]] Result;

Result<int> f() const;

Fixes llvm#163425.

[CI] Ignore TidyFastChecks.inc for formatter CI. NFC. (llvm#196682)

TidyFastChecks.inc is generated and its contents should not be checked
by the clang-format CI workflow. Add a local .clang-format-ignore entry so
the PR formatting check does not report diffs for this file.

Related run:
llvm#194516 (comment)

[clang-tidy] Migrate explicit-constructor check from google to misc and add relative aliases (llvm#194807)

Fixes llvm#126032

[AArch64][GlobalISel] Promote BF16 G_FCMP (llvm#196093)

This adds bf16 legalization for floating point compares.

[RISCV][NFC] Rename Zvvmm instruction file to Zvvm (llvm#196692)

Renames RISCVInstrInfoZvvmm.td to RISCVInstrInfoZvvm.td so Zvvmm
and Zvvfmm share the same IME instruction file according to the spec.
All future instructions from the Zvvm family will be placed here
too.

This PR is required for reviewing llvm#196486 so that GitHub shows
the diff correctly.

[BPF] Support Stack Arguments (llvm#189060)

Currently, bpf programs and kfuncs support only 5 register parameters. As
the bpf community and its use cases keep expanding, there is a need to
go beyond 5 register parameters by passing additional parameters on
the stack. There are two main use cases here:

  1. Currently kfuncs are limited to 5 register parameters. In some special
    situations, people may want more than 5 parameters. One
    example is sched_ext.
  2. Allowing stack parameters makes life easier for bpf prog writers, since
    they do not need to carefully limit the number of parameters in their
    programs.

The following is the high-level design:

  • Use bpf register R11 as the frame pointer to stack parameters. This is
    to avoid mixing stacks due to R10.
    • Stack parameters must be after 5 register parameters.
  • All parameters should be at most 16 bytes as ByVal parameters are not
    supported.
  • Support for cpu v1 to v4 so all cpu versions can use this. A feature
    macro __BPF_FEATURE_STACK_ARGUMENT is defined and users can check
    whether stack argument is supported or not.

Below is a simple asm example of stack parameters:

  bar:
    /* Retrieve two parameters from the caller of bar(). */
    rX = *(u64 *)(r11 + 8)  // 1st arg
    rY = *(u64 *)(r11 + 16) // 2nd arg
    ...
    /* Prepare the single stack parameter for foo1 */
    *(u64 *)(r11 - 8) = rZ  // 1st arg
    call foo1
    ...
    /* Prepare the two stack parameters for foo2 */
    *(u64 *)(r11 - 8) = rX  // 1st arg
    *(u64 *)(r11 - 16) = rY // 2nd arg
    call foo2
    ...
  foo1:
    /* Retrieve parameter '*(u64 *)(r11 - 8) = rZ' from bar(),
     * and assign the value rZ to rX.
     */
    rX = *(u64 *)(r11 + 8)  // 1st arg
    ...
  foo2:
    /* Retrieve parameters '*(u64 *)(r11 - 8/16) = rX/rY' from bar(),
     * and assign values rX/rY to rU/rV.
     */
    rU = *(u64 *)(r11 + 8)  // 1st arg
    rV = *(u64 *)(r11 + 16) // 2nd arg
    ...

The code patterns above try to follow the x86_64/arm64 calling
conventions. That is, the first argument is at a lower location than
the second argument, and so on. An r11-based load should retrieve the value
directly from the caller's stack. An r11-based store should place
the value directly at the specified stack location.

Internally, the bpf backend generates pseudo insns for
load_stack_arg and store_stack_arg. The BPFMIPeephole pass
changes the pseudo insns into real bpf insns like those above.
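
From the source side, the feature might be exercised as in the sketch below. Only the macro name comes from the description above; `sum7` and `hasStackArguments` are hypothetical, and the snippet also compiles on a host compiler, where the macro is simply undefined.

```cpp
// Seven parameters: on BPF the first five travel in registers and, with
// stack-argument support, the remaining two would be passed on the stack.
long sum7(long A, long B, long C, long D, long E, long F, long G) {
  return A + B + C + D + E + F + G;
}

// True when the compiler advertises the feature macro named in the commit.
bool hasStackArguments() {
#ifdef __BPF_FEATURE_STACK_ARGUMENT
  return true;
#else
  return false;
#endif
}
```

A program could use such a check to fall back to packing extra arguments into a struct when the macro is not defined.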

[VectorCombine] foldShuffleChainsToReduce - add support for partial vector reductions (llvm#195119)

Extend foldShuffleChainsToReduce to recognize partial reduction patterns where only a subvector of the full vector is being reduced.

For example, a <16 x i16> vector where the shuffle chain only reduces the lower 8 elements can now be folded into:
shufflevector (extract lower <8 x i16>) + vector.reduce.smax

The detection works by noticing when the bottom-up walk through the
shuffle/op chain ends before consuming the full vector. The number of
levels visited determines the subvector size (2^levels), and an
extract_subvector + scalar reduction replaces the original chain when
profitable.
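
The equivalence the fold relies on can be checked with a scalar model. This is only a sketch: the function names are hypothetical, the vector is a 16-element array, and max stands in for the reduction operation. A chain of `Levels` halving steps leaves the reduction of the lowest 2^Levels elements in lane 0.

```cpp
#include <algorithm>
#include <array>

// Model of a shuffle/op chain: each level folds the upper half of the
// active lanes onto the lower half. Requires 1 <= Levels <= 4.
int reduceViaShuffleChain(std::array<int, 16> V, unsigned Levels) {
  for (unsigned Half = 1u << (Levels - 1); Half >= 1; Half >>= 1)
    for (unsigned I = 0; I < Half; ++I)
      V[I] = std::max(V[I], V[I + Half]);
  return V[0];
}

// Model of the folded form: extract the low 2^Levels lanes and reduce them.
int reduceSubvector(const std::array<int, 16> &V, unsigned Levels) {
  return *std::max_element(V.begin(), V.begin() + (1u << Levels));
}
```

With three levels only the lower 8 lanes contribute, so larger values in the upper half of the vector do not affect the result.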

Fixes llvm#194617

[clang-tidy] Correct std::has_one_bit to std::has_single_bit in modernize-use-std-bit (llvm#196721)

std::has_one_bit does not exist in the standard library; the function that
checks whether a number is an integral power of 2 is std::has_single_bit.
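
For reference, the predicate that `std::has_single_bit` (C++20, `<bit>`) computes can be written portably as in the sketch below; `hasSingleBit` is a hypothetical name for the hand-written form such a check would aim to replace.

```cpp
#include <cstdint>

// True when exactly one bit of X is set, i.e. X is a power of two.
// Clearing the lowest set bit (X & (X - 1)) leaves zero only in that case.
constexpr bool hasSingleBit(std::uint64_t X) {
  return X != 0 && (X & (X - 1)) == 0;
}
```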

https://en.cppreference.com/cpp/header/bit

[SelectionDAG] Don't convert sextload to zextload through a multi-use freeze (llvm#196700)

Resolves llvm#196590.

The patch llvm#189317, which taught
DAGCombiner to look through freeze, incorrectly introduced a miscompile of
sext -> zext. This resolves the miscompile.

[libc++] LWG4324: unique_ptr<void>::operator* is not SFINAE-friendly (llvm#190919)


Co-authored-by: Hristo Hristov <zingam@outlook.com>

[clang-format] Add BreakFunctionDeclarationParameters option. (llvm#196567)

Adds an option to break function declaration parameters, always putting
them on the next line after the function's opening parenthesis.

This is an equivalent of BreakFunctionDefinitionParameters, but for
function declarations.


Co-authored-by: Lukas Jirkovsky <lukas.jirkovsky@aveco.com>

[mlir][SPIR-V] Convert math.fpowi to spirv.CL.pown (llvm#196701)

[VPlan] Lift isUsedByLoadStoreAddr into vputils, operate on VPValue(NFC) (llvm#196415)

Extract the helper previously scoped to VPReplicateRecipe::computeCost
and make it available from VPlanUtils so other transforms can query
whether a VPValue is used as part of another load or store's address.

Also relax the input type from VPUser * to VPValue *: the worklist now
tracks VPValues directly, and traversal is gated on the user being a
VPSingleDefRecipe before walking its own users. This is NFC for the
existing caller.

clang: Fix using -march=amdgcn in some r600 run lines (llvm#196745)

clang/AMDGPU: Use all_equal instead of building a temporary set (llvm#196742)

[mlir][SPIR-V] Support spirv.selection_control attribute on scf.if (llvm#196510)

[SLP][NFC]Add a test with scalable vector type in struct-returning intrinsic, NFC

Reviewers:

Pull Request: llvm#196747

[SLP][NFC]Add a test with struct-returning intrinsics in different basic blocks, NFC

Reviewers:

Pull Request: llvm#196748

[X86] Hoist ReservedIdentifiers to MCAsmInfo and shrink setup cost. NFC (llvm#196699)

PR llvm#186570 added a per-MCAsmInfo StringSet<> populated with X86
register names plus Intel-syntax keywords, which caused a minor
instructions:u increase.

Avoid heap allocation and hoist ReservedIdentifiers to MCAsmInfo for
other targets.

For the register-name source, prefer
X86IntelInstPrinter::getRegisterName over MCRegisterInfo::getName.
The former is a TableGen-emitted accessor into a static const char AsmStrs[] pool in X86GenAsmWriter1.inc, populated from the lowercase
asm-name argument of each def XX : X86Reg<"xx", ...>; in
X86RegisterInfo.td.

[MCParser] .incbin: Don't retain the buffer, don't require NUL termination (llvm#196696)

processIncbinFile uses SourceMgr::AddIncludeFile, which

  • sets RequiresNullTerminator=true and disables mmap when the file
    size is a multiple of the page size,
  • and unnecessarily retains the throwaway buffer in Buffers.

Switch to OpenIncludeFile so the buffer is freed when processIncbinFile
returns, and pass RequiresNullTerminator=false. The buffer is consumed
only by emitBytes; the lexer never scans it, so it does not need a
trailing '\0' (different from llvm#154972). Without that requirement,
MemoryBuffer mmaps the file and RSS tracks only the touched pages.

Stress test (1000 .incbin "blob.bin", 0, 16 against a 1 MiB blob):

                  Maximum RSS
  Before          1042944 KiB
  After             15360 KiB

Fix llvm#62339

Revert "Avoid assert in substqualifier (llvm#182707)" (llvm#196755)

This reverts commit e2def10.

[DAG] canCreateUndefOrPoison - out of range vector insert/extract element indices only generate poison (llvm#196720)

Matches ValueTracking / GISel implementations - although testing options are limited until DAG has actual uses of UndefPoisonKind::UndefOnly

[clang][NFC] Actually add the testcase for llvm#195416 (llvm#196759)

[Docs] Match body/toctree ordering on Reference and UserGuides (llvm#195542)

The toctree section is hidden but used for previous/next breadcrumbs.

This was suggested in
llvm#184440 (comment)

[clangd] Add InsertReplaceEdit for code completion (llvm#187623)

Handle new insertReplaceSupport capability (defined in LSP 3.16). Add
the new option to the protocol layer and pass it around to the code
completion logic. Update CompletionItem::textEdit to become the union
type as per the LSP specification.

Add a new helper function to the Lexer public API to find the end of an
identifier with full context lexing, to avoid duplicating the logic. Use
the helper both in the Sema flow and in the comment completion flow. Use
a simpler ASCII-only scan in no-Sema mode.
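
As a rough illustration of the two ranges, the sketch below computes them with an ASCII-only identifier scan, similar in spirit to the no-Sema path. It is not clangd code: `Range`, `InsertReplace`, and `computeRanges` are hypothetical names, and offsets are plain byte indices rather than LSP positions.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

struct Range {
  std::size_t begin;
  std::size_t end;
};

// Both ranges share a start; "insert" stops at the cursor while
// "replace" extends to the end of the identifier under the cursor.
struct InsertReplace {
  Range insert;
  Range replace;
};

inline bool isAsciiIdentChar(char c) {
  return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
         (c >= '0' && c <= '9') || c == '_';
}

InsertReplace computeRanges(std::string_view text, std::size_t cursor) {
  assert(cursor <= text.size());
  std::size_t start = cursor;
  while (start > 0 && isAsciiIdentChar(text[start - 1]))
    --start; // walk back to the start of the typed prefix
  std::size_t end = cursor;
  while (end < text.size() && isAsciiIdentChar(text[end]))
    ++end; // walk forward to the end of the identifier
  return {{start, cursor}, {start, end}};
}
```

For the text `foo.barBaz()` with the cursor after `bar`, the insert range covers `bar` and the replace range covers `barBaz`, so a client can choose between keeping and overwriting the suffix.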

Add LIT tests to verify auto-triggered completions, mid-word
replacement, Unicode, and snippets. Add unit tests to verify
insert/replace ranges with and without Sema, including comments and the
feature-off case.

Update the release notes to document the new capability.

Fixes clangd/clangd#2190


Co-authored-by: timon-ul timon.ulrich@advantest.com

Revert "[clang-format][NFC] Format with the new formatter" (llvm#196771)

Reverts llvm#196523

[ELF] Fix --reproduce non-determinism with parallel input loading (llvm#196773)

After llvm#191690, LoadJob::Archive runs in parallel and getArchiveMembers()
calls ctx.tar->append() from the parallel body. TarWriter::append is
unsynchronized. Member order in the tar is also non-deterministic
because parallelFor scheduling determines append order.

Buffer per-job tar entries during the parallel pass and flush them in
the existing serial post-pass, mirroring the thinBufs / files pattern.
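
The buffering pattern can be sketched as below. This is only an illustration under assumptions: `TarEntry` and `loadWithSchedule` are hypothetical, a vector of paths stands in for the tar writer, and the `schedule` argument simulates whatever order the parallel jobs happen to run in instead of using real threads.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for one archive member destined for the tar file.
struct TarEntry {
  std::string path;
  std::string data;
};

// `schedule` lists job indices in the order they happen to execute.
// Each job only touches its own slot of `perJob`, so the "parallel"
// phase needs no shared writer; the serial post-pass then flushes the
// slots in job order, independent of the schedule.
std::vector<std::string>
loadWithSchedule(const std::vector<std::vector<TarEntry>> &jobs,
                 const std::vector<std::size_t> &schedule) {
  std::vector<std::vector<TarEntry>> perJob(jobs.size());
  for (std::size_t job : schedule) {
    assert(job < jobs.size());
    perJob[job] = jobs[job]; // buffered, nothing written yet
  }
  std::vector<std::string> tarOrder; // stands in for the tar writer
  for (const auto &entries : perJob)
    for (const TarEntry &e : entries)
      tarOrder.push_back(e.path);
  return tarOrder;
}
```

Because the flush iterates slots by job index, running the jobs in a different order leaves the resulting member order unchanged.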

[Clang] Make matrix type trivially copyable (llvm#193634)

In order to simplify matrix casting and follow the existing pattern HLSL
is doing, the matrix needs to be trivially copyable.
related to: llvm#184471


Co-authored-by: Joao Saffran jderezende@microsoft.com

[ADT] Decouple xxhash.h from ADT. NFC (llvm#196774)

Move xxHash64, xxh3_64bits, and xxh3_128bits ArrayRef/StringRef
overloads from llvm/Support/xxhash.h to inline overloads in
llvm/ADT/ArrayRef.h and llvm/ADT/StringRef.h, so xxhash.h has no ADT
dependencies.

This is a prerequisite for using xxh3 as the combine_bytes backend in
llvm/ADT/Hashing.h (llvm#194567), which would otherwise reintroduce a header
dependency cycle.

FoldingSet.h and StableHashing.h adjust to call the new
pointer-and-length entry point.

[Hashing] Replace CityHash mixers with xxh3 (llvm#194567)

Replace the CityHash-style mixer in hash_combine and (transitively)
hash_value(std::basic_string), hash_value(StringRef), and therefore
DenseMap<StringRef, X> lookups, with a flatten-and-call into
xxh3_64bits, a modern hash superior to CityHash.

hash_value(int) / hash_value(ptr) keep the existing Murmur-style
hash_16_bytes mixer; those are the dominant DenseMap key paths and a
fully-inline 16-byte mix beats inlining xxh3's larger 0..16-byte short
path.

To break the dependency cycle: xxHash64, xxh3_64bits, and xxh3_128bits
ArrayRef/StringRef overloads move from llvm/Support/xxhash.h to inline
overloads in llvm/ADT/ArrayRef.h and llvm/ADT/StringRef.h, so xxhash.h
has no ADT dependencies.

A variant that inlined xxh3's 0..16-byte fast path at every
combine_bytes call site (vs. always calling out-of-line xxh3_64bits)
showed no measurable compile-time improvement on the tracker, so
combine_bytes is a one-liner over the out-of-line entry point.

llvm-compile-time-tracker.com (CTMark, instructions:u)

  stage1-O0-g           -1.76%   (sqlite3 -3.78%)
  stage1-aarch64-O0-g   -1.40%   (sqlite3 -2.86%)
  stage1-ReleaseLTO-g   -1.13%
  stage1-ReleaseThinLTO -0.45%
  stage1-O3             -0.43%
  stage1-aarch64-O3     -0.42%
  stage2-O0-g           -0.42%
  stage2-O3             -0.15%
  clang build           -0.71%   (wall -0.42%)

DenseMap-of-pointer paths (dominant at -O3) are untouched, so higher-
optimization configs see smaller wins as expected. opt's .text shrinks
~92 KB. Subsumes the StringRef-only carve-out proposed in llvm#191115.

Notes on properties not introduced by this patch:

  • Endianness: hash_combine over native integers was already not
    cross-host stable. memcpy of a native integer into the buffer is
    host-encoded; fetch32 normalized the read but not the underlying
    bytes, so on LE vs BE the value fed to the mixer already differed.
    xxh3 inherits the same property: same byte stream, different mixer.

  • Process seed: combine_bytes XORs get_execution_seed into the result,
    which cancels under hash_combine(x) ^ hash_combine(y). The pre-patch
    short/state paths fed the seed through hash_16_bytes / shift_mix
    non-linearly, so this is a regression in seed effectiveness under that
    pattern. Default seed is constant, so this only matters under
    LLVM_ENABLE_ABI_BREAKING_CHECKS. Follow-up: add a seeded xxh3 entry
    point in libSupport.
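
The seed cancellation in the second note follows from XOR being its own inverse. A minimal sketch, assuming a toy mixer (`toyHash`) in place of xxh3_64bits and a hypothetical `seededXor` that applies the seed the way the note describes:

```cpp
#include <cassert>
#include <cstdint>

// Toy seed-independent mixer; a stand-in for xxh3_64bits, not the real one.
constexpr std::uint64_t toyHash(std::uint64_t x) {
  x ^= x >> 33;
  x *= 0xff51afd7ed558ccdULL;
  x ^= x >> 33;
  return x;
}

// Seed applied by a final XOR, as described for combine_bytes.
constexpr std::uint64_t seededXor(std::uint64_t x, std::uint64_t seed) {
  return toyHash(x) ^ seed;
}

// (h(x) ^ s) ^ (h(y) ^ s) == h(x) ^ h(y): the seed drops out entirely.
constexpr std::uint64_t xorOfPair(std::uint64_t x, std::uint64_t y,
                                  std::uint64_t seed) {
  return seededXor(x, seed) ^ seededXor(y, seed);
}

static_assert(xorOfPair(1, 2, 0x1234) == xorOfPair(1, 2, 0xabcdef),
              "seed cancels under XOR of two hashes");
```

A seeded entry point that feeds the seed into the mixer itself, as the proposed follow-up suggests, would not have this algebraic cancellation.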

Aided by Claude opus 4.7

[MC] Remove deprecated lookupTarget overload (llvm#196778)

This has been deprecated for a while and was slated for removal after
the branching of LLVM 22. Remove it since I'm on the Google integrate
rotation this week and can take care of any failures on our end.

[libc] Add barebones dl_iterate_phdr implementation (llvm#194196)

Add a basic dl_iterate_phdr implementation so that we can get libunwind
building. This implementation is bare and not fully compliant with the
man page for fully static binaries (which are all that we support
currently with the lack of a dynamic linker) due to the lack of TLS
info, but that can be added at a future date if it is needed, as it is
not needed by libunwind.

Add some very basic smoke tests.

[ADT] Remove xxHash64 ArrayRef/StringRef overloads. NFC (llvm#196781)

xxHash64 is a legacy, pre-XXH3 hash whose only non-test caller in the
monorepo is llvm::getKCFITypeID. llvm#196774 accidentally exposed the API.

[Clang] Transform lambda's constraints when instantiating parameter mapping (llvm#195995)

This way we can remove a few workarounds of lambda expressions where
outer template arguments of concepts have to be preserved through
ImplicitConceptSpecializationDecls.

Fixes llvm#193944

[cmake] use target names instead of legacy variables (llvm#185463)

Use the name of the imported targets when testing the libraries during
cmake configuration. This removes the need to also set
CMAKE_REQUIRED_INCLUDES and CMAKE_REQUIRED_DEFINITIONS and reflects
more modern CMake usage where targets are preferred over variables.

This is already the case when checking libcurl in the same file.

[clang-tidy] Remove hicpp module [1/4] (llvm#194516)

This is part one of removing the hicpp-* checks.

RFC:
https://discourse.llvm.org/t/rfc-regarding-the-current-status-of-hicpp-checks/89883

Part of llvm#183462

[DAG][GISel] Rename CTTZ_ZERO_UNDEF/CTLZ_ZERO_UNDEF/CTTZ_ELTS_ZERO_UNDEF -> CTTZ_ZERO_POISON/CTLZ_ZERO_POISON/CTTZ_ELTS_ZERO_POISON (llvm#196732)

DAG/GISel are ambiguous about whether zero-input results in
UNDEF/POISON, unlike the rest of LLVM which makes it clear it's POISON.

I've tried to clean this up once and for all by ensuring
SelectionDAG::canCreateUndefOrPoison does an includesPoison(Kind) check,
renaming the opcodes (including the VP variants) and updating as many
comments/tests as possible (I may still have missed some...).

[SPIR-V] Fix inttoptr type deduction with ptr.annotation (llvm#189219)

Opaque pointer inttoptr was recording ptr as a pointee type, so
OpConvertUToPtr was emitted as pointer-to-pointer and then bitcasted
back. Please see an example below.

LLVM IR:

%p = inttoptr i64 %x to ptr addrspace(1)
%a = call ptr addrspace(1) @llvm.ptr.annotation(... %p ...)
call spir_func void @prefetch(ptr addrspace(1) %a, ...)

SPIR-V (before the change):

%p2 = OpConvertUToPtr %_ptr_CrossWorkgroup__ptr_CrossWorkgroup_uchar %x
%p1 = OpBitcast %_ptr_CrossWorkgroup_uchar %p2
OpFunctionCall ... %p1 ...

Skip assigning a pointee type for inttoptr when the destination is
untyped; the fallback later recovers the correct single pointer type.

[clang-tidy] Fix FP in readability-container-size-empty with comparing to unrelated type (llvm#190535)

Fixes llvm#162287.

[lldb][Windows] Invalidate cached register values on thread stop (llvm#192430)

Invalidate cached values in register context data structures on every
thread stop.

NativeRegisterContextRegisterInfo::InvalidateAllRegisters performs no
operation by default. Subclasses may override it to clear cached values
within their register context data structures whenever a thread stops.
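
The hook can be pictured with the sketch below. The class names, the cached program counter, and the fetch counter are all hypothetical; the sketch only shows the shape of a no-op default plus an overriding subclass, not lldb's actual classes.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical base: invalidation does nothing unless overridden.
class RegisterContextBase {
public:
  virtual ~RegisterContextBase() = default;
  virtual void InvalidateAllRegisters() {}
};

// Hypothetical subclass that caches a value read from the thread.
class CachingRegisterContext : public RegisterContextBase {
  std::optional<std::uint64_t> CachedPC;
  int Fetches = 0;

  // Stands in for an expensive thread-context query.
  std::uint64_t fetchFromThread() {
    ++Fetches;
    return 0x1000 + static_cast<std::uint64_t>(Fetches);
  }

public:
  std::uint64_t ReadPC() {
    if (!CachedPC)
      CachedPC = fetchFromThread();
    return *CachedPC;
  }
  // Called on every thread stop: drop anything that may be stale.
  void InvalidateAllRegisters() override { CachedPC.reset(); }
  int fetchCount() const { return Fetches; }
};
```

Reads between two stops hit the cache, and the first read after a stop goes back to the thread, which is the behaviour the planned arm64 caching relies on.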

This change intends to set up the necessary infrastructure to support
caching of the thread context in NativeRegisterContextWindows_arm64,
which will improve read performance. Currently, the thread context is
retrieved for every read or write operation.

Revert "[VectorCombine] foldShuffleChainsToReduce - add support for partial vector reductions" (llvm#196796)

Reverts llvm#195119 while reported assertions are investigated.

[LifetimeSafety] Warn on incorrectly placed [[clang::lifetimebound]] attributes (llvm#196144)

Adds a new warning that is emitted when a parameter is marked as
[[clang::lifetimebound]] but is not returned in one way or another
(tracked via OriginEscapeFact).

Closes llvm#182935

[libc] Fix -Wshadow warnings in freetrie.h (llvm#196529)

clang-format: ensure ternary operands are aligned (llvm#196697)

Set ParentState::AlignedTo for ternary operands.

[gn] Make ClangDependencyScanningTests depend on Testing/Support (llvm#196809)

Needed after ebb9a79.

[flang][OpenMP] Consistent names for non-executable directives, NFC (llvm#196803)

Change
OpenMPGroupprivate -> OmpGroupprivateDirective
OpenMPThreadprivate -> OmpThreadprivateDirective
OpenMPRequiresConstruct -> OmpRequiresDirective
OpenMPUtilityConstruct -> OmpUtilityDirective

[AArch64] Improve post-inc stores of SIMD/FP values (llvm#151372)

Add patterns to match post-increment truncating stores from lane 0 of
wide integer vectors (v4i32/v2i64) to narrower types (i8/i16/i32). This
avoids transferring the value through a GPR when storing.

Also remove the pre-legalization early-exit in combineStoreValueFPToInt
as it prevented the optimization from applying in some cases.

[clang-tidy] Rename hicpp-multiway-paths-covered to bugprone-unhandled-code-paths (llvm#191625)

Part of the work in llvm#183462.

Closes llvm#183464.

Splitting the check into two more focused checks was considered during
discussion, but since clang-tidy does not support one-to-many aliases, a
single name covering both behaviors, and clearer than
multiway-paths-covered, was chosen instead.


Co-authored-by: Zeyi Xu mitchell.xu2@gmail.com

[IRBuilder] Split CreateAssumption to one with bundle and one with condition [NFC] (llvm#196795)

As of llvm#160460 it is not possible to combine bundles and conditions;
reflect that in CreateAssumption.

[clang-tidy] Reland "An option for conditional skipping overloaded functions in modernize-use-string-view" (llvm#196387)

[CIR][AArch64] Lower NEON vuzp intrinsics (llvm#195591)

Summary

Part of: llvm#185382

Lower the vuzp intrinsics listed in:
https://arm-software.github.io/acle/neon_intrinsics/advsimd.html#unzip-elements

This is a follow-up to llvm#195527.

Lower NEON::BI__builtin_neon_vuzp_v and
NEON::BI__builtin_neon_vuzpq_v in CIRGenBuiltinAArch64.cpp by porting
the existing incubator logic
(clangir/clang/lib/CIR/CodeGen/CIRGenBuiltinAArch64.cpp): two
bitcasts on the input vectors, then two rounds of cir.vec.shuffle
generating the deinterleave (even/odd) shuffle patterns with indices
2*i+vi, each stored via ptr_stride on the sret base pointer.

[llvm][RISCV] Optimize fneg for fixed vectors (llvm#194555)

vfneg is not available on zvfhmin or zvfbfmin, so it is expected to
expand to integer operations instead of unrolling to scalar operations.
The general expandFNEG already handles that in most cases, except for
fixed vector types that are not promotable; we need to find a better
heuristic to gate this.

[llvm][RISCV] Optimize fabs for fixed vectors (llvm#194554)

vfabs is not available on zvfhmin or zvfbfmin, so it is expected to
expand to integer operations instead of unrolling to scalar operations.
The general expandFABS already handles that in most cases, except for
fixed vector types that are not promotable; we need to find a better
heuristic to gate this.

[llvm][RISCV] Optimize fcopysign for fixed vectors (llvm#193802)

vfsgnj is not available on zvfhmin or zvfbfmin, so it is expected to
expand to integer operations instead of unrolling to scalar operations.
The general expandFCOPYSIGN already handles that in most cases, except
for fixed vector types that are not promotable; we need to find a better
heuristic to gate this.
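
For reference, the integer operations that fneg, fabs, and fcopysign expand to are plain sign-bit manipulations. The sketch below shows them on 16-bit patterns, which applies to both half and bfloat because each keeps its sign in the top bit. The function names are illustrative and the code operates on one scalar lane, not on vectors.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint16_t kSignMask = 0x8000; // top bit of a 16-bit float

// fneg: flip the sign bit.
constexpr std::uint16_t fnegBits(std::uint16_t x) {
  return static_cast<std::uint16_t>(x ^ kSignMask);
}

// fabs: clear the sign bit.
constexpr std::uint16_t fabsBits(std::uint16_t x) {
  return static_cast<std::uint16_t>(x & ~kSignMask);
}

// fcopysign: magnitude of `mag`, sign of `sign`.
constexpr std::uint16_t fcopysignBits(std::uint16_t mag, std::uint16_t sign) {
  return static_cast<std::uint16_t>((mag & ~kSignMask) | (sign & kSignMask));
}

static_assert(fnegBits(0x3C00) == 0xBC00, "half 1.0 -> -1.0");
static_assert(fabsBits(0xBF80) == 0x3F80, "bfloat -1.0 -> 1.0");
```

Each operation is a single XOR, AND, or AND/OR pair per lane, which is why expanding to integer vector operations is preferable to unrolling into scalar code.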

[InstCombine] Fold constant byte stores to integer stores (llvm#196740)

Byte constants are equivalent to integer constants when stored to
memory. Replacing them in store instructions reduces IR differences and
enables existing optimizations over integer constants.

[libcxx] Switch to check-runtimes for generic-llvm-libc (llvm#196780)

Move KCFI type ID hash helpers out of LLVMSupport (llvm#196784)

PR llvm#167254 inappropriately introduced llvm/Support/Hash.{h,cpp} for the
KCFI helpers. The name is misleading — it has nothing to do with the
generic hashing facility in llvm/ADT/Hashing.h — and KCFI is a
CodeGen/IR feature that does not belong in the foundational Support
layer.

Move the files to llvm/lib/Transforms/Utils/KCFIHash.cpp, alongside
setKCFIType, which is the only existing KCFI helper in TransformUtils.

Also relocate the deprecated pre-xxh3 xxHash64 implementation into
KCFIHash.cpp, the sole user. clang/test/CodeGen/kcfi-generalize.c and
kcfi-normalize.c are end-to-end regression tests for the xxHash64 output.

[Coverage] Fix assertion failure when a -isystem header invokes a user macro (llvm#195427)

  // a.cc
  static void foo(int x) {
    switch (x) {
  #define GENERIC(n) case n:
  #include "types.def"   // -isystem header invokes a user macro
      break;
    }
  }

  // sys/types.def
  #define MID(name) GENERIC(name)
  MID(0)
  MID(1)
  MID(2)
$ clang -fprofile-instr-generate -fcoverage-mapping -isystem sys -c a.cc
Assertion `SystemHeadersCoverage ||
           !SM.isInSystemHeader(SM.getSpellingLoc(Loc))' failed.

Commit 702a2b6 ("[Coverage] Rework !SystemHeadersCoverage")
replaced the system-header skip in gatherFileIDs with this assertion,
which trips as SM.isInSystemHeader(SM.getSpellingLoc(Loc)) is false.

This patch adds back the pre-llvm#91446 condition but folds it with
the macro-token remap if statement.

Fixes llvm#179316/llvm#195422.
Claude Opus 4.7 identified clang/lib/Parse/ParseExpr.cpp, created a
minimal reproducer with cvise, and wrote the initial version of this
CodeGen patch. (An earlier session papered over the bug by patching
llvm-cov instead, which I abandoned).

[clang-tidy][NFC] Move ClassifiedToken to cpp file (llvm#196820)

ClassifiedToken is used in only the implementation of
UseTrailingReturnTypeCheck. Move it into the unnamed namespace of the
cpp file instead of it being in the header.

[Bazel] Fixes 2f4c387 (llvm#196822)

This fixes 2f4c387.

Co-authored-by: Google Bazel Bot google-bazel-bot@google.com

[libc] Move a few -Wshadow warnings in __support/File (llvm#196810)

No behavior change.

[libc][math] Fix -Wshadow warnings in cos.h (llvm#196342)

cos() does using namespace range_reduction_double_internal; and
range_reduction_double_internal after 51e9430 contains

using LIBC_NAMESPACE::fputil::DoubleDouble;
using Float128 = LIBC_NAMESPACE::fputil::DyadicFloat<128>;

So the local using statements for DoubleDouble and Float128 shadowed
these. Just remove the local using statements.

No behavior change.

[AArch64] New pass for code layout optimizations. (llvm#184434)

This pass is intended to optimize code layout prior to AsmPrinter. The
initial version handles two known cases:
I. FCMP-FCSEL
II. CMP/CMN-CSEL, 32-bit only

Using existing directives, the pass induces function-alignment (of
64-bytes by default) when a pair is detected, and possibly induces
block-alignment of up to 4-bytes on top of that if the pair would
straddle cache-lines.

Beyond performance improvement, this pass reduces noise due to code
layout and thus stabilizes measured performance over time. For example,
knock-out effects on a "sensitive function" won't be triggered by
codegen changes outside it.

Enabled by default on processors with the new FeatureAlignCmpCSelPairs
subtarget feature (gated per sub-case by FeatureFuseCmpCSel /
FeatureFuseFCmpFCSel); each case can also be forced through the
-aarch64-code-layout-opt enumerated bit-mask.


Co-authored-by: Jon Roelofs jroelofs@gmail.com
rdar://171283264

[mlir][spirv] Remove stale NV CooperativeMatrix attributes (llvm#196639)

Since the support for NV CooperativeMatrix has been removed a while
back, those attributes can be safely removed.

[mlir][spirv] Enforce execution scope for group operations in ODS (llvm#196644)

This adds a new class SPIRV_ExecutionScopeAttrIs shared between group
and non-uniform group operations.

Assisted-by: Codex

[LV] Add tests for load/store scalarization and ptrcasts (NFC) (llvm#196839)

Add missing test coverage for range of pointer casts and load/store
scalarization.

[LV] Add missing cost tests for various unary and binary ops (NFC) (llvm#196841)

Add missing direct includes for bit.h/SwapByteOrder.h. NFC (llvm#196843)

These translation units use llvm::endianness, llvm::byteswap,
llvm::has_single_bit, or sys::IsLittleEndianHost without explicitly
including the header that declares them. They currently compile only
because llvm/ADT/Hashing.h transitively pulls in
llvm/Support/SwapByteOrder.h (which includes llvm/ADT/bit.h).

[libc] Fix a copyright comment typo (llvm#196846)

No behavior change.

[clang-tidy] comment braced and parenthesized init arguments (llvm#180408)

Handle arguments like {}, Type{} and Type() in
bugprone-argument-comment and
add coverage for initializer_list and designated initializers.

Fixes: llvm#171842

[ADT] Avoid map storage for small SmallMapVector (llvm#196473)

SmallMapVector previously used SmallDenseMap for its index, which still
initializes and maintains map storage even when the number of entries is
tiny.

Teach MapVector to support a vector-only small mode. While the entry
count stays within the configured small size, operations use the
underlying vector directly. When the size grows past the threshold, the
map index is built and subsequent operations use the regular MapVector
path.

This mirrors the small-size strategy used by SmallSetVector.
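
The small-mode idea can be sketched as below. This is an assumption-laden illustration, not the MapVector implementation: `TinyMapVector` is a hypothetical class, it uses std::unordered_map as the index, and it supports only insert and find.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <utility>
#include <vector>

template <typename K, typename V, std::size_t SmallSize>
class TinyMapVector {
  std::vector<std::pair<K, V>> Entries;     // insertion order
  std::unordered_map<K, std::size_t> Index; // empty while in small mode

  bool isSmall() const { return Index.empty(); }

public:
  // Returns a pointer to the value for Key, or nullptr if absent.
  V *find(const K &Key) {
    if (isSmall()) {
      for (auto &E : Entries) // linear scan, no map storage touched
        if (E.first == Key)
          return &E.second;
      return nullptr;
    }
    auto It = Index.find(Key);
    return It == Index.end() ? nullptr : &Entries[It->second].second;
  }

  // Inserts if absent; returns true when a new entry was added.
  bool insert(const K &Key, V Value) {
    if (find(Key))
      return false;
    Entries.emplace_back(Key, std::move(Value));
    if (!isSmall()) {
      Index.emplace(Key, Entries.size() - 1);
    } else if (Entries.size() > SmallSize) {
      // Crossed the threshold: build the index once, then keep using it.
      for (std::size_t I = 0; I < Entries.size(); ++I)
        Index.emplace(Entries[I].first, I);
    }
    return true;
  }

  std::size_t size() const { return Entries.size(); }
  bool usesIndex() const { return !isSmall(); }
};
```

With a small size of 2, the first two insertions never allocate map storage and the third one triggers the one-time index build.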

[clangd][Parser][Sema] Fix TemplateIdAnnotation UAF with template-id declarator and lambda default argument (llvm#196788)

I think this is another case of template annotations lifetime bug,
similar to the one fixed by
llvm#89494.

Closes llvm#196725.

[clang] Add arm64_neon.h wrapper on windows (llvm#196014)

Add an MSVC-compatible <arm64_neon.h> resource header that forwards to
Clang's generated <arm_neon.h>. This lets ARM64 Windows code using the
MSVC header name lower NEON intrinsics through Clang builtins instead of
leaving external neon_* calls such as neon_ld1m4_q32.

Fixes llvm#195683

[clang][test] Add AArch64 requirement to arm64_neon.h test (llvm#196867)

Only run test when the AArch64 target is built

[LV][NFC] Reshape pointer_iv_non_uniform_0 test to use distinct loads (llvm#196494)

The follow-up patch folds some of the idempotent binary ops. This test
has a sub x - x operation, which is affected by the follow-up patch.
This patch makes the test immune to the fold.

[InstCombine][NFC] Change the order of checks in SliceUpIllegalIntegerPHI for faster compile time. (llvm#183726)

SliceUpIllegalIntegerPHI searches for PHIs that have an illegal type and
are only used by trunc or trunc(lshr) operations. It bails out if it
encounters invoke or EH pad instructions.
It first checks whether it encounters an invoke or EH pad, which is time
consuming as it checks every instruction. Then it checks whether the PHI
is used by trunc or trunc(lshr). The former check is generally loose,
while the latter one is stricter. Switching the order of the checks
speeds up compilation.

Signed-off-by: XinlongZHANG-Bob zhangxinlong.bob@bytedance.com

[NFC] Fix C++23 build failures caused by incomplete types (llvm#196814)

[AArch64][CostModel] Model sve costs for ctpop (llvm#192428)

Targets supporting sve prefer sve for ctpop with fixed length vectors.
Update cost model to reflect the same.

[MLIR][NVVM][NFC] Restructure NVVM dialect (llvm#195811)

Moves the declarations of the NVVM dialect and some widely used enums
(FPRoundingModeAttr and SaturationModeAttr) to separate files to make
them easier to maintain and also use in the NVGPU dialect.

[clang][bytecode] Allow const mutation in all variable initializers (llvm#195794)

So the attached test case works even though it's just an InitListExpr.

[libc][stdlib] Add setenv (llvm#163018)

Add the POSIX setenv() function, with EnvironmentManager::set()
handling environment array management and ownership tracking.

Registered for x86_64, aarch64, and riscv architectures. Integration
tests cover overwrite/no-overwrite semantics, empty/invalid names,
empty values, and repeated replacement.

Assisted-by: Automated tooling, human reviewed.


Co-authored-by: Michael Jones michaelrj@google.com

[GlobalISel] Delay match table builder initialization (llvm#196506)

MachineIRBuilder::setInstrAndDebugLoc is expensive, delay until needed.

CTMark -0.10% geomean improvement on aarch64-O0-g.

https://llvm-compile-time-tracker.com/compare.php?from=71fef6d5a306d1adf8bf7d30d2fe9e286380fecf&to=8a87845dfde9de9d141b42d2fce92fcf3be02276&stat=instructions%3Au

Assisted-by: codex

[GlobalISel] Avoid repeated target info queries in combiners (llvm#196530)

tryCombineAllImpl queries target info for every instruction. Cache
TargetInstrInfo/TargetRegisterInfo/RegisterBankInfo in CombinerHelper
and pass to executeMatchTable instead.

This avoids repeated virtual calls on the combiner executeMatchTable
path.

CTMark -0.08% geomean improvement on aarch64-O0-g.

https://llvm-compile-time-tracker.com/compare.php?from=71fef6d5a306d1adf8bf7d30d2fe9e286380fecf&to=13bc49510657450402c066098e3a4b7d1af9d0e6&stat=instructions%3Au

Assisted-by: codex

[DebugInfo] Pack DILocation hash inputs (llvm#196556)

Pack DILocation fields before hashing. Now that column is 16 bits,
Line/Column/ImplicitCode fit in one 64-bit value (32 + 16 + 1 = 49
bits), and AtomGroup and AtomRank also fit cleanly in one 64-bit value
(61 + 3 = 64 bits).

Fewer hash_combine inputs on the hot DILocation path is a small
compile-time improvement.
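
The bit budget above can be illustrated with the sketch below. The field order inside each 64-bit word is an assumption made for illustration, and `packLineColumn` / `packAtom` are hypothetical helpers rather than the functions used in the patch.

```cpp
#include <cassert>
#include <cstdint>

// Line (32 bits) | Column (16 bits) | ImplicitCode (1 bit) = 49 bits.
constexpr std::uint64_t packLineColumn(std::uint32_t Line,
                                       std::uint16_t Column,
                                       bool ImplicitCode) {
  return std::uint64_t(Line) | (std::uint64_t(Column) << 32) |
         (std::uint64_t(ImplicitCode) << 48);
}

// AtomGroup (61 bits) | AtomRank (3 bits) = 64 bits.
constexpr std::uint64_t packAtom(std::uint64_t AtomGroup,
                                 std::uint8_t AtomRank) {
  return (AtomGroup & ((std::uint64_t(1) << 61) - 1)) |
         (std::uint64_t(AtomRank & 0x7) << 61);
}

// All-ones inputs fill exactly the low 49 bits of the first word.
static_assert(packLineColumn(0xFFFFFFFFu, 0xFFFF, true) ==
                  (std::uint64_t(1) << 49) - 1,
              "49-bit budget");
```

Five separate hash inputs become two, and because the fields occupy disjoint bit ranges, distinct field tuples still map to distinct packed values.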

CTMark geomean:

  • stage1-ReleaseLTO-g: -0.10%
  • stage1-O0-g: -0.23%
  • stage1-aarch64-O0-g: -0.19%
  • stage2-O0-g: -0.07%

https://llvm-compile-time-tracker.com/compare.php?from=71fef6d5a306d1adf8bf7d30d2fe9e286380fecf&to=1d80b5f5aa98561d2ba09adc3f20c3eacd24cb88&stat=instructions%3Au

Assisted-by: codex

[LoopFusion] Remove SCEV-based dependence analysis path (llvm#195864)

Loop Fusion has used Dependence Analysis (DA) as the default dependence
check since the option default was flipped in llvm#187309. The SCEV-based
strategy and the combined "all" mode were retained only for fallback and
experimentation, with a comment noting that the SCEV code would be
removed in a follow-up.

This patch removes the SCEV-based dependence path and the now-unused
selector machinery.

Fixes llvm#194821.

Assisted by Cursor.

[clang-tidy][NFC] Fix tests on 32bit ARM (llvm#196873)

Should fix
llvm#191386 (comment).

[libc] Fix partial multi-byte write detection in File (llvm#196402)

File::write_unlocked(const wchar_t*, size_t) checked 'write_res.value <
1' after writing a converted UTF-8 sequence. For multi-byte characters,
a short platform write (e.g. 2 of 3 bytes for a 3-byte character) passed
this check and was counted as a successful write. The output stream
would then contain an incomplete UTF-8 sequence with no error reported
to the caller.

Changed the check to 'write_res.value < char_size' and set the error
indicator on the stream when it triggers.
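
The difference between the two checks can be shown in isolation. In this sketch `WriteResult`, `oldCheckFails`, and `newCheckFails` are hypothetical names that model only the comparison, not the File class itself.

```cpp
#include <cassert>
#include <cstddef>

// Result of pushing one converted character to the platform.
struct WriteResult {
  std::size_t value; // bytes the platform actually wrote
};

// Pre-fix check: only a write of zero bytes counts as a failure.
constexpr bool oldCheckFails(WriteResult r) { return r.value < 1; }

// Post-fix check: anything short of the full UTF-8 sequence is a failure.
constexpr bool newCheckFails(WriteResult r, std::size_t charSize) {
  return r.value < charSize;
}

// A 3-byte character of which only 2 bytes reached the platform.
static_assert(!oldCheckFails(WriteResult{2}), "old check misses it");
static_assert(newCheckFails(WriteResult{2}, 3), "new check catches it");
```

A complete write passes both checks, so the stricter comparison only changes behaviour for the short-write case described above.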

Added a regression test using a mock File subclass that limits
platform_write to 2 bytes per call, simulating short writes on pipes and
sockets.

Assisted-by: Automated tooling, human reviewed.


Co-authored-by: Michael Jones michaelrj@google.com

[AA] No synchronization effects for never-escaping identified local (llvm#193939)

Fences and other synchronizing operations (such as atomic accesses
stronger than monotonic) are modelled as reading and writing all memory,
in order to enforce their implied ordering constraints.

Currently, this happens even for identified function locals that do not
escape. This patch excludes those objects.

Notably, we cannot reason based on captures-before here, because the
synchronizing operation still has an effect even if the object only
escapes later.

The hope here is that with this restriction in place, it may be viable
to respect potential synchronization inside non-nosync function calls.

[Bazel] Fixes ce6605a (llvm#196880)

This fixes ce6605a.

Co-authored-by: Google Bazel Bot google-bazel-bot@google.com

[clang][NFC] Remove alignment checks from test/CodeGen/c-strings.c (llvm#196501)

and re-enable it on more targets.

I don't think this test was intended to check for alignment. Those
expectations were added as part of FileCheck-izing the test in
e29dadb and we've been working around
them or xfailing the test since.

[CIR][AMDGPU] Add lowering for amdgcn ds swizzle builtin. (llvm#196011)

Upstreaming clangIR PR: llvm/clangir#2052

This PR adds support for lowering of _builtin_amdgcn_ds_swizzle* amdgpu
builtin to clangIR.

[lldb] Fix TestDelayedBreakpoint on ARM Thumb (llvm#196888)

The original address used for the "fake breakpoint" is not valid in
Thumb mode. To be safe, change it to have 0's in the LSBs.

[clang][bytecode] Visit tryEvaluateObjectSize expr as lvalue (llvm#196010)

Just like we do with the first parameter of a regular
__builtin_object_size call.

This still doesn't fix the bigger bos test cases since e.g.

int NoViableOverloadObjectSize3(void *const p PS(3))
    __attribute__((overloadable)) {
  return __builtin_object_size(p, 3);
}
void test4(struct Foo *t) {
  gi = NoViableOverloadObjectSize3(&t[1].t[1]);
}

is still broken because we don't have special handling for
&t[1].t[1] here and we can't usually access a one-past-end
pointer.

Use auto for DenseMap/SmallDenseMap iterator variables. NFC (llvm#196883)

To match the prevailing style.

[AArch64] Use dup (lane mov) over ext for high-half extract (llvm#195010)

This changes the instruction we use to extract the high half of a vector
register from an ext v0, v1, v1, 8 to a dup d0, v1.d[1]. This is
apparently slightly quicker on certain CPUs and is generally a simpler
instruction. This matches the instruction that gisel produced.

Some of the old patterns for extract_subvector with index of 1 seem
incorrect but were never used as we do not reach selection with such
instructions. They have been repurposed to emit the new DUPi64
instructions.

Revert "[AA] No synchronization effects for never-escaping identified local" (llvm#196890)

Reverts llvm#193939

Caused buildbot failure.

Update GitHub PR Greeter (llvm#194307)

Following these two discussions:

add a reference to the LLVM AI policy in the GH greeter.

In addition:

  • Update the message to include links to other relevant policies as
    well, since these are often shared during PR review.
  • Add FAQ section and move some of the original content there.
  • Include a request for people to confirm that they have familiarised themselves with
    the policies.
  • Add Hello @{self.author} :wave: to make the greeting more personal.

[flang] dummy arguments used as function calls (llvm#196426)

Adding an error when a dummy argument is used as a statement function.

SUBROUTINE a(foo)
foo(c) = 0
END SUBROUTINE a

This PR now points out:

  1. Dummy argument 'foo' may not be used as a statement function
  2. 'foo' is not a callable procedure

Handles issue 196424


Co-authored-by: Sunil Kuravinakop kuravina@pe31.hpc.amslabs.hpecorp.net

[SelectionDAG] Split vector types for atomic load (llvm#165818)

Vector types that aren't widened are split so that a single ATOMIC_LOAD
is issued for the entire vector at once. This change utilizes the load
vectorization infrastructure in SelectionDAG in order to group the
vectors. This enables SelectionDAG to translate vectors of type
bfloat and half.

Add support for Ubuntu 26.10 - Stonking Stingray (llvm#196896)

Co-authored-by: Oliver Reiche oliver.reiche@canonical.com

[clang-tidy] Remove hicpp modules [2/4] (llvm#196870)

This is part two of removing the hicpp-* checks.

RFC:
https://discourse.llvm.org/t/rfc-regarding-the-current-status-of-hicpp-checks/89883

Part of llvm#183462

[LV] Handle FSub Partial Reductions (llvm#191186)

Introduces a new RecurKind value 'FSub' in order to handle partial
reductions of floating point values.

This is done by following the existing method for integer partial
reductions, doing a positive accumulation followed by a final
subtraction in the middle block.
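
The accumulate-then-subtract shape can be sketched on scalars as below. The function names and the lane count are illustrative, and the sketch ignores the floating-point reassociation requirements that the real transform depends on; the test values are chosen to be exactly representable so both forms agree.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference loop: acc -= x[i] on every iteration.
inline float scalarFSub(float init, const std::vector<float> &xs) {
  float acc = init;
  for (float x : xs)
    acc -= x;
  return acc;
}

// Partial-reduction shape: add into per-lane partial sums inside the
// loop, then perform a single subtraction after it.
inline float partialFSub(float init, const std::vector<float> &xs,
                         std::size_t lanes) {
  assert(lanes > 0);
  std::vector<float> partial(lanes, 0.0f);
  for (std::size_t i = 0; i < xs.size(); ++i)
    partial[i % lanes] += xs[i]; // positive accumulation
  float sum = 0.0f;
  for (float p : partial)
    sum += p;
  return init - sum; // final subtraction, as in the middle block
}
```

Keeping the loop body as a pure addition lets the existing add-based partial-reduction machinery be reused, with the sign applied once at the end.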

[LV][NFC] Remove instcombine pass from RUN lines of simple tests (llvm#196257)

Most of the work done by the instcombine pass on these files involves
canonicalising GEPs and shuffling code around. I don't believe there is
any value running instcombine in these cases.

[GISel][X86] port X86PreLegalizerCombiner to npm (llvm#182638)

Porting X86PreLegalizerCombiner to npm as part of
llvm#178192

[X86] Cast atomic vectors in IR to support floats (llvm#148899)

This commit casts floats to ints in an atomic load during AtomicExpand
to support floating point types. It also is required to support 128 bit
vectors in SSE/AVX.

[AMDGPU] Add VMovB64 subtarget feature (llvm#196340)

[mlir][SPIR-V] Add CL.{exp2,exp10,log2,log10} ops (llvm#196869)

[Clang] Fix incorrect type for __mfp8 in extractelement codegen (llvm#192977)

The codegen for extracting an element from an FP8 vector was emitting a
simple extractelement with i8 type for the extracted element. The
__mfp8 type is represented as <1 x i8> in LLVM IR. This codegen
created inconsistency in Clang - some __mfp8 expressions would
correspond to LLVM IR values with <1 x i8> type and some to i8 type.

It also caused an assertion failure when the extracted element was
passed as a function argument.

This patch fixes the issue by inserting the extracted element
into a <1 x i8>.

[mlir][tosa] Add a pass to downgrade TOSA 1.1.draft to 1.0 (llvm#194971)

This commit adds a pass that will allow 1.1.draft operations to be
rewritten to their 1.0 counterparts where possible. The pass currently
covers the following operations:

  • bool <-> fp32 casts via i8 bridge casts
  • bool gather/scatter with i32 indices via i8 payload rewrites

Note that the downgrade is 'best-effort' and the pass does not perform
any validation itself. The validation pass should be run after
downgrading to check that the resulting IR was downgraded successfully.

Motivation: This decouples the target specification version in
legalizations and backends. Legalizations from higher level frameworks
may be updated to support producing TOSA 1.1.draft variants of
operations, while backends can still consume TOSA 1.0 IR after running
the downgrade pass.

[llubi] Upstream

hbatagelo and others added 30 commits May 20, 2026 23:23
…eak (llvm#198019)

Fixes llvm#196225.

Root cause: When clangd shuts down, `ClangdLSPServer`'s destructor calls
`Server.reset()` to destroy the `ClangdServer` object. `ClangdServer`
then destroys the work scheduler, which blocks and waits for in-flight
tasks to finish. If a rename task is in flight, the callback will try to
access the now-reset `ClangdLSPServer::Server` in
https://github.com/llvm/llvm-project/blob/4f9a7d09f4760ac9c5745e8bb829366d29ff9687/clang-tools-extra/clangd/ClangdLSPServer.cpp#L916-L917

The same issue occurs with `onCommandApplyTweak`, as its callback also
accesses `ClangdLSPServer::Server`:
https://github.com/llvm/llvm-project/blob/4f9a7d09f4760ac9c5745e8bb829366d29ff9687/clang-tools-extra/clangd/ClangdLSPServer.cpp#L828-L829

A reproducer has been added to llvm#196225.

This PR fixes these issues by capturing a reference to `ClangdServer`
and using it in the callbacks instead of the `ClangdLSPServer::Server`
optional. The reference is guaranteed to remain valid because the work
scheduler syncs while the server is being destroyed.

Includes a unit test that verifies the fix for `rename`. The test for
`applyTweak` was omitted because the mocked LSP client would trigger an
`ADD_FAILURE()` ("Unexpected server->client call") when intercepting the
outgoing `workspace/applyEdit` request. While this could be handled with
`EXPECT_NONFATAL_FAILURE_ON_ALL_THREADS`, it would make the test flaky
because the task could occasionally finish before the server shuts down.
This file uses `EFIAPI`, but it's not included. It looks like
compilation currently succeeds because `EFI_SYSTEM_TABLE.h` is the only
header that includes `EFI_SIMPLE_TEXT_OUTPUT_PROTOCOL.h`, and it happens
to include `EFIAPI-macros.h` indirectly.

We will be adding Bazel rules for this file, and Bazel typically
requires all headers to be compilable on their own. This build error is
theoretically reproducible by running a CMake build with
`-DCMAKE_VERIFY_INTERFACE_HEADER_SETS` if we had the appropriate
FILE_SETs defined.
…ication (llvm#198761)

llvm#182382 introduced a language extension to accumulate field values:
“append” concatenates the new value after the current value, whilst
“prepend” concatenates it before the existing value. This change uses that
feature to eliminate repetition in the definition of some of the
compressed instructions.

For example, line 267 of RISCVInstrInfoC.td establishes a scope for “`let
Predicates = [HasStdExtZca] in {`”; this scope ends on line 515.
Meanwhile, line 454 wants to add the `IsRV64` predicate for a single
instruction but was forced to duplicate the previous condition as well:
“`let Predicates = [HasStdExtZca, IsRV64] in`”. That’s no longer
necessary since the addition can now be explicit: “`let append
Predicates = [IsRV64] in`”.

I’ve verified that this change has no effect on the TableGen output.

It seems quite likely that this same change could be made in some of the
other RISC-V TableGen source files…
…lvm#198866)

65f8a7c accidentally introduced some
diagnostics from having a switch statement with a default label but no
case labels. This removes the switch statements until we have more cases
to add.
Move the Policy struct and PolicyStack class from lldb/Target to
lldb/Utility. This is a pure relocation -- no API or behavior changes.

This is needed so that lldbHost (which contains ProcessRunLock) can
depend on Policy without introducing a layering violation, since
lldbHost cannot depend on lldbTarget.

----

The following PRs are related to the Policy feature:
- llvm#195762
- llvm#195771
- llvm#198897
- llvm#195774
- llvm#195775

rdar://176223894

Signed-off-by: Med Ismail Bennani <ismail@bennani.ma>
…m#198795)

The Windows API can deliver `LOAD_DLL_DEBUG_EVENT` with `hFile` set to
NULL. In practice, this happens inside Windows Server Core containers
for system DLLs like `kernel32.dll`, `KernelBase.dll` and
`ucrtbase.dll`. This is the issue
described by @Nerixyz in
llvm#132800.
    
`DebuggerThread::HandleLoadDllEvent` previously dropped any such event.
    
This patch improves the module path resolution with the 2 following
methods:
- `GetFileNameByLoadAddress`: resolve the module path given its base
load address.
- `GetFileNameFromImageNameField`: resolve the module path given the
`LOAD_DLL_DEBUG_INFO::lpImageName` field.

This requires:
- llvm#198794
tl;dr: This makes e.g. `while (({ break; 1; })) {}` ill-formed.

GCC used to allow this a long time ago (< GCC 9 I believe), but
eventually removed support for it; we originally allowed this both for
GCC compatibility and because there was actual code in the wild using it
(see Richard’s comment here for more background:
llvm#152606 (comment)).

Note that this _is_ still allowed inside another loop, e.g. this
```c++
for (;;) {
    while (({ break; true; })) {}
}
```
is well-formed; the `break` here will break out of the `for` loop.

Removing support for this gets rid of quite a bit of code and has a few
more benefits:

1. Currently, GCC and Clang _disagree_ on the meaning of this construct
in nested loops: in the code snippet above, Clang breaks out of the
`while` loop, whereas GCC breaks out of the `for` loop; this patch
changes Clang to align with GCC here. As a result, we can also remove
code that emits a warning for such cases.

2. It frees up a bit in `ScopeFlags::Flags`, which is a good thing
because I need one for expansion statements, and we’re out of bits. This
is because we currently use a bit as a hack to disallow `continue`
inside the declaration of the condition variable (because we’d
`continue` to the condition before initialising the variable); this bit
becomes obsolete with this patch because `continue` is now disallowed
entirely within the condition (if there is no outer loop).

Without this change, I’d have to refactor `Flags` to be a 64-bit
integer, which would also entail updating every place where we use an
`unsigned` to store scope flags.

There is another change that I needed to make here: we currently
suppress `-Wcomma` (which warns about comma operators, because in most
contexts, you probably didn’t mean to use one) in some places, including
the third part of a C-style `for` loop. This is implemented by checking
if we’re in a `BreakScope | ContinueScope`, but those scope flags are
now only set when parsing the loop body. Instead, we now check for
`ControlScope`. This requires us to also set that flag in C90 mode, but
that seems to be harmless as the only use of `ControlScope` I was able
to find is in C++ code paths, where that bit was already set anyway.

When we introduced named loops for C2y, we purposefully didn’t support
labeled break/continue in the condition, so there’s nothing to be done
there (we already have tests for this too).
The patch that added this release note was reverted (llvm#198341), but then
(llvm#198167) accidentally added it back.
When a copyable scalar in the bundle being scheduled has a same-block,
non-PHI, non-schedulable user with multiple uses, and at least one of
those uses is a non-PHI use in another block, the user's dependency
tracking across multiple bundles can be inconsistent.
Cancel scheduling of such copyable bundles instead.

Fixes llvm#198364.

Reviewers: 

Pull Request: llvm#198915
llvm#196868)

This patch aims to improve parity with OG codegen on targets with
non-flat alloca address space. I observed this after getting some
crashes while compiling PolybenchGpu for HIP (amdgpu). This work had
previously been merged in the incubator, most notably:
llvm/clangir#2090,
llvm/clangir#2088.

CIR currently returns the raw `cir.alloca` address from temporary/local
alloca creation. On AMDGPU, stack allocas live in private addrspace(5),
but ordinary C/C++/HIP auto variables are still used through the
language-visible generic/flat address space.

OG CodeGen handles this by creating the alloca in the target stack
address space and immediately casting it to the language-visible address
space when those differ. For example:

```llvm
%fmt = alloca [6 x i8], align 1, addrspace(5)
%tmp = alloca ptr, align 8, addrspace(5)

%fmt.ascast = addrspacecast ptr addrspace(5) %fmt to ptr
%tmp.ascast = addrspacecast ptr addrspace(5) %tmp to ptr

%arraydecay = getelementptr inbounds [6 x i8], ptr %fmt.ascast, i64 0, i64 0
store ptr %arraydecay, ptr %tmp.ascast, align 8
```
I've also added a helper to recover the underlying alloca through casts
for callers that need to annotate the original cir.alloca, and preserved
source pointer address spaces when creating pointer bitcasts.
…8654)

The prctl declaration should typically use variadic arguments (e.g. see
https://man7.org/linux/man-pages/man2/prctl.2.html), as the types /
quantity of subsequent arguments depend on the `option`. We can't
depend on all `<prctl.h>` users explicitly casting arguments to
`unsigned long` and passing all 5 of them every time.

* Don't add any option-specific logic, and just consume `arg2`-`arg5`
from variadic arguments and pass them to syscall implementation as-is,
assuming that they won't be used by the kernel if they are not needed,
and consuming these arguments won't lead to crashes.
* Updated the test to use `prctl` variants with less than 5 explicit
arguments (for PR_SET_NAME and PR_GET_NAME).
* Extract `rt_sigprocmask` syscall wrapper into the
libc/src/__support/OSUtil/linux/syscall_wrappers/ directory
* Convert all existing users of this syscall, and simplify the logic
where applicable.
* Implement `pthread_sigmask`, which is effectively another POSIX
wrapper around `rt_sigprocmask` syscall similar to `sigprocmask`
…f inverted predicates (llvm#191890)

Inverted predicates can be used freely in PTX. If we can invert a
predicate and CSE the generating instruction we can save calculating the
inverse.

Teach the NVPTX `commuteInstructionImpl` that SETP instructions can be
inverted to allow CSEing with previous SETP that match the inverted
form. This also inverts the branch users of the predicate to maintain
correctness.

Currently only allow the SETP inversion if all users are branches.
Future work can extend this to `sel` and `not` instructions.

Depends on llvm#191889.

Assisted-by: Cursor / Claude
This change adds support for properly defining and obtaining
`__libcpp_thread_id` when llvm-libc is used. It defines the integral
thread-id (which satisfies necessary restrictions of having total order,
being hashable and formattable) as `pthread_id_np_t` type and uses
`pthread_getthreadid_np` and `pthread_getunique_np` functions to obtain
it (added in
llvm#197027, following the
discussions in llvm#195139 and
llvm#195202).

We also let `_LIBCPP_NULL_THREAD` macro use a more portable
`PTHREAD_NULL` (defined in the latest POSIX) when this macro is
available, so that it would work as expected for opaque `pthread_t`
implementations, where default constructor might not necessarily
zero-initialize all the members.

This is the last remaining change to allow building libc++ against
llvm-libc with threads enabled (test-suite results TBD).
633539b used the ToT version but does
not necessarily need it. Use the latest release and the standard syntax.

This follows
https://llvm.org/docs/CIBestPractices.html#hash-pinning-dependencies.
This is covered in our CI best practices document in
https://llvm.org/docs/CIBestPractices.html#ensuring-workflows-run-on-the-correct-events.

Otherwise we cannot run libc CI workflows on stacked pull requests.
LLDB invokes xcrun to find SDKs on disk. This is usually very fast, but
sometimes (after an Xcode update, or when the searched SDK does not
exist) it can take very long (10s or more). The progress event provides
user feedback to explain the hang.
…late

This replaces the manual boilerplate for DecodeGPRNoX0, DecodeGPRNoX2,
DecodeGPRNoX31, and DecodeGPRPairNoX0 with a universal filtering template
and constexpr predicate functions.
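
As a rough illustration of the idea (not the code from this patch), a single decoder template parameterised on a constexpr predicate could look like the sketch below. The names `DecodeStatus`, `decodeFilteredGPR`, `isNotX0`, `isNotX2` and `isNotX31`, and the fixed bound of 32 registers, are assumptions made for the example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the decoder's status type.
enum class DecodeStatus { Fail, Success };

// One small constexpr predicate per excluded register.
constexpr bool isNotX0(uint32_t RegNo) { return RegNo != 0; }
constexpr bool isNotX2(uint32_t RegNo) { return RegNo != 2; }
constexpr bool isNotX31(uint32_t RegNo) { return RegNo != 31; }

// A single template replaces one hand-written decoder per filter:
// the predicate is a compile-time argument, so each instantiation
// behaves like the dedicated function it replaces.
template <bool (*Allowed)(uint32_t)>
DecodeStatus decodeFilteredGPR(uint32_t RegNo, uint32_t &Out) {
  // Reject out-of-range encodings and registers the predicate excludes.
  if (RegNo >= 32 || !Allowed(RegNo))
    return DecodeStatus::Fail;
  Out = RegNo;
  return DecodeStatus::Success;
}
```

Adding another filter in this sketch only requires one more predicate, for example `decodeFilteredGPR<isNotX2>(RegNo, Out)`.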

I will need more of these for the RVY patch series, so submitting this NFC
cleanup first.

Pull Request: llvm#198146
These flags did not exist in LLVM 3.7 so should be omitted.
Introduce the initial TosaToSPIRVTosa conversion pass and library
wiring. This slice converts func.func regions to spirv.ARM.Graph inside
spirv.module, rewrites graph input/result types to SPIR-V ARM tensor
types, maps func.return to spirv.ARM.GraphOutputs, and adds focused
tests for type conversion, descriptor bindings, and nested containers.

Signed-off-by: Davide Grohmann <davide.grohmann@arm.com>
…tInjection.py (llvm#198884)

If the test is run as arm64e while the just-built libunwind is arm64 only,
the test will not function correctly.
This way we can count instructions before the optimization pipeline for
the sake of analysis.
None of these jobs take anywhere close to the six hour timeout
that Github uses by default. Set timeouts that are 2-3x the typical job
runtime so that if there is a test/build step that hangs indefinitely,
the job times out in a reasonable amount of time and does not hold any
resources that could be used elsewhere.

This should not impact any jobs that do not hang, will not change the
result of jobs that do hang, and means we can more effectively deal with
cases like today where tests were hanging, from a resource perspective.

This is also standard in some other workflows like the main premerge
workflow definition.
…" (llvm#198945)

This change requires Host link against Core, and it cannot do that; it
may only link in Utility. Reverting so Adrian can decide what to do.

This reverts commit 5c63509.
…attern (llvm#198658)

We're currently able to recognize the following popcount pattern
```
int popcnt(unsigned x) {
 x = x - ((x >> 1) & 0x55555555);
 x = x - 3*((x >> 2) & 0x33333333);
 x = (x + (x >> 4)) & 0x0F0F0F0F;
 x = x + (x >> 8);
 x = x + (x >> 16);
 return x & 0x0000003F;
}
```
but if a truncation follows right after the last AND instruction:
```
int16_t popcnt(unsigned x) {
 x = x - ((x >> 1) & 0x55555555);
 x = x - 3*((x >> 2) & 0x33333333);
 x = (x + (x >> 4)) & 0x0F0F0F0F;
 x = x + (x >> 8);
 x = x + (x >> 16);
 return int16_t(x & 0x0000003F);
}
```
since InstCombine canonicalizes `(trunc (and y, C))` into `(and
trunc(y), C')`, we might lose the opportunity to turn the above snippet
into `(trunc (popcount x))` as there is a `trunc` interrupting the
pattern matching.

This patch fixes this issue by considering this extra `trunc` during
pattern matching, and appending it in the final popcount result, if
there is any.
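
The equivalence this relies on can be checked with a small standalone program. The sketch below is only an illustration: it uses `uint32_t` in place of `unsigned`, and the helper names `popcnt_pattern`, `popcnt_reference` and `truncation_commutes` are invented for the example.

```cpp
#include <cassert>
#include <cstdint>

// The bit-twiddling sequence from the description above.
static uint32_t popcnt_pattern(uint32_t x) {
  x = x - ((x >> 1) & 0x55555555);
  x = x - 3 * ((x >> 2) & 0x33333333);
  x = (x + (x >> 4)) & 0x0F0F0F0F;
  x = x + (x >> 8);
  x = x + (x >> 16);
  return x & 0x0000003F;
}

// Straightforward reference: clear the lowest set bit until none remain.
static uint32_t popcnt_reference(uint32_t x) {
  uint32_t n = 0;
  for (; x != 0; x &= x - 1)
    ++n;
  return n;
}

// Truncating the masked pattern result gives the same value as
// truncating the population count, i.e. (trunc (popcount x)).
static bool truncation_commutes(uint32_t x) {
  int16_t from_pattern = static_cast<int16_t>(popcnt_pattern(x));
  int16_t from_popcount = static_cast<int16_t>(popcnt_reference(x));
  return from_pattern == from_popcount;
}
```

Because the result never exceeds 32, the truncation to 16 bits cannot drop any set bits, which is why the `trunc` can be applied to the final popcount.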
…98437)

Co-authored-by: Shilei Tian <i@tianshilei.me>
Co-authored-by: Chinmay Deshpande <chdeshpa@amd.com>
SymbolFileCommon::GetTypeSystemForLanguage unconditionally writes this
pointer with `ts->SetSymbolFile(this)` on every lookup, which races with
concurrent reads from other threads.

The race is benign in practice: there is exactly one SymbolFile per
Module, so every writer stores the same pointer, but it is still
undefined behavior under the C++ memory model.

Make the field std::atomic<SymbolFile *> and turn SetSymbolFile into a
compare-exchange that asserts a TypeSystem is never rebound to a
different SymbolFile, documenting the invariant that lets us get away
with this.
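
A minimal sketch of the compare-exchange idea follows. It is not the LLDB code: `SymbolFileStub` and `TypeSystemStub` are hypothetical stand-ins, and the setter returns `false` instead of asserting so that the invariant can be exercised without aborting.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for the owning object.
struct SymbolFileStub {};

class TypeSystemStub {
public:
  // Binds the owner at most once. Repeated calls with the same pointer
  // succeed without changing anything; a different pointer is rejected.
  // (The real change asserts here; returning bool is an assumption made
  // so the sketch is easy to test.)
  bool SetSymbolFile(SymbolFileStub *sym_file) {
    SymbolFileStub *expected = nullptr;
    if (m_sym_file.compare_exchange_strong(expected, sym_file))
      return true; // first binding
    // On failure, `expected` holds the currently stored pointer.
    return expected == sym_file;
  }

  SymbolFileStub *GetSymbolFile() const { return m_sym_file.load(); }

private:
  std::atomic<SymbolFileStub *> m_sym_file{nullptr};
};
```

Concurrent callers that all pass the same pointer race only on the atomic operation, so there is no data race, and an attempt to rebind to a different owner is detected rather than silently overwriting the field.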

The alternative is to have the SymbolFile pointer passed in through the
constructor, but that would require updating a bunch of call sites,
including various plugin interfaces.

Found by ThreadSanitizer as part of llvm#197792.
Summary:
The test previously did not account for CMake overrides, so we just grab
the file that's actually generated. `sort -u` should handle the case
where there's both a .so and .a.
igorban-intel and others added 28 commits May 22, 2026 09:08
…se (llvm#197667)

Take the function-pointer placeholder operand as a parameter rather
than reading MI.getOperand(2) directly, so visitFunPtrUse can be
reused from instructions with a different operand layout. Pure
refactor.

---------

Co-authored-by: Marcos Maronas <mmaronas@amd.com>
AIX native echo doesn't support the `-n` flag.
Use the POSIX-standard `\c` escape sequence instead to suppress the
trailing newline, making the test portable across all systems.


The current test fails as follows:
```
FAIL: lit :: unit/Util.py (1 of 1)
******************** TEST 'lit :: unit/Util.py' FAILED ********************
Exit Code: 1

Command Output (stdout):
--
# RUN: at line 1
"/opt/freeware/bin/python3.12" /home/himadhit/llvm/community/build/utils/lit/tests/unit/Util.py
# executed command: /opt/freeware/bin/python3.12 /home/himadhit/llvm/community/build/utils/lit/tests/unit/Util.py
# .---command stderr------------
# | F..
# | ======================================================================
# | FAIL: test_basic (__main__.TestCommandCache.test_basic)
# | ----------------------------------------------------------------------
# | Traceback (most recent call last):
# |   File "/home/himadhit/llvm/community/build/utils/lit/tests/unit/Util.py", line 32, in test_basic
# |     self.assertEqual(lit_config.run_command_cached(["echo", "hi"]), b"hi")
# | AssertionError: b'hi\n' != b'hi'
# |
# | ----------------------------------------------------------------------
# | Ran 3 tests in 2.050s
# |
# | FAILED (failures=1)
# `-----------------------------
# error: command failed with exit status: 1
```

---------

Co-authored-by: himadhith <himadhith.v@ibm.com>
We have done this optimization in ISel and this PR just models it
in TTI.

---------

Co-authored-by: Luke Lau <luke_lau@icloud.com>
…9195)

Make it easier to bypass unnecessary ops based on DemandedElts
…C). (llvm#195891)

Add an assertion to VPValue::setOperand to check that the type of the new
operand matches that of the old operand.

This makes it easier to catch replacements with incorrect types at the
source, instead of only later during verification.

A few places currently perform replacements with mismatching types,
which only get fixed up later. Update those to avoid type violations.

Depends on llvm#195485

PR: llvm#195891
Implements https://wg21.link/P3383R3

Closes llvm#148149

---------

Co-authored-by: A. Jiang <de34@live.cn>
…198796)

This is to fix an infinite loop in the parser when using un-implemented
clauses. See https://godbolt.org/z/f775asrea .
This patch also fixes this crash: https://godbolt.org/z/WKrsbTGGe .
…98793)

Summary:
We need the offload project's RPC thread to handle the IO requests
originating from the GPU. Previously we did the 'easy' solution and just
linked this handler directly into the offload project. This is not ideal
because it prevents people's ability to build and configure libraries
separately.

This PR inverts the dependency, flang-rt now conditionally enables
support using the existing RPC callback mechanism. The cost is that
every flang-rt program now pays the cost of a boolean compare; the
benefit is that the libraries are now independent of each other.
fadd/fsub instructions that canConvertToFMA returns valid for were
unconditionally skipped in tryToVectorize, causing regressions
where SLP failed to vectorize loops containing such patterns even when
FMA formation never fires.
Collect skipped FMA candidates during vectorizeChainsInBlock and retry
them with AllowFMACandidates=true after all other instructions in the
block have been processed. The cost model still rejects the retry when
actual FMA formation is more profitable (e.g. FMA4 on bdver2), so
existing FMA-profitable cases are unaffected.

Fixes llvm#198040

Reviewers: davemgreen, bababuck, RKSimon, hiraditya

Pull Request: llvm#198174
…m#199214)

When we create the Clang types for methods, we ignored the qualifiers.
So `const` methods would become non-const.

With this PR, we use the qualifiers from `*this` for the function type.
llvm#199083)

This will allow it to be used for checking that users are members of the
llvm-committer team or possibly others.
…99218)

Guard `mutex.unlock()` with `if (mutex.try_lock())` to satisfy thread
safety analysis. Statically, the compiler cannot verify that
`mutex.try_lock()` succeeded when it is only asserted by `EXPECT_TRUE`,
leading to a "releasing mutex 'mutex' that was not held" compilation
error.
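
The shape of the guard can be sketched as follows. `FakeMutex` is a deterministic, single-threaded stand-in invented for this example (it is not the mutex type used in the test), and `is_unlocked` is a hypothetical helper name.

```cpp
#include <cassert>

// Hypothetical single-threaded stand-in for a mutex, used so the
// behaviour is deterministic in this sketch.
struct FakeMutex {
  bool held = false;
  bool try_lock() {
    if (held)
      return false;
    held = true;
    return true;
  }
  void unlock() {
    assert(held && "releasing a mutex that was not held");
    held = false;
  }
};

// unlock() is only reachable on the branch where try_lock() returned
// true, so a static analysis can see that the mutex is held there.
template <typename Mutex> bool is_unlocked(Mutex &m) {
  if (m.try_lock()) {
    m.unlock();
    return true;
  }
  return false;
}
```

The caller can still assert on the returned value, which keeps the check in the test while making the lock state explicit in the control flow.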

This fixes a regression introduced in llvm#198941.
…no bindings (llvm#199215)

Clang supports empty structured binding groups as an extension, and the
text node dumper has some special handling for giving a name to
anonymous declarations, which assumed a decomposition would have at
least one binding.

Fixes llvm#198842
…lvm#198949)

In llvm#198429 (reland), CommandObject::GetTarget() was tightened to return
nullptr instead of the dummy target when no real target exists, unless
the command explicitly opts in via eCommandAllowsDummyTarget or standard
target requirements.
 
However in CommandInterpreter::GetExecutionContext(bool
adopt_dummy_target) :
  
```
ExecutionContext
CommandInterpreter::GetExecutionContext(bool adopt_dummy_target) const {
  return !m_overriden_exe_contexts.empty()
             ? m_overriden_exe_contexts.top()
             : m_debugger.GetSelectedExecutionContext(adopt_dummy_target);
}
```
  
If m_overriden_exe_contexts is not empty, the method returned the top
context immediately—completely ignoring the adopt_dummy_target argument
requested by the command object.
  
  Because of this:
  
1. During sourced script runs, process attach received the dummy target
as its execution target (since adopt_dummy_target = false was ignored).
2. It bypassed the target == nullptr check and proceeded to attach
directly to the dummy target.
3. As the dummy target was never registered in m_target_list , the main
target list remained empty ( No targets. ), causing all subsequent
commands (e.g., setting breakpoints or continuing) to fail with invalid
target errors.
  
  ### The Fix:
  
   lldb/source/Interpreter/CommandInterpreter.cpp :
Respect adopt_dummy_target = false in GetExecutionContext when a dummy
target is present in the overridden execution context stack: if
adopt_dummy_target is false and the overridden context on the stack
contains the dummy target, we clear the context before returning it.
This forces GetTarget() to return nullptr as originally intended.

### Test:
  
  •  lldb/test/Shell/Commands/process-attach-dummy.test :
Add a new standalone Lit shell test to replicate this scenario. The test
sources a command sequence executing process attach when no target
exists, and verifies that target list successfully registers the newly
created real target ( target #0: <none> ) instead of leaving the list empty.
…vm#199222)

VPlanTransforms::convertToStridedAccesses calls
VPWidenMemoryRecipe::computeCost, which uses VPTypeAnalysis in
VPCostContext to infer the pointer type of the load address. However,
CachedTypes in VPTypeAnalysis may be invalidated since earlier
transformations in tryToBuildVPlan could erase recipes from the plan.
This pollutes the cache with stale types.

Fix this by creating a new VPCostContext locally scoped to
convertToStridedAccesses, ensuring VPTypeAnalysis reflects the current
plan state. This serves as a quick fix to prevent accidental reuse by
future transformations.
Follow-up to llvm#198941, which introduced Locked<T> and SharedLocked<T>.
Add GetObjectFileLocked, GetSymbolFileLocked, GetSymtabLocked, and
GetSectionListLocked alongside the existing accessors.

The locked variants cover two things:

1. They prevent the pointer from being swapped out from under the
caller. The old getters take m_mutex only during lazy initialization and
release it before returning. The unique_ptr or shared_ptr that owns the
pointee can therefore be reassigned by another thread while the caller
still holds the raw value. LockedPtr keeps the Module mutex held
alongside the borrowed pointer, pinning the binding for the lifetime of
the handle.

2. They serialize access to the pointee itself. This is not new: the
classes in question were already relying on the Module mutex for
synchronization.
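
The two points above can be sketched with a handle that owns the lock for as long as the borrowed pointer is in use. This is an illustration under assumptions, not the LLDB implementation: `LockedHandle`, `SymtabStub` and `ModuleStub` are hypothetical names, and `std::recursive_mutex` is used as the stand-in for the Module mutex.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <utility>

// Hypothetical handle: a borrowed pointer plus the lock that protects it.
// The lock is released when the handle is destroyed.
template <typename T> class LockedHandle {
public:
  LockedHandle(T *ptr, std::unique_lock<std::recursive_mutex> lock)
      : m_ptr(ptr), m_lock(std::move(lock)) {}

  T *operator->() const { return m_ptr; }
  explicit operator bool() const { return m_ptr != nullptr; }

private:
  T *m_ptr;
  std::unique_lock<std::recursive_mutex> m_lock;
};

// Hypothetical pointee.
struct SymtabStub {
  int num_symbols = 0;
};

class ModuleStub {
public:
  // Lazily creates the pointee and returns it together with the lock,
  // so the owning unique_ptr cannot be reassigned by another thread
  // while the caller still holds the handle.
  LockedHandle<SymtabStub> GetSymtabLocked() {
    std::unique_lock<std::recursive_mutex> lock(m_mutex);
    if (!m_symtab)
      m_symtab.reset(new SymtabStub());
    return LockedHandle<SymtabStub>(m_symtab.get(), std::move(lock));
  }

private:
  std::recursive_mutex m_mutex;
  std::unique_ptr<SymtabStub> m_symtab;
};
```

In this sketch the handle is move-only, so the lock has exactly one owner and is dropped at the end of the scope that holds the handle.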

Migrate the four call sites in Module where the existing pattern maps to
a single LockedPtr.

The legacy raw-pointer getters remain so call sites can migrate
incrementally.
…vm#199126)

Most callers are unchanged, since they either ignore the specific error
or have their own formatting of the error that includes both the path
and the errorToErrorCode-unwrapped value. However, for clients that just
forward the error it's helpful to ensure we do not lose track of the
filename that the error is associated with, so use FileError.

Incidentally remove two uses of errorToErrorCode that were being used
instead of consumeError; in both cases getOptionalFileRef was more
appropriate.
…results (llvm#199119)

With layout conflict handling this case is no longer an issue.
…ing (llvm#199189)

This prevents generating invalid C code in mixed-language headers by
leaving `typedef` declarations inside `extern "C"` blocks intact by
default.

Fixes llvm#141394
…ameter mapping (llvm#195995)" (llvm#199228)

This reverts commit 7e2821e, which
causes a crash-on-valid in clang:
llvm#199209
…essage (llvm#199233)

Help track whether a fold was attempted or not


This stack of pull requests is managed by Graphite. Learn more about stacking.

@tonykuttai tonykuttai closed this May 23, 2026