[Clang][Modules] Fix -Wunused-variable (#196577)#7

Closed
tonykuttai wants to merge 1833 commits into main from tvarghese/pragma-copyright-comment-update

Conversation

@tonykuttai


[Clang][Modules] Fix -Wunused-variable (llvm#196577)

Mark some variables [[maybe_unused]] and inline others that do not have
side effects to avoid -Wunused-variable in non-assert builds.

[AArch64][GlobalISel] Legalize F64 to BF16 fptruncates (llvm#196077)

The two-step expansion of a bf16 fptrunc must be careful to
avoid double-rounding errors. On AArch64 we can apparently first convert
with fcvtxn, which performs round-to-odd, and then apply a standard fp
truncate to bf16 so that the final rounding is done correctly. This
reuses the existing lowering added for vector operations.
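
As a rough illustration of why the intermediate step wants round-to-odd, the Python sketch below models both paths on raw bit patterns. The helper names are hypothetical and not part of the patch, and the model assumes finite, non-overflowing inputs and ignores NaN handling.

```python
import struct


def f32_bits_rne(x):
    """Bits of x rounded to f32 with round-to-nearest-even (what struct does)."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def f32_from_bits(bits):
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def f32_bits_round_to_odd(x):
    """Model of a round-to-odd f64 -> f32 conversion (the role fcvtxn plays)."""
    bits = f32_bits_rne(x)
    y = f32_from_bits(bits)
    if y == x:
        return bits          # exact: nothing to record
    if abs(y) > abs(x):
        bits -= 1            # step back to the truncated magnitude
    return bits | 1          # inexact: force the low bit to 1 (sticky)


def bf16_bits_from_f32_bits(bits):
    """Standard f32 -> bf16 truncation with round-to-nearest-even."""
    bias = 0x7FFF + ((bits >> 16) & 1)
    return (bits + bias) >> 16


# A value just above the midpoint between bf16 1.0 and the next bf16 value.
x = 1.0 + 2.0**-8 + 2.0**-40
naive = bf16_bits_from_f32_bits(f32_bits_rne(x))
safe = bf16_bits_from_f32_bits(f32_bits_round_to_odd(x))
print(hex(naive), hex(safe))  # prints: 0x3f80 0x3f81
```

In the naive path the first rounding lands exactly on the bf16 midpoint, so the second rounding breaks the tie downward; the sticky low bit kept by round-to-odd prevents that.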

[SLP][NFC]Add a test with the revectorization of the struct-returning intrinsics

Reviewers:

Pull Request: llvm#196581

[AMDGPU] Add missing CMake link component (llvm#196579)

The issue was triggered by llvm#196547.

[SLP] Vectorize struct-returning intrinsics

Allow SLP to combine across lanes calls that return a literal struct
(llvm.sincos, llvm.*.with.overflow, llvm.frexp, ...) into a single
call returning a struct of vectors, by widening {T, T, ...} to
{<N x T>, <N x T>, ...} via VectorTypeUtils and emitting extractvalue +
extractelement for external uses.

Reviewers: hiraditya, bababuck, RKSimon

Pull Request: llvm#195521

[PowerPC][NFC]Refactor EmitInstrWithCustomInserter (llvm#196114)

Currently PPCTargetLowering::EmitInstrWithCustomInserter() uses a large
if/else-if structure. Update to use switch and
move ATOMIC_CMP_SWAP and SELECT code to helper functions for better
readability and maintenance.

[AMDGPU] Pre-commit unit test for RP tracking reset/advance inconsistencies fix (llvm#196098)

This adds a new AMDGPU unit test file for testing the behavior of
GCNRPTracker and its related classes. The two tests showcase confusing
return value and behavioral semantics for variants of the advance and
reset functions, which will be clarified in a follow up commit.

Revert "[SLP] Vectorize struct-returning intrinsics"

This reverts commit b0c6df7 to fix
buildbots https://lab.llvm.org/buildbot/#/builders/52/builds/17118

Reviewers:

Pull Request: llvm#196591

[OFFLOAD][L0] Fix incorrect values in the Level Zero cached header (llvm#196587)

The current ZE_STRUCTURE_TYPE_DEVICE_IP_VERSION_EXT and
ZE_STRUCTURE_TYPE_RELAXED_ALLOCATION_LIMITS_EXP_DESC values are
incorrect as seen here:
*
https://github.com/oneapi-src/level-zero/blob/0f246f6edf90d56604f00f83b41d783dc6a9394e/include/ze_api.h#L318
*
https://github.com/oneapi-src/level-zero/blob/0f246f6edf90d56604f00f83b41d783dc6a9394e/include/ze_api.h#L324

clang: Consolidate -aux-triple handling (llvm#196551)

All of the offload languages were essentially doing the
same thing, with overcomplicated conditions that depended on
the language.

[flang][OpenACC] support collapse on unstructured acc.loop (llvm#196174)

PR llvm#164992 added unstructured-loop support to OpenACC lowering (no
bounds on acc.loop, IVs privatized, body emitted as explicit cf), but it
did not cover the collapse(N) case. Compiling

  !$acc parallel loop collapse(2)
  do j = 1, n
    do i = 1, n
      if (i == jdiag) then
        a(i,j) = 0.0d0
        cycle
      end if
      a(i,j) = real(i + j, 8)
    end do
  end do

asserted in MLIR's runRegionDCE: "Assertion `mightHaveTerminator()'
failed".

Root cause: visitLoopControl unconditionally marked every inner DO of a
collapsed nest via markDoConstructAsCollapsed. genFIR(DoConstruct) then
read that marking and skipped the inner DO's loop machinery on the
assumption that the parent acc.loop iterates and supplies the IV via a
block argument. That assumption holds for the structured case, but not
for the unstructured case added in llvm#164992. Skipping it left the
PFT-pre-allocated scaffold blocks (pre-header, header, exit) without
terminators.

Fix: add a markInnerCollapsed parameter (default true) to
visitLoopControl, and pass false from privatizeInductionVariables (the
unstructured case of buildACCLoopOp).

Assisted-by: AI

[flang][OpenMP] Fix component-level initializer in declare reduction (llvm#195751)

When a declare reduction initializer uses a component assignment such as
initializer(omp_priv%member = 0), the lowering would store the scalar
RHS value (i32) directly to the whole derived-type reference, causing a
FIR verification error: 'fir.store' op store value type must match memory reference type.

The root cause is that MakeEvaluateExpr extracts only the RHS expression
from the AssignmentStmt, discarding the LHS component information. The
lowering callback then returns this scalar value, which gets stored to the
wrong type.

Fix this by mirroring the approach already used for combiner expressions:
pass the parser-level OmpStylizedInstance to processInitializer so the
callback can access the typed assignment and lower the full assignment
(both LHS and RHS), correctly handling component designators, function
calls on the RHS, and user-defined assignment.

Fixes llvm#184927 (with-initializer part; the without-initializer case
remains unsupported).

Assisted-by: Claude Opus 4.6.

Co-authored-by: Matt P. Dziubinski matt-p.dziubinski@hpe.com

[compiler-rt][profile][NFC] Introduce INSTR_PROF_INSTRUMENT_GPU_FUNC macro (llvm#196538)

Add a macro INSTR_PROF_INSTRUMENT_GPU_FUNC for the name of the GPU
profiling function __llvm_profile_instrument_gpu (added in llvm#187136),
following the same pattern as INSTR_PROF_VALUE_PROF_MEMOP_FUNC. Use the
macro in both the declaration in InstrProfiling.h and the definition in
InstrProfilingPlatformGPU.c.

This prepares the upcoming HIP/AMDGPU offload PGO patch (llvm#177665) to use
the same macro when calling this function.

[AMDGPU] Add subtarget features for MAD NC and 64-bit MIN/MAX instructions (llvm#196326)

[InstCombine][NFC] Replace buildAssumeFromKnowledge with CreateAlignmentAssumption (llvm#196254)

[DWARFLinker] Emit .debug_names entries for DW_TAG_template_alias (llvm#196440)

The tag was missing from the accelerator-records saver's switch, so
template alias DIEs were skipped and --verify-dwarf=output rejected the
result.

[lldb] Fix TestPtrauthBRKc47xX16Invalid.py (llvm#196408)

LLDB correctly detects the pointer authentication failure.

[lldb] Remove __iter/len__ from SBTypeEnumMember (llvm#196610)

SBTypeEnumMember has neither GetSize nor
GetTypeEnumMemberAtIndex, so having __iter__ and __len__ doesn't
make sense. These are on SBTypeEnumMemberList. From the docstrings, it
looks like the extensions were copied from said type.

[CIR] Implement CoawaitExpr for ComplexType (llvm#194027)

Implement CoawaitExpr support for ComplexType

Issue llvm#192331

[cir] fix IR dump comments from llvm#195198 (llvm#196605)

[VPlan] Unify inner and outer loop paths (NFCI). (llvm#192868)

Combine the logic of tryToBuildVPlanWithVPRecipes and
tryToBuildVPlan, as well as planInVPlanNativePath and plan.

This unifies the code paths to construct plans for both inner and outer
loop vectorization, and removes some duplication. It also ensures we run
almost the same VPlan-transformations in both modes. Currently a few
code paths need to be guarded with a check of whether we are dealing with an
inner or an outer loop.

PR: llvm#192868

[gn] port 2e2d90b (llvm#196618)

[gn build] Port 3fe311f (llvm#196619)

[gn build] Port c507e20 (llvm#196620)

[gn build] Port e6efa1a (llvm#196621)

[gn build] Port ebb9a79 (llvm#196622)

[gn] port 7e74c78 (llvm#196624)

[clang][deps] Use ModuleDepCollector for Make output (llvm#182063)

The dependency scanner works significantly differently depending on what
kind of output it's asked to produce. The Make output format has been
using the regular Clang dependency collection mechanism since it was
first implemented. This means the implementation works very differently
to the rest of the scanner and isn't able to turn implicit module
command lines into Makefiles using explicit modules.

This PR unifies the two implementations, using ModuleDepCollector even
for Make output. Emitting explicit module builds into Makefiles will
come in a later PR.

[libc++] Remove _LIBCPP_HIDE_FROM_ABI from <__utility/pair.h> (llvm#196508)

This is a follow-up to llvm#193045. This only drops _LIBCPP_HIDE_FROM_ABI
in a small part of the code base to make sure everything works as
expected. Once this has been in trunk for a while and there aren't any
problems, there will be larger follow-up patches to remove
_LIBCPP_HIDE_FROM_ABI throughout the code base.

[mlir][core] Restore dropped printIR behavior. (llvm#196628)

Restore the check for module scope, which was dropped in llvm#195198.

[VPlan] Fix cyclic phi type inference in early outer loop plans. (llvm#196634)

For phis check if any of the operands are VPIRValues or we already have
cached types. If so, return them.

This fixes a verification stack overflow in the VPlan outer loop path
after llvm#192868.

[DWARFLinker] Deduplicate .debug_frame CIEs across LinkContexts (llvm#195393)

Each LinkContext held its own EmittedCIEs map, so linking the same
object twice (or two objects with identical CIEs) produced one CIE per
LinkContext instead of one shared CIE. Hoist the registry to linker
scope and split emission into three phases so contexts can emit their
frames concurrently while still sharing one deduplicated CIE pool:

  1. Scan (parallel, during link). scanFrameData() records the unique CIEs
    referenced by retained FDEs, in first-reference order, into
    FrameScanResult::CIEs. scanAndUnloadInput() chains the scan in front of
    the existing input-unload so the DWARFContext can be released before the
    post-link emit pass.

  2. Merge (serial, after link completes). registerCIEs() walks each
    context's scanned CIEs in ObjectContexts order and try_emplaces them
    into the linker-wide CIERegistry. The first LinkContext to reference a
    CIE becomes its owner and reserves a local offset in its own
    .debug_frame section; later contexts only learn the owner's section and
    offset.

  3. Emit (parallel). emitDebugFrame() writes each context's owned CIEs
    followed by its FDEs into its own SectionDescriptor. FDE CIE_pointers
    are recorded as DebugOffsetPatches against the owner's section; the
    existing patch resolver rebinds them to OwnerStartOffset + LocalOffset
    when global offsets are assigned. Each task writes only to its own
    section, so no locking is needed.

Output is fully deterministic: ownership assignment, per-context CIE
order, FDE order within a section, and section concatenation order all
depend only on the input, not on thread scheduling. A context's CIEs may
now appear after FDEs (from other contexts) that reference them — DWARF
allows this, and cross-context FDE -> CIE pointers resolve correctly via
the patch mechanism.
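
The three phases can be sketched with a small Python model. This is only an illustration under simplifying assumptions: CIEs are plain byte strings, FDEs are (cie, name) pairs, and the function names are hypothetical rather than the linker's actual API.

```python
def scan(fdes):
    """Phase 1 (per context): unique CIEs in first-reference order."""
    cies = []
    for cie, _name in fdes:
        if cie not in cies:
            cies.append(cie)
    return cies


def register_cies(scans):
    """Phase 2 (serial): the first context to reference a CIE owns it."""
    registry = {}                      # cie -> (owner context, local offset)
    next_offset = [0] * len(scans)
    for ctx, cies in enumerate(scans):
        for cie in cies:
            if cie not in registry:    # models try_emplace
                registry[cie] = (ctx, next_offset[ctx])
                next_offset[ctx] += len(cie)
    return registry


def emit(ctx, fdes, registry):
    """Phase 3 (per context): owned CIEs, then FDEs with CIE-pointer patches."""
    owned = [cie for cie, (owner, _off) in registry.items() if owner == ctx]
    patches = [(name, registry[cie]) for cie, name in fdes]
    return owned, patches


contexts = [
    [(b"CIE-A", "f1"), (b"CIE-B", "f2"), (b"CIE-A", "f3")],
    [(b"CIE-B", "g1"), (b"CIE-C", "g2")],
]
registry = register_cies([scan(fdes) for fdes in contexts])
for ctx, fdes in enumerate(contexts):
    print(ctx, emit(ctx, fdes, registry))
```

Because ownership is decided in the serial phase by context order alone, the emit step can run per context in any order and still produce the same sections.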

AMDGPU/GlobalISel: RegBankLegalize rules for cluster_load_b32/b64/b128 (llvm#196186)

AMDGPU/GlobalISel: RegBankLegalize rules for cvt fp8 e5m3 intrinsics (llvm#196369)

[libc] Skip targets with unavailable __ONLY flags (llvm#196637)

When SKIP_FLAG_EXPANSION strips a flag that has the __ONLY modifier,
remove_duplicated_flags drops the flag from the list. This leaves
expand_flags_for_target with an empty flag list, causing it to create a
plain (non-flag) target. The __ONLY semantics, "only build this target
with the flag active", are silently violated.

On x86-64 CI runners without FMA, this results in cosf_float_test and
sinf_float_test being built and linked without FMA. The sincosf
algorithm was tuned assuming fused multiply-add precision, so the
unfused x*y+z fallback exceeds the 3.5 ULP tolerance (57 ULP for cosf,
12 ULP for sinf).

Added an early return in add_target_with_flags: if any flag with the
__ONLY modifier would be skipped, the target is not generated.
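
The early return can be modelled in a few lines of Python. The real logic lives in CMake; the function below and the flag spelling in the example are illustrative assumptions, not the actual build code.

```python
def should_generate_target(flags, skipped_flags):
    """Return False if any flag marked __ONLY would be stripped for this build."""
    for flag in flags:
        name, _, modifier = flag.partition("__")
        if modifier == "ONLY" and name in skipped_flags:
            return False   # building without the flag would violate __ONLY
    return True


# Hypothetical example: the FMA flag is unavailable on this runner.
print(should_generate_target(["FMA_OPT__ONLY"], {"FMA_OPT"}))  # prints: False
print(should_generate_target(["FMA_OPT"], {"FMA_OPT"}))        # prints: True
```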

Assisted-by: Automated tooling, human reviewed.

[DWARFLinker] Don't duplicate classes with in-class static decls (llvm#196442)

An in-class static declaration was forced to PlainDwarf placement and
cascaded that up to its enclosing class. If the class was already in the
type table via the out-of-line definition's specification, it ended up
with Both placement and cloneDIE emitted two copies. Keep in-class
static declarations in the type table so they stay with their enclosing
type.

[libc] Disable -march=native in CI to fix sccache poisoning (llvm#196560)

-march=native is incompatible with shared build caches because sccache
treats it as a literal string. Object files compiled on one CPU model
get silently served to runners with a different CPU, causing SIGILL
crashes in the opt_host memory tests.
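
A simplified model of the failure mode, assuming the cache key is a hash over the compiler, the literal argument strings, and the source text. The real sccache key has more inputs; the function names and CPU names here are hypothetical.

```python
import hashlib


def cache_key(compiler, args, source_text):
    """Hash the literal inputs; the host CPU is not one of them."""
    h = hashlib.sha256()
    for part in (compiler, *args, source_text):
        h.update(part.encode("utf-8"))
        h.update(b"\0")
    return h.hexdigest()


def resolve_native(args, host_cpu):
    """What the compiler effectively does with -march=native on a given host."""
    return [f"-march={host_cpu}" if a == "-march=native" else a for a in args]


args = ["-O2", "-march=native"]
source = "int main() { return 0; }"

# Both runners compute the same key even though they generate different code.
print(cache_key("clang", args, source) == cache_key("clang", args, source))
print(resolve_native(args, "znver3"), resolve_native(args, "skylake-avx512"))
```

In this model, hashing the resolved flags instead would give each CPU its own key, which is roughly what the reverted per-CPU cache key approach aimed for; dropping -march=native in CI avoids the problem at the source.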

Made LIBC_COMPILE_OPTIONS_NATIVE a CMake cache variable so CI can
override it. Both overlay and fullbuild workflows now pass
-DLIBC_COMPILE_OPTIONS_NATIVE="" to disable -march=native. Local
developer builds are unaffected and still default to -march=native.

Reverted the per-CPU cache key approach from llvm#196477 in favour of this
fix, which addresses the root cause.

Bumped sccache key versions (v2) in both workflows to invalidate the
poisoned caches.

Assisted-by: Automated tooling, human reviewed.

[lldb] Add lldb.summary and lldb.synthetic decorators (llvm#195351)

Adds two new decorators, @lldb.summary and @lldb.synthetic,
analogous to the existing @lldb.command decorator.

@lldb.summary("MyType")
def MyType_summary(valobj, _):
    return "summary string"

@lldb.synthetic("MyContainer")
class MyContainerSynthetic:
    def __init__(self, valobj, _): ...

These decorators result in type summary add and type synthetic add
commands being run.

An additional motivation: these decorators will make it straightforward
to invoke the Python-to-LLDB formatter bytecode compiler
(formatter_bytecode.Compiler), which currently requires command-line
flags to know how to register formatters. With these decorators, the
registration metadata is associated directly with the implementing
function or class.

See the docstrings and formatters.py test fixture for usage examples.

Assisted-by: claude

[X86] combine-add.ll - regenerate to show missing add asm comments (llvm#196647)

[lld][WebAssembly] Remove the experimental warning for PIC/dynamic linking (llvm#196566)

The current dynamic linking support has been used for several years now
both in emscripten and in wasi-sdk and is documented at
https://github.com/WebAssembly/tool-conventions/blob/main/DynamicLinking.md.
We did, and still do, have plans to develop another version of the dynamic
linking ABI that doesn't use a global symbol namespace, and that can
still happen, but the current API is clearly production-worthy
regardless of future plans.

This change removes the linker warning and the corresponding
--experimental-pic flag.

If we do want to still make breaking changes to the dylink format we can
rename the dylink.1 section (which already contains a version number).

This change paves the way for enabling shared libraries by default in
emscripten.

[flang][cuda] Widen stream argument to i64 in stream intrinsic lowering (llvm#196650)

genCUDASetDefaultStream and genCUDAStreamDestroy build their runtime
call with an i64 stream parameter but pass the actual argument
straight through, so a smaller-kind actual (e.g. the literal 0 in
cudaforSetDefaultStream(0)) produces an ill-typed fir.call:

error: 'llvm.call' op operand type mismatch for operand 0: 'i32' != 'i64'

Insert a fir.convert to i64 before the call, matching what
genCUDASetDefaultStreamArray already does.

[mlir][AMDGPU] Add, unify verification of memref index counts (llvm#196657)

This PR verifies that, on operations that have
%memref[%idx0, %idx1, ...] arguments, the number of indices matches
the rank of the memref being passed in.

While we're here, fixes capitalization for certain verification error
messages.

Assisted-by: Codex 5.5 (handled much of the implementation)

[lldb] Handle SIGINT via the MainLoop signal thread (on POSIX) (llvm#195959)

The driver's async SIGINT handler called
SBDebugger::DispatchInputInterrupt directly. That is not
async-signal-safe and can lead to a crash.

Register SIGINT with the existing signal-thread MainLoop instead so
DispatchInputInterrupt runs in normal thread context. The Windows path
is unchanged and keeps the legacy async handler.

While DispatchInputInterrupt runs, the callback temporarily installs
SIG_DFL so a second Ctrl-C still hard-terminates the process, preserving
the escape hatch users rely on when the debugger is unresponsive.

Moving SIGINT off the main thread means a Ctrl-C no longer interrupts
blocking syscalls there (e.g. a Python REPL waiting on input or
sleeping), so Python never observes the queued interrupt and
KeyboardInterrupt is not raised. To restore that behavior, after
dispatching the interrupt the callback re-raises SIGINT on the main
thread via pthread_kill; the resulting EINTR lets Python pick up the
pending interrupt. A skip flag suppresses the re-entry that this
self-send produces. Because the callback only ever runs on the signal
thread, the flag and the captured main-thread id live in the lambda's
captures and need no synchronization.

rdar://158218595

[BOLT][NFCI] Consolidate DataReader::setEntryCounts (llvm#196411)

FuncBranchData/BinaryFunction exec/external entry counts are set
in multiple places in DataReader:

  • FBD: in parse and appendFrom,
  • BF: in preprocessProfile and matchProfileData.

Consolidate to setEntryCounts called from readProfile.
Drop explicit counters, compute them from FBD::EntryData.

Test Plan: NFCI

[DirectX] Not print invalid root signature definitions. (llvm#196444)

This patch adds a check to the root signature printing pass that makes
sure the root signature is valid before printing starts. This is
required after llvm#194858 changed reportError to not stop after
emitting the first error.

Fix: llvm#196430

[clang][deps] Move ScanningOutputFormat out of the library (llvm#196631)

Basing behavior of the dependency scanner on the final output format is
a leaky abstraction. Instead, we should aim to introduce proper feature
flags.

[RISCV] Use the nhs.lea.h/w/d instead of nhs.lea.h/w/d.ze with Sh1AddPat. (llvm#196660)

The srliw already took care of zeroing the upper bits. Using the non-.ze
form is consistent with the Zba version of this pattern.

Revert "[BOLT] Fix EH data encoding checks in relocateEHFrameSection (llvm#195691)" (llvm#196672)

This reverts commit 7ab26d7.

There is a test failure in bolt-tests::exceptions-split-strip.test.

[mlir][tensor] Enhance pattern to fold extract_slice(insert_slice) (llvm#195045)

Extend the DropRedundantRankExpansionOnExtractSliceOfInsertSlice pattern
to support cases where the expanded dimensions are a subset of the
dropped dimensions, rather than requiring them to be exactly equal.
For example:

%inserted_slice = tensor.insert_slice %src into %dest[0, 0, 0, 0] [1, 1, 128, 480] [1, 1, 1, 1] :
        tensor<128x480xf32> into tensor<1x1x128x480xf32>
%extracted_slice = tensor.extract_slice %inserted_slice[0, 0, 0, 0] [1, 1, 123, 1] [1, 1, 1, 1] :
        tensor<1x1x128x480xf32> to tensor<123xf32>

can be folded into:

%extracted_slice = tensor.extract_slice %src[0, 0] [123, 1] [1, 1] :
        tensor<128x480xf32> to tensor<123xf32>
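
To see why the fold preserves values, here is a small Python model using nested lists in place of tensors, with a 4x6 source standing in for the 128x480 one. The helper names are hypothetical, and the model only covers an insert_slice that fills the whole destination.

```python
def insert_full(src):
    """Model insert_slice of a 2-D src into a 1x1xRxC dest that it fully covers."""
    return [[[row[:] for row in src]]]


def extract_via_insert(t4, rows):
    """Model extract_slice [0,0,0,0] [1,1,rows,1] with all unit dims dropped."""
    return [t4[0][0][r][0] for r in range(rows)]


def extract_direct(src, rows):
    """Model the folded extract_slice [0,0] [rows,1] taken straight from src."""
    return [src[r][0] for r in range(rows)]


def expanded_subset_of_dropped(expanded_dims, dropped_dims):
    """The relaxed precondition: expanded dims need only be a subset."""
    return set(expanded_dims) <= set(dropped_dims)


src = [[r * 10 + c for c in range(6)] for r in range(4)]
print(extract_via_insert(insert_full(src), 3) == extract_direct(src, 3))
# insert_slice expands dims {0, 1}; extract_slice drops dims {0, 1, 3}.
print(expanded_subset_of_dropped({0, 1}, {0, 1, 3}))
```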

[CodeGen] Use unique_ptr for FunctionInfo to prevent memory leaks (llvm#196603)

Raw pointer return from FunctionInfo::create caused leaks in callers
like computeABIInfoUsingLib, breaking BPF tests on ASan bots.
Using std::unique_ptr enforces automatic cleanup.

Fixes leak from llvm#194460.
Buildbot: https://lab.llvm.org/buildbot/#/builders/52/builds/17090

Assisted-by: Gemini

[CIR][RISCV] Support zksh builtin codegen (llvm#196463)

[lldb] Fix CommandObjects that don't set a return status (llvm#196588)

Several CommandObject subclasses had DoExecute paths that returned
without ever calling SetStatus on the CommandReturnObject. The status
was silently left at its initial eReturnStatusStarted value, which made
Succeeded() report false for what were really successful commands and
left CommandReturnObject in an undefined state.

[AMDGPU] Support atomic load and store for vector float types (v2f16, v2i16, v4i16, v4f16, v2f32) (llvm#192904)

Add support for atomic load and store on <2 x half>, <4 x half>,
<2 x i16>, <4 x i16>, and <2 x float> vector types in the AMDGPU backend.

These types are promoted to equivalently sized integer types before
instruction selection:
<2 x half> -> i32
<4 x half> -> i64
<2 x i16> -> i32
<4 x i16> -> i64
<2 x float> -> i64
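
The promotion is a same-size reinterpretation of the bits. As a loose analogy only, and not the backend's code, the Python sketch below packs two half values into one 32-bit integer and back, assuming little-endian lane order; the function names are hypothetical.

```python
import struct


def v2f16_to_i32(a, b):
    """Reinterpret two IEEE half values as one 32-bit integer (lane 0 in the low bits)."""
    return struct.unpack("<I", struct.pack("<2e", a, b))[0]


def i32_to_v2f16(bits):
    """Inverse reinterpretation: one 32-bit integer back to two half values."""
    return struct.unpack("<2e", struct.pack("<I", bits))


bits = v2f16_to_i32(1.0, 2.0)
print(hex(bits), i32_to_v2f16(bits))  # prints: 0x40003c00 (1.0, 2.0)
```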

Revert "[lldb] Handle SIGINT via the MainLoop signal thread (on POSIX)" (llvm#196684)

Reverts llvm#195959 because it caused
TestIOHandlerCompletion.py to fail in CI (GreenDragon).

[Clang] Do not eat SFINAE diagnostics for explicit template arguments (llvm#139066)

Instead of merely suggesting the template arguments are invalid, we now
provide an explanation of why the explicit template argument is invalid.

[Utils] Fix duplicate DomTree updates in SplitIndirectBrCriticalEdges (llvm#196475)

SplitIndirectBrCriticalEdges generates DomTree Insert/Delete pairs for
each predecessor in OtherPreds. However, OtherPreds can contain
duplicate entries when a conditional branch has both targets pointing to
the same block (e.g., br i1 %c, label %X, label %X). This produces
duplicate DomTree updates for the same edge, triggering the assertion
std::abs(NumInsertions) <= 1 && "Unbalanced operations!" in
LegalizeUpdates.

Fix by tracking which source blocks have already had DomTree updates
emitted, and skipping duplicates.
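
A minimal Python sketch of the deduplication idea, with edges as plain tuples. The function and block names are hypothetical and the update shape is simplified; it only shows how tracking already-seen source blocks keeps each edge update unique.

```python
def dom_tree_updates(other_preds, target, direct_succ):
    """Emit one Insert/Delete pair per distinct predecessor."""
    updates = []
    seen = set()
    for pred in other_preds:
        if pred in seen:
            continue       # e.g. br i1 %c, label %X, label %X lists pred twice
        seen.add(pred)
        updates.append(("insert", pred, direct_succ))
        updates.append(("delete", pred, target))
    return updates


updates = dom_tree_updates(["A", "A", "B"], "X", "X.clone")
print(len(updates), len(set(updates)))  # prints: 4 4
```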

[CIR][CUDA][NVPTX] Set ptx_kernel calling convention on CUDA kernels (llvm#195382)

Related: llvm#179278,
llvm#175871

More target attributes (NoInline on kernels, CUDALaunchBoundsAttr,
CUDAGridConstantAttr param attrs, and nvvm.annotations for surface/texture
VarDecls) are deferred to later patches.

[DAGTypeLegalizer] Add missing BR_CC handler for soft-promoted half operands (llvm#196214)

SoftPromoteHalfOperand had no case for ISD::BR_CC, causing a crash
when a half-typed fcmp result fed directly into a conditional branch.
All other comparison-related nodes (SETCC, SELECT_CC) were already
handled. Add SoftPromoteHalfOp_BR_CC following the same pattern as
SoftPromoteHalfOp_SELECT_CC.

Fixes llvm#195562


Co-authored-by: Tony Varghese tony.varghese@ibm.com

[RISCV][GISel] Add test coverage for the srliw+shXadd patterns. NFC (llvm#196676)

GISel isn't canonicalizing the shift pair to an AND the same way
SelectionDAG does so the patterns weren't firing. Add more directed
tests that use an And explicitly.

[clang][AMDGPU] Reject malformed target IDs with empty components (llvm#196140)

Fixes llvm#196078

An extra colon in -mcpu (e.g. gfx900::xnack+) produced an empty
feature component and triggered an assertion in StringRef::back().

Return std::nullopt for malformed target IDs instead.
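
The rejection rule can be illustrated with a simplified Python parser that returns None where the C++ code returns std::nullopt. This is a sketch under assumptions: it does not validate processor or feature names the way clang does, and the function name is hypothetical.

```python
def parse_target_id(target_id):
    """Split 'proc:feat+:feat-' and reject empty or malformed components."""
    processor, *features = target_id.split(":")
    if not processor:
        return None
    parsed = {}
    for feature in features:
        # An empty component (from '::') has no last character to inspect.
        if len(feature) < 2 or feature[-1] not in "+-":
            return None
        name = feature[:-1]
        if name in parsed:
            return None    # the same feature listed twice
        parsed[name] = feature[-1] == "+"
    return processor, parsed


print(parse_target_id("gfx900:xnack+"))   # prints: ('gfx900', {'xnack': True})
print(parse_target_id("gfx900::xnack+"))  # prints: None
```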

[AArch64][GlobalISel] Enable BF16 legalization for fadd and friends. (llvm#196081)

This enables bf16 promotion for the following operations in GISel,
promoting them to f32 and truncating the result back:
G_FADD, G_FSUB, G_FMUL, G_FDIV, G_FMA, G_FSQRT, G_FMAXNUM, G_FMINNUM,
G_FMAXIMUM, G_FMINIMUM, G_FCEIL, G_FFLOOR, G_FRINT, G_FNEARBYINT,
G_INTRINSIC_TRUNC, G_INTRINSIC_ROUND, G_INTRINSIC_ROUNDEVEN

[AArch64][NFC] Remove unused TRI member from class (llvm#184363)

I’ve removed the TRI member and its initialization, leaving only MRI and
TII as the stored pointers.


Co-authored-by: Benjamin Maxwell benjamin.maxwell@arm.com

[ObjectYAML][NFC] Extract BBAddrMap YAML types into shared namespace (llvm#196019)

Move BBAddrMapEntry and PGOAnalysisMapEntry out of namespace ELFYAML
into a new format-agnostic namespace BBAddrMapYAML so that COFF
YAML support can reuse the same schema and MappingTraits.

[clang] Update cxx_dr_status.html (llvm#196702)

Updates from 2026-05-08 CWG telecon.

[clang-tidy] Avoid use-nodiscard false positives for class templates (llvm#196661)

Do not suggest adding [[nodiscard]] to functions returning a class
template specialization whose primary template is already marked
[[nodiscard]].

Class template specializations do not carry the [[nodiscard]]
attribute on their own declarations, so modernize-use-nodiscard
previously missed this case and emitted redundant diagnostics for return
types such as:

template <class T>
struct [[nodiscard]] Result;

Result<int> f() const;

Fixes llvm#163425.

[CI] Ignore TidyFastChecks.inc for formatter CI. NFC. (llvm#196682)

TidyFastChecks.inc is generated and its contents should not be checked
by clang-format CI workflow. Add a local .clang-format-ignore entry so
the PR formatting check does not report diffs for this file.

Related run:
llvm#194516 (comment)

[clang-tidy] Migrate explicit-constructor check from google to misc and add relative aliases (llvm#194807)

Fixes llvm#126032

[AArch64][GlobalISel] Promote BF16 G_FCMP (llvm#196093)

This adds bf16 legalization for floating point compares.

[RISCV][NFC] Rename Zvvmm instruction file to Zvvm (llvm#196692)

Renames RISCVInstrInfoZvvmm.td to RISCVInstrInfoZvvm.td so that Zvvmm
and Zvvfmm share the same IME instruction file, in line with the spec.
All future instructions from the Zvvm family will be placed here
too.

This PR is required for reviewing llvm#196486 so that GitHub shows
the diff correctly.

[BPF] Support Stack Arguments (llvm#189060)

Currently, bpf programs and kfuncs only support 5 register parameters. As
the bpf community and its use cases keep expanding, there is a need to
go beyond 5 register parameters by passing additional parameters on the
stack. There are two main use cases:

  1. Currently a kfunc is limited to 5 register parameters. In some special
    situations, people may want more than 5 parameters. One
    example is sched_ext.
  2. Allowing stack parameters makes life easier for bpf program writers, since
    they do not need to carefully limit the number of parameters for their
    programs.

The following is the high-level design:

  • Use bpf register R11 as the frame pointer to stack parameters. This is
    to avoid mixing stacks due to R10.
    • Stack parameters must come after the 5 register parameters.
  • All parameters should be at most 16 bytes as ByVal parameters are not
    supported.
  • Supported for cpu v1 to v4, so all cpu versions can use this. A feature
    macro __BPF_FEATURE_STACK_ARGUMENT is defined so users can check
    whether stack arguments are supported.

Below is a simple asm example of stack parameters:

  bar:
    /* Retrieve two parameters from the caller of bar(). */
    rX = *(u64 *)(r11 + 8)  // 1st arg
    rY = *(u64 *)(r11 + 16) // 2nd arg
    ...
    /* Prepare the single stack parameter for foo1 */
    *(u64 *)(r11 - 8) = rZ  // 1st arg
    call foo1
    ...
    /* Prepare the two stack parameters for foo2 */
    *(u64 *)(r11 - 8) = rX  // 1st arg
    *(u64 *)(r11 - 16) = rY // 2nd arg
    call foo2
    ...
  foo1:
    /* Retrieve parameter '*(u64 *)(r11 - 8) = rZ' from bar(),
     * and assign the value rZ to rX.
     */
    rX = *(u64 *)(r11 + 8)  // 1st arg
    ...
  foo2:
    /* Retrieve parameters '*(u64 *)(r11 - 8/16) = rX/rY' from bar(),
     * and assign values rX/rY to rU/rV.
     */
    rU = *(u64 *)(r11 + 8)  // 1st arg
    rV = *(u64 *)(r11 + 16) // 2nd arg
    ...

The code patterns above try to follow the x86_64/arm64 calling
conventions. That is, the first argument is at a lower location than
the second argument, etc. The r11-based load should retrieve the value
directly from the caller's stack. The r11-based store should push
the value directly to the specified stack location.

Internally, the bpf backend generates pseudo insns for
load_stack_arg and store_stack_arg. The BPFMIPeephole pass
changes the pseudo insns into real bpf insns like the ones above.

[VectorCombine] foldShuffleChainsToReduce - add support for partial vector reductions (llvm#195119)

Extend foldShuffleChainsToReduce to recognize partial reduction patterns where only a subvector of the full vector is being reduced.

For example, a <16 x i16> vector where the shuffle chain only reduces the lower 8 elements can now be folded into:
shufflevector (extract lower <8 x i16>) + vector.reduce.smax

The detection works by noticing when the bottom-up walk through the
shuffle/op chain ends before consuming the full vector. The number of
levels visited determines the subvector size (2^levels), and an
extract_subvector + scalar reduction replaces the original chain when
profitable.
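
A small Python model of the pattern, assuming a log2-style shuffle chain where each level combines the low half of the active lanes with the high half. The function name is hypothetical; the point is only that a chain of `levels` steps consumes the lowest 2^levels elements.

```python
def shuffle_chain_smax(vec, levels):
    """Simulate a shuffle/smax chain; lane 0 ends up holding the result."""
    cur = list(vec)
    for level in reversed(range(levels)):
        half = 1 << level
        # Lanes [0, half) are combined with lanes [half, 2*half);
        # the remaining lanes are don't-care and left untouched.
        cur = [
            max(cur[i], cur[i + half]) if i < half else cur[i]
            for i in range(len(cur))
        ]
    return cur[0]


vec = [3, -1, 7, 2, 9, 0, 4, 5] + [100] * 8   # stands in for <16 x i16>
print(shuffle_chain_smax(vec, 3), max(vec[:8]))  # prints: 9 9
```

Three levels on a 16-element vector touch only the lower 8 lanes, so the chain matches a reduction over the extracted lower subvector and not over the full vector.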

Fixes llvm#194617

[clang-tidy] Correct std::has_one_bit to std::has_single_bit in modernize-use-std-bit (llvm#196721)

There is no std::has_one_bit in the standard library; the function that checks
whether a number is an integral power of 2 is std::has_single_bit.
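
For reference, the property that std::has_single_bit tests can be written as a classic bit trick. The Python function below is an illustrative equivalent for non-negative integers, not the library implementation.

```python
def has_single_bit(x):
    """True if x is an integral power of two (exactly one bit set)."""
    return x > 0 and (x & (x - 1)) == 0


print([n for n in range(1, 20) if has_single_bit(n)])  # prints: [1, 2, 4, 8, 16]
```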

https://en.cppreference.com/cpp/header/bit

[SelectionDAG] Don't convert sextload to zextload through a multi-use freeze (llvm#196700)

Resolves llvm#196590.

The patch llvm#189317, which taught
DAGCombiner to look through freeze, incorrectly introduced a miscompile of
sext -> zext. This resolves the miscompile.

[libc++] LWG4324: unique_ptr<void>::operator* is not SFINAE-friendly (llvm#190919)


Co-authored-by: Hristo Hristov zingam@outlook.com

[clang-format] Add BreakFunctionDeclarationParameters option. (llvm#196567)

Adds an option to break function declaration parameters, always putting
them on the next line after the function's opening parenthesis.

This is the equivalent of BreakFunctionDefinitionParameters, but for
function declarations.


Co-authored-by: Lukas Jirkovsky lukas.jirkovsky@aveco.com

[mlir][SPIR-V] Convert math.fpowi to spirv.CL.pown (llvm#196701)

[VPlan] Lift isUsedByLoadStoreAddr into vputils, operate on VPValue(NFC) (llvm#196415)

Extract the helper previously scoped to VPReplicateRecipe::computeCost
and make it available from VPlanUtils so other transforms can query
whether a VPValue is used as part of another load or store's address.

Also relax the input type from VPUser * to VPValue *: the worklist now
tracks VPValues directly, and traversal is gated on the user being a
VPSingleDefRecipe before walking its own users. This is NFC for the
existing caller.

clang: Fix using -march=amdgcn in some r600 run lines (llvm#196745)

clang/AMDGPU: Use all_equal instead of building a temporary set (llvm#196742)

[mlir][SPIR-V] Support spirv.selection_control attribute on scf.if (llvm#196510)

[SLP][NFC]Add a test with scalable vector type in struct-returning intrinsic, NFC

Reviewers:

Pull Request: llvm#196747

[SLP][NFC]Add a test with struct-returning intrinsics in different basic blocks, NFC

Reviewers:

Pull Request: llvm#196748

[X86] Hoist ReservedIdentifiers to MCAsmInfo and shrink setup cost. NFC (llvm#196699)

PR llvm#186570 added a per-MCAsmInfo StringSet<> populated with X86
register names plus Intel-syntax keywords, which caused a minor
instructions:u increase.

Avoid heap allocation and hoist ReservedIdentifiers to MCAsmInfo for
other targets.

For the register-name source, prefer
X86IntelInstPrinter::getRegisterName over MCRegisterInfo::getName.
The former is a TableGen-emitted accessor into a static const char
AsmStrs[] pool in X86GenAsmWriter1.inc, populated from the lowercase
asm-name argument of each def XX : X86Reg<"xx", ...>; in
X86RegisterInfo.td.

[MCParser] .incbin: Don't retain the buffer, don't require NUL termination (llvm#196696)

processIncbinFile uses SourceMgr::AddIncludeFile, which

  • sets RequiresNullTerminator=true and disables mmap when the file
    size is a multiple of the page size,
  • and unnecessarily retains the throwaway buffer in Buffers.

Switch to OpenIncludeFile so the buffer is freed when processIncbinFile
returns, and pass RequiresNullTerminator=false. The buffer is consumed
only by emitBytes; the lexer never scans it, so it does not need a
trailing '\0' (different from llvm#154972). Without that requirement,
MemoryBuffer mmaps the file and RSS tracks only the touched pages.

Stress test (1000 .incbin "blob.bin", 0, 16 against a 1 MiB blob):

                  Maximum RSS
  Before          1042944 KiB
  After             15360 KiB

Fix llvm#62339

Revert "Avoid assert in substqualifier (llvm#182707)" (llvm#196755)

This reverts commit e2def10.

[DAG] canCreateUndefOrPoison - out of range vector insert/extract element indices only generate poison (llvm#196720)

Matches ValueTracking / GISel implementations - although testing options are limited until DAG has actual uses of UndefPoisonKind::UndefOnly

[clang][NFC] Actually add the testcase for llvm#195416 (llvm#196759)

[Docs] Match body/toctree ordering on Reference and UserGuides (llvm#195542)

The toctree section is hidden but used for previous/next breadcrumbs.

This was suggested in
llvm#184440 (comment)

[clangd] Add InsertReplaceEdit for code completion (llvm#187623)

Handle new insertReplaceSupport capability (defined in LSP 3.16). Add
the new option to the protocol layer and pass it around to the code
completion logic. Update CompletionItem::textEdit to become the union
type as per the LSP specification.

Add a new helper function to the Lexer public API to find the end of an
identifier with full context lexing, to avoid duplicating the logic. Use
the helper both in the Sema flow and in the comment completion flow. Use
a simpler ASCII-only scan in no-Sema mode.

Add LIT tests to verify auto-triggered completions, mid-word
replacement, Unicode, and snippets. Add unit tests to verify
insert/replace ranges with and without Sema, including comments and the
feature-off case.

Update the release notes to document the new capability.

Fixes clangd/clangd#2190


Co-authored-by: timon-ul timon.ulrich@advantest.com

Revert "[clang-format][NFC] Format with the new formatter" (llvm#196771)

Reverts llvm#196523

[ELF] Fix --reproduce non-determinism with parallel input loading (llvm#196773)

After llvm#191690, LoadJob::Archive runs in parallel and getArchiveMembers()
calls ctx.tar->append() from the parallel body. TarWriter::append is
unsynchronized. Member order in the tar is also non-deterministic
because parallelFor scheduling determines append order.

Buffer per-job tar entries during the parallel pass and flush them in the
existing serial post-pass, mirroring the thinBufs / files pattern.
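
The buffer-then-flush pattern can be sketched roughly as below. This is a
standalone illustration, not lld's code: Entry, FakeTarWriter, and
collectDeterministically are hypothetical names, and the parallel pass is
simulated by visiting jobs in an arbitrary completion order instead of using
real threads.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a tar member: (path, contents).
using Entry = std::pair<std::string, std::string>;

// Hypothetical stand-in for an unsynchronized, append-only archive writer.
struct FakeTarWriter {
  std::vector<Entry> members;
  void append(const std::string &path, const std::string &data) {
    members.emplace_back(path, data);
  }
};

// archives[i] lists the member names of job i. completionOrder simulates the
// order in which a parallel scheduler happens to run the jobs.
inline std::vector<Entry> collectDeterministically(
    const std::vector<std::vector<std::string>> &archives,
    const std::vector<std::size_t> &completionOrder) {
  // "Parallel" pass: each job writes only to its own slot, never to the tar.
  std::vector<std::vector<Entry>> perJob(archives.size());
  for (std::size_t job : completionOrder)
    for (const std::string &member : archives[job])
      perJob[job].emplace_back(member, "contents of " + member);

  // Serial post-pass: flush in job index order, independent of scheduling.
  FakeTarWriter tar;
  for (const std::vector<Entry> &entries : perJob)
    for (const Entry &e : entries)
      tar.append(e.first, e.second);
  return tar.members;
}
```

Because the writer is only touched in the serial loop, the member order
depends on the job index alone, whatever order the jobs complete in.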

[Clang] Make matrix type trivially copyable (llvm#193634)

In order to simplify matrix casting and follow the existing pattern HLSL
is doing, the matrix needs to be trivially copyable.
related to: llvm#184471


Co-authored-by: Joao Saffran jderezende@microsoft.com

[ADT] Decouple xxhash.h from ADT. NFC (llvm#196774)

Move xxHash64, xxh3_64bits, and xxh3_128bits ArrayRef/StringRef
overloads from llvm/Support/xxhash.h to inline overloads in
llvm/ADT/ArrayRef.h and llvm/ADT/StringRef.h, so xxhash.h has no ADT
dependencies.

This is a prerequisite for using xxh3 as the combine_bytes backend in
llvm/ADT/Hashing.h (llvm#194567), which would otherwise reintroduce a header
dependency cycle.

FoldingSet.h and StableHashing.h adjust to call the new
pointer-and-length entry point.

[Hashing] Replace CityHash mixers with xxh3 (llvm#194567)

Replace the CityHash-style mixer in hash_combine and (transitively)
hash_value(std::basic_string), hash_value(StringRef), and therefore
DenseMap<StringRef, X> lookups, with a flatten-and-call into
xxh3_64bits, a modern hash superior to CityHash.

hash_value(int) / hash_value(ptr) keep the existing Murmur-style
hash_16_bytes mixer; those are the dominant DenseMap key paths and a
fully-inline 16-byte mix beats inlining xxh3's larger 0..16-byte short
path.

To break the dependency cycle, the xxHash64, xxh3_64bits, and xxh3_128bits
ArrayRef/StringRef overloads move from llvm/Support/xxhash.h to inline
overloads in llvm/ADT/ArrayRef.h and llvm/ADT/StringRef.h, so xxhash.h
has no ADT dependencies.

A variant that inlined xxh3's 0..16-byte fast path at every
combine_bytes call site (vs. always calling out-of-line xxh3_64bits)
showed no measurable compile-time improvement on the tracker, so
combine_bytes is a one-liner over the out-of-line entry point.

llvm-compile-time-tracker.com (CTMark, instructions:u)

  stage1-O0-g           -1.76%   (sqlite3 -3.78%)
  stage1-aarch64-O0-g   -1.40%   (sqlite3 -2.86%)
  stage1-ReleaseLTO-g   -1.13%
  stage1-ReleaseThinLTO -0.45%
  stage1-O3             -0.43%
  stage1-aarch64-O3     -0.42%
  stage2-O0-g           -0.42%
  stage2-O3             -0.15%
  clang build           -0.71%   (wall -0.42%)

DenseMap-of-pointer paths (dominant at -O3) are untouched, so higher-
optimization configs see smaller wins as expected. opt's .text shrinks
~92 KB. Subsumes the StringRef-only carve-out proposed in llvm#191115.

Notes on properties not introduced by this patch:

  • Endianness: hash_combine over native integers was already not
    cross-host stable. memcpy of a native integer into the buffer is
    host-encoded;
    fetch32 normalized the read but not the underlying bytes, so on LE vs
    BE the value fed to the mixer already differed. xxh3 inherits the same
    property: same byte stream, different mixer.

  • Process seed: combine_bytes XORs get_execution_seed into the result,
    which cancels under hash_combine(x) ^ hash_combine(y). The pre-patch
    short/state paths fed the seed through hash_16_bytes / shift_mix
    non-linearly, so this is a regression in seed effectiveness under that
    pattern. Default seed is constant, so this only matters under
    LLVM_ENABLE_ABI_BREAKING_CHECKS. Follow-up: add a seeded xxh3 entry
    point in libSupport.
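
The seed-cancellation point can be illustrated with a toy model. The sketch
below is not the Hashing.h implementation: toyMix is a splitmix64-style
finalizer standing in for xxh3_64bits, and seedXoredIn / seedThroughMixer are
hypothetical names for the two ways of applying a seed.

```cpp
#include <cstdint>

// Toy 64-bit mixer (splitmix64-style finalizer), a stand-in for xxh3_64bits.
constexpr std::uint64_t toyMix(std::uint64_t x) {
  x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
  x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
  return x ^ (x >> 31);
}

// Seed applied linearly after mixing: h(x) = mix(x) ^ seed.
constexpr std::uint64_t seedXoredIn(std::uint64_t x, std::uint64_t seed) {
  return toyMix(x) ^ seed;
}

// Seed fed through the mixer: h(x) = mix(x ^ seed). In general the XOR of two
// such hashes still depends on the seed.
constexpr std::uint64_t seedThroughMixer(std::uint64_t x, std::uint64_t seed) {
  return toyMix(x ^ seed);
}

// With the linear form, h(x) ^ h(y) == mix(x) ^ mix(y) for every seed,
// because seed ^ seed == 0.
static_assert((seedXoredIn(1, 0x1234) ^ seedXoredIn(2, 0x1234)) ==
                  (toyMix(1) ^ toyMix(2)),
              "the seed cancels under XOR of two hashes");
```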

Aided by Claude opus 4.7

[MC] Remove deprecated lookupTarget overload (llvm#196778)

This has been deprecated for a while and was slated for removal after
the branching of LLVM 22. Remove it since I'm on the Google integrate
rotation this week and can take care of any failures on our end.

[libc] Add barebones dl_iterate_phdr implementation (llvm#194196)

Add a basic dl_iterate_phdr implementation so that we can get libunwind
building. This implementation is bare and not fully compliant with the
man page for fully static binaries (which are all that we support
currently with the lack of a dynamic linker) due to the lack of TLS
info, but that can be added at a future date if it is needed, as it is
not needed by libunwind.

Add some very basic smoke tests.

[ADT] Remove xxHash64 ArrayRef/StringRef overloads. NFC (llvm#196781)

xxHash64 is a legacy, pre-XXH3 hash whose only non-test caller in the
monorepo is llvm::getKCFITypeID. llvm#196774 accidentally exposed the API.

[Clang] Transform lambda's constraints when instantiating parameter mapping (llvm#195995)

This way we can remove a few workarounds of lambda expressions where
outer template arguments of concepts have to be preserved through
ImplicitConceptSpecializationDecls.

Fixes llvm#193944

[cmake] use target names instead of legacy variables (llvm#185463)

Use the names of the imported targets when testing the libraries during
cmake configuration. This removes the
need to also set CMAKE_REQUIRED_INCLUDES and
CMAKE_REQUIRED_DEFINITIONS and reflects more modern CMake usage where
targets are preferred over variables.

This is already the case when checking libcurl in the same file.

[clang-tidy] Remove hicpp module [1/4] (llvm#194516)

This is part one of removing the hicpp-* checks.

RFC:
https://discourse.llvm.org/t/rfc-regarding-the-current-status-of-hicpp-checks/89883

Part of llvm#183462

[DAG][GISel] Rename CTTZ_ZERO_UNDEF/CTLZ_ZERO_UNDEF/CTTZ_ELTS_ZERO_UNDEF -> CTTZ_ZERO_POISON/CTLZ_ZERO_POISON/CTTZ_ELTS_ZERO_POISON (llvm#196732)

DAG/GISel are ambiguous about whether zero-input results in
UNDEF/POISON, unlike the rest of LLVM which makes it clear it's POISON.

I've tried to clean this up once and for all by ensuring
SelectionDAG::canCreateUndefOrPoison does an includesPoison(Kind) check,
renaming the opcodes (including the VP variants) and updating as many
comments/tests as possible (I may still have missed some...).

[SPIR-V] Fix inttoptr type deduction with ptr.annotation (llvm#189219)

Opaque pointer inttoptr was recording ptr as a pointee type, so
OpConvertUToPtr was emitted as pointer-to-pointer and then bitcasted
back. Please see an example below.

LLVM IR:

%p = inttoptr i64 %x to ptr addrspace(1)
%a = call ptr addrspace(1) @llvm.ptr.annotation(... %p ...)
call spir_func void @prefetch(ptr addrspace(1) %a, ...)

SPIR-V (before the change):

%p2 = OpConvertUToPtr %_ptr_CrossWorkgroup__ptr_CrossWorkgroup_uchar %x
%p1 = OpBitcast %_ptr_CrossWorkgroup_uchar %p2
OpFunctionCall ... %p1 ...

Skip assigning a pointee type for inttoptr when the destination is
untyped; the fallback later recovers the correct single pointer type.

[clang-tidy] Fix FP in readability-container-size-empty with comparing to unrelated type (llvm#190535)

Fixes llvm#162287.

[lldb][Windows] Invalidate cached register values on thread stop (llvm#192430)

Invalidate cached values in register context data structures on every
thread stop.

NativeRegisterContextRegisterInfo::InvalidateAllRegisters performs no
operation by default. Subclasses may override it to clear cached values
within their register context data structures whenever a thread stops.

This change intends to set up the necessary infrastructure to support
caching of the thread context in NativeRegisterContextWindows_arm64,
which will improve read performance. Currently, the thread context is
retrieved for every read or write operation.

Revert "[VectorCombine] foldShuffleChainsToReduce - add support for partial vector reductions" (llvm#196796)

Reverts llvm#195119 while reported assertions are investigated.

[LifetimeSafety] Warn on incorrectly placed [[clang::lifetimebound]] attributes (llvm#196144)

Adds a new warning that is emitted when a parameter is marked as
[[clang::lifetimebound]] but is not returned in one way or another
(tracked via OriginEscapeFact).

Closes llvm#182935

[libc] Fix -Wshadow warnings in freetrie.h (llvm#196529)

clang-format: ensure ternary operands are aligned (llvm#196697)

Set ParentState::AlignedTo for ternary operands.

[gn] Make ClangDependencyScanningTests depend on Testing/Support (llvm#196809)

Needed after ebb9a79.

[flang][OpenMP] Consistent names for non-executable directives, NFC (llvm#196803)

Change
OpenMPGroupprivate -> OmpGroupprivateDirective
OpenMPThreadprivate -> OmpThreadprivateDirective
OpenMPRequiresConstruct -> OmpRequiresDirective
OpenMPUtilityConstruct -> OmpUtilityDirective

[AArch64] Improve post-inc stores of SIMD/FP values (llvm#151372)

Add patterns to match post-increment truncating stores from lane 0 of
wide integer vectors (v4i32/v2i64) to narrower types (i8/i16/i32). This
avoids transferring the value through a GPR when storing.

Also remove the pre-legalization early-exit in combineStoreValueFPToInt
as it prevented the optimization from applying in some cases.

[clang-tidy] Rename hicpp-multiway-paths-covered to bugprone-unhandled-code-paths (llvm#191625)

Part of the work in llvm#183462.

Closes llvm#183464.

Splitting the check into two more focused checks was considered during
discussion, but since clang-tidy does not support one-to-many aliases, a
single name covering both behaviors was chosen instead, one that is
clearer than multiway-paths-covered.


Co-authored-by: Zeyi Xu mitchell.xu2@gmail.com

[IRBuilder] Split CreateAssumption to one with bundle and one with condition [NFC] (llvm#196795)

As of llvm#160460 it is not possible to combine bundles and conditions;
reflect that in CreateAssumption.

[clang-tidy] Reland "An option for conditional skipping overloaded functions in modernize-use-string-view" (llvm#196387)

[CIR][AArch64] Lower NEON vuzp intrinsics (llvm#195591)

Summary

part of : llvm#185382

lower vuzp intrinsics in:
https://arm-software.github.io/acle/neon_intrinsics/advsimd.html#unzip-elements

this is a follow up : llvm#195527

Lower NEON::BI__builtin_neon_vuzp_v and NEON::BI__builtin_neon_vuzpq_v in
CIRGenBuiltinAArch64.cpp by porting the existing incubator logic
(clangir/clang/lib/CIR/CodeGen/CIRGenBuiltinAArch64.cpp): two bitcasts on
the input vectors, then two rounds of cir.vec.shuffle generating the
deinterleave (even/odd) shuffle patterns with indices 2*i+vi, each
stored via ptr_stride on the sret base pointer.
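
As a rough scalar model of the 2*i+vi index pattern (not the CIR lowering
itself), the sketch below treats the two inputs as one concatenated vector and
builds the even (vi = 0) and odd (vi = 1) results. The function name unzip and
the use of std::array are assumptions made for illustration.

```cpp
#include <array>
#include <cstddef>

// Scalar model of an unzip: result[vi][i] = concat(a, b)[2 * i + vi].
template <typename T, std::size_t N>
std::array<std::array<T, N>, 2> unzip(const std::array<T, N> &a,
                                      const std::array<T, N> &b) {
  // Concatenate the inputs, as a two-operand shuffle conceptually does.
  std::array<T, 2 * N> concat{};
  for (std::size_t i = 0; i < N; ++i) {
    concat[i] = a[i];
    concat[N + i] = b[i];
  }

  // vi = 0 picks the even lanes, vi = 1 picks the odd lanes.
  std::array<std::array<T, N>, 2> result{};
  for (std::size_t vi = 0; vi < 2; ++vi)
    for (std::size_t i = 0; i < N; ++i)
      result[vi][i] = concat[2 * i + vi];
  return result;
}
```

For a = {0, 1, 2, 3} and b = {4, 5, 6, 7} this yields {0, 2, 4, 6} and
{1, 3, 5, 7}.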

[llvm][RISCV] Optimize fneg for fixed vectors (llvm#194555)

vfneg is not available on zvfhmin or zvfbfmin, so it is expected to expand
to integer operations instead of unrolling to scalar operations.
The general expandFNEG already handles that in most cases, except for
fixed vector types that are not promotable; we need to find a better
heuristic to gate this.

[llvm][RISCV] Optimize fabs for fixed vectors (llvm#194554)

vfabs is not available on zvfhmin or zvfbfmin, so it is expected to expand
to integer operations instead of unrolling to scalar operations.
The general expandFABS already handles that in most cases, except for
fixed vector types that are not promotable; we need to find a better
heuristic to gate this.

[llvm][RISCV] Optimize fcopysign for fixed vectors (llvm#193802)

vfsgnj is not available on zvfhmin or zvfbfmin, so it is expected to expand
to integer operations instead of unrolling to scalar operations.
The general expandFCOPYSIGN already handles that in most cases, except for
fixed vector types that are not promotable; we need to find a better
heuristic to gate this.
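
For reference, the integer expansions in question are sign-bit manipulations
on the raw element bits. The sketch below models them on a single IEEE half
(16-bit) bit pattern; the function names are hypothetical, and real codegen
applies the same masks lane-wise to whole vectors.

```cpp
#include <cstdint>

// Sign bit of a 16-bit floating-point element (half and bfloat both keep the
// sign in bit 15).
constexpr std::uint16_t kSignMask = 0x8000;

// fneg: flip the sign bit.
constexpr std::uint16_t fnegBits(std::uint16_t x) {
  return static_cast<std::uint16_t>(x ^ kSignMask);
}

// fabs: clear the sign bit.
constexpr std::uint16_t fabsBits(std::uint16_t x) {
  return static_cast<std::uint16_t>(x & ~kSignMask);
}

// fcopysign: magnitude bits from mag, sign bit from sgn.
constexpr std::uint16_t fcopysignBits(std::uint16_t mag, std::uint16_t sgn) {
  return static_cast<std::uint16_t>((mag & ~kSignMask) | (sgn & kSignMask));
}

// 0x3C00 is 1.0 and 0xBC00 is -1.0 in IEEE half precision.
static_assert(fnegBits(0x3C00) == 0xBC00, "fneg(1.0) == -1.0");
static_assert(fabsBits(0xBC00) == 0x3C00, "fabs(-1.0) == 1.0");
```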

[InstCombine] Fold constant byte stores to integer stores (llvm#196740)

Byte constants are equivalent to integer constants when stored to
memory. Replacing them in store instructions reduces IR differences and
enables existing optimizations over integer constants.

[libcxx] Switch to check-runtimes for generic-llvm-libc (llvm#196780)

Move KCFI type ID hash helpers out of LLVMSupport (llvm#196784)

PR llvm#167254 inappropriately introduced llvm/Support/Hash.{h,cpp} for the
KCFI helpers. The name is misleading — it has nothing to do with the
generic hashing facility in llvm/ADT/Hashing.h — and KCFI is a
CodeGen/IR feature that does not belong in the foundational Support
layer.

Move the files to llvm/lib/Transforms/Utils/KCFIHash.cpp, alongside
setKCFIType, which is the only existing KCFI helper in TransformUtils.

Also relocate the deprecated pre-xxh3 xxHash64 implementation into
KCFIHash.cpp, the sole user. clang/test/CodeGen/kcfi-generalize.c and
kcfi-normalize.c are end-to-end regression tests for the xxHash64 output.

[Coverage] Fix assertion failure when a -isystem header invokes a user macro (llvm#195427)

  // a.cc
  static void foo(int x) {
    switch (x) {
  #define GENERIC(n) case n:
  #include "types.def"   // -isystem header invokes a user macro
      break;
    }
  }

  // sys/types.def
  #define MID(name) GENERIC(name)
  MID(0)
  MID(1)
  MID(2)
$ clang -fprofile-instr-generate -fcoverage-mapping -isystem sys -c a.cc
Assertion `SystemHeadersCoverage ||
           !SM.isInSystemHeader(SM.getSpellingLoc(Loc))' failed.

Commit 702a2b6 ("[Coverage] Rework !SystemHeadersCoverage")
replaced the system-header skip in gatherFileIDs with this assertion,
which trips as SM.isInSystemHeader(SM.getSpellingLoc(Loc)) is false.

This patch adds back the pre-llvm#91446 condition but folds it with
the macro-token remap if statement.

Fixes llvm#179316/llvm#195422.
Claude Opus 4.7 identified clang/lib/Parse/ParseExpr.cpp, created a
minimal reproducer with cvise, and wrote the initial version of this
CodeGen patch. (An earlier session papered over the bug by patching
llvm-cov instead, which I abandoned).

[clang-tidy][NFC] Move ClassifiedToken to cpp file (llvm#196820)

ClassifiedToken is used in only the implementation of
UseTrailingReturnTypeCheck. Move it into the unnamed namespace of the
cpp file instead of it being in the header.

[Bazel] Fixes 2f4c387 (llvm#196822)

This fixes 2f4c387.

Co-authored-by: Google Bazel Bot google-bazel-bot@google.com

[libc] Move a few -Wshadow warnings in __support/File (llvm#196810)

No behavior change.

[libc][math] Fix -Wshadow warnings in cos.h (llvm#196342)

cos() does using namespace range_reduction_double_internal; and
range_reduction_double_internal after 51e9430 contains

using LIBC_NAMESPACE::fputil::DoubleDouble;
using Float128 = LIBC_NAMESPACE::fputil::DyadicFloat<128>;

So the local using statements for DoubleDouble and Float128 shadowed
these. Just remove the local using statements.

No behavior change.

[AArch64] New pass for code layout optimizations. (llvm#184434)

This pass is intended to optimize code layout prior to AsmPrinter. The
initial version handles two known cases:
I. FCMP-FCSEL
II. CMP/CMN-CSEL, 32-bit only

Using existing directives, the pass induces function-alignment (of
64-bytes by default) when a pair is detected, and possibly induces
block-alignment of up to 4-bytes on top of that if the pair would
straddle cache-lines.
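
A minimal sketch of a "does this pair straddle a line" test, assuming a
64-byte line and an 8-byte pair (two 4-byte AArch64 instructions). The name
straddlesLine and both defaults are illustrative and are not taken from the
pass.

```cpp
#include <cstdint>

// True if a pairBytes-long instruction pair starting at byte offset `offset`
// crosses a lineBytes boundary. Assumes lineBytes != 0 and pairBytes >= 1.
constexpr bool straddlesLine(std::uint64_t offset,
                             std::uint64_t pairBytes = 8,
                             std::uint64_t lineBytes = 64) {
  // Compare the line index of the first and the last byte of the pair.
  return offset / lineBytes != (offset + pairBytes - 1) / lineBytes;
}

static_assert(straddlesLine(60), "bytes 60..67 cross the 64-byte boundary");
static_assert(!straddlesLine(56), "bytes 56..63 stay within one line");
```

With 4-byte instructions and an aligned function, a pair can only straddle
when its first instruction is the last one in a line.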

Beyond the performance improvement, this pass reduces noise due to code
layout and thus stabilizes measured performance over time. For example,
knock-out effects on a "sensitive function" won't be triggered by
codegen changes outside it.

Enabled by default on processors with the new FeatureAlignCmpCSelPairs
subtarget feature (gated per sub-case by FeatureFuseCmpCSel /
FeatureFuseFCmpFCSel); each case can also be forced through the
-aarch64-code-layout-opt enumerated bit-mask


Co-authored-by: Jon Roelofs jroelofs@gmail.com
rdar://171283264

[mlir][spirv] Remove stale NV CooperativeMatrix attributes (llvm#196639)

Since the support for NV CooperativeMatrix has been removed a while
back, those attributes can be safely removed.

[mlir][spirv] Enforce execution scope for group operations in ODS (llvm#196644)

This adds a new class SPIRV_ExecutionScopeAttrIs shared between group
and non-uniform group operations.

Assisted-by: Codex

[LV] Add tests for load/store scalarization and ptrcasts (NFC) (llvm#196839)

Add missing test coverage for a range of pointer casts and load/store
scalarization.

[LV] Add missing cost tests for various unary and binary ops (NFC) (llvm#196841)

Add missing direct includes for bit.h/SwapByteOrder.h. NFC (llvm#196843)

These translation units use llvm::endianness, llvm::byteswap,
llvm::has_single_bit, or sys::IsLittleEndianHost without explicitly
including the header that declares them. They currently compile only
because llvm/ADT/Hashing.h transitively pulls in
llvm/Support/SwapByteOrder.h (which includes llvm/ADT/bit.h).

[libc] Fix a copyright comment typo (llvm#196846)

No behavior change.

[clang-tidy] comment braced and parenthesized init arguments (llvm#180408)

Handle arguments like {}, Type{} and Type() in
bugprone-argument-comment and
add coverage for initializer_list and designated initializers.

Fixes: llvm#171842

[ADT] Avoid map storage for small SmallMapVector (llvm#196473)

SmallMapVector previously used SmallDenseMap for its index, which still
initializes and maintains map storage even when the number of entries is
tiny.

Teach MapVector to support a vector-only small mode. While the entry
count stays within the configured small size, operations use the
underlying vector directly. When the size grows past the threshold, the
map index is built and subsequent operations use the regular MapVector
path.

This mirrors the small-size strategy used by SmallSetVector.
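
The small-mode idea can be sketched with standard containers. TinyMapVector
below is a hypothetical toy, not the llvm::MapVector implementation: it scans
the vector linearly while the entry count is within SmallSize and builds a key
to position index only once that threshold is exceeded.

```cpp
#include <cstddef>
#include <unordered_map>
#include <utility>
#include <vector>

template <typename K, typename V, std::size_t SmallSize>
class TinyMapVector {
  std::vector<std::pair<K, V>> entries;     // Insertion-ordered storage.
  std::unordered_map<K, std::size_t> index; // Key -> position; empty in small mode.

public:
  // True once the map index has been built.
  bool usesIndex() const { return !index.empty(); }
  std::size_t size() const { return entries.size(); }

  // Returns a pointer to the mapped value, or nullptr if key is absent.
  const V *find(const K &key) const {
    if (!usesIndex()) {
      // Small mode: linear scan over the vector, no map storage touched.
      for (const auto &kv : entries)
        if (kv.first == key)
          return &kv.second;
      return nullptr;
    }
    auto it = index.find(key);
    return it == index.end() ? nullptr : &entries[it->second].second;
  }

  // Inserts (key, value) if key is absent. Returns true on insertion.
  bool insert(const K &key, const V &value) {
    if (find(key))
      return false;
    entries.emplace_back(key, value);
    if (usesIndex()) {
      index.emplace(key, entries.size() - 1);
    } else if (entries.size() > SmallSize) {
      // Crossing the threshold: build the index for all existing entries.
      for (std::size_t i = 0; i < entries.size(); ++i)
        index.emplace(entries[i].first, i);
    }
    return true;
  }
};
```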

[clangd][Parser][Sema] Fix TemplateIdAnnotation UAF with template-id declarator and lambda default argument (llvm#196788)

I think this is another case of template annotations lifetime bug,
similar to the one fixed by
llvm#89494.

Closes llvm#196725.

[clang] Add arm64_neon.h wrapper on windows (llvm#196014)

Add an MSVC-compatible <arm64_neon.h> resource header that forwards to
Clang's generated <arm_neon.h>. This lets ARM64 Windows code using the
MSVC header name lower NEON intrinsics through Clang builtins instead of
leaving external neon_* calls such as neon_ld1m4_q32.

Fixes llvm#195683

[clang][test] Add AArch64 requirement to arm64_neon.h test (llvm#196867)

Only run test when the AArch64 target is built

[LV][NFC] Reshape pointer_iv_non_uniform_0 test to use distinct loads (llvm#196494)

The follow-up patch folds some of the idempotent binary ops. This test
has a sub x - x operation, which is affected by the follow-up patch. This
patch makes the test immune to the fold.

[InstCombine][NFC] Change the order of checks in SliceUpIllegalIntegerPHI for faster compile time. (llvm#183726)

SliceUpIllegalIntegerPHI searches for PHIs that have an illegal type and
are only used by trunc or trunc(lshr) operations. It bails out if it
encounters invoke or EH pad instructions.
It first checks whether it encounters an invoke or EH pad, which is time
consuming as it checks every instruction. Then it checks whether the PHI
is used by trunc or trunc(lshr). The former check is generally loose,
while the latter one is stricter. Switching the order of the checks
speeds up compilation.

Signed-off-by: XinlongZHANG-Bob zhangxinlong.bob@bytedance.com

[NFC] Fix C++23 build failures caused by incomplete types (llvm#196814)

[AArch64][CostModel] Model sve costs for ctpop (llvm#192428)

Targets supporting sve prefer sve for ctpop with fixed length vectors.
Update cost model to reflect the same.

[MLIR][NVVM][NFC] Restructure NVVM dialect (llvm#195811)

Moves the declarations of the NVVM dialect and some widely used enums
(FPRoundingModeAttr and SaturationModeAttr) to separate files to make
them easier to maintain and also use in the NVGPU dialect.

[clang][bytecode] Allow const mutation in all variable initializers (llvm#195794)

So the attached test case works even though it's just an InitListExpr.

[libc][stdlib] Add setenv (llvm#163018)

Add the POSIX setenv() function, with EnvironmentManager::set()
handling environment array management and ownership tracking.

Registered for x86_64, aarch64, and riscv architectures. Integration
tests cover overwrite/no-overwrite semantics, empty/invalid names,
empty values, and repeated replacement.
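
For readers unfamiliar with the overwrite flag, the POSIX behavior the tests
target looks like this from the caller's side. The sketch runs against
whatever libc the host provides and assumes a POSIX system; the helper
setTwice and the variable name used with it are made up for illustration.

```cpp
#include <stdlib.h> // setenv, getenv (POSIX)
#include <string>

// Sets `name` to `first` unconditionally, then calls setenv again with
// `second` and the given overwrite flag. Returns the value getenv reports.
inline std::string setTwice(const char *name, const char *first,
                            const char *second, int overwrite) {
  setenv(name, first, /*overwrite=*/1);
  setenv(name, second, overwrite);
  const char *value = getenv(name);
  return value ? std::string(value) : std::string();
}
```

With overwrite set to 0 the second call leaves the existing value in place;
with a nonzero flag it replaces it. POSIX also specifies that an empty name,
or a name containing '=', makes setenv fail with EINVAL.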

Assisted-by: Automated tooling, human reviewed.


Co-authored-by: Michael Jones michaelrj@google.com

[GlobalISel] Delay match table builder initialization (llvm#196506)

MachineIRBuilder::setInstrAndDebugLoc is expensive, delay until needed.

CTMark -0.10% geomean improvement on aarch64-O0-g.

https://llvm-compile-time-tracker.com/compare.php?from=71fef6d5a306d1adf8bf7d30d2fe9e286380fecf&to=8a87845dfde9de9d141b42d2fce92fcf3be02276&stat=instructions%3Au

Assisted-by: codex

[GlobalISel] Avoid repeated target info queries in combiners (llvm#196530)

tryCombineAllImpl queries target info for every instruction. Cache
TargetInstrInfo/TargetRegisterInfo/RegisterBankInfo in CombinerHelper
and pass to executeMatchTable instead.

This avoids repeated virtual calls on the combiner executeMatchTable
path.

CTMark -0.08% geomean improvement on aarch64-O0-g.

https://llvm-compile-time-tracker.com/compare.php?from=71fef6d5a306d1adf8bf7d30d2fe9e286380fecf&to=13bc49510657450402c066098e3a4b7d1af9d0e6&stat=instructions%3Au

Assisted-by: codex

[DebugInfo] Pack DILocation hash inputs (llvm#196556)

Pack DILocation fields before hashing. Now that column is 16 bits,
Line/Column/ImplicitCode fit in one 64-bit value (32 + 16 + 1 = 49 bits)
and AtomGroup and AtomRank also fit cleanly in one 64-bit value (61 + 3
= 64 bits).

Fewer hash_combine inputs on the hot DILocation path is a small
compile-time improvement.
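
A minimal sketch of such packing, using the bit widths quoted above. The field
order within each word and the function names are assumptions made for
illustration; the actual layout chosen in the patch may differ.

```cpp
#include <cstdint>

// Line (32 bits) | Column (16 bits) | ImplicitCode (1 bit) = 49 bits used.
constexpr std::uint64_t packLineColumn(std::uint32_t line, std::uint16_t column,
                                       bool implicitCode) {
  return static_cast<std::uint64_t>(line) |
         (static_cast<std::uint64_t>(column) << 32) |
         (static_cast<std::uint64_t>(implicitCode) << 48);
}

// AtomGroup (61 bits) | AtomRank (3 bits) = 64 bits used.
// Assumes atomGroup < 2^61; atomRank is masked to its low 3 bits.
constexpr std::uint64_t packAtom(std::uint64_t atomGroup, std::uint8_t atomRank) {
  return (atomGroup << 3) | (atomRank & 0x7u);
}

static_assert(packLineColumn(1, 2, true) == 0x0001000200000001ULL,
              "fields land in disjoint bit ranges");
static_assert(packAtom(5, 3) == 43, "(5 << 3) | 3");
```

Two packed words then replace five separate inputs to the hash.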

CTMark geomean:

  • stage1-ReleaseLTO-g: -0.10%
  • stage1-O0-g: -0.23%
  • stage1-aarch64-O0-g: -0.19%
  • stage2-O0-g: -0.07%

https://llvm-compile-time-tracker.com/compare.php?from=71fef6d5a306d1adf8bf7d30d2fe9e286380fecf&to=1d80b5f5aa98561d2ba09adc3f20c3eacd24cb88&stat=instructions%3Au

Assisted-by: codex

[LoopFusion] Remove SCEV-based dependence analysis path (llvm#195864)

Loop Fusion has used Dependence Analysis (DA) as the default dependence
check since the option default was flipped in llvm#187309. The SCEV-based
strategy and the combined "all" mode were retained only for fallback and
experimentation, with a comment noting that the SCEV code would be
removed in a follow-up.

This patch removes the SCEV-based dependence path and the now-unused
selector machinery.

Fixes llvm#194821.

Assisted by Cursor.

[clang-tidy][NFC] Fix tests on 32bit ARM (llvm#196873)

Should fix
llvm#191386 (comment).

[libc] Fix partial multi-byte write detection in File (llvm#196402)

File::write_unlocked(const wchar_t*, size_t) checked 'write_res.value <
1' after writing a converted UTF-8 sequence. For multi-byte characters,
a short platform write (e.g. 2 of 3 bytes for a 3-byte character) passed
this check and was counted as a successful write. The output stream
would then contain an incomplete UTF-8 sequence with no error reported
to the caller.

Changed the check to 'write_res.value < char_size' and set the error
indicator on the stream when it triggers.

Added a regression test using a mock File subclass that limits
platform_write to 2 bytes per call, simulating short writes on pipes and
sockets.
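
The difference between the two checks can be modelled in isolation. The sketch
below is not the libc File code: ShortWriteSink is a hypothetical sink that
accepts at most `limit` bytes per call (much like the mock described above),
and writeEncodedChar stands in for the per-character write step.

```cpp
#include <cstddef>
#include <string>

// Hypothetical sink that mimics a platform write accepting at most `limit`
// bytes per call.
struct ShortWriteSink {
  std::size_t limit;
  std::string out;
  std::size_t write(const char *data, std::size_t len) {
    std::size_t n = len < limit ? len : limit;
    out.append(data, n);
    return n;
  }
};

// Old check: any progress (>= 1 byte) counted as success.
inline bool wroteAnything(std::size_t written) { return written >= 1; }

// New check: success only if the whole encoded character went out.
inline bool wroteWholeChar(std::size_t written, std::size_t charSize) {
  return written >= charSize;
}

// Writes one already-encoded UTF-8 character and reports whether it was
// written completely.
inline bool writeEncodedChar(ShortWriteSink &sink, const std::string &utf8) {
  std::size_t written = sink.write(utf8.data(), utf8.size());
  return wroteWholeChar(written, utf8.size());
}
```

With a 2-byte limit, a 3-byte character such as U+20AC is reported as a
failure by the new check, whereas the old check would have accepted it.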

Assisted-by: Automated tooling, human reviewed.


Co-authored-by: Michael Jones michaelrj@google.com

[AA] No synchronization effects for never-escaping identified local (llvm#193939)

Fences and other synchronizing operations (such as atomic accesses
stronger than monotonic) are modelled as reading and writing all memory,
in order to enforce their implied ordering constraints.

Currently, this happens even for identified function locals that do not
escape. This patch excludes those objects.

Notably, we can not reason based on captures-before here, because the
synchronizing operation still has an effect even if the object only
escapes later.

The hope here is that with this restriction in place, it may be viable
to respect potential synchronization inside non-nosync function calls.

[Bazel] Fixes ce6605a (llvm#196880)

This fixes ce6605a.

Co-authored-by: Google Bazel Bot google-bazel-bot@google.com

[clang][NFC] Remove alignment checks from test/CodeGen/c-strings.c (llvm#196501)

and re-enable it on more targets.

I don't think this test was intended to check for alignment. Those
expectations were added as part of FileCheck-izing the test in
e29dadb and we've been working around
them or xfailing the test since.

[CIR][AMDGPU] Add lowering for amdgcn ds swizzle builtin. (llvm#196011)

Upstreaming clangIR PR: llvm/clangir#2052

This PR adds support for lowering the __builtin_amdgcn_ds_swizzle AMDGPU
builtin to ClangIR.

[lldb] Fix TestDelayedBreakpoint on ARM Thumb (llvm#196888)

The original address used for the "fake breakpoint" is not valid in
Thumb mode. To be safe, change it to have 0's in the LSBs.

[clang][bytecode] Visit tryEvaluateObjectSize expr as lvalue (llvm#196010)

Just like we do with the first parameter of a regular
__builtin_object_size call.

This still doesn't fix the bigger bos test cases since e.g.

int NoViableOverloadObjectSize3(void *const p PS(3))
    __attribute__((overloadable)) {
  return __builtin_object_size(p, 3);
}
void test4(struct Foo *t) {
  gi = NoViableOverloadObjectSize3(&t[1].t[1]);
}

is still broken because we don't have special handling for
&t[1].t[1] here and we can't usually access a one-past-end
pointer.

Use auto for DenseMap/SmallDenseMap iterator variables. NFC (llvm#196883)

To match the prevailing style.

[AArch64] Use dup (lane mov) over ext for high-half extract (llvm#195010)

This changes the instruction we use to extract the high half of a vector
register from a ext v0, v1, v1, 8 to a dup d0, v1.d[1]. This is
apparently slightly quicker on certain cpus and is generally a simpler
instruction. This matches the instruction that gisel produced.

Some of the old patterns for extract_subvector with index of 1 seem
incorrect but were never used as we do not reach selection with such
instructions. They have been repurposed to emit the new DUPi64
instructions.

Revert "[AA] No synchronization effects for never-escaping identified local" (llvm#196890)

Reverts llvm#193939

Caused buildbot failure.

Update GitHub PR Greeter (llvm#194307)

Following these two discussions:

add a reference to the LLVM AI policy in the GH greeter.

In addition:

  • Update the message to include links to other relevant policies as
    well, since these are often shared during PR review.
  • Add FAQ section and move some of the original content there.
  • Include a request for people to confirm that they have familiarised themselves with
    the policies.
  • Add Hello @{self.author} :wave: to make the greeting more personal.

[flang] dummy arguments used as function calls (llvm#196426)

Adding an error when a dummy argument is used as a statement function.

SUBROUTINE a(foo)
foo(c) = 0
END SUBROUTINE a

This PR now points out:

  1. Dummy argument 'foo' may not be used as a statement function
  2. 'foo' is not a callable procedure

Handles issue 196424


Co-authored-by: Sunil Kuravinakop kuravina@pe31.hpc.amslabs.hpecorp.net

[SelectionDAG] Split vector types for atomic load (llvm#165818)

Vector types that aren't widened are split so that a single ATOMIC_LOAD
is issued for the entire vector at once. This change utilizes the load
vectorization infrastructure in SelectionDAG in order to group the
vectors. This enables SelectionDAG to translate vectors with type
bfloat or half.

Add support for Ubuntu 26.10 - Stonking Stingray (llvm#196896)

Co-authored-by: Oliver Reiche oliver.reiche@canonical.com

[clang-tidy] Remove hicpp modules [2/4] (llvm#196870)

This is part two of removing the hicpp-* checks.

RFC:
https://discourse.llvm.org/t/rfc-regarding-the-current-status-of-hicpp-checks/89883

Part of llvm#183462

[LV] Handle FSub Partial Reductions (llvm#191186)

Introduces a new RecurKind value 'FSub' in order to handle partial
reductions of floating point values.

This is done by following the existing method for integer partial
reductions, doing a positive accumulation followed by a final
subtraction in the middle block.
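
In scalar terms, the rewrite relies on init - x0 - x1 - ... being equal to
init - (x0 + x1 + ...) when reassociation is permitted. The sketch below shows
the two forms side by side; the function names are hypothetical and the single
accumulator stands in for the vector partial sums.

```cpp
#include <vector>

// Original loop shape: a running subtraction.
inline double fsubChain(double init, const std::vector<double> &xs) {
  double acc = init;
  for (double x : xs)
    acc -= x;
  return acc;
}

// Rewritten shape: accumulate positively inside the loop, then perform one
// subtraction afterwards (the "middle block" step).
inline double fsubViaPositiveSum(double init, const std::vector<double> &xs) {
  double partial = 0.0;
  for (double x : xs)
    partial += x;
  return init - partial;
}
```

The two agree exactly for inputs like {1, 2, 3.5}, where every intermediate
value is representable; in general they can differ by rounding, which is why
the transformation depends on reassociation being allowed.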

[LV][NFC] Remove instcombine pass from RUN lines of simple tests (llvm#196257)

Most of the work done by the instcombine pass on these files involves
canonicalising GEPs and shuffling code around. I don't believe there is
any value running instcombine in these cases.

[GISel][X86] port X86PreLegalizerCombiner to npm (llvm#182638)

Porting X86PreLegalizerCombiner to npm as part of
llvm#178192

[X86] Cast atomic vectors in IR to support floats (llvm#148899)

This commit casts floats to ints in an atomic load during AtomicExpand
to support floating point types. It is also required to support 128-bit
vectors in SSE/AVX.

[AMDGPU] Add VMovB64 subtarget feature (llvm#196340)

[mlir][SPIR-V] Add CL.{exp2,exp10,log2,log10} ops (llvm#196869)

[Clang] Fix incorrect type for __mfp8 in extractelement codegen (llvm#192977)

The codegen for extracting an element from an FP8 vector was emitting a
simple extractelement with i8 type for the extracted element. The
__mfp8 type is represented as <1 x i8> in LLVM IR. This codegen
created inconsistency in Clang - some __mfp8 expressions would
correspond to LLVM IR values with <1 x i8> type and some to i8 type.

It also caused an assertion failure when the extracted element was
passed as a function argument.

This patch fixes the issue by inserting the extracted element
into a <1 x i8>.

[mlir][tosa] Add a pass to downgrade TOSA 1.1.draft to 1.0 (llvm#194971)

This commit adds a pass that will allow 1.1.draft operations to be
rewritten to their 1.0 counterparts where possible. The pass currently
covers the following operations:

  • bool <-> fp32 casts via i8 bridge casts
  • bool gather/scatter with i32 indices via i8 payload rewrites

Note that the downgrade is 'best-effort' and the pass does not perform
any validation itself. The validation pass should be run after
downgrading to check that the resulting IR was downgraded successfully.

Motivation: This decouples the target specification version in
legalizations and backends. Legalizations from higher level frameworks
may be updated to support producing TOSA 1.1.draft variants of
operations, while backends can still consume TOSA 1.0 IR after running
the downgrade pass.

[llubi] Upstream existing floating-point intrinsics (llvm#196034)

This PR upstreams existing floating-point intrinsics in the out-of-tree
version of llubi, including FP vector reduction, FP min/max operations,
etc. Some minor bugs from llvm#188453 are also fi

RKSimon and others added 30 commits May 21, 2026 09:35
…element counts (llvm#198989)

32-bit targets will attempt to lower vXi64 reductions prior to argtype legalization

Crash fix - we can improve the handling in a future commit
…vm#147297)

This patch detects non-consecutive load accesses (i.e. gather) with a
constant stride, such as:
```
  void stride(int* a, int *b, int n) {
    for (int i = 0; i < n; i++)
      a[i * 5] = b[i * 5] + i;
  }
```
and converts them into strided loads when legal and profitable, using
experimental_vp_strided_load.
The new VPlan transformation, convertToStridedAccesses, hoists the
functionality of RISCVGatherScatterLowering into the vectorizer,
enabling a more precise cost estimation during vectorization.
Additionally, by leveraging SCEV for stride analysis, the vectorizer can
potentially detect more opportunities to optimize gathers into strided
loads.

This enables more efficient code generation for targets like RISC-V that
support strided loads natively.
…duce.add codegen (llvm#198993)

The middle-end will detect vector.reduce.add patterns - update the
Codegen tests to use the intrinsics directly and add PhaseOrdering tests
to ensure vector.reduce.add intrinsics are created
…m#192954)

Currently, to save compile-time, LoopInterchange limits the number of
memory instructions and bails out early if it exceeds a threshold.
However, the dependence analysis phase in LoopInterchange has `O(N^2)`
complexity, where `N` is the number of memory instructions. This means
that even a small number of memory instructions can have a
non‑negligible impact on compile-time. In fact, I found such a case
(about +5% compile‑time regression), in which most instructions in the
loop are stores.
This patch replaces the heuristic which determines whether we should
continue the analysis or bail out to save compile time. The idea is that
if the ratio of the squared number of memory instructions to the total
number of instructions is small, LoopInterchange is allowed to continue
its analysis. The existing option `-loop-interchange-max-meminstr-count`
is removed.
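
A minimal sketch of the ratio test described above, assuming a hypothetical function name and a hypothetical `MaxRatio` knob (neither is taken from the patch, and the actual threshold is not stated here). It only illustrates comparing the squared memory-instruction count against the total instruction count:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not LoopInterchange's actual code. Dependence analysis
// costs roughly N^2 for N memory instructions, so continue only when
// N^2 / Total <= MaxRatio. MaxRatio is an assumed tuning value.
bool shouldRunDependenceAnalysis(std::uint64_t NumMemInstrs,
                                 std::uint64_t NumTotalInstrs,
                                 std::uint64_t MaxRatio) {
  assert(NumMemInstrs <= NumTotalInstrs && "memory instrs are a subset");
  if (NumTotalInstrs == 0)
    return false; // Nothing to analyze.
  // Rewritten without a division: N^2 <= MaxRatio * Total.
  // Overflow is ignored here because the counts are small in practice.
  return NumMemInstrs * NumMemInstrs <= MaxRatio * NumTotalInstrs;
}
```

With `MaxRatio` of 1, a loop with 4 memory instructions out of 100 passes the check, while one with 50 out of 100 does not.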

Compile-time improvement:
https://llvm-compile-time-tracker.com/compare.php?from=f344adcd2fb876d61f016fb92369a6530cc85a5b&to=6f7e5b0e4b35116728563913f2d98b7f9341409b&stat=instructions:u
We replaced `std::unordered_map` with LLVM's `DenseMap` for the DIE maps
in DIEBuilder. Since this map is accessed frequently during DWARF
rewriting, the improved data layout translates directly into reduced
cache misses. As shown in the benchmark results, this change yields
1.22x–1.27x speedup.

**Program from Bytedance**
| BatchSize | Baseline (s) | Optimized (s) | Speedup |
|---|---|---|---|
| 2 | 120.01 | 98.32 | 1.22x |
| 4 | 104.12 | 85.37 | 1.22x |
| 16 | 82.31 | 66.41 | 1.24x |
| 32 | 77.45 | 61.01 | 1.27x |
| 64 | 71.69 | 56.35 | 1.27x |
Code completion was a no-op inside `__builtin_offsetof`: a cursor at
`__builtin_offsetof(T, ^)` or `__builtin_offsetof(T, a.^)` fell through
to ordinary-name completion instead of suggesting fields. Route the
code_completion token to a new SemaCodeCompletion entry point that walks
the designator path so far, resolves the subobject's type, and
enumerates its members. Methods are filtered out, inherited fields are
included, indirect fields from anonymous unions and structs are peeled,
and `using Base::field` resolves through its UsingShadowDecl. A
code_completion token past a complete component (right after `]` or at
the end of the chain) is dropped rather than offering fields the user
can't paste without first typing `.`.
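
For context, the designator forms that completion now has to walk look roughly like this. The struct names are hypothetical and the snippet only shows the accepted `__builtin_offsetof` syntax, not the completion code itself:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical types used only to illustrate designator paths.
struct Inner { int x; int y[4]; };
struct Outer { char tag; Inner in; };

// A plain field, a nested field path, and an array subscript after a field.
constexpr std::size_t OffIn = __builtin_offsetof(Outer, in);
constexpr std::size_t OffInX = __builtin_offsetof(Outer, in.x);
constexpr std::size_t OffInY2 = __builtin_offsetof(Outer, in.y[2]);

// Each path component adds the offset of the member within its parent.
static_assert(OffInX == OffIn + __builtin_offsetof(Inner, x), "nested field");
static_assert(OffInY2 == OffIn + __builtin_offsetof(Inner, y) + 2 * sizeof(int),
              "array element");
```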

The offsetof and designated-initializer type walkers are folded into one
helper parameterized by a field-lookup callback, which incidentally
fixes reference-field and indirect-field traversal in
designated-initializer completion too.

Tests: lit cases in offsetof.cpp covering empty/dot/array/inheritance/
reference/anonymous/using-shadow/macro forms, extended desig-init.cpp
walker cases, and a clangd unit test exercising the IDE path.

Context: llvm#195126 and llvm#194407

This work was AI assisted but human-reviewed.

The followup for the added FIXME is at
llvm/llvm-project@main...schuay:llvm-project:refactor-offsetof-component-to-designation
…198570)

Coverity fixes:
* calling getIntrinsicSignature without checking return value (as is
done elsewhere 4 out of 5 times) in
llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp
* non-static class member MaxSGPRs, MaxVGPRs and MaxUnifiedVGPRs is not
initialized in this constructor nor in any functions that it calls in
llvm/lib/Target/AMDGPU/GCNRegPressure.h
Failed to build in CI because
ScriptedPythonInterface::CreatePluginObject is a template function and
the arguments are of incomplete types obtained from `lldb-forward.h`
(typedef of lldb_private::XXXX = XXXXSP).

Introduced in commit 1b4a578
This PR also moves the case statement for `G_UREM`, which is being
introduced by llvm#193455, so that `G_[U|S][DIV|REM]` are grouped
together, just like in `SelectionDAG.cpp`.

Related: llvm#150515

---------

Signed-off-by: ZakyHermawan <zaky.hermawan9615@gmail.com>
…C) (llvm#193476)

Currently `getInstrOrderCost` doesn't check the base pointers of the
accesses, which can lead to undesirable profitability decisions. This
patch adds a test that demonstrates such a case.
…m#198250)

The assumption here is that AMDGPU builtins (typically prefixed with
`__builtin_amdgcn`) use the `__MEMORY_SCOPE_*` enumeration, and not the
`__HIP_MEMORY_SCOPE_*` enumeration (which is how it should be).

Assisted-By: Claude Opus 4.6
…ashTable traversal (llvm#196746)

The intra-block path used a DenseMap cleared at each block boundary, so
pairs from dominating blocks were never visible to descendants. Replace
it and the separate cross-block path with a unified recursive domtree
walk using a ScopedHashTable. Any dominating block's pair is now a
candidate, not just pairs within the same block.
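
The scoping idea can be sketched with standard containers only. This is an illustration under stated assumptions: `Block`, `countRedundant`, and the string keys are hypothetical, and a plain `std::unordered_map` with manual scope cleanup stands in for LLVM's ScopedHashTable:

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical dominator-tree node: keys computed here plus child indices.
struct Block {
  std::vector<std::string> Keys;
  std::vector<int> Children;
};

// Counts keys that were already made available by the same block or by any
// dominating block. Entries added by a block are removed when its subtree is
// done, so siblings never see each other's keys.
int countRedundant(const std::vector<Block> &Tree, int Node,
                   std::unordered_map<std::string, int> &Available) {
  std::vector<std::string> Inserted; // Keys owned by this scope.
  int Hits = 0;
  for (const std::string &K : Tree[Node].Keys) {
    if (Available.count(K)) {
      ++Hits;
    } else {
      Available[K] = Node;
      Inserted.push_back(K);
    }
  }
  for (int Child : Tree[Node].Children)
    Hits += countRedundant(Tree, Child, Available);
  for (const std::string &K : Inserted)
    Available.erase(K); // Pop this block's scope.
  return Hits;
}
```

A key defined in the root is visible in every descendant, while a key defined in one child is gone by the time its sibling is visited.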

Rename optimizeIntraBlock to optimizeBlock and remove dead code
…ead-cc-defs-in-fcmp.mir (NFC) (llvm#198974)

It's bit-identical to
llvm/test/CodeGen/AArch64/GlobalISel/postselectopt-dead-cc-defs.mir. The
-in-fcmp test is older (0f0fd38), but 84a6a05 later expanded
op coverage and left both files with the exact same contents.
…m#198790)

Currently TargetTransformInfoImpl returns an arbitrary cost of 4 for the
latency of loads in getInstructionCost. This means even if a target
correctly reports the latency for loads in getMemoryOpCost we still get
this arbitrary cost in getInstructionCost. It also means the latency
cost is inconsistent depending on whether you go through
getInstructionCost or getMemoryOpCost.

Solve this by moving the current arbitrary cost into getMemoryOpCost.
This has the side-effect of affecting the cost of masked loads if they
aren't handled by the target, as in BasicTTIImpl the cost for these is
calculated using getMemoryOpCost. This should mean the cost is more
accurate though, and likely won't have any effect as in any
transformation that could introduce masked loads (e.g. vectorization)
the current cost is probably high enough that it's already not worth
using.
The pointer size is not configurable; you get what you get
based on the triple. I don't know what the point of this was,
I don't even see the argument in the final cc1 invocation.
…ck (llvm#193477)

Currently `getInstrOrderCost` doesn't check the base pointers of the
accesses, which can lead to undesirable profitability decisions. This
patch makes the function take the base pointers into account. Fix the
test case added in llvm#193476.
Running replaceSymbolicStrides on VPlan0 means we only need to run it
once, and also enables simplifications earlier on. It is also needed to
accurately compute the costs of the scalar VPlan0 early, without hacky
manual folds like in the legacy cost model.

PR: llvm#196840
Summary:
When this is set you can only link against `LLVM`. The previous patch
did not respect this because I did not realize that this was required
internally in add_llvm_library.
…espace or dots (llvm#190610)

This patch extends -Wnonportable-include-path to detect and warn about
trailing whitespace and dots in #include directives. Such paths are
non-portable and can lead to build failures on different operating
systems.

The warning is triggered when an include filename ends with a space or a
dot, which is common when copy-pasting paths or due to typos.
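
The trigger condition can be sketched as follows. The helper name is hypothetical and this is not Clang's implementation; it only encodes the "ends with a space or a dot" rule stated above:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: true when an include filename ends with a space or a
// dot, the two trailing characters the description calls non-portable.
bool hasNonPortableTrailingChar(const std::string &Filename) {
  if (Filename.empty())
    return false;
  const char Last = Filename.back();
  return Last == ' ' || Last == '.';
}
```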

Fixes llvm#96064
…lvm#198758)

`toHex()` only prints a single byte of the integer value, which can hide
the actual mismatch in AArch64 PAuth ABI core info diagnostics.
Create explicit definitions for each latency/throughput/resource
combination and use them in the instruction rule definitions.

Although this change touches most lines in the model, there is no
functional change: no test cases are affected.

This makes the style of the C1-Nano scheduling model similar to that
used in C1-Ultra / C1-Premium, in preparation for including the work to
support SME instructions that is currently being implemented on the
C1-Ultra scheduling model.
…87898)

This is part of llvm#146131 and llvm#182597

`func3` and `func4` are
[equivalent](https://alive2.llvm.org/ce/z/NNMTDa) but `func3` produces a
`sext` instead of `zext` when `b - a` is known non-negative.

[Proof of correctness](https://alive2.llvm.org/ce/z/ZthC9m)

```c++
#include <stdint.h>

uint64_t func3(int32_t a, int32_t b) {
    return (b < a ? 0 : (int32_t)(b - a));
}

uint64_t func4(int32_t a, int32_t b) {
    return (b < a ? 0 : (uint32_t)(b - a));
}
```
On x86-64, zero-extending b - a instead of sign-extending it produces
shorter code.


My PR fixes this by handling all patterns below:
`(X < Y) ? 0 : (X - Y)`
`(X > Y) ? 0 : (Y - X)`
`(X < Y) ? (Y - X) : 0`
`(X > Y) ? (X - Y) : 0`
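
As a quick sanity check of the equivalence claim, the two forms can be compared on a few sample inputs. The function names below are hypothetical renamings of `func3` and `func4`, and the samples are chosen so that `b - a` never overflows (signed overflow would be undefined in C++):

```cpp
#include <cassert>
#include <cstdint>

// Sign-extending form (like func3): the difference is an int32_t.
uint64_t viaSext(int32_t a, int32_t b) {
  return b < a ? 0 : (int32_t)(b - a);
}

// Zero-extending form (like func4): the difference is a uint32_t.
uint64_t viaZext(int32_t a, int32_t b) {
  return b < a ? 0 : (uint32_t)(b - a);
}

// Compares both forms over a small grid of inputs without overflow.
bool agreeOnSamples() {
  const int32_t Samples[] = {-1000, -1, 0, 1, 7, 1000};
  for (int32_t a : Samples)
    for (int32_t b : Samples)
      if (viaSext(a, b) != viaZext(a, b))
        return false;
  return true;
}
```

When `b >= a` the difference is non-negative, so sign extension and zero extension yield the same 64-bit value.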
llvm#193478)

LoopInterchange has three types of heuristics for profitability
decisions: `cache`, `instorder`, and `vectorize`. Currently, the
profitability check invokes these heuristics in this order. The
heuristic corresponding to `cache` is based on LoopCacheAnalysis.
However, LoopCacheAnalysis applies several aggressive heuristics, which
can sometimes lead to undesirable decisions. In contrast, the heuristic
corresponding to `instorder` is relatively simpler than `cache`, but its
behavior is clear and it is likely sufficient for practical cases.
In light of the default enablement, I believe it is better to use a
simpler, easier‑to‑reason‑about, and more stable heuristic rather than
an aggressive but complex one. Therefore, this patch disables the
LoopCacheAnalysis‑based profitability check by default.
…lvm#198742)

Following llvm#197442, FortranEvaluate was implicitly included in
OpenMP-utils.h which should be avoided to ensure front-end data
structures in the Optimizers can stop and restart pure MLIR source
without any side-data structures.

To ensure this is done, EntryBlockArgs has been stripped back to only
track vars, objects are now tracked within ObjectEntryBlockArgs in
Lowering as this is a more appropriate place for this information, and
the existing symbol tracking in EntryBlockArgsEntry was only used here.
This ensures FortranEvaluate is not needed within the Optimizers, and
objects can still be maintained when lowering. This enables better
referencing in Reduction Clauses, where previously context was being
lost for expressions such as ArrayElements.

See more: llvm#197442

Assisted-by: Codex
As was done in llvm#198160, address the problem described in
https://docs.github.com/en/actions/concepts/security/script-injections
using the solution recommended by

https://docs.github.com/en/actions/reference/security/secure-use#use-an-intermediate-environment-variable.

Not all these inputs are untrusted, but I've applied it to all of them
just to be consistent.
hussam-alhassan and others added 27 commits May 23, 2026 00:43
… ScopedHashTable traversal (llvm#196746)" (llvm#199288)

This reverts commit 371f57c due to
failing tests
Normally the open parens happen right before a.out, but on arm64e the
load address is placed there instead. So instead of:

$0 = 0x0000d00d (a.out...)

we instead have:

$0 = 0xcafed00d (actual=0x0000d00d a.out ...)
…vm#199169)

This makes check-clang-format automatically build
clang-format-check-format, which checks that the new clang-format
doesn't break the existing format of the clang-format source.
…dVPValuesInPlan tests (llvm#199275)

llvm#195891 exposed a
use-after-free in the tests: `BinaryOperator *AI` [*] is deleted prior
to VPlan's destructor, which expects all the operands to still be alive.
This patch fixes the test (suggested by a Florian in
llvm#199252 (review)),
by preemptively detaching AI from the VPlan.

[*] No AI was harmed or used during the creation of this patch.
…llvm#198652)

This PR extends xegpu.load_matrix and xegpu.store_matrix to support 1D
mem_desc for contiguous SLM access.
  - Added unit tests for 1D load/store (valid ops and invalid cases)
  - Added integration test verifying that both 1D (<4096xbf16>) and 2D
    (<64x128xbf16>) correctly lower through the full WG→SG→WI→XeVM pipeline

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…)) (llvm#199281)

`captures(none)` has very restrictive semantics and is an easy footgun
for accidentally introducing UB into your code. Most significantly it
does not allow any visible side-effects of whether a pointer was null or
not to escape the function. This means that the function cannot perform
different side effects depending on whether a pointer marked `noescape`
is null. Relax this to `captures(address)`, which allows information
about the numerical address to escape the function, but no provenance
(i.e. nothing that could be dereferenced) may escape.

As discussed in
https://discourse.llvm.org/t/rfc-updating-the-semantics-of-the-noescape-attribute/90326.
…ng getVectorElementCount (llvm#199286)

Fixes the assert reported here:

<llvm#198446 (comment)>

I believe this happens when the element type isn't a legal RVV element
type and so has been scalarised by type legalisation.

Adding this guard also matches the AArch64 implementation.

The test change is LLM generated.
…th fixed length vectors (llvm#199227)

Implement IRTranslator support for fixed length vectors when the V
extension is used. This implementation works similarly to SelectionDAG's:
we use insert and extract subvector ops to get the fixed length vectors
out of the scalable length vectors.
…lysis (llvm#199208)

Add full CostKinds, to improve a lot of reduction matching in
vectorcombine/slp passes

These are based off SMIN/UMIN numbers, and a few SMAX/UMAX numbers don't
always match, but are typically within +/-1
)

It would be convenient to use `destroy_at`, `destroy`, and `destroy_n`
in tests for pre-C++17 uninitialized memory algorithms. So this PR adds
backported versions of them for tests.
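
A rough idea of what such backports can look like in C++11 is sketched below. The namespace `test_backport` and the `Counted` helper are hypothetical; the actual test-support names and details in the PR may differ:

```cpp
#include <cassert>
#include <memory> // std::addressof
#include <new>    // placement new

namespace test_backport { // hypothetical namespace, not the PR's

// Runs the destructor of the object at Ptr without freeing its storage.
template <class T>
void destroy_at(T *Ptr) {
  Ptr->~T();
}

// Destroys every object in [First, Last).
template <class ForwardIt>
void destroy(ForwardIt First, ForwardIt Last) {
  for (; First != Last; ++First)
    test_backport::destroy_at(std::addressof(*First));
}

// Destroys N objects starting at First and returns the end iterator.
template <class ForwardIt, class Size>
ForwardIt destroy_n(ForwardIt First, Size N) {
  for (; N > 0; ++First, (void)--N)
    test_backport::destroy_at(std::addressof(*First));
  return First;
}

} // namespace test_backport

// Test type that counts destructor calls.
struct Counted {
  static int Destroyed;
  ~Counted() { ++Destroyed; }
};
int Counted::Destroyed = 0;
```

The calls are namespace-qualified so that they do not become ambiguous with `std::destroy_at` through argument-dependent lookup when the same test is compiled in C++17 or later.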
## Summary

  - Mark `__cxa_bad_cast` as `noreturn` in CIR, mirroring the existing
`__cxa_bad_typeid` handling. The attribute is now set on every `CallOp`
that targets it,
    covering both the CodeGen direct path (`emitCallToBadCast`) and the
    target-lowering path (`buildBadCastCall`).
  - Drop the now-fulfilled `MissingFeatures::opFuncNoReturn` entry and
    the corresponding TODO/assert at the lone caller in
    `LowerItaniumCXXABI.cpp`.
  - Update FileCheck expectations in `dynamic-cast.cpp`,
    `dynamic-cast-exact.cpp`, and `abi-lower-after-unreachable.cpp` to
    require the `{noreturn}` attribute on the lowered
    `cir.call @__cxa_bad_cast()`.
…FC) (llvm#199335)

Drop stale fixme and add test showing missed metadata preservation.
Update the auto-upgrade for llvm.nvvm.abs.i and llvm.nvvm.abs.ll to use
the generic llvm.abs intrinsic with is_int_min_poison=true. The previous
expansion used neg/icmp/select which gives defined INT_MIN -> INT_MIN
behavior, but loses the poison/undefined signed-min semantics needed for
NVPTX to select PTX abs.s32 and abs.s64 instructions when the source
operation permits signed-min overflow to be undefined. This is a
followup to llvm#183851. Using llvm.abs(..., true) preserves intended IR
semantics and lowers through the new ABS_MIN_POISON. We also update the
tests and add NVPTX CodeGen coverage for the legacy nvvm abs intrinsics.
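
To make the difference concrete, the previous neg/icmp/select expansion behaves like the following sketch, where the minimum value maps to itself. The function name is hypothetical, and unsigned arithmetic is used so the C++ itself has no undefined behavior:

```cpp
#include <cassert>
#include <cstdint>

// Models the old expansion: select(x < 0, 0 - x, x) with wrapping negation.
// Negating INT32_MIN wraps back to INT32_MIN, so the result is defined.
int32_t absWrapping(int32_t X) {
  const uint32_t U = static_cast<uint32_t>(X);
  const uint32_t Neg = 0u - U; // Two's-complement negate, no signed overflow.
  // Converting 0x80000000 back to int32_t is implementation-defined before
  // C++20, but yields INT32_MIN on two's-complement targets.
  return static_cast<int32_t>(X < 0 ? Neg : U);
}
```

With `llvm.abs(..., true)` the `INT_MIN` input instead produces poison, which is what gives the backend the freedom described above.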
)

This PR makes types in `copy_move_types.h` usable in C++03/11 modes,
because some types in `copy_move_types.h` turned out to be useful for
testing uninitialized memory algorithms in pre-C++20 modes.
…stant concatenation (llvm#199344)

Generalised the code added in llvm#198273 to make it easier to support other
combos in future patches.

Hopefully we can get load combining to work here soon.
… as this seems to give MSVC trouble. (llvm#199311)

Some Windows bots using MSVC 2019 and 2022 get assertion errors in the
clang-doc lit tests (see [here](llvm#198066)). This seems to
be due to MSVC having trouble correctly initializing structures
using std::initializer_list when embedded in a struct declared with
constexpr.

This workaround changes constexpr to const in a struct definition to
avoid this issue.
)

Resolves llvm#183047

This patch updates isKnownNeverZero to handle DemandedElts for UDIV and SDIV exact nodes.
`libcxx/test/libcxx/containers/views/mdspan/mdspan/assert.at.pass.cpp`
caused build bot failures for
- sanitizer-aarch64-linux-bootstrap-asan
- sanitizer-aarch64-linux-bootstrap-hwasan
- sanitizer-aarch64-linux-bootstrap-msan

It's not yet clear why current mechanisms don't work for these builds.
`TEST_HAS_NO_EXCEPTIONS` should have been working.

Also remove one unnecessary `static` and use `std::string_view(e.what())
== "mdspan"`.
- Emit !aix.copyright.comment from Clang for the pragma.
- Lower it in LLVM to a TU-local string + llvm.used + !implicit.ref.
- Add module-import and backend relocation tests.
Co-authored-by: Hubert Tong <hubert.reinterpretcast@gmail.com>

tonykuttai commented May 23, 2026
