[Clang][Modules] Fix -Wunused-variable (#196577)#7
Closed
tonykuttai wants to merge 1833 commits into
…element counts (llvm#198989) 32-bit targets will attempt to lower vXi64 reductions prior to argtype legalization Crash fix - we can improve the handling in a future commit
…vm#147297) This patch detects non-consecutive load accesses (i.e. gather) with a constant stride, such as:
```
void stride(int* a, int *b, int n) {
  for (int i = 0; i < n; i++)
    a[i * 5] = b[i * 5] + i;
}
```
and converts them into strided loads when legal and profitable, using experimental_vp_strided_load. The new VPlan transformation, convertToStridedAccesses, hoists the functionality of RISCVGatherScatterLowering into the vectorizer, enabling a more precise cost estimation during vectorization. Additionally, by leveraging SCEV for stride analysis, the vectorizer can potentially detect more opportunities to optimize gathers into strided loads. This enables more efficient code generation for targets like RISC-V that support strided loads natively.
…duce.add codegen (llvm#198993) The middle-end will detect vector.reduce.add patterns - update the Codegen tests to use the intrinsics directly and add PhaseOrdering tests to ensure vector.reduce.add intrinsics are created
…m#192954) Currently, to save compile-time, LoopInterchange limits the number of memory instructions and bails out early if it exceeds a threshold. However, the dependence analysis phase in LoopInterchange has `O(N^2)` complexity, where `N` is the number of memory instructions. This means that even a small number of memory instructions can have a non‑negligible impact on compile-time. In fact, I found such a case (about +5% compile‑time regression), in which most of the instructions in the loop are stores. This patch replaces the heuristic which determines whether we should continue the analysis or bail out to save compile time. The idea is that if the ratio of the squared number of memory instructions to the total number of instructions is small, LoopInterchange is allowed to continue its analysis. The existing option `-loop-interchange-max-meminstr-count` is removed. Compile-time improvement: https://llvm-compile-time-tracker.com/compare.php?from=f344adcd2fb876d61f016fb92369a6530cc85a5b&to=6f7e5b0e4b35116728563913f2d98b7f9341409b&stat=instructions:u
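The ratio check described in this commit message can be sketched as follows. This is only an illustration: the function name `shouldContinueAnalysis`, the constant `kHypotheticalMaxRatio`, and its value are assumptions, since the message does not state the threshold the patch actually uses.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical threshold; the real constant is not given in the commit message.
constexpr uint64_t kHypotheticalMaxRatio = 16;

// Returns true when (NumMemInsts^2 / NumTotalInsts) <= MaxRatio, i.e. when the
// quadratic dependence analysis is expected to be cheap relative to loop size.
// The comparison is done by cross-multiplication to avoid integer division.
// Overflow is ignored here; the sketch assumes small instruction counts.
bool shouldContinueAnalysis(uint64_t NumMemInsts, uint64_t NumTotalInsts,
                            uint64_t MaxRatio = kHypotheticalMaxRatio) {
  return NumMemInsts * NumMemInsts <= MaxRatio * NumTotalInsts;
}
```

With a ratio limit of 2, a loop with 10 memory instructions out of 100 passes (100 <= 200), while a loop that is almost all stores, 100 out of 120, does not (10000 > 240).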
We replaced `std::unordered_map` with LLVM's `DenseMap` for the DIE maps in DIEBuilder. Since this map is accessed frequently during DWARF rewriting, the improved data layout translates directly into reduced cache misses. As shown in the benchmark results, this change yields 1.22x–1.27x speedup.

**Program from Bytedance**

| BatchSize | Baseline (s) | Optimized (s) | Speedup |
|---|---|---|---|
| 2 | 120.01 | 98.32 | 1.22x |
| 4 | 104.12 | 85.37 | 1.22x |
| 16 | 82.31 | 66.41 | 1.24x |
| 32 | 77.45 | 61.01 | 1.27x |
| 64 | 71.69 | 56.35 | 1.27x |
…-eq comparisons as well (llvm#198243)
Code completion was a no-op inside `__builtin_offsetof`: a cursor at ` __builtin_offsetof(T, ^)` or `__builtin_offsetof(T, a.^)` fell through to ordinary-name completion instead of suggesting fields. Route the code_completion token to a new SemaCodeCompletion entry point that walks the designator path so far, resolves the subobject's type, and enumerates its members. Methods are filtered out, inherited fields are included, indirect fields from anonymous unions and structs are peeled, and `using Base::field` resolves through its UsingShadowDecl. A code_completion token past a complete component (right after `]` or at the end of the chain) is dropped rather than offering fields the user can't paste without first typing `.`. The offsetof and designated-initializer type walkers are folded into one helper parameterized by a field-lookup callback, which incidentally fixes reference-field and indirect-field traversal in designated-initializer completion too. Tests: lit cases in offsetof.cpp covering empty/dot/array/inheritance/ reference/anonymous/using-shadow/macro forms, extended desig-init.cpp walker cases, and a clangd unit test exercising the IDE path. Context: llvm#195126 and llvm#194407 This work was AI assisted but human-reviewed. The followup for the added FIXME is at llvm/llvm-project@main...schuay:llvm-project:refactor-offsetof-component-to-designation
…198570) Coverity fixes: * calling getIntrinsicSignature without checking return value (as is done elsewhere 4 out of 5 times) in llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp * non-static class member MaxSGPRs, MaxVGPRs and MaxUnifiedVGPRs is not initialized in this constructor nor in any functions that it calls in llvm/lib/Target/AMDGPU/GCNRegPressure.h
Failed to build in CI because ScriptedPythonInterface::CreatePlugingObject is a template function whose arguments are incomplete types obtained from `lldb-forward.h` (typedef of lldb_private::XXXX = XXXXSP). Introduced in commit 1b4a578
This PR also moves the case statement for `G_UREM`, which is being introduced by llvm#193455, so that the `G_[U|S][DIV|REM]` cases are grouped together, just like in `SelectionDAG.cpp`. Related: llvm#150515 --------- Signed-off-by: ZakyHermawan <zaky.hermawan9615@gmail.com>
…C) (llvm#193476) Currently `getInstrOrderCost` doesn't check the base pointers of the accesses, which can lead to undesirable profitability decisions. This patch adds a test that demonstrates such a case.
…m#198250) The assumption here is that AMDGPU builtins (typically suffixed with `__builtin_amdgcn`) use the `__MEMORY_SCOPE_*` enumeration, and not the `__HIP_MEMORY_SCOPE_*` enumeration (which is how it should be). Assisted-By: Claude Opus 4.6
…lvm#199000) Fix copy-paste issue, extend tests for cmpxchg atomic
…ashTable traversal (llvm#196746) The intra-block path used a DenseMap cleared at each block boundary, so pairs from dominating blocks were never visible to descendants. Replace it and the separate cross-block path with a unified recursive domtree walk using a ScopedHashTable. Any dominating block's pair is now a candidate, not just pairs within the same block. Rename optimizeIntraBlock to optimizeBlock and remove dead code
…ead-cc-defs-in-fcmp.mir (NFC) (llvm#198974) It's bit-identical to llvm/test/CodeGen/AArch64/GlobalISel/postselectopt-dead-cc-defs.mir. The -in-fcmp test is older (0f0fd38), but 84a6a05 later expanded op coverage and left both files with the exact same contents.
…m#198790) Currently TargetTransformInfoImpl returns an arbitrary cost of 4 for the latency of loads in getInstructionCost. This means even if a target correctly reports the latency for loads in getMemoryOpCost we still get this arbitrary cost in getInstructionCost. It also means the latency cost is inconsistent depending on whether you go through getInstructionCost or getMemoryOpCost. Solve this by moving the current arbitrary cost into getMemoryOpCost. This has the side-effect of affecting the cost of masked loads if they aren't handled by the target, as in BasicTTIImpl the cost for these is calculated using getMemoryOpCost. This should mean the cost is more accurate though, and likely won't have any effect as in any transformation that could introduce masked loads (e.g. vectorization) the current cost is probably high enough that it's already not worth using.
The pointer size is not configurable; you get what you get based on the triple. I don't know what the point of this was, I don't even see the argument in the final cc1 invocation.
…ck (llvm#193477) Currently `getInstrOrderCost` doesn't check the base pointers of the accesses, which can lead to undesirable profitability decisions. This patch makes the function take the base pointers into account. Fix the test case added in llvm#193476.
Running replaceSymbolicStrides on VPlan0 means we only need to run it once, and also enables simplifications earlier on. It is also needed to compute the costs of the scalar VPlan0 early and accurately, without hacks like the manual folds in the legacy cost model. PR: llvm#196840
Summary: When this is set you can only link against `LLVM`. The previous patch did not respect this because I did not realize that internally in the add_llvm_library that this was required.
…espace or dots (llvm#190610) This patch extends -Wnonportable-include-path to detect and warn about trailing whitespace and dots in #include directives. Such paths are non-portable and can lead to build failures on different operating systems. The warning is triggered when an include filename ends with a space or a dot, which is common when copy-pasting paths or due to typos. Fixes llvm#96064
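As a rough illustration of the condition this warning keys on, the check below flags an include filename that ends with a space or a dot. The function name `hasNonPortableTrailingChar` is hypothetical and this is not Clang's implementation, only a sketch of the rule as the commit message states it.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical predicate: true when the spelled include filename ends with
// a space or a dot, the two cases the commit message calls out.
bool hasNonPortableTrailingChar(std::string_view Filename) {
  if (Filename.empty())
    return false;
  const char Last = Filename.back();
  return Last == ' ' || Last == '.';
}
```

For example, `"foo.h "` and `"foo.h."` are flagged, while `"foo.h"` is not.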
…lvm#198758) `toHex()` only prints a single byte of the integer value, which can hide the actual mismatch in AArch64 PAuth ABI core info diagnostics.
Creates explicit definitions for each latency/throughput/resource combination and uses the definitions in the instruction rule definitions. Although this change touches most lines in the model, there is no functional change: no test cases are affected by this change. This makes the style of the C1-Nano scheduling model similar to that used in the C1-Ultra / C1-Premium models and is being done in preparation for including the work to support SME instructions that is currently being implemented on the C1-Ultra scheduling model
…87898) This is part of llvm#146131 and llvm#182597

`func3` and `func4` are [equivalent](https://alive2.llvm.org/ce/z/NNMTDa) but `func3` produces a `sext` instead of `zext` when `b - a` is known non-negative. [Proof of correctness](https://alive2.llvm.org/ce/z/ZthC9m)

```c++
#include <stdint.h>

uint64_t func3(int32_t a, int32_t b) {
  return (b < a ? 0 : (int32_t)(b - a));
}

uint64_t func4(int32_t a, int32_t b) {
  return (b < a ? 0 : (uint32_t)(b - a));
}
```

In x86-64, it would make shorter code by zero-extending instead of sign-extending b - a. My PR fixes this by handling all patterns below:

`(X < Y) ? 0 : (X - Y)`
`(X > Y) ? 0 : (Y - X)`
`(X < Y) ? (Y - X) : 0`
`(X > Y) ? (X - Y) : 0`
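The claimed equivalence can be spot-checked with a small harness. This is a sketch, not a proof (the Alive2 links cover that): the helper `agreeOnSmallGrid` is a hypothetical name, and the input range is deliberately small so that `b - a` never overflows `int32_t`.

```cpp
#include <cassert>
#include <cstdint>

// The two functions from the commit message, repeated so the sketch compiles
// on its own.
uint64_t func3(int32_t a, int32_t b) { return (b < a ? 0 : (int32_t)(b - a)); }
uint64_t func4(int32_t a, int32_t b) { return (b < a ? 0 : (uint32_t)(b - a)); }

// Hypothetical helper: compares both functions on every (a, b) pair in
// [-64, 64] x [-64, 64], a range where the subtraction cannot overflow.
bool agreeOnSmallGrid() {
  for (int32_t a = -64; a <= 64; ++a)
    for (int32_t b = -64; b <= 64; ++b)
      if (func3(a, b) != func4(a, b))
        return false;
  return true;
}
```

Whenever the subtraction is evaluated, `b >= a` holds, so the difference is non-negative and sign-extension and zero-extension give the same 64-bit value.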
llvm#193478) LoopInterchange has three types of heuristics for profitability decisions: `cache`, `instorder`, and `vectorize`. Currently, the profitability check invokes these heuristics in this order. The heuristic corresponding to `cache` is based on LoopCacheAnalysis. However, LoopCacheAnalysis applies several aggressive heuristics, which can sometimes lead to undesirable decisions. In contrast, the heuristic corresponding to `instorder` is relatively simpler than `cache`, but its behavior is clear and it is likely sufficient for practical cases. In light of the default enablement, I believe it is better to use a simpler, easier‑to‑reason‑about, and more stable heuristic rather than an aggressive but complex one. Therefore, this patch disables the LoopCacheAnalysis‑based profitability check by default.
Add missing argument in the PR Greeter invocation. Follow-up for llvm#197140. Issue reported here: * https://discourse.llvm.org/t/ci-failure-prgreeter-on-my-first-pr Also, as per * https://docs.github.com/en/actions/concepts/security/script-injections, and * https://docs.github.com/en/actions/reference/security/secure-use#use-an-intermediate-environment-variable, make sure that the greeter relies on ENV variables for input arguments.
…lvm#198742) Following llvm#197442, FortranEvaluate was implicitly included in OpenMP-utils.h which should be avoided to ensure front-end data structures in the Optimizers can stop and restart pure MLIR source without any side-data structures. To ensure this is done, EntryBlockArgs has been stripped back to only track vars, objects are now tracked within ObjectEntryBlockArgs in Lowering as this is a more appropriate place for this information, and the existing symbol tracking in EntryBlockArgsEntry was only used here. This ensures FortranEvaluate is not needed within the Optimizers, and objects can still be maintained when lowering. This enables better referencing in Reduction Clauses, where previously context was being lost for expressions such as ArrayElements. See more: llvm#197442 Assisted-by: Codex
As was done in llvm#198160, address the problem described in https://docs.github.com/en/actions/concepts/security/script-injections using the solution recommended by https://docs.github.com/en/actions/reference/security/secure-use#use-an-intermediate-environment-variable. Not all these inputs are untrusted, but I've applied it to all of them just to be consistent.
… ScopedHashTable traversal (llvm#196746)" (llvm#199288) This reverts commit 371f57c due to failing tests
Normally the open parens happen right before a.out, but on arm64e the load address is placed there instead. So instead of: $0 = 0x0000d00d (a.out...) we instead have: $0 = 0xcafed00d (actual=0x0000d00d a.out ...)
…vm#199169) This makes check-clang-format automatically build clang-format-check-format, which checks that the new clang-format doesn't break the existing format of the clang-format source.
…dVPValuesInPlan tests (llvm#199275) llvm#195891 exposed a use-after-free in the tests: `BinaryOperator *AI` [*] is deleted prior to VPlan's destructor, which expects all the operands to still be alive. This patch fixes the test (suggested by a Florian in llvm#199252 (review)), by preemptively detaching AI from the VPlan. [*] No AI was harmed or used during the creation of this patch.
…llvm#198652) This PR extended xegpu.load_matrix and xegpu.store_matrix to support 1D mem_desc for contiguous SLM access - Added unit tests for 1D load/store (valid ops and invalid cases) - Added integration test verifying both 1D (<4096xbf16>) and 2D (<64x128xbf16>), correctly lower through the full WG→SG→WI→XeVM pipeline --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…)) (llvm#199281) `captures(none)` has very restrictive semantics and is an easy footgun for accidentally introducing UB into your code. Most significantly, it does not allow any visible side-effects of whether a pointer was null or not to escape the function. This means that the function cannot perform different side effects depending on whether a pointer marked `noescape` is null. Relax this to `captures(address)`, which allows information about the numerical address to escape the function, but no provenance (i.e. nothing that could be dereferenced) may escape. As discussed in https://discourse.llvm.org/t/rfc-updating-the-semantics-of-the-noescape-attribute/90326.
…ng getVectorElementCount (llvm#199286) Fixes the assert reported here: <llvm#198446 (comment)> I believe this happens when the element type isn't a legal RVV element type and so has been scalarised by type legalisation. Adding this guard also matches the AArch64 implementation. The test change is LLM generated.
…th fixed length vectors (llvm#199227) Implementing IRTranslator support for fixed length vectors when the V extension is used. This implementation works similarly to SelectionDAG's. We use insert and extract subvector ops to get the fixed length vectors out of the scalable length vectors.
…lysis (llvm#199208) Add full CostKinds, to improve a lot of reduction matching in vectorcombine/slp passes These are based off SMIN/UMIN numbers, and a few SMAX/UMAX numbers don't always match, but are typically within +/-1
## Summary
- Mark `__cxa_bad_cast` as `noreturn` in CIR, mirroring the existing
`__cxa_bad_typeid` handling. The attribute is now set on every `CallOp`
that targets it,
covering both the CodeGen direct path (`emitCallToBadCast`) and the
target-lowering path (`buildBadCastCall`).
- Drop the now-fulfilled `MissingFeatures::opFuncNoReturn` entry and
the corresponding TODO/assert at the lone caller in
`LowerItaniumCXXABI.cpp`.
- Update FileCheck expectations in `dynamic-cast.cpp`,
`dynamic-cast-exact.cpp`, and `abi-lower-after-unreachable.cpp` to
require the `{noreturn}` attribute on the lowered
`cir.call @__cxa_bad_cast()`.
…FC) (llvm#199335) Drop stale fixme and add test showing missed metadata preservation.
Update the auto-upgrade for llvm.nvvm.abs.i and llvm.nvvm.abs.ll to use the generic llvm.abs intrinsic with is_int_min_poison=true. The previous expansion used neg/icmp/select, which gives defined INT_MIN -> INT_MIN behavior but loses the poison/undefined signed-min semantics needed for NVPTX to select PTX abs.s32 and abs.s64 instructions when the source operation permits signed-min overflow to be undefined. This is a followup to llvm#183851. Using llvm.abs(..., true) preserves intended IR semantics and lowers through the new ABS_MIN_POISON. We also update the tests and add NVPTX CodeGen coverage for the legacy nvvm abs intrinsics.
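To make the "defined INT_MIN -> INT_MIN" behavior of the old neg/icmp/select expansion concrete, here is a C++ model of it. The function name `absViaNegSelect` is hypothetical, and the model uses unsigned arithmetic for the wrapping negate; it assumes the usual modular conversion back to `int32_t` (guaranteed from C++20, and what mainstream compilers do before that).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical C++ model of a neg/icmp/select expansion of abs.
int32_t absViaNegSelect(int32_t X) {
  // neg: wrapping negate, done in unsigned arithmetic so it is well defined.
  const uint32_t Neg = 0u - static_cast<uint32_t>(X);
  const int32_t NegSigned = static_cast<int32_t>(Neg);
  // icmp + select: pick the negated value for negative inputs.
  return X < 0 ? NegSigned : X;
}
```

Negating `INT32_MIN` wraps back to `INT32_MIN`, so this expansion has a defined result for that input, which is exactly the guarantee that `llvm.abs(..., true)` drops by making the signed-min case poison.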
…stant concatenation (llvm#199344) Generalised the code added in llvm#198273 to make it easier to support other combos in future patches. Hopefully we can get load combining to work here soon.
… as this seems to give MSVC trouble. (llvm#199311) Some Windows bots using MSVC 2019 and 2022 get assertion errors in the clang-doc lit tests (see [here](llvm#198066)). This seems to be due to MSVC having trouble correctly initializing structures using std::initializer_list when embedded in a struct declared with constexpr. This workaround changes constexpr to const in a struct definition to avoid this issue.
) Resolves llvm#183047 This patch updates isKnownNeverZero to handle DemandedElts for UDIV and SDIV exact nodes.
`libcxx/test/libcxx/containers/views/mdspan/mdspan/assert.at.pass.cpp` caused build bot failures for - sanitizer-aarch64-linux-bootstrap-asan - sanitizer-aarch64-linux-bootstrap-hwasan - sanitizer-aarch64-linux-bootstrap-msan It's not yet clear why current mechanisms don't work for these builds. `TEST_HAS_NO_EXCEPTIONS` should have been working. Also remove one unnecessary `static` and use `std::string_view(e.what()) == "mdspan"`.
- Emit !aix.copyright.comment from Clang for the pragma. - Lower it in LLVM to a TU-local string + llvm.used + !implicit.ref. - Add module-import and backend relocation tests.
Co-authored-by: Hubert Tong <hubert.reinterpretcast@gmail.com>
…imeComment() to the end of the class.

[Clang][Modules] Fix -Wunused-variable (llvm#196577)
Mark some variables [[maybe_unused]] and inline others that do not have
side effects to avoid -Wunused-variable in non-assert builds.
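Both techniques named in this commit message can be shown on a small example. This is a generic sketch with hypothetical names (`dropNegatives`, `sumAfterCleanup`), not the code touched by the patch.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper with a side effect: erases negative values and
// returns how many were removed.
static int dropNegatives(std::vector<int> &Values) {
  int Removed = 0;
  for (auto It = Values.begin(); It != Values.end();) {
    if (*It < 0) {
      It = Values.erase(It);
      ++Removed;
    } else {
      ++It;
    }
  }
  return Removed;
}

int sumAfterCleanup(std::vector<int> Values) {
  // The call has side effects, so it must stay outside the assert; the
  // variable is only read by the assert, hence [[maybe_unused]].
  [[maybe_unused]] int Removed = dropNegatives(Values);
  assert(Removed >= 0 && "removal count cannot be negative");

  // A side-effect-free expression can simply be inlined into the assert
  // instead of being stored in a variable that only the assert reads.
  assert(!Values.empty() && "expected at least one non-negative value");

  int Sum = 0;
  for (int V : Values)
    Sum += V;
  return Sum;
}
```

In a build with `NDEBUG` defined, both asserts compile away; without the attribute, `Removed` would then trigger `-Wunused-variable`.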
[AArch64][GlobalISel] Legalize F64 to BF16 fptruncates (llvm#196077)
This two-step expansion of bf16 fptrunc steps needs to be careful to
avoid double-rounding error. Under AArch64 we can apparently convert to
a fcvtxn that performs round-to-odd, followed by a standard fp truncate
to bf16 to make sure the rounding from there is done correctly. This
reuses the existing lowering added for vector operations.
[SLP][NFC]Add a test with the revectorization of the struct-returning intrinsics
Reviewers:
Pull Request: llvm#196581
[AMDGPU] Add missing CMake link component (llvm#196579)
The issue was triggered by llvm#196547.
[SLP] Vectorize struct-returning intrinsics
Allow SLP to combine across lanes calls that return a literal struct
(llvm.sincos, llvm.*.with.overflow, llvm.frexp, ...) into a single
call returning a struct of vectors, by widening {T, T, ...} to
{, ...} via VectorTypeUtils and emitting extractvalue +
extractelement for external uses.
Reviewers: hiraditya, bababuck, RKSimon
Pull Request: llvm#195521
[PowerPC][NFC]Refactor EmitInstrWithCustomInserter (llvm#196114)
Currently PPCTargetLowering::EmitInstrWithCustomInserter() uses a large
if/else-if structure. Update to use switch and
move ATOMIC_CMP_SWAP and SELECT code to helper functions for better
readability and maintenance.
[AMDGPU] Pre-commit unit test for RP tracking `reset`/`advance` inconsistencies fix (llvm#196098)
This adds a new AMDGPU unit test file for testing the behavior of
`GCNRPTracker` and its related classes. The two tests showcase confusing
return value and behavioral semantics for variants of the advance and
reset functions, which will be clarified in a follow up commit.
Revert "[SLP] Vectorize struct-returning intrinsics"
This reverts commit b0c6df7 to fix
buildbots https://lab.llvm.org/buildbot/#/builders/52/builds/17118
Reviewers:
Pull Request: llvm#196591
[OFFLOAD][L0] Fix incorrect values in the Level Zero cached header (llvm#196587)
The current ZE_STRUCTURE_TYPE_DEVICE_IP_VERSION_EXT and
ZE_STRUCTURE_TYPE_RELAXED_ALLOCATION_LIMITS_EXP_DESC values are
incorrect as seen here:
* https://github.com/oneapi-src/level-zero/blob/0f246f6edf90d56604f00f83b41d783dc6a9394e/include/ze_api.h#L318
* https://github.com/oneapi-src/level-zero/blob/0f246f6edf90d56604f00f83b41d783dc6a9394e/include/ze_api.h#L324
clang: Consolidate -aux-triple handling (llvm#196551)
All of the offload languages were essentially doing the
same thing, with overcomplicated conditions conditional on
the language.
[flang][OpenACC] support collapse on unstructured acc.loop (llvm#196174)
PR llvm#164992 added unstructured-loop support to OpenACC lowering (no
bounds on acc.loop, IVs privatized, body emitted as explicit cf), but it
didn't cover the `collapse(N)` case. Compiling such a loop nest
asserted in MLIR's runRegionDCE: "Assertion `mightHaveTerminator()'
failed".
Root cause: visitLoopControl unconditionally marked every inner DO of a
collapsed nest via markDoConstructAsCollapsed. genFIR(DoConstruct) then
read that marking and skipped the inner DO's loop machinery on the
assumption that the parent acc.loop iterates and supplies the IV via a
block argument. That assumption holds for the structured case, but not
for the unstructured case added in llvm#164992. Skipping it left the
PFT-pre-allocated scaffold blocks (pre-header, header, exit) without
terminators.
Fix: add a markInnerCollapsed parameter (default true) to
visitLoopControl, and pass false from privatizeInductionVariables (the
unstructured case of buildACCLoopOp).
Assisted-by: AI
[flang][OpenMP] Fix component-level initializer in declare reduction (llvm#195751)
When a declare reduction initializer uses a component assignment such as
`initializer(omp_priv%member = 0)`, the lowering would store the scalar
RHS value (i32) directly to the whole derived-type reference, causing a
FIR verification error:
'fir.store' op store value type must match memory reference type.
The root cause is that `MakeEvaluateExpr` extracts only the RHS
expression from the `AssignmentStmt`, discarding the LHS component
information. The lowering callback then returns this scalar value which
gets stored to the wrong type.
Fix this by mirroring the approach already used for combiner
expressions: pass the parser-level `OmpStylizedInstance` to
`processInitializer` so the callback can access the typed assignment and
lower the full assignment (both LHS and RHS), correctly handling
component designators, function calls on the RHS, and user-defined
assignment.
Fixes llvm#184927 (with-initializer part; the without-initializer case
remains unsupported).
Assisted-by: Claude Opus 4.6.
Co-authored-by: Matt P. Dziubinski <matt-p.dziubinski@hpe.com>
[compiler-rt][profile][NFC] Introduce INSTR_PROF_INSTRUMENT_GPU_FUNC macro (llvm#196538)
Add a macro INSTR_PROF_INSTRUMENT_GPU_FUNC for the name of the GPU
profiling function __llvm_profile_instrument_gpu (added in llvm#187136),
following the same pattern as INSTR_PROF_VALUE_PROF_MEMOP_FUNC. Use the
macro in both the declaration in InstrProfiling.h and the definition in
InstrProfilingPlatformGPU.c.
This prepares the upcoming HIP/AMDGPU offload PGO patch (llvm#177665) to use
the same macro when calling this function.
[AMDGPU] Add subtarget features for MAD NC and 64-bit MIN/MAX instructions (llvm#196326)
[InstCombine][NFC] Replace buildAssumeFromKnowledge with CreateAlignmentAssumption (llvm#196254)
[DWARFLinker] Emit .debug_names entries for DW_TAG_template_alias (llvm#196440)
The tag was missing from the accelerator-records saver's switch, so
template alias DIEs were skipped and --verify-dwarf=output rejected the
result.
[lldb] Fix TestPtrauthBRKc47xX16Invalid.py (llvm#196408)
LLDB correctly detects the pointer authentication failure.
[lldb] Remove `__iter/len__` from `SBTypeEnumMember` (llvm#196610)
`SBTypeEnumMember` doesn't have a `GetSize` and
`GetTypeEnumMemberAtIndex`, so having `__iter__` and `__len__` doesn't
make sense. These are on `SBTypeEnumMemberList`. From the docstrings, it
looks like the extensions were copied from said type.
[CIR] Implement CoawaitExpr for ComplexType (llvm#194027)
Implement CoawaitExpr support for ComplexType
Issue llvm#192331
[cir] fix IR dump comments from llvm#195198 (llvm#196605)
[VPlan] Unify inner and outer loop paths (NFCI). (llvm#192868)
Combine the logic of tryToBuildVPlanWithVPRecipes and
tryToBuildVPlan, as well as planInVPlanNativePath and plan.
This unifies the code paths to construct plans for both inner and outer
loop vectorization, and removes some duplication. It also ensures we run
almost the same VPlan-transformations in both modes. Currently a few
code paths need to be guarded with a check if we are dealing with an
inner and outer loop.
PR: llvm#192868
[gn] port 2e2d90b (llvm#196618)
[gn build] Port 3fe311f (llvm#196619)
[gn build] Port c507e20 (llvm#196620)
[gn build] Port e6efa1a (llvm#196621)
[gn build] Port ebb9a79 (llvm#196622)
[gn] port 7e74c78 (llvm#196624)
[clang][deps] Use `ModuleDepCollector` for Make output (llvm#182063)
The dependency scanner works significantly differently depending on what
kind of output it's asked to produce. The Make output format has been
using the regular Clang dependency collection mechanism since it was
first implemented. This means the implementation works very differently
to the rest of the scanner and isn't able to turn implicit module
command lines into Makefiles using explicit modules.
This PR unifies the two implementations, using `ModuleDepCollector` even
for Make output. Emitting explicit module builds into Makefiles will
come in a later PR.
[libc++] Remove _LIBCPP_HIDE_FROM_ABI from <__utility/pair.h> (llvm#196508)
This is a follow-up to llvm#193045. This only drops
`_LIBCPP_HIDE_FROM_ABI` in a small part of the code base to make sure
everything works as expected. Once this has been in trunk for a while
and there aren't any problems, there will be larger follow-up patches to
remove `_LIBCPP_HIDE_FROM_ABI` throughout the code base.
[mlir][core] Restore dropped printIR behavior. (llvm#196628)
Restore checking for module scope which is dropped in llvm#195198
[VPlan] Fix cyclic phi type inference in early outer loop plans. (llvm#196634)
For phis check if any of the operands are VPIRValues or we already have
cached types. If so, return them.
This fixes a verification stack overflow in the VPlan outer loop path
after llvm#192868.
[DWARFLinker] Deduplicate .debug_frame CIEs across LinkContexts (llvm#195393)
Each LinkContext held its own EmittedCIEs map, so linking the same
object twice (or two objects with identical CIEs) produced one CIE per
LinkContext instead of one shared CIE. Hoist the registry to linker
scope and split emission into three phases so contexts can emit their
frames concurrently while still sharing one deduplicated CIE pool:
Scan (parallel, during link). scanFrameData() records the unique CIEs
referenced by retained FDEs, in first-reference order, into
FrameScanResult::CIEs. scanAndUnloadInput() chains the scan in front of
the existing input-unload so the DWARFContext can be released before the
post-link emit pass.
Merge (serial, after link completes). registerCIEs() walks each
context's scanned CIEs in ObjectContexts order and try_emplaces them
into the linker-wide CIERegistry. The first LinkContext to reference a
CIE becomes its owner and reserves a local offset in its own
.debug_frame section; later contexts only learn the owner's section and
offset.
Emit (parallel). emitDebugFrame() writes each context's owned CIEs
followed by its FDEs into its own SectionDescriptor. FDE CIE_pointers
are recorded as DebugOffsetPatches against the owner's section; the
existing patch resolver rebinds them to OwnerStartOffset + LocalOffset
when global offsets are assigned. Each task writes only to its own
section, so no locking is needed.
Output is fully deterministic: ownership assignment, per-context CIE
order, FDE order within a section, and section concatenation order all
depend only on the input, not on thread scheduling. A context's CIEs may
now appear after FDEs (from other contexts) that reference them — DWARF
allows this, and cross-context FDE -> CIE pointers resolve correctly via
the patch mechanism.
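The merge phase described above (first referencing context becomes the owner, later contexts only learn the owner's location) can be sketched with a standard map. All names here are hypothetical stand-ins: `CIEOwner`, `CIERegistryModel`, and `registerCIEsModel` are not the DWARFLinker types, CIEs are modeled as byte strings, and local offsets are modeled as running byte counts within each context.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical record of which context owns a CIE and where it lives.
struct CIEOwner {
  std::size_t ContextIndex; // context whose section holds the CIE
  uint64_t LocalOffset;     // offset inside that context's section
};

using CIERegistryModel = std::map<std::string, CIEOwner>;

// Serial merge: walk the contexts in input order and try_emplace each
// scanned CIE. Only the first context to reference a CIE reserves space.
CIERegistryModel
registerCIEsModel(const std::vector<std::vector<std::string>> &ScannedCIEs) {
  CIERegistryModel Registry;
  for (std::size_t Ctx = 0; Ctx < ScannedCIEs.size(); ++Ctx) {
    uint64_t NextLocalOffset = 0;
    for (const std::string &Bytes : ScannedCIEs[Ctx]) {
      const bool Inserted =
          Registry.try_emplace(Bytes, CIEOwner{Ctx, NextLocalOffset}).second;
      if (Inserted)
        NextLocalOffset += Bytes.size(); // owner reserves room for the CIE
    }
  }
  return Registry;
}
```

Because the walk is serial and follows input order, ownership depends only on the input, which is the property the message relies on for deterministic output; the parallel scan and emit phases are not modeled here.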
AMDGPU/GlobalISel: RegBankLegalize rules for cluster_load_b32/b64/b128 (llvm#196186)
AMDGPU/GlobalISel: RegBankLegalize rules for cvt fp8 e5m3 intrinsics (llvm#196369)
[libc] Skip targets with unavailable __ONLY flags (llvm#196637)
When SKIP_FLAG_EXPANSION strips a flag that has the __ONLY modifier,
remove_duplicated_flags drops the flag from the list. This leaves
expand_flags_for_target with an empty flag list, causing it to create a
plain (non-flag) target. The __ONLY semantics, "only build this target
with the flag active", are silently violated.
On x86-64 CI runners without FMA, this results in cosf_float_test and
sinf_float_test being built and linked without FMA. The sincosf
algorithm was tuned assuming fused multiply-add precision, so the
unfused x*y+z fallback exceeds the 3.5 ULP tolerance (57 ULP for cosf,
12 ULP for sinf).
Added an early return in add_target_with_flags: if any flag with the
__ONLY modifier would be skipped, the target is not generated.
Assisted-by: Automated tooling, human reviewed.
[DWARFLinker] Don't duplicate classes with in-class static decls (llvm#196442)
An in-class static declaration was forced to PlainDwarf placement and
cascaded that up to its enclosing class. If the class was already in the
type table via the out-of-line definition's specification, it ended up
with Both placement and cloneDIE emitted two copies. Keep in-class
static declarations in the type table so they stay with their enclosing
type.
[libc] Disable -march=native in CI to fix sccache poisoning (llvm#196560)
-march=native is incompatible with shared build caches because sccache
treats it as a literal string. Object files compiled on one CPU model
get silently served to runners with a different CPU, causing SIGILL
crashes in the opt_host memory tests.
Made LIBC_COMPILE_OPTIONS_NATIVE a CMake cache variable so CI can
override it. Both overlay and fullbuild workflows now pass
-DLIBC_COMPILE_OPTIONS_NATIVE="" to disable -march=native. Local
developer builds are unaffected and still default to -march=native.
Reverted the per-CPU cache key approach from llvm#196477 in favour of this
fix, which addresses the root cause.
Bumped sccache key versions (v2) in both workflows to invalidate the
poisoned caches.
Assisted-by: Automated tooling, human reviewed.
[lldb] Add lldb.summary and lldb.synthetic decorators (llvm#195351)
Adds two new decorators, `@lldb.summary` and `@lldb.synthetic`,
analogous to the existing `@lldb.command` decorator.
These decorators result in `type summary add` and `type synthetic add`
commands being run.
An additional motivation: these decorators will make it straightforward
to invoke the Python-to-LLDB formatter bytecode compiler
(`formatter_bytecode.Compiler`), which currently requires command-line
flags to know how to register formatters. With these decorators, the
registration metadata is associated directly with the implementing
function or class.
See the docstrings and formatters.py test fixture for usage examples.
Assisted-by: claude
[X86] combine-add.ll - regenerate to show missing add asm comments (llvm#196647)
[lld][WebAssembly] Remove the experimental warning for PIC/dynamic linking (llvm#196566)
The current dynamic linking support has been used for several years now
both in emscripten and in wasi-sdk and is documented at
https://github.com/WebAssembly/tool-conventions/blob/main/DynamicLinking.md.
We did/do have plans to develop another version of the dynamic
linking ABI that doesn't use a global symbol namespace, and that can
still happen, but the current API is clearly production worthy
regardless of future plans.
This change removes the linker warning and the corresponding
`--experimental-pic` flag.
If we do want to still make breaking changes to the dylink format we can
rename the `dylink.1` section (which already contains a version number).
This change leads the way for enabling shared libraries by default in
emscripten.
[flang][cuda] Widen stream argument to i64 in stream intrinsic lowering (llvm#196650)
`genCUDASetDefaultStream` and `genCUDAStreamDestroy` build their runtime
call with an `i64` stream parameter but pass the actual argument
straight through, so a smaller-kind actual (e.g. the literal `0` in
`cudaforSetDefaultStream(0)`) produces an ill-typed `fir.call`:
Insert a `fir.convert` to `i64` before the call, matching what
`genCUDASetDefaultStreamArray` already does.
[mlir][AMDGPU] Add, unify verification of memref index counts (llvm#196657)
This PR verifies that, on operations that have
`%memref[%idx0, %idx1, ...]` arguments, the number of indices matches
the rank of the memref being passed in.
While we're here, fixes capitalization for certain verification error
messages.
Assisted-by: Codex 5.5 (handled much of the implementation)
[lldb] Handle SIGINT via the MainLoop signal thread (on POSIX) (llvm#195959)
The driver's async SIGINT handler called
SBDebugger::DispatchInputInterrupt directly. That is not
async-signal-safe and can lead to a crash.
Register SIGINT with the existing signal-thread MainLoop instead so
DispatchInputInterrupt runs in normal thread context. The Windows path
is unchanged and keeps the legacy async handler.
While DispatchInputInterrupt runs, the callback temporarily installs
SIG_DFL so a second Ctrl-C still hard-terminates the process, preserving
the escape hatch users rely on when the debugger is unresponsive.
Moving SIGINT off the main thread means a Ctrl-C no longer interrupts
blocking syscalls there (e.g. a Python REPL waiting on input or
sleeping), so Python never observes the queued interrupt and
KeyboardInterrupt is not raised. To restore that behavior, after
dispatching the interrupt the callback re-raises SIGINT on the main
thread via pthread_kill; the resulting EINTR lets Python pick up the
pending interrupt. A skip flag suppresses the re-entry that this
self-send produces. Because the callback only ever runs on the signal
thread, the flag and the captured main-thread id live in the lambda's
captures and need no synchronization.
rdar://158218595
[BOLT][NFCI] Consolidate DataReader::setEntryCounts (llvm#196411)
FuncBranchData/BinaryFunction exec/external entry counts are set
in multiple places in `DataReader`: `parse` and `appendFrom`,
`preprocessProfile` and `matchProfileData`.
Consolidate to `setEntryCounts` called from `readProfile`.
Drop explicit counters, compute them from `FBD::EntryData`.
Test Plan: NFCI
[DirectX] Not print invalid root signature definitions. (llvm#196444)
This patch adds a check to the root signature printing pass that makes
sure we have a valid root signature before printing starts. This is
required after llvm#194858 changed
reportError to not stop after emitting the first error.
Fix: llvm#196430
[clang][deps] Move `ScanningOutputFormat` out of the library (llvm#196631)
Basing behavior of the dependency scanner on the final output format is
a leaky abstraction. Instead, we should aim to introduce proper feature
flags.
[RISCV] Use the nhs.lea.h/w/d instead of nhs.lea.h/w/d.ze with Sh1AddPat. (llvm#196660)
The srliw already took care of zeroing the upper bits. Using the non-.ze
form is consistent with the Zba version of this pattern.
Revert "[BOLT] Fix EH data encoding checks in relocateEHFrameSection (llvm#195691)" (llvm#196672)
This reverts commit 7ab26d7.
There is a test failure in bolt-tests::exceptions-split-strip.test.
[mlir][tensor] Enhance pattern to fold extract_slice(insert_slice) (llvm#195045)
Extend the DropRedundantRankExpansionOnExtractSliceOfInsertSlice pattern
to support cases where the expanded dimensions are a subset of the
dropped dimensions, rather than requiring them to be exactly equal.
For example:
can be folded into:
[CodeGen] Use unique_ptr for FunctionInfo to prevent memory leaks (llvm#196603)
Raw pointer return from `FunctionInfo::create` caused leaks in callers
like `computeABIInfoUsingLib`, breaking BPF tests on ASan bots.
Using `std::unique_ptr` enforces automatic cleanup.
Fixes leak from llvm#194460.
Buildbot: https://lab.llvm.org/buildbot/#/builders/52/builds/17090
Assisted-by: Gemini
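The ownership change can be sketched in isolation. This is a hypothetical stand-in, not the real `FunctionInfo` class: `FunctionInfoSketch`, `liveCount`, and `countArgsOrZero` are invented names, and the instance counter exists only to make a leak observable.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for a heap-allocated record. It counts live
// instances so that a leaked object would be visible to a test.
struct FunctionInfoSketch {
  static int &liveCount() {
    static int Count = 0;
    return Count;
  }

  int NumArgs;
  explicit FunctionInfoSketch(int N) : NumArgs(N) { ++liveCount(); }
  ~FunctionInfoSketch() { --liveCount(); }

  // Returning unique_ptr makes ownership explicit: the caller cannot
  // forget to delete the object.
  static std::unique_ptr<FunctionInfoSketch> create(int NumArgs) {
    assert(NumArgs >= 0 && "argument count must be non-negative");
    return std::make_unique<FunctionInfoSketch>(NumArgs);
  }
};

// A caller with an early return: the object is destroyed on every path.
inline int countArgsOrZero(int NumArgs, bool BailOut) {
  auto FI = FunctionInfoSketch::create(NumArgs);
  if (BailOut)
    return 0; // FI is released here automatically
  return FI->NumArgs;
}
```

With a raw pointer return, the early-return path above would need its own `delete`, which is the kind of omission ASan reports as a leak.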
[CIR][RISCV] Support zksh builtin codegen (llvm#196463)
[lldb] Fix CommandObjects that don't set a return status (llvm#196588)
Several CommandObject subclasses had DoExecute paths that returned
without ever calling SetStatus on the CommandReturnObject. The status
was silently left at its initial eReturnStatusStarted value, which made
Succeeded() report false for what were really successful commands and
left CommandReturnObject in an undefined state.
[AMDGPU] Support atomic load and store for vector float types (v2f16, v2i16, v4i16, v4f16, v2f32) (llvm#192904)
Add support for atomic load and store on <2 x half>, <4 x half>, and
<2 x float> vector types in the AMDGPU backend.
These types are promoted to equivalently sized integer types before
instruction selection:
<2 x half> -> i32
<4 x half> -> i64
<2 x i16> -> i32
<4 x i16> -> i64
<2 x float> -> i64
Revert "[lldb] Handle SIGINT via the MainLoop signal thread (on POSIX)" (llvm#196684)
Reverts llvm#195959 because it caused `TestIOHandlerCompletion.py` to
fail in CI (GreenDragon).
[Clang] Do not eat SFINAE diagnostics for explicit template arguments (llvm#139066)
Instead of merely suggesting the template arguments are invalid, we now
provide an explanation of why the explicit template argument is invalid.
[Utils] Fix duplicate DomTree updates in SplitIndirectBrCriticalEdges (llvm#196475)
SplitIndirectBrCriticalEdges generates DomTree Insert/Delete pairs for
each predecessor in OtherPreds. However, OtherPreds can contain
duplicate entries when a conditional branch has both targets pointing to
the same block (e.g., `br i1 %c, label %X, label %X`). This produces
duplicate DomTree updates for the same edge, triggering the assertion
`std::abs(NumInsertions) <= 1 && "Unbalanced operations!"` in
LegalizeUpdates.
Fix by tracking which source blocks have already had DomTree updates
emitted, and skipping duplicates.
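The deduplication idea can be sketched outside LLVM. This is a simplified model under stated assumptions: blocks are plain integers, and `EdgeUpdate` and `buildUpdates` are hypothetical names rather than the DomTreeUpdater API.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical simplification: blocks are ints and an update records
// whether the edge From->To is inserted or deleted.
struct EdgeUpdate {
  bool IsInsert;
  int From;
  int To;
};

// For each predecessor of Target, redirect the edge Pred->Target to
// Pred->NewBlock. Preds may list the same block twice (a conditional
// branch with both successors equal), so remember which sources were
// already handled and skip repeats.
inline std::vector<EdgeUpdate>
buildUpdates(const std::vector<int> &Preds, int Target, int NewBlock) {
  assert(Target != NewBlock && "redirecting an edge to itself");
  std::vector<EdgeUpdate> Updates;
  std::set<int> Seen;
  for (int Pred : Preds) {
    if (!Seen.insert(Pred).second)
      continue; // duplicate predecessor entry: updates already emitted
    Updates.push_back({true, Pred, NewBlock});
    Updates.push_back({false, Pred, Target});
  }
  return Updates;
}
```

Each source block then contributes exactly one Insert/Delete pair, however many times it appears in the predecessor list.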
[CIR][CUDA][NVPTX] Set ptx_kernel calling convention on CUDA kernels (llvm#195382)
Related: llvm#179278,
llvm#175871
More target attributes (NoInline on kernels, CUDALaunchBoundsAttr,
CUDAGridConstantAttr param attrs, nvvm.annotations for surface/texture
VarDecls) are deferred to later patches.
[DAGTypeLegalizer] Add missing BR_CC handler for soft-promoted half operands (llvm#196214)
`SoftPromoteHalfOperand` had no case for `ISD::BR_CC`, causing a crash
when a half-typed `fcmp` result fed directly into a conditional branch.
All other comparison-related nodes (`SETCC`, `SELECT_CC`) were already
handled. Add `SoftPromoteHalfOp_BR_CC` following the same pattern as
`SoftPromoteHalfOp_SELECT_CC`.
Fixes llvm#195562
Co-authored-by: Tony Varghese tony.varghese@ibm.com
[RISCV][GISel] Add test coverage for the srliw+shXadd patterns. NFC (llvm#196676)
GISel isn't canonicalizing the shift pair to an AND the same way
SelectionDAG does so the patterns weren't firing. Add more directed
tests that use an And explicitly.
[clang][AMDGPU] Reject malformed target IDs with empty components (llvm#196140)
Fixes llvm#196078
An extra colon in `-mcpu` (e.g. `gfx900::xnack+`) produced an empty
feature component and triggered an assertion in `StringRef::back()`.
Return `std::nullopt` for malformed target IDs instead.
[AArch64][GlobalISel] Enable BF16 legalization for fadd and friends. (llvm#196081)
This enabled bf16 promotion for the following operations in GISel,
promoting them to f32 and truncating the result back:
G_FADD, G_FSUB, G_FMUL, G_FDIV, G_FMA, G_FSQRT, G_FMAXNUM, G_FMINNUM,
G_FMAXIMUM, G_FMINIMUM, G_FCEIL, G_FFLOOR, G_FRINT, G_FNEARBYINT,
G_INTRINSIC_TRUNC, G_INTRINSIC_ROUND, G_INTRINSIC_ROUNDEVEN
[AArch64][NFC] Remove unused TRI member from class (llvm#184363)
I’ve removed the TRI member and its initialization, leaving only MRI and
TII as the stored pointers.
Co-authored-by: Benjamin Maxwell benjamin.maxwell@arm.com
[ObjectYAML][NFC] Extract BBAddrMap YAML types into shared namespace (llvm#196019)
Move BBAddrMapEntry and PGOAnalysisMapEntry out of namespace ELFYAML
into a new format-agnostic namespace BBAddrMapYAML so that COFF
YAML support can reuse the same schema and MappingTraits.
[clang] Update `cxx_dr_status.html` (llvm#196702)
Updates from 2026-05-08 CWG telecon.
[clang-tidy] Avoid `use-nodiscard` false positives for class templates (llvm#196661)
Do not suggest adding `[[nodiscard]]` to functions returning a class
template specialization whose primary template is already marked
`[[nodiscard]]`.
Class template specializations do not carry the `[[nodiscard]]`
attribute on their own declarations, so `modernize-use-nodiscard`
previously missed this case and emitted redundant diagnostics for return
types such as:
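A hypothetical illustration of such a return type follows. The names `Result`, `Parser`, and `checked` are invented here and do not come from the patch or its tests.

```cpp
#include <cassert>

// Hypothetical primary template marked [[nodiscard]].
template <typename T> struct [[nodiscard]] Result {
  T Value;
  bool Ok;
};

class Parser {
  int Limit;

public:
  explicit Parser(int Max) : Limit(Max) {
    assert(Max >= 0 && "limit must be non-negative");
  }

  // The return type Result<int> is a specialization of a [[nodiscard]]
  // template, so discarding the call result already warns. An extra
  // [[nodiscard]] on this member function would be redundant.
  Result<int> checked(int Input) const {
    if (Input <= Limit)
      return {Input, true};
    return {Limit, false};
  }
};
```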
Fixes llvm#163425.
[CI] Ignore TidyFastChecks.inc for formatter CI. NFC. (llvm#196682)
`TidyFastChecks.inc` is generated and its contents should not be checked
by clang-format CI workflow. Add a local `.clang-format-ignore` entry so
the PR formatting check does not report diffs for this file.
Related run:
llvm#194516 (comment)
[clang-tidy] Migrate explicit-constructor check from google to misc and add relative aliases (llvm#194807)
Fixes llvm#126032
[AArch64][GlobalISel] Promote BF16 G_FCMP (llvm#196093)
This adds bf16 legalization for floating point compares.
[RISCV][NFC] Rename `Zvvmm` instruction file to `Zvvm` (llvm#196692)
Renames `RISCVInstrInfoZvvmm.td` to `RISCVInstrInfoZvvm.td` so `Zvvmm`
and `Zvvfmm` share the same IME instruction file according to the spec.
All future instructions from the `Zvvm` family will be placed here
too.
This PR is required for reviewing llvm#196486 in order to make GitHub show
the diff correctly.
[BPF] Support Stack Arguments (llvm#189060)
Currently, bpf programs and kfuncs only support 5 register parameters. As
the bpf community and its use cases keep expanding, there is some need to
extend the 5 register parameters by allocating additional parameters on
the stack. There are two main use cases here:
situation, people may want to have more than 5 parameters. One of
example is for sched_ext.
they do not need to carefully limit the number of parameters for their
programs.
The following is the high-level design:
to avoid mixing stacks due to R10.
supported.
macro __BPF_FEATURE_STACK_ARGUMENT is defined and users can check
whether stack argument is supported or not.
The below is a simple asm code example about stack parameters:
The code patterns above try to follow x86_64/arm64 calling
conventions. That is, the first argument is in a lower location than
the second argument, etc. The r11-based load should retrieve the value
directly from the caller stack. The r11-based store should push
the value directly to the specified stack location.
Internally in the bpf backend, pseudo insns are generated for
load_stack_arg and store_stack_arg. The BPFMIPeephole pass
changes the pseudo insns into proper real bpf insns like the above.
[VectorCombine] foldShuffleChainsToReduce - add support for partial vector reductions (llvm#195119)
Extend foldShuffleChainsToReduce to recognize partial reduction patterns where only a subvector of the full vector is being reduced.
For example, a <16 x i16> vector where the shuffle chain only reduces the lower 8 elements can now be folded into:
shufflevector (extract lower <8 x i16>) + vector.reduce.smax
The detection works by noticing when the bottom-up walk through the
shuffle/op chain ends before consuming the full vector. The number of
levels visited determines the subvector size (2^levels), and an
extract_subvector + scalar reduction replaces the original chain when
profitable.
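The relation between levels and subvector size can be checked with a small scalar model. This is a sketch with plain `std::vector<int>` lanes and signed max as the operation; `shuffleChainMax` and `extractAndReduceMax` are hypothetical names, not VectorCombine code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Emulates a shuffle/smax chain: each level folds the upper half of
// the active prefix onto the lower half. After Levels levels, lane 0
// holds the max of the first 2^Levels lanes.
inline int shuffleChainMax(std::vector<int> V, unsigned Levels) {
  const std::size_t SubSize = std::size_t(1) << Levels;
  assert(SubSize <= V.size() && "chain cannot cover more lanes than exist");
  for (std::size_t Half = SubSize / 2; Half >= 1; Half /= 2)
    for (std::size_t I = 0; I < Half; ++I)
      V[I] = std::max(V[I], V[I + Half]);
  return V[0];
}

// The folded form: take the low 2^Levels lanes, then reduce them.
inline int extractAndReduceMax(const std::vector<int> &V, unsigned Levels) {
  const std::size_t SubSize = std::size_t(1) << Levels;
  assert(SubSize <= V.size() && "subvector larger than the vector");
  return *std::max_element(V.begin(),
                           V.begin() + static_cast<std::ptrdiff_t>(SubSize));
}
```

For a 16-lane input, three levels cover only the low 8 lanes (the partial case), while four levels cover the full vector.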
Fixes llvm#194617
[clang-tidy] Correct `std::has_one_bit` to `std::has_single_bit` in `modernize-use-std-bit` (llvm#196721)
There is no `std::has_one_bit` in the standard library; the function
that checks whether a number is an integral power of 2 is
`std::has_single_bit`.
https://en.cppreference.com/cpp/header/bit
[SelectionDAG] Don't convert sextload to zextload through a multi-use freeze (llvm#196700)
Resolves llvm#196590.
The patch llvm#189317 to teach
DAGCombiner to look through freeze incorrectly introduced a miscompile of
sext -> zext. This resolves the miscompile.
[libc++] LWG4324: `unique_ptr<void>::operator*` is not SFINAE-friendly (llvm#190919)
Co-authored-by: Hristo Hristov zingam@outlook.com
[clang-format] Add BreakFunctionDeclarationParameters option. (llvm#196567)
Adds an option to break function declaration parameters, always putting
them on the next line after the function's opening parenthesis.
This is an equivalent of `BreakFunctionDefinitionParameters`, but for
function declarations.
Co-authored-by: Lukas Jirkovsky lukas.jirkovsky@aveco.com
[mlir][SPIR-V] Convert math.fpowi to spirv.CL.pown (llvm#196701)
[VPlan] Lift isUsedByLoadStoreAddr into vputils, operate on VPValue(NFC) (llvm#196415)
Extract the helper previously scoped to VPReplicateRecipe::computeCost
and make it available from VPlanUtils so other transforms can query
whether a VPValue is used as part of another load or store's address.
Also relax the input type from VPUser * to VPValue *: the worklist now
tracks VPValues directly, and traversal is gated on the user being a
VPSingleDefRecipe before walking its own users. This is NFC for the
existing caller.
clang: Fix using -march=amdgcn in some r600 run lines (llvm#196745)
clang/AMDGPU: Use all_equal instead of building a temporary set (llvm#196742)
[mlir][SPIR-V] Support spirv.selection_control attribute on scf.if (llvm#196510)
[SLP][NFC]Add a test with scalable vector type in struct-returning intrinsic, NFC
Reviewers:
Pull Request: llvm#196747
[SLP][NFC]Add a test with struct-returning intrinsics in different basic blocks, NFC
Reviewers:
Pull Request: llvm#196748
[X86] Hoist ReservedIdentifiers to MCAsmInfo and shrink setup cost. NFC (llvm#196699)
PR llvm#186570 added a per-MCAsmInfo `StringSet<>` populated with X86
register names plus Intel-syntax keywords, which caused a minor
instructions:u increase.
Avoid heap allocation and hoist `ReservedIdentifiers` to MCAsmInfo for
other targets.
For the register-name source, prefer
`X86IntelInstPrinter::getRegisterName` over `MCRegisterInfo::getName`.
The former is a TableGen-emitted accessor into a
`static const char AsmStrs[]` pool in `X86GenAsmWriter1.inc`, populated
from the lowercase asm-name argument of each
`def XX : X86Reg<"xx", ...>;` in `X86RegisterInfo.td`.
[MCParser] .incbin: Don't retain the buffer, don't require NUL termination (llvm#196696)
processIncbinFile uses SourceMgr::AddIncludeFile, which passes
`RequiresNullTerminator=true` (disabling `mmap` when the file size is a
multiple of the page size) and retains the buffer in `Buffers`.
Switch to OpenIncludeFile so the buffer is freed when processIncbinFile
returns, and pass RequiresNullTerminator=false. The buffer is consumed
only by emitBytes; the lexer never scans it, so it does not need a
trailing '\0' (different from llvm#154972). Without that requirement,
MemoryBuffer mmaps the file and RSS tracks only the touched pages.
Stress test (1000 .incbin "blob.bin", 0, 16 against a 1 MiB blob):
Fix llvm#62339
Revert "Avoid assert in substqualifier (llvm#182707)" (llvm#196755)
This reverts commit e2def10.
[DAG] canCreateUndefOrPoison - out of range vector insert/extract element indices only generate poison (llvm#196720)
Matches ValueTracking / GISel implementations - although testing options are limited until DAG has actual uses of UndefPoisonKind::UndefOnly
[clang][NFC] Actually add the testcase for llvm#195416 (llvm#196759)
[Docs] Match body/toctree ordering on Reference and UserGuides (llvm#195542)
The `toctree` section is hidden but used for previous/next breadcrumbs.
This was suggested in
llvm#184440 (comment)
[clangd] Add InsertReplaceEdit for code completion (llvm#187623)
Handle new insertReplaceSupport capability (defined in LSP 3.16). Add
the new option to the protocol layer and pass it around to the code
completion logic. Update CompletionItem::textEdit to become the union
type as per the LSP specification.
Add a new helper function to the Lexer public API to find the end of an
identifier with full context lexing, to avoid duplicating the logic. Use
the helper both in the Sema flow and in the comment completion flow. Use
a simpler ASCII-only scan in no-Sema mode.
Add LIT tests to verify auto-triggered completions, mid-word
replacement, Unicode, and snippets. Add unit tests to verify
insert/replace ranges with and without Sema, including comments and the
feature-off case.
Update the release notes to document the new capability.
Fixes clangd/clangd#2190
Co-authored-by: timon-ul timon.ulrich@advantest.com
Revert "[clang-format][NFC] Format with the new formatter" (llvm#196771)
Reverts llvm#196523
[ELF] Fix --reproduce non-determinism with parallel input loading (llvm#196773)
After llvm#191690, LoadJob::Archive runs in parallel and getArchiveMembers()
calls ctx.tar->append() from the parallel body. TarWriter::append is
unsynchronized. Member order in the tar is also non-deterministic
because parallelFor scheduling determines append order.
Buffer per-job tar entries during the parallel pass and flush them in
the existing serial post-pass, mirroring the thinBufs / files pattern.
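The buffering pattern can be sketched without threads by replaying jobs in an arbitrary completion order. This is a hypothetical model, not lld code: `collectTarEntries` and its parameters are invented, and the "parallel" pass is only simulated.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model: each load job yields the archive member names it
// wants recorded. Jobs may finish in any order, but each one writes
// only to its own slot, so the final order depends on the job index
// alone.
inline std::vector<std::string>
collectTarEntries(const std::vector<std::vector<std::string>> &Jobs,
                  const std::vector<std::size_t> &CompletionOrder) {
  std::vector<std::vector<std::string>> PerJob(Jobs.size());
  // "Parallel" pass, simulated by visiting jobs in completion order.
  for (std::size_t JobIdx : CompletionOrder) {
    assert(JobIdx < Jobs.size() && "unknown job index");
    PerJob[JobIdx] = Jobs[JobIdx];
  }
  // Serial post-pass: flush the buffers in job order.
  std::vector<std::string> Tar;
  for (const auto &Entries : PerJob)
    Tar.insert(Tar.end(), Entries.begin(), Entries.end());
  return Tar;
}
```

Because no job appends to shared state during the first pass, the sketch needs no locking and the output is the same for every completion order.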
[Clang] Make matrix type trivially copyable (llvm#193634)
In order to simplify matrix casting and follow the existing pattern HLSL
is doing, the matrix needs to be trivially copyable.
related to: llvm#184471
Co-authored-by: Joao Saffran jderezende@microsoft.com
[ADT] Decouple xxhash.h from ADT. NFC (llvm#196774)
Move xxHash64, xxh3_64bits, and xxh3_128bits ArrayRef/StringRef
overloads from llvm/Support/xxhash.h to inline overloads in
llvm/ADT/ArrayRef.h and llvm/ADT/StringRef.h, so xxhash.h has no ADT
dependencies.
This is a prerequisite for using xxh3 as the combine_bytes backend in
llvm/ADT/Hashing.h (llvm#194567), which would otherwise reintroduce a header
dependency cycle.
FoldingSet.h and StableHashing.h adjust to call the new
pointer-and-length entry point.
[Hashing] Replace CityHash mixers with xxh3 (llvm#194567)
Replace the CityHash-style mixer in hash_combine and (transitively)
hash_value(std::basic_string), hash_value(StringRef), and therefore
DenseMap<StringRef, X> lookups, with a flatten-and-call into
xxh3_64bits, a modern hash superior to CityHash.
hash_value(int) / hash_value(ptr) keep the existing Murmur-style
hash_16_bytes mixer; those are the dominant DenseMap key paths and a
fully-inline 16-byte mix beats inlining xxh3's larger 0..16-byte short
path.
To break the dependency cycle: xxHash64, xxh3_64bits, and xxh3_128bits
ArrayRef/StringRef overloads move from llvm/Support/xxhash.h to inline
overloads in llvm/ADT/ArrayRef.h and llvm/ADT/StringRef.h, so xxhash.h
has no ADT dependencies.
A variant that inlined xxh3's 0..16-byte fast path at every
combine_bytes call site (vs. always calling out-of-line xxh3_64bits)
showed no measurable compile-time improvement on the tracker, so
combine_bytes is a one-liner over the out-of-line entry point.
llvm-compile-time-tracker.com (CTMark, instructions:u)
DenseMap-of-pointer paths (dominant at -O3) are untouched, so higher-
optimization configs see smaller wins as expected. opt's .text shrinks
~92 KB. Subsumes the StringRef-only carve-out proposed in llvm#191115.
Notes on properties not introduced by this patch:
Endianness: hash_combine over native integers was already not cross-host
stable. memcpy of a native integer into the buffer is host-encoded;
fetch32 normalized the read but not the underlying bytes, so on LE vs
BE the value fed to the mixer already differed. xxh3 inherits the same
property: same byte stream, different mixer.
Process seed: combine_bytes XORs get_execution_seed into the result,
which cancels under hash_combine(x) ^ hash_combine(y). The pre-patch
short/state paths fed the seed through hash_16_bytes / shift_mix
non-linearly, so this is a regression in seed effectiveness under that
pattern. Default seed is constant, so this only matters under
LLVM_ENABLE_ABI_BREAKING_CHECKS. Follow-up: add a seeded xxh3 entry
point in libSupport.
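The seed cancellation noted above follows from XOR being its own inverse. A minimal sketch, assuming a placeholder mixer (`mix` is a splitmix64-style finalizer standing in for xxh3, and `hashXorSeed` / `hashMixSeed` are hypothetical names):

```cpp
#include <cassert>
#include <cstdint>

// Placeholder mixer standing in for the real hash function.
inline std::uint64_t mix(std::uint64_t X) {
  X ^= X >> 30;
  X *= 0xbf58476d1ce4e5b9ULL;
  X ^= X >> 27;
  X *= 0x94d049bb133111ebULL;
  X ^= X >> 31;
  return X;
}

// Seed applied linearly: XORed into the finished hash.
inline std::uint64_t hashXorSeed(std::uint64_t X, std::uint64_t Seed) {
  return mix(X) ^ Seed;
}

// Seed applied non-linearly: fed through the mixer with the input.
inline std::uint64_t hashMixSeed(std::uint64_t X, std::uint64_t Seed) {
  return mix(X ^ mix(Seed));
}

// XOR of two linearly seeded hashes: (a ^ s) ^ (b ^ s) == a ^ b, so
// the seed drops out entirely.
inline std::uint64_t xorOfPair(std::uint64_t X, std::uint64_t Y,
                               std::uint64_t Seed) {
  const std::uint64_t Combined = hashXorSeed(X, Seed) ^ hashXorSeed(Y, Seed);
  assert(Combined == (mix(X) ^ mix(Y)) && "seed should cancel");
  return Combined;
}
```

With `hashMixSeed` the seed passes through the mixer, so the same XOR of two hashes generally still depends on it.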
Aided by Claude opus 4.7
[MC] Remove deprecated lookupTarget overload (llvm#196778)
This has been deprecated for a while and was slated for removal after
the branching of LLVM 22. Remove it since I'm on the Google integrate
rotation this week and can take care of any failures on our end.
[libc] Add barebones dl_iterate_phdr implementation (llvm#194196)
Add a basic dl_iterate_phdr implementation so that we can get libunwind
building. This implementation is bare and not fully compliant with the
man page for fully static binaries (which are all that we support
currently with the lack of a dynamic linker) due to the lack of TLS
info, but that can be added at a future date if it is needed, as it is
not needed by libunwind.
Add some very basic smoke tests.
[ADT] Remove xxHash64 ArrayRef/StringRef overloads. NFC (llvm#196781)
xxHash64 is a legacy, pre-XXH3 hash whose only non-test caller in the
monorepo is llvm::getKCFITypeID. llvm#196774 accidentally exposed the API.
[Clang] Transform lambda's constraints when instantiating parameter mapping (llvm#195995)
This way we can remove a few workarounds of lambda expressions where
outer template arguments of concepts have to be preserved through
ImplicitConceptSpecializationDecls.
Fixes llvm#193944
[cmake] use target names instead of legacy variables (llvm#185463)
Use the name of the imported targets when testing the libraries during
cmake configuration. This removes the need to also set
`CMAKE_REQUIRED_INCLUDES` and `CMAKE_REQUIRED_DEFINITIONS` and reflects
more modern CMake usage where targets are preferred over variables.
This is already the case when checking libcurl in the same file.
[clang-tidy] Remove hicpp module [1/4] (llvm#194516)
This is part one of removing the `hicpp-*` checks.
RFC:
https://discourse.llvm.org/t/rfc-regarding-the-current-status-of-hicpp-checks/89883
Part of llvm#183462
[DAG][GISel] Rename CTTZ_ZERO_UNDEF/CTLZ_ZERO_UNDEF/CTTZ_ELTS_ZERO_UNDEF -> CTTZ_ZERO_POISON/CTLZ_ZERO_POISON/CTTZ_ELTS_ZERO_POISON (llvm#196732)
DAG/GISel are ambiguous about whether zero-input results in
UNDEF/POISON, unlike the rest of LLVM which makes it clear it's POISON.
I've tried to clean this up once and for all by ensuring
SelectionDAG::canCreateUndefOrPoison does an includesPoison(Kind) check,
renaming the opcodes (including the VP variants) and updating as many
comments/tests as possible (I may still have missed some...).
[SPIR-V] Fix inttoptr type deduction with ptr.annotation (llvm#189219)
Opaque pointer inttoptr was recording ptr as a pointee type, so
OpConvertUToPtr was emitted as pointer-to-pointer and then bitcasted
back. Please see an example below.
LLVM IR:
SPIR-V (before the change):
Skip assigning a pointee type for inttoptr when the destination is
untyped; the fallback later recovers the correct single pointer type.
[clang-tidy] Fix FP in readability-container-size-empty with compairing to unrelated type (llvm#190535)
Fixes llvm#162287.
[lldb][Windows] Invalidate cached register values on thread stop (llvm#192430)
Invalidate cached values in register context data structures on every
thread stop.
NativeRegisterContextRegisterInfo::InvalidateAllRegisters performs no
operation by default. Subclasses may override it to clear cached values
within their register context data structures whenever a thread stops.
This change intends to set up the necessary infrastructure to support
caching of the thread context in NativeRegisterContextWindows_arm64,
which will improve read performance. Currently, the thread context is
retrieved for every read or write operation.
Revert "[VectorCombine] foldShuffleChainsToReduce - add support for partial vector reductions" (llvm#196796)
Reverts llvm#195119 while reported assertions are investigated.
[LifetimeSafety] Warn on incorrectly placed `[[clang::lifetimebound]]` attributes (llvm#196144)
Adds a new warning that is emitted when a parameter is marked as
`[[clang::lifetimebound]]` but is not returned in one way or another
(tracked via `OriginEscapeFact`).
Closes llvm#182935
[libc] Fix -Wshadow warnings in freetrie.h (llvm#196529)
clang-format: ensure ternary operands are aligned (llvm#196697)
Set ParentState::AlignedTo for ternary operands.
[gn] Make ClangDependencyScanningTests depend on Testing/Support (llvm#196809)
Needed after ebb9a79.
[flang][OpenMP] Consistent names for non-executable directives, NFC (llvm#196803)
Change
OpenMPGroupprivate -> OmpGroupprivateDirective
OpenMPThreadprivate -> OmpThreadprivateDirective
OpenMPRequiresConstruct -> OmpRequiresDirective
OpenMPUtilityConstruct -> OmpUtilityDirective
[AArch64] Improve post-inc stores of SIMD/FP values (llvm#151372)
Add patterns to match post-increment truncating stores from lane 0 of
wide integer vectors (v4i32/v2i64) to narrower types (i8/i16/i32). This
avoids transferring the value through a GPR when storing.
Also remove the pre-legalization early-exit in
`combineStoreValueFPToInt` as it prevented the optimization from
applying in some cases.
[clang-tidy] Rename hicpp-multiway-paths-covered to bugprone-unhandled-code-paths (llvm#191625)
Part of the work in llvm#183462.
Closes llvm#183464.
Splitting the check into two more focused checks was considered during
discussion, but since clang-tidy does not support one-to-many aliases, a
single name covering both behaviors was chosen instead that is more
clear than `multiway-paths-covered`.
Co-authored-by: Zeyi Xu mitchell.xu2@gmail.com
[IRBuilder] Split CreateAssumption to one with bundle and one with condition [NFC] (llvm#196795)
As it is not possible to combine bundles and conditions from
llvm#160460, reflect that in `CreateAssumption`.
[clang-tidy] Reland "An option for conditional skipping overloaded functions in modernize-use-string-view" (llvm#196387)
[CIR][AArch64] Lower NEON vuzp intrinsics (llvm#195591)
Summary
Part of: llvm#185382
Lower `vuzp` intrinsics in:
https://arm-software.github.io/acle/neon_intrinsics/advsimd.html#unzip-elements
This is a follow-up to llvm#195527.
Lower `NEON::BI__builtin_neon_vuzp_v` and `NEON::BI__builtin_neon_vuzpq_v`
in CIRGenBuiltinAArch64.cpp by porting the existing incubator logic
(clangir/clang/lib/CIR/CodeGen/CIRGenBuiltinAArch64.cpp): two
bitcasts on the input vectors, two rounds of cir.vec.shuffle generating
the deinterleave (even/odd) shuffle patterns with indices 2*i+vi, each
stored via ptr_stride on the sret base pointer.
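The index formula 2*i+vi can be sketched on its own. `unzipMask` is a hypothetical helper, not the CIR builder API; it only produces the shuffle indices over the two concatenated inputs.

```cpp
#include <cassert>
#include <vector>

// Shuffle mask for one half of an unzip over two concatenated inputs
// of NumElts lanes each: Which = 0 selects the even lanes, Which = 1
// selects the odd lanes.
inline std::vector<int> unzipMask(unsigned NumElts, unsigned Which) {
  assert(Which < 2 && "only even (0) and odd (1) results exist");
  std::vector<int> Mask;
  Mask.reserve(NumElts);
  for (unsigned I = 0; I < NumElts; ++I)
    Mask.push_back(static_cast<int>(2 * I + Which));
  return Mask;
}
```

For 4-lane inputs this gives the masks 0, 2, 4, 6 and 1, 3, 5, 7, one per shuffle round.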
[llvm][RISCV] Optimize fneg for fixed vectors (llvm#194555)
vfneg is not available on zvfhmin or zvfbfmin, so it's expected to expand
to integer operations instead of unrolling to scalar operations.
General expandFNEG already handles that in most cases, except for
fixed vector types that are not promotable; we need to find a better
heuristic to gate this.
[llvm][RISCV] Optimize fabs for fixed vectors (llvm#194554)
vfabs is not available on zvfhmin or zvfbfmin, so it's expected to expand
to integer operations instead of unrolling to scalar operations.
General expandFABS already handles that in most cases, except for
fixed vector types that are not promotable; we need to find a better
heuristic to gate this.
[llvm][RISCV] Optimize fcopysign for fixed vectors (llvm#193802)
vfsgnj is not available on zvfhmin or zvfbfmin, so it's expected to expand
to integer operations instead of unrolling to scalar operations.
General expandFCOPYSIGN already handles that in most cases, except for
fixed vector types that are not promotable; we need to find a better
heuristic to gate this.
[InstCombine] Fold constant byte stores to integer stores (llvm#196740)
Byte constants are equivalent to integer constants when stored to
memory. Replacing them in store instructions reduces IR differences and
enables existing optimizations over integer constants.
[libcxx] Switch to check-runtimes for generic-llvm-libc (llvm#196780)
Move KCFI type ID hash helpers out of LLVMSupport (llvm#196784)
PR llvm#167254 inappropriately introduced llvm/Support/Hash.{h,cpp} for the
KCFI helpers. The name is misleading — it has nothing to do with the
generic hashing facility in llvm/ADT/Hashing.h — and KCFI is a
CodeGen/IR feature that does not belong in the foundational Support
layer.
Move the files to llvm/lib/Transforms/Utils/KCFIHash.cpp, alongside
setKCFIType, which is the only existing KCFI helper in TransformUtils.
Also relocate the deprecated pre-xxh3 xxHash64 implementation into
KCFIHash.cpp, the sole user. clang/test/CodeGen/kcfi-generalize.c and
kcfi-normalize.c are end-to-end regression tests for the xxHash64 output
[Coverage] Fix assertion failure when a -isystem header invokes a user macro (llvm#195427)
Commit 702a2b6 ("[Coverage] Rework !SystemHeadersCoverage")
replaced the system-header skip in gatherFileIDs with this assertion,
which trips as `SM.isInSystemHeader(SM.getSpellingLoc(Loc))` is false.
This patch adds back the pre-llvm#91446 condition but folds it with
the macro-token remap `if` statement.
Fixes llvm#179316/llvm#195422.
Clang Opus 4.7 identified clang/lib/Parse/ParseExpr.cpp, created a
minimal reproduce with cvise, and wrote the initial version of this
CodeGen patch. (An earlier session papered over the bug by patching
llvm-cov instead, which I abandoned).
[clang-tidy][NFC] Move `ClassifiedToken` to cpp file (llvm#196820)
`ClassifiedToken` is used in only the implementation of
`UseTrailingReturnTypeCheck`. Move it into the unnamed namespace of the
cpp file instead of it being in the header.
[Bazel] Fixes 2f4c387 (llvm#196822)
This fixes 2f4c387.
Co-authored-by: Google Bazel Bot google-bazel-bot@google.com
[libc] Move a few -Wshadow warnings in __support/File (llvm#196810)
No behavior change.
[libc][math] Fix -Wshadow warnings in cos.h (llvm#196342)
cos() does `using namespace range_reduction_double_internal;` and
range_reduction_double_internal after 51e9430 contains
So the local using statements for DoubleDouble and Float128 shadowed
these. Just remove the local using statements.
No behavior change.
[AArch64] New pass for code layout optimizations. (llvm#184434)
This pass is intended to optimize code layout prior to AsmPrinter. The
initial version handles two known cases:
I. FCMP-FCSEL
II. CMP/CMN-CSEL, 32-bit only
Using existing directives, the pass induces function-alignment (of
64-bytes by default) when a pair is detected, and possibly induces
block-alignment of up to 4-bytes on top of that if the pair would
straddle cache-lines.
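The straddling condition can be sketched with simple offset arithmetic. This is an assumption-laden model, not the pass itself: it assumes a 64-byte line, a 64-byte-aligned function start, and two adjacent 4-byte instructions; `pairStraddlesLine` and `paddingToAvoidStraddle` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t CacheLine = 64; // assumed line size, in bytes
constexpr std::uint64_t InstrSize = 4;  // fixed AArch64 instruction size

// True if a two-instruction pair whose first instruction sits at
// Offset (relative to a 64-byte-aligned function start) has its two
// halves in different lines.
inline bool pairStraddlesLine(std::uint64_t Offset) {
  assert(Offset % InstrSize == 0 && "instructions are 4-byte aligned");
  return Offset / CacheLine != (Offset + InstrSize) / CacheLine;
}

// Padding (0 or 4 bytes) that moves a straddling pair into the next line.
inline std::uint64_t paddingToAvoidStraddle(std::uint64_t Offset) {
  return pairStraddlesLine(Offset) ? InstrSize : 0;
}
```

Under these assumptions only a pair starting at the last slot of a line (offset 60, 124, ...) straddles, and 4 bytes of padding are enough to fix it.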
Beyond performance improvement, this pass reduces noise due to code
layout and thus stabilizes measured performance over time. For example,
knock-on effects on a "sensitive function" won't be triggered by
codegen changes outside it.
Enabled by default on processors with the new `FeatureAlignCmpCSelPairs`
subtarget feature (gated per sub-case by
`FeatureFuseCmpCSel`/`FeatureFuseFCmpFCSel`); each case can also be
forced through the `-aarch64-code-layout-opt` enumerated bit-mask.
Co-authored-by: Jon Roelofs jroelofs@gmail.com
rdar://171283264
[mlir][spirv] Remove stale NV CooperativeMatrix attributes (llvm#196639)
Since the support for NV CooperativeMatrix has been removed a while
back, those attributes can be safely removed.
[mlir][spirv] Enforce execution scope for group operations in ODS (llvm#196644)
This adds a new class `SPIRV_ExecutionScopeAttrIs` shared between group
and non-uniform group operations.
Assisted-by: Codex
[LV] Add tests for load/store scalarization and ptrcasts (NFC) (llvm#196839)
Add missing test coverage for a range of pointer casts and load/store
scalarization.
[LV] Add missing cost tests for various unary and binary ops (NFC) (llvm#196841)
Add missing direct includes for bit.h/SwapByteOrder.h. NFC (llvm#196843)
These translation units use llvm::endianness, llvm::byteswap,
llvm::has_single_bit, or sys::IsLittleEndianHost without explicitly
including the header that declares them. They currently compile only
because llvm/ADT/Hashing.h transitively pulls in
llvm/Support/SwapByteOrder.h (which includes llvm/ADT/bit.h).
[libc] Fix a copyright comment typo (llvm#196846)
No behavior change.
[clang-tidy] comment braced and parenthesized init arguments (llvm#180408)
Handle arguments like `{}`, `Type{}` and `Type()` in
`bugprone-argument-comment` and add coverage for `initializer_list` and
designated initializers.
Fixes: llvm#171842
[ADT] Avoid map storage for small SmallMapVector (llvm#196473)
SmallMapVector previously used SmallDenseMap for its index, which still
initializes and maintains map storage even when the number of entries is
tiny.
Teach MapVector to support a vector-only small mode. While the entry
count stays within the configured small size, operations use the
underlying vector directly.
When the size grows past the threshold, the map index is built and
subsequent operations use the regular MapVector path.
This mirrors the small-size strategy used by SmallSetVector.
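The small-mode idea can be sketched with standard containers. This is an illustration only, not the LLVM implementation: `TinyMapVector`, its member names and the use of `std::unordered_map` as the index are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <utility>
#include <vector>

// Illustrative only: a simplified insertion-ordered map, not llvm::MapVector.
// While size() <= SmallSize, lookups scan the vector and no index exists.
template <typename K, typename V, std::size_t SmallSize>
class TinyMapVector {
  std::vector<std::pair<K, V>> Entries;     // entries in insertion order
  std::unordered_map<K, std::size_t> Index; // key -> position, large mode only

  bool isSmall() const { return Index.empty(); }

public:
  // Returns a pointer to the mapped value, or nullptr if the key is absent.
  V *find(const K &Key) {
    if (isSmall()) {
      for (auto &E : Entries)
        if (E.first == Key)
          return &E.second;
      return nullptr;
    }
    auto It = Index.find(Key);
    return It == Index.end() ? nullptr : &Entries[It->second].second;
  }

  // Inserts if absent; returns true when a new entry was added.
  bool insert(const K &Key, const V &Value) {
    if (find(Key))
      return false;
    Entries.emplace_back(Key, Value);
    if (!isSmall()) {
      Index.emplace(Key, Entries.size() - 1);
    } else if (Entries.size() > SmallSize) {
      // Crossing the threshold: build the index once for all entries.
      for (std::size_t I = 0; I != Entries.size(); ++I)
        Index.emplace(Entries[I].first, I);
    }
    assert(isSmall() || Index.size() == Entries.size());
    return true;
  }

  bool usesIndex() const { return !isSmall(); }
  const std::vector<std::pair<K, V>> &entries() const { return Entries; }
};
```

With a small size of 2, the first two insertions never touch the index; the third builds it, and later lookups go through it while insertion order is preserved in the vector.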
[clangd][Parser][Sema] Fix TemplateIdAnnotation UAF with template-id declarator and lambda default argument (llvm#196788)
I think this is another case of the template annotation lifetime bug,
similar to the one fixed by llvm#89494.
Closes llvm#196725.
[clang] Add arm64_neon.h wrapper on windows (llvm#196014)
Add an MSVC-compatible <arm64_neon.h> resource header that forwards to
Clang's generated <arm_neon.h>. This lets ARM64 Windows code using the
MSVC header name lower NEON intrinsics through Clang builtins instead of
leaving external neon_* calls such as neon_ld1m4_q32.
Fixes llvm#195683
[clang][test] Add AArch64 requirement to arm64_neon.h test (llvm#196867)
Only run test when the AArch64 target is built
[LV][NFC] Reshape pointer_iv_non_uniform_0 test to use distinct loads (llvm#196494)
The followup patch is folding some of the idempotent binary ops. This
test has a `sub x - x` operation which is affected by the followup
patch. This patch makes the test immune to the fold.
[InstCombine][NFC] Change the order of checks in SliceUpIllegalIntegerPHI for faster compile time. (llvm#183726)
SliceUpIllegalIntegerPHI searches for PHIs that have an illegal type and
are only used by trunc or trunc(lshr) operations. It bails out if it
encounters invoke or EH pad instructions.
It first checks whether it encounters an invoke or EH pad, which is time
consuming as it checks every instruction. Then it checks whether the PHI
is used by trunc or trunc(lshr). The former check is generally loose,
while the latter one is stricter. Switching the order of the checks
speeds up compilation.
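The effect of the reordering can be sketched in isolation. The code below is not taken from InstCombine: both predicates are hypothetical stand-ins (an even/odd test for the strict user check, a counting lambda for the costly scan) that only show how running the stricter check first avoids calls to the expensive one.

```cpp
#include <cassert>
#include <vector>

// Illustrative only: records how often the expensive predicate runs when two
// independent bail-out checks are applied in a given order.
struct CheckStats {
  bool Accepted;
  int ExpensiveCalls;
};

inline CheckStats runChecks(const std::vector<int> &Users, bool StrictFirst) {
  int ExpensiveCalls = 0;
  // Stand-in for the strict "only used by trunc / trunc(lshr)" test.
  auto Strict = [](int U) { return U % 2 == 0; };
  // Stand-in for the loose but costly invoke / EH pad scan.
  auto Loose = [&ExpensiveCalls](int U) {
    ++ExpensiveCalls;
    return U >= 0;
  };
  for (int U : Users) {
    bool Ok = StrictFirst ? (Strict(U) && Loose(U)) : (Loose(U) && Strict(U));
    if (!Ok)
      return {false, ExpensiveCalls};
  }
  assert(ExpensiveCalls <= static_cast<int>(Users.size()));
  return {true, ExpensiveCalls};
}
```

Both orders reach the same accept or reject decision; only the number of expensive calls before a rejection differs.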
Signed-off-by: XinlongZHANG-Bob zhangxinlong.bob@bytedance.com
[NFC] Fix C++23 build failures caused by incomplete types (llvm#196814)
[AArch64][CostModel] Model sve costs for ctpop (llvm#192428)
Targets supporting sve prefer sve for ctpop with fixed length vectors.
Update cost model to reflect the same.
[MLIR][NVVM][NFC] Restructure NVVM dialect (llvm#195811)
Moves the declarations of the NVVM dialect and some widely used enums
(`FPRoundingModeAttr` and `SaturationModeAttr`) to separate files to make
them easier to maintain and also use in the NVGPU dialect.
[clang][bytecode] Allow const mutation in all variable initializers (llvm#195794)
So the attached test case works even though it's just an
`InitListExpr`.
[libc][stdlib] Add setenv (llvm#163018)
Add the POSIX setenv() function, with EnvironmentManager::set()
handling environment array management and ownership tracking.
Registered for x86_64, aarch64, and riscv architectures. Integration
tests cover overwrite/no-overwrite semantics, empty/invalid names,
empty values, and repeated replacement.
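The semantics listed above are those POSIX specifies for `setenv`. As a rough sketch, they can be exercised against whichever libc the program is linked with (this is not the integration test from the patch; the variable name `GR_DEMO_SETENV` and both helper functions are made up for the example):

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>
#include <cstring>
#include <stdlib.h> // POSIX setenv

// Returns true when Name is set and maps to exactly Value.
inline bool envEquals(const char *Name, const char *Value) {
  const char *Cur = std::getenv(Name);
  return Cur && std::strcmp(Cur, Value) == 0;
}

// Walks through the POSIX-specified behaviours; returns true if all hold.
inline bool setenvSemanticsHold() {
  const char *Name = "GR_DEMO_SETENV"; // hypothetical variable name
  if (setenv(Name, "first", 1) != 0 || !envEquals(Name, "first"))
    return false;
  // overwrite == 0 succeeds but keeps the existing value.
  if (setenv(Name, "second", 0) != 0 || !envEquals(Name, "first"))
    return false;
  // overwrite != 0 replaces the value, including repeated replacement.
  if (setenv(Name, "third", 1) != 0 || !envEquals(Name, "third"))
    return false;
  // An empty value is valid.
  if (setenv(Name, "", 1) != 0 || !envEquals(Name, ""))
    return false;
  // An empty name or a name containing '=' fails with EINVAL.
  errno = 0;
  if (setenv("", "x", 1) != -1 || errno != EINVAL)
    return false;
  errno = 0;
  if (setenv("A=B", "x", 1) != -1 || errno != EINVAL)
    return false;
  return true;
}
```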
Assisted-by: Automated tooling, human reviewed.
Co-authored-by: Michael Jones michaelrj@google.com
[GlobalISel] Delay match table builder initialization (llvm#196506)
MachineIRBuilder::setInstrAndDebugLoc is expensive, delay until needed.
CTMark -0.10% geomean improvement on aarch64-O0-g.
https://llvm-compile-time-tracker.com/compare.php?from=71fef6d5a306d1adf8bf7d30d2fe9e286380fecf&to=8a87845dfde9de9d141b42d2fce92fcf3be02276&stat=instructions%3Au
Assisted-by: codex
[GlobalISel] Avoid repeated target info queries in combiners (llvm#196530)
tryCombineAllImpl queries target info for every instruction. Cache
TargetInstrInfo/TargetRegisterInfo/RegisterBankInfo in CombinerHelper
and pass to executeMatchTable instead.
This avoids repeated virtual calls on the combiner executeMatchTable
path.
CTMark -0.08% geomean improvement on aarch64-O0-g.
https://llvm-compile-time-tracker.com/compare.php?from=71fef6d5a306d1adf8bf7d30d2fe9e286380fecf&to=13bc49510657450402c066098e3a4b7d1af9d0e6&stat=instructions%3Au
Assisted-by: codex
[DebugInfo] Pack DILocation hash inputs (llvm#196556)
Pack DILocation fields before hashing. Now that column is 16 bits,
Line/Column/ImplicitCode fit in one 64-bit value (32 + 16 + 1 = 49 bits),
and AtomGroup and AtomRank also fit cleanly in one 64-bit value (61 + 3
= 64 bits).
Fewer hash_combine inputs on the hot DILocation path is a small
compile-time improvement.
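The bit budget above can be checked with a small packing sketch. The field order, shift positions and function names here are assumptions for illustration; the patch may lay the bits out differently.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative packing only, not the layout used by the patch.
// 32 bits of line + 16 bits of column + 1 implicit-code bit = 49 bits.
inline std::uint64_t packLineColumnImplicit(std::uint32_t Line,
                                            std::uint16_t Column,
                                            bool ImplicitCode) {
  return std::uint64_t(Line) | (std::uint64_t(Column) << 32) |
         (std::uint64_t(ImplicitCode) << 48);
}

// 61 bits of atom group + 3 bits of atom rank = 64 bits.
inline std::uint64_t packAtom(std::uint64_t AtomGroup, std::uint8_t AtomRank) {
  assert(AtomGroup < (std::uint64_t(1) << 61) && "group must fit in 61 bits");
  assert(AtomRank < 8 && "rank must fit in 3 bits");
  return AtomGroup | (std::uint64_t(AtomRank) << 61);
}
```

Five separate hash inputs become two 64-bit values, and because the fields occupy disjoint bit ranges, distinct field tuples still produce distinct packed values.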
CTMark geomean:
https://llvm-compile-time-tracker.com/compare.php?from=71fef6d5a306d1adf8bf7d30d2fe9e286380fecf&to=1d80b5f5aa98561d2ba09adc3f20c3eacd24cb88&stat=instructions%3Au
Assisted-by: codex
[LoopFusion] Remove SCEV-based dependence analysis path (llvm#195864)
Loop Fusion has used Dependence Analysis (DA) as the default dependence
check since the option default was flipped in llvm#187309. The SCEV-based
strategy and the combined "all" mode were retained only for fallback and
experimentation, with a comment noting that the SCEV code would be
removed in a follow-up.
This patch removes the SCEV-based dependence path and the now-unused
selector machinery.
Fixes llvm#194821.
Assisted by Cursor.
[clang-tidy][NFC] Fix tests on 32bit ARM (llvm#196873)
Should fix
llvm#191386 (comment).
[libc] Fix partial multi-byte write detection in File (llvm#196402)
File::write_unlocked(const wchar_t*, size_t) checked 'write_res.value <
1' after writing a converted UTF-8 sequence. For multi-byte characters,
a short platform write (e.g. 2 of 3 bytes for a 3-byte character) passed
this check and was counted as a successful write. The output stream
would then contain an incomplete UTF-8 sequence with no error reported
to the caller.
Changed the check to 'write_res.value < char_size' and set the error
indicator on the stream when it triggers.
Added a regression test using a mock File subclass that limits
platform_write to 2 bytes per call, simulating short writes on pipes and
sockets.
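The difference between the two checks can be sketched outside libc. `ShortWriter` and both helper functions are hypothetical stand-ins for the mock File and the write path; only the two comparisons (`< 1` and `< char_size`) come from the description above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a platform write that may be short: it accepts
// at most `Limit` bytes per call and reports how many it took.
struct ShortWriter {
  std::size_t Limit;
  std::size_t write(const char *, std::size_t Len) const {
    return Len < Limit ? Len : Limit;
  }
};

// Old check: only a write of zero bytes counts as a failure.
inline bool oldCheckReportsSuccess(const ShortWriter &W, const char *Utf8,
                                   std::size_t CharSize) {
  return !(W.write(Utf8, CharSize) < 1);
}

// New check: anything short of the full encoded character is a failure.
inline bool newCheckReportsSuccess(const ShortWriter &W, const char *Utf8,
                                   std::size_t CharSize) {
  assert(CharSize >= 1 && CharSize <= 4 && "UTF-8 uses 1 to 4 bytes");
  return !(W.write(Utf8, CharSize) < CharSize);
}
```

For a 3-byte character and a writer limited to 2 bytes, the old check reports success while the new one reports failure.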
Assisted-by: Automated tooling, human reviewed.
Co-authored-by: Michael Jones michaelrj@google.com
[AA] No synchronization effects for never-escaping identified local (llvm#193939)
Fences and other synchronizing operations (such as atomic accesses
stronger than monotonic) are modelled as reading and writing all memory,
in order to enforce their implied ordering constraints.
Currently, this happens even for identified function locals that do not
escape. This patch excludes those objects.
Notably, we cannot reason based on captures-before here, because the
synchronizing operation still has an effect even if the object only
escapes later.
The hope here is that with this restriction in place, it may be viable
to respect potential synchronization inside non-nosync function calls.
[Bazel] Fixes ce6605a (llvm#196880)
This fixes ce6605a.
Co-authored-by: Google Bazel Bot google-bazel-bot@google.com
[clang][NFC] Remove alignment checks from test/CodeGen/c-strings.c (llvm#196501)
and re-enable it on more targets.
I don't think this test was intended to check for alignment. Those
expectations were added as part of FileCheck-izing the test in
e29dadb and we've been working around
them or xfailing the test since.
[CIR][AMDGPU] Add lowering for amdgcn ds swizzle builtin. (llvm#196011)
Upstreaming clangIR PR: llvm/clangir#2052
This PR adds support for lowering the `__builtin_amdgcn_ds_swizzle`
AMDGPU builtin to ClangIR.
[lldb] Fix TestDelayedBreakpoint on ARM Thumb (llvm#196888)
The original address used for the "fake breakpoint" is not valid in
Thumb mode. To be safe, change it to have 0's in the LSBs.
[clang][bytecode] Visit `tryEvaluateObjectSize` expr as lvalue (llvm#196010)
Just like we do with the first parameter of a regular
`__builtin_object_size` call.
This still doesn't fix the bigger bos test cases since e.g.
is still broken because we don't have special handling for the
`&t[1].t[1]` handling here and we can't usually access a one-past-end
pointer.
Use auto for DenseMap/SmallDenseMap iterator variables. NFC (llvm#196883)
To match the prevailing style.
[AArch64] Use dup (lane mov) over ext for high-half extract (llvm#195010)
This changes the instruction we use to extract the high half of a vector
register from an `ext v0, v1, v1, 8` to a `dup d0, v1.d[1]`. This is
apparently slightly quicker on certain cpus and is generally a simpler
instruction. This matches the instruction that gisel produced.
Some of the old patterns for extract_subvector with index of 1 seem
incorrect but were never used as we do not reach selection with such
instructions. They have been repurposed to emit the new DUPi64
instructions.
Revert "[AA] No synchronization effects for never-escaping identified local" (llvm#196890)
Reverts llvm#193939
Caused buildbot failure.
Update GitHub PR Greeter (llvm#194307)
Following these two discussions:
add a reference to the LLVM AI policy in the GH greeter.
In addition:
well, since these are often shared during PR review.
the policies.
`Hello @{self.author} :wave:` to make the greeting more personal.
[flang] dummy arguments used as function calls (llvm#196426)
Adding an error when a dummy argument is used as a statement function.
This PR now points out:
Handles issue
196424
Co-authored-by: Sunil Kuravinakop kuravina@pe31.hpc.amslabs.hpecorp.net
[SelectionDAG] Split vector types for atomic load (llvm#165818)
Vector types that aren't widened are split so that a single ATOMIC_LOAD
is issued for the entire vector at once. This change utilizes the load
vectorization infrastructure in SelectionDAG in order to group the
vectors. This enables SelectionDAG to translate vectors with type
bfloat, half.
Add support for Ubuntu 26.10 - Stonking Stingray (llvm#196896)
Co-authored-by: Oliver Reiche oliver.reiche@canonical.com
[clang-tidy] Remove hicpp modules [2/4] (llvm#196870)
This is part two of removing the hicpp-* checks.
RFC:
https://discourse.llvm.org/t/rfc-regarding-the-current-status-of-hicpp-checks/89883
Part of llvm#183462
[LV] Handle FSub Partial Reductions (llvm#191186)
Introduces a new RecurKind value 'FSub' in order to handle partial
reductions of floating point values.
This is done by following the existing method for integer partial
reductions, doing a positive accumulation followed by a final
subtraction in the middle block.
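The accumulation strategy can be sketched in scalar code. This is only an illustration of the idea: the function names and the choice of four partial sums are assumptions, and it is not the vectorizer's output. Reassociating floating-point additions can change results in general, so the example is meant for inputs where every intermediate value is exactly representable.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference form: a loop-carried floating-point subtraction.
inline float fsubReduce(float Start, const std::vector<float> &Xs) {
  float Acc = Start;
  for (float X : Xs)
    Acc -= X;
  return Acc;
}

// Sketch of the strategy: accumulate positively into partial sums inside the
// loop, then perform a single subtraction after the loop.
inline float fsubViaPositiveAccumulation(float Start,
                                         const std::vector<float> &Xs) {
  float Partial[4] = {0.0f, 0.0f, 0.0f, 0.0f}; // lane count is an assumption
  for (std::size_t I = 0; I != Xs.size(); ++I)
    Partial[I % 4] += Xs[I];
  float Sum = (Partial[0] + Partial[1]) + (Partial[2] + Partial[3]);
  return Start - Sum; // the final subtraction
}
```

For small integer-valued inputs both forms agree exactly, for example a start value of 100 and the inputs 1 through 5.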
[LV][NFC] Remove instcombine pass from RUN lines of simple tests (llvm#196257)
Most of the work done by the instcombine pass on these files involves
canonicalising GEPs and shuffling code around. I don't believe there is
any value running instcombine in these cases.
[GISel][X86] port X86PreLegalizerCombiner to npm (llvm#182638)
Porting X86PreLegalizerCombiner to npm as part of
llvm#178192
[X86] Cast atomic vectors in IR to support floats (llvm#148899)
This commit casts floats to ints in an atomic load during AtomicExpand
to support floating point types. It also is required to support 128 bit
vectors in SSE/AVX.
[AMDGPU] Add VMovB64 subtarget feature (llvm#196340)
[mlir][SPIR-V] Add CL.{exp2,exp10,log2,log10} ops (llvm#196869)
[Clang] Fix incorrect type for `__mfp8` in `extractelement` codegen (llvm#192977)
The codegen for extracting an element from an FP8 vector was emitting a
simple `extractelement` with `i8` type for the extracted element. The
`__mfp8` type is represented as `<1 x i8>` in LLVM IR. This codegen
created inconsistency in Clang - some `__mfp8` expressions would
correspond to LLVM IR values with `<1 x i8>` type and some to `i8` type.
It also caused an assertion failure when the extracted element was
passed as a function argument.
This patch fixes the issue by inserting the extracted element
into a `<1 x i8>`.
[mlir][tosa] Add a pass to downgrade TOSA `1.1.draft` to `1.0` (llvm#194971)
This commit adds a pass that will allow 1.1.draft operations to be
rewritten to their 1.0 counterparts where possible. The pass currently
covers the following operations:
Note that the downgrade is 'best-effort' and the pass does not perform
any validation itself. The validation pass should be run after
downgrading to check that the resulting IR was downgraded successfully.
Motivation: This decouples the target specification version in
legalizations and backends. Legalizations from higher level frameworks
may be updated to support producing TOSA 1.1.draft variants of
operations, while backends can still consume TOSA 1.0 IR after running
the downgrade pass.
[llubi] Upstream existing floating-point intrinsics (llvm#196034)
This PR upstreams existing floating-point intrinsics in the out-of-tree
version of llubi, including FP vector reduction, FP min/max operations,
etc. Some minor bugs from llvm#188453 are also fi