hanchenye-llvm-project

Commit Graph

Author	SHA1	Message	Date
Hal Finkel	2e0ff2b244	[LoopVectorize] Don't vectorize loops when everything will be scalarized This change prevents the loop vectorizer from vectorizing when all of the vector types it generates will be scalarized. I've run into this problem on the PPC's QPX vector ISA, which only holds floating-point vector types. The loop vectorizer will, however, happily vectorize loops with purely integer computation. Here's an example: LV: The Smallest and Widest types: 32 / 32 bits. LV: The Widest register is: 256 bits. LV: Found an estimated cost of 0 for VF 1 For instruction: %indvars.iv25 = phi i64 [ 0, %entry ], [ %indvars.iv.next26, %for.body ] LV: Found an estimated cost of 0 for VF 1 For instruction: %arrayidx = getelementptr inbounds [1600 x i32], [1600 x i32]* %a, i64 0, i64 %indvars.iv25 LV: Found an estimated cost of 0 for VF 1 For instruction: %2 = trunc i64 %indvars.iv25 to i32 LV: Found an estimated cost of 1 for VF 1 For instruction: store i32 %2, i32* %arrayidx, align 4 LV: Found an estimated cost of 1 for VF 1 For instruction: %indvars.iv.next26 = add nuw nsw i64 %indvars.iv25, 1 LV: Found an estimated cost of 1 for VF 1 For instruction: %exitcond27 = icmp eq i64 %indvars.iv.next26, 1600 LV: Found an estimated cost of 0 for VF 1 For instruction: br i1 %exitcond27, label %for.cond.cleanup, label %for.body LV: Scalar loop costs: 3. 
LV: Found an estimated cost of 0 for VF 2 For instruction: %indvars.iv25 = phi i64 [ 0, %entry ], [ %indvars.iv.next26, %for.body ] LV: Found an estimated cost of 0 for VF 2 For instruction: %arrayidx = getelementptr inbounds [1600 x i32], [1600 x i32]* %a, i64 0, i64 %indvars.iv25 LV: Found an estimated cost of 0 for VF 2 For instruction: %2 = trunc i64 %indvars.iv25 to i32 LV: Found an estimated cost of 2 for VF 2 For instruction: store i32 %2, i32* %arrayidx, align 4 LV: Found an estimated cost of 1 for VF 2 For instruction: %indvars.iv.next26 = add nuw nsw i64 %indvars.iv25, 1 LV: Found an estimated cost of 1 for VF 2 For instruction: %exitcond27 = icmp eq i64 %indvars.iv.next26, 1600 LV: Found an estimated cost of 0 for VF 2 For instruction: br i1 %exitcond27, label %for.cond.cleanup, label %for.body LV: Vector loop of width 2 costs: 2. LV: Found an estimated cost of 0 for VF 4 For instruction: %indvars.iv25 = phi i64 [ 0, %entry ], [ %indvars.iv.next26, %for.body ] LV: Found an estimated cost of 0 for VF 4 For instruction: %arrayidx = getelementptr inbounds [1600 x i32], [1600 x i32]* %a, i64 0, i64 %indvars.iv25 LV: Found an estimated cost of 0 for VF 4 For instruction: %2 = trunc i64 %indvars.iv25 to i32 LV: Found an estimated cost of 4 for VF 4 For instruction: store i32 %2, i32* %arrayidx, align 4 LV: Found an estimated cost of 1 for VF 4 For instruction: %indvars.iv.next26 = add nuw nsw i64 %indvars.iv25, 1 LV: Found an estimated cost of 1 for VF 4 For instruction: %exitcond27 = icmp eq i64 %indvars.iv.next26, 1600 LV: Found an estimated cost of 0 for VF 4 For instruction: br i1 %exitcond27, label %for.cond.cleanup, label %for.body LV: Vector loop of width 4 costs: 1. ... LV: Selecting VF: 8. 
LV: The target has 32 registers LV(REG): Calculating max register usage: LV(REG): At #0 Interval # 0 LV(REG): At #1 Interval # 1 LV(REG): At #2 Interval # 2 LV(REG): At #4 Interval # 1 LV(REG): At #5 Interval # 1 LV(REG): VF = 8 The problem is that the cost model here is not wrong, exactly. Since all of these operations are scalarized, their cost (aside from the uniform ones) is indeed VF*(scalar cost), just as the model suggests. In fact, the larger the VF picked, the lower the relative overhead from the loop itself (and the induction-variable update and check), and so in a sense, picking the largest VF here is the right thing to do. The problem is that vectorizing like this, where all of the vectors will be scalarized in the backend, isn't really vectorizing, but rather interleaving. By itself, this would be okay, but then the vectorizer itself also interleaves, and that's where the problem manifests itself. There aren't actually enough scalar registers to support the normal interleave factor multiplied by a factor of VF (8 in this example). In other words, the problem with this is that our register-pressure heuristic does not account for scalarization. While we might want to improve our register-pressure heuristic, I don't think this is the right motivating case for that work. Here we have a more-basic problem: The job of the vectorizer is to vectorize things (interleaving aside), and if the IR it generates won't generate any actual vector code, then something is wrong. Thus, if every type looks like it will be scalarized (i.e. will be split into VF or more parts), then don't consider that VF. This is not a problem specific to PPC/QPX, however. The problem comes up under SSE on x86 too, and as such, this change fixes PR26837 too. I've added Sanjay's reduced test case from PR26837 to this commit. Differential Revision: http://reviews.llvm.org/D18537 llvm-svn: 264904	2016-03-30 19:37:08 +00:00
Adhemerval Zanella	69b29b2b7c	[lld] [ELF/AArch64] Add aarch64 TLS IE to LE relax for local symbol test This patch adds a TLS relax optimization test when transforming Initial-Exec to Local-Exec for local symbols (which cannot be preempted). llvm-svn: 264903	2016-03-30 19:12:18 +00:00
Rong Xu	b534166fd4	[PGO] PGOFuncName in LTO optimizations PGOFuncNames are used as the key to retrieve the Function definition from the MD5 stored in the profile. For internal linkage function, we prefix the source file name to the PGOFuncNames. LTO's internalization privatizes many global linkage symbols. This happens after value profile annotation, but those internal linkage functions should not have a source prefix. To differentiate compiler generated internal symbols from original ones, PGOFuncName meta data are created and attached to the original internal symbols in the value profile annotation step. If a symbol does not have the meta data, its original linkage must be non-internal. Also add a new map that maps PGOFuncName's MD5 value to the function definition. Differential Revision: http://reviews.llvm.org/D17895 llvm-svn: 264902	2016-03-30 18:37:52 +00:00
Reid Kleckner	747dc2eb61	[cmake] Get the MSVC version by running cl rather than relying on MSVC_VERSION MSVC_VERSION comes from the _MSC_VER macro, which won't correspond to the STL version if the host compiler is clang-cl. llvm-svn: 264901	2016-03-30 18:31:14 +00:00
Reid Kleckner	88ad225e94	[cmake] Instead of testing char16_t for MSVC compat, directly ask cl.exe its version Credit to Aaron Ballman for thinking of this. llvm-svn: 264886	2016-03-30 18:19:39 +00:00
Tobias Grosser	6deba4ea03	Revert 264782 and 264789 These caused LNT failures due to new assertions when running with -polly-position=before-vectorizer -polly-process-unprofitable for: FAIL: clamscan.compile_time FAIL: cjpeg.compile_time FAIL: consumer-jpeg.compile_time FAIL: shapes.compile_time FAIL: clamscan.execution_time FAIL: cjpeg.execution_time FAIL: consumer-jpeg.execution_time FAIL: shapes.execution_time The failures have been introduced by r264782, but r264789 had to be reverted as it depended on the earlier patch. llvm-svn: 264885	2016-03-30 18:18:31 +00:00
Teresa Johnson	83c517c44e	Restore "[ThinLTO] Serialize the Module SourceFileName to/from LLVM assembly" This restores commit 264869, with a fix for windows bots to properly escape '\' in the path when serializing out. Added test. llvm-svn: 264884	2016-03-30 18:15:08 +00:00
Jim Ingham	a7c5e1922d	Fix header name. llvm-svn: 264883	2016-03-30 18:14:36 +00:00
Chad Rosier	f7ac5f28ab	[AArch64] Fix warnings pointed out by Hal. llvm-svn: 264882	2016-03-30 18:08:51 +00:00
Reid Kleckner	2b3db2c1bb	[cmake] Add -fms-compatibility-version=19 when clang-cl gives errors about char16_t What we are really trying to do here is to figure out if we are using the 2015 STL. Unfortunately, so far as I know the MSVC STL does not define a version macro that we can check directly. Instead I wrote a check to see if char16_t works. llvm-svn: 264881	2016-03-30 17:30:26 +00:00
Reid Kleckner	8c18019d50	[cmake] Allow EH usage with clang-cl llvm-svn: 264880	2016-03-30 17:28:21 +00:00
Rong Xu	311ada11f8	[PGO] Use ArrayRef in annotateValueSite() Use ArrayRef in annotateValueSite's parameter instead of using an array and its size. Differential Revision: http://reviews.llvm.org/D18568 llvm-svn: 264879	2016-03-30 16:56:31 +00:00
Rui Ueyama	38dc83417b	Include line number in error message for linker scripts. This patch is based on http://reviews.llvm.org/D18545 written by George Rimar. llvm-svn: 264878	2016-03-30 16:51:57 +00:00
Tom Stellard	1d5e6d4bdc	AMDGPU/SI: Improve MachineSchedModel definition This patch contains a few improvements to the model, including: - Using a single resource with a defined buffer size for each memory unit. - Setting the IssueWidth correctly. - Fixing latency values for memory instructions. shader-db stats: 16429 shaders in 3231 tests Totals: SGPRS: 318232 -> 312328 (-1.86 %) VGPRS: 208996 -> 209346 (0.17 %) Code Size: 7147044 -> 7166440 (0.27 %) bytes LDS: 83 -> 83 (0.00 %) blocks Scratch: 1862656 -> 1459200 (-21.66 %) bytes per wave Max Waves: 49182 -> 49243 (0.12 %) Wait states: 0 -> 0 (0.00 %) Differential Revision: http://reviews.llvm.org/D18453 llvm-svn: 264877	2016-03-30 16:35:13 +00:00
Tom Stellard	0bc954e3bc	AMDGPU/SI: Enable lanemask tracking in misched Summary: This results in higher register usage, but should make it easier for the compiler to hide latency. This pass is a prerequisite for some more scheduler improvements, and I think the increased register usage with this patch is acceptable, because when combined with the scheduler improvements, the total register usage will decrease. shader-db stats: 2382 shaders in 478 tests Totals: SGPRS: 48672 -> 49088 (0.85 %) VGPRS: 34148 -> 34847 (2.05 %) Code Size: 1285816 -> 1289128 (0.26 %) bytes LDS: 28 -> 28 (0.00 %) blocks Scratch: 492544 -> 573440 (16.42 %) bytes per wave Max Waves: 6856 -> 6846 (-0.15 %) Wait states: 0 -> 0 (0.00 %) Depends on D18451 Reviewers: nhaehnle, arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18452 llvm-svn: 264876	2016-03-30 16:35:09 +00:00
Jonas Paulsson	f76123386a	[SystemZ] Add nop and nopr InstAliases. For compatibility with GAS, nop and nopr are recognized as aliases for bc and bcr, respectively. A mask of 0 turns these instructions effectively into no-operations. Reviewed by Ulrich Weigand. llvm-svn: 264875	2016-03-30 16:11:58 +00:00
Vedant Kumar	b64d86ff8e	[c-index-test] Delete dead function, NFC llvm-svn: 264874	2016-03-30 16:03:02 +00:00
Jonas Paulsson	3ace74a414	[SystemZ] Specify required features for builtins. BuiltinsSystemZ.def is extended to include the required processor features per intrinsic. New test test/CodeGen/builtins-systemz-error2.c that checks for expected errors when intrinsics are used with a subtarget that does not support the required feature (e.g. vector support). Reviewed by Ulrich Weigand. llvm-svn: 264873	2016-03-30 15:51:24 +00:00
Nirav Dave	8dd66e5753	Remove HasFnAttribute guards to getFnAttribute calls These checks are redundant and can be removed Reviewers: hans Subscribers: llvm-commits, mzolotukhin Differential Revision: http://reviews.llvm.org/D18564 llvm-svn: 264872	2016-03-30 15:41:12 +00:00
Teresa Johnson	20beeea24a	Revert "[ThinLTO] Serialize the Module SourceFileName to/from LLVM assembly" This reverts commit r264869. I am seeing Windows bot failures due to the "\" in the path being mishandled at some point (seems to be interpreted wrongly at some point and llvm-as | llvm-dis is yielding some junk characters). Need to investigate. llvm-svn: 264871	2016-03-30 15:16:04 +00:00
Simon Pilgrim	b87ffe8519	[X86][XOP] BITREVERSE lowering using VPPERM XOP's VPPERM has some great 'permute operations' that it can do as well as part of shuffling the bytes of a 128-bit vector - in this case we use it to perform BITREVERSE in a single instruction. llvm-svn: 264870	2016-03-30 14:14:00 +00:00
Teresa Johnson	832a6790f6	[ThinLTO] Serialize the Module SourceFileName to/from LLVM assembly Summary: This change serializes out and in the SourceFileName to LLVM assembly so that it is preserved through "llvm-dis | llvm-as". This is necessary to ensure that the global identifiers created for local values in the module summary index are the same even if the bitcode is streamed out and read back from LLVM assembly. Serializing the summary itself to LLVM assembly is in progress. Reviewers: joker.eph Subscribers: llvm-commits, joker.eph Differential Revision: http://reviews.llvm.org/D18588 llvm-svn: 264869	2016-03-30 14:00:02 +00:00
Teresa Johnson	0c7bb96533	Prepare tests for change to emit Module SourceFileName to LLVM assembly Modify these tests to ignore the source file name when looking for the expected string. It was already catching the source file name once via the ModuleID, and will catch it another time with an impending change to LLVM to serialize out the module's SourceFileName. llvm-svn: 264868	2016-03-30 13:59:49 +00:00
Simon Pilgrim	9490b56a89	[X86][SSE] Test the legalization of vector comparison results We are currently doing a REALLY bad job of packing results of vector comparisons into the legalized <X x i1> result equivalents - a mixture of PACKSS/PMOVMSKB would be much better here. llvm-svn: 264867	2016-03-30 13:55:00 +00:00
Rafael Espindola	287e100db2	No relocation needs both SA and ZA. Pass only one of them to relocateOne. llvm-svn: 264866	2016-03-30 13:27:50 +00:00
Rafael Espindola	8cc68c313b	Implement getImplicitAddend for mips. llvm-svn: 264865	2016-03-30 13:18:08 +00:00
Rafael Espindola	abc9a12929	Simplify mips addend processing. It is now added to the addend in the same way as a regular Elf_Rel addend. llvm-svn: 264864	2016-03-30 12:45:58 +00:00
Rafael Espindola	da99df366d	Fix handling of addends on i386. Because of merge sections it is not sufficient to just add them while applying a relocation. llvm-svn: 264863	2016-03-30 12:40:38 +00:00
Alexander Kornienko	b014596047	[clang-tidy] Fix MSVC build. llvm-svn: 264862	2016-03-30 12:35:05 +00:00
Benjamin Kramer	9415e06da7	[NVPTX] Avoid temporary std::string and make single-use function local to the cpp file. No functionality change intended. llvm-svn: 264861	2016-03-30 12:31:51 +00:00
Marianne Mailhot-Sarrasin	a5a750eaf1	gold-plugin: Fixed typo in an error message. llvm-svn: 264860	2016-03-30 12:20:53 +00:00
Gabor Horvath	349c828bea	[clang-tidy] Adjust dangling references check to ASTMatcher changes. llvm-svn: 264859	2016-03-30 12:16:09 +00:00
Alexander Kornienko	dbe0a1fd92	[docs] Added 3.8 clang-tidy release notes, fixed formatting. llvm-svn: 264858	2016-03-30 12:05:33 +00:00
Simon Pilgrim	ab305a9d4c	[X86][SSE] Added tests for clearing upper bits of vector elements Patterns based on PR6455 llvm-svn: 264857	2016-03-30 11:43:26 +00:00
Alexander Kornienko	e3ae0c6f19	[clang-tidy] readability check for const params in declarations Summary: Adds a clang-tidy warning for top-level consts in function declarations. Reviewers: hokein, sbenza, alexfh Subscribers: cfe-commits Patch by Matt Kulukundis! Differential Revision: http://reviews.llvm.org/D18408 llvm-svn: 264856	2016-03-30 11:31:33 +00:00
Gabor Horvath	1b654f2293	[ASTMatchers] Existing matcher hasAnyArgument fixed Summary: A checker (will be uploaded after this patch) needs to check implicit casts. The checker needs matcher hasAnyArgument but it ignores implicit casts and parenthesized expressions which disables checking of implicit casts for arguments in the checker. However the documentation of the matcher contains a FIXME that this should be removed once separate matchers for ignoring implicit casts and parenthesized expressions are ready. Since these matchers were already there the fix could be executed. Only one Clang checker was affected which was also fixed (ignoreParenImpCasts added) and is separately uploaded. Third party checkers (not in the Clang repository) may be affected by this fix so the fix must be emphasized in the release notes. Reviewers: klimek, sbenza, alexfh Subscribers: alexfh, klimek, xazax.hun, cfe-commits Differential Revision: http://reviews.llvm.org/D18243 llvm-svn: 264855	2016-03-30 11:22:14 +00:00
Kuba Brecka	058c302e0a	Fix the ThreadSanitizer support to avoid creating empty SBThreads and to not crash when thread_id is unavailable. Plus a whitespace fix. llvm-svn: 264854	2016-03-30 10:50:24 +00:00
Alexey Bataev	587e1de4ea	[OPENMP 4.0] Initial support for '#pragma omp declare simd' directive. Initial parsing/sema/serialization/deserialization support for '#pragma omp declare simd' directive. The 'declare simd' construct can be applied to a function to enable the creation of one or more versions that can process multiple arguments using SIMD instructions from a single invocation from a SIMD loop. If the function has any declarations, then the declare simd construct for any declaration that has one must be equivalent to the one specified for the definition. Otherwise, the result is unspecified. This pragma can be applied many times to the same declaration. Internally this pragma is represented as an attribute. But we need special processing for this pragma because it must be used before function declaration, this directive is applied to. Differential Revision: http://reviews.llvm.org/D10599 llvm-svn: 264853	2016-03-30 10:43:55 +00:00
James Molloy	8e46cd05a1	[VectorUtils] Don't try and truncate PHIs to a smaller bitwidth We already try not to truncate PHIs in computeMinimalBitwidths. LoopVectorize can't handle it and we really don't need to, because both induction and reduction PHIs are truncated by other means. However, we weren't bailing out in all the places we should have, and we ended up by returning a PHI to be truncated, which has caused PR27018. This fixes PR27018. llvm-svn: 264852	2016-03-30 10:11:43 +00:00
Gabor Horvath	b780c44eec	[analyzer] Fix an assertion fail in hash generation. In case the (uniqueing) location of the diagnostic is in a line that only contains whitespaces there was an assertion fail during issue hash generation. Unfortunately I am unable to reproduce this error with the built in checkers, so there is no failing test case with this patch. It would be possible to write a debug checker for that purpose but it is not worth the effort. Differential Revision: http://reviews.llvm.org/D18210 llvm-svn: 264851	2016-03-30 10:08:59 +00:00
Pavel Labath	021ccdb7cd	Fix SocketAddressTest (again) On some versions of Windows, the address is returned as "::1", while on others it's "0:0:...:0:1". Accept both versions, as they represent the same address. llvm-svn: 264850	2016-03-30 09:43:04 +00:00
Pavel Labath	1b46a72eb2	Fix warning in ThreadSanitizerRuntime llvm-svn: 264849	2016-03-30 09:42:59 +00:00
Pavel Labath	6ce88f8d66	Fix warning in ClangExpressionParser llvm-svn: 264847	2016-03-30 08:45:37 +00:00
Pavel Labath	ec62c0559f	Fix flakiness in TestWatchpointMultipleThreads Summary: the inferior in the test deliberately does not lock a mutex when accessing the watched variable. The reason for that is unclear as, based on the logs, the original intention of the test was to check whether watchpoints get propagated to newly created threads, which should work fine even with a mutex. Furthermore, in the unlikely event (which I have still observed happening from time to time) that two threads do manage to execute the "critical section" simultaneously, the test will fail, as it is expecting the watchpoint "hit count" to be 1, but in this case it will be 2. Given this, I have simply chosen to lock the mutex always, so that we have more predictable behavior. Watchpoints being hit simultaneously is still (and correctly!) tested by TestConcurrentEvents. Reviewers: clayborg, jingham Subscribers: lldb-commits Differential Revision: http://reviews.llvm.org/D18558 llvm-svn: 264846	2016-03-30 08:43:54 +00:00
Chandler Carruth	8e06a10d1f	[x86] Fix a horrible bug in our lowering of x86 floating point atomic operations. Specifically, we had code that tried to badly approximate reconstructing all of the possible variations on addressing modes in two x86 instructions based on those in one pseudo instruction. This is not the first bug uncovered with doing this, so stop doing it altogether. Instead generically and pedantically copy every operand from the address over to both new instructions, and strip kill flags from any register operands. This fixes a subtle bug seen in the wild where we would mysteriously drop parts of the addressing mode, causing for example the index argument in the added test case to just be completely ignored. Hypothetically, this was an extremely bad miscompile because it actually caused a predictable and leverageable write of a 64bit quantity to an unintended offset (the first element of the array instead of whatever other element was intended). As a consequence, in theory this could even have introduced security vulnerabilities. However, this was only something that could happen with an atomic floating point add. No other operation could trigger this bug, so it seems extremely unlikely to have occurred widely in the wild. But it did in fact occur, and frequently in scientific applications which were using relaxed atomic updates of a floating point value after adding a delta. Those would end up being quite badly miscompiled by LLVM, which is how we found this. Of course, this often looks like a race condition in the code, but it was actually a miscompile. I suspect that this whole RELEASE_FADD thing was a complete mistake. There is no such operation, and I worry that anything other than add will get remarkably worse code generation. But that's not for this change.... llvm-svn: 264845	2016-03-30 08:41:59 +00:00
Ismail Donmez	22921c9cf5	Fix shared build after r264790 llvm-svn: 264844	2016-03-30 08:31:46 +00:00
George Rimar	f1c0bf5b40	[ELF] - Do not keep undefined locals in .symtab gold and bfd do not include the undefined locals in symtab. We have no reasons to support that either. That fixes PR27016 Differential revision: http://reviews.llvm.org/D18554 llvm-svn: 264843	2016-03-30 08:16:11 +00:00
Stephan Bergmann	17d7d14571	For MS ABI, emit dllexport friend functions defined inline in class ...as that is apparently what MSVC does. This is an updated version of r263738, which had to be reverted in r263740 due to test failures. The original version had erroneously emitted functions that are defined in class templates, too (see the updated "Handle friend functions" code in EmitDeferredDecls, lib/CodeGen/ModuleBuilder.cpp). (The updated tests needed to be split out into their own dllexport-ms-friend.cpp because of the CHECK-NOTs which would have interfered with subsequent CHECK-DAGs in dllexport.cpp.) Differential Revision: http://reviews.llvm.org/D18430 llvm-svn: 264841	2016-03-30 06:27:31 +00:00
Craig Topper	e9ff01b2a7	[CodeGen] Mark EVT:getExtendedSizeInBits() as LLVM_READONLY. I think I had tried this a long time back and some bots failed. Hoping that was with an older gcc and maybe now it will work. llvm-svn: 264840	2016-03-30 05:26:43 +00:00
Jingyue Wu	f190ed4355	[docs] Add gpucc publication and tutorial. llvm-svn: 264839	2016-03-30 05:05:40 +00:00
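
The rule stated in the [LoopVectorize] commit above (2e0ff2b244), "if every type looks like it will be scalarized (i.e. will be split into VF or more parts), then don't consider that VF", can be illustrated with a small Python sketch. This is not the LLVM implementation: `num_parts`, `worth_considering`, and the `vector_bits_by_kind` target description are hypothetical names invented here, and the legalization model (a vector with no matching register class splits into one scalar per lane) is a simplifying assumption.

```python
from math import ceil


def num_parts(elem_kind, elem_bits, vf, vector_bits_by_kind):
    """Hypothetical legalization model: registers needed for a <vf x elem> vector."""
    reg_bits = vector_bits_by_kind.get(elem_kind)
    if reg_bits is None:
        # Assumption: no vector registers for this kind, so one scalar per lane.
        return vf
    return max(1, ceil(vf * elem_bits / reg_bits))


def worth_considering(vf, loop_types, vector_bits_by_kind):
    """Reject a VF when every type in the loop splits into VF or more parts."""
    if vf == 1:
        return True  # the scalar loop is always a candidate
    return any(
        num_parts(kind, bits, vf, vector_bits_by_kind) < vf
        for kind, bits in loop_types
    )


# A QPX-like target (assumed here): 256-bit registers, floating point only.
qpx_like = {"float": 256}
candidates = [1, 2, 4, 8]

int_loop = [("int", 32), ("int", 64)]  # purely integer computation
fp_loop = [("float", 64)]              # double-precision computation

int_vfs = [vf for vf in candidates if worth_considering(vf, int_loop, qpx_like)]
fp_vfs = [vf for vf in candidates if worth_considering(vf, fp_loop, qpx_like)]
```

Under these assumptions the integer loop keeps only VF 1, while the floating-point loop keeps every candidate, which matches the behavior the commit message asks for.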
