hanchenye-llvm-project

Commit Graph

Author	SHA1	Message	Date
Matt Arsenault	560665250f	NVPTX: Extract mem intrinsic expansions into utilities llvm-svn: 294490	2017-02-08 17:49:52 +00:00
Justin Lebar	e90c468444	[NVPTX] Enable combineRepeatedFPDivisors for NVPTX. Reviewers: tra Subscribers: jholewinski, llvm-commits Differential Revision: https://reviews.llvm.org/D29477 llvm-svn: 294011	2017-02-03 15:13:50 +00:00
Matt Arsenault	8e190b2f23	NVPTX: Fix not preserving volatile when expanding memset llvm-svn: 293851	2017-02-02 01:20:34 +00:00
Justin Lebar	06fcea4cd9	[NVPTX] Compute approx sqrt as 1/rsqrt(x) rather than x*rsqrt(x). x*rsqrt(x) returns NaN for x == 0, whereas 1/rsqrt(x) returns 0, as desired. Verified that the particular nvptx approximate instructions here do in fact return 0 for x = 0. llvm-svn: 293713	2017-01-31 23:08:57 +00:00
Justin Lebar	1c9692a46f	[NVPTX] Implement NVPTXTargetLowering::getSqrtEstimate. Summary: This lets us lower to sqrt.approx and rsqrt.approx under more circumstances. * Now we emit sqrt.approx and rsqrt.approx for calls to @llvm.sqrt.f32, when fast-math is enabled. Previously, we only would emit it for calls to @llvm.nvvm.sqrt.f. (With this patch we no longer emit sqrt.approx for calls to @llvm.nvvm.sqrt.f; we rely on instcombine to simplify llvm.nvvm.sqrt.f into llvm.sqrt.f32.) * Now we emit the ftz version of rsqrt.approx when ftz is enabled. Previously, we only emitted rsqrt.approx when ftz was disabled. Reviewers: hfinkel Subscribers: llvm-commits, tra, jholewinski Differential Revision: https://reviews.llvm.org/D28508 llvm-svn: 293605	2017-01-31 05:58:22 +00:00
Matt Arsenault	850657a439	NVPTX: Move InferAddressSpaces to generic code llvm-svn: 293579	2017-01-31 01:10:58 +00:00
Matt Arsenault	9f432ec24c	NVPTX: Trivial cleanups of NVPTXInferAddressSpaces - Move DEBUG_TYPE below includes - Change unknown address space constant to be consistent with other passes - Grammar fixes in debug output llvm-svn: 293567	2017-01-30 23:27:11 +00:00
Matt Arsenault	42b6478344	NVPTX: Refactor NVPTXInferAddressSpaces to check TTI Add a new TTI hook for getting the generic address space value. llvm-svn: 293563	2017-01-30 23:02:12 +00:00
Rafael Espindola	e0eba3c493	Only print architecture dependent flags for that architecture. Different architectures can have different meaning for flags in the SHF_MASKPROC mask, so we should always check what the architecture use before checking the flag. NFC for now, but will allow fixing the value of an xmos flag. llvm-svn: 293484	2017-01-30 15:38:43 +00:00
Arpith Chacko Jacob	2b156edf56	[NVPTX] Add intrinsics to support named barriers. Support for barrier synchronization between a subset of threads in a CTA through one of sixteen explicitly specified barriers. These intrinsics are not directly exposed in CUDA but are critical for forthcoming support of OpenMP on NVPTX GPUs. The intrinsics allow the synchronization of an arbitrary (multiple of 32) number of threads in a CTA at one of 16 distinct barriers. The two intrinsics added are as follows: call void @llvm.nvvm.barrier.n(i32 10) waits for all threads in a CTA to arrive at named barrier #10. call void @llvm.nvvm.barrier(i32 15, i32 992) waits for 992 threads in a CTA to arrive at barrier #15. Detailed descriptions of these intrinsics are available in the PTX manual. http://docs.nvidia.com/cuda/parallel-thread-execution/#parallel-synchronization-and-communication-instructions Reviewers: hfinkel, jlebar Differential Revision: https://reviews.llvm.org/D17657 llvm-svn: 293384	2017-01-28 16:38:15 +00:00
Matt Arsenault	32b9600a7e	NVPTX: Make NVPTXInferAddressSpaces preserve CFG llvm-svn: 293308	2017-01-27 17:30:39 +00:00
NAKAMURA Takumi	0d299191d0	NVPTXCodeGen: Add IPO to libdeps, since r293189. llvm-svn: 293256	2017-01-27 02:11:10 +00:00
Stanislav Mekhanoshin	81598117b6	Replace addEarlyAsPossiblePasses callback with adjustPassManager This change introduces adjustPassManager target callback giving a target an opportunity to tweak PassManagerBuilder before pass managers are populated. This generalizes and replaces addEarlyAsPossiblePasses target callback. In particular that can be used to add custom passes to extension points other than EP_EarlyAsPossible. Differential Revision: https://reviews.llvm.org/D28336 llvm-svn: 293189	2017-01-26 16:49:08 +00:00
Justin Lebar	46624a822d	[NVPTX] Auto-upgrade some NVPTX intrinsics to LLVM target-generic code. Summary: Specifically, we upgrade llvm.nvvm.: * brev{32,64} * clz.{i,ll} * popc.{i,ll} * abs.{i,ll} * {min,max}.{i,ll,u,ull} * h2f These either map directly to an existing LLVM target-generic intrinsic or map to a simple LLVM target-generic idiom. In all cases, we check that the code we generate is lowered to PTX as we expect. These builtins don't need to be backfilled in clang: They're not accessible to user code from nvcc. Reviewers: tra Subscribers: majnemer, cfe-commits, llvm-commits, jholewinski Differential Revision: https://reviews.llvm.org/D28793 llvm-svn: 292694	2017-01-21 01:00:32 +00:00
Justin Lebar	077f8fb168	[NVPTX] Move getDivF32Level, usePrecSqrtF32, and useF32FTZ out of DAGToDAG and into TargetLowering. Summary: DAGToDAG has access to TargetLowering, but not vice versa, so this is the more general location for these functions. NFC Reviewers: tra Subscribers: jholewinski, llvm-commits Differential Revision: https://reviews.llvm.org/D28795 llvm-svn: 292693	2017-01-21 01:00:14 +00:00
Artem Belevich	3d3f6190ab	[NVPTX] Fix lowering of fp16 ISD::FNEG. There's no neg.f16 instruction, so negation has to be done via subtraction from zero. Differential Revision: https://reviews.llvm.org/D28876 llvm-svn: 292452	2017-01-19 00:14:45 +00:00
Justin Lebar	1cf6bf4989	[NVPTX] Support global variables of integer type larger than i64. Reviewers: tra, majnemer Subscribers: llvm-commits, jholewinski Differential Revision: https://reviews.llvm.org/D28825 llvm-svn: 292316	2017-01-18 00:29:53 +00:00
Justin Lebar	9c46450dbb	[NVPTX] Standardize asm printer on "foo \tbar". Some instructions were printed as "foo\tbar", but most are printed as "foo \tbar". Standardize on the latter form. llvm-svn: 292306	2017-01-18 00:09:36 +00:00
Justin Lebar	2a2d6f0ddd	[NVPTX] Clean up nested !strconcat calls. !strconcat is a variadic function; it will concatenate an arbitrary number of strings. There's no need to nest it. llvm-svn: 292305	2017-01-18 00:09:19 +00:00
Justin Lebar	cc938fc197	[NVPTX] Implement min/max in tablegen, rather than with custom DAGCombine logic. Summary: This change also lets us use max.{s,u}16. There's a vague warning in a test about this maybe being less efficient, but I could not come up with a case where the resulting SASS (sm_35 or sm_60) was different with or without max.{s,u}16. It's true that nvcc seems to emit only max.{s,u}32, but even ptxas 7.0 seems to have no problem generating efficient SASS from max.{s,u}16 (the casts up to i32 and back down to i16 seem to be implicit and nops, happening via register aliasing). In the absence of evidence, better to have fewer special cases, emit more straightforward code, etc. In particular, if a new GPU has 16-bit min/max instructions, we want to be able to use them. Reviewers: tra Subscribers: jholewinski, llvm-commits Differential Revision: https://reviews.llvm.org/D28732 llvm-svn: 292304	2017-01-18 00:09:01 +00:00
Justin Lebar	7dc3d6c341	[NVPTX] Lower integer absolute value idiom to abs instruction. Summary: Previously we lowered it literally, to shifts and xors. Reviewers: tra Subscribers: jholewinski, llvm-commits Differential Revision: https://reviews.llvm.org/D28722 llvm-svn: 292303	2017-01-18 00:08:44 +00:00
Justin Lebar	1091a9f566	[NVPTX] Improve lowering of llvm.ctpop. Summary: Avoid an unnecessary conversion operation when using the result of ctpop.i32 or ctpop.i16 as an i32, as in both cases the ptx instruction we run returns an i32. (Previously if we used the value as an i32, we'd do an unnecessary zext+trunc.) Reviewers: tra Subscribers: jholewinski, llvm-commits Differential Revision: https://reviews.llvm.org/D28721 llvm-svn: 292302	2017-01-18 00:08:27 +00:00
Justin Lebar	c7d20128bd	[NVPTX] Add lowering for llvm.bitreverse. Reviewers: tra Subscribers: llvm-commits, jholewinski Differential Revision: https://reviews.llvm.org/D28720 llvm-svn: 292301	2017-01-18 00:08:10 +00:00
Justin Lebar	d17de5380b	[NVPTX] Improve lowering of llvm.ctlz. Summary: * Disable "ctlz speculation", which inserts a branch on every ctlz(x) which has defined behavior on x == 0 to check whether x is, in fact zero. * Add DAG patterns that avoid re-truncating or re-expanding the result of the 16- and 64-bit ctz instructions. Reviewers: tra Subscribers: llvm-commits, jholewinski Differential Revision: https://reviews.llvm.org/D28719 llvm-svn: 292299	2017-01-18 00:07:35 +00:00
Justin Lebar	38746d9718	[NVPTX] Let there be One True Way to set NVVMReflect params. Summary: Previously there were three ways to inform the NVVMReflect pass whether you wanted to flush denormals to zero: * An LLVM command-line option * Parameters to the NVVMReflect constructor * Metadata on the module itself. This change removes the first two, leaving only the third. The motivation for this change, aside from simplifying things, is that we want LLVM to be aware of whether it's operating in FTZ mode, so other passes can use this information. Ideally we'd have a target-generic piece of metadata on the module. This change moves us in that direction. Reviewers: tra Subscribers: jholewinski, llvm-commits Differential Revision: https://reviews.llvm.org/D28700 llvm-svn: 292068	2017-01-15 16:54:35 +00:00
Artem Belevich	64dc9be7b4	[NVPTX] Added support for half-precision floating point. Only scalar half-precision operations are supported at the moment. - Adds general support for 'half' type in NVPTX. - fp16 math operations are supported on sm_53+ GPUs only (can be disabled with --nvptx-no-f16-math). - Type conversions to/from fp16 are supported on all GPU variants. - On GPU variants that do not have full fp16 support (or if it's disabled), fp16 operations are promoted to fp32 and results are converted back to fp16 for storage. Differential Revision: https://reviews.llvm.org/D28540 llvm-svn: 291956	2017-01-13 20:56:17 +00:00
Artem Belevich	d109f46573	[NVPTX] Only lower sin/cos to approximate instructions if unsafe math is allowed. Previously we'd always lower @llvm.{sin,cos}.f32 to {sin,cos}.approx.f32 instructions even when unsafe FP math was not allowed. Clang-generated IR is not affected by this as it uses precise sin/cos from CUDA's libdevice when unsafe math is disabled. Differential Revision: https://reviews.llvm.org/D28619 llvm-svn: 291936	2017-01-13 18:48:13 +00:00
Diana Picus	116bbab4e4	[CodeGen] Rename MachineInstrBuilder::addOperand. NFC Rename from addOperand to just add, to match the other method that has been added to MachineInstrBuilder for adding more than just 1 operand. See https://reviews.llvm.org/D28057 for the whole discussion. Differential Revision: https://reviews.llvm.org/D28556 llvm-svn: 291891	2017-01-13 09:58:52 +00:00
Mohammed Agabaria	2c96c43388	[X86] updating TTI costs for arithmetic instructions on X86\SLM arch. updated instructions: pmulld, pmullw, pmulhw, mulsd, mulps, mulpd, divss, divps, divsd, divpd, addpd and subpd. special optimization case which replaces pmulld with pmullw\pmulhw\pshuf seq. In case if the real operands bitwidth <= 16. Differential Revision: https://reviews.llvm.org/D28104 llvm-svn: 291657	2017-01-11 08:23:37 +00:00
Eugene Zelenko	c9f1f6b8ec	[NVPTX] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC). llvm-svn: 291490	2017-01-09 22:16:51 +00:00
David Majnemer	5fa7d48bb8	[NVVMIntrRange] Only set range metadata if none is already present The range metadata inserted by NVVMIntrRange is pessimistic, range metadata already present could be more precise. llvm-svn: 290294	2016-12-22 00:51:59 +00:00
Justin Lebar	7853d3b9dd	[NVPTX] Remove dead #defines from NVPTXUtilities.h. llvm-svn: 289747	2016-12-15 00:45:06 +00:00
Justin Lebar	a54f4d7052	[NVPTX] Remove dead code. I've chosen to remove NVPTXInstrInfo::CanTailMerge but not NVPTXInstrInfo::isLoadInstr and isStoreInstr (which are also dead) because while the latter two are reasonably useful utilities, the former cannot be used safely: It relies on successful address space inference to identify writes to shared memory, but addrspace inference is a best-effort thing. llvm-svn: 289740	2016-12-14 23:20:40 +00:00
Justin Lebar	19bf9d2b6d	[NVPTX] Support .maxnreg annotation. Reviewers: tra Subscribers: llvm-commits, jholewinski Differential Revision: https://reviews.llvm.org/D27638 llvm-svn: 289729	2016-12-14 22:32:50 +00:00
Justin Lebar	e6867085fa	[NVPTX] Remove string constants from NVPTXBaseInfo.h. Summary: Previously they were defined as a 2D char array in a header file. This is kind of overkill -- we can let the linker lay out these strings however it pleases. While we're at it, we might as well just inline these constants where they're used, as each of them is used only once. Also move NVPTXUtilities.{h,cpp} into namespace llvm. Reviewers: tra Subscribers: jholewinski, mgorny, llvm-commits Differential Revision: https://reviews.llvm.org/D27636 llvm-svn: 289728	2016-12-14 22:32:44 +00:00
Stephan Bergmann	17c7f70362	Replace APFloatBase static fltSemantics data members with getter functions At least the plugin used by the LibreOffice build (<https://wiki.documentfoundation.org/Development/Clang_plugins>) indirectly uses those members (through inline functions in LLVM/Clang include files in turn using them), but they are not exported by utils/extract_symbols.py on Windows, and accessing data across DLL/EXE boundaries on Windows is generally problematic. Differential Revision: https://reviews.llvm.org/D26671 llvm-svn: 289647	2016-12-14 11:57:17 +00:00
Malcolm Parsons	06ac79c210	Fix Clang-tidy readability-redundant-string-cstr warnings Reviewers: beanz, lattner, jlebar Subscribers: jholewinski, llvm-commits, mehdi_amini Differential Revision: https://reviews.llvm.org/D26235 llvm-svn: 285832	2016-11-02 16:43:50 +00:00
Justin Lebar	ed1e312f05	[NVPTX] Remove NVPTXFavorNonGenericAddrSpaces pass. Summary: This has been replaced by the NVPTXInferAddressSpaces pass. We've had the new one as the default with the old one accessible via a flag for some months now, and we've had no problems. Reviewers: tra Subscribers: llvm-commits, jholewinski, jingyue, mgorny Differential Revision: https://reviews.llvm.org/D26165 llvm-svn: 285642	2016-10-31 21:51:42 +00:00
Justin Lebar	f0a80ba385	[NVPTX] Compute 'rem' using the result of 'div', if possible. Summary: In isel, transform Num % Den into Num - (Num / Den) * Den if the result of Num / Den is already available. Reviewers: tra Subscribers: hfinkel, llvm-commits, jholewinski Differential Revision: https://reviews.llvm.org/D26090 llvm-svn: 285461	2016-10-28 21:44:00 +00:00
Peter Collingbourne	6733564e5a	Target: Change various section classifiers in TargetLoweringObjectFile to take a GlobalObject. These functions are about classifying a global which will actually be emitted, so it does not make sense for them to take a GlobalValue which may for example be an alias. Change the Mach-O object writer and the Hexagon, Lanai and MIPS backends to look through aliases before using TargetLoweringObjectFile interfaces. These are functional changes but all appear to be bug fixes. Differential Revision: https://reviews.llvm.org/D25917 llvm-svn: 285006	2016-10-24 19:23:39 +00:00
Pavel Labath	51c454c1a9	Remove unused #includes of TimeValue.h. NFC. llvm-svn: 284975	2016-10-24 14:00:26 +00:00
Benjamin Kramer	2a8bef8769	Do a sweep over move ctors and remove those that are identical to the default. All of these existed because MSVC 2013 was unable to synthesize default move ctors. We recently dropped support for it so all that error-prone boilerplate can go. No functionality change intended. llvm-svn: 284721	2016-10-20 12:20:28 +00:00
Mehdi Amini	f42454b94b	Move the global variables representing each Target behind accessor function This avoids "static initialization order fiasco" Differential Revision: https://reviews.llvm.org/D25412 llvm-svn: 283702	2016-10-09 23:00:34 +00:00
Peter Collingbourne	2261d78cd2	Target: Remove unused patterns and transforms. NFC. llvm-svn: 283515	2016-10-07 00:30:49 +00:00
Mehdi Amini	117296c0a0	Use StringRef in Pass/PassManager APIs (NFC) llvm-svn: 283004	2016-10-01 02:56:57 +00:00
Eric Christopher	b4b75a531e	Update comment about initializing TLOF with a pointer at the previous line or the other commented out place. llvm-svn: 282673	2016-09-29 02:03:47 +00:00
Artem Belevich	3e1211581c	[NVPTX] Added intrinsics for atom.gen.{sys|cta}.* instructions. These are only available on sm_60+ GPUs. Differential Revision: https://reviews.llvm.org/D24943 llvm-svn: 282607	2016-09-28 17:25:38 +00:00
Jacques Pienaar	98345fc0a1	[NVPTX] Check if callsite is defined when computing argument alignment Summary: In getArgumentAlignment check if the ImmutableCallSite pointer CS is non-null before dereferencing. If CS is 0x0 fall back to the ABI type alignment else compute the alignment as before. Reviewers: eliben, jpienaar Subscribers: jlebar, vchuravy, cfe-commits, jholewinski Differential Revision: https://reviews.llvm.org/D9168 llvm-svn: 282045	2016-09-21 01:57:57 +00:00
Eric Christopher	b0ee4e04b3	Actually remove the Mangler from the AsmPrinter and clean up the places it was "used" but not used. llvm-svn: 281749	2016-09-16 17:07:23 +00:00
Eric Christopher	4367c7fb9a	Move the Mangler from the AsmPrinter down to TLOF and clean up the TLOF API accordingly. llvm-svn: 281708	2016-09-16 07:33:15 +00:00

693 Commits