447 Commits

Author SHA1 Message Date
Jan Vesely 7ab2d0bdcd shared: Implement aligned vector stores (vstorea_half)
Float version passes newly posted piglit tests on turks, float and double pass on carrizo.
v2: scalar vstorea_half
v3: fix typo

Reviewer: Aaron Watry
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 316291
2017-10-22 14:21:59 +00:00
Jan Vesely 12061c7125 shared: Implement aligned vector loads (vloada_half)
Passes newly posted piglits on turks and carrizo
v2: add scalar vloada_half
v3: fix typo

Reviewer: Aaron Watry
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 316290
2017-10-22 14:21:56 +00:00
Jan Vesely c420b61b26 amdgcn: Add missing datalayout info to .ll files
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Acked-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 316239
2017-10-20 21:10:18 +00:00
Jan Vesely 66b32ad9ad r600: Add missing datalayout to .ll files
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Acked-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 316238
2017-10-20 21:00:31 +00:00
Jan Vesely 577c52b9c7 travis: enable checks of nvptx libraries
Reviewer: Jeroen Ketema
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 315343
2017-10-10 18:10:25 +00:00
Jan Vesely 2601429bac travis: Enable external function call checks on llvm-{4,5}
Reviewer: Aaron Watry
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 315342
2017-10-10 18:10:24 +00:00
Jan Vesely 3d349ea98e Make image builtins r600/llvm-3.9 only
The implementation uses r600 specific intrinsics
LLVM-4 switched to _ro_t and _rw_t image types
Portions of the code can be moved back as more targets/llvm versions add image support

Reviewer: Aaron Watry
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 315341
2017-10-10 18:10:21 +00:00
Jeroen Ketema 1364d268a4 Implement mem_fence on ptx
PTX does not differentiate between read and write fences. Hence, these are
lowered to a mem_fence call. The mem_fence function compiles to the
"membar.cta" instruction, which commits all outstanding reads and writes
of a thread such that these become visible to all other threads in the same
CTA (i.e., work-group). The instruction does not differentiate between
global and local memory. Hence, the flags parameter is ignored, except
for deciding whether a "membar.cta" instruction should be issued at all.

Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 315235
2017-10-09 19:43:04 +00:00
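Note on the mem_fence commit above: the flag handling it describes can be sketched in plain C. This is a CPU-side illustration only, not the libclc source. The names `sketch_mem_fence`, `issue_fence`, and `fences_issued` are hypothetical, the flag values are assumed for the example, and a C11 `atomic_thread_fence` stands in for the PTX instruction.

```c
#include <assert.h>
#include <stdatomic.h>

/* Assumed flag values, chosen for illustration only. */
#define CLK_LOCAL_MEM_FENCE  0x1u
#define CLK_GLOBAL_MEM_FENCE 0x2u

/* Counts issued fences so the behaviour can be observed on a CPU. */
static int fences_issued = 0;

/* Stand-in for membar.cta: a single fence for local and global memory. */
static void issue_fence(void) {
    atomic_thread_fence(memory_order_seq_cst);
    ++fences_issued;
}

/* The flags only decide whether a fence is issued at all. */
void sketch_mem_fence(unsigned flags) {
    if (flags & (CLK_LOCAL_MEM_FENCE | CLK_GLOBAL_MEM_FENCE))
        issue_fence();
}

/* Read and write fences are not distinguished; both forward to mem_fence. */
void sketch_read_mem_fence(unsigned flags)  { sketch_mem_fence(flags); }
void sketch_write_mem_fence(unsigned flags) { sketch_mem_fence(flags); }

/* Usage example: zero flags issue nothing, any flag issues one fence. */
void sketch_mem_fence_examples(void) {
    int before = fences_issued;
    sketch_mem_fence(0);
    assert(fences_issued == before);
    sketch_read_mem_fence(CLK_LOCAL_MEM_FENCE);
    sketch_write_mem_fence(CLK_GLOBAL_MEM_FENCE);
    assert(fences_issued == before + 2);
}
```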
Jeroen Ketema 4f5a3d5d6f Make ptx barrier work irrespective of the cl_mem_fence_flags
This generates a "bar.sync 0" instruction, which not only causes the
threads to wait, but also acts as a memory fence, as required by
OpenCL. The fence does not differentiate between local and global
memory. Unfortunately, there is no similar instruction which does
not include a memory fence. Hence, we cannot optimize the case
where neither CLK_LOCAL_MEM_FENCE nor CLK_GLOBAL_MEM_FENCE is
passed.

llvm-svn: 315228
2017-10-09 18:36:48 +00:00
Jan Vesely 3c51ae5bd9 travis: Make sure we report failure even if only earlier checked files fail
The for loop would only report the status of the last command
v2: return '1'
    call test instead of '['

Reviewer: Jeroen Ketema
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 315193
2017-10-08 20:07:58 +00:00
Jan Vesely 136381dc38 check_external_calls.sh: Print number of calls in tested file.
Reviewer: Jeroen Ketema
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 315192
2017-10-08 20:07:56 +00:00
Jan Vesely 80bb52ae75 ptx: Use __clc_nextafter to implement nextafter
using clang builtin results in external library call

Reviewer: Jeroen Ketema
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 315191
2017-10-08 19:34:00 +00:00
Jan Vesely 1de1444d62 Do not include clc_nextafter header globally
Drop unused clc/math/clc_nextafter.h header

Reviewer: Jeroen Ketema
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 315190
2017-10-08 19:33:58 +00:00
Jan Vesely 6a5c8ddb3a math/nextafter: Use custom declaration inc file
Reviewer: Jeroen Ketema
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 315189
2017-10-08 19:33:55 +00:00
Jan Vesely 72be1cc0be math/binary_decl.inc: Do not declare mixed float/double functions
fmin/fmax only need vector/scalar mix

Reviewer: Jeroen Ketema
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 315188
2017-10-08 19:33:53 +00:00
Jan Vesely beb6591753 ldexp: Fix double precision function return type
Fixes ~1200 external calls from nvptx library.

Reviewer: Jeroen Ketema
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 315170
2017-10-08 06:56:14 +00:00
Jan Vesely 391305638c configure: Fix handling of directories with compats only source lists
Reviewer: Jeroen Ketema
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 315018
2017-10-05 20:16:28 +00:00
Jeroen Ketema 957151bd86 Add vload_half helpers for ptx
This removes the vload_half unresolved calls from the nvptx libraries.

Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 314998
2017-10-05 18:17:40 +00:00
Jeroen Ketema feefb0870f Add vstore_half helpers for ptx
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 314925
2017-10-04 19:07:48 +00:00
Jan Vesely a02d0e2c50 integer/sub_sat: Use clang builtin instead of llvm asm
reviewer: Tom Stellard

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 314703
2017-10-02 18:39:03 +00:00
Jan Vesely 1964df8fad integer/add_sat: Use clang builtin instead of llvm asm
reviewer: Tom Stellard

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 314702
2017-10-02 18:39:00 +00:00
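Note on the sub_sat and add_sat commits above: the log does not say which clang builtin is used. One plausible shape, assuming the overflow-checking builtins `__builtin_add_overflow` and `__builtin_sub_overflow` (available in clang and GCC), is sketched below in plain C. The function names are hypothetical and only the 32-bit signed case is shown.

```c
#include <assert.h>
#include <stdint.h>

/* Saturating add: clamp to the representable range on overflow. */
int32_t add_sat_i32(int32_t x, int32_t y) {
    int32_t r;
    if (__builtin_add_overflow(x, y, &r))
        /* Overflow with a negative y can only be an underflow. */
        return y < 0 ? INT32_MIN : INT32_MAX;
    return r;
}

/* Saturating subtract: same idea with the direction reversed. */
int32_t sub_sat_i32(int32_t x, int32_t y) {
    int32_t r;
    if (__builtin_sub_overflow(x, y, &r))
        /* Subtracting a positive y can only underflow. */
        return y > 0 ? INT32_MIN : INT32_MAX;
    return r;
}

/* Usage example: results stick at the limits instead of wrapping. */
void sat_examples(void) {
    assert(add_sat_i32(INT32_MAX, 1) == INT32_MAX);
    assert(add_sat_i32(INT32_MIN, -1) == INT32_MIN);
    assert(sub_sat_i32(INT32_MIN, 1) == INT32_MIN);
    assert(sub_sat_i32(0, INT32_MIN) == INT32_MAX);
    assert(add_sat_i32(2, 3) == 5);
}
```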
Jan Vesely 943057a288 integer/clz: Use clang builtin instead of llvm asm
The generated LLVM IR is mostly identical. The char/uchar case is a bit worse.

reviewer: Tom Stellard

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 314701
2017-10-02 18:38:57 +00:00
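Note on the clz commit above: a minimal plain C sketch of a builtin-based clz, assuming `__builtin_clz` and a 32-bit `unsigned int`. The function names are hypothetical. The 8-bit variant widens to 32 bits and subtracts the extra leading zeros, which may hint at why narrow types generate slightly more code. The result for a zero input is treated here as the bit width, which is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Count leading zeros of a 32-bit value. */
uint32_t clz_u32(uint32_t x) {
    /* __builtin_clz is undefined for 0, so handle that case explicitly. */
    return x ? (uint32_t)__builtin_clz(x) : 32u;
}

/* Count leading zeros of an 8-bit value. */
uint8_t clz_u8(uint8_t x) {
    /* Widen to 32 bits, then drop the 24 zeros added by widening. */
    return (uint8_t)(clz_u32(x) - 24u);
}

/* Usage example. */
void clz_examples(void) {
    assert(clz_u32(1u) == 31u);
    assert(clz_u32(0x80000000u) == 0u);
    assert(clz_u8(1u) == 7u);
    assert(clz_u8(0x80u) == 0u);
}
```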
Jeroen Ketema fe9fa89854 Let get_work_dim take exactly 0 arguments
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 314634
2017-10-01 20:11:46 +00:00
Jeroen Ketema 17fdf263c5 Do not circularly define NULL
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 314633
2017-10-01 20:10:14 +00:00
Jan Vesely 2b7fa1c6f6 Fix amdgcn-amdhsa on llvm-3.9
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Acked-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 314548
2017-09-29 19:06:52 +00:00
Jan Vesely aee030f284 travis: Check built libraries on llvm-3.9
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Acked-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 314547
2017-09-29 19:06:50 +00:00
Jan Vesely 8c8c287adf Add script to check for unresolved function calls
v2: add shell shebang
    improve error checks and reporting
v3: fix typo

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 314546
2017-09-29 19:06:48 +00:00
Jan Vesely 41b1500db0 geometric: geometric functions are only supported for vector lengths <=4
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 314545
2017-09-29 19:06:47 +00:00
Jan Vesely 8d08f01eff travis: add build using llvm-3.9
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Acked-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 314544
2017-09-29 19:06:45 +00:00
Jan Vesely ce29e8cde1 Restore support for llvm-3.9
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Acked-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 314543
2017-09-29 19:06:41 +00:00
Jan Vesely 3bb50f6f7b Add missing HAVE_LLVM define to fix build with latest llvm
Broken since r314111

V2: pointed out by Jan Vesely
   - Use format() instead of % formatting

Patch-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 314261
2017-09-26 23:15:54 +00:00
Jan Vesely 1fa727d615 Rework atomic ops to use clang builtins rather than llvm asm
reviewer: Aaron Watry

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 314112
2017-09-25 16:07:34 +00:00
Jan Vesely 760052047b prepare_builtins: Fix compile breakage with older LLVM
Fixes r314050

reviewer: Tom Stellard

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 314111
2017-09-25 16:04:37 +00:00
Reid Kleckner 3fc649cb76 [Support] Rename tool_output_file to ToolOutputFile, NFC
This class isn't similar to anything from the STL, so it shouldn't use
the STL naming conventions.

llvm-svn: 314050
2017-09-23 01:03:17 +00:00
Jan Vesely c9bbbe2403 Implement cl_khr_int64_extended_atomics builtins
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Aaron Watry <awatry@gmail.com>
Tested-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 313811
2017-09-20 20:42:19 +00:00
Jan Vesely 1c81f4b0e3 Implement cl_khr_int64_base_atomics builtins
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Aaron Watry <awatry@gmail.com>
Tested-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 313810
2017-09-20 20:42:14 +00:00
Jan Vesely d0320d5289 Add travis CI configuration file
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 313773
2017-09-20 17:28:58 +00:00
Aaron Watry e62f5fa64d Add native_recip(x) as ((1)/(x))
Signed-off-by: Aaron Watry <awatry@gmail.com>
Acked-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 313107
2017-09-13 01:40:25 +00:00
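Note on the native_recip commit above: the definition quoted in the subject line can be tried out in plain C as a macro. The extra parentheses keep compound arguments from being split by operator precedence. The helper `native_recip_examples` is hypothetical.

```c
#include <assert.h>

/* Definition as given in the commit subject. */
#define native_recip(x) ((1) / (x))

/* Usage example with float arguments. */
void native_recip_examples(void) {
    /* 1 is converted to float before the division. */
    assert(native_recip(4.0f) == 0.25f);
    /* The parentheses around x keep the sum together: 1 / (1 + 1). */
    assert(native_recip(1.0f + 1.0f) == 0.5f);
}
```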
Aaron Watry 415a60f303 integer: Add popcount implementation using ctpop intrinsic
Also copy/modify the unary_intrin.inc from math/ to make the
intrinsic declaration somewhat reusable.

Passes CL CTS integer_ops/test_integer_ops popcount tests for CL 1.2

Tested on GCN 1.0 (Pitcairn)

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 312854
2017-09-09 02:23:54 +00:00
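Note on the popcount commit above: in plain C the same population count is available through compiler builtins. The sketch below assumes `__builtin_popcount` and `__builtin_popcountll` with a 32-bit `unsigned int` and a 64-bit `unsigned long long`. The wrapper names are hypothetical and this is not the libclc declaration mechanism.

```c
#include <assert.h>
#include <stdint.h>

/* Number of set bits in a 32-bit value. */
uint32_t popcount_u32(uint32_t x) {
    return (uint32_t)__builtin_popcount(x);
}

/* Number of set bits in a 64-bit value. */
uint64_t popcount_u64(uint64_t x) {
    return (uint64_t)__builtin_popcountll(x);
}

/* Usage example. */
void popcount_examples(void) {
    assert(popcount_u32(0u) == 0u);
    assert(popcount_u32(0xF0u) == 4u);
    assert(popcount_u32(0xFFFFFFFFu) == 32u);
    assert(popcount_u64(UINT64_C(1) << 63) == 1u);
}
```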
Jan Vesely 285d2fb85c Implement vload_half{,n} and vload(half)
v2: add vload(half) as well
    make helpers amdgpu specific (NVPTX uses different private AS numbering)
    use clang builtin on clang >= 6

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <tstellar@redhat.com>
llvm-svn: 312839
2017-09-08 23:59:00 +00:00
Jan Vesely 661ac03a1b vstore: Cleanup and add vstore(half)
Add missing undefs
Make helpers amdgpu specific (NVPTX uses different numbering for private AS)
Use clang builtins on clang >= 6

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <tstellar@redhat.com>
llvm-svn: 312838
2017-09-08 23:58:57 +00:00
Jan Vesely b9dbaae3fb configure.py: Simplify compatibility sources
Just add the SOURCE_X.Y list to the list of sources if X.Y is the current llvm version.

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <tstellar@redhat.com>
llvm-svn: 312837
2017-09-08 23:58:53 +00:00
Jan Vesely 3d1db3de74 amdgcn,waitcnt: Add datalayout info
This file is only compiled for GCN which all share the same layout

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 312493
2017-09-04 15:52:07 +00:00
Jan Vesely e337b30c7d r600: Cleanup barrier implementation.
We don't have memory fences for r600 so just call group barrier directly
Make sure that barrier is called even with 0 flags

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 312492
2017-09-04 15:52:05 +00:00
Jan Vesely 1796d590c1 Fixup clc.h comment
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 312491
2017-09-04 15:52:03 +00:00
Aaron Watry 0bf96b1712 relational: Implement shuffle2 builtin
This was added in CL 1.1

Tested with a Radeon HD 7850 (Pitcairn) using the CL CTS via:
test_conformance/relationals/test_relationals shuffle_built_in_dual_input

v2: Add half support to shuffle2
    Move shuffle2 to misc/

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 312404
2017-09-02 02:23:28 +00:00
Aaron Watry 880f15dae6 relational: Implement shuffle builtin
This was added in CL 1.1

Tested with a Radeon HD 7850 (Pitcairn) using the CL CTS via:
test_conformance/relationals/test_relationals shuffle_built_in

v2: Add half-precision support to shuffle when available.
    Move to misc/ and add section 6.12.12 to clc.h

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 312403
2017-09-02 02:23:26 +00:00
Aaron Watry da8dfefd1c Add halfN types and enable fp16 when generating builtin declarations
Uses the same mechanism to enable fp16 as we use for fp64 when
processing clc.h

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 312402
2017-09-02 02:23:16 +00:00
Jan Vesely 999b1d9426 amdgcn: rewrite barrier() using fence and clang __builtin_amdgcn_s_barrier
Specs require using fences when barrier() is invoked:
"The barrier function will either flush any variables stored in local memory
or queue a memory fence to ensure correct ordering of memory operations to local memory."
and
"The barrier function will queue a memory fence to ensure correct ordering
of memory operations to global memory."

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Aaron Watry <awatry@gmail.com>
Tested-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 311022
2017-08-16 17:09:00 +00:00
Jan Vesely 1977092dc3 amdgcn: Implement {read_,write_,}mem_fence builtin
v2: add more detailed comment about waitcnt instruction

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Aaron Watry <awatry@gmail.com>
Tested-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 311021
2017-08-16 17:08:56 +00:00
Jan Vesely 7fc4c79fa5 configure.py: Drop explicit import of int builtin
I can't reproduce the error that made me add this.

Reported-by: Kim Gräsman <kim.grasman@gmail.com>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Kim Gräsman <kim.grasman@gmail.com>
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 310968
2017-08-15 22:24:05 +00:00
Jan Vesely a4a20cd2f3 configure.py: Make python3 friendly
mostly prints and exceptions.
Few behavioral changes are documented in the text
Generated Makefile is identical between python2 and python3

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 309820
2017-08-02 15:00:59 +00:00
Jan Vesely 09f0a560e1 add __kernel_exec macros
also consolidate macros into one file, and rename to clcmacros.h

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 309358
2017-07-28 03:39:03 +00:00
Jan Vesely 2f2a3bc0dc generic: add missing get_work_dim include
Fixes a few piglits since clang r304193

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 304556
2017-06-02 15:58:35 +00:00
Jan Vesely 9f7172965c math: Implement sinh function
mostly copied from amd_builtins

llvm-svn: 296233
2017-02-25 02:46:53 +00:00
Jan Vesely c3868c8f8d .gitignore: Ignore amdgcn-mesa object directory
llvm-svn: 296164
2017-02-24 20:32:18 +00:00
Aaron Watry dfec3c8e95 math: Add native_tan as wrapper to tan
Trivially define native_tan as a redirect to tan.

If there are any targets with a native implementation, we can deal with it later.

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Matt Arsenault <arsenm2@gmail.com>
llvm-svn: 295920
2017-02-23 01:46:57 +00:00
Jeroen Ketema 80d2e8ffc1 Move BufferPtr into the block where it is being used
The previous location outside the block would crash prepare-builtins
when the builtins file is accidentally not passed on the command line.

llvm-svn: 294916
2017-02-12 21:33:49 +00:00
Jeroen Ketema ed98e8d099 Add the correct prefixes to the cl_khr_fp64 pragma
llvm-svn: 294915
2017-02-12 21:31:41 +00:00
Matt Arsenault 9df2b9781c math: Add native_rsqrt builtin function
Trivial define to rsqrt.

Patch by Vedran Miletić <vedran@miletic.net>

llvm-svn: 294608
2017-02-09 18:39:26 +00:00
Aaron Watry c606efabb7 math: Add logb builtin
Ported from the amd-builtins branch.

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Matt Arsenault <Matthew.Arsenault@amd.com>
CC: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 292335
2017-01-18 03:14:10 +00:00
Aaron Watry 900bd7eb7f math: Add expm1 builtin function
Ported from the amd-builtins branch.

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Matt Arsenault <Matthew.Arsenault@amd.com>
CC: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 292334
2017-01-18 03:13:37 +00:00
Tom Stellard d83eb34ee7 Fix build since r286752.
llvm-svn: 286839
2016-11-14 16:06:33 +00:00
Tom Stellard 088faab429 Fix build since llvm r286566 and require at least llvm 4.0
llvm-svn: 286634
2016-11-11 21:34:47 +00:00
Jan Vesely 0a5aac3fc4 Provide vstore_half helper to workaround clc restrictions
clang won't accept half precision loads and stores without cl_khr_fp16 since r281904

llvm-svn: 282106
2016-09-21 20:15:55 +00:00
Tom Stellard 6b195ece57 configure: Add amdgcn-mesa-mesa3d target
llvm-svn: 281793
2016-09-16 22:43:33 +00:00
Tom Stellard f19cf403c4 amdgcn-amdhsa: Add get_num_groups implementation
llvm-svn: 281792
2016-09-16 22:43:31 +00:00
Tom Stellard e7ad23bad3 amdgcn-amdhsa: Add get_global_size() implementation
llvm-svn: 281791
2016-09-16 22:43:29 +00:00
Aaron Watry af569547fa math: Implement tgamma
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 281566
2016-09-15 00:17:34 +00:00
Aaron Watry e9009cdd21 math: Implement lgamma
Just use lgamma_r and ignore the value returned in the second argument

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 281565
2016-09-15 00:17:31 +00:00
Aaron Watry 0ab07e1bde math: Implement lgamma_r
Ported from the amd-builtins branch, which is itself based on the
Sun Microsystems implementation.

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 281564
2016-09-15 00:17:28 +00:00
Aaron Watry f969413a82 Add ADDR_SPACE parameter to _CLC_V_V_VP_VECTORIZE
This macro is currently unused, but I plan to use it shortly.

The previous form did casts of pointers without an address space, which
doesn't work so well for CL 1.x.

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 281563
2016-09-15 00:17:22 +00:00
Matt Arsenault fbfd828d2a Replace nextafter implementation
This one passes conformance.

llvm-svn: 280961
2016-09-08 16:37:56 +00:00
Jan Vesely eade17271a Avoid ambiguity in calling atom_add functions.
clang (since r280553) allows pointer casts in function overloads,
so we need to disambiguate the second argument.

clang might be smarter about overloads in the future
see https://reviews.llvm.org/D24113, but let's be safe in libclc anyway.

llvm-svn: 280871
2016-09-07 22:11:02 +00:00
Niels Ole Salscheider 63f71057c0 configure.py: Add polaris10 and polaris11
llvm-svn: 280121
2016-08-30 18:00:41 +00:00
Matt Arsenault 958fce3192 amdgcn: Fix return type of get_num_groups
llvm-svn: 279723
2016-08-25 07:31:40 +00:00
Matt Arsenault 7ef7e6aacd Strip opencl.ocl.version metadata
This should be uniqued when linking, but right now it creates
a lot of metadata spam listing the same version. This should also
probably be reporting the compiled version of the user program,
which may differ from the library. Currently the library IR files report
1.0 while 1.1/1.2 are the default for user programs.

llvm-svn: 279692
2016-08-25 00:25:10 +00:00
Matt Arsenault d0a275228e amdgcn: Also correct get_local_size type for HSA
llvm-svn: 279656
2016-08-24 19:11:52 +00:00
Matt Arsenault 26d9c41ff6 amdgcn: Fix return type for get_global_size
llvm-svn: 279644
2016-08-24 17:52:04 +00:00
Matt Arsenault 314364cbd2 amdgpu: Fix default case value for get_local_size
llvm-svn: 279359
2016-08-20 04:17:17 +00:00
Matt Arsenault 220268d177 amdgcn: Fix get_local_size IR return type
llvm-svn: 279350
2016-08-20 00:01:21 +00:00
Matt Arsenault 2ce3d94a01 amdgcn: Correct return types to be size_t
llvm-svn: 279343
2016-08-19 22:49:39 +00:00
Jan Vesely ad8672727c Implement vstore_half{,n}
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 278962
2016-08-17 20:02:11 +00:00
Jan Vesely 4c59714a52 Make min follow the OCL 1.0 specs
OpenCL 1.0: "Returns y if y < x, otherwise it returns x. If x *and* y
are infinite or NaN, the return values are undefined."

OpenCL 1.1+: "Returns y if y < x, otherwise it returns x. If x *or* y
are infinite or NaN, the return values are undefined."

The 1.0 version is stricter so use that one.

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 276704
2016-07-25 22:36:22 +00:00
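Note on the min commit above: the rule quoted from the specification, "returns y if y < x, otherwise it returns x", maps directly onto a comparison. The plain C sketch below only illustrates that rule and is not the libclc source. The name `min_sketch` is hypothetical. Because any comparison with NaN is false, a NaN in y yields x, while a NaN in x is returned as-is.

```c
#include <assert.h>
#include <math.h>

/* Returns y if y < x, otherwise x. */
float min_sketch(float x, float y) {
    return y < x ? y : x;
}

/* Usage example, including the NaN cases. */
void min_sketch_examples(void) {
    assert(min_sketch(1.0f, 2.0f) == 1.0f);
    assert(min_sketch(2.0f, 1.0f) == 1.0f);
    /* y is NaN: the comparison is false, so x is returned. */
    assert(min_sketch(1.0f, NAN) == 1.0f);
    /* x is NaN: the comparison is false, so the NaN in x is returned. */
    float r = min_sketch(NAN, 1.0f);
    assert(r != r);
}
```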
Tom Stellard d835b3f1af Implement cbrt builtin
This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

llvm-svn: 276497
2016-07-22 23:45:15 +00:00
Tom Stellard 9cb070f96a Implement cosh builtin
This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

llvm-svn: 276496
2016-07-22 23:45:13 +00:00
Tom Stellard ff13926a60 geometric/floatn.inc: Add vec8 and vec16 types
llvm-svn: 276495
2016-07-22 23:45:11 +00:00
Jan Vesely a82e080b57 AMDGPU: Implement get_global_offset builtin
Also fix get_global_id to consider offset
No idea how to add this for ptx, so they are stuck with the old get_global_id
implementation.

v2: split to a separate patch

v3: Switch R600 to use implicitarg.ptr

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 276443
2016-07-22 17:24:24 +00:00
Jan Vesely 74f02db922 AMDGPU: Use clang intrinsics for workitem builtins
v2: split into 2 patches
    use clang builtins for other intrinsics as well

v3: Fix warnings
    Switch r600 to use implicitarg.ptr

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 276442
2016-07-22 17:24:20 +00:00
Jan Vesely 7846c9b8f0 ptx: Fix builtin names after clang r274770
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Acked-By: Aaron Watry <awatry@gmail.com>
llvm-svn: 276423
2016-07-22 15:00:08 +00:00
Matt Arsenault 633d749da7 amdgpu: Use right builtin for rsq
The r600 path has never actually worked since double is not implemented
there.

llvm-svn: 276009
2016-07-19 19:02:01 +00:00
Matt Arsenault 1ab0d9c1ee R600: Use new barrier intrinsic
llvm-svn: 275874
2016-07-18 18:42:17 +00:00
Matt Arsenault b456c6dd56 Replace llvm.AMDGPU.ldexp with llvm.amdgcn.ldexp
It didn't really work on r600 to begin with, which should
get its own intrinsic.

llvm-svn: 275813
2016-07-18 16:42:50 +00:00
Jan Vesely e97deffb6a configure: Remove device specific defines
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <tom@stellard.net>
llvm-svn: 273044
2016-06-17 20:30:50 +00:00
Jan Vesely 5fd84d028d nvptx: Drop feature defines.
This is now handled by clang

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <tom@stellard.net>
llvm-svn: 273043
2016-06-17 20:30:49 +00:00
Jan Vesely 3317f253de 64 bit integers are legal in full profile without an extension
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <tom@stellard.net>
llvm-svn: 273042
2016-06-17 20:30:41 +00:00
Jan Vesely 973c1fa5f5 math: Use single precision fmax in sp path
Fixes fdim piglit on Turks

v2: use CL fmax instead of __builtin

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <tom.stellard@amd.com>
llvm-svn: 269807
2016-05-17 19:44:01 +00:00
Jan Vesely c374cb76f4 math: Add erf ported from amd-builtins
The scalar float/double function bodies are a direct copy/paste,
aside from the removed (optional) code in float function body that
requires subnormals.

reviewers: jvesely

Patch by: Vedran Miletić <rivanvx@gmail.com>

llvm-svn: 268766
2016-05-06 18:02:30 +00:00
Aaron Watry 55a8e0fd6d math: Add fdim implementation
Based on the amd-builtin, but explicitly vectorized for all sizes (not just
float4), and includes a vectorized double implementation.

Passes piglit (float) tests on pitcairn.

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 268708
2016-05-06 03:34:45 +00:00
Tom Stellard 6cb18a09b1 prepare-builtins: Remove call to getGlobalContext()
This function has been removed from LLVM.

Patch By: Laurent Carlier

llvm-svn: 266430
2016-04-15 14:18:58 +00:00