Commit Graph

177 Commits

Author SHA1 Message Date
Tom Stellard d9ca1f1596 configure: Add --enable-runtime-subnormal option
This makes it possible for runtime implementations to disable
subnormal handling at runtime.

When this flag is enabled, decisions about how to handle subnormals
in the library will be controlled by an external variable called
__CLC_SUBNORMAL_DISABLE.

Function implementations should use these new helpers for querying subnormal
support:
__clc_fp16_subnormals_supported();
__clc_fp32_subnormals_supported();
__clc_fp64_subnormals_supported();

In order for the library to link correctly with this feature,
users will be required to either:

1. Insert this variable into the module (if using the LLVM/Clang C++/C APIs).

2. Pass either subnormal_disable.bc or subnormal_use_default.bc to the
linker.  These files are distributed with liblclc and installed to
$(installdir).  e.g.:

llvm-link -o kernel-out.bc kernel.bc builtins-nosubnormal.bc subnormal_disable.bc

or

llvm-link -o kernel-out.bc kernel.bc builtins-nosubnormal.bc subnormal_use_default.bc

If you do not supply the --enable-runtime-subnormal then the library
behaves the same as it did before this commit.

In addition to these changes, the patch adds helper functions that
should be used when implementing library functions that need
special handling for denormals:

__clc_fp16_subnormals_supported();
__clc_fp32_subnormals_supported();
__clc_fp64_subnormals_supported();

llvm-svn: 235329
2015-04-20 18:49:50 +00:00
Tom Stellard da2969fca7 Implement atanh builtin
This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

llvm-svn: 234324
2015-04-07 16:20:22 +00:00
Tom Stellard ca4d382e11 Implement acosh builtin
This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

llvm-svn: 234323
2015-04-07 16:20:20 +00:00
Tom Stellard 03dc366e79 Implement atanpi builtin
This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

llvm-svn: 233928
2015-04-02 17:01:58 +00:00
Tom Stellard eea0997566 Implement asinpi builtin
This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

llvm-svn: 233927
2015-04-02 17:01:56 +00:00
Tom Stellard 2b4ef39b2f Implement asinh builtin
This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

llvm-svn: 233926
2015-04-02 17:01:54 +00:00
Tom Stellard 084124a8fa Implement acospi builtin
This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

llvm-svn: 233925
2015-04-02 17:01:52 +00:00
Tom Stellard 1ded220cc0 Implement fmax using __builtin_fmax
This ensures correct handling of NaNi.

This has been tested with piglit, OpenCV, and the ocl conformance tests.

llvm-svn: 233713
2015-03-31 16:59:23 +00:00
Tom Stellard 310da7bfd2 Implement fmin using __builtin_fmin
This ensures correct handling of NaN.

This has been tested with piglit, OpenCV, and the ocl conformance tests.

llvm-svn: 233712
2015-03-31 16:59:21 +00:00
Tom Stellard bd4da7a0ef Implement fast_distance builtin
This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

llvm-svn: 232978
2015-03-23 18:10:04 +00:00
Tom Stellard cb80e14f2c Implement fast_length builtin
This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

llvm-svn: 232977
2015-03-23 18:10:02 +00:00
Tom Stellard d2a1559846 Implement half_sqrt builtin v2
This is a generic implementation which just calls sqrt.  Targets should
override this if they want a faster implementation.

v2:
  - Alphabetize SOURCES

llvm-svn: 232965
2015-03-23 17:01:37 +00:00
Tom Stellard 551a669e80 Implement distance builtin v2
This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

v2:
  - Remove unnecessary copyright.

llvm-svn: 232964
2015-03-23 17:01:35 +00:00
Tom Stellard cb1c0d7939 Fix implementation of length builtin v2
v2:
  - Move common code into a macro
  - Use the same constant for all vector types.

llvm-svn: 232963
2015-03-23 17:01:33 +00:00
Tom Stellard 8d3a4e3af2 Add __clc_ prefix to functions in sincos_helpers.cl
This will help avoid naming conflicts with functions defined in
kernels linking with libclc.

llvm-svn: 232960
2015-03-23 16:20:24 +00:00
Aaron Watry 2cf4d5f312 math: Implement erfc
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 232674
2015-03-18 21:52:07 +00:00
Tom Stellard adfd96f742 Fix bitselect for float/double types v2
We need to reinterpret float/double types as uint/ulong in order to
perform the bitwise operations.

This has been tested with piglit, OpenCV, and the ocl conformance tests.

v2:
  - Use vector operations rather than splitting vectors into scalar
    components.

Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 231373
2015-03-05 15:31:05 +00:00
Aaron Watry 1314630ec3 Move mix from math to common
It has been part of the common functions since 1.0

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 231137
2015-03-03 21:25:08 +00:00
Tom Stellard 9d0d374c5b Implement step builtin
This has been tested with piglit, OpenCV, and the ocl conformance tests.

llvm-svn: 230970
2015-03-02 15:29:41 +00:00
Tom Stellard 1f28b14bba Implement smoothstep builtin v2
This has been tested with piglit, OpenCV, and the ocl conformance tests.

v2:
  - Fix typo in smoothstep.h

llvm-svn: 230969
2015-03-02 15:29:39 +00:00
Tom Stellard f5e5b0171d Implement radians builtin v2
This has been tested with piglit, OpenCV, and the ocl conformance tests.

v2:
  - Move to the common/ directory

llvm-svn: 230968
2015-03-02 15:29:37 +00:00
Tom Stellard 8336b3a604 Implement degrees builtin v2
This has been tested with piglit, OpenCV, and the ocl conformance tests.

v2:
  - Move to the common/ directory

llvm-svn: 230967
2015-03-02 15:29:35 +00:00
Aaron Watry f89bcca0b7 libclc/math: Add cospi
Ported from the libclc/amd-builtins branch

v2: Rename sincos_f_piby4 to __libclc__sincosf_piby4
    Add cospi(double) implementation instead of using llvm.cos

Notes:
The sincosD_piby4.h file is mostly the same as the builtin implementation
released by AMD. The inline attribute declaration is changed, and M_PI is
used instead of a constant double. Otherwise, the only difference is that
the header explicitly enables the fp64 pragma.

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jeroen Ketema <j.ketema@imperial.ac.uk>
CC: Tom Stellard <tom@stellard.net>
CC: Matt Arsenault <Matthew.Arsenault@amd.com>
llvm-svn: 230641
2015-02-26 15:42:00 +00:00
Jan Vesely 51702e6e75 Implement log10
v2: Use constant and multiplication instead of division
v3: Use hex constants

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 227585
2015-01-30 18:00:34 +00:00
Tom Stellard bf9f76fbe0 Implement log1p builtin
llvm-svn: 219230
2014-10-07 20:22:42 +00:00
Jan Vesely 8f64c3d842 Implement fmod
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <tom@stellard.net>
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 219087
2014-10-05 20:24:52 +00:00
Tom Stellard 081e778d22 Implement async_work_group_copy builtin v3
This is a simple implementation which just copies data synchronously.

v2:
  - Use size_t.

v3:
  - Fix possible race condition by splitting the copy among multiple
    work items.

llvm-svn: 219008
2014-10-03 19:49:39 +00:00
Tom Stellard ed5bbfdb1b Implement async_work_group_strided_copy builtin v2
This is a simple implementation which just copies data synchronously.

v2:
  - Use size_t.

llvm-svn: 219007
2014-10-03 19:49:37 +00:00
Tom Stellard b5064f79ef Implement wait_group_events builtin v2
This is a simple default implemetation which just calls barrier().

v2:
  - Only call barrier() once.

llvm-svn: 219006
2014-10-03 19:49:34 +00:00
Aaron Watry 0d976ba497 atomic: Add generic atom[ic]_cmpxchg
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 217918
2014-09-16 22:34:49 +00:00
Aaron Watry 025d79ad6c atomic: Implement generic atom[ic]_xchg
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 217917
2014-09-16 22:34:45 +00:00
Aaron Watry 7cfa12c2a5 atomic: Add generic atomic_min implementation
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 217916
2014-09-16 22:34:41 +00:00
Aaron Watry 3f0a1a4c27 atomic: Add generic atom[ic]_xor
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 217915
2014-09-16 22:34:36 +00:00
Aaron Watry 31e67d1cff atomic: Add atom[ic]_or
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 217914
2014-09-16 22:34:32 +00:00
Aaron Watry cc68405761 atomics: Add generic atom[ic]_and
Not used yet.

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 217913
2014-09-16 22:34:28 +00:00
Aaron Watry 49614fbfd9 atomic: Add generic implementation of atom[ic]_max
Not used yet...

v2: Correct int/uint behavior

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 217912
2014-09-16 22:34:24 +00:00
Aaron Watry c9b88d32be atomic: define extension functions for existing atomic implementations
We were missing the local versions of the atom_* before

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 217911
2014-09-16 22:34:21 +00:00
Aaron Watry 947bdd059a math: Add tan implementation
Uses the algorithm:
tan(x) = sin(x) / sqrt(1-sin^2(x))

An alternative is:
tan(x) = sin(x) / cos(x)

Which produces more verbose bitcode and longer assembly.

Either way, the generated bitcode seems pretty nasty and a more optimized
but still precise-enough solution is welcome.

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 217511
2014-09-10 15:43:35 +00:00
Aaron Watry 951ab64d19 math: Add asin implementation
asin(x) = atan2(x, sqrt( 1-x^2 ))

alternatively:
asin(x) = PI/2 - acos(x)

Use the atan2 implementation since it produces slightly shorter bitcode and
R600 machine code.

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 217510
2014-09-10 15:43:32 +00:00
Aaron Watry 268beab921 math: Add acos implementation
Passes the tests that were submitted to the piglit list

Tested on R600 (Pitcairn)

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 217509
2014-09-10 15:43:29 +00:00
Jan Vesely 05a60b7ac3 add isordered builtin
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 217247
2014-09-05 13:59:15 +00:00
Jan Vesely 63486c1f0e add isunordered builtin
v2: remove trailing newline

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 217246
2014-09-05 13:59:13 +00:00
Jan Vesely 41a0c491de add islessgreater builtin
v2: remove trailing newline

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 217245
2014-09-05 13:59:11 +00:00
Jan Vesely 369e20353c add isnormal builtin
v2: simplify and remove isnan leftovers
    remove trailing newline

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 217244
2014-09-05 13:59:09 +00:00
Jan Vesely a5a3b023b4 add isfinite builtin
v2: simplify and remove isinf leftovers
    remove trailing newline

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 217243
2014-09-05 13:59:06 +00:00
Tom Stellard 7a9e2c6879 Implement isinf builtin
llvm-svn: 217046
2014-09-03 15:55:40 +00:00
Tom Stellard d8a73abfc3 Fix implementation of copysign
This was previously implemented with a macro and we were using
__builtin_copysign(), which takes double inputs for the float
version of copysign().

Reviewed-and-Tested-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 217045
2014-09-03 15:55:38 +00:00
Jan Vesely ef513d392b Implement generic mad_sat
v2: Fix trailing whitespace
    Fix signed long overflow
    improve comment

v3: fix typo

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <tom@stellard.net>
llvm-svn: 216923
2014-09-02 17:55:02 +00:00
Aaron Watry 9447097636 Revert "Implement generic mad_sat"
This reverts commit cf62eded8b623a1c10d3692d25e5882b7939f564.

I didn't mean to commit this...  Jan has a v3 incoming

llvm-svn: 216322
2014-08-23 14:06:01 +00:00
Aaron Watry 6bfac7ae69 Implement generic mad_sat
v2: Fix trailing whitespace
    Fix signed long overflow
    improve comment

Signed-off-by: Jan Vesely <jan.vesely at rutgers.edu>
llvm-svn: 216320
2014-08-23 14:04:33 +00:00
Tom Stellard 2ad4243bf7 Implement prefetch builtin
The default implementation is a no-op.  Targets should override this
with their own implementations.

llvm-svn: 216127
2014-08-20 21:23:03 +00:00
Aaron Watry f991505d02 vload/vstore: Use casts instead of scalarizing everything in CLC version
This generates bitcode which is indistinguishable from what was
hand-written for int32 types in v[load|store]_impl.ll.

v4: Use vec2+scalar for vec3 load/stores to prevent corruption (per Tom)
v3: Also remove unused generic/lib/shared/v[load|store]_impl.ll
v2: (Per Matt Arsenault) Fix alignment issues with vector load stores

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
CC: Matt Arsenault <Matthew.Arsenault@amd.com>
CC: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 216069
2014-08-20 13:58:57 +00:00
Jan Vesely 12c660827e relational: Add islessequal(floatN) builtin
v2: remove the initial undef

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 214568
2014-08-01 21:50:59 +00:00
Jan Vesely acba2c98eb relational: Add isless(floatN) builtin
v2: remove the initial undef

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 214567
2014-08-01 21:50:55 +00:00
Tom Stellard 903a78b7c6 Implement sin builtin for float types
This double version still uses @llvm.sin.

llvm-svn: 213762
2014-07-23 15:16:21 +00:00
Tom Stellard c0ab2f81e3 Implement cos builtin for float types
The double version still uses @llvm.cos.

llvm-svn: 213761
2014-07-23 15:16:18 +00:00
Tom Stellard f9caca8b9d Implement atan2 builtin
llvm-svn: 213760
2014-07-23 15:16:16 +00:00
Tom Stellard 47882923c7 Implement atan builtin
llvm-svn: 213759
2014-07-23 15:16:13 +00:00
Aaron Watry d7f022a582 relational: Implement isnotequal
v2: Use relational macros instead of hand-rolled ones

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 213320
2014-07-17 22:07:32 +00:00
Aaron Watry 30102536c0 relational: Implement isgreaterequal
v2: Use relational macros instead of hand-rolled macros

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 213319
2014-07-17 22:07:27 +00:00
Aaron Watry 803a992f04 relational: Implement isgreater
v2: Use relational macros instead of hand-rolled macros

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 213318
2014-07-17 22:07:19 +00:00
Aaron Watry 9335fe8eff relational/signbit: Refactor to use relational macros
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 213317
2014-07-17 22:05:25 +00:00
Aaron Watry d5aace4874 Fix isnan definition for vector results
Vector true is -1, not 1, which means we need to use the relational unary
macro instead of the normal unary builtin one.

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 213316
2014-07-17 22:05:22 +00:00
Aaron Watry 13116cf01a relational: create re-usable macros for relational declarations
relational.h includes relational macros for defining functions which need to
return 1 for scalar true and -1 for vector true.

I believe that this is the only place that this behavior is required, so the
macro is placed at its lowest useful level (same directory as it is used in).

This also creates re-usable unary/binary declaration and floatn includes which
should simplify relational builtin declarations.

Mostly patterned off of include/math/[binary_decl|unary_decl|floatn].inc
but with required changes for relational functions.

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 213315
2014-07-17 22:05:16 +00:00
Aaron Watry d7f158e006 relational: Fix signbit
The vector components were mistakenly using () instead of {}, which caused
all but the last vector component to be dropped on the floor.

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jeroen Ketema <j.ketema@imperial.ac.uk>
llvm-svn: 211733
2014-06-25 21:08:38 +00:00
Aaron Watry d9ee196eab relational: Implement signbit
v2 Changes:
   - use __builtin_signbit instead of shifting by hand
   - significantly improve vector shuffling
   - Works correctly now for signbit(float16) on radeonsi

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 211696
2014-06-25 13:29:23 +00:00
Jeroen Ketema 42df5d2a8f Add exp10
Reviewed-by: Tom Stellard <tom@stellard.net>
llvm-svn: 211680
2014-06-25 10:06:35 +00:00
Jeroen Ketema 526fe2d501 Move clcmacro.h to avoid cluttering user namespace v2
v2: - use quotes instead of <>
    - add include to r600/lib/math/nextafter.c changed

Reviewed-by: Tom Stellard <tom@stellard.net>
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 211576
2014-06-24 09:36:32 +00:00
Jeroen Ketema bfdb1c0c2f Protect functions taking double by #ifdef cl_khr_fp64
Also change the order of the functions to be consistent with
the order in the header files.

llvm-svn: 211496
2014-06-23 14:15:39 +00:00
Jeroen Ketema 09516fa27d Add pown
Reviewed-by: Tom Stellard <tom@stellard.net>
llvm-svn: 211211
2014-06-18 19:42:23 +00:00
Aaron Watry d9afe9def0 Fix definition of INFINITY and add NAN/HUGE_VAL[F]
v3: change __builtin_nanf() to __builtin_nanf("")
    This doesn't work yet, but it was agreed to commit as-is with the logic
    that "broken" is better than "completely missing" and this should be
    fixed in clang.

v2: use __builtin_inff() and also add nan/huge_val definitions

Signed-off-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 211065
2014-06-16 22:32:58 +00:00
Aaron Watry 6af2969a61 math: Implement mix builtin
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <tom@stellard.net>
llvm-svn: 211047
2014-06-16 19:53:59 +00:00
Aaron Watry f7f79d2a94 relational: Add isequal(floatN) builtin
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <tom@stellard.net>
llvm-svn: 211046
2014-06-16 19:53:57 +00:00
Aaron Watry e167db9238 Add all(igentype) builtin
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <tom@stellard.net>
llvm-svn: 211045
2014-06-16 19:53:54 +00:00
Jeroen Ketema e2a0f050d8 Add files forgotten in the previous commit
llvm-svn: 210896
2014-06-13 12:33:40 +00:00
Jeroen Ketema 82aaa41286 Implementations for exp(float) and exp(double) v2
Use separate implementations instead of a macro
to ensure the constant multiplied with is of
higher precision.

v2: Use the correct formula, spotted by Dan Liew <daniel.liew@imperial.ac.uk>

Reviewed-by: Aaron Warty <awatry@gmail.com>
Reviewed-by: Tom Stellard <tom@stellard.net>
llvm-svn: 210891
2014-06-13 09:40:09 +00:00
Tom Stellard 3a12fc6a07 Add sincos
Patch by: Jeroen Ketema

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 204478
2014-03-21 16:22:01 +00:00
Tom Stellard 074e7a8ed0 Add cross for double3 and double4
Patch by: Jeroen Ketema

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 204477
2014-03-21 16:21:58 +00:00
Tom Stellard 457e35912e Implement builtins for cl_khr_global_int32_base_atomics extension
llvm-svn: 195021
2013-11-18 18:21:23 +00:00
Tom Stellard 3a9632d544 s/_CLC_DECL/_CLC_DEF/
Some function definitions were using _CLC_DECL, which meant that they
weren't being marked as always_inline.

Reviewed-by and Tested-by: Aaron Watry <awatry@gmail.com>

llvm-svn: 193754
2013-10-31 15:50:53 +00:00
Tom Stellard f21e3ea972 Port pocl's gen_convert.py script to libclc
This script generates implementations for the entire set of convert_*
functions,

llvm-svn: 192385
2013-10-10 19:09:01 +00:00
Tom Stellard 436bf70519 Implement sign() builtin
llvm-svn: 192384
2013-10-10 19:08:56 +00:00
Tom Stellard 6c7b86c106 Implement nextafter() builtin
There are two implementations of nextafter():
1. Using clang's __builtin_nextafter.  Clang replaces this builtin with
a call to nextafter which is part of libm.  Therefore, this
implementation will only work for targets with an implementation of
libm (e.g. most CPU targets).

2. The other implementation is written in OpenCL C.  This function is
known internally as __clc_nextafter and can be used by targets that
don't have access to libm.

llvm-svn: 192383
2013-10-10 19:08:51 +00:00
Tom Stellard e36e9dec65 Implement isnan() builtin
llvm-svn: 192382
2013-10-10 19:08:41 +00:00
Aaron Watry 283e3fa011 Add atomic_sub and atomic_dec builtin functions
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 190201
2013-09-06 20:20:21 +00:00
Aaron Watry 50a7bcbac9 Add atomic_inc and atomic_add builtins
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 190058
2013-09-05 16:04:01 +00:00
Aaron Watry fbe439f8c0 Add mul_hi implementation [v2]
Everything except long/ulong is handled by just casting to the next larger type,
doing the math and then shifting/casting the result.

For 64-bit types, we break the high/low parts of each operand apart, and do
a FOIL-based multiplication.

v2:
  Discard the stack-overflow implementation due to copyright concerns.
  - The implementation is still FOIL-based, but discards the previous code.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 188684
2013-08-19 18:31:49 +00:00
Aaron Watry 8548725f29 Add rhadd builtin
rhadd = (x+y+1)>>1

Implemented as:
(x>>1) + (y>>1) + ((x&1)|(y&1))

This prevents us having to do assembly addition and overflow detection

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 188477
2013-08-15 19:21:10 +00:00
Aaron Watry 7659157f1b Add hadd builtin
(x + y) >> 1 gets changed to:
(x>>1) + (y>>1) + (x&y&1)

Saves us having to do any llvm assembly and overflow checking in the addition.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 188476
2013-08-15 19:21:07 +00:00
Aaron Watry 0c21c7c747 Add intN vloadN() implementations for address spaces 3 and 4
Not hooked up to R600 yet due to current lack of support, at least on EG.

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 188181
2013-08-12 14:42:51 +00:00
Aaron Watry 7d52565321 Add vload* for addrspace(2) and use as constant load for R600
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 188179
2013-08-12 14:42:49 +00:00
Tom Stellard 41ef85df0a Add some missing convert_* functions
llvm-svn: 188131
2013-08-10 03:40:37 +00:00
Aaron Watry 1769b1fca9 Implement generic upsample()
Reduces all vector upsamples down to its scalar components, so probably
not the most efficient thing in the world, but it does what the
spec says it needs to do.

Another possible implementation would be to convert/cast everything as
unsigned if necessary, upsample the input vectors, create the upsampled
value, and then cast back to signed if required.

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard at amd.com>
llvm-svn: 186691
2013-07-19 16:44:37 +00:00
Aaron Watry 99a2f3b274 Fix and re-enable R600 vload/vstore assembly
The assembly optimizations were making unsafe assumptions about which address
spaces had which identifiers.

Also, fix vload/vstore with 64-bit pointers. This was broken previously on
Radeon SI.

This version still only has assembly versions of int/uint 2/4/8/16 for global
loads and stores on R600, but it does it in a way that would be very easily
extended to private/local/constant and could also be handled easily on other
architectures.

v2: 1) Leave v[load|store]_impl.ll in generic/lib
    2) Remove vload_if.ll and vstore_if.ll interfaces
    3) Fix address+offset calculations
    3) Remove offset from assembly arg list
llvm-svn: 186416
2013-07-16 14:29:01 +00:00
Aaron Watry 4cb7cf276d libclc: vload/vstore disable assembly and fix offset calculation
This commit gets us back to pure CLC and fixes offset calculations.

The next commit will re-enable the assembly implementation for R600,
fix bugs related to 64-bit address spaces, and also fix the
incorrect assumption that address space identifiers are the same in
all architectures.

llvm-svn: 186415
2013-07-16 14:28:58 +00:00
Tom Stellard 6f33168bb7 Implement mad24() and mul24() builtins
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 185839
2013-07-08 17:27:13 +00:00
Tom Stellard d768ac0395 Add __CLC_ prefix to all macro definitions in headers
libclc was defining and undefing GENTYPE and several other macros with
common names in its header files.  This was preventing applications from
defining macros with identical names as command line arguments to the
compiler, because the definitions in the header files were masking the
macros defined as compiler arguements.

Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 185838
2013-07-08 17:27:02 +00:00
Tom Stellard 64b3bbae1e libclc: Add assembly versions of vstore for global [u]int4/8/16
The assembly should be generic, but at least currently R600 only supports
32-bit stores of [u]int1/4, and I believe that only global is well-supported.

R600 lowers the 8/16 component stores to multiple 4-component stores.

The unoptimized C versions of the other stuff is left in place.

Patch by: Aaron Watry

llvm-svn: 185009
2013-06-26 18:22:20 +00:00
Tom Stellard 922ac056e3 libclc: Add assembly versions of vload for global int4/8/16
The assembly should be generic, but at least currently R600 only supports
32-bit loads of int1/4, and I believe that only global is well-supported.

R600 lowers the 8/16 component vectors to multiple 4-bit loads.

The unoptimized C versions of the other stuff is left in place.

Patch by: Aaron Watry

llvm-svn: 185008
2013-06-26 18:22:15 +00:00
Tom Stellard 51441f80c5 libclc: Initial vstore implementation
Assumes that the target supports byte-addressable stores.

Completely unoptimized.

Patch by: Aaron Watry

llvm-svn: 185007
2013-06-26 18:22:11 +00:00