Commit Graph

335 Commits

Author SHA1 Message Date
Aaron Watry 8872800eff math: Add frexp ported from amd-builtins
The float implementation is almost a direct port from the amd-builtins,
but instead of just having a scalar and float4 implementation, it has
a scalar and arbitrary width vector implementation.

The double scalar is also a direct port from AMD's builtin release.

The double vector implementation copies the logic in the float vector
implementation using the values from the double scalar version.

Both have been tested in piglit using tests sent to that project's
mailing list.

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 260114
2016-02-08 17:07:21 +00:00
Tom Stellard 37d19875fa Implement modf math builtin
V2: use the reference implementation as suggested by Matt Arsenault

Patch By: Pavel Ondračka

llvm-svn: 258933
2016-01-27 14:52:10 +00:00
Tom Stellard a249f50970 Add _CLC_V_V_VP_VECTORIZE macro
Patch by: Pavel Ondračka

llvm-svn: 258932
2016-01-27 14:52:07 +00:00
Tom Stellard 0abd28140b AMDGPU: Add aliases for all VI targets
llvm-svn: 255663
2015-12-15 18:37:04 +00:00
Tom Stellard 655680a22b AMDGPU: Add alias for tonga
Patch by: Vedran Mileti

llvm-svn: 255662
2015-12-15 18:37:02 +00:00
Aaron Watry 23faa5a1f9 integer: remove explicit casts from _MIN definitions
The spec says (section 6.12.3, CL version 1.2):
  The macro names given in the following list must use the values
  specified. The values shall all be constant expressions suitable
  for use in #if preprocessing directives.

This commit addresses the second part of that statement.

Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <tom@stellard.net>
CC: Moritz Pflanzer <moritz.pflanzer14@imperial.ac.uk>
CC: Serge Martin <edb+libclc@sigluy.net>
llvm-svn: 249445
2015-10-06 19:12:12 +00:00
Niels Ole Salscheider f51df5ba8c Implement tanh builtin
This is a port from the AMD builtin library.

llvm-svn: 248780
2015-09-29 06:39:09 +00:00
Tom Stellard e2bab44ca9 Add sampler defines.
Patch by: Zoltan Gilian

llvm-svn: 248163
2015-09-21 14:59:58 +00:00
Tom Stellard 50dfd44577 Add image attribute defines.
Patch by: Zoltan Gilian

llvm-svn: 248162
2015-09-21 14:59:57 +00:00
Tom Stellard a59fd49ba4 r600: Add image writing builtins.
Patch by: Zoltan Gilian

llvm-svn: 248161
2015-09-21 14:59:56 +00:00
Tom Stellard 9a7d4a940f r600: Add image reading builtins.
Patch by: Zoltan Gilian

llvm-svn: 248160
2015-09-21 14:59:54 +00:00
Tom Stellard ccc0ec1ddb Add image attribute getter builtins
Added get_image_* OpenCL builtins to the headers.
Added implementation to the r600 target.

Patch by: Zoltan Gilian

llvm-svn: 248159
2015-09-21 14:47:53 +00:00
Aaron Watry 43ee367d1e integer: Update integer limits to comply with spec
The values for the char/short/integer/long minimums were declared with
their actual values, not the definitions from the CL spec (v1.1).  As
a result, (-2147483648) was actually being treated as a long by the
compiler, not an int, which caused issues when trying to add/subtract
that value from a vector.

Update the definitions to use the values declared by the spec, and also
add explicit casts for the char/short/int minimums so that the compiler
actually treats them as shorts/chars. Without those casts, they
actually end up stored as integers, and the compiler may end up storing
the INT_MIN as a long.

The compiler can sign extend the values if it needs to convert the
char->short, short->int, or int->long

v2: Add explicit cast for INT_MIN and fix some type-o's and wrapping
    in the commit message.

Reported-by: Moritz Pflanzer <moritz.pflanzer14@imperial.ac.uk>
CC: Moritz Pflanzer <moritz.pflanzer14@imperial.ac.uk>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Signed-off-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 247661
2015-09-15 03:56:21 +00:00
Peter Collingbourne 008ff14acf Update mailing list reference.
llvm-svn: 245894
2015-08-24 22:43:24 +00:00
Jeroen Ketema d7be603ab1 Remove files accidentally not removed in r244310
llvm-svn: 244987
2015-08-13 23:43:12 +00:00
Jeroen Ketema d915739b38 Require LLVM >=3.7 and bump version to 0.2.0
v2: Also remove LLVM 3.6 traces from prepare-builtins.cpp

Patch by: EdB

llvm-svn: 244310
2015-08-07 08:31:37 +00:00
Tom Stellard 7a09e88b6e Fix double implementation of log
We need to use M_LOG2E instead of M_LOG2E_F.

llvm-svn: 243132
2015-07-24 18:07:14 +00:00
Tom Stellard 44b6117dfd Implement accurate log2 function
Use the implementation was ported from the AMD builtin library rather
than LLVM Intrinsics.

This has been tested with piglit, OpenCV, and the ocl conformance tests.

llvm-svn: 243131
2015-07-24 18:07:12 +00:00
Tom Stellard f01ffa9ddc Use llvm intrinsics for native_log and native_log2
llvm-svn: 243130
2015-07-24 18:07:06 +00:00
Tom Stellard 4f8d26230c R600: Implement accurate double precision sqrt v2
v2:
  - Use same implementation for R600 and gcn.

llvm-svn: 241907
2015-07-10 13:37:08 +00:00
Tom Stellard 2ef5ec6b2b Fix implementation of sqrt v2
Passing values less than 0 to the llvm.sqrt() intrinsic results in
undefined behavior, so we need to check the input and return NaN if
is is less than 0.

v2:
  - Fix build failures.

llvm-svn: 241906
2015-07-10 13:37:07 +00:00
Tom Stellard 0e4152e391 prepare-builtins: Fix build with LLVM 3.6
Patch by: Tomasz Borowik

llvm-svn: 241905
2015-07-10 13:37:04 +00:00
Jeroen Ketema 2697d2a43b Properly initialize Module pointer
llvm-svn: 240881
2015-06-27 12:35:54 +00:00
Tom Stellard 4957e4057d prepare-builtins: Fix build with LLVM 3.7
llvm-svn: 240552
2015-06-24 17:03:50 +00:00
Tom Stellard a64bad8338 Use a more accurate implementation for exp
Using exp2(x * M_LOG2E_F) does not give us accurate enough results for
OpenCL.  If you look at the new exp implementation you'll see that
it does multiply the input by M_LOG2E_F, but it still uses the original
input in part of the calculation.

This exp implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

llvm-svn: 237229
2015-05-13 03:55:09 +00:00
Tom Stellard d538fdc217 Implement exp2 using OpenCL C rather than using an intrinsic
Not all targets support the intrinsic, so it's better to have a
generic implementation which does not use it.

This exp2 implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

llvm-svn: 237228
2015-05-13 03:55:07 +00:00
Tom Stellard 4294541290 Implement sin for double types
This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

llvm-svn: 237155
2015-05-12 17:18:47 +00:00
Tom Stellard 2e6ff0c66e Implement cos for double types
This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

llvm-svn: 237154
2015-05-12 17:18:46 +00:00
Tom Stellard 37406a209c Implement atan2pi builtin
This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

llvm-svn: 237138
2015-05-12 14:48:26 +00:00
Tom Stellard 79cc3eda1e Implement atan2 for doubles
This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

llvm-svn: 237131
2015-05-12 13:48:51 +00:00
Jan Vesely b0fb990b54 math: limit half_sqrt to single precision
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 236941
2015-05-09 22:31:03 +00:00
Jan Vesely 7c829fe149 geometric: Limit fast_{distance,length} functions to single precision
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 236940
2015-05-09 22:31:01 +00:00
Jan Vesely 071833d454 Fix ldexp fp64 build error
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 236939
2015-05-09 22:30:59 +00:00
Tom Stellard 17ec3a51c3 Implement fast_normalize builtin v4
This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

v2:
  - Remove f suffix from constant in double implementations.
  - Consolidate implementations using the .cl/.inc approach.

v3:
 - Use __CLC_FPSIZE instead of __CLC_FP{32,64}

v4 (Jan Vesely):
 - Limit to single precision.

llvm-svn: 236920
2015-05-09 00:04:12 +00:00
Tom Stellard 2ddfa0c5b2 Implement half_rsqrt builtin v3
This is a generic implementation which just calls rsqrt.
Targets should override this if they want a faster implementation.

v2:
  - Alphabettize SOURCES

v3 (Jan Vesely):
  Limit to single precision types.

llvm-svn: 236915
2015-05-08 23:28:44 +00:00
Jan Vesely 30978bb99d r600: Use __clc_ldexp on asics that don't implement the intruction
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 236649
2015-05-06 21:59:30 +00:00
Jan Vesely 90e7ad589e Move ldexp soft implementation to a separate file
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 236648
2015-05-06 21:59:29 +00:00
Jan Vesely bc81ebefb7 Implement sinpi builtin
Ported from AMD builtin library, passes piglit on Turks.

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 236647
2015-05-06 21:59:26 +00:00
Tom Stellard 2ca909d824 math: Add ldexp implementation
Signed-off-by: Aaron Watry <awatry@gmail.com>

Tom Stellard:
  - Add denormal handling.
  - Share vectorization code with r600 implementation.

Patch By: Aaron Watry

llvm-svn: 236639
2015-05-06 20:53:32 +00:00
Tom Stellard f30d5fc01d Implement ldexp for R600/SI
llvm-svn: 236638
2015-05-06 20:53:29 +00:00
Tom Stellard aed5f3cf7e Fix implementation of normalize builtin
The new implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

llvm-svn: 236608
2015-05-06 16:06:31 +00:00
Tom Stellard ba742f58af Allow compilation depending to the LLVM version
It allows to keep temporary compatibilty with older version.
For exemple, this can be use when change are not to large.

Patch by: EdB

llvm-svn: 236113
2015-04-29 15:37:06 +00:00
Jan Vesely 44e768e777 Fix compilation warnings without cl_khr_fp64
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 235762
2015-04-24 19:54:17 +00:00
Tom Stellard 9447de37a9 Implement fract builtin
This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

llvm-svn: 235620
2015-04-23 18:50:14 +00:00
Tom Stellard d9ca1f1596 configure: Add --enable-runtime-subnormal option
This makes it possible for runtime implementations to disable
subnormal handling at runtime.

When this flag is enabled, decisions about how to handle subnormals
in the library will be controlled by an external variable called
__CLC_SUBNORMAL_DISABLE.

Function implementations should use these new helpers for querying subnormal
support:
__clc_fp16_subnormals_supported();
__clc_fp32_subnormals_supported();
__clc_fp64_subnormals_supported();

In order for the library to link correctly with this feature,
users will be required to either:

1. Insert this variable into the module (if using the LLVM/Clang C++/C APIs).

2. Pass either subnormal_disable.bc or subnormal_use_default.bc to the
linker.  These files are distributed with liblclc and installed to
$(installdir).  e.g.:

llvm-link -o kernel-out.bc kernel.bc builtins-nosubnormal.bc subnormal_disable.bc

or

llvm-link -o kernel-out.bc kernel.bc builtins-nosubnormal.bc subnormal_use_default.bc

If you do not supply the --enable-runtime-subnormal then the library
behaves the same as it did before this commit.

In addition to these changes, the patch adds helper functions that
should be used when implementing library functions that need
special handling for denormals:

__clc_fp16_subnormals_supported();
__clc_fp32_subnormals_supported();
__clc_fp64_subnormals_supported();

llvm-svn: 235329
2015-04-20 18:49:50 +00:00
Tom Stellard da2969fca7 Implement atanh builtin
This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

llvm-svn: 234324
2015-04-07 16:20:22 +00:00
Tom Stellard ca4d382e11 Implement acosh builtin
This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

llvm-svn: 234323
2015-04-07 16:20:20 +00:00
Tom Stellard 03dc366e79 Implement atanpi builtin
This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

llvm-svn: 233928
2015-04-02 17:01:58 +00:00
Tom Stellard eea0997566 Implement asinpi builtin
This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

llvm-svn: 233927
2015-04-02 17:01:56 +00:00
Tom Stellard 2b4ef39b2f Implement asinh builtin
This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

llvm-svn: 233926
2015-04-02 17:01:54 +00:00
Tom Stellard 084124a8fa Implement acospi builtin
This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

llvm-svn: 233925
2015-04-02 17:01:52 +00:00
Tom Stellard 1ded220cc0 Implement fmax using __builtin_fmax
This ensures correct handling of NaNi.

This has been tested with piglit, OpenCV, and the ocl conformance tests.

llvm-svn: 233713
2015-03-31 16:59:23 +00:00
Tom Stellard 310da7bfd2 Implement fmin using __builtin_fmin
This ensures correct handling of NaN.

This has been tested with piglit, OpenCV, and the ocl conformance tests.

llvm-svn: 233712
2015-03-31 16:59:21 +00:00
Tom Stellard bd4da7a0ef Implement fast_distance builtin
This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

llvm-svn: 232978
2015-03-23 18:10:04 +00:00
Tom Stellard cb80e14f2c Implement fast_length builtin
This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

llvm-svn: 232977
2015-03-23 18:10:02 +00:00
Tom Stellard d2a1559846 Implement half_sqrt builtin v2
This is a generic implementation which just calls sqrt.  Targets should
override this if they want a faster implementation.

v2:
  - Alphabetize SOURCES

llvm-svn: 232965
2015-03-23 17:01:37 +00:00
Tom Stellard 551a669e80 Implement distance builtin v2
This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

v2:
  - Remove unnecessary copyright.

llvm-svn: 232964
2015-03-23 17:01:35 +00:00
Tom Stellard cb1c0d7939 Fix implementation of length builtin v2
v2:
  - Move common code into a macro
  - Use the same constant for all vector types.

llvm-svn: 232963
2015-03-23 17:01:33 +00:00
Tom Stellard 8d3a4e3af2 Add __clc_ prefix to functions in sincos_helpers.cl
This will help avoid naming conflicts with functions defined in
kernels linking with libclc.

llvm-svn: 232960
2015-03-23 16:20:24 +00:00
Aaron Watry 2cf4d5f312 math: Implement erfc
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 232674
2015-03-18 21:52:07 +00:00
Tom Stellard adfd96f742 Fix bitselect for float/double types v2
We need to reinterpret float/double types as uint/ulong in order to
perform the bitwise operations.

This has been tested with piglit, OpenCV, and the ocl conformance tests.

v2:
  - Use vector operations rather than splitting vectors into scalar
    components.

Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 231373
2015-03-05 15:31:05 +00:00
Aaron Watry 1314630ec3 Move mix from math to common
It has been part of the common functions since 1.0

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 231137
2015-03-03 21:25:08 +00:00
Tom Stellard 9d0d374c5b Implement step builtin
This has been tested with piglit, OpenCV, and the ocl conformance tests.

llvm-svn: 230970
2015-03-02 15:29:41 +00:00
Tom Stellard 1f28b14bba Implement smoothstep builtin v2
This has been tested with piglit, OpenCV, and the ocl conformance tests.

v2:
  - Fix typo in smoothstep.h

llvm-svn: 230969
2015-03-02 15:29:39 +00:00
Tom Stellard f5e5b0171d Implement radians builtin v2
This has been tested with piglit, OpenCV, and the ocl conformance tests.

v2:
  - Move to the common/ directory

llvm-svn: 230968
2015-03-02 15:29:37 +00:00
Tom Stellard 8336b3a604 Implement degrees builtin v2
This has been tested with piglit, OpenCV, and the ocl conformance tests.

v2:
  - Move to the common/ directory

llvm-svn: 230967
2015-03-02 15:29:35 +00:00
Aaron Watry f89bcca0b7 libclc/math: Add cospi
Ported from the libclc/amd-builtins branch

v2: Rename sincos_f_piby4 to __libclc__sincosf_piby4
    Add cospi(double) implementation instead of using llvm.cos

Notes:
The sincosD_piby4.h file is mostly the same as the builtin implementation
released by AMD. The inline attribute declaration is changed, and M_PI is
used instead of a constant double. Otherwise, the only difference is that
the header explicitly enables the fp64 pragma.

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jeroen Ketema <j.ketema@imperial.ac.uk>
CC: Tom Stellard <tom@stellard.net>
CC: Matt Arsenault <Matthew.Arsenault@amd.com>
llvm-svn: 230641
2015-02-26 15:42:00 +00:00
Jan Vesely 51702e6e75 Implement log10
v2: Use constant and multiplication instead of division
v3: Use hex constants

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 227585
2015-01-30 18:00:34 +00:00
Tom Stellard 0f39721261 Use amdgcn triple for SI+ GPUs
llvm-svn: 225296
2015-01-06 20:42:12 +00:00
Tom Stellard 24ea64e050 r600: get_work_dim: Update metadata syntax for LLVM 3.6
llvm-svn: 225042
2014-12-31 15:27:59 +00:00
Tom Stellard 1d77071e0d Require LLVM 3.6 and bump version to 0.1.0
Some functions are implemented using hand-written LLVM IR, and
LLVM assembly format is allowed to change between versions, so we
should require a specific version of LLVM.

llvm-svn: 225041
2014-12-31 15:27:53 +00:00
Jeroen Ketema c9526139bc Remove wrong semi-colons
Patch by Alastair Donaldson

llvm-svn: 224568
2014-12-19 09:18:23 +00:00
Jeroen Ketema 7a22aebbda Don't include <stddef.h>
Including a standard or system header isn't allowed in OpenCL.

The type "size_t" needs to be explicitely defined now.

v2: Use __SIZE_TYPE__ instead of unsigned int.
v3: Define ptrdiff_t and NULL.

Patch-by: Jean-Sébastien Pédron
Reviewed-by: Jeroen Ketema
Reviewed-by: Jan Vesely
llvm-svn: 222235
2014-11-18 14:19:27 +00:00
NAKAMURA Takumi 729be14435 Prune CRLF.
llvm-svn: 220678
2014-10-27 12:37:26 +00:00
Jan Vesely ae50c89589 r600: Fix get_work_dim range metadata
Reviewed-by: Matt Arsenault <Matthew.Arsenault@amd.com>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 220388
2014-10-22 14:32:53 +00:00
Jan Vesely 260827caa2 r600: Use llvm intrinsic to read work dimension information
v2: Fix function declaration
    Add range metadata to r600 implementation
v3: change prefix to AMDGPU

Reviewed-by: Tom Stellard <tom@stellard.net>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 219793
2014-10-15 15:08:06 +00:00
Tom Stellard bf9f76fbe0 Implement log1p builtin
llvm-svn: 219230
2014-10-07 20:22:42 +00:00
Jan Vesely 8f64c3d842 Implement fmod
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <tom@stellard.net>
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 219087
2014-10-05 20:24:52 +00:00
Tom Stellard 081e778d22 Implement async_work_group_copy builtin v3
This is a simple implementation which just copies data synchronously.

v2:
  - Use size_t.

v3:
  - Fix possible race condition by splitting the copy among multiple
    work items.

llvm-svn: 219008
2014-10-03 19:49:39 +00:00
Tom Stellard ed5bbfdb1b Implement async_work_group_strided_copy builtin v2
This is a simple implementation which just copies data synchronously.

v2:
  - Use size_t.

llvm-svn: 219007
2014-10-03 19:49:37 +00:00
Tom Stellard b5064f79ef Implement wait_group_events builtin v2
This is a simple default implemetation which just calls barrier().

v2:
  - Only call barrier() once.

llvm-svn: 219006
2014-10-03 19:49:34 +00:00
Jeroen Ketema 87d2ca57d7 Remove more redundant semi-colons
llvm-svn: 218039
2014-09-18 09:23:40 +00:00
Aaron Watry db770b5bb5 atomic: undef macros that are included from atomic_decl.inc
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jeroen Ketema <j.ketema@imperial.ac.uk>
llvm-svn: 217958
2014-09-17 15:27:39 +00:00
Jeroen Ketema 839b8a62d9 Remove redundant semi-colons
llvm-svn: 217954
2014-09-17 14:40:48 +00:00
Aaron Watry f4133b8a10 R600: Map Address spaces for atomic_cmpxchg
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 217925
2014-09-16 22:34:59 +00:00
Aaron Watry e210cae126 R600: Map address spaces for atomic_xchg
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 217924
2014-09-16 22:34:58 +00:00
Aaron Watry 0545fa3fb0 R600: Map address spaces for atomic_min
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 217923
2014-09-16 22:34:56 +00:00
Aaron Watry dd754f4b33 R600: Map address spaces for atomic_xor
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 217922
2014-09-16 22:34:55 +00:00
Aaron Watry ea32a57060 R600: Map addr spaces and use atomic_max
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 217921
2014-09-16 22:34:53 +00:00
Aaron Watry 5ab82be926 R600: Map address spaces for atomic_or
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 217920
2014-09-16 22:34:52 +00:00
Aaron Watry 348db3c666 R600: Map atomic_and address spaces
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 217919
2014-09-16 22:34:51 +00:00
Aaron Watry 0d976ba497 atomic: Add generic atom[ic]_cmpxchg
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 217918
2014-09-16 22:34:49 +00:00
Aaron Watry 025d79ad6c atomic: Implement generic atom[ic]_xchg
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 217917
2014-09-16 22:34:45 +00:00
Aaron Watry 7cfa12c2a5 atomic: Add generic atomic_min implementation
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 217916
2014-09-16 22:34:41 +00:00
Aaron Watry 3f0a1a4c27 atomic: Add generic atom[ic]_xor
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 217915
2014-09-16 22:34:36 +00:00
Aaron Watry 31e67d1cff atomic: Add atom[ic]_or
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 217914
2014-09-16 22:34:32 +00:00
Aaron Watry cc68405761 atomics: Add generic atom[ic]_and
Not used yet.

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 217913
2014-09-16 22:34:28 +00:00
Aaron Watry 49614fbfd9 atomic: Add generic implementation of atom[ic]_max
Not used yet...

v2: Correct int/uint behavior

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 217912
2014-09-16 22:34:24 +00:00
Aaron Watry c9b88d32be atomic: define extension functions for existing atomic implementations
We were missing the local versions of the atom_* before

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 217911
2014-09-16 22:34:21 +00:00
Aaron Watry 947bdd059a math: Add tan implementation
Uses the algorithm:
tan(x) = sin(x) / sqrt(1-sin^2(x))

An alternative is:
tan(x) = sin(x) / cos(x)

Which produces more verbose bitcode and longer assembly.

Either way, the generated bitcode seems pretty nasty and a more optimized
but still precise-enough solution is welcome.

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 217511
2014-09-10 15:43:35 +00:00