177 Commits

Author SHA1 Message Date
Jan Vesely 9f7172965c math: Implement sinh function
mostly copied from amd_builtins

llvm-svn: 296233
2017-02-25 02:46:53 +00:00
Aaron Watry c606efabb7 math: Add logb builtin
Ported from the amd-builtins branch.

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Matt Arsenault <Matthew.Arsenault@amd.com>
CC: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 292335
2017-01-18 03:14:10 +00:00
Aaron Watry 900bd7eb7f math: Add expm1 builtin function
Ported from the amd-builtins branch.

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Matt Arsenault <Matthew.Arsenault@amd.com>
CC: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 292334
2017-01-18 03:13:37 +00:00
Jan Vesely 0a5aac3fc4 Provide vstore_half helper to work around clc restrictions
clang won't accept half precision loads and stores without cl_khr_fp16 since r281904

llvm-svn: 282106
2016-09-21 20:15:55 +00:00
Aaron Watry af569547fa math: Implement tgamma
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 281566
2016-09-15 00:17:34 +00:00
Aaron Watry e9009cdd21 math: Implement lgamma
Just use lgamma_r and ignore the value returned in the second argument

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 281565
2016-09-15 00:17:31 +00:00
Aaron Watry 0ab07e1bde math: Implement lgamma_r
Ported from the amd-builtins branch, which is itself based on the
Sun Microsystems implementation.

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 281564
2016-09-15 00:17:28 +00:00
Aaron Watry f969413a82 Add ADDR_SPACE parameter to _CLC_V_V_VP_VECTORIZE
This macro is currently unused, but I plan to use it shortly.

The previous form did casts of pointers without an address space, which
doesn't work so well for CL 1.x.

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 281563
2016-09-15 00:17:22 +00:00
Matt Arsenault fbfd828d2a Replace nextafter implementation
This one passes conformance.

llvm-svn: 280961
2016-09-08 16:37:56 +00:00
Jan Vesely eade17271a Avoid ambiguity in calling atom_add functions.
clang (since r280553) allows pointer casts in function overloads,
so we need to disambiguate the second argument.

clang might be smarter about overloads in the future
(see https://reviews.llvm.org/D24113), but let's be safe in libclc anyway.

llvm-svn: 280871
2016-09-07 22:11:02 +00:00
Jan Vesely ad8672727c Implement vstore_half{,n}
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 278962
2016-08-17 20:02:11 +00:00
Jan Vesely 4c59714a52 Make min follow the OCL 1.0 specs
OpenCL 1.0: "Returns y if y < x, otherwise it returns x. If x *and* y
are infinite or NaN, the return values are undefined."

OpenCL 1.1+: "Returns y if y < x, otherwise it returns x. If x *or* y
are infinite or NaN, the return values are undefined."

The 1.0 version is stricter so use that one.

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 276704
2016-07-25 22:36:22 +00:00
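
The rule quoted in the min commit above ("returns y if y < x, otherwise it returns x") can be written as a single comparison. The following is a minimal C sketch of that rule, not the libclc source; the name clc_min_sketch is hypothetical and only scalar float is shown.

```c
/* Hypothetical helper, not the libclc implementation.
 * Returns y if y < x, otherwise x, following the wording quoted above. */
static float clc_min_sketch(float x, float y) {
    /* Any comparison with NaN is false, so a NaN in either position
     * makes this expression fall through to x. */
    return (y < x) ? y : x;
}
```

With this form, a NaN passed as y yields x, while a NaN passed as x is returned unchanged.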
Tom Stellard d835b3f1af Implement cbrt builtin
This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

llvm-svn: 276497
2016-07-22 23:45:15 +00:00
Tom Stellard 9cb070f96a Implement cosh builtin
This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

llvm-svn: 276496
2016-07-22 23:45:13 +00:00
Jan Vesely a82e080b57 AMDGPU: Implement get_global_offset builtin
Also fix get_global_id to consider offset
No idea how to add this for ptx, so they are stuck with the old get_global_id
implementation.

v2: split to a separate patch

v3: Switch R600 to use implicitarg.ptr

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 276443
2016-07-22 17:24:24 +00:00
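
For readers unfamiliar with the offset mentioned in the get_global_offset commit above: in OpenCL 1.1 and later, a work-item's global ID in a dimension is its group ID times the local size, plus its local ID, plus the global offset. The plain C sketch below models that arithmetic for one dimension. The struct and function names are hypothetical, and the real builtins read these values from the hardware or implicit arguments rather than from a struct.

```c
#include <stddef.h>

/* Hypothetical per-dimension work-item state, for illustration only. */
struct work_item_sketch {
    size_t group_id;      /* what get_group_id(dim) would return */
    size_t local_size;    /* what get_local_size(dim) would return */
    size_t local_id;      /* what get_local_id(dim) would return */
    size_t global_offset; /* what get_global_offset(dim) would return */
};

/* Global ID with the offset taken into account. */
static size_t global_id_sketch(const struct work_item_sketch *wi) {
    return wi->group_id * wi->local_size + wi->local_id + wi->global_offset;
}
```

A target that cannot supply the offset effectively computes the same expression with global_offset fixed at 0.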
Jan Vesely 3317f253de 64 bit integers are legal in full profile without an extension
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <tom@stellard.net>
llvm-svn: 273042
2016-06-17 20:30:41 +00:00
Jan Vesely 973c1fa5f5 math: Use single precision fmax in sp path
Fixes fdim piglit on Turks

v2: use CL fmax instead of __builtin

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <tom.stellard@amd.com>
llvm-svn: 269807
2016-05-17 19:44:01 +00:00
Jan Vesely c374cb76f4 math: Add erf ported from amd-builtins
The scalar float/double function bodies are a direct copy/paste,
aside from the removed (optional) code in float function body that
requires subnormals.

reviewers: jvesely

Patch by: Vedran Miletić <rivanvx@gmail.com>

llvm-svn: 268766
2016-05-06 18:02:30 +00:00
Aaron Watry 55a8e0fd6d math: Add fdim implementation
Based on the amd-builtin, but explicitly vectorized for all sizes (not just
float4), and includes a vectorized double implementation.

Passes piglit (float) tests on pitcairn.

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 268708
2016-05-06 03:34:45 +00:00
Aaron Watry 09f3c99a86 math: Fix ilogb(double) return type
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 261714
2016-02-24 00:52:15 +00:00
Aaron Watry d6d0454231 math: Add ilogb ported from amd-builtins
The scalar float/double function bodies are a direct copy/paste
with usage of the CLC wrappers to vectorize them.

This commit also adds in the FP_ILOGB0 and FP_ILOGBNAN macros which are
equal to the results of ilogb(0.0f) and ilogb(float nan) respectively.

v2: Add FP_ILOGB0 and FP_ILOGBNAN definitions

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
v1 Reviewed-by: Tom Stellard <thomas.stellard@amd.com>

llvm-svn: 261639
2016-02-23 14:43:09 +00:00
Jan Vesely 7fbb96b907 math: Fix log2 vectorization on non-fp64 hw
reviewer: tstellard
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 260301
2016-02-09 22:17:42 +00:00
Aaron Watry 8872800eff math: Add frexp ported from amd-builtins
The float implementation is almost a direct port from the amd-builtins,
but instead of just having a scalar and float4 implementation, it has
a scalar and arbitrary width vector implementation.

The double scalar is also a direct port from AMD's builtin release.

The double vector implementation copies the logic in the float vector
implementation using the values from the double scalar version.

Both have been tested in piglit using tests sent to that project's
mailing list.

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 260114
2016-02-08 17:07:21 +00:00
Tom Stellard 37d19875fa Implement modf math builtin
V2: use the reference implementation as suggested by Matt Arsenault

Patch By: Pavel Ondračka

llvm-svn: 258933
2016-01-27 14:52:10 +00:00
Tom Stellard a249f50970 Add _CLC_V_V_VP_VECTORIZE macro
Patch by: Pavel Ondračka

llvm-svn: 258932
2016-01-27 14:52:07 +00:00
Niels Ole Salscheider f51df5ba8c Implement tanh builtin
This is a port from the AMD builtin library.

llvm-svn: 248780
2015-09-29 06:39:09 +00:00
Tom Stellard ccc0ec1ddb Add image attribute getter builtins
Added get_image_* OpenCL builtins to the headers.
Added implementation to the r600 target.

Patch by: Zoltan Gilian

llvm-svn: 248159
2015-09-21 14:47:53 +00:00
Jeroen Ketema d7be603ab1 Remove files accidentally not removed in r244310
llvm-svn: 244987
2015-08-13 23:43:12 +00:00
Tom Stellard 7a09e88b6e Fix double implementation of log
We need to use M_LOG2E instead of M_LOG2E_F.

llvm-svn: 243132
2015-07-24 18:07:14 +00:00
Tom Stellard 44b6117dfd Implement accurate log2 function
Use the implementation ported from the AMD builtin library rather
than LLVM intrinsics.

This has been tested with piglit, OpenCV, and the ocl conformance tests.

llvm-svn: 243131
2015-07-24 18:07:12 +00:00
Tom Stellard f01ffa9ddc Use llvm intrinsics for native_log and native_log2
llvm-svn: 243130
2015-07-24 18:07:06 +00:00
Tom Stellard 2ef5ec6b2b Fix implementation of sqrt v2
Passing values less than 0 to the llvm.sqrt() intrinsic results in
undefined behavior, so we need to check the input and return NaN if
it is less than 0.

v2:
  - Fix build failures.

llvm-svn: 241906
2015-07-10 13:37:07 +00:00
Tom Stellard a64bad8338 Use a more accurate implementation for exp
Using exp2(x * M_LOG2E_F) does not give us accurate enough results for
OpenCL.  If you look at the new exp implementation you'll see that
it does multiply the input by M_LOG2E_F, but it still uses the original
input in part of the calculation.

This exp implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

llvm-svn: 237229
2015-05-13 03:55:09 +00:00
Tom Stellard d538fdc217 Implement exp2 using OpenCL C rather than using an intrinsic
Not all targets support the intrinsic, so it's better to have a
generic implementation which does not use it.

This exp2 implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

llvm-svn: 237228
2015-05-13 03:55:07 +00:00
Tom Stellard 4294541290 Implement sin for double types
This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

llvm-svn: 237155
2015-05-12 17:18:47 +00:00
Tom Stellard 2e6ff0c66e Implement cos for double types
This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

llvm-svn: 237154
2015-05-12 17:18:46 +00:00
Tom Stellard 37406a209c Implement atan2pi builtin
This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

llvm-svn: 237138
2015-05-12 14:48:26 +00:00
Tom Stellard 79cc3eda1e Implement atan2 for doubles
This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

llvm-svn: 237131
2015-05-12 13:48:51 +00:00
Jan Vesely b0fb990b54 math: limit half_sqrt to single precision
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 236941
2015-05-09 22:31:03 +00:00
Jan Vesely 7c829fe149 geometric: Limit fast_{distance,length} functions to single precision
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 236940
2015-05-09 22:31:01 +00:00
Jan Vesely 071833d454 Fix ldexp fp64 build error
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 236939
2015-05-09 22:30:59 +00:00
Tom Stellard 17ec3a51c3 Implement fast_normalize builtin v4
This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

v2:
  - Remove f suffix from constant in double implementations.
  - Consolidate implementations using the .cl/.inc approach.

v3:
 - Use __CLC_FPSIZE instead of __CLC_FP{32,64}

v4 (Jan Vesely):
 - Limit to single precision.

llvm-svn: 236920
2015-05-09 00:04:12 +00:00
Tom Stellard 2ddfa0c5b2 Implement half_rsqrt builtin v3
This is a generic implementation which just calls rsqrt.
Targets should override this if they want a faster implementation.

v2:
  - Alphabetize SOURCES

v3 (Jan Vesely):
  Limit to single precision types.

llvm-svn: 236915
2015-05-08 23:28:44 +00:00
Jan Vesely 90e7ad589e Move ldexp soft implementation to a separate file
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 236648
2015-05-06 21:59:29 +00:00
Jan Vesely bc81ebefb7 Implement sinpi builtin
Ported from AMD builtin library, passes piglit on Turks.

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 236647
2015-05-06 21:59:26 +00:00
Tom Stellard 2ca909d824 math: Add ldexp implementation
Signed-off-by: Aaron Watry <awatry@gmail.com>

Tom Stellard:
  - Add denormal handling.
  - Share vectorization code with r600 implementation.

Patch By: Aaron Watry

llvm-svn: 236639
2015-05-06 20:53:32 +00:00
Tom Stellard aed5f3cf7e Fix implementation of normalize builtin
The new implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

llvm-svn: 236608
2015-05-06 16:06:31 +00:00
Tom Stellard ba742f58af Allow conditional compilation depending on the LLVM version
This makes it possible to keep temporary compatibility with older versions.
For example, this can be used when the changes are not too large.

Patch by: EdB

llvm-svn: 236113
2015-04-29 15:37:06 +00:00
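
One common way to compile different code depending on a toolchain version is a preprocessor gate. The sketch below assumes a hypothetical macro, SKETCH_LLVM_VERSION, encoded as 0xMMmm (major, minor) and normally supplied by the build system with -D. It is not the macro or the mechanism introduced by the commit above, only an illustration of the general idea.

```c
/* Hypothetical version macro; defaulted here so the sketch builds on its own.
 * 0x0306 stands for LLVM 3.6 in this made-up encoding. */
#ifndef SKETCH_LLVM_VERSION
#define SKETCH_LLVM_VERSION 0x0306
#endif

/* Returns 1 when the newer code path is compiled in, 0 for the
 * compatibility path kept for older versions. */
static int sketch_uses_new_path(void) {
#if SKETCH_LLVM_VERSION >= 0x0307
    return 1;
#else
    return 0;
#endif
}
```

Gates like this suit small differences; larger divergences are usually easier to maintain as separate source files.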
Jan Vesely 44e768e777 Fix compilation warnings without cl_khr_fp64
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 235762
2015-04-24 19:54:17 +00:00
Tom Stellard 9447de37a9 Implement fract builtin
This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

llvm-svn: 235620
2015-04-23 18:50:14 +00:00