hanchenye-llvm-project

Commit Graph

Author	SHA1	Message	Date
Jan Vesely	9f7172965c	math: Implement sinh function mostly copied from amd_builtins llvm-svn: 296233	2017-02-25 02:46:53 +00:00
Aaron Watry	c606efabb7	math: Add logb builtin Ported from the amd-builtins branch. Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Matt Arsenault <Matthew.Arsenault@amd.com> CC: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 292335	2017-01-18 03:14:10 +00:00
Aaron Watry	900bd7eb7f	math: Add expm1 builtin function Ported from the amd-builtins branch. Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Matt Arsenault <Matthew.Arsenault@amd.com> CC: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 292334	2017-01-18 03:13:37 +00:00
Jan Vesely	0a5aac3fc4	Provide vstore_half helper to workaround clc restrictions clang won't accept half precision loads and stores without cl_khr_fp16 since r281904 llvm-svn: 282106	2016-09-21 20:15:55 +00:00
Aaron Watry	af569547fa	math: Implement tgamma Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 281566	2016-09-15 00:17:34 +00:00
Aaron Watry	e9009cdd21	math: Implement lgamma Just use lgamma_r and ignore the value returned in the second argument Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 281565	2016-09-15 00:17:31 +00:00
Aaron Watry	0ab07e1bde	math: Implement lgamma_r Ported from the amd-builtins branch, which is itself based on the Sun Microsystems implementation. Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 281564	2016-09-15 00:17:28 +00:00
Aaron Watry	f969413a82	Add ADDR_SPACE parameter to _CLC_V_V_VP_VECTORIZE This macro is currently unused, but I plan to use it shortly. The previous form did casts of pointers without an address space, which doesn't work so well for CL 1.x. Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 281563	2016-09-15 00:17:22 +00:00
Matt Arsenault	fbfd828d2a	Replace nextafter implementation This one passes conformance. llvm-svn: 280961	2016-09-08 16:37:56 +00:00
Jan Vesely	eade17271a	Avoid ambiguity in calling atom_add functions. clang (since r280553) allows pointer casts in function overloads, so we need to disambiguate the second argument. clang might be smarter about overloads in the future see https://reviews.llvm.org/D24113, but let's be safe in libclc anyway. llvm-svn: 280871	2016-09-07 22:11:02 +00:00
Jan Vesely	ad8672727c	Implement vstore_half{,n} Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 278962	2016-08-17 20:02:11 +00:00
Jan Vesely	4c59714a52	Make min follow the OCL 1.0 specs OpenCL 1.0: "Returns y if y < x, otherwise it returns x. If x and y are infinite or NaN, the return values are undefined." OpenCL 1.1+: "Returns y if y < x, otherwise it returns x. If x or y are infinite or NaN, the return values are undefined." The 1.0 version is stricter so use that one. Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 276704	2016-07-25 22:36:22 +00:00
Tom Stellard	d835b3f1af	Implement cbrt builtin This implementation was ported from the AMD builtin library and has been tested with piglit, OpenCV, and the ocl conformance tests. llvm-svn: 276497	2016-07-22 23:45:15 +00:00
Tom Stellard	9cb070f96a	Implement cosh builtin This implementation was ported from the AMD builtin library and has been tested with piglit, OpenCV, and the ocl conformance tests. llvm-svn: 276496	2016-07-22 23:45:13 +00:00
Jan Vesely	a82e080b57	AMDGPU: Implement get_global_offset builtin Also fix get_global_id to consider offset No idea how to add this for ptx, so they are stuck with the old get_global_id implementation. v2: split to a separate patch v3: Switch R600 to use implictarg.ptr Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 276443	2016-07-22 17:24:24 +00:00
Jan Vesely	3317f253de	64 bit integers are legal in full profile without an extension Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Tom Stellard <tom@stellard.net> llvm-svn: 273042	2016-06-17 20:30:41 +00:00
Jan Vesely	973c1fa5f5	math: Use single precision fmax in sp path Fixes fdim piglit on Turks v2: use CL fmax instead of __builtin Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Tom Stellard <tom.stellard@amd.com> llvm-svn: 269807	2016-05-17 19:44:01 +00:00
Jan Vesely	c374cb76f4	math: Add erf ported from amd-builtins The scalar float/double function bodies are a direct copy/paste, aside from the removed (optional) code in float function body that requires subnormals. reviewers: jvesely Patch by: Vedran Miletić <rivanvx@gmail.com> llvm-svn: 268766	2016-05-06 18:02:30 +00:00
Aaron Watry	55a8e0fd6d	math: Add fdim implementation Based on the amd-builtin, but explicitly vectorized for all sizes (not just float4), and includes a vectorized double implementation. Passes piglit (float) tests on pitcairn. Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 268708	2016-05-06 03:34:45 +00:00
Aaron Watry	09f3c99a86	math: Fix ilogb(double) return type Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 261714	2016-02-24 00:52:15 +00:00
Aaron Watry	d6d0454231	math: Add ilogb ported from amd-builtins The scalar float/double function bodies are a direct copy/paste with usage of the CLC wrappers to vectorize them. This commit also adds in the FP_ILOGB0 and FP_ILOGBNAN macros which are equal to the results of ilogb(0.0f) and ilogb(float nan) respectively. v2: Add FP_ILOGB0 and FP_ILOGBNAN definitions Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu> v1 Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 261639	2016-02-23 14:43:09 +00:00
Jan Vesely	7fbb96b907	math: Fix log2 vectorization on non-fp64 hw reviewer: tstellard Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 260301	2016-02-09 22:17:42 +00:00
Aaron Watry	8872800eff	math: Add frexp ported from amd-builtins The float implementation is almost a direct port from the amd-builtins, but instead of just having a scalar and float4 implementation, it has a scalar and arbitrary width vector implementation. The double scalar is also a direct port from AMD's builtin release. The double vector implementation copies the logic in the float vector implementation using the values from the double scalar version. Both have been tested in piglit using tests sent to that project's mailing list. Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 260114	2016-02-08 17:07:21 +00:00
Tom Stellard	37d19875fa	Implement modf math builtin V2: use the reference implementation as suggested by Matt Arsenault Patch By: Pavel Ondračka llvm-svn: 258933	2016-01-27 14:52:10 +00:00
Tom Stellard	a249f50970	Add _CLC_V_V_VP_VECTORIZE macro Patch by: Pavel Ondračka llvm-svn: 258932	2016-01-27 14:52:07 +00:00
Niels Ole Salscheider	f51df5ba8c	Implement tanh builtin This is a port from the AMD builtin library. llvm-svn: 248780	2015-09-29 06:39:09 +00:00
Tom Stellard	ccc0ec1ddb	Add image attribute getter builtins Added get_image_* OpenCL builtins to the headers. Added implementation to the r600 target. Patch by: Zoltan Gilian llvm-svn: 248159	2015-09-21 14:47:53 +00:00
Jeroen Ketema	d7be603ab1	Remove files accidentally not removed in r244310 llvm-svn: 244987	2015-08-13 23:43:12 +00:00
Tom Stellard	7a09e88b6e	Fix double implementation of log We need to use M_LOG2E instead of M_LOG2E_F. llvm-svn: 243132	2015-07-24 18:07:14 +00:00
Tom Stellard	44b6117dfd	Implement accurate log2 function Use the implementation ported from the AMD builtin library rather than LLVM Intrinsics. This has been tested with piglit, OpenCV, and the ocl conformance tests. llvm-svn: 243131	2015-07-24 18:07:12 +00:00
Tom Stellard	f01ffa9ddc	Use llvm intrinsics for native_log and native_log2 llvm-svn: 243130	2015-07-24 18:07:06 +00:00
Tom Stellard	2ef5ec6b2b	Fix implementation of sqrt v2 Passing values less than 0 to the llvm.sqrt() intrinsic results in undefined behavior, so we need to check the input and return NaN if it is less than 0. v2: - Fix build failures. llvm-svn: 241906	2015-07-10 13:37:07 +00:00
Tom Stellard	a64bad8338	Use a more accurate implementation for exp Using exp2(x * M_LOG2E_F) does not give us accurate enough results for OpenCL. If you look at the new exp implementation you'll see that it does multiply the input by M_LOG2E_F, but it still uses the original input in part of the calculation. This exp implementation was ported from the AMD builtin library and has been tested with piglit, OpenCV, and the ocl conformance tests. llvm-svn: 237229	2015-05-13 03:55:09 +00:00
Tom Stellard	d538fdc217	Implement exp2 using OpenCL C rather than using an intrinsic Not all targets support the intrinsic, so it's better to have a generic implementation which does not use it. This exp2 implementation was ported from the AMD builtin library and has been tested with piglit, OpenCV, and the ocl conformance tests. llvm-svn: 237228	2015-05-13 03:55:07 +00:00
Tom Stellard	4294541290	Implement sin for double types This implementation was ported from the AMD builtin library and has been tested with piglit, OpenCV, and the ocl conformance tests. llvm-svn: 237155	2015-05-12 17:18:47 +00:00
Tom Stellard	2e6ff0c66e	Implement cos for double types This implementation was ported from the AMD builtin library and has been tested with piglit, OpenCV, and the ocl conformance tests. llvm-svn: 237154	2015-05-12 17:18:46 +00:00
Tom Stellard	37406a209c	Implement atan2pi builtin This implementation was ported from the AMD builtin library and has been tested with piglit, OpenCV, and the ocl conformance tests. llvm-svn: 237138	2015-05-12 14:48:26 +00:00
Tom Stellard	79cc3eda1e	Implement atan2 for doubles This implementation was ported from the AMD builtin library and has been tested with piglit, OpenCV, and the ocl conformance tests. llvm-svn: 237131	2015-05-12 13:48:51 +00:00
Jan Vesely	b0fb990b54	math: limit half_sqrt to single precision Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 236941	2015-05-09 22:31:03 +00:00
Jan Vesely	7c829fe149	geometric: Limit fast_{distance,length} functions to single precision Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 236940	2015-05-09 22:31:01 +00:00
Jan Vesely	071833d454	Fix ldexp fp64 build error Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Aaron Watry <awatry@gmail.com> llvm-svn: 236939	2015-05-09 22:30:59 +00:00
Tom Stellard	17ec3a51c3	Implement fast_normalize builtin v4 This implementation was ported from the AMD builtin library and has been tested with piglit, OpenCV, and the ocl conformance tests. v2: - Remove f suffix from constant in double implementations. - Consolidate implementations using the .cl/.inc approach. v3: - Use __CLC_FPSIZE instead of __CLC_FP{32,64} v4 (Jan Vesely): - Limit to single precision. llvm-svn: 236920	2015-05-09 00:04:12 +00:00
Tom Stellard	2ddfa0c5b2	Implement half_rsqrt builtin v3 This is a generic implementation which just calls rsqrt. Targets should override this if they want a faster implementation. v2: - Alphabetize SOURCES v3 (Jan Vesely): Limit to single precision types. llvm-svn: 236915	2015-05-08 23:28:44 +00:00
Jan Vesely	90e7ad589e	Move ldexp soft implementation to a separate file Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 236648	2015-05-06 21:59:29 +00:00
Jan Vesely	bc81ebefb7	Implement sinpi builtin Ported from AMD builtin library, passes piglit on Turks. Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 236647	2015-05-06 21:59:26 +00:00
Tom Stellard	2ca909d824	math: Add ldexp implementation Signed-off-by: Aaron Watry <awatry@gmail.com> Tom Stellard: - Add denormal handling. - Share vectorization code with r600 implementation. Patch By: Aaron Watry llvm-svn: 236639	2015-05-06 20:53:32 +00:00
Tom Stellard	aed5f3cf7e	Fix implementation of normalize builtin The new implementation was ported from the AMD builtin library and has been tested with piglit, OpenCV, and the ocl conformance tests. llvm-svn: 236608	2015-05-06 16:06:31 +00:00
Tom Stellard	ba742f58af	Allow compilation depending on the LLVM version This allows keeping temporary compatibility with older versions. For example, this can be used when changes are not too large. Patch by: EdB llvm-svn: 236113	2015-04-29 15:37:06 +00:00
Jan Vesely	44e768e777	Fix compilation warnings without cl_khr_fp64 Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 235762	2015-04-24 19:54:17 +00:00
Tom Stellard	9447de37a9	Implement fract builtin This implementation was ported from the AMD builtin library and has been tested with piglit, OpenCV, and the ocl conformance tests. llvm-svn: 235620	2015-04-23 18:50:14 +00:00

177 Commits