Commit Graph

98 Commits

Author SHA1 Message Date
Tom Stellard 3a9632d544 s/_CLC_DECL/_CLC_DEF/
Some function definitions were using _CLC_DECL, which meant that they
weren't being marked as always_inline.

Reviewed-by and Tested-by: Aaron Watry <awatry@gmail.com>

llvm-svn: 193754
2013-10-31 15:50:53 +00:00
Tom Stellard d2e83929a9 R600: Set the noduplicate attribute on barrier() intrinsics
This will prevent LLVM optimization passes from creating illegal uses
of the barrier() intrinsic (e.g. calling barrier() from a conditional
that is not executed by all threads).

llvm-svn: 193753
2013-10-31 15:50:48 +00:00
Tom Stellard 9fabcb3edb Clean-up dependency files
Patch by: Jeroen Ketema

llvm-svn: 193221
2013-10-23 02:49:33 +00:00
Tom Stellard 9f48bb3b9a Make C++ compiler configurable
The C++ compiler used to build prepare-builtins
may differ from the llvm/clang for which we are
building libclc.

Use 'clang++' as the default compiler.

Patch by: Jeroen Ketema

llvm-svn: 193220
2013-10-23 02:49:27 +00:00
Tom Stellard f21e3ea972 Port pocl's gen_convert.py script to libclc
This script generates implementations for the entire set of convert_*
functions.

llvm-svn: 192385
2013-10-10 19:09:01 +00:00
Tom Stellard 436bf70519 Implement sign() builtin
llvm-svn: 192384
2013-10-10 19:08:56 +00:00
Tom Stellard 6c7b86c106 Implement nextafter() builtin
There are two implementations of nextafter():
1. Using clang's __builtin_nextafter.  Clang replaces this builtin with
a call to nextafter which is part of libm.  Therefore, this
implementation will only work for targets with an implementation of
libm (e.g. most CPU targets).

2. The other implementation is written in OpenCL C.  This function is
known internally as __clc_nextafter and can be used by targets that
don't have access to libm.

llvm-svn: 192383
2013-10-10 19:08:51 +00:00
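
For readers unfamiliar with how nextafter() can be written without libm (the second implementation described in commit 6c7b86c106), here is a minimal C sketch for float. It is not the libclc source: the names sketch_nextafterf, float_bits and bits_float are hypothetical, and the approach (stepping the IEEE-754 bit pattern by one) is only one common way to do it.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a float's bits as an integer, similar in spirit to OpenCL's as_uint(). */
static uint32_t float_bits(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float bits_float(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* Hypothetical libm-free nextafter for IEEE-754 single precision. */
float sketch_nextafterf(float x, float y)
{
    if (x != x || y != y)          /* either input is NaN: propagate NaN */
        return x + y;
    if (x == y)                    /* also covers +0.0 versus -0.0 */
        return y;
    if (x == 0.0f) {
        /* Step off zero to the smallest subnormal carrying the sign of y. */
        uint32_t sign = float_bits(y) & 0x80000000u;
        return bits_float(sign | 1u);
    }
    uint32_t bits = float_bits(x);
    /* Moving away from zero increments the magnitude bits,
       moving toward zero decrements them. */
    if ((x < y) == (x > 0.0f))
        bits++;
    else
        bits--;
    return bits_float(bits);
}
```

Because positive floats are ordered the same way as their bit patterns, a single increment or decrement is enough, including the steps from FLT_MAX to infinity and from the smallest normal to the largest subnormal.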
Tom Stellard e36e9dec65 Implement isnan() builtin
llvm-svn: 192382
2013-10-10 19:08:41 +00:00
Tom Stellard ef13294c93 Add missing as_{float,double} functions
llvm-svn: 192381
2013-10-10 19:08:29 +00:00
Aaron Watry dfd8afa02b Parenthesize arguments for mad_hi
Thanks to Jordan Rose <jordan_rose@apple.com> for pointing this out.

llvm-svn: 190310
2013-09-09 14:36:21 +00:00
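
Commit dfd8afa02b does not show the code it changed, so the following C sketch only illustrates why a macro built on "mul_hi(a,b)+c" generally needs parenthesized arguments. MAD_HI_UNSAFE, MAD_HI_SAFE and mul_hi_i32 are hypothetical names, not libclc identifiers.

```c
#include <stdint.h>

/* High 32 bits of a signed 32x32-bit product, computed by widening to 64 bits.
   Right-shifting a negative int64_t is implementation-defined in C; common
   compilers perform an arithmetic shift. */
static int32_t mul_hi_i32(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * (int64_t)b) >> 32);
}

/* Hypothetical macro that pastes its arguments in without parentheses. */
#define MAD_HI_UNSAFE(a, b, c) (mul_hi_i32(a, b) + c)

/* The same macro with every argument parenthesized. */
#define MAD_HI_SAFE(a, b, c) (mul_hi_i32((a), (b)) + (c))
```

With c written as 1 << 4, the unsafe form expands to mul_hi_i32(a, b) + 1 << 4, which C parses as (mul_hi_i32(a, b) + 1) << 4 because + binds tighter than <<. The safe form adds 16 as intended.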
Aaron Watry 3466342f57 Implement mad_hi built-in
We already have a working mul_hi, and the spec gives us the implementation as:
Returns mul_hi(a,b)+c.

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 190211
2013-09-06 22:09:51 +00:00
Aaron Watry 283e3fa011 Add atomic_sub and atomic_dec builtin functions
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 190201
2013-09-06 20:20:21 +00:00
Tom Stellard 93d674f7b3 Place pkg-config file in $prefix/share/pkgconfig.
libclc is ABI-agnostic, and $prefix/lib/pkgconfig causes issues
on multilib setups. Using $prefix/share/pkgconfig allows us to reuse
a single libclc build across all system ABIs.

Patch by: Michał Górny

llvm-svn: 190107
2013-09-05 23:27:58 +00:00
Aaron Watry 7171a2f965 Remove unneeded semi-colons
Reviewed-By: Aaron Watry <awatry@gmail.com>
llvm-svn: 190059
2013-09-05 16:04:07 +00:00
Aaron Watry 50a7bcbac9 Add atomic_inc and atomic_add builtins
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 190058
2013-09-05 16:04:01 +00:00
Aaron Watry fbe439f8c0 Add mul_hi implementation [v2]
Everything except long/ulong is handled by just casting to the next larger type,
doing the math and then shifting/casting the result.

For 64-bit types, we break the high/low parts of each operand apart, and do
a FOIL-based multiplication.

v2:
  Discard the stack-overflow implementation due to copyright concerns.
  - The implementation is still FOIL-based, but discards the previous code.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 188684
2013-08-19 18:31:49 +00:00
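
The two strategies described in commit fbe439f8c0 can be sketched in C roughly as follows. This is an illustration for unsigned types only, not the libclc code; the function names are hypothetical.

```c
#include <stdint.h>

/* Narrow types: cast to the next larger type, multiply, shift the result down. */
uint32_t mul_hi_u32(uint32_t x, uint32_t y)
{
    return (uint32_t)(((uint64_t)x * y) >> 32);
}

/* 64-bit types: no wider type is assumed, so split each operand into
   32-bit halves and multiply them FOIL-style (First, Outer, Inner, Last). */
uint64_t mul_hi_u64(uint64_t x, uint64_t y)
{
    uint64_t x_lo = x & 0xFFFFFFFFu, x_hi = x >> 32;
    uint64_t y_lo = y & 0xFFFFFFFFu, y_hi = y >> 32;

    uint64_t first = x_hi * y_hi;   /* weight 2^64 */
    uint64_t outer = x_hi * y_lo;   /* weight 2^32 */
    uint64_t inner = x_lo * y_hi;   /* weight 2^32 */
    uint64_t last  = x_lo * y_lo;   /* weight 2^0  */

    /* Sum everything that lands in bits 32..63 of the full product;
       whatever spills past bit 63 is the carry into the high word. */
    uint64_t mid = (last >> 32) + (outer & 0xFFFFFFFFu) + (inner & 0xFFFFFFFFu);

    return first + (outer >> 32) + (inner >> 32) + (mid >> 32);
}
```

The intermediate sum mid is at most 3 * (2^32 - 1), so it cannot overflow 64 bits, and the final sum is the exact high word of the 128-bit product.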
Aaron Watry 8548725f29 Add rhadd builtin
rhadd = (x+y+1)>>1

Implemented as:
(x>>1) + (y>>1) + ((x&1)|(y&1))

This saves us from having to do the addition in assembly with overflow detection.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 188477
2013-08-15 19:21:10 +00:00
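
A small C sketch of the rhadd identity from commit 8548725f29, using uint32_t so that the overflow in the naive form is visible. The function names are hypothetical and the code is an illustration, not the libclc source.

```c
#include <stdint.h>

/* Naive form: x + y + 1 wraps around when the sum does not fit in 32 bits. */
uint32_t rhadd_naive_u32(uint32_t x, uint32_t y)
{
    return (x + y + 1u) >> 1;
}

/* Form from the commit message: halve each operand first, then add one
   if either operand had its low bit set (that is the rounding-up term). */
uint32_t rhadd_u32(uint32_t x, uint32_t y)
{
    return (x >> 1) + (y >> 1) + ((x & 1u) | (y & 1u));
}
```

For x = y = 0xFFFFFFFF the naive version wraps and yields 0x7FFFFFFF, while the shifted version returns the expected 0xFFFFFFFF without needing a wider type.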
Aaron Watry 7659157f1b Add hadd builtin
(x + y) >> 1 gets changed to:
(x>>1) + (y>>1) + (x&y&1)

Saves us having to do any llvm assembly and overflow checking in the addition.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 188476
2013-08-15 19:21:07 +00:00
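
The hadd rewrite in commit 7659157f1b can be checked exhaustively for 8-bit values. The C sketch below (hypothetical names, not the libclc source) compares the shifted form against the plain (x + y) >> 1 evaluated in a wider type.

```c
#include <stdint.h>

/* Shifted form from the commit message: the x & y & 1 term restores the
   carry that is lost when both operands have their low bit set. */
uint8_t hadd_u8(uint8_t x, uint8_t y)
{
    return (uint8_t)((x >> 1) + (y >> 1) + (x & y & 1));
}

/* Returns 1 if hadd_u8 agrees with the widened reference for all inputs. */
int hadd_u8_matches_reference(void)
{
    for (unsigned x = 0; x < 256; ++x)
        for (unsigned y = 0; y < 256; ++y)
            if (hadd_u8((uint8_t)x, (uint8_t)y) != ((x + y) >> 1))
                return 0;
    return 1;
}
```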
Aaron Watry 0c21c7c747 Add intN vloadN() implementations for address spaces 3 and 4
Not hooked up to R600 yet due to current lack of support, at least on EG.

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 188181
2013-08-12 14:42:51 +00:00
Aaron Watry c0aa6e0291 Enable assembly vload3 int/uint constant/global for R600
It's supported by the R600 LLVM back-end now, at least for evergreen.

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 188180
2013-08-12 14:42:50 +00:00
Aaron Watry 7d52565321 Add vload* for addrspace(2) and use as constant load for R600
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 188179
2013-08-12 14:42:49 +00:00
Tom Stellard 41ef85df0a Add some missing convert_* functions
llvm-svn: 188131
2013-08-10 03:40:37 +00:00
Tom Stellard abbfd2bde0 Implement generic rint()
llvm-svn: 188130
2013-08-10 03:40:33 +00:00
Tom Stellard da920eab42 configure: Fix build when clang is installed to a non-standard prefix
llvm-svn: 188129
2013-08-10 03:40:26 +00:00
Aaron Watry 88ac12591c Add missing integer min/max definitions
Found in CL 1.1 spec section 6.11.3

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 187200
2013-07-26 13:02:02 +00:00
Aaron Watry bde11213e7 Added get_num_groups
The get_num_groups function was missing for r600g. I did the same
thing as the other workitem functions.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 187059
2013-07-24 18:03:38 +00:00
Aaron Watry 1769b1fca9 Implement generic upsample()
Reduces every vector upsample to its scalar components, so it is probably
not the most efficient thing in the world, but it does what the
spec says it needs to do.

Another possible implementation would be to convert/cast everything as
unsigned if necessary, upsample the input vectors, create the upsampled
value, and then cast back to signed if required.

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 186691
2013-07-19 16:44:37 +00:00
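
As a rough C illustration of the approach in commit 1769b1fca9, the sketch below defines a scalar upsample for unsigned 8-bit inputs and builds the 4-component version by calling it once per component. The struct types and function names are hypothetical stand-ins for OpenCL's uchar4 and ushort4. Only the unsigned case is shown, since left-shifting a negative value is undefined in C.

```c
#include <stdint.h>

/* Hypothetical stand-ins for OpenCL's uchar4 and ushort4 vector types. */
typedef struct { uint8_t x, y, z, w; } uchar4_t;
typedef struct { uint16_t x, y, z, w; } ushort4_t;

/* Scalar upsample: hi becomes the upper half of the result, lo the lower half. */
uint16_t upsample_u8(uint8_t hi, uint8_t lo)
{
    return (uint16_t)(((uint16_t)hi << 8) | lo);
}

/* Vector upsample reduced to its scalar components. */
ushort4_t upsample_u8x4(uchar4_t hi, uchar4_t lo)
{
    ushort4_t r;
    r.x = upsample_u8(hi.x, lo.x);
    r.y = upsample_u8(hi.y, lo.y);
    r.z = upsample_u8(hi.z, lo.z);
    r.w = upsample_u8(hi.w, lo.w);
    return r;
}
```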
Aaron Watry 0da3d3b5ba Fix build with LLVM 3.4
F_Binary and friends were moved to include/Support/FileSystem.h

v2: Maintain compatibility with LLVM 3.3

Signed-off-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 186610
2013-07-18 21:24:35 +00:00
Aaron Watry 99a2f3b274 Fix and re-enable R600 vload/vstore assembly
The assembly optimizations were making unsafe assumptions about which address
spaces had which identifiers.

Also, fix vload/vstore with 64-bit pointers. This was broken previously on
Radeon SI.

This version still only has assembly versions of int/uint 2/4/8/16 for global
loads and stores on R600, but it does it in a way that would be very easily
extended to private/local/constant and could also be handled easily on other
architectures.

v2: 1) Leave v[load|store]_impl.ll in generic/lib
    2) Remove vload_if.ll and vstore_if.ll interfaces
    3) Fix address+offset calculations
    4) Remove offset from assembly arg list
llvm-svn: 186416
2013-07-16 14:29:01 +00:00
Aaron Watry 4cb7cf276d libclc: vload/vstore disable assembly and fix offset calculation
This commit gets us back to pure CLC and fixes offset calculations.

The next commit will re-enable the assembly implementation for R600,
fix bugs related to 64-bit address spaces, and also fix the
incorrect assumption that address space identifiers are the same in
all architectures.

llvm-svn: 186415
2013-07-16 14:28:58 +00:00
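
Commit 4cb7cf276d does not spell out the offset calculation it fixes. For reference, the OpenCL specification defines vloadn(offset, p) as reading from address p + offset * n, so the offset counts whole vectors rather than elements. A minimal C sketch for n = 4, with a hypothetical int4_t struct standing in for OpenCL's int4:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for OpenCL's int4 vector type. */
typedef struct { int32_t x, y, z, w; } int4_t;

/* vload4 semantics: offset is measured in units of 4 elements. */
int4_t vload4_i32(size_t offset, const int32_t *p)
{
    const int32_t *base = p + offset * 4;   /* p + offset * n, with n = 4 */
    int4_t v = { base[0], base[1], base[2], base[3] };
    return v;
}
```

With an array holding 0 through 7, vload4_i32(1, array) returns the components 4, 5, 6, 7.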
Tom Stellard eaa534450c Add integer-gentype.inc: Missing file from r185839
llvm-svn: 186326
2013-07-15 15:20:05 +00:00
Tom Stellard 6f33168bb7 Implement mad24() and mul24() builtins
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 185839
2013-07-08 17:27:13 +00:00
Tom Stellard d768ac0395 Add __CLC_ prefix to all macro definitions in headers
libclc was defining and undefining GENTYPE and several other macros with
common names in its header files.  This was preventing applications from
defining macros with identical names as command line arguments to the
compiler, because the definitions in the header files were masking the
macros defined as compiler arguments.

Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 185838
2013-07-08 17:27:02 +00:00
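
The masking problem described in commit d768ac0395 can be reproduced in plain C. In the sketch below, the first #define stands in for a macro the application passes on the compiler command line, and the two "header" sections are hypothetical simplifications of what a libclc header might do. The function prototypes and enum names are made up for illustration, and the unprefixed header undefines the name first only to avoid a redefinition diagnostic.

```c
/* Stand-in for an application macro passed as -DGENTYPE=4 (hypothetical). */
#define GENTYPE 4

/* --- hypothetical header using a prefixed helper macro --- */
#define __CLC_GENTYPE float
__CLC_GENTYPE prefixed_style_fn(__CLC_GENTYPE x);
#undef __CLC_GENTYPE
/* --- end of header --- */

#ifdef GENTYPE
enum { survived_prefixed_header = 1 };   /* the application macro is untouched */
#else
enum { survived_prefixed_header = 0 };
#endif

/* --- hypothetical header reusing the common name GENTYPE --- */
#undef GENTYPE
#define GENTYPE float
GENTYPE unprefixed_style_fn(GENTYPE x);
#undef GENTYPE
/* --- end of header --- */

#ifdef GENTYPE
enum { survived_unprefixed_header = 1 };
#else
enum { survived_unprefixed_header = 0 };  /* the application macro is gone */
#endif
```

After the prefixed header the application's GENTYPE is still defined. After the unprefixed header it has been silently discarded.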
Tom Stellard 3a81b5d083 Implement barrier() builtin
Reviewed and Tested-by: Aaron Watry <awatry@gmail.com>

llvm-svn: 185837
2013-07-08 17:26:39 +00:00
Tom Stellard a4cadba551 Add bitselect() builtin
Reviewed-By: Aaron Watry <awatry@gmail.com>
llvm-svn: 185836
2013-07-08 17:26:33 +00:00
Tom Stellard 64b3bbae1e libclc: Add assembly versions of vstore for global [u]int4/8/16
The assembly should be generic, but at least currently R600 only supports
32-bit stores of [u]int1/4, and I believe that only global is well-supported.

R600 lowers the 8/16 component stores to multiple 4-component stores.

The unoptimized C versions of the other functions are left in place.

Patch by: Aaron Watry

llvm-svn: 185009
2013-06-26 18:22:20 +00:00
Tom Stellard 922ac056e3 libclc: Add assembly versions of vload for global int4/8/16
The assembly should be generic, but at least currently R600 only supports
32-bit loads of int1/4, and I believe that only global is well-supported.

R600 lowers the 8/16 component vectors to multiple 4-component loads.

The unoptimized C versions of the other functions are left in place.

Patch by: Aaron Watry

llvm-svn: 185008
2013-06-26 18:22:15 +00:00
Tom Stellard 51441f80c5 libclc: Initial vstore implementation
Assumes that the target supports byte-addressable stores.

Completely unoptimized.

Patch by: Aaron Watry

llvm-svn: 185007
2013-06-26 18:22:11 +00:00
Tom Stellard 66ecbc7c18 libclc: Initial vload implementation
Should work for all targets and data types.  Completely unoptimized.

Patch by: Aaron Watry

llvm-svn: 185006
2013-06-26 18:22:05 +00:00
Tom Stellard c0af47de00 r600: Fix implementations of get_group_id.ll and get_local_size.ll
llvm-svn: 185005
2013-06-26 18:22:00 +00:00
Tom Stellard e78344dfae libclc: Implement clz() builtin
Squashed commit of the following:

commit a0df0a0e86c55c1bdc0b9c0f5a739e5adef4b056
Author: Aaron Watry <awatry@gmail.com>
Date:   Mon Apr 15 18:42:04 2013 -0500

    libclc: Rename clz.ll to clz_if.ll to ensure it gets built.

    configure.py treats files that have the same name with the .cl and .ll
    extensions as overriding each other.

    E.g. If you have clz.cl and clz.ll both specified to be built in the same
    SOURCES file, only the first file listed will actually be built.

    Since the contents of clz.ll were an interface that is implemented in
    clz_impl.ll, rename clz.ll to clz_if.ll to make sure that the interface is
    built.

commit 931b62bed05c58f737de625bd415af09571a6a5a
Author: Aaron Watry <awatry@gmail.com>
Date:   Sat Apr 13 12:32:54 2013 -0500

    libclc: llvm assembly implementation of clz

    Untested... currently crashes in the same manner as add_sat.

commit 6ef0b7b0b6d2e5584086b4b9a9243743b2e0538f
Author: Aaron Watry <awatry@gmail.com>
Date:   Sat Mar 23 12:35:27 2013 -0500

    libclc: Add stub clz builtin

    For scalar int/uint, attempt to use the clz llvm builtin.. for all others
    return 0 until an actual implementation is finished.

Patch by: Aaron Watry

llvm-svn: 185004
2013-06-26 18:21:55 +00:00
Tom Stellard 34f513df7c libclc: Add clamp(vec, scalar, scalar) and max(vec, scalar)
For any GENTYPE that isn't scalar, we need to implement a mixed
vector/scalar version of clamp/max.

This depends on the min() patches I sent to the list a few minutes ago.

Patch by: Aaron Watry

llvm-svn: 185003
2013-06-26 18:21:49 +00:00
Tom Stellard 075b31a2fa libclc: Implement the min(vec, scalar) version of the min builtin.
Checks if the current GENTYPE is scalar, and if not, then defines a separate
implementation of the function which casts the second arg to vector before
proceeding.

Patch by: Aaron Watry

llvm-svn: 185002
2013-06-26 18:21:44 +00:00
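
The mixed vector/scalar overload from commit 075b31a2fa can be pictured in C as "splat the scalar, then reuse the vector version". The sketch below uses a hypothetical int4_t struct and hypothetical function names. In OpenCL C the cast from scalar to vector performs the splat implicitly.

```c
#include <stdint.h>

/* Hypothetical stand-in for OpenCL's int4 vector type. */
typedef struct { int32_t x, y, z, w; } int4_t;

/* OpenCL's min(x, y) returns y if y < x, otherwise x. */
static int32_t min_i32(int32_t a, int32_t b) { return b < a ? b : a; }

/* Component-wise min(vec, vec). */
int4_t min_i32x4(int4_t a, int4_t b)
{
    int4_t r = { min_i32(a.x, b.x), min_i32(a.y, b.y),
                 min_i32(a.z, b.z), min_i32(a.w, b.w) };
    return r;
}

/* Broadcast a scalar into every component, like (int4)(s) in OpenCL C. */
int4_t splat_i32x4(int32_t s)
{
    int4_t r = { s, s, s, s };
    return r;
}

/* min(vec, scalar): convert the second argument to a vector, then proceed. */
int4_t min_i32x4_scalar(int4_t a, int32_t s)
{
    return min_i32x4(a, splat_i32x4(s));
}
```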
Tom Stellard 0be3acfc70 libclc: implement initial version of min()
This doesn't handle the integer cases for min(vector, scalar).

Patch by: Aaron Watry

llvm-svn: 185001
2013-06-26 18:21:38 +00:00
Tom Stellard 29b5b9816b libclc: Rename [add|sub]_sat.ll to [add|sub]_sat_if.ll
configure.py allows overloading *.cl with *.ll, but will only ever build
the first file listed in SOURCES of ${file}.cl and ${file}.ll

add_sat, sub_sat, (and the soon to be submitted clz) all define interfaces in
${function_name}.ll which are implemented in ${function_name}_impl.ll.

Renaming the interface files is enough to get them to build again, fixing
CL usage of these functions.

Tested on clover/r600g.

Patch by: Aaron Watry

llvm-svn: 185000
2013-06-26 18:21:31 +00:00
Tom Stellard a30713710c Add another TODO note.
Patch by: Aaron Watry

llvm-svn: 184999
2013-06-26 18:21:25 +00:00
Tom Stellard 4974f6c6d0 Add a TODO note.
Patch by: Aaron Watry

llvm-svn: 184998
2013-06-26 18:21:22 +00:00
Tom Stellard 8c1e72f46a Simplify rotate implementation a bit..
Much more understandable/readable as a result, and probably more efficient.

Patch by: Aaron Watry

llvm-svn: 184997
2013-06-26 18:21:18 +00:00
Tom Stellard 0bb381eaec libclc: implement rotate builtin
This implementation does a lot of bit shifting and masking. Suffice to say,
this is somewhat suboptimal... but it does look to produce correct results
(after the piglit tests were corrected for sign extension issues).

Someone who knows LLVM better than I could re-write this more efficiently.

Patch by: Aaron Watry

llvm-svn: 184996
2013-06-26 18:21:13 +00:00
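
Commit 0bb381eaec does not include its code here, so the following is only a generic C sketch of a left rotate built from shifts and masks, the kind of operation the message describes. rotate_u32 is a hypothetical name and this is not the libclc implementation.

```c
#include <stdint.h>

/* Rotate v left by n bits. The count is masked to 0..31 so that neither
   shift amount ever reaches the full width of the type, which would be
   undefined behavior in C. */
uint32_t rotate_u32(uint32_t v, uint32_t n)
{
    n &= 31u;
    return (v << n) | (v >> ((32u - n) & 31u));
}
```

Masking the right-shift amount as well makes the n = 0 case work without a branch: both shifts become zero and the result is v.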
Tom Stellard cb133c9322 libclc: Move max builtin to shared/
Max(x,y) is available for all integer/floating types.

Patch by: Aaron Watry

llvm-svn: 184995
2013-06-26 18:21:06 +00:00