Commit Graph

11 Commits

Author SHA1 Message Date
Jan Vesely 260827caa2 r600: Use llvm intrinsic to read work dimension information
v2: Fix function declaration
    Add range metadata to r600 implementation
v3: change prefix to AMDGPU

Reviewed-by: Tom Stellard <tom@stellard.net>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 219793
2014-10-15 15:08:06 +00:00
Aaron Watry f991505d02 vload/vstore: Use casts instead of scalarizing everything in CLC version
This generates bitcode which is indistinguishable from what was
hand-written for int32 types in v[load|store]_impl.ll.

v4: Use vec2+scalar for vec3 load/stores to prevent corruption (per Tom)
v3: Also remove unused generic/lib/shared/v[load|store]_impl.ll
v2: (Per Matt Arsenault) Fix alignment issues with vector load stores

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
CC: Matt Arsenault <Matthew.Arsenault@amd.com>
CC: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 216069
2014-08-20 13:58:57 +00:00
Tom Stellard d2e83929a9 R600: Set the noduplicate attribute on barrier() intrinsics
This will prevent LLVM optimization passes from creating illegal uses
of the barrier() intrinsic (e.g. calling barrier() from a conditional
that is not executed by all threads).

llvm-svn: 193753
2013-10-31 15:50:48 +00:00
Tom Stellard 6c7b86c106 Implement nextafter() builtin
There are two implementations of nextafter():
1. Using clang's __builtin_nextafter.  Clang replaces this builtin with
a call to nextafter which is part of libm.  Therefore, this
implementation will only work for targets with an implementation of
libm (e.g. most CPU targets).

2. The other implementation is written in OpenCL C.  This function is
known internally as __clc_nextafter and can be used by targets that
don't have access to libm.

llvm-svn: 192383
2013-10-10 19:08:51 +00:00
Aaron Watry 50a7bcbac9 Add atomic_inc and atomic_add builtins
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 190058
2013-09-05 16:04:01 +00:00
Aaron Watry bde11213e7 Added get_num_groups
The get_num_groups function was missing for r600g. I did the same
thing as the other workitem functions.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 187059
2013-07-24 18:03:38 +00:00
Aaron Watry 99a2f3b274 Fix and re-enable R600 vload/vstore assembly
The assembly optimizations were making unsafe assumptions about which address
spaces had which identifiers.

Also, fix vload/vstore with 64-bit pointers. This was broken previously on
Radeon SI.

This version still only has assembly versions of int/uint 2/4/8/16 for global
loads and stores on R600, but it does it in a way that would be very easily
extended to private/local/constant and could also be handled easily on other
architectures.

v2: 1) Leave v[load|store]_impl.ll in generic/lib
    2) Remove vload_if.ll and vstore_if.ll interfaces
    3) Fix address+offset calculations
    3) Remove offset from assembly arg list
llvm-svn: 186416
2013-07-16 14:29:01 +00:00
Tom Stellard 3a81b5d083 Implement barrier() builtin
Reviewed and Tested-by: Aaron Watry <awatry@gmail.com>

llvm-svn: 185837
2013-07-08 17:26:39 +00:00
Tom Stellard f2f5a86620 R600: Replace cl implementations with LLVM IR implementation
This allows libclc to be built for R600 with upstream clang and LLVM.

llvm-svn: 184980
2013-06-26 18:20:00 +00:00
Tom Stellard 38f0ac9d5e r600: Add get_global_size() implementation
llvm-svn: 184977
2013-06-26 18:19:44 +00:00
Tom Stellard 879327fcdc r600: Initial support
This includes a get_global_id() implementation and function stubs for
the other workitem and synchronization functions.

llvm-svn: 184975
2013-06-26 18:18:59 +00:00