# UtilXlib
This library implements various basic tasks such as timing, tracing, and optimized memory access, and provides an abstraction layer for the MPI subroutines.
The following pre-processor directives can be used to enable/disable some features:

* `__MPI`: activates MPI support.
* `__TRACE`: activates verbose output for debugging purposes.
* `__CUDA`: activates CUDA Fortran based interfaces.
* `__GPU_MPI`: uses CUDA-aware MPI calls instead of the standard sync-send-update method (experimental).
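As a minimal sketch of how such directives select code paths at compile time (the file name and compiler invocation below are illustrative assumptions, not prescribed by this library):

```fortran
! demo.F90: a capital .F90 extension (or an explicit -cpp flag) runs the
! Fortran preprocessor, so -D__MPI on the compile line gates the code below.
! Build e.g. with: mpif90 -D__MPI demo.F90
program demo
#if defined(__MPI)
  print *, 'built with MPI support'
#else
  print *, 'serial build'
#endif
end program demo
```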
## Usage of wrapper interfaces for MPI
This library offers a number of interfaces that abstract the MPI APIs and optionally relax the dependency on an MPI library.

The `mp_*` interfaces provided by the library can only be called after the initialization performed by the subroutine `mp_start` and before the finalization done by `mp_end`. All rules have exceptions, and indeed the subroutines `mp_count_nodes`, `mp_type_create_column_section` and `mp_type_free` can also be called outside the aforementioned window.
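A minimal sketch of this initialization/finalization window follows; the exact argument lists of `mp_start` and `mp_end` are assumptions here and should be checked against the interfaces in mp.f90:

```fortran
! Hedged sketch: using the mp_* wrappers only inside the mp_start/mp_end
! window. Argument lists are assumptions; see mp.f90 for the real interfaces.
program mp_window_demo
  use parallel_include              ! provides MPI_COMM_WORLD when __MPI is set
  use mp, only: mp_start, mp_end, mp_bcast
  implicit none
  integer :: ntasks, mype, ierr, val

  call MPI_Init(ierr)               ! assumption: MPI itself is started by the caller
  call mp_start(ntasks, mype, MPI_COMM_WORLD)

  val = 0
  if (mype == 0) val = 42
  call mp_bcast(val, 0, MPI_COMM_WORLD)  ! legal only inside the window

  call mp_end(MPI_COMM_WORLD)
  call MPI_Finalize(ierr)
end program mp_window_demo
```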
If CUDA Fortran support is enabled, almost all interfaces accept input data declared with the `device` attribute. Note, however, that CUDA Fortran support should be considered experimental.
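For instance, a hedged sketch of passing device-resident data to a wrapper (which interfaces actually provide `device` overloads is defined in mp.f90; this particular usage is an assumption):

```fortran
! Hedged sketch for a __CUDA build: a reduction on device-resident data.
subroutine sum_on_device(comm)
  use cudafor
  use mp, only: mp_sum
  implicit none
  integer, intent(in) :: comm
  real(8), device :: v_d(1024)

  v_d = 1.0d0               ! initialized on the device
  call mp_sum(v_d, comm)    ! wrapper accepting a device-attribute argument
end subroutine sum_on_device
```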
## CUDA-specific notes
All calls to message passing interfaces are synchronous with respect to both MPI and CUDA streams. The code will synchronize the device before starting the communication, even in cases where the communication could be avoided (for example, in the serial version). A different behaviour may be observed when the default stream synchronization behaviour is overridden by the user (see `cudaStreamCreateWithFlags`).
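The situation the note above refers to can be reproduced with a non-blocking stream, sketched here in CUDA Fortran:

```fortran
! Hedged sketch: work launched on a non-blocking stream is not implicitly
! ordered with respect to the legacy default stream, so the library's
! device synchronization no longer covers it.
program stream_demo
  use cudafor
  implicit none
  integer(kind=cuda_stream_kind) :: stream
  integer :: istat

  istat = cudaStreamCreateWithFlags(stream, cudaStreamNonBlocking)
  ! ... kernels launched on 'stream' here escape default-stream ordering ...
  istat = cudaStreamSynchronize(stream)   ! explicit sync before communicating
  istat = cudaStreamDestroy(stream)
end program stream_demo
```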
Be careful when using CUDA-aware MPI: some implementations are not complete. The library does not check for the CUDA-aware MPI APIs during initialization, but it may report failure codes during execution. If you encounter problems after adding the flag `__GPU_MPI`, the MPI library may not support some CUDA-aware APIs.
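A hedged way to probe for CUDA-aware support independently of this library is to hand a device buffer directly to MPI, which is exactly what `__GPU_MPI` relies on. The legacy mpif.h interface is used below because its implicit interfaces accept device arrays; that is an assumption about your MPI installation:

```fortran
! Hedged sketch: on a non-CUDA-aware MPI, this reduction on device memory is
! typically where failure codes (or crashes) surface.
program cuda_aware_probe
  use cudafor
  implicit none
  include 'mpif.h'
  integer :: ierr, rank
  integer, device :: val_d(1)
  integer :: val_h(1)

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  val_d = rank + 1
  call MPI_Allreduce(MPI_IN_PLACE, val_d, 1, MPI_INTEGER, MPI_SUM, &
                     MPI_COMM_WORLD, ierr)
  val_h = val_d                       ! device-to-host copy of the result
  if (rank == 0) print *, 'sum over ranks =', val_h(1)
  call MPI_Finalize(ierr)
end program cuda_aware_probe
```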
## Testing
Partial unit testing is available in the `tests` sub-directory. See the README.md file in that directory for further information.