UtilXlib

This library implements various basic utilities such as timing, tracing and optimized memory access, together with an abstraction layer for the MPI subroutines.

The following pre-processor directives can be used to enable or disable specific features:

  • __MPI : activates MPI support.
  • __TRACE : activates verbose output for debugging purposes.
  • __CUDA : activates CUDA Fortran based interfaces.
  • __GPU_MPI : uses CUDA-aware MPI calls instead of the standard sync-send-update method (experimental).
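
As a rough illustration of how these directives gate alternate code paths, a wrapper typically looks like the simplified, hypothetical sketch below (the routine name is invented here; this is not the actual library source):

```fortran
! Simplified, hypothetical wrapper showing how __MPI selects between the
! parallel and the serial code path. Not the actual library source.
SUBROUTINE example_bcast_integer(msg, source, comm)
#if defined(__MPI)
  USE parallel_include          ! provides the MPI constants/interfaces when __MPI is set
#endif
  IMPLICIT NONE
  INTEGER, INTENT(INOUT) :: msg
  INTEGER, INTENT(IN)    :: source, comm
#if defined(__MPI)
  INTEGER :: ierr
  CALL MPI_Bcast(msg, 1, MPI_INTEGER, source, comm, ierr)
#else
  ! Serial build: nothing to do.
#endif
END SUBROUTINE example_bcast_integer
```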

Usage of wrapper interfaces for MPI

This library offers a number of interfaces that abstract the MPI APIs and optionally relax the dependency on an MPI library.

The mp_* interfaces provided by the library may only be called after the initialization performed by the subroutine mp_start and before the finalization performed by mp_end. As usual there are exceptions: the subroutines mp_count_nodes, mp_type_create_column_section and mp_type_free may also be called outside the aforementioned window.
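
A minimal usage sketch is shown below. The argument lists are assumptions made here for illustration; mp.f90 is the authoritative reference for the actual interfaces.

```fortran
PROGRAM mp_usage_sketch
  ! Minimal sketch: initialize, broadcast, finalize.
  ! Argument lists are assumed; see mp.f90 for the exact interfaces.
  USE parallel_include            ! assumed to expose MPI_COMM_WORLD (or a serial dummy)
  USE mp, ONLY : mp_start, mp_bcast, mp_end
  IMPLICIT NONE
  INTEGER :: nproc, mype, comm, ndata
  !
  comm = MPI_COMM_WORLD
  CALL mp_start(nproc, mype, comm)   ! assumed: returns task count and task id for comm
  !
  ndata = 0
  IF (mype == 0) ndata = 10
  CALL mp_bcast(ndata, 0, comm)      ! broadcast ndata from task 0 to all tasks in comm
  !
  CALL mp_end(comm)                  ! finalization; must follow all other mp_* calls
END PROGRAM mp_usage_sketch
```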

If CUDA Fortran support is enabled, almost all interfaces also accept data declared with the device attribute. Note, however, that CUDA Fortran support should be considered experimental.
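
For example, a parallel reduction over a device-resident array could be written roughly as follows (a sketch assuming that the generic mp_sum interface accepts device arrays when the library is compiled with __CUDA; the routine name is invented here):

```fortran
! Sketch: collective sum on device-resident data (requires __CUDA).
! The device-enabled mp_sum interface is assumed to mirror the host one.
SUBROUTINE sum_on_device(n, comm)
  USE cudafor
  USE mp, ONLY : mp_sum
  IMPLICIT NONE
  INTEGER, INTENT(IN) :: n, comm
  REAL(8), DEVICE, ALLOCATABLE :: v_d(:)   ! double precision, as used throughout the library
  !
  ALLOCATE(v_d(n))
  v_d = 1.0d0
  CALL mp_sum(v_d, comm)   ! sums v_d across all tasks in comm; result stays on the device
  DEALLOCATE(v_d)
END SUBROUTINE sum_on_device
```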

CUDA-specific notes

All calls to the message-passing interfaces are synchronous with respect to both MPI and CUDA streams: the library synchronizes the device before starting the communication, even in those cases where the communication itself could be avoided (for example in the serial version). A different behaviour may be observed when the default stream-synchronization behaviour is overridden by the user (see cudaStreamCreateWithFlags).
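
For reference, such an override happens on the user side when streams are created as non-blocking, roughly as in this CUDA Fortran sketch (the routine name is hypothetical):

```fortran
! Sketch: a non-blocking stream does not synchronize implicitly with the
! default stream, which changes the behaviour described above.
SUBROUTINE make_nonblocking_stream(stream)
  USE cudafor
  IMPLICIT NONE
  INTEGER(KIND=cuda_stream_kind), INTENT(OUT) :: stream
  INTEGER :: istat
  !
  istat = cudaStreamCreateWithFlags(stream, cudaStreamNonBlocking)
  IF (istat /= cudaSuccess) PRINT *, 'cudaStreamCreateWithFlags failed: ', istat
END SUBROUTINE make_nonblocking_stream
```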

Be careful when using CUDA-aware MPI: some implementations are incomplete. The library does not check for the CUDA-aware MPI APIs during initialization, but it may report failure codes during execution. If you encounter problems after adding the __GPU_MPI flag, it might be that the MPI library does not support some of the CUDA-aware APIs.

Known Issues

Owing to the use of the source= option in data allocations, PGI compilers older than 17.10 may fail on arrays whose initial index differs from 1.
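
The pattern involved is a sourced allocation of an array with a non-default lower bound, roughly as in this small standalone example:

```fortran
PROGRAM sourced_alloc_example
  ! Illustration of the pattern behind the known issue: a sourced allocation
  ! of an array whose lower bound is not 1 (mishandled by PGI < 17.10).
  IMPLICIT NONE
  REAL(8), ALLOCATABLE :: a(:), b(:)
  !
  ALLOCATE(a(0:9))
  a = 1.0d0
  ALLOCATE(b, SOURCE=a)           ! b should inherit the bounds 0:9 from a
  PRINT *, LBOUND(b), UBOUND(b)   ! expected: 0 9
END PROGRAM sourced_alloc_example
```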

Testing

Partial unit testing is available in the tests sub-directory. See the README in that directory for further information.