quantum-espresso/CPV/examples/EXX-wf-example/README-Parallelization

46 lines
3.2 KiB
Plaintext

----------------------------------------------------
Parallelization scheme:
----------------------------------------------------
The references for this algorithm are:
(i) Theory: X. Wu , A. Selloni, and R. Car, Phys. Rev. B 79, 085102 (2009).
(ii) Implementation: H.-Y. Ko, B. Santra, R. A. DiStasio, L. Kong, and R. Car, arxiv.
The parallelization scheme in this algorithm is based upon the number of electronic states.
In the current implementation, there are certain restrictions on the choice of the number of
MPI tasks. Also slightly different algorithms are employed depending on whether the number of
MPI tasks used in the calculation are greater or less than the number of electronic states.
We highly recommend users to follow the notes below.
This algorithm can be used most efficiently if the numbers of electronic states are
uniformly distributed over the number of MPI tasks. For a system having N electronic states,
the optimum numbers of MPI tasks (nproc) are the following:
a) In case of nproc <= N, the optimum choices are N/m, where m is any positive integer.
Robustness : Can be used for odd and even number of electronic states.
OpenMP threads : Can be used.
Taskgroup : Only the default value of the task group (-ntg 1) is allowed.
b) In case of nproc > N, the optimum choices are N*m, where m is any positive integer.
Robustness : Can be used for even number of electronic states.
Largest value of m : As long as nj_max (see output) is greater than 1, however beyond m=8
the scaling may become poor. The scaling should be tested by users.
OpenMP threads : Can be used and highly recommended. We have tested number of threads
starting from 2 up to 64. More threads are also allowed.
For very large calculations (nproc > 1000 ) efficiency can largely depend
on the computer architecture and the balance between the MPI tasks and
the OpenMP threads. User should test for an optimal balance.
Reasonably good scaling can be achieved by using m=6-8 and
OpenMP threads=2-16.
Taskgroup : Can be greater than 1 and users should choose the largest possible value
for ntg. To estimate ntg, find the value of nr3x in the output and
compute nproc/nr3x and take the integer value. We have tested the value of
ntg as 2^m, where m is any positive integer. Other values of ntg should be
used with caution.
Ndiag : Use -ndiag X option in the execution of cp.x. Without this option jobs
may crash on certain architectures. Set X to any perfect square number
which is equal to or less than N.
DEBUG : The EXX calculations always work when number of MPI tasks = number of electronic states.
In case of any uncertainty, the EXX energy computed using different numbers of MPI tasks can be
checked by performing test calculations using number of MPI tasks = number of electronic states.