mirror of https://github.com/abinit/abinit.git
To run the DFPT test cases.

-----------------------------------------

First test: FCC aluminum, computation of the phonon frequencies at q = 0.25 -0.125 0.125.

This test, with only one atom, is very fast but does not scale very well,
as the sequential part quickly dominates. The sequential part is about 1.5% of the
execution time on one processor, so the maximum speed-up is about 60.
One relies on a k-point grid of 4x4x4 x 4 shifts (= 256 k points) and 5 bands.
There are three perturbations. For the first two perturbations, no symmetry can be used,
while for the third, two symmetries can be used to reduce the number of k points to 128.
Hence, for the perfectly scalable sections of the code, the maximum speed-up is 128*5 = 640 (on 640 cores).
However, the sequential parts of the code dominate at a much lower number of cores.
The scaling limitation is mostly due to the reading of the ground-state wavefunctions in the inwffil.F90
routine. This reading is not parallelized over k points/bands. In the present
case, 10 k points are present in the irreducible Brillouin zone (256 in the
full BZ) and must be read.
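
The limits quoted above can be checked with Amdahl's law. The sketch below is
illustrative Python, not part of ABINIT. The function name amdahl_speedup is
hypothetical, and the sketch assumes that the 1.5% sequential share stays
fixed while the rest of the run scales perfectly with the number of cores.

```python
def amdahl_speedup(sequential_fraction, ncores):
    """Speed-up on ncores when a fixed fraction of the one-core time is sequential."""
    parallel_fraction = 1.0 - sequential_fraction
    return 1.0 / (sequential_fraction + parallel_fraction / ncores)


# Values taken from the text: 1.5% sequential, 128 k points x 5 bands = 640 units.
SEQUENTIAL_FRACTION = 0.015
MAX_UNITS = 128 * 5

if __name__ == "__main__":
    for ncores in (8, 64, MAX_UNITS):
        print(ncores, round(amdahl_speedup(SEQUENTIAL_FRACTION, ncores), 1))
    # Upper bound for an unlimited number of cores: 1 / sequential fraction.
    print("limit", round(1.0 / SEQUENTIAL_FRACTION, 1))
```

Under these assumptions the speed-up on 640 cores comes out at about 60, and
it cannot exceed about 67 however many cores are used.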

There is one preparatory step before running the DFPT calculation.

Preparatory step 1
(mpirun ...) abinit < tdfpt_01.files > tdfpt_01.log
cp tdfpt_01.o_WFK tdfpt_02.i_WFK
cp tdfpt_01.o_WFK tdfpt_02.i_WFQ

Test case, step 2 (DFPT calculation)
(mpirun ...) abinit < tdfpt_02.files > tdfpt_02.log
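
The same four-line sequence is used for both tests, with different file
prefixes. The sketch below is illustrative Python that only builds the command
lines as text and runs nothing. The helper dfpt_commands and its arguments are
hypothetical, and the launcher shown in the example is an assumption, since
the MPI launcher and its options depend on the machine.

```python
import shlex


def dfpt_commands(prep, dfpt, launcher=()):
    """Return the command lines: preparatory run, two WFK copies, DFPT run."""
    prefix = " ".join(shlex.quote(word) for word in launcher)

    def run(name):
        command = f"abinit < {name}.files > {name}.log"
        return f"{prefix} {command}" if prefix else command

    return [
        run(prep),                        # ground-state (preparatory) run
        f"cp {prep}.o_WFK {dfpt}.i_WFK",  # wavefunctions at k
        f"cp {prep}.o_WFK {dfpt}.i_WFQ",  # same file reused as WFQ input
        run(dfpt),                        # DFPT run
    ]


if __name__ == "__main__":
    # Example launcher only (an assumption); adapt it to the local MPI setup.
    for line in dfpt_commands("tdfpt_01", "tdfpt_02", launcher=("mpirun", "-n", "8")):
        print(line)
```

Calling dfpt_commands("tdfpt_01", "tdfpt_02") without a launcher gives the
command lines listed above, minus the launcher prefix.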

-------------------------------------------

Second test: BaTiO3 slab (29 atoms),
computation of the phonon frequencies at qpt 0.0 0.375 0.0.

This test, with 29 atoms, is quite slow but scales very well.

There is one preparatory step before running the DFPT calculation.
The preparatory step can be run on 16 processors at most with the current
input file. It could use more processors with the kgb parallelism
(but the input file would have to be modified).
On 8 processors, the preparatory step takes about three hours.
It generates well-converged wavefunctions. For a quick trial,
simply set nstep 1 instead of nstep 50;
the step then runs in about 6 minutes.
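
The quick-trial change can be scripted. The sketch below is illustrative
Python. The helper set_nstep is hypothetical and assumes that nstep appears at
the start of a line as "nstep <integer>", which may not hold for every input
file (for example when several variables share one line).

```python
import re

# Matches "nstep <integer>" at the start of a line (leading blanks allowed).
_NSTEP = re.compile(r"^([ \t]*nstep[ \t]+)\d+", flags=re.MULTILINE)


def set_nstep(input_text, new_value):
    """Return input_text with each line-leading 'nstep <int>' set to new_value."""
    return _NSTEP.sub(lambda m: m.group(1) + str(int(new_value)), input_text)


if __name__ == "__main__":
    sample = "nstep 50\n"  # made-up fragment, not the real input file
    print(set_nstep(sample, 1), end="")
```

The function works on text, so the caller decides which file to read and
where to write the modified copy.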

The test case itself is an underconverged calculation of the response with
respect to one perturbation (atomic displacement). It is underconverged
because nstep has been set to 10, while more than 30 steps are needed.
Moreover, obtaining the interatomic force constants would require computing
many more perturbations than the present one.
In any case, the present test case runs in about 45 minutes on an 8-core
machine.
Since the number of k points to be kept for the present perturbation is 8x8x1 with 4 symmetries,
that is 16, and the number of bands is 120, the perfectly scalable part of the
test case should have a maximum speed-up of 16*120 = 1920.
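
The counting behind this bound, and behind the 640 of the first test, is the
same: k points left after symmetry reduction, times the number of bands. A
minimal sketch in illustrative Python follows. The helper scalable_units is
hypothetical, and dividing the full grid by the number of symmetries is a
simplification that reproduces the numbers quoted in this file.

```python
def scalable_units(nkpt_full, nsym, nband):
    """k points kept after symmetry reduction, times the number of bands."""
    if nkpt_full % nsym != 0:
        raise ValueError("this simple count assumes nsym divides the k-point grid")
    return (nkpt_full // nsym) * nband


if __name__ == "__main__":
    # First test, third perturbation: 4x4x4 grid x 4 shifts, 2 symmetries, 5 bands.
    print(scalable_units(4 * 4 * 4 * 4, 2, 5))
    # Second test: 8x8x1 grid, 4 symmetries, 120 bands.
    print(scalable_units(8 * 8 * 1, 4, 120))
```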

From tests for the 8-core case, out of a total of 20200 secs, there
were 305 secs for vtorho3:synchro (sequential) and
260.460 secs for inwffil (sequential).
The latter will not increase with a larger value of nstep or with more
perturbations, while the former will increase proportionally.

Hence, in the present state, for 8 cores, the sequential part is about 3%,
leading to a maximum speed-up with respect to a sequential run of about 240.
For a larger test case (larger nstep, more perturbations), the maximum speed-up might
be about twice as large.
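
The sequential share and the estimate for a larger test case can be retraced
from the timings above. The sketch below is illustrative Python with
hypothetical helper names. It assumes that the three timings were measured in
the same way on the 8-core run, that everything outside the two listed
routines scales perfectly, and that a larger case multiplies every cost
except inwffil by the same factor.

```python
TOTAL = 20200.0    # secs, 8-core run
SYNCHRO = 305.0    # secs, vtorho3:synchro, grows with nstep and perturbations
INWFFIL = 260.460  # secs, inwffil, fixed cost
NCORES = 8


def sequential_fraction(scale=1.0):
    """Sequential share when all costs except inwffil are multiplied by scale."""
    sequential = scale * SYNCHRO + INWFFIL
    total = scale * (TOTAL - INWFFIL) + INWFFIL
    return sequential / total


def max_speedup(fraction, ncores=NCORES):
    """Bound relative to one core, from a sequential share measured on ncores."""
    # With the ncores run time normalised to 1, the one-core time is
    # fraction + ncores * (1 - fraction), and the best possible time is fraction.
    one_core_time = fraction + ncores * (1.0 - fraction)
    return one_core_time / fraction


if __name__ == "__main__":
    now = sequential_fraction()
    large = sequential_fraction(scale=1000.0)
    print(f"sequential share now: {100 * now:.1f}%, bound {max_speedup(now):.0f}")
    print(f"large-case share: {100 * large:.1f}%, bound {max_speedup(large):.0f}")
```

With these assumptions the present sequential share is about 2.8% and the
bound is a few hundred, the same order as the value of about 240 quoted above.
In the large-case limit the share falls to about 1.5% and the bound roughly
doubles.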

Preparatory step 1
(mpirun ...) abinit < tdfpt_03.files > tdfpt_03.log
cp tdfpt_03.o_WFK tdfpt_04.i_WFK
cp tdfpt_03.o_WFK tdfpt_04.i_WFQ

Test case, step 2 (DFPT calculation)
(mpirun ...) abinit < tdfpt_04.files > tdfpt_04.log

-------------------------------------------