mirror of https://gitlab.com/QEF/q-e.git
[skip-CI] CP user guide updated
parent
c8919b9124
commit
6b9d6677b5
|
@ -3,42 +3,36 @@
|
||||||
Introduction
|
Introduction
|
||||||
============
|
============
|
||||||
|
|
||||||
This guide covers the usage of the `CP` package, version 6.6, a core
|
This guide covers the usage of the `CP` package, version 6.8, a core
|
||||||
component of the Quantum ESPRESSO distribution. Further documentation,
|
component of the Quantum ESPRESSO distribution. Further documentation,
|
||||||
beyond what is provided in this guide, can be found in the directory
|
beyond what is provided in this guide, can be found in the directory
|
||||||
`CPV/Doc/`, containing a copy of this guide.
|
`CPV/Doc/`, containing a copy of this guide.
|
||||||
|
|
||||||
*Important notice: due to the lack of time and of manpower, this manual
|
This guide assumes that you know the physics that `CP` describes and the
|
||||||
is only partially updated and may contain outdated information.*
|
|
||||||
|
|
||||||
This guide assumes that you know the physics that `CP` describes and the
|
|
||||||
methods it implements. It also assumes that you have already installed,
|
methods it implements. It also assumes that you have already installed,
|
||||||
or know how to install, Quantum ESPRESSO. If not, please read the
|
or know how to install, Quantum ESPRESSO. If not, please read the
|
||||||
general User's Guide for Quantum ESPRESSO, found in directory `Doc/` two
|
general User's Guide for Quantum ESPRESSO, found in directory `Doc/` two
|
||||||
levels above the one containing this guide; or consult the web site:\
|
levels above the one containing this guide; or consult the web site:
|
||||||
`http://www.quantum-espresso.org`.
|
`http://www.quantum-espresso.org`.
|
||||||
|
|
||||||
People who want to modify or contribute to `CP` should read the
|
People who want to modify or contribute to `CP` should read the
|
||||||
Developer Manual:\
|
Developer Manual: `https://gitlab.com/QEF/q-e/-/wikis/home`.
|
||||||
`Doc/developer_man.pdf`.
|
|
||||||
|
|
||||||
`CP` can perform Car-Parrinello molecular dynamics, including
|
`CP` can perform Car-Parrinello molecular dynamics, including
|
||||||
variable-cell dynamics, and free-energy surface calculation at fixed
|
variable-cell dynamics. The `CP` package is based on the original code
|
||||||
cell through meta-dynamics, if patched with PLUMED.
|
written by Roberto Car
|
||||||
|
and Michele Parrinello. `CP` was developed by Alfredo Pasquarello (EPF
|
||||||
The `CP` package is based on the original code written by Roberto Car
|
|
||||||
and Michele Parrinello. `CP` was developed by Alfredo Pasquarello (EPF
|
|
||||||
Lausanne), Kari Laasonen (Oulu), Andrea Trave, Roberto Car (Princeton),
|
Lausanne), Kari Laasonen (Oulu), Andrea Trave, Roberto Car (Princeton),
|
||||||
Nicola Marzari (EPF Lausanne), Paolo Giannozzi, and others. FPMD, later
|
Nicola Marzari (EPF Lausanne), Paolo Giannozzi, and others. FPMD, later
|
||||||
merged with `CP`, was developed by Carlo Cavazzoni, Gerardo Ballabio
|
merged with `CP`, was developed by Carlo Cavazzoni (Leonardo), Gerardo
|
||||||
(CINECA), Sandro Scandolo (ICTP), Guido Chiarotti, Paolo Focher, and
|
Ballabio (CINECA), Sandro Scandolo (ICTP), Guido Chiarotti, Paolo Focher,
|
||||||
others. We quote in particular:
|
and others. We quote in particular:
|
||||||
|
|
||||||
- Federico Grasselli and Riccardo Bertossa (SISSA) for bug fixes,
|
- Federico Grasselli and Riccardo Bertossa (SISSA) for bug fixes,
|
||||||
extensions to Autopilot;
|
extensions to Autopilot;
|
||||||
|
|
||||||
- Biswajit Santra, Hsin-Yu Ko, Marcus Calegari Andrade (Princeton) for
|
- Biswajit Santra, Hsin-Yu Ko, Marcus Calegari Andrade (Princeton) for
|
||||||
SCAN functional;
|
various contributions, notably the SCAN functional;
|
||||||
|
|
||||||
- Robert DiStasio (Cornell), Biswajit Santra, and Hsin-Yu Ko for
|
- Robert DiStasio (Cornell), Biswajit Santra, and Hsin-Yu Ko for
|
||||||
hybrid functionals with MLWF; (maximally localized Wannier
|
hybrid functionals with MLWF; (maximally localized Wannier
|
||||||
|
@ -50,21 +44,21 @@ others. We quote in particular:
|
||||||
- Paolo Umari (Univ. Padua) for finite electric fields and conjugate
|
- Paolo Umari (Univ. Padua) for finite electric fields and conjugate
|
||||||
gradients;
|
gradients;
|
||||||
|
|
||||||
- Paolo Umari and Ismaila Dabo for ensemble-DFT;
|
- Paolo Umari and Ismaila Dabo (Penn State) for ensemble-DFT;
|
||||||
|
|
||||||
- Xiaofei Wang (Princeton) for META-GGA;
|
- Xiaofei Wang (Princeton) for META-GGA;
|
||||||
|
|
||||||
- The Autopilot feature was implemented by Targacept, Inc.
|
- The Autopilot feature was implemented by Targacept, Inc.
|
||||||
|
|
||||||
This guide has been mostly writen by Gerardo Ballabio and Carlo
|
The original version of this guide was mostly written by Gerardo Ballabio
|
||||||
Cavazzoni.
|
and Carlo Cavazzoni.
|
||||||
|
|
||||||
`CP` is free software, released under the GNU General Public License.\
|
`CP` is free software, released under the GNU General Public License.\
|
||||||
See `http://www.gnu.org/licenses/old-licenses/gpl-2.0.txt`, or the file
|
See `http://www.gnu.org/licenses/old-licenses/gpl-2.0.txt`, or the file
|
||||||
License in the distribution).
|
`License` in the distribution.
|
||||||
|
|
||||||
We shall greatly appreciate if scientific work done using the Quantum
|
We shall greatly appreciate if scientific work done using the Quantum
|
||||||
ESPRESSO distribution will contain an acknowledgment to the following
|
ESPRESSO distribution will contain an acknowledgment to the following
|
||||||
references:
|
references:
|
||||||
|
|
||||||
> P. Giannozzi, S. Baroni, N. Bonini, M. Calandra, R. Car, C. Cavazzoni,
|
> P. Giannozzi, S. Baroni, N. Bonini, M. Calandra, R. Car, C. Cavazzoni,
|
||||||
|
@ -95,22 +89,22 @@ Users of the GPU-enabled version should also cite the following paper:
|
||||||
> Ferretti, N. Marzari, I. Timrov, A. Urru, S. Baroni, J. Chem. Phys.
|
> Ferretti, N. Marzari, I. Timrov, A. Urru, S. Baroni, J. Chem. Phys.
|
||||||
> 152, 154105 (2020)
|
> 152, 154105 (2020)
|
||||||
|
|
||||||
Note the form Quantum ESPRESSO for textual citations of the code. Please
|
Note the form `Quantum ESPRESSO` (in small caps) for textual citations
|
||||||
also see package-specific documentation for further recommended
|
of the code. Please also see other package-specific documentation for
|
||||||
citations. Pseudopotentials should be cited as (for instance)
|
further recommended citations. Pseudopotentials should be cited as
|
||||||
|
(for instance)
|
||||||
|
|
||||||
> \[ \] We used the pseudopotentials C.pbe-rrjkus.UPF and O.pbe-vbc.UPF
|
> \[ \] We used the pseudopotentials C.pbe-rrjkus.UPF and O.pbe-vbc.UPF
|
||||||
> from\
|
> from `http://www.quantum-espresso.org`.
|
||||||
> `http://www.quantum-espresso.org`.
|
|
||||||
|
|
||||||
Compilation
|
Compilation
|
||||||
===========
|
===========
|
||||||
|
|
||||||
`CP` is included in the core Quantum ESPRESSO distribution. Instruction
|
`CP` is included in the core Quantum ESPRESSO distribution. Instruction
|
||||||
on how to install it can be found in the general documentation (User's
|
on how to install it can be found in the general documentation (User's
|
||||||
Guide) for Quantum ESPRESSO.
|
Guide) for Quantum ESPRESSO.
|
||||||
|
|
||||||
Typing `make cp` from the main Quantum ESPRESSO directory or `make` from
|
Typing `make cp` from the main Quantum ESPRESSO directory or `make` from
|
||||||
the `CPV/` subdirectory produces the following codes in `CPV/src`:
|
the `CPV/` subdirectory produces the following codes in `CPV/src`:
|
||||||
|
|
||||||
- `cp.x`: Car-Parrinello Molecular Dynamics code
|
- `cp.x`: Car-Parrinello Molecular Dynamics code
|
||||||
|
@ -125,52 +119,44 @@ Symlinks to executable programs will be placed in the `bin/`
|
||||||
subdirectory.
|
subdirectory.
|
||||||
|
|
||||||
As a final check that compilation was successful, you may want to run
|
As a final check that compilation was successful, you may want to run
|
||||||
some or all of the tests and examples. Automated tests for `cp.x` are in
|
some or all of the tests and examples. Automated tests for `cp.x` are in
|
||||||
directory `test-suite/` and can be run via the `Makefile` found there.
|
directory `test-suite/` and can be run via the `Makefile` found there.
|
||||||
Please see the general User's Guide for their setup.
|
Please see the general User's Guide for their setup.
|
||||||
|
|
||||||
You may take the tests and examples distributed with `CP` as templates
|
You may take the tests and examples distributed with `CP` as templates
|
||||||
for writing your own input files. Input files for tests are contained in
|
for writing your own input files. Input files for tests are contained in
|
||||||
subdirectories `test-suite/cp_` with file type `*.in1`, `*.in2`, \... .
|
subdirectories `test-suite/cp_*` with file type `*.in1`, `*.in2`, \... .
|
||||||
Input files for examples are produced, if you run the examples, in the
|
Input files for examples are produced, if you run the examples, in the
|
||||||
`results/` subdirectories, with names ending with `.in`.
|
`results/` subdirectories, with names ending with `.in`.
|
||||||
|
|
||||||
For general information on parallelism and how to run in parallel
|
For general information on parallelism and how to run in parallel
|
||||||
execution, please see the general User's Guide. `CP` currently can take
|
execution, please see the general User's Guide. `CP` currently can take
|
||||||
advantage of both MPI and OpenMP parallelization. The "plane-wave",
|
advantage of both MPI and OpenMP parallelization, and of GPU acceleration.
|
||||||
"linear-algebra" and "task-group" parallelization levels are
|
The "plane-wave", "linear-algebra" and "task-group" parallelization levels
|
||||||
implemented.
|
are implemented.
|
||||||
|
|
||||||
Input data
|
Input data
|
||||||
==========
|
==========
|
||||||
|
|
||||||
Input data for `cp.x` is organized into several namelists, followed by
|
Input data for `cp.x` is organized into several namelists, followed by
|
||||||
other fields ("cards") introduced by keywords. The namelists are
|
other fields ("cards") introduced by keywords. The namelists are:
|
||||||
|
|
||||||
------------------- ----------------------------------------------------------
|
> &CONTROL: general variables controlling the run\
|
||||||
&CONTROL: general variables controlling the run
|
> &SYSTEM: structural information on the system under investigation\
|
||||||
&SYSTEM: structural information on the system under investigation
|
> &ELECTRONS: electronic variables, electron dynamics\
|
||||||
&ELECTRONS: electronic variables, electron dynamics
|
> &IONS : ionic variables, ionic dynamics\
|
||||||
&IONS : ionic variables, ionic dynamics
|
> &CELL (optional): variable-cell dynamics\
|
||||||
&CELL (optional): variable-cell dynamics
|
|
||||||
------------------- ----------------------------------------------------------
|
|
||||||
|
|
||||||
\
|
|
||||||
The `&CELL` namelist may be omitted for fixed-cell calculations. This
|
The `&CELL` namelist may be omitted for fixed-cell calculations. This
|
||||||
depends on the value of variable `calculation` in namelist &CONTROL.
|
depends on the value of variable `calculation` in namelist &CONTROL.
|
||||||
Most variables in namelists have default values. Only the following
|
Most variables in namelists have default values. Only the following
|
||||||
variables in &SYSTEM must always be specified:
|
variables in &SYSTEM must always be specified:
|
||||||
|
|
||||||
----------- --------------------- -----------------------------------------------
|
> `ibrav` (integer) Bravais-lattice index\
|
||||||
`ibrav` (integer) Bravais-lattice index
|
> `celldm` (real, dimension 6) crystallographic constants\
|
||||||
`celldm` (real, dimension 6) crystallographic constants
|
> `nat` (integer) number of atoms in the unit cell\
|
||||||
`nat` (integer) number of atoms in the unit cell
|
> `ntyp` (integer) number of types of atoms in the unit cell\
|
||||||
`ntyp` (integer) number of types of atoms in the unit cell
|
> `ecutwfc` (real) kinetic energy cutoff (Ry) for wavefunctions
|
||||||
`ecutwfc` (real) kinetic energy cutoff (Ry) for wavefunctions.
|
|
||||||
----------- --------------------- -----------------------------------------------
|
|
||||||
|
|
||||||
\
|
|
||||||
).
|
|
||||||
|
|
||||||
Explanations for the meaning of variables `ibrav` and `celldm`, as well
|
Explanations for the meaning of variables `ibrav` and `celldm`, as well
|
||||||
as on alternative ways to input structural data, are contained in files
|
as on alternative ways to input structural data, are contained in files
|
||||||
|
@ -178,34 +164,31 @@ as on alternative ways to input structural data, are contained in files
|
||||||
describe a large number of other variables as well. Almost all variables
|
describe a large number of other variables as well. Almost all variables
|
||||||
have default values, which may or may not fit your needs.
|
have default values, which may or may not fit your needs.
|
||||||
|
|
||||||
Comment lines in namelists can be introduced by a \"!\", exactly as in
|
|
||||||
fortran code.
|
|
||||||
|
|
||||||
After the namelists, you have several fields ("cards") introduced by
|
After the namelists, you have several fields ("cards") introduced by
|
||||||
keywords with self-explanatory names:
|
keywords with self-explanatory names:
|
||||||
|
|
||||||
> ATOMIC\_SPECIES\
|
> ATOMIC\_SPECIES\
|
||||||
> ATOMIC\_POSITIONS\
|
> ATOMIC\_POSITIONS\
|
||||||
> CELL\_PARAMETERS (optional)\
|
> CELL\_PARAMETERS (optional)\
|
||||||
> OCCUPATIONS (optional)\
|
> OCCUPATIONS (optional)
|
||||||
|
|
||||||
The keywords may be followed on the same line by an option. Unknown
|
The keywords may be followed on the same line by an option. Unknown
|
||||||
fields are ignored. See the files mentioned above for details on the
|
fields are ignored. See the files mentioned above for details on the
|
||||||
available "cards".
|
available "cards".
|
||||||
|
|
||||||
Comments lines in "cards" can be introduced by either a "!" or a "\#"
|
Comment lines in namelists can be introduced by a \"!\", exactly as in
|
||||||
character in the first position of a line.
|
Fortran code. Comment lines in "cards" can be introduced by either a "!"
|
||||||
|
or a "\#" character in the first position of a line.
|
||||||
|
|
||||||
Data files
|
Data files
|
||||||
----------
|
----------
|
||||||
|
|
||||||
The output data files are written in the directory specified by variable
|
The output data files are written in the directory specified by variable
|
||||||
`outdir`, with names specified by variable `prefix` (a string that is
|
`outdir`, with names specified by variable `prefix` (a string that is
|
||||||
prepended to all file names, whose default value is: `prefix=’pwscf’`).
|
prepended to all file names, whose default value is `prefix=’cp_$ndw’`,
|
||||||
The `iotk` toolkit is used to write the file in a XML format, whose
|
where `ndw` is an integer specified in input).
|
||||||
definition can be found in the Developer Manual. In order to use the
|
In order to use the data on a different machine, you may need to
|
||||||
data directory on a different machine, you need to convert the binary
|
compile `CP` with HDF5 enabled.
|
||||||
files to formatted and back, using the `bin/iotk` script.
|
|
||||||
|
|
||||||
The execution stops if you create a file `prefix.EXIT` either in the
|
The execution stops if you create a file `prefix.EXIT` either in the
|
||||||
working directory (i.e. where the program is executed), or in the
|
working directory (i.e. where the program is executed), or in the
|
||||||
|
@ -215,58 +198,13 @@ this procedure is that all files are properly closed, whereas just
|
||||||
killing the process may leave data and output files in an unusable
|
killing the process may leave data and output files in an unusable
|
||||||
state.
|
state.
|
||||||
|
|
||||||
Format of arrays containing charge density, potential, etc.
|
The format of arrays containing charge density, potential, etc.
|
||||||
-----------------------------------------------------------
|
is described in the developer manual.
|
||||||
|
|
||||||
The index of arrays used to store functions defined on 3D meshes is
|
|
||||||
actually a shorthand for three indices, following the FORTRAN convention
|
|
||||||
(\"leftmost index runs faster\"). An example will explain this better.
|
|
||||||
Suppose you have a 3D array `psi(nr1x,nr2x,nr3x)`. FORTRAN compilers
|
|
||||||
store this array sequentially in the computer RAM in the following way:
|
|
||||||
|
|
||||||
psi( 1, 1, 1)
|
|
||||||
psi( 2, 1, 1)
|
|
||||||
...
|
|
||||||
psi(nr1x, 1, 1)
|
|
||||||
psi( 1, 2, 1)
|
|
||||||
psi( 2, 2, 1)
|
|
||||||
...
|
|
||||||
psi(nr1x, 2, 1)
|
|
||||||
...
|
|
||||||
...
|
|
||||||
psi(nr1x,nr2x, 1)
|
|
||||||
...
|
|
||||||
psi(nr1x,nr2x,nr3x)
|
|
||||||
etc
|
|
||||||
|
|
||||||
Let `ind` be the position of the `(i,j,k)` element in the above list:
|
|
||||||
the following relation
|
|
||||||
|
|
||||||
ind = i + (j - 1) * nr1x + (k - 1) * nr2x * nr1x
|
|
||||||
|
|
||||||
holds. This should clarify the relation between 1D and 3D indexing. In
|
|
||||||
real space, the `(i,j,k)` point of the FFT grid with dimensions `nr1`
|
|
||||||
( $`\le`$ `nr1x`), `nr2` ( $`\le`$ `nr2x`), , `nr3` ( $`\le`$ `nr3x`), is
|
|
||||||
|
|
||||||
```math
|
|
||||||
r_{ijk}=\frac{i-1}{nr1} \tau_1 + \frac{j-1}{nr2} \tau_2 + \frac{k-1}{nr3} \tau_3
|
|
||||||
```
|
|
||||||
|
|
||||||
where the $`\tau_i`$ are the basis vectors of the
|
|
||||||
Bravais lattice. The latter are stored row-wise in the `at` array:
|
|
||||||
$`\tau_1 =`$ `at(:, 1)`, $`\tau_2 =`$ `at(:, 2)`, $`\tau_3 =`$ `at(:, 3)`.
|
|
||||||
|
|
||||||
The distinction between the dimensions of the FFT grid, `(nr1,nr2,nr3)`
|
|
||||||
and the physical dimensions of the array, `(nr1x,nr2x,nr3x)` is done
|
|
||||||
only because it is computationally convenient in some cases that the two
|
|
||||||
sets are not the same. In particular, it is often convenient to have
|
|
||||||
`nrx1`=`nr1`+1 to reduce memory conflicts.
|
|
||||||
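The relation between the 1D position and the `(i,j,k)` indices described in the removed text above can be checked with a short sketch. This is only an illustration in Python, not code from `CP`; the function names `linear_index` and `grid_indices` are hypothetical, and indices are 1-based to match the Fortran convention.

```python
def linear_index(i: int, j: int, k: int, nr1x: int, nr2x: int) -> int:
    """1-based position of element (i, j, k) in Fortran storage order."""
    # Leftmost index runs fastest: i, then j, then k.
    return i + (j - 1) * nr1x + (k - 1) * nr2x * nr1x


def grid_indices(ind: int, nr1x: int, nr2x: int) -> tuple:
    """Inverse map: recover the 1-based (i, j, k) from the position `ind`."""
    k0, rest = divmod(ind - 1, nr1x * nr2x)
    j0, i0 = divmod(rest, nr1x)
    return i0 + 1, j0 + 1, k0 + 1
```

Note that the physical dimensions `nr1x` and `nr2x` enter the formula, not the FFT dimensions `nr1` and `nr2`; `nr3x` is not needed at all.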
|
|
||||||
|
|
||||||
Output files
|
Output files
|
||||||
==========
|
==========
|
||||||
|
|
||||||
The `cp.x` code produces many output file, that together build up the trajectory.
|
The `cp.x` code produces many output files that together make up the trajectory.
|
||||||
|
|
||||||
You have a file for the positions, called `prefix.pos`, where `prefix` is defined in
|
You have a file for the positions, called `prefix.pos`, where `prefix` is defined in
|
||||||
the input file, that is formatted like:
|
the input file, that is formatted like:
|
||||||
|
@ -280,35 +218,40 @@ the input file, that is formatted like:
|
||||||
0.42395189282719E+01 0.55766875434652E+01 0.31291744042209E+01
|
0.42395189282719E+01 0.55766875434652E+01 0.31291744042209E+01
|
||||||
0.45445534106843E+01 0.36049553522533E+01 0.55864387532281E+01
|
0.45445534106843E+01 0.36049553522533E+01 0.55864387532281E+01
|
||||||
|
|
||||||
where in the first line there is an header with, in order, the number of the step and
|
where the first line contains the step number and elapsed time, in ps, at this
|
||||||
the time in ps of this step. Later you found the positions of all the atoms, in the
|
step; the following lines contain the positions, in Bohr radii, of all the
|
||||||
same order of the input file (note that this behaviour emerged in v6.6 -- previously
|
atoms (3 in this example), in the same order as in the input file (since v6.6
|
||||||
atoms were sorted by type). In this example we have 3 atoms.
|
-- previously, atoms were sorted by type; the type must be deduced from the
|
||||||
The type must be deduced from the input file. After the first 4 lines
|
input file). The same structure is repeated for the second step and so on.
|
||||||
you find the same structure for the second step. The units of the position are Bohr's
|
The printout is made every `iprint` steps (10 in this case, so at steps 10, 20,
|
||||||
radius. Note that the atoms coordinates are unwrapped, so it is possible that they go
|
etc.). Note that the atomic coordinates are not wrapped into the simulation
|
||||||
outside the simulation cell.
|
cell, so it is possible that they lie outside it.
|
||||||
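As an illustration of the layout just described, the following Python sketch reads such a file frame by frame. It is not part of `CP`: the name `read_frames` is hypothetical, and the sketch assumes whitespace-separated columns, an integer step number in the header, and a number of atoms `nat` known from the input file.

```python
from typing import Iterator, List, TextIO, Tuple

Frame = Tuple[int, float, List[Tuple[float, float, float]]]


def read_frames(handle: TextIO, nat: int) -> Iterator[Frame]:
    """Yield (step, time_ps, positions) for each printed step.

    Assumed layout: one header line holding the step number and the
    time in ps, followed by `nat` lines with x, y, z in Bohr radii.
    """
    while True:
        header = handle.readline()
        if not header.strip():
            return  # end of file, or a blank line where a header was expected
        fields = header.split()
        step, time_ps = int(fields[0]), float(fields[1])
        positions = []
        for _ in range(nat):
            x, y, z = (float(v) for v in handle.readline().split()[:3])
            positions.append((x, y, z))
        yield step, time_ps, positions


# Possible usage (the file name is an example only):
#   with open("prefix.pos") as f:
#       for step, time_ps, positions in read_frames(f, nat=3):
#           print(step, time_ps, positions[0])
```

Since the coordinates are not wrapped into the cell, any folding back has to be done by the user after reading. The same sketch should apply to the `.vel` and `.for` files if they follow the same layout.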
|
|
||||||
The velocities are written in a similar file named `prefix.vel`, where `prefix` is defined in
|
The velocities are written in a similar file named `prefix.vel`, where `prefix`
|
||||||
the input file, that is formatted like the `.pos` file. The units are the usual Hartree
|
is defined in the input file, that is formatted like the `.pos` file. The units
|
||||||
atomic units (note again that the velocity in the pw code differs by a factor of 2).
|
are the usual Hartree atomic units (note that the velocities in the `pw.x` code
|
||||||
|
are in _Rydberg_ a.u. and differ by a factor of 2).
|
||||||
|
|
||||||
The `prefix.for` file is formatted like the previous two. Contains the computed forces
|
The `prefix.for` file, formatted like the previous two, contains the computed
|
||||||
and has Hartree atomic units too.
|
forces, in Hartree atomic units as well. It is written only if a molecular
|
||||||
It is written only if `tprnfor = .true.` is set in the input file.
|
dynamics calculation is performed, or if `tprnfor = .true.` is set in input.
|
||||||
|
|
||||||
The file `prefix.evp` has one line per printed step and contains some thermodynamic data.
|
The file `prefix.evp` has one line per printed step and contains some
|
||||||
|
thermodynamic data.
|
||||||
The first line of the file names the columns:
|
The first line of the file names the columns:
|
||||||
```
|
```
|
||||||
# nfi time(ps) ekinc T\_cell(K) Tion(K) etot enthal econs econt Volume Pressure(GPa
|
# nfi time(ps) ekinc Tcell(K) Tion(K) etot enthal econs econt Volume Pressure(GPa)
|
||||||
```
|
```
|
||||||
where:
|
where:
|
||||||
- `ekinc` $`K_{ELECTRONS}`$, the electron's fake kinetic energy
|
- `ekinc` is the fictitious kinetic energy of the electrons, $`K_{ELECTRONS}`$
|
||||||
- `enthal` $`E_{DFT}+PV`$
|
- `enthal` is the enthalpy, $`E_{DFT}+PV`$
|
||||||
- `etot` $`E_{DFT}`$ potential energy of the system, the DFT energy
|
- `etot` is the DFT (potential) energy of the system, $`E_{DFT}`$
|
||||||
- `econs` $`E_{DFT} + K_{NUCLEI}`$ this is something that is a constant of motion in the limit where the electronic fictitious mass is zero. It has a physical meaning.
|
- `econs` is a physically meaningful constant of motion, $`E_{DFT} + K_{NUCLEI}`$,
|
||||||
- `econt` $`E_{DFT} + K_{IONS} + K_{ELECTRONS}`$ this is a constant of motion of the lagrangian. If the dt is small enough this will be up to a very good precision a constant. It is not a physical quantity, since $`K_{ELECTRONS}`$ has _nothing_ to do with the quantum kinetic energy of the electrons.
|
in the limit of zero electronic fictitious mass
|
||||||
|
- `econt` is the constant of motion of the Lagrangian, $`E_{DFT} + K_{IONS} + K_{ELECTRONS}`$.
|
||||||
|
If the time step `dt` is small enough, this is constant to very good precision.
|
||||||
|
It is not a physical quantity, since $`K_{ELECTRONS}`$ has _nothing_ to do with the quantum
|
||||||
|
kinetic energy of the electrons.
|
||||||
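A quick way to check how well `econt` is conserved is to read the `.evp` file and measure its drift. The sketch below is only an illustration: the names `read_evp` and `relative_drift` are hypothetical, and it assumes that the column names are taken from the first `#` line and that every data line has exactly as many whitespace-separated values as there are names.

```python
from typing import Dict, List, TextIO


def read_evp(handle: TextIO) -> Dict[str, List[float]]:
    """Return a mapping from column name to the list of its values."""
    columns: List[str] = []
    data: Dict[str, List[float]] = {}
    for line in handle:
        if line.startswith("#"):
            if not columns:  # the first header line names the columns
                columns = line.lstrip("#").split()
                data = {name: [] for name in columns}
            continue  # later header lines (e.g. after a restart) are skipped
        values = line.split()
        if len(values) != len(columns):
            continue  # blank or incomplete line
        for name, value in zip(columns, values):
            data[name].append(float(value))
    return data


def relative_drift(series: List[float]) -> float:
    """Largest deviation from the first value, relative to its magnitude."""
    first = series[0]
    return max(abs(v - first) for v in series) / abs(first)


# Possible usage (the file name is an example only):
#   with open("prefix.evp") as f:
#       evp = read_evp(f)
#   print(relative_drift(evp["econt"]))
```

What counts as an acceptable drift depends on the system and on the time step; this sketch does not suggest a threshold.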
|
|
||||||
|
|
||||||
Using `CP`
|
Using `CP`
|
||||||
|
@ -317,7 +260,7 @@ Using `CP`
|
||||||
It is important to understand that a CP simulation is a sequence of
|
It is important to understand that a CP simulation is a sequence of
|
||||||
different runs, some of them used to \"prepare\" the initial state of
|
different runs, some of them used to \"prepare\" the initial state of
|
||||||
the system, and other performed to collect statistics, or to modify the
|
the system, and other performed to collect statistics, or to modify the
|
||||||
state of the system itself, i.e. modify the temperature or the pressure.
|
state of the system itself, i.e. to modify the temperature or the pressure.
|
||||||
|
|
||||||
To prepare and run a CP simulation you should first of all define the
|
To prepare and run a CP simulation you should first of all define the
|
||||||
system:
|
system:
|
||||||
|
@ -393,8 +336,7 @@ An example of input file (Benzene Molecule):
|
||||||
H -2.2 2.2 0.0
|
H -2.2 2.2 0.0
|
||||||
H 2.2 2.2 0.0
|
H 2.2 2.2 0.0
|
||||||
|
|
||||||
You can find the description of the input variables in file
|
You can find the description of the input variables in file `Doc/INPUT_CP.*`.
|
||||||
`Doc/INPUT_CP.*`.
|
|
||||||
|
|
||||||
Reaching the electronic ground state
|
Reaching the electronic ground state
|
||||||
------------------------------------
|
------------------------------------
|
||||||
|
@ -403,7 +345,7 @@ The first run, when starting from scratch, is always an electronic
|
||||||
minimization, with fixed ions and cell, to bring the electronic system
|
minimization, with fixed ions and cell, to bring the electronic system
|
||||||
on the ground state (GS) relative to the starting atomic configuration.
|
on the ground state (GS) relative to the starting atomic configuration.
|
||||||
This step is conceptually very similar to self-consistency in a
|
This step is conceptually very similar to self-consistency in a
|
||||||
`pw.x` run.
|
`pw.x` run.
|
||||||
|
|
||||||
Sometimes a single run is not enough to reach the GS. In this case, you
|
Sometimes a single run is not enough to reach the GS. In this case, you
|
||||||
need to re-run the electronic minimization stage. Use the input of the
|
need to re-run the electronic minimization stage. Use the input of the
|
||||||
|
@ -428,14 +370,12 @@ $`< 10^{-5}`$. You could check the value of the fictitious kinetic energy
|
||||||
on the standard output (column EKINC).
|
on the standard output (column EKINC).
|
||||||
|
|
||||||
Different strategies are available to minimize electrons, but the most
|
Different strategies are available to minimize electrons, but the most
|
||||||
used ones are:
|
frequently used is _damped dynamics_: `electron_dynamics = ’damp’` and
|
||||||
|
`electron_damping` = a number typically ranging from 0.1 to 0.5.
|
||||||
- steepest descent: `electron_dynamics = ’sd’`
|
|
||||||
|
|
||||||
- damped dynamics: `electron_dynamics = ’damp’`, `electron_damping` =
|
|
||||||
a number typically ranging from 0.1 and 0.5
|
|
||||||
|
|
||||||
See the input description to compute the optimal damping factor.
|
See the input description to compute the optimal damping factor.
|
||||||
|
Steepest descent: `electron_dynamics = ’sd’`, is also available but it
|
||||||
|
is typically slower than damped dynamics and should be used only to
|
||||||
|
start the minimization.
|
||||||
|
|
||||||
Relax the system
|
Relax the system
|
||||||
----------------
|
----------------
|
||||||
|
@ -860,14 +800,6 @@ ranges between 4 and 7.
|
||||||
All the other parameters have the same meaning in the usual `CP` input,
|
All the other parameters have the same meaning in the usual `CP` input,
|
||||||
and they are discussed above.
|
and they are discussed above.
|
||||||
|
|
||||||
### Free-energy surface calculations
|
|
||||||
|
|
||||||
Once `CP` is patched with `PLUMED` plug-in, it becomes possible to
|
|
||||||
turn-on most of the PLUMED functionalities running `CP` as:
|
|
||||||
`./cp.x -plumed` plus the other usual `CP` arguments. The PLUMED input
|
|
||||||
file has to be located in the specified `outdir` with the fixed name
|
|
||||||
`plumed.dat`.
|
|
||||||
|
|
||||||
### Treatment of USPPs
|
### Treatment of USPPs
|
||||||
|
|
||||||
The cutoff `ecutrho` defines the resolution on the real space FFT mesh
|
The cutoff `ecutrho` defines the resolution on the real space FFT mesh
|
||||||
|
@ -1030,99 +962,62 @@ An example input is listed as following:
|
||||||
O 16.0D0 O_HSCV_PBE-1.0.UPF
|
O 16.0D0 O_HSCV_PBE-1.0.UPF
|
||||||
H 2.0D0 H_HSCV_PBE-1.0.UPF
|
H 2.0D0 H_HSCV_PBE-1.0.UPF
|
||||||
|
|
||||||
Performances
|
Parallel Performances
|
||||||
============
|
=====================
|
||||||
|
|
||||||
`cp.x` can run in principle on any number of processors. The
|
`cp.x` can run in principle on any number of processors. The
|
||||||
effectiveness of parallelization is ultimately judged by the "scaling",
|
effectiveness of parallelization is ultimately judged by the "scaling",
|
||||||
i.e. how the time needed to perform a job scales with the number of
|
i.e. how the time needed to perform a job scales with the number of
|
||||||
processors, and depends upon:
|
processors. Ideally one would like to have linear scaling, i.e.
|
||||||
|
$`T \sim T_0/N_p`$ for $`N_p`$ processors, where $`T_0`$ is the estimated
|
||||||
|
time for serial execution. In addition, one would like to have linear
|
||||||
|
scaling of the RAM per processor: $`O_N \sim O_0/N_p`$, so that large-memory
|
||||||
|
systems fit into the RAM of each processor.
|
||||||
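The ideal relation $`T \sim T_0/N_p`$ can be compared with measured wall times by computing the speedup and the parallel efficiency. The following Python sketch is only an illustration; the function name `parallel_efficiency` is hypothetical, and the timings have to come from your own runs.

```python
from typing import Dict, Tuple


def parallel_efficiency(t_serial: float,
                        timings: Dict[int, float]) -> Dict[int, Tuple[float, float]]:
    """Map N_p -> (speedup, efficiency) from wall times.

    `t_serial` is T_0, the time of the serial run; `timings` maps the
    number of processors N_p to the measured time T. An efficiency of
    1.0 corresponds to ideal linear scaling, T = T_0 / N_p.
    """
    result = {}
    for n_proc, t in sorted(timings.items()):
        speedup = t_serial / t
        result[n_proc] = (speedup, speedup / n_proc)
    return result


# Possible usage, with made-up numbers for illustration only:
#   parallel_efficiency(100.0, {2: 50.0, 4: 40.0})
#   gives {2: (2.0, 1.0), 4: (2.5, 0.625)}
```

An efficiency that drops quickly as `N_p` grows is a sign that the job has left its optimal range of processors.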
|
|
||||||
- the size and type of the system under study;
|
We refer to the "Parallelization" section of the general User's Guide for
|
||||||
|
a description of MPI and OpenMP parallelization paradigms, of the various
|
||||||
|
MPI parallelization levels, and of how to activate them.
|
||||||
|
|
||||||
- the judicious choice of the various levels of parallelization
|
A judicious choice of the various levels of parallelization, together
|
||||||
(detailed in
|
with the availability of suitable hardware (e.g. fast communications)
|
||||||
Sec.[\[SubSec:para\]](#SubSec:para){reference-type="ref"
|
is fundamental to reach good performances. _VERY IMPORTANT_: For each
|
||||||
reference="SubSec:para"});
|
system there is an optimal range of number of processors on which to
|
||||||
|
run the job. Too many processors or a bad parallelization
|
||||||
|
style will yield performance degradation.
|
||||||
|
|
||||||
- the availability of fast interprocess communications (or lack of
|
For `CP` with hybrid functionals, see the related section above this one.
|
||||||
it).
|
For all other cases, the relevant MPI parallelization levels are:
|
||||||
|
|
||||||
Ideally one would like to have linear scaling, i.e. $`T \sim T_0/N_p`$ for
|
- "plane waves" (PW);
|
||||||
$`N_p`$ processors, where $`T_0`$ is the estimated time for serial
|
- "tasks" (activated by command-line option `-nt N`);
|
||||||
execution. In addition, one would like to have linear scaling of the RAM
|
- "linear algebra" (`-nd N`);
|
||||||
per processor: $`O_N \sim O_0/N_p`$, so that large-memory systems fit into
|
- "bands" parallelization (`-nb N`), to be used only in
|
||||||
the RAM of each processor.
|
special cases;
|
||||||
|
- "images" parallelization (`-ni N`), used only in code `manycp.x`
|
||||||
|
(see the header of `CPV/src/manycp.f90` for documentation).
|
||||||
|
|
||||||
As a general rule, image parallelization:
|
As a rule of thumb:
|
||||||
|
- Start with PW parallelization only (e.g. `mpirun -np N cp.x ...` with
|
||||||
|
no other parallelization options); the code will scale well unless `N`
|
||||||
|
exceeds the third FFT dimension, `nr3` and/or `nr3s`.
|
||||||
|
- To further increase the number of processors, use "task groups",
|
||||||
|
typically 4 to 8 (e.g. `mpirun -np N cp.x -nt 8 ...`).
|
||||||
|
- Alternatively, or in addition, you may compile with OpenMP:
|
||||||
|
`./configure --enable-openmp ...`, then `export OMP_NUM_THREADS=n`
|
||||||
|
and run on `n` threads (4 to 8 typically).
|
||||||
|
_Beware of conflicts between MPI and OpenMP threads_!
|
||||||
|
Don't do this unless you know what you are doing.
|
||||||
|
- Finally, the optimal number of processors for \"linear-algebra\"
|
||||||
|
parallelization can be found by observing the performances of `ortho`
|
||||||
|
in the final time report for different numbers of processors in the
|
||||||
|
linear-algebra group (must be a square integer, not larger than the
|
||||||
|
number of processors for plane-wave parallelization). Linear-algebra
|
||||||
|
parallelization distributes $`M\times M`$ matrices, with $`M`$ the number of
|
||||||
|
bands, so it may be useful if memory-constrained.
|
||||||
|
|
||||||
- may give good scaling, but the slowest image will determine the
|
Note: optimal serial performances are achieved when the data are as much
|
||||||
overall performances ("load balancing" may be a problem);
|
as possible kept into the cache. As a side effect, PW parallelization may
|
||||||
|
yield superlinear (better than linear) scaling, thanks to the increase in
|
||||||
|
serial speed coming from the reduction of data size (making it easier for
|
||||||
|
the machine to keep data in the cache).
|
||||||
|
|
||||||
- requires very little communications (suitable for ethernet
|
|
||||||
communications);
|
|
||||||
|
|
||||||
- does not reduce the required memory per processor (unsuitable for
|
|
||||||
large-memory jobs).
|
|
||||||
|
|
||||||
Parallelization on k-points:
|
|
||||||
|
|
||||||
- guarantees (almost) linear scaling if the number of k-points is a
|
|
||||||
multiple of the number of pools;
|
|
||||||
|
|
||||||
- requires little communications (suitable for ethernet
|
|
||||||
communications);
|
|
||||||
|
|
||||||
- does not reduce the required memory per processor (unsuitable for
|
|
||||||
large-memory jobs).
|
|
||||||
|
|
||||||
Parallelization on PWs:
|
|
||||||
|
|
||||||
- yields good to very good scaling, especially if the number of
|
|
||||||
processors in a pool is a divisor of $`N_3`$ and $`N_{r3}`$ (the
|
|
||||||
dimensions along the z-axis of the FFT grids, `nr3` and `nr3s`,
|
|
||||||
which coincide for NCPPs);
|
|
||||||
|
|
||||||
- requires heavy communications (suitable for Gigabit ethernet up to
|
|
||||||
4, 8 CPUs at most, specialized communication hardware needed for 8
|
|
||||||
or more processors );
|
|
||||||
|
|
||||||
- yields almost linear reduction of memory per processor with the
|
|
||||||
number of processors in the pool.
|
|
||||||
|
|
||||||
A note on scaling: optimal serial performances are achieved when the
|
|
||||||
data are as much as possible kept into the cache. As a side effect, PW
|
|
||||||
parallelization may yield superlinear (better than linear) scaling,
|
|
||||||
thanks to the increase in serial speed coming from the reduction of data
|
|
||||||
size (making it easier for the machine to keep data in the cache).
|
|
||||||
|
|
||||||
VERY IMPORTANT: For each system there is an optimal range of number of
|
|
||||||
processors on which to run the job. A too large number of processors
|
|
||||||
will yield performance degradation. The size of pools is especially
|
|
||||||
delicate: $`N_p`$ should not exceed $`N_3`$ and $`N_{r3}`$, and should ideally
|
|
||||||
be no larger than $`1/2\div1/4 N_3`$ and/or $`N_{r3}`$. In order to increase
|
|
||||||
scalability, it is often convenient to further subdivide a pool of
|
|
||||||
processors into "task groups". When the number of processors exceeds the
|
|
||||||
number of FFT planes, data can be redistributed to \"task groups\" so
|
|
||||||
that each group can process several wavefunctions at the same time.
|
|
||||||
|
|
||||||
The optimal number of processors for \"linear-algebra\" parallelization,
|
|
||||||
taking care of multiplication and diagonalization of $`M\times M`$
|
|
||||||
matrices, should be determined by observing the performances of
|
|
||||||
`cdiagh/rdiagh` (`pw.x`) or `ortho` (`cp.x`) for different numbers of
|
|
||||||
processors in the linear-algebra group (must be a square integer).
|
|
||||||
|
|
||||||
Actual parallel performances will also depend on the available software
|
|
||||||
(MPI libraries) and on the available communication hardware. For PC
|
|
||||||
clusters, OpenMPI (`http://www.openmpi.org/`) seems to yield better
|
|
||||||
performances than other implementations (info by Kostantin Kudin). Note
|
|
||||||
however that you need a decent communication hardware (at least Gigabit
|
|
||||||
ethernet) in order to have acceptable performances with PW
|
|
||||||
parallelization. Do not expect good scaling with cheap hardware: PW
|
|
||||||
calculations are by no means an \"embarrassing parallel\" problem.
|
|
||||||
|
|
||||||
Also note that multiprocessor motherboards for Intel Pentium CPUs
|
|
||||||
typically have just one memory bus for all processors. This dramatically
|
|
||||||
slows down any code doing massive access to memory (as most codes in the
|
|
||||||
Quantum ESPRESSO distribution do) that runs on processors of the same
|
|
||||||
motherboard.
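
The scaling targets discussed in this section (ideal time $`T \sim T_0/N_p`$, and the possibility of superlinear scaling thanks to cache effects) can be checked against measured wall times. The following is a minimal sketch, not part of `CP`: the function names and the 5% tolerance are assumptions chosen for illustration, and the timings must come from your own runs.

```python
def speedup_and_efficiency(t_serial, t_parallel, n_procs):
    """Return (speedup, efficiency) for a run on n_procs processors.

    speedup    = T_0 / T
    efficiency = speedup / N_p   (1.0 means ideal linear scaling)
    """
    speedup = t_serial / t_parallel
    efficiency = speedup / n_procs
    return speedup, efficiency


def classify_scaling(efficiency, tol=0.05):
    """Label the scaling regime; tol is an arbitrary illustrative tolerance."""
    if efficiency > 1.0 + tol:
        return "superlinear"
    if efficiency >= 1.0 - tol:
        return "linear"
    return "sublinear"


if __name__ == "__main__":
    # Hypothetical timings in seconds: serial run, then a run on 8 processors.
    s, e = speedup_and_efficiency(100.0, 12.5, 8)
    print(s, e, classify_scaling(e))
```

Repeating the measurement for increasing processor counts shows where efficiency starts to drop, which marks the upper end of the useful range for a given system.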
|
|
||||||
|
|
|
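
The rule of thumb in the updated text (plane-wave parallelization first, task groups once `N` exceeds `nr3`, and a square number of processors for linear algebra) can be sketched as a small helper. This is an illustration only: the helper names are hypothetical, the default of 4 task groups is an assumed starting point within the "4 to 8" range quoted in the guide, and the generated command deliberately omits input and output arguments.

```python
import math


def square_group_sizes(n_pw):
    """All perfect squares not larger than n_pw: candidate values for -nd."""
    return [k * k for k in range(1, math.isqrt(n_pw) + 1)]


def cp_command(n_mpi, nr3, nd=None, nt=None):
    """Assemble an illustrative cp.x command line following the rule of thumb."""
    args = ["mpirun", "-np", str(n_mpi), "cp.x"]
    # Task groups are suggested only once N exceeds the third FFT dimension.
    if nt is None and n_mpi > nr3:
        nt = 4  # assumed default; the guide suggests typically 4 to 8
    if nt is not None:
        args += ["-nt", str(nt)]
    if nd is not None:
        # -nd must be a square integer not larger than the number of processors.
        if nd not in square_group_sizes(n_mpi):
            raise ValueError(f"-nd {nd} is not a perfect square <= {n_mpi}")
        args += ["-nd", str(nd)]
    return " ".join(args)


if __name__ == "__main__":
    print(square_group_sizes(20))
    print(cp_command(64, nr3=96))
    print(cp_command(128, nr3=96, nd=64))
```

The candidate `-nd` values are meant to be tried in turn while watching the `ortho` timing in the final time report, as the text describes; the helper only enumerates valid choices and does not predict which one is fastest.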
@ -1,5 +1,6 @@
|
||||||
New in development version:
|
New in development version:
|
||||||
* RMM-DIIS for CPU (S. Nisihara) and GPU (E. de Paoli, P. Delugas)
|
* RMM-DIIS for CPU (S. Nisihara) and GPU (E. de Paoli, P. Delugas)
|
||||||
|
* DFT-D3: MPI parallelization and GPU acceleration with OpenACC
|
||||||
|
|
||||||
Fixed in development version:
|
Fixed in development version:
|
||||||
* Some build problems occurring under special circumstances
|
* Some build problems occurring under special circumstances
|
||||||
|
|
|
@ -11,10 +11,10 @@ Installation
|
||||||
This version requires the nvfortran (previously PGI) compiler from the
|
This version requires the nvfortran (previously PGI) compiler from the
|
||||||
freely available NVidia HPC SDK. You are advised to use a recent version
|
freely available NVidia HPC SDK. You are advised to use a recent version
|
||||||
of NVidia software. Any version later than 17.4 should work, but many glitches
|
of NVidia software. Any version later than 17.4 should work, but many glitches
|
||||||
are know to exist in older versions.
|
are known to exist in older versions.
|
||||||
The configure script checks for the presence of the nvfortran compiler and of
|
The `configure` script checks for the presence of the nvfortran compiler and
|
||||||
a few cuda libraries.For this reason the path pointing to cudatoolkit must be
|
of a few cuda libraries. For this reason the path pointing to the cuda toolkit
|
||||||
present in `LD_LIBRARY_PATH`.
|
must be present in `LD_LIBRARY_PATH`.
|
||||||
|
|
||||||
A template for the configure command is:
|
A template for the configure command is:
|
||||||
|
|
||||||
|
@ -26,7 +26,8 @@ where `XX` is the location of the CUDA Toolkit (in HPC environments is
|
||||||
generally `$CUDA_HOME`), `YY` is the version of the cuda toolkit and `ZZ`
|
generally `$CUDA_HOME`), `YY` is the version of the cuda toolkit and `ZZ`
|
||||||
is the compute capability of the card.
|
is the compute capability of the card.
|
||||||
If you have no idea what these numbers are you may give a try to the
|
If you have no idea what these numbers are you may give a try to the
|
||||||
automatic tool `get_device_props.py`. An example using Slurm is:
|
automatic tool `get_device_props.py`. Go to directory `dev-tools/` and
|
||||||
|
run `python get_device_props.py`. An example using Slurm:
|
||||||
|
|
||||||
```
|
```
|
||||||
$ module load cuda
|
$ module load cuda
|
||||||
|
@ -46,12 +47,12 @@ Compute capabilities for dev 3: 6.0
|
||||||
```
|
```
|
||||||
|
|
||||||
It is generally a good idea to disable Scalapack when running small test
|
It is generally a good idea to disable Scalapack when running small test
|
||||||
cases since the serial GPU eigensolver can outperform the parallel CPU
|
cases since the serial GPU eigensolver outperforms the parallel CPU
|
||||||
eigensolver in many circumstances.
|
eigensolver in many circumstances.
|
||||||
|
|
||||||
From time to time PGI links to the wrong CUDA libraries and fails reporting
|
From time to time PGI links to the wrong CUDA libraries and fails reporting a
|
||||||
a problem in `cusolver` missing `GOmp` (GNU Openmp). The solution to this
|
problem in `cusolver` missing `GOmp` (GNU Openmp). This problem can be solved
|
||||||
problem is removing cudatoolkit from the `LD_LIBRARY_PATH` before compiling.
|
by removing the cuda toolkit from the `LD_LIBRARY_PATH` before compiling.
|
||||||
|
|
||||||
Serial compilation is also supported.
|
Serial compilation is also supported.
|
||||||
|
|
||||||
|
|
|
@ -48,16 +48,8 @@ If you encounter problems when adding the flag `__GPU_MPI` it might
|
||||||
be that the MPI library does not support some CUDA-aware APIs.
|
be that the MPI library does not support some CUDA-aware APIs.
|
||||||
|
|
||||||
|
|
||||||
Known Issues
|
|
||||||
============
|
|
||||||
|
|
||||||
Owing to the use of the `source` option in data allocations,
|
|
||||||
PGI versions older than 17.10 may fail with arrays having initial index
|
|
||||||
different from 1.
|
|
||||||
|
|
||||||
|
|
||||||
Testing
|
Testing
|
||||||
=======
|
=======
|
||||||
|
|
||||||
Partial unit testing is available in the `tests` sub-directory. See the
|
Partial unit testing is available in the `tests` sub-directory. See the
|
||||||
README in that directory for further information.
|
README.md file in that directory for further information.
|
||||||
|
|