\documentclass[12pt,a4paper]{article}
\def\version{6.3}
\def\qe{{\sc Quantum ESPRESSO}}

\usepackage{html}

% BEWARE: don't revert from graphicx for epsfig, because latex2html
% doesn't handle epsfig commands !!!
\usepackage{graphicx}

\textwidth = 17cm
\textheight = 24cm
\topmargin =-1 cm
\oddsidemargin = 0 cm

\def\pwx{\texttt{pw.x}}
\def\cpx{\texttt{cp.x}}
\def\phx{\texttt{ph.x}}
\def\nebx{\texttt{neb.x}}
\def\configure{\texttt{configure}}
\def\PWscf{\texttt{PWscf}}
\def\PHonon{\texttt{PHonon}}
\def\CP{\texttt{CP}}
\def\PostProc{\texttt{PostProc}}
\def\make{\texttt{make}}

\begin{document}
\author{}
\date{}

\def\qeImage{../../Doc/quantum_espresso.pdf}

\title{
\includegraphics[width=5cm]{\qeImage} \\
% title
\Huge \CP\ User's Guide (v. \version)
\\ \Large (only partially updated)
}

\maketitle

\tableofcontents

\section{Introduction}

This guide covers the usage of the
\CP\ package, version \version, a core component
of the \qe\ distribution.
Further documentation, beyond what is provided
in this guide, can be found in the directory
\texttt{CPV/Doc/}, containing a copy of this guide.

{\em Important notice: due to the lack of time and of manpower, this
manual is only partially updated and may contain outdated information.}

This guide assumes that you know the physics
that \CP\ describes and the methods it implements.
It also assumes that you have already installed,
or know how to install, \qe. If not, please read
the general User's Guide for \qe, found in
directory \texttt{Doc/} two levels above the
one containing this guide; or consult the web site:\\
\texttt{http://www.quantum-espresso.org}.

People who want to modify or contribute to
\CP\ should read the Developer Manual: \\
\texttt{Doc/developer\_man.pdf}.

\CP\ can perform Car-Parrinello molecular dynamics, including
variable-cell dynamics, and free-energy surface calculation at
fixed cell through meta-dynamics, if patched with PLUMED.

The \CP\ package is based on the original code written by Roberto Car
and Michele Parrinello. \CP\ was developed by Alfredo Pasquarello
(EPF Lausanne), Kari Laasonen (Oulu), Andrea Trave, Roberto
Car (Princeton), Nicola Marzari (EPF Lausanne), Paolo Giannozzi, and others.
FPMD, later merged with \CP, was developed by Carlo Cavazzoni,
Gerardo Ballabio (CINECA), Sandro Scandolo (ICTP),
Guido Chiarotti, Paolo Focher, and others.
We quote in particular:
\begin{itemize}
\item Federico Grasselli and Riccardo Bertossa (SISSA) for bug fixes
and extensions to Autopilot;
\item Biswajit Santra, Hsin-Yu Ko, Marcus Calegari Andrade (Princeton)
for the SCAN functional;
\item Robert DiStasio (Cornell), Biswajit Santra, and Hsin-Yu Ko
for hybrid functionals with MLWF
(maximally localized Wannier functions);
\item Manu Sharma (Princeton) and Yudong Wu (Princeton) for dynamics
with MLWF;
\item Paolo Umari (Univ. Padua) for finite electric fields and conjugate
gradients;
\item Paolo Umari and Ismaila Dabo for ensemble-DFT;
\item Xiaofei Wang (Princeton) for META-GGA;
\item The Autopilot feature was implemented by Targacept, Inc.
\end{itemize}
This guide has been mostly written by Gerardo Ballabio and Carlo Cavazzoni.

\CP\ is free software, released under the
GNU General Public License. \\ See
\texttt{http://www.gnu.org/licenses/old-licenses/gpl-2.0.txt},
or the file License in the distribution.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\input ../../Doc/quote.tex
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\section{Compilation}

\CP\ is included in the core \qe\ distribution.
Instructions on how to install it can be found in the
general documentation (User's Guide) for \qe.

Typing \texttt{make cp} from the main \qe\ directory
or \make\ from the \texttt{CPV/} subdirectory produces the following codes in \texttt{CPV/src}:
\begin{itemize}
\item \cpx: Car-Parrinello Molecular Dynamics
code
\item \texttt{cppp.x}: (until v.6.3 only) postprocessing code for \cpx
\item \texttt{wfdd.x}: utility code for finding maximally
localized Wannier functions using damped dynamics.
\end{itemize}
Symlinks to executable programs will be placed in the \texttt{bin/} subdirectory.

As a final check that compilation was successful,
you may want to run some or all of the tests
and examples. Please see the general User's Guide for their setup. Automated tests for \cpx\ are in directory
\texttt{tests/} and can be run via the
script \texttt{check\_cp.x.j}.

You may take the tests and examples distributed
with \CP\ as templates for writing your own input
files. Input files for tests are contained
in the \texttt{tests/} subdirectory, with file types
\texttt{*.in1}, \texttt{*.in2}, \ldots. Input files for examples
are produced if you run the examples in the
\texttt{results/} subdirectories, with names ending
in \texttt{.in}.

For general information on parallelism and how
to run in parallel, please see the general User's Guide. \CP\ currently can take advantage
of both MPI and OpenMP parallelization. The
``plane-wave'', ``linear-algebra'' and ``task-group''
parallelization levels are implemented.
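
For instance, on a typical Linux cluster a parallel run might be launched
along these lines (a sketch only: the processor count, thread count and
file names are placeholders to be adapted to your machine; \texttt{-ntg}
selects the number of task groups and \texttt{-in} the input file, as
described in the general User's Guide):
\begin{verbatim}
export OMP_NUM_THREADS=2      # optional: OpenMP threads per MPI task
mpirun -np 16 ./cp.x -ntg 2 -in cp.in > cp.out
\end{verbatim}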

\section{Input data}

Input data for \cpx\ is organized into several namelists, followed by other
fields (``cards'') introduced by keywords. The namelists are

\begin{tabular}{ll}
\&CONTROL:& general variables controlling the run\\
\&SYSTEM: &structural information on the system under investigation\\
\&ELECTRONS: &electronic variables, electron dynamics\\
\&IONS: &ionic variables, ionic dynamics\\
\&CELL (optional): &variable-cell dynamics\\
\end{tabular}
\\
The \texttt{\&CELL} namelist may be omitted for
fixed-cell calculations; whether it is needed depends on the value of variable \texttt{calculation}
in namelist \&CONTROL. Most variables in namelists have default values. Only
the following variables in \&SYSTEM must always be specified:

\begin{tabular}{lll}
\texttt{ibrav} & (integer)& Bravais-lattice index\\
\texttt{celldm} &(real, dimension 6)& crystallographic constants\\
\texttt{nat} &(integer)& number of atoms in the unit cell\\
\texttt{ntyp} &(integer)& number of types of atoms in the unit cell\\
\texttt{ecutwfc} &(real)& kinetic energy cutoff (Ry) for wavefunctions.
\end{tabular}

Explanations of the meaning of variables \texttt{ibrav} and \texttt{celldm},
as well as of alternative ways to input structural data,
are contained in files \texttt{Doc/INPUT\_CP.*}. These files are the reference for input data and describe
a large number of other variables as well. Almost all variables have default
values, which may or may not fit your needs.

Comment lines in namelists can be introduced by a ``!'', exactly as in
Fortran code.

After the namelists, you have several fields (``cards'')
introduced by keywords with self-explanatory names:
\begin{quote}
ATOMIC\_SPECIES\\
ATOMIC\_POSITIONS\\
CELL\_PARAMETERS (optional)\\
OCCUPATIONS (optional)\\
\end{quote}
The keywords may be followed on the same line by an option. Unknown
fields are ignored.
See the files mentioned above for details on the available ``cards''.

Comment lines in ``cards'' can be introduced by either a ``!'' or a ``\#''
character in the first position of a line.

\subsection{Data files}

The output data files are written in the directory specified by variable
\texttt{outdir}, with names specified by variable \texttt{prefix} (a string that is prepended
to all file names, whose default value is \texttt{prefix='pwscf'}). The \texttt{iotk}
toolkit is used to write the files in an XML format, whose definition can
be found in the Developer Manual. In order to use the data directory
on a different machine, you need to convert the binary files to formatted
form and back, using the \texttt{bin/iotk} script.

The execution stops if you create a file \texttt{prefix.EXIT} either
in the working directory (i.e. where the program is executed), or in
the \texttt{outdir} directory. Note that with some versions of MPI,
the working directory is the directory where the executable is!
The advantage of this procedure is that all files are properly closed,
whereas just killing the process may leave data and output files in
an unusable state.
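
For example, with the benzene input shown later in this guide
(\texttt{prefix='c6h6'}, \texttt{outdir='/scratch/benzene/Out/'}), a clean
stop could be requested with something like:
\begin{verbatim}
touch c6h6.EXIT                        # in the working directory, or
touch /scratch/benzene/Out/c6h6.EXIT   # in outdir
\end{verbatim}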

\subsection{Format of arrays containing charge density, potential, etc.}

The index of arrays used to store functions defined on 3D meshes is
actually a shorthand for three indices, following the FORTRAN convention
(``leftmost index runs faster''). An example will explain this better.
Suppose you have a 3D array \texttt{psi(nr1x,nr2x,nr3x)}. FORTRAN
compilers store this array sequentially in the computer RAM in the following way:
\begin{verbatim}
        psi( 1, 1, 1)
        psi( 2, 1, 1)
        ...
        psi(nr1x, 1, 1)
        psi( 1, 2, 1)
        psi( 2, 2, 1)
        ...
        psi(nr1x, 2, 1)
        ...
        ...
        psi(nr1x,nr2x, 1)
        ...
        psi(nr1x,nr2x,nr3x)
        etc
\end{verbatim}
Let \texttt{ind} be the position of the \texttt{(i,j,k)} element in the above list:
the following relation
\begin{verbatim}
        ind = i + (j - 1) * nr1x + (k - 1) * nr2x * nr1x
\end{verbatim}
holds. This should clarify the relation between 1D and 3D indexing. In real
space, the \texttt{(i,j,k)} point of the FFT grid with dimensions
\texttt{nr1} ($\le$\texttt{nr1x}),
\texttt{nr2} ($\le$\texttt{nr2x}), \texttt{nr3} ($\le$\texttt{nr3x}), is
$$
r_{ijk}=\frac{i-1}{nr1} \tau_1 + \frac{j-1}{nr2} \tau_2 +
\frac{k-1}{nr3} \tau_3
$$
where the $\tau_i$ are the basis vectors of the Bravais lattice.
The latter are stored column-wise in the \texttt{at} array:
$\tau_1 = $ \texttt{at(:, 1)},
$\tau_2 = $ \texttt{at(:, 2)},
$\tau_3 = $ \texttt{at(:, 3)}.
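
To make the correspondence concrete, here is a small worked example with
purely illustrative dimensions (not taken from any actual run): for
\texttt{nr1x}=\texttt{nr2x}=18, the element \texttt{(3,2,5)} occupies position
$$
\mathrm{ind} = 3 + (2-1)\cdot 18 + (5-1)\cdot 18\cdot 18 = 1317
$$
in the 1D ordering.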

The distinction between the dimensions of the FFT grid,
\texttt{(nr1,nr2,nr3)} and the physical dimensions of the array,
\texttt{(nr1x,nr2x,nr3x)} is made only because it is computationally
convenient in some cases that the two sets are not the same.
In particular, it is often convenient to have \texttt{nr1x}=\texttt{nr1}+1
to reduce memory conflicts.

\section{Using \CP}

It is important to understand that a CP simulation is a sequence of different
runs, some of them used to ``prepare'' the initial state of the system, and
others performed to collect statistics, or to modify the state of the system
itself, i.e. to modify the temperature or the pressure.

To prepare and run a CP simulation you should first of all
define the system:
\begin{quote}
atomic positions\\
system cell\\
pseudopotentials\\
cut-offs\\
number of electrons and bands (optional)\\
FFT grids (optional)
\end{quote}
An example of input file (Benzene Molecule):
\begin{verbatim}
&control
   title = 'Benzene Molecule',
   calculation = 'cp',
   restart_mode = 'from_scratch',
   ndr = 51,
   ndw = 51,
   nstep = 100,
   iprint = 10,
   isave = 100,
   tstress = .TRUE.,
   tprnfor = .TRUE.,
   dt = 5.0d0,
   etot_conv_thr = 1.d-9,
   ekin_conv_thr = 1.d-4,
   prefix = 'c6h6',
   pseudo_dir='/scratch/benzene/',
   outdir='/scratch/benzene/Out/'
/
&system
   ibrav = 14,
   celldm(1) = 16.0,
   celldm(2) = 1.0,
   celldm(3) = 0.5,
   celldm(4) = 0.0,
   celldm(5) = 0.0,
   celldm(6) = 0.0,
   nat = 12,
   ntyp = 2,
   nbnd = 15,
   ecutwfc = 40.0,
   nr1b = 10, nr2b = 10, nr3b = 10,
   input_dft = 'BLYP'
/
&electrons
   emass = 400.d0,
   emass_cutoff = 2.5d0,
   electron_dynamics = 'sd'
/
&ions
   ion_dynamics = 'none'
/
&cell
   cell_dynamics = 'none',
   press = 0.0d0,
/
ATOMIC_SPECIES
 C 12.0d0 c_blyp_gia.pp
 H 1.00d0 h.ps
ATOMIC_POSITIONS (bohr)
   C    2.6  0.0  0.0
   C    1.3 -1.3  0.0
   C   -1.3 -1.3  0.0
   C   -2.6  0.0  0.0
   C   -1.3  1.3  0.0
   C    1.3  1.3  0.0
   H    4.4  0.0  0.0
   H    2.2 -2.2  0.0
   H   -2.2 -2.2  0.0
   H   -4.4  0.0  0.0
   H   -2.2  2.2  0.0
   H    2.2  2.2  0.0
\end{verbatim}
You can find the description of the input variables in file
\texttt{Doc/INPUT\_CP.*}.

\subsection{Reaching the electronic ground state}

The first run, when starting from scratch, is always an electronic
minimization, with fixed ions and cell, to bring the electronic system to the ground state (GS) relative to the starting atomic configuration. This step is conceptually very similar to
self-consistency in a \pwx\ run.

Sometimes a single run is not enough to reach the GS. In this case,
you need to re-run the electronic minimization stage. Use the input
of the first run, changing \texttt{restart\_mode = 'from\_scratch'}
to \texttt{restart\_mode = 'restart'}.
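
In practice only the \&CONTROL namelist of the example above needs to change;
a minimal sketch (keeping the restart-directory numbers of the benzene
example, so that the run reads the data written by the previous one) is:
\begin{verbatim}
&control
   ...
   restart_mode = 'restart',
   ndr = 51,   ! read the restart data written by the previous run (its ndw)
   ndw = 51,
   ...
/
\end{verbatim}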

NOTA BENE: Unless you are already experienced with the system
you are studying or with the internals of the code, you will usually need
to tune some input parameters, like \texttt{emass}, \texttt{dt}, and cut-offs. For this
purpose, a few trial runs could be useful: you can perform short
minimizations (say, 10 steps), changing and adjusting these parameters
to fit your needs. You can specify the degree of convergence with these
two thresholds:
\begin{quote}
\texttt{etot\_conv\_thr}: total energy difference between two consecutive steps\\
\texttt{ekin\_conv\_thr}: value of the fictitious kinetic energy of the electrons.
\end{quote}

Usually we consider the system to be in the GS when
\texttt{ekin\_conv\_thr} $ < 10^{-5}$.
You can check the value of the fictitious kinetic energy in the standard
output (column EKINC).

Different strategies are available to minimize electrons, but the most used
ones are:
\begin{itemize}
\item steepest descent: \texttt{electron\_dynamics = 'sd'};
\item damped dynamics: \texttt{electron\_dynamics = 'damp'},
\texttt{electron\_damping} = a number typically ranging from 0.1 to 0.5.
\end{itemize}
See the input description for how to compute the optimal damping factor.
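
For instance, a damped-dynamics electronic minimization for the benzene
example above could use an \&ELECTRONS namelist along these lines (the
damping value is just an illustrative choice within the range quoted above):
\begin{verbatim}
&electrons
   emass = 400.d0,
   emass_cutoff = 2.5d0,
   electron_dynamics = 'damp',
   electron_damping = 0.3
/
\end{verbatim}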

\subsection{Relax the system}

Once your system is in the GS, proceed depending on how you have prepared the starting
atomic configuration:
\begin{enumerate}
\item
if you have set the atomic positions ``by hand'' and/or from a classical code,
check the forces on atoms, and if they are large ($\sim 0.1 \div 1.0$
atomic units), you should perform an ionic minimization, otherwise the
system could break up during the dynamics;
\item
if you have taken the positions from a previous run or a previous ab-initio
simulation, check the forces, and if they are too small ($\sim 10^{-4}$
atomic units), this means that the atoms are already in equilibrium positions
and, even if left free, they will not move. Then you need to randomize
the positions a little bit (see below).
\end{enumerate}

Let us consider case 1). There are
different strategies to relax the system, but the most used
are again steepest descent or damped dynamics for ions and electrons.
You can also freely mix electronic and ionic minimization schemes,
i.e. steepest descent for ions with damped dynamics for electrons, or vice versa.
\begin{itemize}
\item[(a)] Suppose we want to perform steepest descent for the ions. Then we should specify
the following section for the ions:
\begin{verbatim}
&ions
   ion_dynamics = 'sd'
/
\end{verbatim}
Change also the ionic masses to accelerate the minimization:
\begin{verbatim}
ATOMIC_SPECIES
 C 2.0d0 c_blyp_gia.pp
 H 2.00d0 h.ps
\end{verbatim}
while leaving the other input parameters unchanged.
{\em Note} that if the forces are really high ($> 1.0$ atomic units), you
should always use steepest descent for the first ($\sim 100$)
relaxation steps.
\item[(b)] As the system approaches the equilibrium positions, the steepest
descent scheme slows down, so it is better to switch to damped dynamics:
\begin{verbatim}
&ions
   ion_dynamics = 'damp',
   ion_damping = 0.2,
   ion_velocities = 'zero'
/
\end{verbatim}
A value of \texttt{ion\_damping} around 0.05 is good for many systems.
It is also better to specify that the run restarts with zero ionic and electronic
velocities, since we have changed the masses.

Change the ionic masses further to accelerate the minimization:
\begin{verbatim}
ATOMIC_SPECIES
 C 0.1d0 c_blyp_gia.pp
 H 0.1d0 h.ps
\end{verbatim}
\item[(c)] When the system is really close to the equilibrium, the damped dynamics
slows down too, especially because, since we are moving electrons and ions
together, the ionic forces are not quite correct. It is then often better
to perform an ionic step every N electronic steps, or to move the ions only when
the electrons are in their GS (within the chosen threshold).

This can be specified by adding, in the ionic section, the
\texttt{ion\_nstepe}
parameter; the \&IONS namelist then becomes:
\begin{verbatim}
&ions
   ion_dynamics = 'damp',
   ion_damping = 0.2,
   ion_velocities = 'zero',
   ion_nstepe = 10
/
\end{verbatim}
Then we specify in the \&CONTROL namelist:
\begin{verbatim}
   etot_conv_thr = 1.d-6,
   ekin_conv_thr = 1.d-5,
   forc_conv_thr = 1.d-3
\end{verbatim}
As a result, the code checks every 10 electronic steps whether
the electronic system satisfies the two thresholds
\texttt{etot\_conv\_thr}, \texttt{ekin\_conv\_thr}: if it does,
the ions are advanced by one step.
The process thus continues until the forces become smaller than
\texttt{forc\_conv\_thr}.

{\em Note} that to fully relax the system you need many runs, and different
strategies, that you should mix and change in order to speed up the convergence.
The process is not automatic, but is strongly based on experience, and trial
and error.

Remember also that the convergence to the equilibrium positions depends on
the energy threshold for the electronic GS: in fact, correct forces (required
to move the ions toward the minimum) are obtained only when the electrons are in their
GS. Hence a small threshold on forces may never be satisfied if you do not
also require an even smaller threshold on the total energy.
\end{itemize}

Let us now move to case 2: randomization of positions.

If you have relaxed the system, or if the starting system is already in
the equilibrium positions, then you need to displace the ions from the equilibrium
positions, otherwise they will not move in a dynamics simulation.
After the randomization you should bring the electrons to the GS again,
in order to start the dynamics with the correct forces and with the electrons
in the GS. You should therefore switch off the ionic dynamics and activate
the randomization for each species, specifying the amplitude of the
randomization itself. This can be done with the following
\&IONS namelist:
\begin{verbatim}
&ions
   ion_dynamics = 'none',
   tranp(1) = .TRUE.,
   tranp(2) = .TRUE.,
   amprp(1) = 0.01
   amprp(2) = 0.01
/
\end{verbatim}
In this way a random displacement (of max 0.01 a.u.) is added to the atoms of
species 1 and 2. All other input parameters can remain the same.
Note that the difference in the total energy (etot) between the relaxed and the
randomized positions can be used to estimate the temperature that will
be reached by the system. In fact, starting with zero ionic velocities,
all the difference is potential energy, but in a dynamics simulation the
energy will be equipartitioned between kinetic and potential; to
estimate the temperature, take the difference in energy (de), convert it
into Kelvin, divide by the number of atoms and multiply by 2/3.
Randomization can be useful also while we are relaxing the system,
especially when we suspect that the ions are in a local minimum or on
an energy plateau.
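
In formula form, the recipe above amounts to the rough estimate
$$
T \simeq \frac{2}{3}\,\frac{\Delta E}{N_{at}\,k_B},
$$
where $\Delta E$ is the energy difference between the randomized and the
relaxed configurations, $N_{at}$ is the number of atoms, and dividing by
$k_B$ performs the conversion to Kelvin.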

\subsection{CP dynamics}

At this point, after having minimized the electrons, and with the ions displaced from their equilibrium positions, we are ready to start a CP
dynamics. We need to specify \texttt{'verlet'} both in the ionic and in the electronic
dynamics. The thresholds in the \&CONTROL input section will be ignored, like
any parameter related to the minimization strategy. The first time we perform
a CP run after a minimization, it is always better to set the velocities equal
to zero, unless we have velocities, from a previous simulation, to
specify in the input file. Restore the proper masses for the ions. In this
way we will sample the microcanonical ensemble. The input sections
change as follows:
\begin{verbatim}
&electrons
   emass = 400.d0,
   emass_cutoff = 2.5d0,
   electron_dynamics = 'verlet',
   electron_velocities = 'zero'
/
&ions
   ion_dynamics = 'verlet',
   ion_velocities = 'zero'
/
ATOMIC_SPECIES
 C 12.0d0 c_blyp_gia.pp
 H 1.00d0 h.ps
\end{verbatim}

If you want to specify the initial velocities for the ions, you have to set
\texttt{ion\_velocities = 'from\_input'}, and add the IONIC\_VELOCITIES
card, after the ATOMIC\_POSITIONS card, with the list of velocities in
atomic units.

NOTA BENE: when restarting the dynamics after the first CP run,
remember to remove or comment out the velocity parameters:
\begin{verbatim}
&electrons
   emass = 400.d0,
   emass_cutoff = 2.5d0,
   electron_dynamics = 'verlet'
   ! electron_velocities = 'zero'
/
&ions
   ion_dynamics = 'verlet'
   ! ion_velocities = 'zero'
/
\end{verbatim}
otherwise you will quench the system, interrupting the sampling of the
microcanonical ensemble.

\paragraph{Varying the temperature}

It is possible to change the temperature of the system, or to sample the
canonical ensemble at a fixed average temperature; this is done using
the Nos\'e thermostat. To activate this thermostat for the ions you have
to specify in namelist \&IONS:
\begin{verbatim}
&ions
   ion_dynamics = 'verlet',
   ion_temperature = 'nose',
   fnosep = 60.0,
   tempw = 300.0
/
\end{verbatim}
where \texttt{fnosep} is the frequency of the thermostat in THz, which should be
chosen to be comparable with the center of the vibrational spectrum of
the system, in order to excite as many vibrational modes as possible.
\texttt{tempw} is the desired average temperature in Kelvin.

{\em Note:} to avoid a strong coupling between the Nos\'e thermostat
and the system, proceed step by step. Don't switch on the thermostat
from a completely relaxed configuration: adding a random displacement
is strongly recommended. Check what the average temperature is via a
few steps of a microcanonical simulation. Don't increase the temperature
too much. Finally switch on the thermostat. In the case of molecular systems,
different modes have to be thermalized: it is better to use a chain of
thermostats or, equivalently, to run different simulations with different
frequencies.
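
If the input reference (\texttt{Doc/INPUT\_CP.*}) of your version provides the
chain-length variable \texttt{nhpcl}, a chain of thermostats could be requested
along the following lines (all values are purely illustrative):
\begin{verbatim}
&ions
   ion_dynamics = 'verlet',
   ion_temperature = 'nose',
   fnosep = 60.0,
   tempw = 300.0,
   nhpcl = 3      ! length of the Nose'-Hoover chain (illustrative)
/
\end{verbatim}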

\paragraph{Nos\'e thermostat for electrons}

It is possible to specify a thermostat for the electrons as well. This is
usually activated in metals or in systems where there is a transfer of
energy between ionic and electronic degrees of freedom. Beware: the
usage of electronic thermostats is quite delicate. The following information
comes from K. Kudin:

``The main issue is that there is usually some `natural' fictitious kinetic
energy that electrons gain from the ionic motion (`drag'). One could easily
quantify how much of the fictitious energy comes from this drag by doing a CP
run, then a couple of CG (same as BO) steps, and then going back to CP.
The fictitious electronic energy at the last CP restart will be purely
due to the drag effect.''

``The thermostat on electrons will either try to overexcite the otherwise
`cold' electrons, or it will try to take them down to an unnaturally cold
state where their fictitious kinetic energy is even below what would be
due to pure drag. Neither of these is good.''

``I think the only workable regime with an electronic thermostat is a
mild overexcitation of the electrons; however, to do this one will need
to know rather precisely what the fictitious kinetic energy due to the
drag is.''

\subsection{Advanced usage}

\subsubsection{Self-interaction Correction}

The self-interaction correction (SIC) included in the \CP\
package is based
on the Constrained Local-Spin-Density approach proposed by F. Mauri and
coworkers (M. D'Avezac et al. PRB 71, 205210 (2005)). It was used for
the first time in \qe\ by F. Baletto, C. Cavazzoni
and S. Scandolo (PRL 95, 176801 (2005)).

This approach is a simple and nice way to treat ONE, and only one,
excess charge. It is moreover necessary to check a priori that
the spin-up and spin-down eigenvalues are not too different, for the
corresponding neutral system, working in the Local-Spin-Density
Approximation (setting \texttt{nspin = 2}). If these two conditions are satisfied
and you are interested in charged systems, you can apply the SIC.
This approach is an on-the-fly method to correct the self-interaction
of the excess charge with itself.

Briefly, both the Hartree and the XC part have been
corrected to avoid the interaction of the excess charge with itself.

For example, for the Boron atom, where we have an odd number of
electrons (3 valence electrons), the parameters for working with
the SIC are:
\begin{verbatim}
&system
   nbnd = 2,
   tot_magnetization = 1,
   sic_alpha = 1.d0,
   sic_epsilon = 1.0d0,
   sic = 'sic_mac',
   force_pairing = .true.,
\end{verbatim}
The two main parameters are:
\begin{quote}
\texttt{force\_pairing = .true.}, which forces the paired electrons to be the same;\\
\texttt{sic='sic\_mac'}, which instructs the code to use Mauri's correction.
\end{quote}

{\bf Warning}:
This approach has known problems for dissociation mechanisms
driven by excess electrons.

Comment 1:
Two parameters, \texttt{sic\_alpha} and \texttt{sic\_epsilon}, have been introduced
following the suggestion of M. Sprik (ICR(05)) to treat the radical
(OH)-H$_2$O. In any case, a complete ab-initio approach is followed
using \texttt{sic\_alpha=1}, \texttt{sic\_epsilon=1}.

Comment 2:
When you apply this SIC scheme to a molecule or to an atom, which are neutral,
remember to add the correction to the energy level as proposed by Landau:
in a neutral system, subtracting the self-interaction, the unpaired electron
feels a charged system, even if a compensating positive background is used.
For a cubic box, the correction term due to the Madelung energy is approximately
given by $1.4186/L_{box} - 1.047/(L_{box})^3$, where $L_{box}$ is the
linear dimension of your box (=celldm(1)). The Madelung coefficient is
taken from I. Dabo et al. PRB 77, 115139 (2007).
(info by F. Baletto, francesca.baletto@kcl.ac.uk)
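
As a purely illustrative numerical example (the box size is not taken from any
of the inputs above): for $L_{box}=16$ a.u. the correction evaluates to
$$
1.4186/16 - 1.047/16^3 \simeq 0.0887 - 0.0003 \simeq 0.088,
$$
in the units implied by the formula above.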

% \subsubsection{ Variable-cell MD }

%The variable-cell MD is when the Car-Parrinello technique is also applied
%to the cell. This technique is useful to study systems at very high pressure.

\subsubsection{ensemble-DFT}

Ensemble-DFT (eDFT) is a robust method to simulate metals in the
framework of ``ab-initio'' molecular dynamics. It was introduced in 1997
by Marzari et al.

The specific subroutines for eDFT are in
\texttt{CPV/src/ensemble\_dft.f90}, where you
define all the quantities of interest. The subroutine
\texttt{CPV/src/inner\_loop\_cold.f90},
called by \texttt{cg\_sub.f90}, controls the inner loop, i.e. the minimization of
the free energy $A$ with respect to the occupation matrix.

To select an eDFT calculation, the user has to set:
\begin{verbatim}
calculation = 'cp'
occupations = 'ensemble'
tcg = .true.
passop = 0.3
maxiter = 250
\end{verbatim}
to use the CG procedure. In eDFT, the CG procedure is also the outer loop, in which the
energy is minimized with respect to the wavefunctions while keeping the
occupation matrix fixed.
Since eDFT was born to treat metals, keep in mind that we want to describe
the broadening of the occupations around the Fermi energy.
The specific parameters for the inner loop, new in the \&ELECTRONS namelist, are listed below.
\begin{itemize}
\item \texttt{smearing}: used to select the occupation distribution;
there are two options: Fermi-Dirac (\texttt{smearing='fd'}) and cold smearing
(\texttt{smearing='cs'}, recommended).
\item \texttt{degauss}: the electronic temperature; it controls the broadening
of the occupation numbers around the Fermi energy.
\item \texttt{ninner}: the number of iterative cycles in the inner loop,
done to minimize the free energy $A$ with respect to the occupation numbers.
The typical range is 2-8.
\item \texttt{conv\_thr}: the threshold value to stop the search of the `minimum'
free energy.
\item \texttt{niter\_cold\_restart}: controls the frequency at which a full iterative
inner cycle is done. It is in the range $1\div$\texttt{ninner}. It is a trick to speed up
the calculation.
\item \texttt{lambda\_cold}: the step length along the search line for the best
value of $A$, when the iterative cycle is not performed. The value is close
to 0.03, smaller for large and complicated metallic systems.
\end{itemize}
{\em NOTE:} \texttt{degauss} is in Hartree, while in \PWscf\ it is in Ry (!!!).
The typical range is 0.01-0.02 Ha.

The input for an Al surface is:
\begin{verbatim}
&CONTROL
   calculation = 'cp',
   restart_mode = 'from_scratch',
   nstep = 10,
   iprint = 5,
   isave = 5,
   dt = 125.0d0,
   prefix = 'Aluminum_surface',
   pseudo_dir = '~/UPF/',
   outdir = '/scratch/',
   ndr = 50,
   ndw = 51
/
&SYSTEM
   ibrav = 14,
   celldm(1) = 21.694d0, celldm(2) = 1.00D0, celldm(3) = 2.121D0,
   celldm(4) = 0.0d0, celldm(5) = 0.0d0, celldm(6) = 0.0d0,
   nat = 96,
   ntyp = 1,
   nspin = 1,
   ecutwfc = 15,
   nbnd = 160,
   input_dft = 'pbe',
   occupations = 'ensemble',
   smearing = 'cs',
   degauss = 0.018,
/
&ELECTRONS
   orthogonalization = 'Gram-Schmidt',
   startingwfc = 'random',
   ampre = 0.02,
   tcg = .true.,
   passop = 0.3,
   maxiter = 250,
   emass_cutoff = 3.00,
   conv_thr = 1.d-6,
   n_inner = 2,
   lambda_cold = 0.03,
   niter_cold_restart = 2,
/
&IONS
   ion_dynamics = 'verlet',
   ion_temperature = 'nose',
   fnosep = 4.0d0,
   tempw = 500.d0
/
ATOMIC_SPECIES
 Al 26.89 Al.pbe.UPF
\end{verbatim}
{\em NOTA1}: remember that the time step is used to integrate the ionic dynamics,
so you can choose something in the range of 1-5 fs. \\
{\em NOTA2}: with eDFT you are simulating metals or systems for which the
occupation numbers are fractional, so the number of bands, \texttt{nbnd}, has to
be chosen so as to have some empty states. As a rule of thumb, start
with an initial occupation number of about 1.6-1.8 (the more bands you
consider, the more accurate the calculation is, but it also takes longer.
The CPU time scales almost linearly with the number of bands.) \\
{\em NOTA3}: the parameter \texttt{emass\_cutoff} is used in the preconditioning
and it has a completely different meaning with respect to plain CP.
It ranges between 4 and 7.

All the other parameters have the same meaning as in the usual \CP\ input,
and they are discussed above.

\subsubsection{Free-energy surface calculations}
Once \texttt{CP} is patched with the \texttt{PLUMED} plug-in, it becomes possible to turn on most of the PLUMED functionalities
by running \texttt{CP} as: \texttt{./cp.x -plumed} plus the other usual \texttt{CP} arguments. The PLUMED input file has to be located in the specified \texttt{outdir} with
the fixed name \texttt{plumed.dat}.
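
For example, such a run could be launched along these lines (a sketch only:
the processor count and the input file name are placeholders, and the contents
of \texttt{plumed.dat} follow the PLUMED documentation):
\begin{verbatim}
mpirun -np 8 ./cp.x -plumed -in cp.in > cp.out
# with <outdir>/plumed.dat containing the PLUMED setup
\end{verbatim}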

\subsubsection{Treatment of USPPs}

The cutoff \texttt{ecutrho} defines the resolution on the real-space FFT mesh (as expressed
by \texttt{nr1}, \texttt{nr2} and \texttt{nr3}, which the code, if left on its own, sets automatically).
In the USPP case we refer to this mesh as the ``hard'' mesh, since it
is denser than the smooth mesh that is needed to represent the square
of the non-norm-conserving wavefunctions.

On this ``hard'', fine-spaced mesh, you need to determine the size of the
cube that will encompass the largest of the augmentation charges - this
is what \texttt{nr1b}, \texttt{nr2b}, \texttt{nr3b} are. They are independent
of the system size, but dependent on the size
of the augmentation charge (an atomic property that doesn't vary
that much for different systems) and on the
real-space resolution needed by augmentation charges (rule of thumb:
\texttt{ecutrho} is between 6 and 12 times \texttt{ecutwfc}).

The small boxes should be set as small as possible, but large enough
to contain the core of the largest element in your system.
The formula for estimating the box size is quite simple:
\begin{quote}
\texttt{nr1b} = $2 R_c / L_x \times$ \texttt{nr1}
\end{quote}
and the like, where $R_c$ is the largest cut-off radius among the various atom
types present in the system, and $L_x$ is the
physical length of your box along the $x$ axis. You have to round your
result up to the nearest larger integer.
In practice, \texttt{nr1b} etc. are often in the region of 20-24-28; testing seems
again a necessity.
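
As a purely illustrative worked example (none of these numbers come from the
inputs above): for $R_c = 1.3$ a.u., $L_x = 16$ a.u. and \texttt{nr1} $= 144$,
$$
\mathtt{nr1b} \simeq \frac{2\times 1.3}{16}\times 144 \approx 23.4
\;\longrightarrow\; 24
$$
after rounding up, which indeed falls in the 20-28 region quoted above.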

The core charge is in principle finite only in the core region (as defined
by some $R_c$) and vanishes outside the core. Numerically the charge is
represented in a Fourier series, which may give rise to small charge
oscillations outside the core and even to negative charge density, but
only if the cut-off is too low. Having these small boxes removes the
charge-oscillation problem (at least outside the box) and also offers
some numerical advantages in going to higher cut-offs. (info by Nicola Marzari)

\subsubsection{Hybrid functional calculations using maximally localized Wannier functions}
In this section, we illustrate some guidelines to perform exact exchange (EXX) calculations using Wannier functions efficiently.

The references for this algorithm are:
\begin{itemize}
\item[(i)] Theory: X. Wu, A. Selloni, and R. Car, Phys. Rev. B 79, 085102 (2009).
\item[(ii)] Implementation: H.-Y. Ko, B. Santra, R. A. DiStasio, L. Kong, Z. Li, X. Wu, and R. Car, arXiv.
\end{itemize}

The parallelization scheme in this algorithm is based upon the number of electronic states.
In the current implementation, there are certain restrictions on the choice of the number of MPI tasks.
Also, slightly different algorithms are employed depending on whether the number of MPI tasks used in the calculation is greater or less than the number of electronic states.
We highly recommend that users follow the notes below (a launch-command sketch is given after the list).
This algorithm can be used most efficiently if the electronic states are uniformly distributed over the MPI tasks.
For a system having N electronic states, the optimal numbers of MPI tasks (nproc) are the following:

\begin{itemize}
\item[(a)] In case of nproc $\leq$ N, the optimal choices are N/m, where m is any positive integer.
\begin{itemize}
\item Robustness: can be used for odd and even numbers of electronic states.
\item OpenMP threads: can be used.
\item Taskgroup: only the default value of the task group (-ntg 1) is allowed.
\end{itemize}
\item[(b)] In case of nproc $>$ N, the optimal choices are N*m, where m is any positive integer.
\begin{itemize}
\item Robustness: can be used for an even number of electronic states.
\item Largest value of m: as long as nj\_max (see output) is greater than 1;
however, beyond m=8 the scaling may become poor. The scaling should be tested by users.
\item OpenMP threads: can be used and are highly recommended. We have tested numbers of threads
from 2 up to 64. More threads are also allowed.
For very large calculations (nproc $>$ 1000) efficiency can largely depend on the computer
architecture and the balance between the MPI tasks and the OpenMP threads.
Users should test for an optimal balance. Reasonably good scaling can be achieved by using
m=6-8 and OpenMP threads=2-16.
\item Taskgroup: can be greater than 1, and users should choose the largest possible value
for ntg. To estimate ntg, find the value of nr3x in the output, compute nproc/nr3x and
take the integer part. We have tested values of ntg equal to $2^m$, where m is any positive integer.
Other values of ntg should be used with caution.
\item Ndiag: use the -ndiag X option in the execution of cp.x. Without this option, jobs
may crash on certain architectures. Set X to any perfect square number which is equal to or less than N.
\end{itemize}
\item DEBUG: The EXX calculations always work when the number of MPI tasks = number of electronic states.
In case of any uncertainty, the EXX energy computed using different numbers of MPI tasks can be
checked by performing test calculations using number of MPI tasks = number of electronic states.
\end{itemize}
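
As a concrete sketch (the numbers are placeholders, not recommendations): for the
(H$_2$O)$_{32}$ example below there are N = 128 occupied states, so nproc = 256
corresponds to case (b) with m = 2; the values of \texttt{-ntg} and \texttt{-ndiag}
(here 121 $= 11^2 \leq$ N) must be adjusted after inspecting nr3x and nj\_max in
the output, as explained above:
\begin{verbatim}
export OMP_NUM_THREADS=4
mpirun -np 256 ./cp.x -ntg 2 -ndiag 121 -in water.in > water.out
\end{verbatim}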

An example input is listed in the following:
\begin{verbatim}
&CONTROL
   calculation       = 'cp-wf',
   title             = "(H2O)32 Molecule: electron minimization PBE0",
   restart_mode      = "from_scratch",
   pseudo_dir        = './',
   outdir            = './',
   prefix            = "water",
   nstep             = 220,
   iprint            = 100,
   isave             = 100,
   dt                = 4.D0,
   ekin_conv_thr     = 1.D-5,
   etot_conv_thr     = 1.D-5,
/
&SYSTEM
   ibrav             = 1,
   celldm(1)         = 18.6655,
   nat               = 96,
   ntyp              = 2,
   ecutwfc           = 85.D0,
   input_dft         = 'pbe0',
/
&ELECTRONS
   emass             = 400.D0,
   emass_cutoff      = 3.D0,
   ortho_eps         = 1.D-8,
   ortho_max         = 300,
   electron_dynamics = "damp",
   electron_damping  = 0.1D0,
/
&IONS
   ion_dynamics      = "none",
/
&WANNIER
   nit               = 60,
   calwf             = 3,
   tolw              = 1.D-6,
   nsteps            = 20,
   adapt             = .FALSE.,
   wfdt              = 4.D0,
   wf_q              = 500,
   wf_friction       = 0.3D0,
   exx_neigh         = 60,       ! exx related optional
   exx_dis_cutoff    = 8.0D0,    ! exx related optional
   exx_ps_rcut_self  = 6.0D0,    ! exx related optional
   exx_ps_rcut_pair  = 5.0D0,    ! exx related optional
   exx_me_rcut_self  = 9.3D0,    ! exx related optional
   exx_me_rcut_pair  = 7.0D0,    ! exx related optional
   exx_poisson_eps   = 1.D-6,    ! exx related optional
/
ATOMIC_SPECIES
O 16.0D0 O_HSCV_PBE-1.0.UPF
H 2.0D0 H_HSCV_PBE-1.0.UPF
\end{verbatim}

\section{Performances}

% \subsection{Execution time}

% \subsection{Memory requirements}

% \subsection{File space requirements}

% \subsection{Parallelization issues}
% \label{SubSec:badpara}

\cpx\ can run in principle on any number of processors.
The effectiveness of parallelization is ultimately judged by the
``scaling'', i.e. how the time needed to perform a job scales
with the number of processors, and depends upon:
\begin{itemize}
\item the size and type of the system under study;
\item the judicious choice of the various levels of parallelization
(detailed in Sec.\ref{SubSec:para});
\item the availability of fast interprocess communications (or lack of it).
\end{itemize}
Ideally one would like to have linear scaling, i.e. $T \sim T_0/N_p$ for
$N_p$ processors, where $T_0$ is the estimated time for serial execution.
In addition, one would like to have linear scaling of
the RAM per processor: $O_N \sim O_0/N_p$, so that large-memory systems
fit into the RAM of each processor.

As a general rule, image parallelization:
\begin{itemize}
\item may give good scaling, but the slowest image will determine
the overall performance (``load balancing'' may be a problem);
\item requires very little communication (suitable for ethernet
communications);
\item does not reduce the required memory per processor (unsuitable for
large-memory jobs).
\end{itemize}
Parallelization on k-points:
\begin{itemize}
\item guarantees (almost) linear scaling if the number of k-points
is a multiple of the number of pools;
\item requires little communication (suitable for ethernet communications);
\item does not reduce the required memory per processor (unsuitable for
large-memory jobs).
\end{itemize}
Parallelization on PWs:
\begin{itemize}
\item yields good to very good scaling, especially if the number of processors
in a pool is a divisor of $N_3$ and $N_{r3}$ (the dimensions along the z-axis
of the FFT grids, \texttt{nr3} and \texttt{nr3s}, which coincide for NCPPs);
\item requires heavy communications (suitable for Gigabit ethernet up to
4, 8 CPUs at most; specialized communication hardware is needed for 8 or more
processors);
\item yields almost linear reduction of memory per processor with the number
of processors in the pool.
\end{itemize}

A note on scaling: optimal serial performance is achieved when the data are
kept as much as possible in the cache. As a side effect, PW
parallelization may yield superlinear (better than linear) scaling,
thanks to the increase in serial speed coming from the reduction of data size
(making it easier for the machine to keep data in the cache).

VERY IMPORTANT: For each system there is an optimal range of number of processors on which to
run the job. A too large number of processors will yield performance
degradation. The size of the pools is especially delicate: $N_p$ should not
exceed $N_3$ and $N_{r3}$, and should ideally be no larger than
$1/2\div1/4$ of $N_3$ and/or $N_{r3}$. In order to increase scalability,
it is often convenient to
further subdivide a pool of processors into ``task groups''.
When the number of processors exceeds the number of FFT planes,
data can be redistributed to ``task groups'' so that each group
can process several wavefunctions at the same time.

The optimal number of processors for ``linear-algebra''
parallelization, taking care of multiplication and diagonalization
of $M\times M$ matrices, should be determined by observing the
performance of \texttt{cdiagh/rdiagh} (\pwx) or \texttt{ortho} (\cpx)
for different numbers of processors in the linear-algebra group
(must be a square integer).

Actual parallel performance will also depend on the available software
(MPI libraries) and on the available communication hardware. For
PC clusters, OpenMPI (\texttt{http://www.openmpi.org/}) seems to yield better
performance than other implementations (info by Konstantin Kudin).
Note however that you need decent communication hardware (at least
Gigabit ethernet) in order to have acceptable performance with
PW parallelization. Do not expect good scaling with cheap hardware:
PW calculations are by no means an ``embarrassingly parallel'' problem.

Also note that multiprocessor motherboards for Intel Pentium CPUs typically
have just one memory bus for all processors. This dramatically
slows down any code doing massive access to memory (as most codes
in the \qe\ distribution do) that runs on processors of the same
motherboard.

\end{document}