
\documentclass[12pt,a4paper]{article}
\def\version{CVS}
\usepackage{epsfig}
\usepackage{html}
%\def\htmladdnormallink#1#2{#1}
\begin{document}
\author{}
\date{}
\title{
% PWscf and Democritos logos, raise the latter to align
\epsfig{figure=pwscf,width=4cm}\hfill%
\raisebox{0.5cm}{\epsfig{figure=democritos,width=8cm}}
\vspace{1.5cm}
\\
% title
\huge User's Guide for Quantum-ESPRESSO v.\version
}
\maketitle
\tableofcontents
\clearpage
\section{Introduction}
This guide covers the installation and usage of Quantum ($\nu$)
ESPRESSO (opEn-Source Package for Research in Electronic Structure,
Simulation, and Optimization), version \version. The $\nu-$ESPRESSO
package contains the
following codes for the calculation of electronic-structure properties
within Density-Functional Theory, using a Plane-Wave basis set and
pseudopotentials:
\begin{itemize}
\item PWscf (Plane-Wave Self-Consistent Field).
\item FPMD (First Principles Molecular Dynamics).
\item CP (Car-Parrinello).
\end{itemize}
Moreover, it contains the following auxiliary codes:
\begin{itemize}
\item PWgui (Graphical User Interface for PWscf): a graphical
interface for producing input data files for PWscf.
\item atomic: a program for atomic calculations and generation of
pseudopotentials.
\end{itemize}
Documentation, in addition to what is provided in this guide, can be
found in:
\begin{itemize}
\item the \texttt{Doc/} directory of the $\nu-$ESPRESSO distribution.
In particular, the \texttt{INPUT\_*} files contain a detailed
listing of available input variables and cards.
\item the various \texttt{README} files found in the distribution
\item the PWscf web site
(\htmladdnormallink{\texttt{http://www.pwscf.org/}}%
{http://www.pwscf.org/})
\item the Pw\_forum mailing list
(\htmladdnormallink{\texttt{pw\_forum@pwscf.org}}%
{mailto:pw_forum@pwscf.org})
See the PWscf web site for instructions on how to subscribe
and how to browse and search the archives of the mailing list.
Please search the archives before posting to the list: your
question might already have been answered.
\item the ``Scientific Software'' page of the Democritos web site
\hfill\break
(\htmladdnormallink%
{\texttt{http://www.democritos.it/scientific.php}}%
{http://www.democritos.it/scientific.php})
\end{itemize}
The $\nu-$ESPRESSO codes work on many different types of Unix machines,
including parallel machines using Message Passing Interface (MPI).
Running $\nu-$ESPRESSO on MS-Windows is possible, but not supported:
see section \ref{installation}, ``Installation''.
\subsection{Codes}
PWscf can currently perform the following kinds of calculations:
\begin{itemize}
\item ground-state energy and one-electron (Kohn-Sham) orbitals
\item atomic forces, stresses, and structural optimization
\item molecular dynamics on the ground-state Born-Oppenheimer
surface, also with variable-cell
\item Nudged Elastic Band (NEB) and Fourier String Method Dynamics (SMD)
for energy barriers and reaction paths
\item phonon frequencies and eigenvectors at a generic wave vector,
using Density-Functional Perturbation Theory
\item effective charges and dielectric tensors
\item electron-phonon interaction coefficients for metals
\item interatomic force constants in real space
\item third-order anharmonic phonon lifetimes
\item Infrared and Raman (nonresonant) cross section
\item macroscopic polarization via Berry Phase
\end{itemize}
All of the above work for both insulators and metals, in any crystal
structure, for many exchange-correlation functionals (including spin
polarization), for both norm-conserving (Hamann-Schl\"uter-Chiang)
pseudopotentials in separable form, and --- with very few exceptions
--- for Ultrasoft (Vanderbilt) pseudopotentials. Non-collinear
magnetism and spin-orbit interactions are also implemented, although
at an experimental stage. Various postprocessing programs are
available.
FPMD and CP can currently perform the following kinds of calculations:
\begin{itemize}
\item Car-Parrinello molecular dynamics simulation
\item geometry optimization by damped dynamics
\item constant-temperature simulation with Nos\`e thermostats
\item variable-cell (Parrinello-Rahman) dynamics
\item Nudged Elastic Band (NEB) for energy barriers and reaction
paths
\item String Method Dynamics (in real space) (CP only)
\item dynamics with Wannier functions (CP only)
\end{itemize}
Spin-polarized calculations and (for FPMD only) calculations with
multiple k-points can be performed.
CP works with both norm-conserving and Ultrasoft pseudopotentials,
while FPMD is currently limited to norm-conserving.
The restart files of the two programs are compatible: you can run FPMD
with a restart file from CP, and vice versa.
\subsection{People}
\hyphenation{gian-noz-zi}
The maintenance and further development of the $\nu-$ESPRESSO code is
promoted by the DEMOCRITOS National Simulation Center of the Italian
INFM
(\htmladdnormallink{\texttt{http://www.democritos.it/}}%
{http://www.democritos.it/})
under the coordination of Paolo Giannozzi
(\htmladdnormallink{\texttt{giannozz@nest.sns.it}}%
{mailto:giannozz@nest.sns.it})
(Scuola Normale Superiore, Pisa), with the strong support of the
CINECA National Supercomputing Center in Bologna
(\htmladdnormallink{\texttt{http://www.cineca.it/}}%
{http://www.cineca.it/}),
under the responsibility of Carlo Cavazzoni\break
(\htmladdnormallink{\texttt{c.cavazzoni@cineca.it}}%
{mailto:c.cavazzoni@cineca.it}).
Currently active developers include
Gerardo Ballabio (CINECA),
Stefano Fabris, Adriano Mosca Conte, Carlo Sbraccia
(SISSA, Trieste),
Anton Kokalj (Jo\v{z}ef Stefan Institute, Ljubljana).
The PWscf package was originally developed by Stefano Baroni, Stefano
de Gironcoli, Andrea Dal Corso (SISSA), Paolo Giannozzi, and others.
The web site for PWscf and related codes is:
\htmladdnormallink{\texttt{http://www.pwscf.org/}}%
{http://www.pwscf.org/}
The FPMD and CP codes are both based on the original code written by
Roberto Car and Michele Parrinello.
FPMD was developed by
Carlo Cavazzoni, Gerardo Ballabio (CINECA),
Sandro Scandolo (ICTP, Trieste),
Guido Chiarotti (SISSA),
Paolo Focher,
and others.
CP was developed by
Alfredo Pasquarello (IRRMA, Lausanne),
Kari Laasonen (Oulu),
Andrea Trave (LLNL),
Roberto Car (Princeton),
Nicola Marzari (MIT),
Paolo Giannozzi,
and others.
PWgui was written by Anton Kokalj and is based on his GUIB concept
(\htmladdnormallink{\texttt{http://www-k3.ijs.si/kokalj/guib/}}%
{http://www-k3.ijs.si/kokalj/guib/}).
The pseudopotential generation package ``atomic'' was written by
Andrea Dal Corso and it is the result of many additions to the
original code by Paolo Giannozzi.
A list of further contributors includes:
Dario Alf\`e,
Francesco Antoniella,
Mauro Boero,
Claudia Bungaro,
Paolo Cazzato,
Gabriele Cipriani,
Matteo Cococcioni,
Alberto Debernardi,
Gernot Deinzer,
Oswaldo Dieguez,
Guido Fratesi,
Ralph Gebauer,
Martin Hilgeman,
Yosuke Kanai,
Axel Kohlmeyer,
Konstantin Kudin,
Michele Lazzeri,
Kurt Maeder,
Francesco Mauri,
Nicolas Mounet,
Pasquale Pavone,
Mickael Profeta,
Guido Roma,
Manu Sharma,
Alexander Smogunov,
Kurt Stokbro,
Pascal Thibaudeau,
Antonio Tilocca,
Renata Wentzcovitch,
Yudong Wu,
Xiaofei Wang,
and let us apologize to everybody we have forgotten.
This guide was written (mostly) by Paolo Giannozzi, Gerardo Ballabio,
Carlo Cavazzoni.
\subsection{Terms of use}
$\nu-$ESPRESSO is free software, released under the GNU General Public
License
(\htmladdnormallink{\texttt{http://www.pwscf.org/License.txt}}%
{http://www.pwscf.org/License.txt},
or the file \texttt{License} in the distribution).
All trademarks mentioned in this guide belong to their respective
owners.
We would greatly appreciate it if scientific work done using this code
contained an explicit acknowledgment and a reference to the
$\nu-$ESPRESSO web page.
Our preferred form for the acknowledgment is the following:
\begin{quote}
\emph{Acknowledgments:}
\par\noindent
Calculations in this work have been done using the $\nu-$ESPRESSO package
[\emph{ref}].
\par\noindent
\emph{Bibliography:}
\par\noindent
[\emph{ref}]
S.~Baroni, A.~Dal Corso, S.~de Gironcoli, P.~Giannozzi, % PWscf
C.~Cavazzoni, G.~Ballabio, S.~Scandolo, G.~Chiarotti, P.~Focher, % FPMD
A.~Pasquarello, K.~Laasonen, A.~Trave, R.~Car, N.~Marzari, % CP
A.~Kokalj, % PWgui
\texttt{http://www.pwscf.org/}.
\end{quote}
\clearpage
\section{Installation}
\label{installation}
Presently, the $\nu-$ESPRESSO package is distributed in source form only;
precompiled executables (binary files) are provided only for
PWgui.
Providing binaries for $\nu-$ESPRESSO would require too much effort and
would work only for a small number of machines anyway.
To install $\nu-$ESPRESSO, you need working C and fortran-95 compilers
(fortran-90 is not sufficient, but most ``fortran-90'' compilers
are actually fortran-95-compliant). You will also need basic unix
facilities: a shell, the \texttt{make} and \texttt{awk} utilities.
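As a quick sanity check before configuring, you can verify that the
basic tools are present. The snippet below is only a sketch: it checks
\texttt{make} and \texttt{awk}, while compiler names vary from system
to system and are not checked here.

```shell
# Verify that the basic unix facilities mentioned above are available;
# compiler names vary by system, so only make and awk are checked here.
command -v make && command -v awk && echo "make and awk found"
```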
The latest stable release of the $\nu-$ESPRESSO source package (currently
version \version) can be downloaded from this URL:
\medskip
\htmladdnormallink{\texttt{http://www.pwscf.org/download.htm}}%
{http://www.pwscf.org/download.htm}
\medskip
\noindent
To uncompress and unpack it, move it to an empty directory of your
choice, \texttt{cd} to that directory, and run the command:
\medskip
\texttt{tar zxvf pw.\version.tgz}
\medskip
\noindent
If your version of \texttt{tar} doesn't recognize the \texttt{z} flag,
use this instead:
\medskip
\texttt{gunzip -c pw.\version.tgz | tar xvf -}
\medskip
\noindent
The bravest may access the (unstable) development version via anonymous
CVS (Concurrent Versions System): see the file \texttt{README.cvs}
contained in the distribution.
To install $\nu-$ESPRESSO, you must:
\begin{enumerate}
\item configure the source package for your system, compilers and
libraries;
\item compile some or all the executables in the package.
\end{enumerate}
For the impatient:
\begin{verbatim}
./configure
make all
\end{verbatim}
Executable programs (actually, symlinks to them) will be placed in the
\texttt{bin/} directory.
If you have problems or would like to tweak the default settings, read
the detailed instructions below.
\subsection{Configure}
To configure the $\nu-$ESPRESSO source package, run the \texttt{configure}
script. It will (try to) detect compilers and libraries available on
your machine, and set up things accordingly.
Presently it is expected to work on most Linux 32- and 64-bit (Itanium
and Opteron) PCs and clusters, IBM SP machines, SGI Origin, some
HP-Compaq Alpha machines, Cray X1, Mac OS X. It may work with some
assistance also on other architectures (see below).
Cross-compilation is theoretically supported, but has never been
tested; you have to specify the target machine with the
\texttt{--host} option (see below).
Specifically, \texttt{configure} generates the following files:
\begin{quote}
\texttt{make.sys}: compilation settings and flags\\
\texttt{make.rules}: compilation rules\\
\texttt{*/make.depend}: dependencies, per source directory
\end{quote}
\texttt{make.depend} files are actually generated by the
\texttt{makedeps.sh} shell script, which \texttt{configure} invokes.
If you modify the program sources, you might have to rerun it.
You should always be able to compile the $\nu-$ESPRESSO suite of programs
without having to edit any of the generated files. However you may
have to tune \texttt{configure} by specifying appropriate environment
variables and/or command-line options.
Usually the most tricky part is to get external libraries recognized
and used: see section \ref{libraries}, ``Libraries'', for details and
hints.
Environment variables may be set in any of these ways:
\begin{verbatim}
export VARIABLE=value # sh, bash, ksh
./configure
setenv VARIABLE value # csh, tcsh
./configure
./configure VARIABLE=value # any shell
\end{verbatim}
Some environment variables that are relevant to \texttt{configure} are:
\begin{quote}
\texttt{ARCH}:
label identifying the machine type (see below)\\
\texttt{F90}, \texttt{F77}, \texttt{CC}:
names of Fortran 95, Fortran 77, and C compilers\\
\texttt{CPP}:
source file preprocessor (defaults to \texttt{\$CC -E})\\
\texttt{LD}: linker (defaults to \texttt{\$F90})\\
\texttt{CFLAGS}, \texttt{FFLAGS}, \texttt{F90FLAGS},
\texttt{CPPFLAGS}, \texttt{LDFLAGS}:
compilation flags\\
\texttt{LIBDIRS}:
extra directories to search for libraries (see below)
\end{quote}
For example, the following command line:
\begin{verbatim}
./configure F90=ifort FFLAGS="-Vaxlib -O2 -assume byterecl" \
CC=gcc CFLAGS=-O3 LDFLAGS="-Vaxlib -static"
\end{verbatim}
instructs \texttt{configure} to use \texttt{ifort} as Fortran 95
compiler with flags \texttt{"-Vaxlib -O2 -assume byterecl"},
\texttt{gcc} as C compiler with flags \texttt{"-O3"}, and to use flags
\texttt{"-Vaxlib -static"} when linking. Note that the values
of \texttt{FFLAGS} and \texttt{LDFLAGS} must be quoted, because they
contain spaces.
If your machine type is unknown to \texttt{configure}, you may use the
\texttt{ARCH} variable to suggest an architecture among supported
ones. Try the one that looks most similar to your machine type
(you'll probably have to do some additional tweaking).
Currently supported architectures are:
\begin{quote}
\texttt{linux64}: Linux 64-bit machines (Itanium, Opteron)\\
\texttt{linux32}: Linux PCs\\
\texttt{aix}: IBM AIX machines\\
\texttt{mips}: SGI MIPS machines\\
\texttt{alpha}: HP-Compaq alpha machines\\
\texttt{sparc}: Sun SPARC machines\\
\texttt{crayx1}: Cray X1 machines\\
\texttt{mac}: Apple PowerPC running Mac OS X
\end{quote}
Finally, \texttt{configure} recognizes the following command-line
options:
\begin{quote}
\texttt{--disable-parallel}:
compile serial code, even if parallel environment is available\\
\texttt{--disable-shared}:
don't use shared libraries: generate static executables\\
\texttt{--enable-shared}:
use shared libraries\\
\texttt{--host=}\emph{target}:
specify target machine for cross-compilation.\break
\emph{Target} must be a string identifying the architecture that
you want to compile for; you can obtain it by running
\texttt{config.guess} on the target machine.
\end{quote}
If you want to modify the \texttt{configure} script (advanced users
only!), you'll need GNU Autoconf
(\htmladdnormallink{\texttt{http://www.gnu.org/software/autoconf/}}%
{http://www.gnu.org/software/autoconf/}).
Edit the source file \texttt{configure.ac}, then run Autoconf to
regenerate \texttt{configure}. If you edit \texttt{configure}
directly, all changes will be lost when you regenerate it.
You may also want to edit \texttt{make.sys.in} and
\texttt{make.rules.in}.
For more information, see \texttt{README.configure}.
\subsubsection{Libraries}
\label{libraries}
$\nu-$ESPRESSO makes use of the following external libraries:
\begin{itemize}
\item BLAS
(\htmladdnormallink{\texttt{http://www.netlib.org/blas/}}%
{http://www.netlib.org/blas/})
and LAPACK\hfill\break
(\htmladdnormallink{\texttt{http://www.netlib.org/lapack/}}%
{http://www.netlib.org/lapack/})
for linear algebra
\item FFTW
(\htmladdnormallink{\texttt{http://www.fftw.org/}}%
{http://www.fftw.org/})
for Fast Fourier Transforms
\end{itemize}
A copy of the needed routines is provided with the distribution.
However, when available, optimized vendor-specific libraries can be
used instead: this often yields huge performance gains.
$\nu-$ESPRESSO can use the following architecture-specific replacements for
BLAS and LAPACK:
\begin{quote}
\texttt{essl} for IBM machines\\
\texttt{complib.sgimath} for SGI Origin\\
\texttt{scilib} for Cray/T3e\\
\texttt{sunperf} for Sun\\
\texttt{MKL} for Intel Linux PCs\\
\texttt{ACML} for AMD Linux PCs\\
\texttt{cxml} for HP-Compaq Alphas.
\end{quote}
If none of these is available, we suggest that you use the optimized
ATLAS library
(\htmladdnormallink{\texttt{http://math-atlas.sourceforge.net/}}%
{http://math-atlas.sourceforge.net/}).
Note that ATLAS is not a complete replacement for LAPACK: it contains
all of the BLAS, plus the LU code, plus the full storage Cholesky
code. Follow the instructions in the ATLAS distribution to produce a
full LAPACK replacement.
Axel Kohlmeyer maintains a set of ATLAS libraries,
containing all of LAPACK and no external reference to fortran
libraries:\hfill\break
\htmladdnormallink%
{{\small\texttt{http://www.theochem.rub.de/\~{}axel.kohlmeyer/%
cpmd-linux.html\#atlas}}}%
{http://www.theochem.rub.de/~axel.kohlmeyer/cpmd-linux.html\#atlas}
Sergei Lisenkov reported success and good performance with the
optimized BLAS by Kazushige Goto.
They can be downloaded freely (but not redistributed!) from:
\htmladdnormallink%
{\texttt{http://www.cs.utexas.edu/users/flame/goto/}}%
{http://www.cs.utexas.edu/users/flame/goto/}
The FFTW library can also be replaced by vendor-specific FFT
libraries, when available, or you can link to a precompiled FFTW
library. Please note that you must use FFTW version 2. Support for
version 3 is in progress: contact the developers if you want to try.
The \texttt{configure} script attempts to find optimized libraries,
but may fail if they have been installed in non-standard places.
You should examine the final value of \texttt{LIBS} (either in the
output of \texttt{configure}, or in the generated \texttt{make.sys})
to check whether it found all the libraries that you intend to use.
If any libraries weren't found, you can specify a list of directories
to search in the environment variable \texttt{LIBDIRS}, and rerun
\texttt{configure}; directories in the list must be separated by
spaces. For example:
\begin{verbatim}
./configure LIBDIRS="/opt/intel/mkl70/lib/32 /usr/lib/math"
\end{verbatim}
If this still fails, you may set the environment variable
\texttt{LIBS} manually and retry. For example:
\begin{verbatim}
./configure LIBS="-L/usr/lib/math -lfftw -lf77blas -latlas"
\end{verbatim}
Beware that in this case, you must specify \emph{all} the libraries
that you want to link to. \texttt{configure} will blindly accept the
specified value, and won't search for any extra libraries. (This is
so that if \texttt{configure} finds any library that you don't want to
use, you can override it.)
If you want to use a precompiled FFTW library, the corresponding
\texttt{fftw.h} include file is also required.
If \texttt{configure} wasn't able to find it, you may specify its
location in the \texttt{INCLUDEFFTW} environment variable.
For example:
\begin{verbatim}
./configure INCLUDEFFTW="/usr/lib/fftw-2.1.3/fftw"
\end{verbatim}
If everything else fails, you'll have to write the \texttt{make.sys}
file manually: see section \ref{manualconf}, ``Manual configuration''.
\paragraph{Please Note:}
If you change any settings after a previous (successful or failed)
compilation, you must run \texttt{make clean} before recompiling,
unless you know exactly which routines are affected by the changed
settings and how to force their recompilation.
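For example, after changing compilation settings, a safe rebuild
sequence looks like the following (the \texttt{FFLAGS} value is an
arbitrary example, not a recommendation):

```shell
# Rebuild from scratch after changing configure settings;
# the FFLAGS value shown is only an example.
make clean
./configure FFLAGS="-O2"
make all
```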
\subsubsection{Manual configuration}
\label{manualconf}
To configure $\nu-$ESPRESSO manually, you have to write working
\texttt{make.sys} and \texttt{make.rules}, and generate
\texttt{*/make.depend} files yourself.
For \texttt{make.sys}, several templates (each for a different machine
type) to start with are provided in the \texttt{install/} directory:
they have names of the form \texttt{Make.}\emph{system}, where
\emph{system} is a string identifying the architecture and compiler.
Currently available systems are:
\begin{quote}
\texttt{alpha}: HP-Compaq alpha workstations\\
\texttt{alphaMPI}: HP-Compaq alpha parallel machines\\
\texttt{altix}: SGI Altix 350/3000 with Linux, Intel compiler\\
\texttt{cygwin}: Windows PC, Intel compiler (see below)\\
\texttt{fujitsu}: Fujitsu vector machines\\
\texttt{hitachi}: Hitachi SR8000\\
\texttt{hp}: HP PA-RISC workstations\\
\texttt{hpMPI}: HP PA-RISC parallel machines\\
\texttt{ia64}: HP Itanium workstations\\
\texttt{irix}: SGI workstations\\
\texttt{pc\_abs}: Linux PCs, Absoft compiler\\
\texttt{pc\_lahey}: Linux PCs, Lahey compiler\\
\texttt{pc\_pgi}: Linux PCs, Portland compiler\\
\texttt{sun}: Sun workstations\\
\texttt{sunmpi}: Sun parallel machines\\
\texttt{sxcross}: NEC SX-6\\
\texttt{t3e}: Cray T3E
\end{quote}
The \texttt{install/} directory also contains files \texttt{Rules.cpp}
and \texttt{Rules.nocpp}, which are templates for \texttt{make.rules}.
The former is to be used with Fortran compilers that support
the preprocessing of source files; otherwise you must use the latter.
They'll usually work without further editing.
To select the appropriate templates, you can run:
\medskip
\texttt{./configure.old} \emph{system}
\medskip
\noindent
where \emph{system} is the best match to your configuration;
\texttt{configure.old} with no arguments prints the up-to-date list of
available systems.
That will copy \texttt{Make.}\emph{system} to \texttt{make.sys}, and
one of the \texttt{Rules.*} files to \texttt{make.rules}; it will
usually pick the right one.
In addition, it'll run the \texttt{makedeps.sh} script to generate
\texttt{*/make.depend} files.
(If you don't run the \texttt{configure.old} script, you'll have to do
that yourself.)
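If you skip \texttt{configure.old}, the manual equivalent of the steps
above is roughly as follows; the template names come from the list
above, and the location of \texttt{makedeps.sh} may differ in your
distribution:

```shell
# Manual equivalent of configure.old, for a Linux PC with the
# Portland compiler; pick the Make.* template matching your system.
cp install/Make.pc_pgi make.sys
cp install/Rules.cpp   make.rules   # or Rules.nocpp, see above
./makedeps.sh                       # regenerates */make.depend
```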
Most probably (and even more so if there isn't an exact match to your
machine type), you'll have to tweak \texttt{make.sys} by hand until
you obtain successful compilation.
In particular, you must specify the full list of libraries that
you intend to link to.
You'll also have to set the \texttt{MYLIB} variable to:
\begin{quote}
\texttt{blas\_and\_lapack} to compile BLAS and LAPACK from source;\\
\texttt{lapack\_mkl} to use the Intel MKL library;\\
\texttt{lapack\_t3e} to use the LAPACK for Cray T3E;\\
otherwise, leave it empty.
\end{quote}
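As an illustration, a hand-edited \texttt{make.sys} for a Linux PC
using MKL might contain lines like these; the \texttt{MYLIB} value is
from the list above, but the MKL path and library names are purely
hypothetical and must be checked against your own installation:

```shell
# Hypothetical excerpt of a hand-edited make.sys; the MKL path and
# library names below are examples only, not definitive values.
MYLIB = lapack_mkl
LIBS  = -L/opt/intel/mkl/lib/32 -lmkl_lapack -lmkl_ia32 -lguide -lpthread
```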
\paragraph{Note for HP PA-RISC users:}
The Makefile for HP PA-RISC workstations and parallel machines is
based on a Makefile contributed by Sergei Lisenkov.
It assumes that you have the HP compiler with MLIB libraries installed
on a machine running HP-UX.
\paragraph{Note for MS-Windows users:}
The Makefile for Windows PCs is based on a Makefile written for an
earlier version of PWscf (1.2.0), contributed by Lu Fu-Fa, CCIT,
Taiwan.
Since there have been many changes to the installation procedure, the
provided Makefile --- which has never been tested --- may not work.
You will need the CygWin package (a UNIX environment for PC which runs
in Windows).
The provided Makefile assumes that you have the Intel compiler with
MKL libraries installed.
Another possibility is to install Linux, either in dual-boot mode, or
running from a CD-ROM. You will need to create a partition for Linux
and to install a boot loader (LILO, GRUB). The latter step is not
necessary if you boot from CD-ROM. The former step could also be
avoided in principle (distributions like Knoppix run directly from the
CD-ROM) but for serious use you will need to have disk access.
\subsection{Compile}
There are a few adjustable parameters in
\texttt{Modules/parameters.f90}.
The present values will work for most cases. All other variables are
dynamically allocated: you do not need to recompile your code for a
different system.
At your option, you may compile the complete $\nu-$ESPRESSO suite of
programs (with \texttt{make all}), or only some specific programs.
\texttt{make} with no arguments yields a list of valid compilation
targets.
Here is a list:
\begin{itemize}
\item
\texttt{make pw} produces \texttt{PW/pw.x} and
\texttt{PW/memory.x}.
\texttt{pw.x} calculates the electronic structure and performs
structural optimization, molecular dynamics, and barrier
calculations with NEB.
\texttt{memory.x} is an auxiliary program that checks the input of
\texttt{pw.x} for correctness and yields a rough (under-) estimate
of the required memory.
\item
\texttt{make ph} produces \texttt{PH/ph.x}.
\texttt{ph.x} calculates phonon frequencies and displacement
patterns, dielectric tensors, effective charges (uses data
produced by \texttt{pw.x}).
\item
\texttt{make d3} produces \texttt{D3/d3.x}.
\texttt{d3.x} calculates anharmonic phonon lifetimes (third-order
derivatives of the energy), using data produced by \texttt{pw.x}
and \texttt{ph.x}.
\item
\texttt{make gamma} produces \texttt{Gamma/phcg.x}.
\texttt{phcg.x} is a version of \texttt{ph.x} that calculates
phonons at $\mathbf{q}=0$ using conjugate-gradient minimization of
the density functional expanded to second-order.
Only the $\Gamma$ ($\mathbf{q}=0$) point is used for Brillouin
zone integration.
It is faster and takes less memory than \texttt{ph.x}, but does
not support Ultrasoft pseudopotentials.
\item
\texttt{make raman} produces \texttt{Raman/ram.x}.
\texttt{ram.x} calculates nonresonant Raman tensor coefficients
(derivatives of the polarizability with respect to atomic
displacements) using the $(2n+1)$ theorem.
\item
\texttt{make pp} produces several codes for data postprocessing, in
\texttt{PP/} (see list below).
\item
\texttt{make tools} produces several utility programs, mostly for
phonon calculations, in \texttt{pwtools/} (see list below).
\item
\texttt{make pwcond} produces \texttt{PWCOND/pwcond.x}, for
ballistic conductance calculations (experimental).
\item
\texttt{make pwall} produces all of the above.
\item
\texttt{make ld1} produces code \texttt{atomic/ld1.x} for
pseudopotential generation (see the specific
documentation in \texttt{atomic\_doc/}).
\item
\texttt{make upf} produces utilities for pseudopotential
conversion in directory \texttt{upftools/} (see section
\ref{pseudopotentials}, ``Pseudopotentials'').
\item
\texttt{make cp} produces the Car-Parrinello code CP in
\texttt{CPV/cp.x}.
\item
\texttt{make fpmd} produces the Car-Parrinello code FPMD in
\texttt{CPV/fpmd.x} and the postprocessing code
\texttt{CPV/fpmdpp.x}.
\item
\texttt{make all} produces all of the above.
\end{itemize}
For the setup of the GUI, refer to the
\texttt{PWgui-}\emph{X.Y.Z}\texttt{/INSTALL} file, where \emph{X.Y.Z}
stands for the version number of the GUI (presently 0.6.2).
If you are using the CVS-sources, then see the \texttt{GUI/README}
file instead.
The codes for data postprocessing in \texttt{PP/} are:
\begin{itemize}
\item \texttt{pp.x} extracts the specified data from files
produced by \texttt{pw.x} for further processing
\item \texttt{bands.x} extracts eigenvalues from files produced
by \texttt{pw.x} for band structure plotting
\item \texttt{projwfc.x} calculates projections of wavefunctions
onto atomic orbitals, performs L\"owdin population
analysis and calculates projected density of states
\item \texttt{chdens.x} plots data produced by \texttt{pp.x},
writing them into a format that is suitable for several
plotting programs
\item \texttt{plotrho.x} reads the output of \texttt{chdens.x},
produces PostScript 2-d contour plots
\item \texttt{plotband.x} reads the output of \texttt{bands.x},
produces band structure PostScript plots
\item \texttt{average.x} calculates planar averages of
potentials
\item \texttt{voronoy.x} divides the charge density into Voronoi
polyhedra (obsolete, use at your own risk)
\item \texttt{dos.x} calculates electronic Density of States
(DOS).
\item \texttt{pw2wan.x}: interface with code WanT for calculation of
transport properties via Wannier (also known as Boys)
functions: see\hfill\break
\htmladdnormallink%
{\texttt{http://www.wannier-transport.org/}}%
{http://www.wannier-transport.org/}
\item \texttt{pw2casino.x}: interface with CASINO code for Quantum
Monte Carlo calculation
(\htmladdnormallink%
{\texttt{http://www.tcm.phy.cam.ac.uk/\~{}mdt26/casino.html}}%
{http://www.tcm.phy.cam.ac.uk/~mdt26/casino.html}).
\end{itemize}
The utility programs in \texttt{pwtools/} are:
\begin{itemize}
\item \texttt{dynmat.x} calculates the LO-TO splitting at
$\mathbf{q}=0$ in insulators, and IR cross sections, from the
dynamical matrix produced by \texttt{ph.x}
\item \texttt{q2r.x} calculates Interatomic Force Constants in
real space from dynamical matrices produced by
\texttt{ph.x} on a regular \textbf{q}-grid
\item \texttt{matdyn.x} produces phonon frequencies at a generic
wave vector using the Interatomic Force Constants
calculated by \texttt{q2r.x}; may also calculate phonon
DOS
\item \texttt{fqha.x} for quasi-harmonic calculations
\item \texttt{lambda.x} calculates the electron-phonon coefficient
$\lambda$ and the function $\alpha^2F(\omega)$
\item \texttt{dist.x} calculates distances and angles between
atoms in a cell, taking into account periodicity
\item \texttt{ev.x} fits energy-vs-volume data to an equation of
state
\item \texttt{kpoints.x} produces lists of k-points
\item \texttt{pwi2xsf.sh}, \texttt{pwo2xsf.sh} process
respectively input and output files (not data files!) for
\texttt{pw.x} and produce an XSF-formatted file suitable
for plotting with XCrySDen, a powerful crystalline and
molecular structure visualization program
(\texttt{http://www.xcrysden.org/}).
BEWARE: the \texttt{pwi2xsf.sh} shell script requires the
\texttt{pwi2xsf.x} executable to be located somewhere in
your \texttt{\$PATH}.
\item \texttt{band\_plot.x}: undocumented and possibly obsolete
\item \texttt{bs.awk}, \texttt{mv.awk} are scripts that process
the output of \texttt{pw.x} (not data files!).
Usage:
\begin{verbatim}
awk -f bs.awk < my-pw-file > myfile.bs
awk -f mv.awk < my-pw-file > myfile.mv
\end{verbatim}
The files so produced are suitable for use with
\texttt{xbs}, a very simple X-windows utility to display
molecules, available at:\hfill\break
\htmladdnormallink%
{\texttt{http://www.ccl.net/cca/software/X-WINDOW/xbsa/README.shtml}}%
{http://www.ccl.net/cca/software/X-WINDOW/xbsa/README.shtml}
\end{itemize}
\subsection{Run examples}
\label{runexamples}
As a final check that compilation was successful, you may want to run
some or all of the examples contained within the \texttt{examples}
directory of the $\nu-$ESPRESSO distribution.
Those examples try to exercise all the programs and features of the
$\nu-$ESPRESSO package: for details, see the \texttt{README} file in each
example's directory.
If you find that any relevant feature isn't being tested, please
contact us (or even better, write and send us a new example
yourself!).
If you haven't downloaded the full $\nu-$ESPRESSO distribution and don't
have the examples, you can get them from the Test and Examples Page of
the $\nu-$ESPRESSO web site
(\htmladdnormallink{\texttt{http://www.pwscf.org/tests.htm}}%
{http://www.pwscf.org/tests.htm}).
The necessary pseudopotentials are included.
To run the examples, you should follow this procedure:
\begin{enumerate}
\item
Go to the \texttt{examples} directory and edit the
\texttt{environment\_variables} file, setting the following variables
as needed:
\begin{quote}
\texttt{BIN\_DIR=} directory where $\nu-$ESPRESSO executables reside\\
\texttt{PSEUDO\_DIR=} directory where pseudopotential files reside\\
\texttt{TMP\_DIR=} directory to be used as temporary storage area
\end{quote}
If you have downloaded the full $\nu-$ESPRESSO distribution, you may set
\texttt{BIN\_DIR=\$TOPDIR/bin} and
\texttt{PSEUDO\_DIR=\$TOPDIR/pseudo}, where \texttt{\$TOPDIR} is the
root of the $\nu-$ESPRESSO source tree.
The \texttt{PSEUDO\_DIR} directory must contain the following files:
\begin{quote}
\begin{flushleft}
%
% to regenerate this list:
% grep UPF */run_example | grep -v PSEUDO_LIST | grep -o "[^ ]*UPF" | \
% sed 's/_/\\_/g' | sort | uniq | awk '{print " \\texttt{" $0 "},"}'
%
\texttt{Al.vbc.UPF},
\texttt{As.gon.UPF},
\texttt{C.pz-rrkjus.UPF},
\texttt{Cu.pz-d-rrkjus.UPF},
\texttt{Fe.pz-nd-rrkjus.UPF},
\texttt{H.fpmd.UPF},
\texttt{H.vbc.UPF},
\texttt{N.BLYP.UPF},
\texttt{Ni.pbe-nd-rrkjus.UPF},
\texttt{NiUS.RRKJ3.UPF},
\texttt{O.BLYP.UPF},
\texttt{O.LDA.US.RRKJ3.UPF},
\texttt{O.pbe-rrkjus.UPF},
\texttt{O.vdb.UPF},
\texttt{OPBE\_nc.UPF},
\texttt{Pb.vdb.UPF},
\texttt{Ptrel.RRKJ3.UPF},
\texttt{Si.vbc.UPF},
\texttt{SiPBE\_nc.UPF},
\texttt{Ti.vdb.UPF}
\end{flushleft}
\end{quote}
If any of these are missing, you may not be able to run some of the
examples. You can download them from the Pseudopotentials Page of the
$\nu-$ESPRESSO web site
(\htmladdnormallink{\texttt{http://www.pwscf.org/pseudo.htm}}%
{http://www.pwscf.org/pseudo.htm}).
\texttt{TMP\_DIR} must be a directory you have read and write access
to, with enough available space to host the temporary files produced
by the example runs, and preferably offering high I/O performance
(e.g., avoid NFS-mounted directories).
\item
If you have compiled the parallel version of $\nu-$ESPRESSO (which is the
default), you'll usually have to specify a driver program (such as
\texttt{poe} or \texttt{mpiexec}) and the number of processors: read
section \ref{runparallel}, ``Running on parallel machines'' for
details.
In order to do that, edit again the \texttt{environment\_variables}
file and set the \texttt{PARA\_PREFIX} and \texttt{PARA\_POSTFIX}
variables as needed.
Parallel executables will be run by a command like this:
\begin{verbatim}
$PARA_PREFIX pw.x $PARA_POSTFIX < file.in > file.out
\end{verbatim}
For example, if the command line is like this (as for an IBM SP4):
\begin{verbatim}
poe pw.x -procs 4 < file.in > file.out
\end{verbatim}
you should set \texttt{PARA\_PREFIX="poe"},
\texttt{PARA\_POSTFIX="-procs 4"}.
Furthermore, if your machine does not support interactive use, you
must run the commands specified below through the batch queueing
system installed on that machine.
Ask your system administrator for instructions.
\item
To run a single example, go to the corresponding directory (for
instance, \texttt{example/example01}) and execute:
\begin{verbatim}
./run_example
\end{verbatim}
This will create a subdirectory \texttt{results}, containing the input
and output files generated by the calculation.
Some examples take only a few seconds to run, while others may require
several minutes depending on your system.
To run all the examples in one go, execute:
\begin{verbatim}
./run_all_examples
\end{verbatim}
from the \texttt{examples} directory.
On a single-processor machine, this typically takes one to three
hours.
The \texttt{make\_clean} script cleans the examples tree, by removing
all the \texttt{results} subdirectories. However, if additional
subdirectories have been created, they aren't deleted.
\item
In each example's directory, the \texttt{reference} subdirectory
contains verified output files that you can check your results
against.
They were generated on a 1.7 GHz Pentium IV using the Intel compiler
(\texttt{ifc}) v.6 and MKL libraries v.5.1.
On different architectures the precise numbers could be slightly
different, in particular if different FFT dimensions are automatically
selected. For this reason, a plain \texttt{diff} of your results
against the reference data will report spurious differences, and the
comparison requires human inspection of the results.
Instead, you can run the \texttt{check\_example} script in the
\texttt{examples} directory:
\medskip
\quad\texttt{./check\_example} \emph{example\_dir}
\medskip
\noindent
where \emph{example\_dir} is the directory of the example that you
want to check (e.g., \texttt{./check\_example example01}).
You can specify multiple directories.
Note: at the moment \texttt{check\_example} is in early development
and is guaranteed to work only on examples 01 to 04.
\end{enumerate}
\subsection{Installation issues}
\label{installissues}
The main development platforms are IBM SP and Intel/AMD PC with Linux
and the Intel compiler. For other machines, we rely on users' feedback.
\paragraph{All machines}
Working fortran-95 and C compilers are needed in order to compile
$\nu-$ESPRESSO. Most so-called ``fortran-90'' compilers implement the
fortran-95 standard, but older versions may not be fortran-95
compliant.
If you get ``Compiler Internal Error'' or similar messages, try to
lower the optimization level, or to remove optimization, just for the
routine that has problems. If that doesn't work, or if you experience
weird problems, try to install patches for your version of the
compiler (most vendors release at least a few patches for free), or to
upgrade to a more recent version.
If you get an error in the loading phase that looks like ``ld: file
XYZ.o: unknown (unrecognized, invalid, wrong, missing, \dots) file
type'', or ``While processing relocatable file XYZ.o, no relocatable
objects were found'' (T3E), one of the following has happened:
\begin{enumerate}
\item you have leftover object files from a compilation with another
compiler: run \texttt{make clean} and recompile.
\item \texttt{make} does not stop at the first compilation error (it
happens with some compilers).
Remove file XYZ.o and look for the compilation error.
\end{enumerate}
If many symbols are missing in the loading phase, you did not specify
the location of all needed libraries (LAPACK, BLAS, FFTW,
machine-specific optimized libraries). If you did, but symbols are
still missing, see below (for Linux PC).
\paragraph{SGI machines with MIPS compiler}
Many versions of the MIPS compiler yield compilation errors in
conjunction with \texttt{FORALL} constructs. There is no
known solution other than editing the \texttt{FORALL} construct
that causes the problem, or replacing it with an equivalent
\texttt{DO...END DO} construct.
\paragraph{Linux Alphas with Compaq compiler}
If at linking stage you get error messages like: ``undefined reference
to `for\_check\_mult\_overflow64' '' with Compaq/HP fortran compiler
on Linux Alphas, check the following page:
\htmladdnormallink%
{\texttt{http://linux.iol.unh.edu/linux/fortran/faq/cfal-X1.0.2.html}}%
{http://linux.iol.unh.edu/linux/fortran/faq/cfal-X1.0.2.html}.
\paragraph{Linux PC}
The web site of Axel Kohlmeyer contains a very informative section
on compiling and running CPMD on Linux.
Most of its contents applies to the $\nu-$ESPRESSO code as well:\hfill\break
\htmladdnormallink%
{\texttt{http://www.theochem.rub.de/\~{}axel.kohlmeyer/cpmd-linux.html}}%
{http://www.theochem.rub.de/~axel.kohlmeyer/cpmd-linux.html}.
On newer Linux machines, even statically linked binaries will try
to open some shared libraries, which will lead to crashes
if libc/libm/libpthreads are not linked dynamically. Machines
using glibc-2.2.4 and older seem ok: compile on these machines
if you want to share precompiled binaries. Crashes due to multithreading
(e.g. when using a multithreaded ATLAS or MKL) on machines with
the newer threads (nptl) can be worked around by setting the
environment variable \texttt{LD\_ASSUME\_KERNEL} to '2.2.5'. For
the newest Intel compilers, \texttt{-static-libcxa} does the
trick most of the time. (info from Axel Kohlmeyer)
Since there is no standard compiler for Linux, different compilers
have different ideas about the right way to call external libraries.
As a consequence you may have a mismatch between what your compiler
calls ("symbols") and the actual name of the required library call.
Use the \texttt{nm} command to determine the name of a library call,
as in the following examples:%
\begin{verbatim}
nm /usr/local/lib/libblas.a | grep T | grep -i daxpy
nm /usr/local/lib/liblapack.a | grep T | grep -i zhegv
\end{verbatim}
where typical locations and names of the libraries are assumed.
Most precompiled libraries have lowercase names with one or two
underscores (\_) appended. \texttt{configure} should select the
appropriate preprocessing options in \texttt{make.sys}, but in
case of trouble, be aware that:
\begin{itemize}
\item the Absoft compiler is case-sensitive (like C and unlike
      other Fortran compilers) and does not add an underscore
      to symbol names (note that if your libraries contain
      uppercase or mixed-case names, you are out of luck:
      you must either recompile your own libraries, or change
      the \texttt{\#define}'s in \texttt{include/f\_defs.h});
\item both Portland compiler (pgf90) and Intel compiler (ifort/ifc)
are case insensitive and add an underscore to symbol names.
\end{itemize}
With some precompiled LAPACK libraries, you may need to add
\texttt{-lg2c} or \texttt{-lm} or both.
\paragraph{Linux PCs with Portland Group compiler (pgf90)}
$\nu-$ESPRESSO does not work reliably, or not at all, with some versions of
the Portland Group compiler. In particular, with some versions PWscf
works only for small systems, but not for larger systems. We think
that this is a compiler bug. Use the latest version of each release
of the compiler, with patches if available: see the Portland Group web
site,\hfill\break
\htmladdnormallink%
{\texttt{http://www.pgroup.com/faq/install.htm\#release\_info}}%
{http://www.pgroup.com/faq/install.htm\#release\_info}
\paragraph{Linux PCs (Pentium) with Intel compiler (ifort, formerly
ifc)}
If \texttt{configure} doesn't find the compiler, or if you get ``Error
loading shared libraries...'' at run time, you have forgotten to
execute the script that sets up the correct path and library path.
Unless your system manager has done this for you, you should execute
the appropriate script --- located in the directory containing the
compiler executable --- in your initialization files.
Consult the documentation provided by Intel.
Each major release of the Intel compiler differs a lot from
the previous one. Do not mix compiled objects from different releases:
they are incompatible. Intel compiler v.~7 and later use a different
method to locate where modules are with respect to v.~$< 7$: if you
are using the manual configuration, choose the appropriate line
\texttt{MODULEFLAG=...} in \texttt{make.sys}.
Some releases of Intel compiler v.~7 and 8 yield ``Compiler Internal
Error''.
Update to the last version (presently 7.1.41, 8.0.046 or
8.1.018, respectively), available via Intel Premier support
(registration free of charge for Linux):
\htmladdnormallink%
{\texttt{http://developer.intel.com/software/products/support/\#premier}}%
{http://developer.intel.com/software/products/support/\#premier}
Note that \texttt{pwcond.x} does not work with some (but not all)
releases of Intel compiler v.~7 and 8, for no apparent good reason.
Warnings ``size of symbol ... changed ...'' are produced by ifc 7.1 at
the loading stage.
These seem to be harmless, but they may cause the loader to stop,
depending on your system configuration.
If this happens and no executable is produced, add the following to
\texttt{LDFLAGS}: \texttt{-Xlinker --noinhibit-exec}.
On Intel CPUs, it is very convenient to use Intel MKL libraries.
If \texttt{configure} doesn't find them, try
\texttt{configure --enable-shared}.
MKL also contains optimized FFT routines, but they are
presently not supported: use FFTW instead. Note that Intel
compiler v.~8 fails to load with MKL v.~5.2 or earlier versions,
because some symbols that are referenced by MKL are missing. There
is a fix for this (info from Konstantin Kudin): add libF90.a from
ifc 7.1 at the linking stage, as the last library.
Note that some combinations of not-so-recent versions of MKL
and ifc may yield a lot of "undefined references" when statically
loaded: use \texttt{configure --enable-shared},
or remove the \texttt{-static} option in \texttt{make.sys}.
When using/testing/benchmarking MKL on SMP (multiprocessor)
machines, one should set the environment variable
\texttt{OMP\_NUM\_THREADS} to 1, unless OpenMP
parallelization is desired. By default MKL sets this
variable to the number of CPUs installed, which gives
the impression of much better performance, as the CPU time
is only measured for the master thread (info from Axel Kohlmeyer).
The I/O libraries used by the Intel compiler ifc are incompatible
with those called by most precompiled BLAS/LAPACK libraries
(including ATLAS): you get error messages at linking stage.
A workaround is to recompile BLAS/LAPACK with ifc, or (better) to
replace the BLAS routine \texttt{xerbla} and LAPACK routine
\texttt{dlamch} (the only two containing I/O calls) with recompiled
objects:
\begin{verbatim}
ifc -c xerbla.f
ifc -O0 -c dlamch.f
\end{verbatim}
(do not forget \texttt{-O0} --- \texttt{dlamch.f} \emph{must} be
compiled without optimization) and replace them into the library, as
in the following example:
\begin{verbatim}
ar rv libatlas.a xerbla.o dlamch.o
\end{verbatim}
(assuming that the library and the two object files are in the same
directory). See also Axel Kohlmeyer's web site.
Linux distributions using glibc 2.3 or later (such as RedHat 9)
may be incompatible with ifc 7.0 and 7.1.
The incompatibility shows up in the form of messages ``undefined
reference to `errno' '' at linking stage.
A workaround is available: see
\htmladdnormallink%
{\texttt{http://newweb.ices.utexas.edu/misc/ctype.c}}%
{http://newweb.ices.utexas.edu/misc/ctype.c}.
\paragraph{AMD CPUs, Intel Itanium}
AMD Athlon CPUs can be basically treated like Intel Pentium CPUs.
You can use the Intel compiler and MKL with Pentium-3 optimization.
Konstantin Kudin reports that the best results in terms of
performances are obtained with ATLAS optimized BLAS/LAPACK
libraries, using AMD Core Math Library (ACML) for the missing
libraries. ACML can be freely downloaded from AMD web site.
Beware: some versions of ACML -- i.e. the GCC version with SSE2 --
crash PWscf. The ``\_nosse2'' version appears to be stable.
Load first ATLAS, then ACML, then \texttt{-lg2c}, as in the
following example (replace what follows \texttt{-L} with
something appropriate to your configuration):
\begin{verbatim}
-L/location/of/fftw/lib/ -lfftw \
-L/location/of/atlas/lib -lf77blas -llapack -lcblas -latlas \
-L/location/of/gnu32_nosse2/lib -lacml -lg2c
\end{verbatim}
64-bit CPUs like the AMD Opteron and the Intel Itanium are
supported and should work both in 32-bit emulation and in
64-bit mode (in the latter case, \texttt{-D\_\_LINUX64} is
needed among the preprocessing flags). Both the PGI and the
Intel compiler (v8.1 EM64T-edition, available via Intel Premier
support) should work. 64-bit executables can address a
much larger memory space, but apparently they are not especially
faster than 32-bit executables. The Intel compiler has been
reported to be more reliable, and to produce faster executables,
than the PGI compiler.
\paragraph{Linux PC clusters with MPI}
PC clusters running some version of MPI are a very popular
computational platform nowadays. Two major MPI implementations
(MPICH, LAM-MPI) are available. The number of possible
configurations, in terms of type and version of the MPI
libraries, kernels, system libraries, and compilers, is very large.
$\nu-$ESPRESSO compiles and works on all non-buggy, properly configured
configurations. You may have to recompile the MPI libraries in order
to be able to use them with the Intel compiler. See Axel Kohlmeyer's
web site for precompiled versions of the MPI library.
If $\nu-$ESPRESSO does not work for some reason on a PC cluster, try first
if it works in serial execution. If the problem is clearly related to
parallelism, it is likely that your MPI libraries are buggy or not
properly configured: see Axel Kohlmeyer's web site for help.
A frequent problem is that $\nu-$ESPRESSO does not read from standard
input: see section ``Running on parallel machines''.
If you are dissatisfied with the performances in parallel
execution, read the ``Parallelization issues'' section.
\paragraph{T3E}
The following workaround is needed: in files \texttt{PW/bp\_zgefa.f}
and \texttt{PW/bp\_zgedi.f}, replace all occurrences of
\texttt{zscal}, \texttt{zaxpy}, \texttt{zswap}, \texttt{izamax} with
\texttt{cscal}, \texttt{caxpy}, \texttt{cswap}, \texttt{icamax}.
Also, in \texttt{PP/dist.f} you need to comment the call to
\texttt{getarg} and uncomment the call to \texttt{pxfgetarg}.
If you have a T3E with ``benchlib'' installed, you may want to use it
by adding \texttt{-D\_\_BENCHLIB} to preprocessing flags.
If you get errors at loading because symbols \texttt{LPUTP},
\texttt{LGETV}, \texttt{LSETV} are undefined, you either need to link
``benchlib'', or to remove \texttt{-D\_\_BENCHLIB} and recompile
(after a \texttt{make clean}).
\clearpage
\section{Running on parallel machines}
\label{runparallel}
Parallel execution is strongly system- and installation-dependent.
Typically one has to specify:
\begin{itemize}
\item a launcher program, such as \texttt{poe}, \texttt{mpirun}, or
\texttt{mpiexec};
\item the number of processors, typically as an option to the
launcher program, but in some cases \emph{after} the program
to be executed;
\item the program to be executed, with the proper path if needed:
for instance, \texttt{pw.x}, or \texttt{./pw.x}, or
\texttt{\$(HOME)/bin/pw.x}, or whatever applies;
\item the number of ``pools'' into which processors are to be
grouped (see section \ref{parissues}, ``Parallelization
Issues'', for an explanation of what a pool~is).
\end{itemize}
The last item is optional and is read by the code.
The first and second items are machine- and installation-dependent,
and may be different for interactive and batch execution.
\paragraph{Please note:}
Your machine might be configured so as to disallow interactive
execution: if in doubt, ask your system administrator.
\bigskip
For illustration, here's how to run \texttt{pw.x} on 16 processors
partitioned into 8 pools (2 processors each), for several typical
cases.
For convenience, we also give the corresponding values of
\texttt{PARA\_PREFIX}, \texttt{PARA\_POSTFIX} to be used in running
the examples distributed with $\nu-$ESPRESSO (see section \ref{runexamples},
``Run examples'').
\begin{description}
\item [IBM SP machines,] batch:
\begin{verbatim}
pw.x -npool 8 < input
PARA_PREFIX="", PARA_POSTFIX="-npool 8"
\end{verbatim}
This should also work interactively, with environment variable
\texttt{NPROC} set to 16 and \texttt{MP\_HOSTFILE} set to the file
containing a list of processors.
\item [IBM SP machines,] interactive, using \texttt{poe}:
\begin{verbatim}
poe pw.x -procs 16 -npool 8 < input
PARA_PREFIX="poe", PARA_POSTFIX="-procs 16 -npool 8"
\end{verbatim}
\item [SGI Origin and PC clusters] using \texttt{mpirun}:
\begin{verbatim}
mpirun -np 16 pw.x -npool 8 < input
PARA_PREFIX="mpirun -np 16", PARA_POSTFIX="-npool 8"
\end{verbatim}
\item [PC clusters] using \texttt{mpiexec}:
\begin{verbatim}
mpiexec -n 16 pw.x -npool 8 < input
PARA_PREFIX="mpiexec -n 16", PARA_POSTFIX="-npool 8"
\end{verbatim}
\item [Cray T3E] (old):
\begin{verbatim}
mpprun -n 16 pw.x -npool 8 < input
PARA_PREFIX="mpprun -n 16", PARA_POSTFIX="-npool 8"
\end{verbatim}
\end{description}
Note that each processor writes its own set of temporary wavefunction
files during the calculation.
If \texttt{wf\_collect=.true.} (in namelist \texttt{control}), the
final result is collected into a single file, whose format is
independent of the number of processors; otherwise, one wavefunction
file per processor is left on the disk.
In the latter case, the files are readable only by a job running on
the same number of processors and pools, and only if all files are on a
file system that is visible to all processors (i.e., you cannot use
local scratch directories: there is presently no way to ensure that
the distribution of processes on processors will follow the same
pattern for different jobs).
Some implementations of the MPI library may have problems with
input redirection in parallel.
If this happens, use the option \texttt{-in} (or \texttt{-inp} or
\texttt{-input}), followed by the input file name.
Example: \texttt{pw.x -in input -npool 4 > output}.
Please note that all postprocessing codes \emph{not} reading data
files produced by \texttt{pw.x} --- that is, \texttt{chdens.x},
\texttt{average.x}, \texttt{voronoy.x}, \texttt{dos.x} --- as well as
the plotting codes \texttt{plotrho.x} and \texttt{plotband.x}, and all
executables in \texttt{pwtools/}, should be executed on just one
processor.
Unpredictable results may follow if those codes are run on more than
one processor.
\clearpage
\section{Pseudopotentials}
\label{pseudopotentials}
Currently PWscf and CP support both Ultrasoft (US) Vanderbilt
pseudopotentials (PPs) and Norm-Conserving (NC)
Hamann-Schl\"uter-Chiang PPs in separable Kleinman-Bylander form.
Note however that calculation of third-order derivatives is not (yet)
implemented with US PPs. Presently FPMD supports only NC PPs.
The $\nu-$ESPRESSO package uses a unified pseudopotential format (UPF)
(\htmladdnormallink{\texttt{http://www.pwscf.org/format.htm}}%
{http://www.pwscf.org/format.htm})
for all types of PPs, but still accepts a number of other formats
(\htmladdnormallink{\texttt{http://www.pwscf.org/oldformat.htm}}%
{http://www.pwscf.org/oldformat.htm}):
\begin{enumerate}
\item the ``old PWscf'' format for NC PPs,
\item the ``old CP'' format for NC PPs,
\item the ``old FPMD'' format for NC PPs,
\item the ``new PWscf'' format for NC and US PPs,
\item the ``Vanderbilt'' format (formatted, not binary) for NC and
US PPs.
\end{enumerate}
Note however that PWscf accepts only the first, fourth and fifth
formats in the above list; CP the second, fourth and fifth; FPMD the
third only.
PPs for selected elements can be downloaded from the Pseudopotentials
Page of the $\nu-$ESPRESSO web site
(\htmladdnormallink{\texttt{http://www.pwscf.org/pseudo.htm}}%
{http://www.pwscf.org/pseudo.htm}).
If you do not find there the PP you need (because there is no PP for
the atom you need or you need a different exchange-correlation
functional or a different core-valence partition or for whatever
reason may apply), it may be taken, if available, from published
tables, such as:
\begin{itemize}
\item G.B. Bachelet, D.R. Hamann and M. Schl\"uter, Phys. Rev. B
\textbf{26}, 4199 (1982)
\item X. Gonze, R. Stumpf, and M. Scheffler, Phys. Rev. B
\textbf{44}, 8503 (1991)
\item S. Goedecker, M. Teter, and J. Hutter, Phys. Rev. B
\textbf{54}, 1703 (1996)
\end{itemize}
or otherwise it must be generated. Since version 2.1, $\nu-$ESPRESSO
includes a PP generation package, in the
directory \texttt{atomic/} (sources) and \texttt{atomic\_doc/}
(documentation, tests and examples).
The package can generate both NC and US PPs in UPF (and older, not
recommended) format.
We refer to its documentation for instructions on how to generate PPs
with the \texttt{atomic/} code.
Other PP generation packages are available on-line:
\begin{itemize}
\item
David Vanderbilt's code (UltraSoft PPs):\hfill\break
\htmladdnormallink%
{\texttt{http://www.physics.rutgers.edu/\~{}dhv/uspp/index.html}}%
{http://www.physics.rutgers.edu/~dhv/uspp/index.html}
\item
Fritz Haber's code (Norm-Conserving PPs):\hfill\break
\htmladdnormallink%
{\texttt{http://www.fhi-berlin.mpg.de/th/fhi98md/fhi98PP}}%
{http://www.fhi-berlin.mpg.de/th/fhi98md/fhi98PP}
\item
Jos\'e-Lu\'\i{}s Martins' code (Norm-Conserving PPs):\hfill\break
\htmladdnormallink%
{\texttt{http://bohr.inesc-mn.pt/\~{}jlm/pseudo.html}}%
{http://bohr.inesc-mn.pt/~jlm/pseudo.html}
\end{itemize}
The first two codes produce PPs in UPF format, or in a format that
can be converted to unified format using the utilities of directory
\texttt{upftools/}.
Finally, other electronic-structure packages (CAMPOS, ABINIT)
provide tables of PPs that can be freely downloaded, but need
to be converted into a suitable format for use with $\nu-$ESPRESSO.
Remember: \emph{always} test the PPs on simple test systems before
proceeding to serious calculations.
\clearpage
\section{Using PWscf}
Input files for the PWscf codes may be either written by hand (the
good old way), or produced via the ``PWgui'' graphical user interface
by Anton Kokalj, included in the $\nu-$ESPRESSO distribution.
See \texttt{PWgui-}\emph{x.y.z}\texttt{/INSTALL} (where \emph{x.y.z}
is the version number) for more info on PWgui, or \texttt{GUI/README}
if you are using CVS sources.
You may take the examples distributed with $\nu-$ESPRESSO as templates for
writing your own input files: see section \ref{runexamples}, ``Run
examples''. In the following, whenever we mention ``Example N'', we
refer to those.
Input files are those in the \texttt{results} directories, with names
ending in \texttt{.in} (they'll appear after you've run the examples).
Note about exchange-correlation: the type of exchange-correlation used
in the calculation is read from PP files.
All PPs must have been generated using the same exchange-correlation
functional.
\subsection{Electronic and ionic structure calculations}
Electronic and ionic structure calculations are performed by program
\texttt{pw.x}.
\subsubsection{Input data}
The input data is organized as several namelists, followed by other
fields introduced by keywords.
The namelists are
\begin{quote}
\texttt{\&CONTROL}: general variables controlling the run\\
\texttt{\&SYSTEM}: structural information on the system under
investigation\\
\texttt{\&ELECTRONS}: electronic variables: self-consistency,
smearing\\
\texttt{\&IONS} (optional): ionic variables: relaxation,
dynamics\\
\texttt{\&CELL} (optional): variable-cell dynamics\\
\texttt{\&PHONON} (optional): information required to produce
data for phonon calculations
\end{quote}
Optional namelists may be omitted if the calculation to be performed
does not require them.
This depends on the value of variable \texttt{calculation} in namelist
\texttt{\&CONTROL}.
Most variables in namelists have default values.
Only the following variables in \texttt{\&SYSTEM} must always be
specified:
\begin{quote}
\texttt{ibrav} (integer): bravais-lattice index\\
\texttt{celldm} (real, dimension 6): crystallographic constants\\
\texttt{nat} (integer): number of atoms in the unit cell\\
\texttt{ntyp} (integer): number of types of atoms in the unit cell\\
\texttt{ecutwfc} (real): kinetic energy cutoff (Ry) for
wavefunctions.
\end{quote}
For metallic systems, you have to specify how metallicity
is treated by setting variable \texttt{occupations}. If you choose
\texttt{occupations='smearing'}, you have to specify the
smearing width \texttt{degauss} and optionally the smearing
type \texttt{smearing}. If you choose \texttt{occupations='tetrahedra'},
you need to specify a suitable uniform k-point grid (card
\texttt{K\_POINTS} with option \texttt{automatic}).
Spin-polarized systems must be treated as metallic systems,
except in the special case of a single k-point, for which
occupancies can be fixed (\texttt{occupations='from\_input'}
and card \texttt{OCCUPATIONS}).
Explanations for the meaning of variables \texttt{ibrav} and
\texttt{celldm} are in file \texttt{INPUT\_PW}.
Please read them carefully.
There is a large number of other variables, having default values,
which may or may not fit your needs.
After the namelists, you have several fields introduced by keywords
with self-explanatory names:
\begin{quote}
\texttt{ATOMIC\_SPECIES}\\
\texttt{ATOMIC\_POSITIONS}\\
\texttt{K\_POINTS}\\
\texttt{CELL\_PARAMETERS} (optional)\\
\texttt{OCCUPATIONS} (optional) \\
\texttt{CLIMBING\_IMAGES} (optional)
\end{quote}
The keywords may be followed on the same line by an option.
Unknown fields (including some that are specific to CP and FPMD codes)
are ignored by PWscf.
See file \texttt{Doc/INPUT\_PW} for a detailed explanation of the
meaning and format of the various fields.
Note about k points:
The k-point grid can be either automatically generated or manually
provided as a list of k-points and weights, in the Irreducible
Brillouin Zone of the \emph{Bravais lattice} of the crystal.
The code will generate (unless instructed not to do so: see variable
\texttt{nosym}) all required k-points and weights if the symmetry of
the system is lower than the symmetry of the Bravais lattice.
The automatic generation of k-points follows the convention of
Monkhorst and Pack.
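As a concrete illustration, here is a minimal sketch of an input file
for an SCF calculation on fcc Al (a metal, hence the smearing); all
numerical values and file names are illustrative, not converged
settings (see Example 01 for tested inputs):
\begin{verbatim}
 &CONTROL
    calculation = 'scf',
    prefix = 'al',
    pseudo_dir = './',
    outdir = '/tmp',
 /
 &SYSTEM
    ibrav = 2, celldm(1) = 7.50,
    nat = 1, ntyp = 1,
    ecutwfc = 15.0,
    occupations = 'smearing', smearing = 'methfessel-paxton',
    degauss = 0.05,
 /
 &ELECTRONS
 /
ATOMIC_SPECIES
 Al  26.98  Al.vbc.UPF
ATOMIC_POSITIONS
 Al  0.00  0.00  0.00
K_POINTS automatic
 6 6 6 1 1 1
\end{verbatim}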
\subsubsection{Typical cases}
We may distinguish the following typical cases for \texttt{pw.x}:
\begin{description}
\item [single-point (fixed-ion) SCF calculation.]
Set \texttt{calculation='scf'}.
  Namelists \texttt{\&IONS} and \texttt{\&CELL} need not be
present (this is the default). See Example 01.
\item [band structure calculation.]
First perform a SCF calculation as above; then do a non-SCF
calculation specifying \texttt{calculation='nscf'}, with the
desired k-point grid and number \texttt{nbnd} of bands.
Specify \texttt{nosym=.true.} to avoid generation of additional
k-points in low symmetry cases. Variables \texttt{prefix} and
\texttt{outdir}, which determine the names of input or output
files, should be the same in the two runs. See Example~01.
\item [structural optimization.]
\hyphenation{name-list}
Specify \texttt{calculation='relax'} and add namelist \texttt{\&IONS}.
All options for a single SCF calculation apply, plus a few others.
You may follow a structural optimization with a non-SCF
band-structure calculation, but do not forget to update the input
ionic coordinates. See Example 03.
\item [molecular dynamics.]
Specify \texttt{calculation='md'} and time step \texttt{dt}.
Use variable \texttt{ion\_dynamics} in namelist \texttt{\&IONS}
for a fine-grained control of the kind of dynamics. Other options
for setting the initial temperature and for thermalization using
velocity rescaling are available. Remember: this is MD on the
electronic ground state, not Car-Parrinello MD. See Example 04.
\item [polarization via Berry Phase.]
See Example 10, its \texttt{README}, and the documentation in the
header of \texttt{PW/bp\_c\_phase.f90}.
\item [Nudged Elastic Band calculation.]
\hfill Specify \texttt{calculation='neb'} and add namelist
\texttt{\&IONS}.
All options for a single SCF calculation apply, plus a few others.
In the namelist \texttt{\&IONS} the number of images used to
discretize the elastic band must be specified. All other
variables have a default value. Coordinates of the initial and
final image of the elastic band have to be specified in the
\texttt{ATOMIC\_POSITIONS} card. A detailed description of all
input variables is contained in the file \texttt{Doc/INPUT\_PW}.
See also Example 17.
\end{description}
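For instance, the non-SCF band-structure step described above could
use input along the following lines, where only the namelists are
shown and the values of \texttt{nbnd} and \texttt{nosym} are merely
illustrative:
\begin{verbatim}
 &CONTROL
    calculation = 'nscf',
    prefix = 'al',
    outdir = '/tmp',
 /
 &SYSTEM
    ibrav = 2, celldm(1) = 7.50,
    nat = 1, ntyp = 1,
    ecutwfc = 15.0,
    occupations = 'smearing', smearing = 'methfessel-paxton',
    degauss = 0.05,
    nbnd = 8, nosym = .true.,
 /
 &ELECTRONS
 /
\end{verbatim}
followed by the same \texttt{ATOMIC\_SPECIES} and
\texttt{ATOMIC\_POSITIONS} cards as in the SCF run, and a
\texttt{K\_POINTS} card listing the desired k-points.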
The output data files are written in the directory specified by
variable \texttt{outdir}, with names specified by variable
\texttt{prefix} (a string that is prepended to all file names,
whose default value is: \texttt{prefix='pwscf'}).
The execution stops if you create a file \texttt{prefix.EXIT} in the
working directory. Note that just killing the process may leave the
output files in an unusable state.
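For example, assuming the default \texttt{prefix='pwscf'}, a graceful
stop can be requested from the working directory of the job with:
\begin{verbatim}
touch pwscf.EXIT
\end{verbatim}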
\subsection{Phonon calculations}
The phonon code \texttt{ph.x} calculates normal modes at a given
\textbf{q}-vector, starting from data files produced by \texttt{pw.x}.
If $\mathbf{q}=0$, the data files can be produced directly by a simple
SCF calculation.
For phonons at a generic \textbf{q}-vector, you need to perform first
a SCF calculation, then a band-structure calculation (see above)
with
\texttt{calculation = 'phonon'}, specifying the \textbf{q}-vector
in variable \texttt{xq} of namelist \texttt{\&PHONON}.
The output data files appear in the directory specified by variable
\texttt{outdir}, with names specified by variable \texttt{prefix}.
After the output files have been produced (do not remove any of the
files, unless you know which are used and which are not), you can run
\texttt{ph.x}.
The first input line of \texttt{ph.x} is a job identifier.
At the second line the namelist \texttt{\&INPUTPH} starts.
The meaning of the variables in the namelist (most of them having a
default value) is described in file \texttt{INPUT\_PH}.
Variables \texttt{outdir} and \texttt{prefix} must be the same as in
the input data of \texttt{pw.x}.
Presently you must also specify \texttt{amass} (real, dimension
\texttt{ntyp}): the atomic mass of each atomic type.
After the namelist you must specify the \textbf{q}-vector of the
phonon mode.
This must be the same \textbf{q}-vector given in the input of
\texttt{pw.x}.
A sample phonon calculation is performed in Example 02.
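In practice, the input structure just described looks like the
following sketch for a $\Gamma$-point phonon of Si (all values and
file names are illustrative; see Example 02 for a tested input):
\begin{verbatim}
phonons of Si at Gamma
 &INPUTPH
    tr2_ph = 1.0d-12,
    prefix = 'si',
    amass(1) = 28.086,
    outdir = '/tmp',
    fildyn = 'si.dynG',
 /
0.0 0.0 0.0
\end{verbatim}
The first line is the job identifier; the last line is the
\textbf{q}-vector.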
\subsubsection{Calculation of interatomic force constants in real
space}
First, dynamical matrices are calculated and saved for a suitable
uniform grid of \textbf{q}-vectors.
Only the \textbf{q}-vectors in the Irreducible Brillouin Zone of the
crystal are needed.
If the system is an insulator, effective charges and dielectric tensor
must be calculated (variable \texttt{epsil=.true.}) at $\mathbf{q}=0$.
Second, all dynamical matrices are given as input to code
\texttt{q2r.x}.
The $\mathbf{q}=0$ file must be the first in the list.
This produces a file of Interatomic Force Constants in real space, up
to a distance that depends on the size of the grid of
\textbf{q}-vectors.
Program \texttt{matdyn.x} may be used to produce phonon modes and
frequencies at any \textbf{q} using the Interatomic Force Constants
file as input.
Note that if you want to calculate LO-TO splitting and IR cross
sections in insulators at $\mathbf{q}=0$ you should use program
\texttt{dynmat.x} instead.
See Example 06.
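The whole procedure can be sketched as the following sequence of runs
(all input and output file names are hypothetical):
\begin{verbatim}
pw.x     < si.scf.in  > si.scf.out   # SCF calculation
ph.x     < si.ph.in   > si.ph.out    # phonons at one q of the grid
...                                  # repeat for each q-vector
q2r.x    < q2r.in     > q2r.out      # dynamical matrices -> IFCs
matdyn.x < matdyn.in  > matdyn.out   # frequencies at arbitrary q
\end{verbatim}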
\subsubsection{Calculation of electron-phonon interaction
coefficients}
The calculation of electron-phonon coefficients in metals is made
difficult by the slow convergence of the sum at the Fermi energy.
It is convenient to calculate phonons, for each \textbf{q}-vector of a
suitable grid, using a smaller k-point grid, saving the dynamical
matrix and the self-consistent first-order variation of the potential
(variable \texttt{fildvscf}).
Then a non-SCF calculation with a larger k-point grid is performed.
Finally the electron-phonon calculation is performed by specifying
\texttt{elph=.true.}, \texttt{trans=.false.}, and the input files
\texttt{fildvscf}, \texttt{fildyn}.
The electron-phonon coefficients are calculated using several values
of Gaussian broadening (see \texttt{PH/elphon.f90}), because this
quickly shows whether results are converged with respect to the
k-point grid and to the broadening. See Example 07.
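For the final step, the relevant part of the \texttt{\&INPUTPH}
namelist might look like this (the file names are illustrative and
must match those of the previous runs):
\begin{verbatim}
 &inputph
    trans    = .false.,
    elph     = .true.,
    fildvscf = 'dvscf',
    fildyn   = 'matdyn',
    ...
 /
\end{verbatim}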
All of the above must be repeated for all desired \textbf{q}-vectors
and the final result is summed over all \textbf{q}-vectors, using
\texttt{pwtools/lambda.x}. The input data for the latter is
described in the header of \texttt{pwtools/lambda.f90}.
\subsection{Post-processing}
There are a number of auxiliary codes performing postprocessing tasks
such as plotting, averaging, and so on, on the various quantities
calculated by \texttt{pw.x}.
Such quantities are saved by \texttt{pw.x} into the output data
file(s).
The main postprocessing code \texttt{pp.x} reads data file(s) and may
produce on output either the projection of wavefunctions on atomic
wavefunctions, or another file containing one of the following
quantities:
\begin{quote}
charge\\
spin polarization\\
various potentials\\
local density of states at $E_F$\\
local density of electronic entropy\\
STM images\\
wavefunction squared\\
electron localization function\\
planar averages\\
integrated local density of states
\end{quote}
See file \texttt{INPUT\_PP} for a detailed description of the input
for code \texttt{pp.x}.
The file(s) produced by \texttt{pp.x} are processed by program
\texttt{chdens.x} for plotting.
The type of plotting (along a line, on a plane, three-dimensional,
polar) and the output format must be specified here.
The output file can be directly read by the free plotting system
Gnuplot (1D or 2D plots), or by code \texttt{plotrho.x} that comes
with PWscf (2D plots), or by advanced plotting software XCrySDen and
gOpenMol (3D plots).
More details on the input data are contained in file
\texttt{INPUT\_CHDENS}.
See Example 05 for a charge density plot.
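For instance, a charge-density plot is typically produced by a
two-step procedure of this kind (file names are arbitrary):
\begin{verbatim}
pp.x     < si.pp.in     > si.pp.out      # extract the charge density
chdens.x < si.chdens.in > si.chdens.out  # write it in plottable form
\end{verbatim}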
The postprocessing code \texttt{bands.x} reads data file(s), extracts
eigenvalues, and regroups them into bands (the algorithm used to order
bands and to resolve crossings may not work in all circumstances,
though).
The output is written to a file in a simple format that can be
directly read by plotting program \texttt{plotband.x}.
Unpredictable plots may result if \textbf{k}-points are not in
sequence along lines.
See Example 05 for a simple band plot.
The postprocessing code \texttt{projwfc.x} calculates projections of
wavefunctions onto atomic orbitals.
The atomic wavefunctions are those contained in the pseudopotential
file(s).
The L\"owdin population analysis (similar to Mulliken analysis) is
presently implemented.
The projected DOS (the DOS projected onto atomic orbitals) can also be
calculated.
More details on the input data are found in the header of file
\texttt{PP/projwfc.f90}.
The total electronic DOS is instead calculated by code
\texttt{PP/dos.x}.
See Example 08 for total and projected electronic DOS calculations.
The postprocessing code \texttt{path\_int.x} is intended to be used in
the framework of NEB calculations.
It is a tool to generate a new path (what is actually generated is the
restart file) from an old one through cubic-spline interpolation.
The new path can be discretized with a different number of images
(this is its main purpose); images are equispaced, and the
interpolation can also be performed on a subsection of the old path.
The input file needed by \texttt{path\_int.x} can be easily set up
with the help of the self-explanatory \texttt{path\_int.sh} shell
script.
\clearpage
\section{Using FPMD and CP}
This section is intended to explain how to perform basic
Car-Parrinello (CP) simulations using the FPMD and CP codes.
It is important to understand that a CP simulation is a sequence of
different runs, some of them used to ``prepare'' the initial state
of the system, and others performed to collect statistics or to
modify the state of the system itself, e.g. to change the temperature
or the pressure.
To prepare and run a CP simulation you should:
\begin{enumerate}
\item
define the system:
\begin{enumerate}
\item atomic positions
\item system cell
\item pseudopotentials
\item number of electrons and bands
\item cut-offs
\item FFT grids (CP code only)
\end{enumerate}
\item
The first run, when starting from scratch, is always an electronic
minimization, with fixed ions and cell, to bring the electronic
system on the ground state (GS) relative to the starting atomic
configuration.
Example of input file (Benzene Molecule):
\begin{verbatim}
&control
title = ' Benzene Molecule ',
calculation = 'cp',
restart_mode = 'from_scratch',
ndr = 51,
ndw = 51,
nstep = 100,
iprint = 10,
isave = 100,
tstress = .TRUE.,
tprnfor = .TRUE.,
dt = 5.0d0,
etot_conv_thr = 1.d-9,
ekin_conv_thr = 1.d-4,
prefix = 'c6h6'
pseudo_dir='/scratch/acv0/benzene/',
outdir='/scratch/acv0/benzene/Out/'
/
&system
ibrav = 14,
celldm(1) = 16.0,
celldm(2) = 1.0,
celldm(3) = 0.5,
celldm(4) = 0.0,
celldm(5) = 0.0,
celldm(6) = 0.0,
nat = 12,
ntyp = 2,
nbnd = 15,
nelec = 30,
ecutwfc = 40.0,
nr1b= 10, nr2b = 10, nr3b = 10,
xc_type = 'BLYP'
/
&electrons
emass = 400.d0,
emass_cutoff = 2.5d0,
electron_dynamics = 'sd',
/
&ions
ion_dynamics = 'none',
/
&cell
cell_dynamics = 'none',
press = 0.0d0,
/
ATOMIC_SPECIES
C 12.0d0 c_blyp_gia.pp
H 1.00d0 h.ps
ATOMIC_POSITIONS (bohr)
C 2.6 0.0 0.0
C 1.3 -1.3 0.0
C -1.3 -1.3 0.0
C -2.6 0.0 0.0
C -1.3 1.3 0.0
C 1.3 1.3 0.0
H 4.4 0.0 0.0
H 2.2 -2.2 0.0
H -2.2 -2.2 0.0
H -4.4 0.0 0.0
H -2.2 2.2 0.0
H 2.2 2.2 0.0
\end{verbatim}
You can find the description of the input variables in files
\texttt{INPUT.FPMD}, \texttt{INPUT.HOWTO} and \texttt{INPUT}.
\item
Sometimes a single run is not enough to reach the GS.
In this case, you need to re-run the electronic minimization
stage.
Use the input of the first run, changing \texttt{restart\_mode =
'from\_scratch'} to \texttt{restart\_mode = 'restart'}.
Important: unless you are already experienced with the system you
are studying or with the code internals, usually you need to tune
some input parameters, like \texttt{emass}, \texttt{dt}, and
cut-offs.
For this purpose, a few trial runs can be useful: perform short
minimizations (say, 10 steps), changing and adjusting these
parameters to your needs.
You can specify the degree of convergence with these two
thresholds:
\begin{itemize}
\item \texttt{etot\_conv\_thr}: total energy difference between two
consecutive steps;
\item \texttt{ekin\_conv\_thr}: value of the fictitious kinetic
energy of the electrons.
\end{itemize}
Usually we consider the system to be in the GS when
\texttt{ekin\_conv\_thr} is $\sim 10^{-5}$ or smaller.
You could check the value of the fictitious kinetic energy on the
standard output (column EKINC).
Different strategies are available to minimize electrons, but the
most used ones are:
\begin{itemize}
\item
steepest descent:
\begin{verbatim}
electron_dynamics = 'sd'
\end{verbatim}
\item
damped dynamics:
\begin{verbatim}
electron_dynamics = 'damp',
electron_damping = 0.1,
\end{verbatim}
See the input description for how to choose the damping factor;
usually its value is between 0.1 and 0.5.
\end{itemize}
\item
Once your system is in the GS, depending on how you have prepared
the starting atomic configuration, you should do several things:
\begin{itemize}
\item
if you have set the atomic positions ``by hand'' and/or from a
classical code, check the forces on atoms: if they are
large ($\sim 0.1 - 1.0$ atomic units), you should perform an
ionic minimization, otherwise the system could break up during
the dynamics.
\item
if you have taken the positions from a previous run or a
previous ab-initio simulation, check the forces, and if they
are too small ($\sim 10^{-4}$ atomic units), this means that
atoms are already in equilibrium positions and, even if left
free, they will not move.
You then need to randomize the positions a little bit (see below).
\end{itemize}
\item
Minimize ionic positions.
As we pointed out above, if the interatomic forces are too high,
the system could ``explode'' once the ionic dynamics is switched on.
To avoid that we need to relax the system.
Again there are different strategies to relax the system, but the
most used are again steepest descent or damped dynamics for ions
and electrons.
You can also mix electronic and ionic minimization schemes
freely, e.g. steepest descent for the ions and damped dynamics
for the electrons, or vice versa.
\begin{enumerate}
\item
suppose we want to perform steepest descent for the ions.
Then we specify the following section for the ions:
\begin{verbatim}
&ions
ion_dynamics = 'sd',
/
\end{verbatim}
Change also the ionic masses to accelerate the minimization:
\begin{verbatim}
ATOMIC_SPECIES
C 2.0d0 c_blyp_gia.pp
H 2.00d0 h.ps
\end{verbatim}
while leaving unchanged other input parameters.
Note that if the forces are really high ($> 1.0$ atomic
units), you should always use steepest descent for the first
relaxation steps ($\sim 100$).
\item
as the system approaches the equilibrium positions, the
steepest descent scheme slows down, so it is better to switch to
damped dynamics:
\begin{verbatim}
&ions
ion_dynamics = 'damp',
ion_damping = 0.2,
ion_velocities = 'zero',
/
\end{verbatim}
A value of \texttt{ion\_damping} between 0.05 and 0.5 works
well for many systems.
Since we have changed the masses, it is also better to restart
with zero ionic and electronic velocities.
Change further the ionic masses to accelerate the
minimization:
\begin{verbatim}
ATOMIC_SPECIES
C 0.1d0 c_blyp_gia.pp
H 0.1d0 h.ps
\end{verbatim}
\item
when the system is really close to equilibrium, the damped
dynamics slows down as well, mostly because, since electrons
and ions are moved together, the ionic forces are not quite
correct. It is then often better to perform an ionic step every
$N$ electronic steps, i.e. to move the ions only when the
electrons are in their GS (within the chosen threshold).
This is specified by adding the \texttt{ion\_nstepe} parameter
to the ionic input section, which then becomes:
\begin{verbatim}
&ions
ion_dynamics = 'damp',
ion_damping = 0.2,
ion_velocities = 'zero',
ion_nstepe = 10,
/
\end{verbatim}
Then we specify in the control input section:
\begin{verbatim}
etot_conv_thr = 1.d-6,
ekin_conv_thr = 1.d-5,
forc_conv_thr = 1.d-3
\end{verbatim}
As a result, the code checks every 10 electronic steps whether
the electronic system satisfies the two thresholds
\texttt{etot\_conv\_thr}, \texttt{ekin\_conv\_thr}: if it
does, the ions are advanced by one step.
The process thus continues until the forces become smaller
than \texttt{forc\_conv\_thr}.
Note that to fully relax the system you need many runs and
different strategies, which you should mix and adjust in order
to speed up the convergence.
The process is not automatic, but relies heavily on experience
and on trial and error.
Remember also that convergence to the equilibrium positions
depends on the energy threshold for the electronic GS: correct
forces (required to move the ions toward the minimum) are
obtained only when the electrons are in their GS.
Hence a small threshold on forces may never be satisfied unless
you also require a correspondingly small threshold on the total
energy.
\end{enumerate}
\item
Randomization of positions.
If you have relaxed the system or if the starting system is
already in the equilibrium positions, then you need to move ions
from the equilibrium positions, otherwise they won't move in a
dynamics simulation.
After the randomization you should bring the electrons to the GS
again, in order to start the dynamics with the correct forces.
To randomize, switch off the ionic dynamics and activate the
randomization for each species, specifying the amplitude of the
randomization itself.
This could be done with the following ionic input section:
\begin{verbatim}
&ions
ion_dynamics = 'none',
tranp(1) = .TRUE.,
tranp(2) = .TRUE.,
amprp(1) = 0.01
amprp(2) = 0.01
/
\end{verbatim}
In this way a random displacement (of max 0.01 a.u.) is added to
the atoms of species 1 and 2.
All other input parameters could remain the same.
Note that the difference in the total energy (\texttt{etot})
between relaxed and randomized positions can be used to estimate
the temperature that will be reached by the system.
In fact, starting with zero ionic velocities, all of the difference
is potential energy; in a dynamics simulation the energy will be
equipartitioned between kinetic and potential. To estimate the
temperature, take the energy difference, convert it to Kelvin,
divide by the number of atoms and multiply by 2/3.
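As a purely illustrative example of this estimate: if the
randomization raises \texttt{etot} by $\Delta E = 0.01$ Ry in a
system of $N_{at}=12$ atoms, the expected temperature is
$$
T \simeq \frac{2}{3}\,\frac{\Delta E}{N_{at}\,k_B}
  = \frac{2}{3}\cdot
    \frac{0.01\ \mathrm{Ry}}{12\cdot 6.33\times 10^{-6}\ \mathrm{Ry/K}}
  \simeq 88\ \mathrm{K},
$$
where $k_B \simeq 6.33\times 10^{-6}$ Ry/K.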
Randomization can also be useful while relaxing the system,
especially when we suspect that the ions are trapped in a local
minimum or on an energy plateau.
\item
Start the Car-Parrinello dynamics.
At this point, after having minimized the electrons and with the
ions displaced from their equilibrium positions, we are ready to
start a CP dynamics.
We need to specify \texttt{'verlet'} for both the ionic and the
electronic dynamics.
The thresholds in the control input section will be ignored, as
will any parameter related to the minimization strategy.
The first time we perform a CP run after a minimization, it is
always better to set the velocities to zero, unless we have
velocities from a previous simulation to specify in the input
file.
Restore the proper masses for the ions.
In this way we will sample the microcanonical ensemble.
The input section changes as follow:
\begin{verbatim}
&electrons
emass = 400.d0,
emass_cutoff = 2.5d0,
electron_dynamics = 'verlet',
electron_velocities = 'zero',
/
&ions
ion_dynamics = 'verlet',
ion_velocities = 'zero',
/
ATOMIC_SPECIES
C 12.0d0 c_blyp_gia.pp
H 1.00d0 h.ps
\end{verbatim}
If you want to specify the initial velocities for ions, you have
to set \texttt{ion\_velocities = 'from\_input'}, and add the
\texttt{IONIC\_VELOCITIES}\break
card, with the list of velocities in atomic units.
IMPORTANT: in restarting the dynamics after the first CP run,
remember to remove or comment out the velocity parameters:
\begin{verbatim}
&electrons
emass = 400.d0,
emass_cutoff = 2.5d0,
electron_dynamics = 'verlet',
! electron_velocities = 'zero',
/
&ions
ion_dynamics = 'verlet',
! ion_velocities = 'zero',
/
\end{verbatim}
otherwise you will quench the system, interrupting the sampling of
the microcanonical ensemble.
\item
Changing the temperature of the system.
It is possible to change the temperature of the system, or to
sample the canonical ensemble by fixing the average temperature:
this is done using the Nos\'e thermostat.
To activate this thermostat for ions you have to specify in the
ions input section:
\begin{verbatim}
&ions
ion_dynamics = 'verlet',
ion_temperature = 'nose',
fnosep = 60.0,
tempw = 300.0,
! ion_velocities = 'zero',
/
\end{verbatim}
where \texttt{fnosep} is the frequency of the thermostat in THz.
This should be chosen comparable to the center of the vibrational
spectrum of the system, in order to excite as many vibrational
modes as possible;
\texttt{tempw} is the desired average temperature in Kelvin.
It is also possible to specify a thermostat for the electrons;
this is usually done in metals or in systems where there is a
transfer of energy between ionic and electronic degrees of
freedom.
\end{enumerate}
\clearpage
\section{Performance issues (PWscf)}
\label{performance}
\subsection{CPU time requirements}
The following holds for code {\tt pw.x} and for non-US PPs.
For US PPs there are additional terms to be calculated.
For phonon calculations, each of the $3 N_{at}$ modes requires a CPU
time of the same order of that required by a self-consistent
calculation in the same system.
The computer time required for the self-consistent solution at fixed
ionic positions, $T_{scf}$, is:
$$
T_{scf} = N_{iter} \cdot T_{iter} + T_{init}
$$
where $N_{iter}=\mathtt{niter}=$ number of self-consistency
iterations, $T_{iter}=$ CPU time for a single iteration,
$T_{init}=$ initialization time.
Usually $T_{init} \ll N_{iter} \cdot T_{iter}$.
The time required for a single self-consistency iteration
$T_{iter}$ is:
$$
T_{iter} = N_k \cdot T_{diag} + T_{rho} + T_{scf}
$$
where $N_k=$ number of k-points, $T_{diag}=$ CPU time per hamiltonian
iterative diagonalization, $T_{rho}=$ CPU time for charge density
calculation, $T_{scf}=$ CPU time for Hartree and exchange-correlation
potential calculation.
The time for a Hamiltonian iterative diagonalization $T_{diag}$ is:
$$
T_{diag} = N_h \cdot T_h + T_{orth} + T_{sub}
$$
where $N_h=$ number of $H\psi$ products needed by iterative
diagonalization, $T_h=$ CPU time per $H\psi$ product, $T_{orth}=$ CPU
time for orthonormalization, $T_{sub}=$ CPU time for subspace
diagonalization.
The time $T_h$ required for a $H\psi$ product is
$$
T_h = a_1 \cdot M \cdot N
+ a_2 \cdot M \cdot N_1 \cdot N_2 \cdot N_3 \cdot
\log(N_1 \cdot N_2 \cdot N_3)
+ a_3 \cdot M \cdot P \cdot N.
$$
The first term comes from the kinetic term and is usually much smaller
than the others.
The second and third terms come respectively from local and nonlocal
potential.
$a_1$, $a_2$, $a_3$ are prefactors, $M=$ number of valence bands,
$N=$ number of plane waves (basis set dimension),
$N_1$, $N_2$, $N_3=$ dimensions of the FFT grid for wavefunctions
($N_1 \cdot N_2 \cdot N_3 \sim 8N$), $P=$ number of projectors for PPs
(summed over all atoms, over all values of the angular momentum $l$,
and $m=1,\dots,2l+1$).
The time $T_{orth}$ required by orthonormalization is
$$
T_{orth}=b_1 \cdot M_x^2 \cdot N
$$
and the time $T_{sub}$ required by subspace diagonalization is
$$
T_{sub}=b_2 \cdot M_x^3
$$
where $b_1$ and $b_2$ are prefactors, $M_x=$ number of trial
wavefunctions (this will vary between $M$ and a few times $M$,
depending on the algorithm).
The time $T_{rho}$ for the calculation of charge density from
wavefunctions is
$$
T_{rho} = c_1 \cdot M \cdot Nr_1 \cdot Nr_2 \cdot Nr_3 \cdot
\log(Nr_1 \cdot Nr_2 \cdot Nr_3)
+ c_2 \cdot M \cdot Nr_1 \cdot Nr_2 \cdot Nr_3 + T_{us}
$$
where $c_1$ and $c_2$ are prefactors,
$Nr_1$, $Nr_2$, $Nr_3=$ dimensions of the FFT grid for charge density
($Nr_1 \cdot Nr_2 \cdot Nr_3 \sim 8N_g$, where $N_g=$ number of
G-vectors for the charge density), and $T_{us}=$ CPU time required by
ultrasoft contribution (if any).
The time $T_{scf}$ for calculation of potential from charge density is
$$
T_{scf} = d_1 \cdot Nr_1 \cdot Nr_2 \cdot Nr_3 + d_2 \cdot
Nr_1 \cdot Nr_2 \cdot Nr_3 \cdot
\log(Nr_1 \cdot Nr_2 \cdot Nr_3)
$$
where $d_1$, $d_2$ are prefactors.
\subsection{Memory requirements}
A typical self-consistency or molecular-dynamics run requires
a maximum memory in the order
of $O$ double precision complex numbers, where
$$
O = m \cdot M \cdot N + P \cdot N + p \cdot N_1 \cdot N_2 \cdot N_3
+ q \cdot Nr_1 \cdot Nr_2 \cdot Nr_3
$$
with $m$, $p$, $q=$ small factors; all other variables have the same
meaning as above.
Note that if only the $\Gamma$ point ($\mathbf{k}=0$) is used to
sample the Brillouin Zone, the value of $N$ will be cut in half.
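As a rough, purely illustrative example: with $M=100$ bands,
$N=20000$ plane waves, and $m=4$, the wavefunction term alone amounts
to
$$
m \cdot M \cdot N = 4 \cdot 100 \cdot 20000 = 8\times 10^{6}
$$
double precision complex numbers, i.e. about 128~MB at 16 bytes each;
the FFT-grid terms add a comparable amount for typical grid sizes.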
Code \texttt{memory.x} yields a rough estimate of the memory required
by \texttt{pw.x} and checks for the validity of the input data file as
well. Use it exactly as \texttt{pw.x}.
The memory required by the phonon code follows the same patterns,
with somewhat larger factors $m$, $p$, $q$.
\subsection{File space requirements}
A typical \texttt{pw.x} run will require an amount of temporary disk
space in the order of $O$ double precision complex numbers:
$$
O = N_k \cdot M \cdot N + q \cdot Nr_1 \cdot Nr_2 \cdot Nr_3
$$
where $q=2 \cdot \mathtt{mixing\_ndim}$ (number of iterations used in
self-consistency, default value $=8$) if \texttt{disk\_io} is set to
\texttt{'high'} or not specified;
$q=0$ if \texttt{disk\_io='low'} or \texttt{'minimal'}.
\subsection{Parallelization issues}
\label{parissues}
\texttt{pw.x} can run in principle on any number of processors (up to
\texttt{maxproc}, presently fixed at 128 in \texttt{PW/para.f90}).
The $N_p$ processors can be divided into $N_{pk}$ pools of $N_{pr}$
processors each, with $N_p=N_{pk}\cdot N_{pr}$.
The k-points are divided across $N_{pk}$ pools (``k-point
parallelization''), while both R- and G-space grids are divided across
the $N_{pr}$ processors of each pool (``PW parallelization'').
A third level of parallelization, on the number of bands, is
currently confined to the calculation of a few quantities that
would not be parallelized at all otherwise.
A fourth level of parallelization, on the number of NEB images,
is available for NEB calculation only.
The effectiveness of parallelization depends on the size and type of
the system and on a judicious choice of the $N_{pk}$ and $N_{pr}$:
\begin{itemize}
\item
k-point parallelization is very effective if $N_{pk}$ is a divisor
of the number of k-points (linear speedup guaranteed), \emph{but}
it does not reduce the amount of memory per processor taken by the
calculation.
As a consequence, large systems may not fit into memory.
The same applies to parallelization over NEB images.
\item
PW parallelization works well if $N_{pr}$ is a divisor of both
dimensions along the $z$ axis of the FFT grids, $N_3$ and $Nr_3$
(which may coincide).
It does not scale so well as k-point parallelization, but it
reduces both CPU time AND memory (the latter almost linearly).
\item
Optimal serial performance is achieved when the data are kept as
much as possible in the cache.
As a side effect, one can achieve better than linear scaling with
the number of processors, thanks to the increase in serial speed
coming from the reduction of data size (making it easier for the
machine to keep data in the cache).
\end{itemize}
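For example, on 16 processors with 4 k-points in the calculation, one
could use 4 pools of 4 processors each. With an MPICH-style launcher
the command line might look like this (the exact syntax for selecting
the number of pools is described in the section on running on
parallel machines):
\begin{verbatim}
mpirun -np 16 pw.x -npool 4 < si.scf.in > si.scf.out
\end{verbatim}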
Note that for each system there is an optimal range of number of
processors on which to run the job.
Too large a number of processors will yield degraded performance,
or may cause the parallelization algorithm to fail in distributing
the R- and G-space grids properly.
Note also that Beowulf-style machines (PC clusters) may have
disappointing parallelization performance unless they have decent
communication hardware (at least Gigabit ethernet).
Do not expect good scaling with cheap hardware: plane-wave
calculations are by no means an ``embarrassingly parallel'' problem.
Note that multiprocessor motherboards for Intel Pentium CPUs typically
have just one memory bus for all processors.
This dramatically slows down any code doing massive access to memory
(as most codes in the $\nu-$ESPRESSO package do) that runs on processors of
the same motherboard.
\clearpage
\section{Troubleshooting (PWscf)}
Almost all problems in PWscf arise from incorrect input data and
result in error stops. Error messages should be self-explanatory,
but unfortunately this is not always true. If the code issues a
warning message and continues, pay attention to it, but do not
assume that something is necessarily wrong in your calculation:
most warning messages signal harmless problems.
Note for PC Linux clusters in parallel execution: in at least some
versions of MPICH, the current directory is set to the directory where
the \emph{executable code} resides, instead of being set to the
directory where the code is executed.
This MPICH weirdness may cause unexpected failures in some
postprocessing codes (e.g., \texttt{chdens.x}) that expect a data file
in the current directory.
Workaround: use symbolic links, or copy the executable to the current
directory.
Typical \texttt{pw.x} and/or \texttt{ph.x} (mis-)behavior:
\paragraph{\texttt{pw.x} yields a message like ``error while loading
shared libraries: \dots{} cannot open shared object file''
and does not start.}
Possible reasons:
\begin{itemize}
\item
If you are running on the same machines on which the code was
compiled, this is a library configuration problem.
The solution is machine-dependent.
On Linux, find the path to the missing libraries; then either add
it to file \texttt{/etc/ld.so.conf} and run \texttt{ldconfig}
(must be done as root), or add it to variable
\texttt{LD\_LIBRARY\_PATH} and export it.
Another possibility is to load non-shared version of libraries
(ending with \texttt{.a}) instead of shared ones (ending with
\texttt{.so}).
\item
If you are \emph{not} running on the same machines on which the
code was compiled: you need either to have the same shared
libraries installed on both machines, or to load statically all
libraries (using appropriate \texttt{configure} or loader options).
The same applies to Beowulf-style parallel machines: the needed
shared libraries must be present on all PC's.
\end{itemize}
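In the first case, a typical workaround on Linux with sh-style shells
looks like this (the library directory is of course system-dependent
and hypothetical here):
\begin{verbatim}
# suppose the missing library lives in /opt/mylibs:
export LD_LIBRARY_PATH=/opt/mylibs:$LD_LIBRARY_PATH
\end{verbatim}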
\paragraph{errors in examples with parallel execution}
If you get error messages in the example scripts --- i.e. not errors
in the codes --- on a parallel machine, such as:
``\texttt{run\_example: -n: command not found}'',
you may have forgotten the quotes (\texttt{"}) in the definitions of
\texttt{PARA\_PREFIX} and \texttt{PARA\_POSTFIX}.
\paragraph{\texttt{pw.x} stops with error in reading.}
There is an error in the input data.
Usually it is a misspelled namelist variable, or an empty input file.
Note that out-of-bound indices in dimensioned variables read in the
namelist may cause the code to crash with really mysterious error
messages.
Also note that input data files containing \texttt{\^{}M} (Control-M)
characters at the end of lines (typically, files coming from Windows
PC) may yield error in reading.
If none of the above applies and the code stops at the first namelist
(``control'')
and you are running on a PC cluster: your communication library
(MPI) might not be properly configured to allow input
redirection (so that you are effectively reading an empty file).
See section ``Running on parallel machines'', or inquire with your
local computer wizard (if any).
\paragraph{\texttt{pw.x} mumbles something like ``cannot recover'' or
``error reading recover file''.}
You are trying to restart from a previous job that either produced
corrupted files, or did not do what you think it did. No luck:
you have to restart from scratch.
\paragraph{\texttt{pw.x} stops with error in cdiagh or cdiaghg.}
Possible reasons:
\begin{itemize}
\item
serious error in data, such as bad atomic positions or bad crystal
structure/supercell;
\item
a bad PP (for instance, with a ghost);
\item
a failure of the algorithm performing subspace diagonalization.
The LAPACK algorithms used by cdiagh or cdiaghg are very robust
and extensively tested. Still, it may seldom happen that such
algorithms fail. In at least one case the failure was traced
to the non-positive-definiteness of the S matrix appearing in the
US-PP formalism. In other cases, the error turns out not to be
reproducible on different architectures, and to disappear if the
calculation is repeated with even minimal changes in its parameters.
In both cases, the reasons for such behavior are unclear and
the only advice is to use conjugate-gradient diagonalization
(\texttt{diagonalization='cg'}), a slower but very robust
algorithm, and see what happens.
\item
HP-Compaq alphas with \texttt{cxml} libraries: try to use compiled
BLAS and LAPACK (or better, ATLAS) instead of those contained in
\texttt{cxml} (just load them before \texttt{cxml}).
\end{itemize}
\paragraph{\texttt{pw.x} crashes with ``floating invalid'' or ``floating divide by zero''.}
If this happens on HP-Compaq Tru64 Alpha machines with an old
version of the compiler: the compiler is most likely buggy.
Otherwise, move to the next item.
\paragraph{\texttt{pw.x} crashes with no error message at all.}
This happens quite often in parallel execution, or under a batch
queue, or if you are writing the output to a file.
When the program crashes, part of the output, including the error
message, may be lost, or hidden in error files that nobody looks
into.
It is the fault of the operating system, not of the code.
Try to run interactively and to write to the screen.
If this doesn't help, move to next point.
\paragraph{\texttt{pw.x} crashes with ``segmentation fault'' or
similarly obscure messages.}
Possible reasons:
\begin{itemize}
\item
nonexistent or inaccessible {\tt outdir}.
Note that in parallel execution, {\tt outdir} must exist and be
accessible to all active processors.
\item
too much RAM memory requested (see next item).
\item
if you are using highly optimized mathematical libraries, verify
that they are designed for your hardware.
In particular, for Intel compiler and MKL libraries, verify that
you loaded the correct set of CPU-specific MKL libraries.
\item
buggy compiler.
If you are using Portland or Intel compilers on Linux PC's or
clusters, see section \ref{installissues}, ``Installation
issues''.
\end{itemize}
\paragraph{\texttt{pw.x} works for simple systems, but not for large
systems or whenever more RAM is needed.}
Possible solutions:
\begin{itemize}
\item
increase the amount of RAM you are authorized to use (which may be
much smaller than the available RAM).
Ask your system administrator if you don't know what to do.
\item
reduce \texttt{nbnd} to the strict minimum, or reduce the cutoffs,
or the cell size.
\item
use conjugate-gradient (\texttt{diagonalization='cg'}: slow
but very robust) or DIIS (\texttt{diagonalization='diis'}:
fast but not very robust) diagonalization:
both require less memory than the default Davidson algorithm.
\item
in parallel execution, use more processors, or use the same number
of processors with less pools.
Remember that parallelization with respect to k-points (pools)
does not distribute memory: parallelization with respect to
\textbf{R}- (and \textbf{G}-) space does.
\item
IBM only (32-bit machines): if you need more than 256 MB you must
specify it at link time (option \texttt{-bmaxdata}).
\item
buggy compiler.
Some versions of Portland compiler on Linux PC's or clusters have
this problem.
\end{itemize}
\paragraph{\texttt{pw.x} runs but nothing happens.}
Possible reasons:
\begin{itemize}
\item
in parallel execution, the code died on just one processor.
Unpredictable behavior may follow.
\item
in serial execution, the code has encountered a floating-point
error and goes on producing NaN's (Not a Number) forever, unless
exception handling is on (and usually it isn't).
In both cases, look for one of the reasons given above.
\item
maybe your calculation will take more time than you expect.
\end{itemize}
\paragraph{\texttt{pw.x} yields weird results.}
Possible solutions:
\begin{itemize}
\item
if this happens after a change in the code or in the compilation
or preprocessing options, try \texttt{make clean} and recompile.
The \texttt{make} command should take care of all dependencies,
but do not rely too heavily on it.
If the problem persists, \texttt{make clean} and recompile with
reduced optimization level.
\item
maybe your input data are weird.
\end{itemize}
\paragraph{\texttt{pw.x} stops with error message ``the system is
metallic, specify occupations''.}
You did not specify state occupations, but you need to, since your
system appears to have an odd number of electrons.
The variable controlling how metallicity is treated is
\texttt{occupations} in namelist \texttt{\&SYSTEM}.
The default, \texttt{occupations='fixed'}, occupies the lowest
\texttt{nelec/2} states and works only for insulators with a gap.
In all other cases, use \texttt{'smearing'} or \texttt{'tetrahedra'}.
See file \texttt{INPUT\_PW} for more details.
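A minimal fix is to enable a smearing scheme in the \texttt{\&SYSTEM}
namelist, for example (the smearing type and width below are only
illustrative; see \texttt{INPUT\_PW} for the accepted values):
\begin{verbatim}
 &system
    ...
    occupations = 'smearing',
    smearing    = 'marzari-vanderbilt',
    degauss     = 0.02,
 /
\end{verbatim}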
\paragraph{\texttt{pw.x} stops with ``unexpected error'' in
\texttt{efermi}.}
Possible reasons:
\begin{itemize}
\item
serious error in data, such as bad number of electrons,
insufficient number of bands, absurd value of broadening, or too
few tetrahedra;
\item
the Fermi energy is found by bisection assuming that the
integrated DOS $N(E)$ is an increasing function of the energy.
This is {\em not} guaranteed for Methfessel-Paxton smearing of
order 1 and can give problems when very few k-points are used.
Use some other smearing function: simple Gaussian broadening or,
better, Marzari-Vanderbilt ``cold smearing''.
\end{itemize}
\paragraph{the FFT grids in \texttt{pw.x} are machine-dependent.}
Yes, they are!
The code automatically chooses the smallest grid that is compatible
with the specified cutoff in the specified cell, \emph{and} is an
allowed value for the FFT library used.
Most FFT libraries are implemented, or perform well, only for
dimensions that factor into products of small numbers (typically 2,
3, 5; sometimes also 7 and 11).
Different FFT libraries follow different rules and thus different
dimensions can result for the same system on different machines (or
even on the same machine, with a different FFT).
See function \texttt{allowed} in \texttt{Modules/fft\_scalar.f90}.
As a consequence, the energy may be slightly different on different
machines.
The only piece that depends explicitly on the grid parameters is the
XC part of the energy, which is computed numerically on the grid.
The differences should be small, though, especially for LDA
calculations.
Manually setting the FFT grids to a desired value is possible, but
slightly tricky, using input variables \texttt{nr1, nr2, nr3} and
\texttt{nr1s, nr2s, nr3s}.
The code will still increase them if they are not acceptable.
Automatic FFT grid dimensions are slightly overestimated, so one may
try --- very carefully --- to reduce them a little bit.
The code will stop if too small values are required; too large values
will waste CPU time and memory.
Note that in parallel execution, it is very convenient to have FFT
grid dimensions along z that are a multiple of the number of
processors.
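For instance, the grids could be set by hand in the \texttt{\&SYSTEM}
namelist as follows (the dimensions shown are purely illustrative; the
code will raise them if they are not allowed values):
\begin{verbatim}
&SYSTEM
   ! dense grid (charge density)
   nr1  = 48, nr2  = 48, nr3  = 64
   ! smooth grid (wavefunctions; meaningful with US PP)
   nr1s = 36, nr2s = 36, nr3s = 48
/
\end{verbatim}
Choosing \texttt{nr3} (here 64) as a multiple of the number of
processors follows the advice above on parallel execution.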
\paragraph{``warning: symmetry operation \# N not allowed''.}
This is not an error.
\texttt{pw.x} determines first the symmetry operations (rotations)
of the Bravais lattice; then checks which of these are symmetry
operations of the system (including if needed fractional
translations).
This is done by rotating (and translating if needed) the atoms in
the unit cell and verifying if the rotated unit cell coincides
with the original one.
If a symmetry operation contains a
fractional translation that is incompatible with the FFT grid,
it is discarded in order to prevent problems with symmetrization.
Typical fractional translations are 1/2 or 1/3 of a lattice
vector. If the FFT grid dimension along that direction is not
divisible respectively by 2 or by 3, the symmetry operation will
not transform the FFT grid into itself.
\paragraph{\texttt{pw.x} doesn't find all the symmetries you
expected.}
See above to learn how PWscf finds symmetry operations.
Some of them might be missing because:
\begin{itemize}
\item
the number of significant figures in the atomic positions is not
large enough.
In file \texttt{PW/eqvect.f90}, the variable \texttt{accep} is
used to decide whether a rotation is a symmetry operation.
Its current value ($10^{-5}$) is quite strict: a rotated atom must
coincide with another atom to 5 significant digits.
You may change the value of \texttt{accep} and recompile.
\item
they are not acceptable symmetry operations of the Bravais
lattice.
This is the case for C$_{60}$, for instance: the $I_h$ icosahedral
group of C$_{60}$ contains 5-fold rotations that are incompatible
with translation symmetry.
\item
the system is rotated with respect to the symmetry axes.
For instance: a C$_{60}$ molecule in the fcc lattice will have 24
symmetry operations ($T_h$ group) only if the double bond is
aligned along one of the crystal axes; if C$_{60}$ is rotated in
some arbitrary way, \texttt{pw.x} may not find any symmetry, apart
from inversion.
\item
they contain a fractional translation that is incompatible with
the FFT grid (see previous paragraph).
Note that if you change cutoff or unit cell volume, the
automatically computed FFT grid changes, and this may explain
changes in symmetry (and in the number of k-points as a
consequence) for no apparent good reason (only if you have
fractional translations in the system, though).
\item
a fractional translation, without rotation, is a symmetry
operation of the system. This means that the cell is actually
a supercell. In this case, all symmetry operations containing
fractional translations are disabled.
The reason is that in this rather exotic case there is no simple
way to select those symmetry operations forming a true group, in
the mathematical sense of the term.
\end{itemize}
\paragraph{the CPU time is time-dependent!}
Yes it is!
On most machines and on most operating systems, depending on machine
load, on communication load (for parallel machines), on various other
factors (including maybe the phase of the moon), reported CPU times
may vary quite a lot for the same job.
Also note that what is printed is supposed to be the CPU time per
process, but with some compilers it is actually the wall time.
\paragraph{on parallel execution, \texttt{pw.x} stops complaining that
``some processors have no planes'' or ``smooth planes'' or
some other strange error.}
Your system does not require that many processors: reduce the number
of processors to a more sensible value.
In particular, both $N_3$ and $Nr_3$ must be $\geq N_{pr}$ (see
section \ref{performance}, ``Performance Issues'', and in particular
section \ref{parissues}, ``Parallelization issues'', for the meaning
of these variables).
\paragraph{``warning : N eigenvectors not converged ...''}
This is a warning message that can be safely ignored if it
is not present in the last steps of self-consistency. If it
is still present in the last steps of self-consistency, and
if the number of unconverged eigenvectors is a significant
fraction of the total, it may signal serious trouble in
self-consistency (see next point) or something badly wrong in the
input data.
\paragraph{``warning : negative or imaginary charge...'', or
``...core charge ...''}
This is a warning message that can be safely ignored
unless the negative or imaginary charge is sizable,
let us say ${\cal O}(0.1)$. If it is, something seriously
wrong is going on. Otherwise, the origin of the negative
charge is the following. When one transforms a positive
function in real space to Fourier space and truncates at
some finite cutoff, the positive function is no longer
guaranteed to be positive when transformed back to real
space. This happens only with core corrections and with
ultrasoft pseudopotentials. In some cases it may be a
source of trouble (see next point) but it is usually
solved by increasing the cutoff for the charge density.
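For example, with ultrasoft pseudopotentials one typically raises
\texttt{ecutrho} well above the default $4\times$\texttt{ecutwfc};
the numbers below are indicative only and must be validated by
convergence tests:
\begin{verbatim}
&SYSTEM
   ecutwfc =  30.0   ! wavefunction cutoff, Ry
   ecutrho = 300.0   ! charge-density cutoff, Ry
                     ! (typically 8-12 times ecutwfc for US PP)
/
\end{verbatim}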
\paragraph{self-consistency is slow or does not converge.}
Reduce \texttt{mixing\_beta} from the default value (0.7) to $\sim
0.3-0.1$ or smaller, or try a different \texttt{mixing\_mode}.
You may also try to increase \texttt{mixing\_ndim} to more than 8
(default value).
Beware: the larger \texttt{mixing\_ndim}, the larger the amount of
memory you need.
If the above doesn't help: verify if your system is metallic or is
close to a metallic state, especially if you have few k-points.
If the highest occupied and lowest unoccupied state(s) keep exchanging
place during self-consistency, forget about reaching convergence. A
typical sign of such behavior is that the self-consistency error
goes down, down, down, then all of a sudden up again, and so on.
Usually one can solve the problem by adding a few empty bands and a
broadening.
Specific to US PP: the presence of negative charge density regions due
to either the pseudization procedure of the augmentation part or to
truncation at finite cutoff may give convergence problems.
Raising the \texttt{ecutrho} cutoff for charge density will usually
help, especially in gradient-corrected calculations.
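As an example, a more conservative mixing could be requested in the
\texttt{\&ELECTRONS} namelist (the values are indicative only, not a
recipe for every system):
\begin{verbatim}
&ELECTRONS
   mixing_mode = 'local-TF'
   mixing_beta = 0.2
   mixing_ndim = 12
   conv_thr    = 1.0d-8
/
\end{verbatim}
Remember that increasing \texttt{mixing\_ndim} also increases the
amount of memory needed.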
\paragraph{structural optimization is slow or does not converge.}
Typical structural optimizations, based on the BFGS algorithm, converge to
the default thresholds ( \texttt{etot\_conv\_thr} and
\texttt{forc\_conv\_thr} ) in 15-25 BFGS steps (depending on the starting
configuration). This may not happen when your system is characterized by
``floppy'' low-energy modes, which make it very difficult --- and of little
use anyway --- to reach a well-converged structure, no matter what. Other
possible reasons for problematic convergence are listed below.
Close to convergence the self-consistency error in forces may become
large with respect to the value of forces. The resulting mismatch
between forces and energies may confuse the line minimization
algorithm, which assumes consistency between the two. The code
reduces the starting self-consistency threshold
\texttt{conv\_thr} when approaching the minimum energy configuration,
up to a factor defined by \texttt{upscale}. Reducing
\texttt{conv\_thr} (or increasing \texttt{upscale}) yields a smoother
structural optimization, but if \texttt{conv\_thr} becomes too small,
electronic self-consistency may not converge. You may also increase
variables \texttt{etot\_conv\_thr} and
\texttt{forc\_conv\_thr} that determine the threshold for convergence
(the default values are quite strict).
A limitation to the accuracy of forces comes from the absence of
perfect translational invariance. If we had only the Hartree
potential, our PW calculation would be translationally invariant to
machine precision. The presence of an exchange-correlation potential
introduces Fourier components in the potential that are not in our
basis set. This loss of precision (more serious for
gradient-corrected functionals) translates into a slight but
detectable loss of translational invariance (the energy changes if all
atoms are displaced by the same quantity, not commensurate with the
FFT grid). This puts a limit to the accuracy of forces. The
situation improves somewhat by increasing the \texttt{ecutrho} cutoff.
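As a sketch, the relevant thresholds could be loosened in the
\texttt{\&CONTROL} and \texttt{\&IONS} namelists (values shown are
illustrative only; check \texttt{INPUT\_PW} for the defaults and the
exact namelist placement in your version):
\begin{verbatim}
&CONTROL
   calculation   = 'relax'
   etot_conv_thr = 1.0d-4   ! Ry
   forc_conv_thr = 1.0d-3   ! Ry/Bohr
/
&IONS
   upscale = 100.d0   ! conv_thr may be reduced by up to this factor
/
\end{verbatim}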
\paragraph{\texttt{ph.x} stops with ``error reading file''.}
The data file produced by \texttt{pw.x} is bad or incomplete or
produced by an incompatible version of the code.
In parallel execution: if you did not set \texttt{wf\_collect=.true.},
the number of processors and pools for the phonon run should be the
same as for the self-consistent run; all files must be visible to all
processors.
\paragraph{\texttt{ph.x} mumbles something like ``cannot recover'' or
``error reading recover file''.}
You have a bad restart file from a preceding failed execution.
Remove all files \texttt{recover*} in \texttt{outdir}.
\paragraph{\texttt{ph.x} says ``occupation numbers probably wrong''
and continues; or ``phonon + tetrahedra not implemented'' and stops}
You have a metallic or spin-polarized system but occupations are not
set to ``smearing''. Note that the correct way to calculate occupancies
must be specified in the input data of the non-selfconsistent
calculation, if the phonon code reads data from it. The non-selfconsistent
calculation will not use this information but the phonon code will.
\paragraph{\texttt{ph.x} does not yield acoustic modes with $\omega=0$
at $\mathbf{q}=0$.}
This may not be an error: the Acoustic Sum Rule (ASR) is never exactly
verified, because the system is never exactly translationally
invariant as it should be (see the discussion above).
The calculated frequency of the acoustic mode is typically less than
10 cm$^{-1}$, but in some cases it may be much higher, up to 100
cm$^{-1}$.
The ultimate test is to diagonalize the dynamical matrix with program
\texttt{dynmat.x}, imposing the ASR.
If you obtain an acoustic mode with a much smaller $\omega$ (let's say
$<1 \textrm{cm}^{-1}$) with all other modes virtually unchanged, you
can trust your results.
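A minimal \texttt{dynmat.x} input imposing the ASR might look like
the following (the dynamical-matrix file name is hypothetical; see
the documentation of \texttt{dynmat.x} for the accepted \texttt{asr}
options):
\begin{verbatim}
&input
   fildyn = 'si.dyn'   ! dynamical matrix file produced by ph.x
   asr    = 'simple'   ! impose the Acoustic Sum Rule
/
\end{verbatim}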
\paragraph{\texttt{ph.x} yields really lousy phonons, with bad or
negative frequencies or wrong symmetries or gross ASR
violations.}
Possible reasons:
\begin{itemize}
\item
wrong data file read.
\item
wrong atomic masses given in input will yield wrong frequencies
(but the content of file {\tt fildyn} should be valid, since the
force constants, not the dynamical matrix, are written to file).
\item
convergence threshold for either SCF ({\tt conv\_thr}) or phonon
calculation ({\tt tr2\_ph}) too large (try to reduce them).
\item
maybe your system \emph{does} have negative or strange phonon
frequencies, with the approximations you used.
A negative frequency signals a mechanical instability of the
chosen structure.
Check that the structure is reasonable, and check the following
parameters:
\begin{itemize}
\item The cutoff for wavefunctions, \texttt{ecutwfc}
\item For US PP: the cutoff for the charge density,
\texttt{ecutrho}
\item The k-point grid, especially for metallic systems!
\end{itemize}
\end{itemize}
\paragraph{``Wrong degeneracy'' error in star\_q.}
Verify the \textbf{q}-point for which you are calculating phonons.
In order to check whether a symmetry operation belongs to the small
group of \textbf{q}, the code compares \textbf{q} and the rotated
\textbf{q}, with an acceptance tolerance of $10^{-5}$ (set in routine
\texttt{PW/eqvect.f90}).
You may run into trouble if your \textbf{q}-point differs from a
high-symmetry point by an amount of that order of magnitude.
\end{document}