\documentclass[12pt,a4paper]{article}
\def\version{3.1.1}
\def\stableversion{3.1.1} % last stable release
\usepackage{epsfig}
\usepackage{html}
%\def\htmladdnormallink#1#2{#1}
\begin{document}
\author{}
\date{}
\title{
% PWscf and Democritos logos, raise the latter to align
\epsfig{figure=pwscf,width=4cm}\hfill%
\raisebox{0.5cm}{\epsfig{figure=democritos,width=8cm}}
\vspace{1.5cm}
\\
% title
\huge User's Guide for Quantum-ESPRESSO \smallskip\\
\Large (version \version)
}
\maketitle
\tableofcontents
\clearpage
\section{Introduction}
This guide covers the installation and usage of Quantum-ESPRESSO
(opEn-Source Package for Research in Electronic Structure, Simulation,
and Optimization), version \version.
The Quantum-ESPRESSO package contains the following codes for the
calculation of electronic-structure properties within
Density-Functional Theory, using a Plane-Wave basis set and
pseudopotentials:
\begin{itemize}
\item PWscf (Plane-Wave Self-Consistent Field).
\item CP (Car-Parrinello).
\end{itemize}
and the following auxiliary codes:
\begin{itemize}
\item PWgui (Graphical User Interface for PWscf): a graphical
interface for producing input data files for PWscf.
\item atomic: a program for atomic calculations and generation of
pseudopotentials.
\item iotk: an Input-Output ToolKit.
\end{itemize}
%
The Quantum-ESPRESSO codes work on many different types of Unix machines,
including parallel machines using Message Passing Interface (MPI).
Running Quantum-ESPRESSO on Mac OS X and MS-Windows is also possible:
see section \ref{installation}, ``Installation''.
Further documentation, beyond what is provided in this guide, can be
found in:
\begin{itemize}
\item the \texttt{Doc/} directory of the Quantum-ESPRESSO distribution.
      In particular, the \texttt{INPUT\_*} files contain the detailed
      listing of available input variables and cards.
\item the various \texttt{README} files found in the distribution
\item the Pw\_forum mailing list
(\htmladdnormallink{\texttt{pw\_forum@pwscf.org}}%
{mailto:pw_forum@pwscf.org}).
You can subscribe to this list and browse and search its
archives from the PWscf web site
(\htmladdnormallink{\texttt{http://www.pwscf.org/}}%
{http://www.pwscf.org/}).
Only subscribed users can post.
Please search the archives before posting: your question may
have already been answered.
\item the ``Scientific Software'' page of the Democritos web site
\hfill\break
(\htmladdnormallink%
{\texttt{http://www.democritos.it/scientific.php}}%
{http://www.democritos.it/scientific.php})
\end{itemize}
%
This guide does \emph{not} explain solid state physics and its
computational methods.
If you want to learn that, read a good textbook.
\subsection{Codes}
PWscf can currently perform the following kinds of calculations:
\begin{itemize}
\item ground-state energy and one-electron (Kohn-Sham) orbitals
\item atomic forces, stresses, and structural optimization
\item molecular dynamics on the ground-state Born-Oppenheimer
surface, also with variable-cell
\item Nudged Elastic Band (NEB) and Fourier String Method Dynamics (SMD)
for energy barriers and reaction paths
\item phonon frequencies and eigenvectors at a generic wave vector,
using Density-Functional Perturbation Theory
\item effective charges and dielectric tensors
\item electron-phonon interaction coefficients for metals
\item interatomic force constants in real space
\item third-order anharmonic phonon lifetimes
\item Infrared and Raman (nonresonant) cross section
\item macroscopic polarization via Berry Phase
\end{itemize}
All of the above work for both insulators and metals, in any crystal
structure, for many exchange-correlation functionals (including spin
polarization and LDA+U), for both norm-conserving (Hamann-Schl\"uter-Chiang)
pseudopotentials in separable form, and --- with very few exceptions
--- for Ultrasoft (Vanderbilt) pseudopotentials. Non-collinear
magnetism and spin-orbit interactions are also implemented. Finite
electric fields are implemented in both the supercell and the
``modern theory of polarization'' approaches (the latter is still
at an experimental stage).
Various postprocessing and data analysis programs are available.
CP can currently perform the following kinds of calculations:
\begin{itemize}
\item Car-Parrinello molecular dynamics simulation
\item geometry optimization by damped dynamics
\item constant-temperature simulation with Nos\`e thermostats
(including\break Nos\`e-Hoover chains for each atom)
\item variable-cell (Parrinello-Rahman) dynamics
\item Nudged Elastic Band (NEB) for energy barriers and reaction
paths
\item String Method Dynamics (in real space)
\item dynamics with Wannier functions and under finite electric
fields
\end{itemize}
Spin-polarized calculations are also supported.
CP works with both norm-conserving and Ultrasoft pseudopotentials.
There are implementations of a dynamics for metals using
conjugate-gradient algorithms, and of the meta-GGA functionals.
Both are at an experimental stage.
\subsection{People}
\hyphenation{gian-noz-zi}
The maintenance and further development of the Quantum-ESPRESSO code is
promoted by the DEMOCRITOS National Simulation Center of INFM (Italian
Institute for Condensed Matter Physics) under the coordination of
Paolo Giannozzi (Scuola Normale Superiore, Pisa), with the strong
support of the CINECA National Supercomputing Center in Bologna under
the responsibility of Carlo Cavazzoni.
The PWscf package was originally developed by Stefano Baroni, Stefano
de Gironcoli, Andrea Dal Corso (SISSA), Paolo Giannozzi, and many
others, in particular:\\
-- Matteo Cococcioni (MIT) and SdG implemented LDA+U. \\
-- Michele Lazzeri (Paris VI) implemented the $2n+1$ code and Raman cross
section calculation with 2nd-order response.\\
-- Oswaldo Dieguez (Rutgers) implemented Berry's phase calculations.\\
-- Ralph Gebauer (ICTP, Trieste) and Adriano Mosca Conte (SISSA, Trieste)
implemented noncollinear magnetism, AdC the spin-orbit interaction.\\
-- Mickael Profeta (Paris VI) implemented electric-field gradients.\\
-- Carlo Sbraccia (Princeton) implemented NEB, Strings method, Metadynamics.\\
-- Alexander Smogunov (SISSA) and AdC implemented ballistic conductance.\\
-- Paolo Umari (MIT) implemented finite electric fields.\\
-- Renata Wentzcovitch (UMinn) implemented variable-cell molecular dynamics.\\
-- Yudong Wu (Princeton) and Carlo Sbraccia implemented Metadynamics.
The CP code is based on the original code written by Roberto Car and
Michele Parrinello. CP was developed by Alfredo Pasquarello (IRRMA,
Lausanne), Kari Laasonen (Oulu), Andrea Trave (LLNL), Roberto Car
(Princeton), Nicola Marzari (MIT), Paolo Giannozzi, and by former
FPMD team: Carlo Cavazzoni, Gerardo Ballabio (CINECA), Sandro Scandolo
(ICTP), Guido Chiarotti (SISSA), Paolo Focher, and others.
In particular:\\
-- Yosuke Kanai (Princeton) implemented Strings method.\\
-- Carlo Sbraccia (Princeton) implemented NEB and Metadynamics.\\
-- Manu Sharma (Princeton) and Yudong Wu (Princeton) implemented
maximally localized Wannier functions and dynamics with Wannier functions.\\
-- Paolo Umari (MIT) implemented finite electric fields and conjugate
gradients.\\
-- Paolo Umari and Ismaila Dabo (MIT) implemented ensemble-DFT.\\
-- Xiaofei Wang (Princeton) implemented META-GGA.\\
-- The Autopilot feature was implemented by Targacept, Inc.
Gerardo Ballabio implemented ``configure'' for Quantum-ESPRESSO.
PWgui was written by Anton Kokalj (Jo\v{z}ef Stefan Institute, Ljubljana)
and is based on his GUIB concept
(\htmladdnormallink{\texttt{http://www-k3.ijs.si/kokalj/guib/}}%
{http://www-k3.ijs.si/kokalj/guib/}).
The pseudopotential generation package ``atomic'' was written by
Andrea Dal Corso and it is the result of many additions to the
original code by Paolo Giannozzi and others.
\hyphenation{mo-de-na}
The input/output toolkit ``iotk''
(\htmladdnormallink{\texttt{http://www.s3.infm.it/iotk}}%
{http://www.s3.infm.it/iotk/})
was written by Giovanni Bussi (S3, Modena).
The calculation of the finite (imaginary) frequency molecular
polarizability using the approximated Thomas-Fermi + von Weiz\"acker
scheme was contributed by Huy-Viet Nguyen (SISSA).
The frozen-phonon code was contributed by Silviu Zilberman
(Princeton).
The BlueGene porting was done by Costas Bekas and Alessandro Curioni
(IBM Zurich).
\hyphenation{fran-ce-sco}
\hyphenation{ce-re-so-li}
An alphabetical list of further contributors includes:
Dario Alf\`e,
Francesco Antoniella,
Mauro Boero,
Nicola Bonini,
Claudia Bungaro,
Paolo Cazzato,
Davide Ceresoli,
Gabriele Cipriani,
Matteo Cococcioni,
Cesar Da Silva,
Alberto Debernardi,
Gernot Deinzer,
Andrea Ferretti,
Guido Fratesi,
Martin Hilgeman,
Eyvaz Isaev,
Axel Kohlmeyer,
Konstantin Kudin,
Nicolas Lacorne,
Stephane Lefranc,
Sergey Lisenkov,
Kurt Maeder,
Andrea Marini,
Francesco Mauri,
Riccardo Mazzarello,
Nicolas Mounet,
Pasquale Pavone,
Guido Roma,
Kurt Stokbro,
Paul Tangney,
Pascal Thibaudeau,
Antonio Tilocca,
Jaro Tobik,
Malgorzata Wierzbowska,
and let us apologize to everybody we have forgotten.
This guide was mostly written by Paolo Giannozzi, Gerardo Ballabio,
and Carlo Cavazzoni.
\subsection{Contacts}
The web site for Quantum-ESPRESSO is:
\medskip
\htmladdnormallink{\texttt{http://www.quantum-espresso.org/}}%
{http://www.quantum-espresso.org/}
\medskip
\noindent
Releases and patches of Quantum-ESPRESSO can be downloaded from this
site or following the links contained in it.
Announcements about new versions of Quantum-ESPRESSO are available
via the low-traffic mailing list Pw\_users
(\htmladdnormallink{\texttt{pw\_users@pwscf.org}}%
{mailto:pw\_users@pwscf.org}).
You can subscribe (but not post) to this list from the PWscf web site.
The recommended place where to ask questions about installation and
usage of Quantum-ESPRESSO, and to report bugs, is the Pw$\_$forum
mailing list
(\htmladdnormallink{\texttt{pw\_forum@pwscf.org}}%
{mailto:pw\_forum@pwscf.org}).
Here you can obtain help from the developers and many knowledgeable
users. You can subscribe to this list and browse and search its
archive from the PWscf web site. Only subscribed users can post.
Please search the archives before posting: your
question may have already been answered.
If you specifically need to contact the developers of Quantum-ESPRESSO
(and only them), write to
\htmladdnormallink{\texttt{pwscf@pwscf.org}}%
{mailto:pwscf@pwscf.org}. Please note that this
address may change in the future: see the web site for the current one.
Other pointers:\\
DEMOCRITOS:
\htmladdnormallink{\texttt{http://www.democritos.it/}}%
{http://www.democritos.it/}\\
INFM:
\htmladdnormallink{\texttt{http://www.infm.it/}}%
{http://www.infm.it/}\\
CINECA:
\htmladdnormallink{\texttt{http://www.cineca.it/}}%
{http://www.cineca.it/}\\
SISSA:
\htmladdnormallink{\texttt{http://www.sissa.it/}}%
{http://www.sissa.it/}
\subsection{Terms of use}
Quantum-ESPRESSO is free software, released under the GNU General Public
License
(\htmladdnormallink{\texttt{http://www.pwscf.org/License.txt}}%
{http://www.pwscf.org/License.txt},
or the file \texttt{License} in the distribution).
All trademarks mentioned in this guide belong to their respective
owners.
We shall greatly appreciate it if scientific work done using this code
contains an explicit acknowledgment and a reference to the
Quantum-ESPRESSO web page.
Our preferred form for the acknowledgment is the following:
\begin{quote}
\emph{Acknowledgments:}\\
Calculations in this work have been done using the Quantum-ESPRESSO package
[\emph{ref}].
\par\noindent
\emph{Bibliography:}\\{}
[\emph{ref}]
S.~Baroni, A.~Dal Corso, S.~de Gironcoli, P.~Giannozzi, % PWscf
C.~Cavazzoni, G.~Ballabio, S.~Scandolo, G.~Chiarotti, P.~Focher, % FPMD
A.~Pasquarello, K.~Laasonen, A.~Trave, R.~Car, N.~Marzari, % CP
A.~Kokalj, % PWgui
\texttt{http://www.pwscf.org/}.
\end{quote}
\clearpage
\section{Installation}
\label{installation}
Presently, the Quantum-ESPRESSO package is only distributed in source
form; some precompiled executables (binary files) are provided only
for\break PWgui. Providing binaries would require too much effort
and would work only for a small number of machines anyway.
Stable releases of the Quantum-ESPRESSO source package (current version
is \stableversion) can be downloaded from this URL:
\medskip
\htmladdnormallink{\texttt{http://www.pwscf.org/download.htm}}%
{http://www.pwscf.org/download.htm}
\medskip
\noindent
Uncompress and unpack the distribution using the command:
\medskip
\texttt{tar zxvf espresso-\stableversion.tar.gz}
\medskip
\noindent
If your version of \texttt{tar} doesn't recognize the \texttt{z} flag,
use this instead:
\medskip
\texttt{gunzip -c espresso-\stableversion.tar.gz | tar xvf -}
\medskip
\noindent
\texttt{cd} to the directory \texttt{espresso/} that will be created.
The bravest may access the (unstable) development version via anonymous
CVS (Concurrent Version System): see the file \texttt{README.cvs}
contained in the distribution.
To install Quantum-ESPRESSO from source, you need C and Fortran-95
compilers (Fortran-90 is not sufficient, but most ``Fortran-90''
compilers are actually Fortran-95-compliant).
If you don't have a commercial Fortran-95 compiler, you may install
the free \texttt{g95} compiler:
(\htmladdnormallink{\texttt{http://www.g95.org/}}%
{http://www.g95.org/})
or the GNU fortran compiler \texttt{gfortran}:
(\htmladdnormallink{\texttt{http://www.gfortran.org/}}%
{http://www.gfortran.org/}).
You also need a minimal Unix environment: basically, a command shell
(e.g., \texttt{bash} or \texttt{tcsh}) and the utilities
\texttt{make}, \texttt{awk} and \texttt{sed}.
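As a quick (purely illustrative) sanity check, you may verify that these
utilities are found in your \texttt{\$PATH}:
\begin{verbatim}
which make awk sed    # should print one path per utility
make --version        # GNU make prints its version; other makes may not
\end{verbatim}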
MS-Windows users need to have Cygwin (a UNIX environment which runs
under Windows) installed: see
\htmladdnormallink%
{\texttt{http://www.cygwin.com/}}%
{http://www.cygwin.com/}.
Instructions for the impatient:
\begin{verbatim}
./configure
make all
\end{verbatim}
Executable programs (actually, symlinks to them) will be placed in the
\texttt{bin/} directory.
If you have problems or would like to tweak the default settings, read
the detailed instructions below.
\subsection{Configure}
To configure the Quantum-ESPRESSO source package, run the \texttt{configure}
script. It will (try to) detect compilers and libraries available on
your machine, and set up things accordingly.
Presently it is expected to work on most Linux 32- and 64-bit (Itanium
and Opteron) PCs and clusters, IBM SP machines, SGI Origin, some
HP-Compaq Alpha machines, Cray X1, Mac OS X, MS-Windows PCs.
It may work with
some assistance also on other architectures (see below).
For cross-compilation, you have to specify the target machine with the
\texttt{--host} option (see below). This feature has not been
extensively tested, but we had at least one successful report
(compilation for NEC SX6 on a PC).
Specifically, \texttt{configure} generates the following files:
\begin{quote}
\texttt{make.sys}: compilation rules and flags\\
\texttt{*/make.depend}: dependencies, per source directory\\
\texttt{configure.msg}: a report of the configuration run
\end{quote}
\texttt{configure.msg} is only used by \texttt{configure} to print its
final report. It isn't needed for compilation.
\texttt{make.depend} files are actually generated by invoking the
\texttt{makedeps.sh} shell script. If you modify the program sources,
you might have to rerun it.
You should always be able to compile the Quantum-ESPRESSO suite of programs
without having to edit any of the generated files. However you may
have to tune \texttt{configure} by specifying appropriate environment
variables and/or command-line options.
Usually the most tricky part is to get external libraries recognized
and used: see section \ref{libraries}, ``Libraries'', for details and
hints.
Environment variables may be set in any of these ways:
\begin{verbatim}
export VARIABLE=value # sh, bash, ksh
./configure
setenv VARIABLE value # csh, tcsh
./configure
./configure VARIABLE=value # any shell
\end{verbatim}
Some environment variables that are relevant to \texttt{configure} are:
\begin{quote}
\texttt{ARCH}:
label identifying the machine type (see below)\\
\texttt{F90}, \texttt{F77}, \texttt{CC}:
names of Fortran 95, Fortran 77, and C compilers\\
\texttt{MPIF90}, \texttt{MPIF77}, \texttt{MPICC}:
names of parallel compilers\\
\texttt{CPP}:
source file preprocessor (defaults to \texttt{\$CC -E})\\
\texttt{LD}: linker (defaults to \texttt{\$MPIF90})\\
\texttt{CFLAGS}, \texttt{FFLAGS}, \texttt{F90FLAGS},
\texttt{CPPFLAGS}, \texttt{LDFLAGS}:
compilation flags\\
\texttt{LIBDIRS}:
extra directories to search for libraries (see below)
\end{quote}
For example, the following command line:
\begin{verbatim}
./configure MPIF90=mpf90 FFLAGS="-O2 -assume byterecl" \
CC=gcc CFLAGS=-O3 LDFLAGS=-static
\end{verbatim}
instructs \texttt{configure} to use \texttt{mpf90} as Fortran 95
compiler with flags \texttt{-O2 -assume byterecl},
\texttt{gcc} as C compiler with flags \texttt{-O3}, and to link with
flags \texttt{-static}. Note that the value of \texttt{FFLAGS} must
be quoted, because it contains spaces.
If your machine type is unknown to \texttt{configure}, you may use the
\texttt{ARCH} variable to suggest an architecture among supported
ones. Try the one that looks most similar to your machine type;
you'll probably have to do some additional tweaking.
Currently supported architectures are:
\begin{quote}
\texttt{ia32}: Intel 32-bit machines (x86) running Linux\\
\texttt{ia64}: Intel 64-bit (Itanium) running Linux\\
\texttt{amd64}: AMD 64-bit (Opteron) running Linux\\
\texttt{aix}: IBM AIX machines\\
\texttt{mips}: SGI MIPS machines\\
\texttt{alpha}: HP-Compaq alpha machines\\
\texttt{alinux}: HP-Compaq alpha running Linux\\
\texttt{sparc}: Sun SPARC machines\\
\texttt{crayx1}: Cray X1 machines\\
\texttt{mac}: Apple PowerPC machines running Mac OS X\\
\texttt{cygwin}: MS-Windows PCs with Cygwin
\end{quote}
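For instance, on an unrecognized machine that resembles a 32-bit Linux
PC, you might try (illustrative):
\begin{verbatim}
./configure ARCH=ia32
\end{verbatim}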
Finally, \texttt{configure} recognizes the following command-line
options:
\begin{quote}
\texttt{--disable-parallel}:
compile serial code, even if parallel environment is available.\\
\texttt{--disable-shared}:
don't use shared libraries: generate static executables.\\
\texttt{--enable-shared}:
use shared libraries.\\
\texttt{--host=}\emph{target}:
specify target machine for cross-compilation.\break
\emph{Target} must be a string identifying the architecture that
you want to compile for; you can obtain it by running
\texttt{config.guess} on the target machine.
\end{quote}
If you want to modify the \texttt{configure} script (advanced users
only!), read the instructions in \texttt{README.configure} first.
You'll need GNU Autoconf
(\htmladdnormallink{\texttt{http://www.gnu.org/software/autoconf/}}%
{http://www.gnu.org/software/autoconf/}).
\subsubsection{Libraries}
\label{libraries}
Quantum-ESPRESSO makes use of the following external libraries:
\begin{itemize}
\item BLAS
(\htmladdnormallink{\texttt{http://www.netlib.org/blas/}}%
{http://www.netlib.org/blas/})
and LAPACK\hfill\break
(\htmladdnormallink{\texttt{http://www.netlib.org/lapack/}}%
{http://www.netlib.org/lapack/})
for linear algebra
\item FFTW
(\htmladdnormallink{\texttt{http://www.fftw.org/}}%
{http://www.fftw.org/})
for Fast Fourier Transforms
\end{itemize}
A copy of the needed routines is provided with the distribution.
However, when available, optimized vendor-specific libraries can be
used instead: this often yields huge performance gains.
Quantum-ESPRESSO can use the following architecture-specific replacements for
BLAS and LAPACK:
\begin{quote}
\texttt{MKL} for Intel Linux PCs\\
\texttt{ACML} for AMD Linux PCs\\
\texttt{essl} for IBM machines\\
\texttt{complib.sgimath} for SGI Origin\\
\texttt{SCSL} for SGI Altix\\
\texttt{scilib} for Cray T3e\\
\texttt{SUNperf} for Sun\\
\texttt{cxml} for HP-Compaq Alphas.
\end{quote}
If none of these is available, we suggest that you use the optimized
ATLAS library
(\htmladdnormallink{\texttt{http://math-atlas.sourceforge.net/}}%
{http://math-atlas.sourceforge.net/}).
Note that ATLAS is not a complete replacement for LAPACK: it contains
all of the BLAS, plus the LU code, plus the full storage Cholesky
code. Follow the instructions in the ATLAS distributions to produce a
full LAPACK replacement.
Axel Kohlmeyer maintains a set of ATLAS libraries,
containing all of LAPACK and no external reference to fortran
libraries:\hfill\break
\htmladdnormallink%
{{\small\texttt{http://www.theochem.rub.de/\~{}axel.kohlmeyer/%
cpmd-linux.html\#atlas}}}%
{http://www.theochem.rub.de/~axel.kohlmeyer/cpmd-linux.html\#atlas}
Sergei Lisenkov reported success and good performance with the
optimized BLAS libraries by Kazushige Goto.
They can be downloaded freely (but not redistributed!) from:
\htmladdnormallink%
{\texttt{http://www.cs.utexas.edu/users/flame/goto/}}%
{http://www.cs.utexas.edu/users/flame/goto/}
At compilation time you have to choose whether to use the built-in
copy of FFTW (v.$<$3), a precompiled FFTW v.$<$3 library, or a
precompiled FFTW v.3 library. This is done using preprocessing
options with rather obvious meanings:
\texttt{\_\_FFTW}, \texttt{\_\_USE\_INTERNAL\_FFTW}, \texttt{\_\_FFTW3}.
The FFTW library can also be replaced by vendor-specific FFT libraries,
if available and if a corresponding driver exists in the code. Presently
drivers exist for IBM ESSL, Intel MKL v.8, the SCSL and COMPLIB
scientific libraries from SGI, and sunperf from Sun. Not all of them
are automatically selected by \texttt{configure}, though.
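As an illustration only (actual flags and paths depend on your system),
selecting a precompiled FFTW v.3 library would amount to having, in
\texttt{make.sys}, something like:
\begin{verbatim}
# keep the -D options already present, add -D__FFTW3
DFLAGS   = -D__LINUX -D__FFTW3
# library location is an assumption; adjust to your installation
FFT_LIBS = -L/usr/local/lib -lfftw3
\end{verbatim}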
Finally, Quantum-ESPRESSO can use the MASS vector math library from
IBM, if available (only on AIX).
The \texttt{configure} script attempts to find optimized libraries,
but may fail if they have been installed in non-standard places.
You should examine the final value of \texttt{BLAS\_LIBS},
\texttt{LAPACK\_LIBS}, \texttt{FFT\_LIBS}, \texttt{MPI\_LIBS} (if
needed), \texttt{MASS\_LIBS} (IBM only), either in the output of
\texttt{configure} or in the generated \texttt{make.sys}, to check
whether it found all the libraries that you intend to use.
If any libraries weren't found, you can specify a list of directories
to search in the environment variable \texttt{LIBDIRS}, and rerun
\texttt{configure}; directories in the list must be separated by
spaces. For example:
\begin{verbatim}
./configure LIBDIRS="/opt/intel/mkl70/lib/32 /usr/lib/math"
\end{verbatim}
If this still fails, you may set some or all of the \texttt{*\_LIBS}
variables manually and retry. For example:
\begin{verbatim}
./configure BLAS_LIBS="-L/usr/lib/math -lf77blas -latlas_sse"
\end{verbatim}
Beware that in this case, \texttt{configure} will blindly accept the
specified value and won't do any extra search. This is intentional:
it allows you to override any library that \texttt{configure} would
otherwise pick up but that you don't want to use.
If you want to link to a precompiled FFTW v.$<$3 library, you will need
the corresponding \texttt{fftw.h} include file. That may or may not
have been installed on your system together with the library: in
particular, most Linux distributions split libraries into ``base''
and ``development'' packages, with include files normally belonging
to the latter. Thus if you can't find \texttt{fftw.h} on your machine, chances
are you must install the FFTW development package (how to do this and
what it is exactly called depends on your operating system version).
If instead the file is there, but \texttt{configure} doesn't find it,
you may specify its location in the \texttt{INCLUDEFFTW} environment
variable.
For example:
\begin{verbatim}
./configure INCLUDEFFTW="/usr/lib/fftw-2.1.3/fftw"
\end{verbatim}
If everything else fails, you'll have to write the \texttt{make.sys}
file manually: see section \ref{manualconf}, ``Manual configuration''.
\textbf{Please note:}
If you change any settings after a previous (successful or failed)
compilation, you must run \texttt{make clean} before recompiling,
unless you know exactly which routines are affected by the changed
settings and how to force their recompilation.
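A typical sequence after changing settings is:
\begin{verbatim}
make clean
./configure           # possibly with new variables and options
make all
\end{verbatim}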
\subsubsection{Manual configuration}
\label{manualconf}
To configure Quantum-ESPRESSO manually, you have to write a working
\texttt{make.sys} yourself, and run \texttt{makedeps.sh} to generate
\texttt{*/make.depend} files.
For \texttt{make.sys}, several templates (each for a different machine
type) to start with are provided in the \texttt{install/} directory:
they have names of the form \texttt{Make.}\emph{system}, where
\emph{system} is a string identifying the architecture and compiler.
Currently available systems are:
\begin{quote}
\texttt{alpha}: HP-Compaq alpha workstations\\
\texttt{alphaMPI}: HP-Compaq alpha parallel machines\\
\texttt{altix}: SGI Altix 350/3000 with Linux, Intel compiler\\
\texttt{beo\_ifc}: Linux clusters of PCs, Intel compiler\\
\texttt{beowulf}: Linux clusters of PCs, Portland compiler\\
\texttt{bgl}: IBM Blue Gene/L machines\\
\texttt{cygwin}: Windows PC, Intel compiler\\
\texttt{fujitsu}: Fujitsu vector machines\\
\texttt{hitachi}: Hitachi SR8000\\
\texttt{hp}: HP PA-RISC workstations\\
\texttt{hpMPI}: HP PA-RISC parallel machines\\
\texttt{ia64}: HP Itanium workstations\\
\texttt{ibm}: IBM RS6000 workstations\\
\texttt{ibmsp}: IBM SP machines\\
\texttt{irix}: SGI workstations\\
\texttt{origin}: SGI Origin 2000/3000\\
\texttt{pc\_abs}: Linux PCs, Absoft compiler\\
\texttt{pc\_ifc}: Linux PCs, Intel compiler\\
\texttt{pc\_lahey}: Linux PCs, Lahey compiler\\
\texttt{pc\_pgi}: Linux PCs, Portland compiler\\
\texttt{sun}: Sun workstations\\
\texttt{sunMPI}: Sun parallel machines\\
\texttt{sxcross}: NEC SX-6 (cross-compilation)
\end{quote}
\textbf{Please note:}
Most of these files are old and haven't been tested for a long time.
They may or may not work.
Copy \texttt{Make.}\emph{system} to \texttt{make.sys}. If you
have the Intel compiler \texttt{ifc} v.6 or earlier, you will have
to run the script \texttt{ifcmods.sh}. Finally, run
\texttt{makedeps.sh} to generate \texttt{*/make.depend} files.
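For example, on a Linux PC with the Intel compiler the sequence might
look like the following sketch (the template name comes from the list
above; script locations may differ in your distribution):
\begin{verbatim}
cp install/Make.pc_ifc make.sys
# ./ifcmods.sh        # only needed for Intel ifc v.6 or earlier
./makedeps.sh         # regenerates */make.depend
\end{verbatim}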
Most probably (and even more so if there isn't an exact match to your
machine type), you'll have to tweak \texttt{make.sys} by hand.
In particular, you must specify the full list of libraries that
you intend to link to.
You'll also have to set the \texttt{MYLIB} variable to:
\begin{quote}
\texttt{blas\_and\_lapack} to compile BLAS and LAPACK from source;\\
\texttt{lapack\_mkl} to use the Intel MKL library;\\
\texttt{lapack\_essl} to use IBM ESSL libraries;\\
otherwise, leave it empty.
\end{quote}
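For instance, to compile BLAS and LAPACK from the sources provided with
the distribution, \texttt{make.sys} would contain a line like:
\begin{verbatim}
MYLIB = blas_and_lapack
\end{verbatim}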
\paragraph{Note for HP PA-RISC users:}
The Makefile for HP PA-RISC workstations and parallel machines is
based on a Makefile contributed by Sergei Lysenkov.
It assumes that you have the HP compiler with the MLIB libraries
installed on a machine running HP-UX.
\paragraph{Note for MS-Windows users:}
The Makefile for Windows PCs is based on a Makefile written for an
earlier version of PWscf (1.2.0), contributed by Lu Fu-Fa, CCIT,
Taiwan. You will need the Cygwin package. The provided Makefile
assumes that you have the Intel compiler with MKL libraries installed.
It is untested.
If you run into trouble, a possibility is to install Linux in
dual-boot mode. You need to create a partition for Linux,
install it, and install a boot loader (LILO or GRUB); the latter step
is not needed if you boot from floppy or CD-ROM. In principle
one could avoid installation altogether using a distribution
like Knoppix that runs directly from CD-ROM, but for serious use
disk access is needed.
\subsection{Compile}
There are a few adjustable parameters in
\texttt{Modules/parameters.f90}.
The present values will work for most cases. All other variables are
dynamically allocated: you do not need to recompile your code for a
different system.
At your option, you may compile the complete Quantum-ESPRESSO suite of
programs (with \texttt{make all}), or only some specific programs.
\texttt{make} with no arguments yields a list of valid compilation
targets.
Here is a list:
\begin{itemize}
\item
\texttt{make pw} produces \texttt{PW/pw.x} and
\texttt{PW/memory.x}.
\texttt{pw.x} calculates electronic structure, structural
optimization, molecular dynamics, barriers with NEB.
\texttt{memory.x} is an auxiliary program that checks the input of
\texttt{pw.x} for correctness and yields a rough (under-) estimate
of the required memory.
\item
\texttt{make ph} produces \texttt{PH/ph.x}.
\texttt{ph.x} calculates phonon frequencies and displacement
patterns, dielectric tensors, effective charges (uses data
produced by \texttt{pw.x}).
\item
\texttt{make d3} produces \texttt{D3/d3.x}.
\texttt{d3.x} calculates anharmonic phonon lifetimes (third-order
derivatives of the energy), using data produced by \texttt{pw.x}
and \texttt{ph.x} (Ultrasoft pseudopotentials not supported).
\item
\texttt{make gamma} produces \texttt{Gamma/phcg.x}.
\texttt{phcg.x} is a version of \texttt{ph.x} that calculates
phonons at $\mathbf{q}=0$ using conjugate-gradient minimization of
the density functional expanded to second-order.
Only the $\Gamma$ ($\mathbf{q}=0$) point is used for Brillouin
zone integration.
It is faster and takes less memory than \texttt{ph.x}, but does
not support Ultrasoft pseudopotentials.
% \item
% \texttt{make raman} produces \texttt{Raman/ram.x}.
%
% \texttt{ram.x} calculates nonresonant Raman tensor coefficients
% (derivatives of the polarizability wrt atomic displacements)
% using the $(2n+1)$ theorem.
\item
\texttt{make pp} produces several codes for data postprocessing, in
\texttt{PP/} (see list below).
\item
\texttt{make tools} produces several utility programs, mostly for
phonon calculations, in \texttt{pwtools/} (see list below).
\item
\texttt{make pwcond} produces \texttt{PWCOND/pwcond.x}, for
ballistic conductance calculations (experimental).
\item
\texttt{make pwall} produces all of the above.
\item
\texttt{make ld1} produces the code \texttt{atomic/ld1.x} for
pseudopotential generation (see the specific
documentation in \texttt{atomic\_doc/}).
\item
\texttt{make upf} produces utilities for pseudopotential
conversion in directory \texttt{upftools/} (see section
\ref{pseudopotentials}, ``Pseudopotentials'').
\item
\texttt{make cp} produces the Car-Parrinello code CP in
\texttt{CPV/cp.x} and the postprocessing code
\texttt{CPV/cppp.x}.
\item
\texttt{make all} produces all of the above.
\end{itemize}
For the setup of the GUI, refer to the
\texttt{PWgui-}\emph{X.Y.Z}\texttt{/INSTALL} file, where \emph{X.Y.Z}
stands for the version number of the GUI (should be the same as the
general version number, currently \version).
If you are using the CVS-sources, see the \texttt{GUI/README}
file instead.
The codes for data postprocessing in \texttt{PP/} are:
\begin{itemize}
\item \texttt{pp.x} extracts the specified data from files
produced by \texttt{pw.x} and prepares them for plotting
by writing them into formats that can be read by
several plotting programs
\item \texttt{bands.x} extracts and reorders eigenvalues
from files produced by \texttt{pw.x} for band structure plotting
\item \texttt{projwfc.x} calculates projections of wavefunctions
onto atomic orbitals, performs L\"owdin population
analysis and calculates the projected density of states.
The latter can be summed using the auxiliary code \texttt{sumpdos.x}
\item \texttt{dipole.x} calculates the dipole moment for
isolated systems (molecules) and the Makov-Payne correction
for molecules in supercells (beware: results are meaningful
only if the charge density is completely contained within
the Wigner-Seitz cell)
\item \texttt{plotrho.x} produces PostScript 2-d contour plots
\item \texttt{plotband.x} reads the output of \texttt{bands.x},
produces band structure PostScript plots
\item \texttt{average.x} calculates planar averages of quantities
produced by pp.x (potentials, charge, magnetization densities,...)
\item \texttt{voronoy.x} divides the charge density into Voronoi
polyhedra (obsolete, use at your own risk)
\item \texttt{dos.x} calculates electronic Density of States
(DOS)
\item \texttt{pw2wan.x}: interface with the WanT code for the calculation
of transport properties via Wannier (also known as Boys)
functions: see\hfill\break
\htmladdnormallink%
{\texttt{http://www.wannier-transport.org/}}%
{http://www.wannier-transport.org/}
\item \texttt{pmw.x} generates Poor Man's Wannier functions,
to be used in LDA+U calculations
\item \texttt{pw2casino.x}: interface with the CASINO code for Quantum
Monte Carlo calculations
(\htmladdnormallink%
{\texttt{http://www.tcm.phy.cam.ac.uk/\~{}mdt26/casino.html}}%
{http://www.tcm.phy.cam.ac.uk/~mdt26/casino.html}).
\end{itemize}
The utility programs in \texttt{pwtools/} are:
\begin{itemize}
\item \texttt{dynmat.x} applies various kinds of Acoustic Sum Rule
(ASR), calculates LO-TO splitting at $\mathbf{q}=0$ in
insulators, IR and Raman cross sections (if the coefficients
have been properly calculated), from the dynamical matrix
produced by \texttt{ph.x}
\item \texttt{q2r.x} calculates Interatomic Force Constants (IFC) in
real space from dynamical matrices produced by
\texttt{ph.x} on a regular \textbf{q}-grid
\item \texttt{matdyn.x} produces phonon frequencies at a generic
wave vector using the IFC file calculated by \texttt{q2r.x};
may also calculate phonon DOS
\item \texttt{fqha.x} for quasi-harmonic calculations
\item \texttt{lambda.x} calculates the electron-phonon coefficient
$\lambda$ and the function $\alpha^2F(\omega)$
\item \texttt{dist.x} calculates distances and angles between
atoms in a cell, taking into account periodicity
\item \texttt{ev.x} fits energy-vs-volume data to an equation of
state
\item \texttt{kpoints.x} produces lists of k-points
\item \texttt{pwi2xsf.sh}, \texttt{pwo2xsf.sh} process
respectively input and output files (not data files!) for
\texttt{pw.x} and produce an XSF-formatted file suitable
for plotting with XCrySDen, a powerful crystalline and
molecular structure visualization program
(\texttt{http://www.xcrysden.org/}).
BEWARE: the \texttt{pwi2xsf.sh} shell script requires the
\texttt{pwi2xsf.x} executables to be located somewhere in
your \texttt{\$PATH}.
\item \texttt{band\_plot.x}: undocumented and possibly obsolete
\item \texttt{bs.awk}, \texttt{mv.awk} are scripts that process
the output of \texttt{pw.x} (not data files!).
Usage:
\begin{verbatim}
awk -f bs.awk < my-pw-file > myfile.bs
awk -f mv.awk < my-pw-file > myfile.mv
\end{verbatim}
The files so produced are suitable for use with
\texttt{xbs}, a very simple X-windows utility to display
molecules, available at:\hfill\break
\htmladdnormallink%
{\texttt{http://www.ccl.net/cca/software/X-WINDOW/xbsa/README.shtml}}%
{http://www.ccl.net/cca/software/X-WINDOW/xbsa/README.shtml}
\item \texttt{path\_int.sh/path\_int.x}: utility to generate, starting
from a path (a set of images), a new one with a different number of
images. The initial and final points of the new path can differ
from those in the original one. Useful for NEB calculations.
\item \texttt{kvecs\_FS.x, bands\_FS.x}: utilities for Fermi Surface
plotting using XCrySDen
\end{itemize}
Other utilities:
\begin{itemize}
\item \texttt{VIB/} contains the sources of a frozen-phonon code,
using either \texttt{pw.x} or \texttt{cp.x} as computational
engine. Contributed by Silviu Zilberman (Princeton). Compile with
\texttt{make vib}, executables in \texttt{VIB/pwvib.x} and
\texttt{VIB/cpvib.x}, documentation in \texttt{Doc/INPUT\_CPVIB},
example in \texttt{examples/example32}.
\item \texttt{VdW/} contains the sources for the calculation of the
finite (imaginary) frequency molecular polarizability using the
approximated Thomas-Fermi + von Weiz\"acker scheme, contributed
by H.-V. Nguyen (SISSA and Hanoi University). Compile with
\texttt{make vdw}; the executable is \texttt{VdW/vdw.x}. There is no
documentation yet, but there is an example in \texttt{examples/example34}.
\end{itemize}
\subsection{Run examples}
\label{runexamples}
As a final check that compilation was successful, you may want to run
some or all of the examples contained within the \texttt{examples}
directory of the Quantum-ESPRESSO distribution.
Those examples try to exercise all the programs and features of the
Quantum-ESPRESSO package. A list of examples and of what each example
does is contained in \texttt{examples/README}. For details, see the
\texttt{README} file in each example's directory.
If you find that any relevant feature isn't being tested, please
contact us (or even better, write and send us a new example
yourself!).
If you haven't downloaded the full Quantum-ESPRESSO distribution and don't
have the examples, you can get them from the Test and Examples Page of
the Quantum-ESPRESSO web site
(\htmladdnormallink{\texttt{http://www.pwscf.org/tests.htm}}%
{http://www.pwscf.org/tests.htm}).
The necessary pseudopotentials are included.
To run the examples, you should follow this procedure:
\begin{enumerate}
\item
Go to the \texttt{examples} directory and edit the
\texttt{environment\_variables} file, setting the following variables
as needed:
\begin{quote}
\texttt{BIN\_DIR=} directory where Quantum-ESPRESSO executables reside\\
\texttt{PSEUDO\_DIR=} directory where pseudopotential files reside\\
\texttt{TMP\_DIR=} directory to be used as temporary storage area
\end{quote}
If you have downloaded the full Quantum-ESPRESSO distribution, you may set
\texttt{BIN\_DIR=\$TOPDIR/bin} and
\texttt{PSEUDO\_DIR=\$TOPDIR/pseudo}, where \texttt{\$TOPDIR} is the
root of the Quantum-ESPRESSO source tree.
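A minimal sketch of the relevant lines in \texttt{environment\_variables}
(the temporary directory shown here is purely illustrative):
\begin{verbatim}
BIN_DIR=$TOPDIR/bin
PSEUDO_DIR=$TOPDIR/pseudo
TMP_DIR=/scratch/$USER    # any fast local directory with enough space
\end{verbatim}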
In order to be able to run all the examples, the \texttt{PSEUDO\_DIR}
directory must contain the following files:
\begin{quote}
\begin{flushleft}
%
% to regenerate this list:
% grep UPF */run_example | grep -v PSEUDO_LIST | grep -o "[^ ]*UPF" | \
% sed 's/_/\\_/g' | sort | uniq | awk '{print " \\texttt{" $0 "},"}'
%
\texttt{Al.vbc.UPF},
\texttt{As.gon.UPF},
\texttt{C.pz-rrkjus.UPF},
\texttt{Cu.pz-d-rrkjus.UPF},
\texttt{Fe.pz-nd-rrkjus.UPF},
\texttt{H.fpmd.UPF},
\texttt{H.vbc.UPF},
\texttt{N.BLYP.UPF},
\texttt{Ni.pbe-nd-rrkjus.UPF},
\texttt{NiUS.RRKJ3.UPF},
\texttt{O.BLYP.UPF},
\texttt{O.LDA.US.RRKJ3.UPF},
\texttt{O.pbe-rrkjus.UPF},
\texttt{O.vdb.UPF},
\texttt{OPBE\_nc.UPF},
\texttt{Pb.vdb.UPF},
\texttt{Ptrel.RRKJ3.UPF},
\texttt{Si.vbc.UPF},
\texttt{SiPBE\_nc.UPF},
\texttt{Ti.vdb.UPF}
\end{flushleft}
\end{quote}
%
If any of these are missing, you can download them (and many others) from the
Pseudopotentials Page of the Quantum-ESPRESSO web site
(\htmladdnormallink{\texttt{http://www.pwscf.org/pseudo.htm}}%
{http://www.pwscf.org/pseudo.htm}).
\texttt{TMP\_DIR} must be a directory you have read and write access
to, with enough available space to host the temporary files produced
by the example runs, and possibly offering high I/O performance (i.e.,
don't use an NFS-mounted directory).
\item
If you have compiled the parallel version of Quantum-ESPRESSO (this
is the default if parallel libraries are detected), you will usually
have to specify a driver program (such as
\texttt{poe} or \texttt{mpiexec}) and the number of processors: read
section \ref{runparallel}, ``Running on parallel machines'' for
details.
In order to do that, edit again the \texttt{environment\_variables}
file and set the \texttt{PARA\_PREFIX} and \texttt{PARA\_POSTFIX}
variables as needed.
Parallel executables will be run by a command like this:
\begin{verbatim}
$PARA_PREFIX pw.x $PARA_POSTFIX < file.in > file.out
\end{verbatim}
For example, if the command line is like this (as for an IBM SP4):
\begin{verbatim}
poe pw.x -procs 4 < file.in > file.out
\end{verbatim}
you should set \texttt{PARA\_PREFIX="poe"},
\texttt{PARA\_POSTFIX="-procs 4"}.
Furthermore, if your machine does not support interactive use, you
must run the commands specified below through the batch queueing
system installed on that machine.
Ask your system administrator for instructions.
\item
To run a single example, go to the corresponding directory (for
instance, \texttt{example/example01}) and execute:
\begin{verbatim}
./run_example
\end{verbatim}
This will create a subdirectory \texttt{results}, containing the input
and output files generated by the calculation.
Some examples take only a few seconds to run, while others may require
several minutes depending on your system.
To run all the examples in one go, execute:
\begin{verbatim}
./run_all_examples
\end{verbatim}
from the \texttt{examples} directory.
On a single-processor machine, this typically takes one to three
hours.
The \texttt{make\_clean} script cleans the examples tree, by removing
all the \texttt{results} subdirectories. However, if additional
subdirectories have been created, they aren't deleted.
\item
In each example's directory, the \texttt{reference} subdirectory
contains verified output files that you can check your results
against.
They were generated on a Linux PC using the Intel compiler.
On different architectures the precise numbers could be slightly
different, in particular if different FFT dimensions are automatically
selected. For this reason, a plain \texttt{diff} of your results
against the reference data doesn't work, or at least requires
human inspection of the results.
Instead, you can run the \texttt{check\_example} script in the
\texttt{examples} directory:
\medskip
\quad\texttt{./check\_example} \emph{example\_dir}
\medskip
\noindent
where \emph{example\_dir} is the directory of the example that you
want to check (e.g., \texttt{./check\_example example01}).
You can specify multiple directories.
Note: at the moment \texttt{check\_example} is in early development
and (should be) guaranteed to work only on examples 01 to 04.
\end{enumerate}
\subsection{Installation Issues}
\label{installissues}
The main development platforms are IBM SP and Intel/AMD PC with Linux
and the Intel compiler. For other machines, we rely on users' feedback.
\paragraph{All machines}
Working fortran-95 and C compilers are needed in order to compile
Quantum-ESPRESSO. Most so-called ``fortran-90'' compilers implement the
fortran-95 standard, but older versions may not be fortran-95
compliant.
If you get ``Compiler Internal Error'' or similar messages, try to
lower the optimization level, or to remove optimization, just for the
routine that has problems. If it doesn't work, or if you experience
weird problems, try to install patches for your version of the
compiler (most vendors release at least a few patches for free), or to
upgrade to a more recent version.
If you get an error in the loading phase that looks like ``ld: file
XYZ.o: unknown (unrecognized, invalid, wrong, missing, \dots) file
type'', or ``While processing relocatable file XYZ.o, no relocatable
objects were found'', one of the following things has happened:
\begin{enumerate}
\item you have leftover object files from a compilation with another
compiler: run \texttt{make clean} and recompile.
\item \texttt{make} does not stop at the first compilation error (it
happens with some compilers).
Remove file XYZ.o and look for the compilation error.
\end{enumerate}
If many symbols are missing in the loading phase, you did not specify
the location of all needed libraries (LAPACK, BLAS, FFTW,
machine-specific optimized libraries). If you did, but symbols are
still missing, see below (for Linux PC).
\paragraph{IBM AIX}
On some IBM machines running AIX, the command \texttt{/usr/bin/oslevel},
used by \texttt{configure} to get info about the type of system, is not
executable by normal users. As a consequence, \texttt{configure} stops.
Complain to your system manager.
\paragraph{SGI machines with IRIX/MIPS compiler}
The script \texttt{moduldep.sh} used by \texttt{configure} doesn't
work properly on old SGI machines: some strings are truncated
(likely an IRIX weirdness). A workaround by Andrea Ferretti:
\htmladdnormallink%
{\texttt{http://www.democritos.it/pipermail/pw\_forum/2006-May/004200.html}}
{http://www.democritos.it/pipermail/pw\_forum/2006-May/004200.html}.
Many versions of the MIPS compiler yield compilation errors in
conjunction with \texttt{FORALL} constructs. There is no
known solution other than editing the \texttt{FORALL} construct
that gives a problem, or to replace it with an equivalent
\texttt{DO...END DO} construct.
\paragraph{Linux Alphas with Compaq compiler}
If at linking stage you get error messages like: ``undefined reference
to `for\_check\_mult\_overflow64' '' with Compaq/HP fortran compiler
on Linux Alphas, check the following page:
\htmladdnormallink%
{\texttt{http://linux.iol.unh.edu/linux/fortran/faq/cfal-X1.0.2.html}}%
{http://linux.iol.unh.edu/linux/fortran/faq/cfal-X1.0.2.html}.
\paragraph{Linux PC}
The web site of Axel Kohlmeyer contains a very informative section
on compiling and running CPMD on Linux.
Most of its content applies to the Quantum-ESPRESSO code as well:\hfill\break
\htmladdnormallink%
{\texttt{http://www.theochem.rub.de/\~{}axel.kohlmeyer/cpmd-linux.html}}%
{http://www.theochem.rub.de/~axel.kohlmeyer/cpmd-linux.html}.
It is convenient to create semi-statically linked executables
(with only libc/libm/libpthread linked dynamically). If you want
to produce a binary that runs on different machines, compile it
on the oldest machine you have (i.e. the one with the oldest version
of the operating system).
Since there is no standard compiler for Linux, different compilers
have different ideas about the right way to call external libraries.
As a consequence you may have a mismatch between what your compiler
calls (``symbols'') and the actual name of the required library call.
Use the \texttt{nm} command to determine the name of a library call,
as in the following examples:%
\begin{verbatim}
nm /usr/local/lib/libblas.a | grep T | grep -i daxpy
nm /usr/local/lib/liblapack.a | grep T | grep -i zhegv
\end{verbatim}
where typical locations and names of the libraries are assumed.
Most precompiled libraries have lowercase names with one or two
underscores (\_) appended. \texttt{configure} should select the
appropriate preprocessing options in \texttt{make.sys}, but in
case of trouble, be aware that:
\begin{itemize}
\item the Absoft compiler is case-sensitive (like C and unlike
other Fortran compilers) and does not add an underscore
to symbol names (note that if your libraries contain
uppercase or mixed case names, you are out of luck:
You must either recompile your own libraries, or change
the \texttt{\#define}'s in \texttt{include/f\_defs.h});
\item both Portland compiler (pgf90) and Intel compiler (ifort/ifc)
are case insensitive and add an underscore to symbol names.
\end{itemize}
With some precompiled LAPACK libraries, you may need to add
\texttt{-lg2c} or \texttt{-lm} or both.
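For example, the relevant lines in \texttt{make.sys} might end up looking
like this (library names and paths are purely illustrative):
\begin{verbatim}
BLAS_LIBS   = -L/usr/local/lib -lf77blas -latlas -lg2c
LAPACK_LIBS = -L/usr/local/lib -llapack -lg2c -lm
\end{verbatim}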
\paragraph{Linux PCs with Portland Group compiler (pgf90)}
\hfill\break
Quantum-ESPRESSO does not work reliably, or not at all, with many
versions of the Portland Group compiler (in particular, v.5.2
and 6.0). Version 5.1 used to work, v.6.1 is reported to work
(info from Paolo Cazzato). Use the latest version of each release
of the compiler, with patches if available: see the Portland Group
web site,\hfill\break
\htmladdnormallink%
{\texttt{http://www.pgroup.com/faq/install.htm\#release\_info}}%
{http://www.pgroup.com/faq/install.htm\#release\_info}
\paragraph{Linux PCs with Pathscale compiler}
Versions 2.3 and 2.4 of the Pathscale compiler crash when compiling
\texttt{CPV/phasefactors.f90}. Workaround: replace \texttt{SUM(na(1:nsp))}
with \texttt{nat} (info by Paolo Cazzato; fixed in version \version).
\paragraph{Linux PCs (Pentium) with Intel compiler (ifort, formerly
ifc)}
\hfill\break
If \texttt{configure} doesn't find the compiler, or if you get ``Error
loading shared libraries...'' at run time, you may have forgotten to
execute the script that sets up the correct path and library path.
Unless your system manager has done this for you, you should execute
the appropriate script --- located in the directory containing the
compiler executable --- in your initialization files.
Consult the documentation provided by Intel.
Starting from the latest v.~8.1 patchlevels, the recommended way to
build semi-statically linked binaries is to use the \texttt{-i-static}
flag; for multi-threaded libraries the linker flag would be
\texttt{-i-static -openmp} (linking \texttt{libguide} is no longer
needed and the compiler will pick the correct one). For previous
versions, try \texttt{-static-libcxa} (this will
give an incomplete semi-static link on newer versions).
Each major release of the Intel compiler differs a lot from
the previous one. Do not mix compiled objects from different releases:
they are incompatible.
In case of trouble, update your version with the most recent
patches, available via Intel Premier support (registration free
of charge for Linux):
\htmladdnormallink%
{\texttt{http://developer.intel.com/software/products/support/\#premier}}%
{http://developer.intel.com/software/products/support/\#premier}.
\paragraph{ifort v.9}
The latest (July 2006) 32-bit version of ifort 9.1 works flawlessly.
Earlier versions yielded ``Compiler Internal Error''.
At least some versions of ifort 9.0 have a buggy preprocessor that
either prevents compilation of \texttt{iotk}, or produces runtime
errors in \texttt{cft3}. Update to a more patched version, or
modify \texttt{make.sys} to explicitly perform preprocessing
using \texttt{/lib/cpp}, as in the following example (courtesy
of Sergei Lysenkov):
\begin{verbatim}
.f90.o:
	$(CPP) $(CPPFLAGS) $< -o $*.F90
	$(MPIF90) $(F90FLAGS) -c $*.F90 -o $*.o

CPP = /lib/cpp
CPPFLAGS = -P -C -traditional $(DFLAGS) $(IFLAGS)
\end{verbatim}
On some versions of RedHat Linux, you may get an obscure error:
\texttt{IPO link: can not find "(" ... }, due to a bad system
configuration. Add option \texttt{-no-ipo} to \texttt{LDFLAGS}
in file \texttt{make.sys}.
\paragraph{ifort v.8}
Some releases of ifort 8 yield ``Compiler Internal Error''.
Update to a more patched version: 8.0.046 for v.~8.0,
8.1.018 for v.~8.1.
There is a well-known problem with ifort 8 and pthreads
(used in both Debian Woody and Sarge) that causes
``segmentation fault'' errors (info from Lucas Fernandez Seivane).
Version 7 did not have this problem.
\paragraph{ifc v.7}
Some releases of ifc 7.0 and 7.1 yield ``Compiler Internal
Error''. Update to the latest version (should be 7.1.41).
Warnings ``size of symbol ... changed ...'' are produced by ifc 7.1 at
the loading stage.
These seem to be harmless, but they may cause the loader to stop,
depending on your system configuration.
If this happens and no executable is produced, add the following to
\texttt{LDFLAGS}: \texttt{-Xlinker --noinhibit-exec}.
Linux distributions using glibc 2.3 or later (such as e.g. RedHat 9)
may be incompatible with ifc 7.0 and 7.1.
The incompatibility shows up in the form of messages ``undefined
reference to `errno' '' at linking stage.
A workaround is available: see
\htmladdnormallink%
{\texttt{http://newweb.ices.utexas.edu/misc/ctype.c}}%
{http://newweb.ices.utexas.edu/misc/ctype.c}.
\paragraph{MKL}
On Intel CPUs, it is very convenient to use Intel MKL libraries.
If \texttt{configure} doesn't find them, try
\texttt{configure --enable-shared}.
MKL also contains optimized FFT routines, but they are
presently not supported: use FFTW instead. Note that ifort 8 fails
to load with MKL v.~5.2 or earlier versions,
because some symbols that are referenced by MKL are missing. There
is a fix for this (info from Konstantin Kudin): add libF90.a from
ifc 7.1 at the linking stage, as the last library.
Note that some combinations of not-so-recent versions of MKL
and ifc may yield a lot of ``undefined references'' when statically
loaded: use \texttt{configure --enable-shared},
or remove the \texttt{-static} option in \texttt{make.sys}.
Note that \texttt{pwcond.x} works only with recent versions
(v.7 or later) of MKL.
When using/testing/benchmarking MKL on SMP (multiprocessor)
machines, one should set the environment variable
\texttt{OMP\_NUM\_THREADS} to 1, unless OpenMP
parallelization is desired (do not confuse OpenMP with OpenMPI:
they refer to different parallelization paradigms).
MKL by default sets the variable to the number of CPUs installed and
thus gives the impression of a much better performance, as the CPU time
is only measured for the master thread (info from Axel Kohlmeyer).
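For example, with \texttt{bash} syntax:
\begin{verbatim}
export OMP_NUM_THREADS=1
\end{verbatim}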
\paragraph{AMD CPUs, Intel Itanium}
AMD Athlon CPUs can be basically treated like Intel Pentium CPUs.
You can use the Intel compiler and MKL with Pentium-3 optimization.
Konstantin Kudin reports that the best results in terms of
performances are obtained with ATLAS optimized BLAS/LAPACK
libraries, using AMD Core Math Library (ACML) for the missing
libraries. ACML can be freely downloaded from AMD web site.
Beware: some versions of ACML -- e.g. the GCC version with SSE2 --
crash PWscf. The ``\_nosse2'' version appears to be stable.
Load first ATLAS, then ACML, then \texttt{-lg2c}, as in the
following example (replace what follows \texttt{-L} with
something appropriate to your configuration):
\begin{verbatim}
-L/location/of/fftw/lib/ -lfftw \
-L/location/of/atlas/lib -lf77blas -llapack -lcblas -latlas \
-L/location/of/gnu32_nosse2/lib -lacml -lg2c
\end{verbatim}
64-bit CPUs like the AMD Opteron and the Intel Itanium are
supported and should work both in 32-bit emulation and in
64-bit mode (in the latter case, \texttt{-D\_\_LINUX64} is
needed among the preprocessing flags). Both the Portland and the
Intel compiler (v8.1 EM64T-edition, available via Intel Premier
support) should work. 64-bit executables can address a
much larger memory space, but apparently they are not especially
faster than 32-bit executables. The Intel compiler has been
reported to be more reliable and to produce faster executables
than the Portland compiler. You may also try with g95.
\paragraph{Linux PC clusters with MPI}
PC clusters running some version of MPI are a very popular
computational platform nowadays. Quantum-ESPRESSO is known to work
with at least two of the major MPI implementations (MPICH, LAM-MPI),
as well as with the newer OpenMPI implementation.
The number of possible configurations, in terms of type and version of
the MPI libraries, kernels, system libraries, compilers, is very large.
Quantum-ESPRESSO compiles and works on all non-buggy, properly configured
hardware and software combinations. You may have to recompile MPI
libraries in order
to be able to use them with the Intel compiler. See Axel Kohlmeyer's
web site for precompiled versions of the MPI libraries.
If Quantum-ESPRESSO does not work for some reason on a PC cluster, first
check whether it works in serial execution. A frequent problem with parallel execution
is that Quantum-ESPRESSO does not read from standard input, due to a bad
configuration of MPI libraries: see section ``Running on parallel machines''.
If you get weird errors with LAM-MPI, add \texttt{-D\_\_LAM} to preprocessing
options and recompile. See also Axel Kohlmeyer's web site for more info.
If you are dissatisfied with the performances in parallel
execution, read the ``Parallelization issues'' section.
\paragraph{Mac OS X}
Compilation with \texttt{xlf} under Mac OS X 10.4 (``Tiger'') may produce
the following linkage error:
\begin{verbatim}
ld: Undefined symbols:
_sprintf$LDBLStub
_fprintf$LDBLStub
_printf$LDBLStub
\end{verbatim}
Workaround: add \texttt{-lSystemStubs} to \texttt{LDFLAGS} in
\texttt{make.sys} (information by Fabrizio Cleri, May 2006).
Another workaround: set the gcc version to 3.3. This is done with the command
\begin{verbatim}
sudo gcc_select 3.3
\end{verbatim}
If you get the message ``Error trying to determine current cc version (got)'',
change the order of directories in your \texttt{PATH} variable so that
\texttt{/opt/ibm/...} appears at its end. The \texttt{xlc} alias to
\texttt{cc} will stop working, but as soon as you have set the gcc version,
you can restore the normal directory order of \texttt{PATH} (information by Cesar
Da Silva, May 2006).
Because of an upgrade to a new release of GCC (4.0.1) with Mac OS X 10.4.5,
the IBM Fortran compiler does not work correctly, giving an error message
such as
\begin{verbatim}
/usr/bin/ld: warning -L: directory name
(/usr/lib/gcc/powerpc-apple-darwin8/4.0.0) does not exist
/usr/bin/ld: can't locate file for: -lgcc
\end{verbatim}
and fails to run configure properly. The easiest way to correct this bug
is to help the XLF compiler to find the correct location of gcc. Do the
following:
\begin{enumerate}
\item {\tt sudo mv /etc/opt/ibmcmp/xlf/8.1/xlf.cfg \\
/etc/opt/ibmcmp/xlf/8.1/xlf.cfg.2006.MM.DD.HH.MM.SS} \\
where MM.DD.HH.MM.SS is the current date and time (MM=month, DD=day, etc.); then
\item {\tt
sudo /opt/ibmcmp/xlf/8.1/bin/xlf\_configure -gcc /usr -install -smprt
/opt/ibmcmp/xlsmp/1.4 -xlf /opt/ibmcmp/xlf/8.1 -xlfrt
/opt/ibmcmp/xlf/8.1 -xlflic /opt/ibmcmp/xlf/8.1 \\
/opt/ibmcmp/xlf/8.1/etc/xlf.base.cfg}
\end{enumerate}
This replaces \texttt{xlf.cfg} with a version pointing to the correct
location of gcc (info by Pascal Thibaudeau, April 2006).
The Absoft 9.1 compiler on Mac OS-X does not work (info by Axel
Kohlmeyer, June 2006).
\paragraph{T3E}
T3D/T3E is no longer supported since v.3.
\clearpage
\section{Running on parallel machines}
\label{runparallel}
Parallel execution is strongly system- and installation-dependent.
Typically one has to specify:
\begin{itemize}
\item a launcher program, such as \texttt{poe}, \texttt{mpirun}, or
\texttt{mpiexec};
\item the number of processors, typically as an option to the
launcher program, but in some cases \emph{after} the program
to be executed;
\item the program to be executed, with the proper path if needed:
for instance, \texttt{pw.x}, or \texttt{./pw.x}, or
\texttt{\$HOME/bin/pw.x}, or whatever applies;
\item the number of ``pools'' into which processors are to be
grouped (see section \ref{parissues}, ``Parallelization
Issues'', for an explanation of what a pool~is).
\end{itemize}
The last item is optional and is read by the code.
The first and second items are machine- and installation-dependent,
and may be different for interactive and batch execution.
\textbf{Please note:}
Your machine might be configured so as to disallow interactive
execution: if in doubt, ask your system administrator.
\bigskip
For illustration, here's how to run \texttt{pw.x} on 16 processors
partitioned into 8 pools (2 processors each), for several typical
cases.
For convenience, we also give the corresponding values of
\texttt{PARA\_PREFIX}, \texttt{PARA\_POSTFIX} to be used in running
the examples distributed with Quantum-ESPRESSO (see section \ref{runexamples},
``Run examples'').
\begin{description}
\item [IBM SP machines,] batch:
\begin{verbatim}
pw.x -npool 8 < input
PARA_PREFIX="", PARA_POSTFIX="-npool 8"
\end{verbatim}
This should also work interactively, with environment variables
\texttt{NPROC} set to 16, \texttt{MP\_HOSTFILE} set to the file
containing a list of processors.
\item [IBM SP machines,] interactive, using \texttt{poe}:
\begin{verbatim}
poe pw.x -procs 16 -npool 8 < input
PARA_PREFIX="poe", PARA_POSTFIX="-procs 16 -npool 8"
\end{verbatim}
\item [SGI Origin and PC clusters] using \texttt{mpirun}:
\begin{verbatim}
mpirun -np 16 pw.x -npool 8 < input
PARA_PREFIX="mpirun -np 16", PARA_POSTFIX="-npool 8"
\end{verbatim}
\item [PC clusters] using \texttt{mpiexec}:
\begin{verbatim}
mpiexec -n 16 pw.x -npool 8 < input
PARA_PREFIX="mpiexec -n 16", PARA_POSTFIX="-npool 8"
\end{verbatim}
\item [Cray T3E] (old):
\begin{verbatim}
mpprun -n 16 pw.x -npool 8 < input
PARA_PREFIX="mpprun -n 16", PARA_POSTFIX="-npool 8"
\end{verbatim}
\end{description}
Note that each processor writes its own set of temporary wavefunction
files during the calculation. If \texttt{wf\_collect=.true.} (in namelist
\texttt{control}), the final wavefunctions are collected into a single
directory, written by a single processor, in a format that is independent
of the number of processors. If \texttt{wf\_collect=.false.} (this is the
default), the final wavefunctions are left on disk in the internal format
used by PWscf. The former case requires more disk I/O and disk space,
but produces portable data files; the latter case requires less I/O and
disk space, but the data so produced can be read only by a job running on
the same number of processors and pools, and if all files are on a
file system that is visible to all processors (i.e., you cannot use
local scratch directories: there is presently no way to ensure that
the distribution of processes on processors will follow the same
pattern for different jobs).
IMPORTANT: with the new file format (v.3.1 and later) all data
(except wavefunctions if \texttt{wf\_collect=.false.}) is written
to and read from a single directory \texttt{outdir/prefix.save}.
A copy of pseudopotential files is also written there. There is
however an inconsistency that cannot be quickly fixed: pseudopotential
files must be read by each processor, so if \texttt{outdir/prefix.save}
is not accessible by each processor, you will get an error message.
A workaround that does not require copying everything is to copy just
the pseudopotential files.
Some implementations of the MPI library may have problems with
input redirection in parallel.
If this happens, use the option \texttt{-in} (or \texttt{-inp} or
\texttt{-input}), followed by the input file name.
Example: \texttt{pw.x -in input -npool 4 > output}.
A bug in the \texttt{poe} environment of IBM sp5 machines
may cause a dramatic slowdown of quantum-espresso in parallel
execution. Workaround: set environment variable
\texttt{MP\_STDINMODE} to 0, as in
\begin{verbatim}
export MP_STDINMODE=0
\end{verbatim}
for sh/bash,
\begin{verbatim}
setenv MP_STDINMODE 0
\end{verbatim}
for csh/tcsh; or start the code with option \texttt{-stdinmode 0} to
\texttt{poe}:
\begin{verbatim}
poe -stdinmode 0 [options] [executable code] < input file
\end{verbatim}
Please note that all postprocessing codes \emph{not} reading data
files produced by \texttt{pw.x} --- that is,
\texttt{average.x}, \texttt{voronoy.x}, \texttt{dos.x} --- as well as the
plotting codes \texttt{plotrho.x}, \texttt{plotband.x} and all
executables in \texttt{pwtools/}, should be executed on just one
processor.
Unpredictable results may follow if those codes are run on more than
one processor.
\clearpage
\section{Pseudopotentials}
\label{pseudopotentials}
Currently PWscf and CP support both Ultrasoft (US) Vanderbilt
pseudopotentials (PPs) and Norm-Conserving (NC)
Hamann-Schl\"uter-Chiang PPs in separable Kleinman-Bylander form.
Note however that calculation of third-order derivatives is not (yet)
implemented with US PPs.
The Quantum-ESPRESSO package uses a unified pseudopotential format (UPF)
(\htmladdnormallink{\texttt{http://www.pwscf.org/format.htm}}%
{http://www.pwscf.org/format.htm})
for all types of PPs, but still accepts a number of other formats:
\begin{itemize}
\item the ``old PWscf'' format for NC PPs (PWscf only!),
\item the ``old CP'' format for NC PPs (CP only!),
\item the ``old FPMD'' format for NC PPs (CP only!),
\item the ``new PWscf'' format for NC and US PPs,
\item the ``Vanderbilt'' format (formatted, not binary) for NC and
US PPs.
\end{itemize}
See also
\htmladdnormallink{\texttt{http://www.pwscf.org/oldformat.htm}}%
{http://www.pwscf.org/oldformat.htm}.
A large collection of PPs (currently about 60 elements covered) can
be downloaded from the Pseudopotentials Page of the Quantum-ESPRESSO
web site
(\htmladdnormallink{\texttt{http://www.pwscf.org/pseudo.htm}}%
{http://www.pwscf.org/pseudo.htm}).
The naming convention for these PPs is explained in file
\texttt{Doc/nomefile.upf}.
If you do not find there the PP you need (because there is no PP for
the atom you need or you need a different exchange-correlation
functional, or a different core-valence partition, or for whatever
other reason), it may be taken, if available, from published
tables, such as:
\begin{itemize}
\item G.B. Bachelet, D.R. Hamann and M. Schl\"uter, Phys. Rev. B
\textbf{26}, 4199 (1982)
\item X. Gonze, R. Stumpf, and M. Scheffler, Phys. Rev. B
\textbf{44}, 8503 (1991)
\item S. Goedecker, M. Teter, and J. Hutter, Phys. Rev. B
\textbf{54}, 1703 (1996)
\end{itemize}
or otherwise it must be generated. Since version 2.1, Quantum-ESPRESSO
includes a PP generation package, in the
directory \texttt{atomic/} (sources) and \texttt{atomic\_doc/}
(documentation, tests and examples).
The package can generate both NC and US PPs in UPF format.
We refer to its documentation for instructions on how to generate PPs
with the \texttt{atomic/} code.
Other PP generation packages are available on-line:
\begin{itemize}
\item
David Vanderbilt's code (UltraSoft PPs):\hfill\break
\htmladdnormallink%
{\texttt{http://www.physics.rutgers.edu/\~{}dhv/uspp/index.html}}%
{http://www.physics.rutgers.edu/~dhv/uspp/index.html}
\item
Fritz Haber's code (Norm-Conserving PPs):\hfill\break
\htmladdnormallink%
{\texttt{http://www.fhi-berlin.mpg.de/th/fhi98md/fhi98PP}}%
{http://www.fhi-berlin.mpg.de/th/fhi98md/fhi98PP}
\item
Jos\'e-Lu\'\i{}s Martins' code (Norm-Conserving PPs):\hfill\break
\htmladdnormallink%
{\texttt{http://bohr.inesc-mn.pt/\~{}jlm/pseudo.html}}%
{http://bohr.inesc-mn.pt/~jlm/pseudo.html}
\end{itemize}
The first two codes produce PPs in UPF format, or in a format that
can be converted to unified format using the utilities of directory
\texttt{upftools/}.
Finally, other electronic-structure packages (CAMPOS, ABINIT)
provide tables of PPs that can be freely downloaded, but need
to be converted into a suitable format for use with Quantum-ESPRESSO.
Remember: \emph{always} test the PPs on simple test systems before
proceeding to serious calculations.
\clearpage
\section{Using PWscf}
Input files for the PWscf codes may be either written by hand (the
good old way), or produced via the ``PWgui'' graphical interface
by Anton Kokalj, included in the Quantum-ESPRESSO distribution.
See \texttt{PWgui-}\emph{x.y.z}\texttt{/INSTALL} (where \emph{x.y.z}
is the version number) for more info on PWgui, or \texttt{GUI/README}
if you are using CVS sources.
You may take the examples distributed with Quantum-ESPRESSO as templates for
writing your own input files: see section \ref{runexamples}, ``Run
examples''. In the following, whenever we mention ``Example N'', we
refer to those.
Input files are those in the \texttt{results} directories, with names
ending in \texttt{.in} (they'll appear after you've run the examples).
Note about exchange-correlation: the type of exchange-correlation used
in the calculation is read from PP files.
All PP's must have been generated using the same exchange-correlation.
\subsection{Electronic and ionic structure calculations}
Electronic and ionic structure calculations are performed by program
\texttt{pw.x}.
\subsubsection{Input data}
The input data is organized as several namelists, followed by other
fields introduced by keywords.
The namelists are
\begin{quote}
\texttt{\&CONTROL}: general variables controlling the run\\
\texttt{\&SYSTEM}: structural information on the system under
investigation\\
\texttt{\&ELECTRONS}: electronic variables: self-consistency,
smearing\\
\texttt{\&IONS} (optional): ionic variables: relaxation,
dynamics\\
\texttt{\&CELL} (optional): variable-cell dynamics\\
\texttt{\&PHONON} (optional): information required to produce
data for phonon calculations
\end{quote}
Optional namelists may be omitted if the calculation to be performed
does not require them.
This depends on the value of variable \texttt{calculation} in namelist
\texttt{\&CONTROL}.
Most variables in namelists have default values.
Only the following variables in \texttt{\&SYSTEM} must always be
specified:
\begin{quote}
\texttt{ibrav} (integer): bravais-lattice index\\
\texttt{celldm} (real, dimension 6): crystallographic constants\\
\texttt{nat} (integer): number of atoms in the unit cell\\
\texttt{ntyp} (integer): number of types of atoms in the unit cell\\
\texttt{ecutwfc} (real): kinetic energy cutoff (Ry) for
wavefunctions.
\end{quote}
For metallic systems, you have to specify how metallicity
is treated by setting variable \texttt{occupations}. If you choose
\texttt{occupations='smearing'}, you have to specify the
smearing width \texttt{degauss} and optionally the smearing
type \texttt{smearing}. If you choose \texttt{occupations='tetrahedra'},
you need to specify a suitable uniform k-point grid (card
\texttt{K\_POINTS} with option \texttt{automatic}).
Spin-polarized systems must be treated as metallic systems, except
in the special case of a single k-point, for which
occupation numbers can be fixed (\texttt{occupations='from\_input'}
and card \texttt{OCCUPATIONS}).
Explanations for the meaning of variables \texttt{ibrav} and
\texttt{celldm} are in file \texttt{INPUT\_PW}.
Please read them carefully.
There is a large number of other variables, having default values,
which may or may not fit your needs.
After the namelists, you have several fields introduced by keywords
with self-explanatory names:
\begin{quote}
\texttt{ATOMIC\_SPECIES}\\
\texttt{ATOMIC\_POSITIONS}\\
\texttt{K\_POINTS}\\
\texttt{CELL\_PARAMETERS} (optional)\\
\texttt{OCCUPATIONS} (optional) \\
\texttt{CLIMBING\_IMAGES} (optional)
\end{quote}
The keywords may be followed on the same line by an option.
Unknown fields (including some that are specific to CP code)
are ignored by PWscf.
See file \texttt{Doc/INPUT\_PW} for a detailed explanation of the
meaning and format of the various fields.
Note about k points:
The k-point grid can be either automatically generated, or manually
provided as a list of k-points and weights, in the Irreducible
Brillouin Zone of the \emph{Bravais lattice} of the crystal only.
The code will generate (unless instructed not to do so: see variable
\texttt{nosym}) all required k-points and weights if the symmetry of
the system is lower than the symmetry of the Bravais lattice.
The automatic generation of k-points follows the convention of
Monkhorst and Pack.
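For orientation only, here is a minimal sketch of a \texttt{pw.x} input
file for a single-point SCF calculation on bulk silicon; the numerical
values and the pseudopotential file name are illustrative assumptions,
to be adapted (see \texttt{Doc/INPUT\_PW} and the examples for
authoritative settings):
\begin{verbatim}
 &control
    calculation = 'scf',
    prefix = 'silicon',
    pseudo_dir = './',
    outdir = './tmp/',
 /
 &system
    ibrav = 2, celldm(1) = 10.2,
    nat = 2, ntyp = 1,
    ecutwfc = 18.0,
 /
 &electrons
    conv_thr = 1.0d-8,
 /
ATOMIC_SPECIES
 Si  28.086  Si.vbc.UPF
ATOMIC_POSITIONS
 Si 0.00 0.00 0.00
 Si 0.25 0.25 0.25
K_POINTS automatic
 4 4 4 1 1 1
\end{verbatim}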
\subsubsection{Typical cases}
We may distinguish the following typical cases for \texttt{pw.x}:
\begin{description}
\item [single-point (fixed-ion) SCF calculation.]
Set \texttt{calculation='scf'}.
Namelists \texttt{\&IONS} and \texttt{\&CELL} need not be
present (this is the default). See Example 01.
\item [band structure calculation.]
First perform a SCF calculation as above; then do a non-SCF
calculation by specifying \texttt{calculation='bands'} or
\texttt{calculation='nscf'}, with the desired k-point grid
and number \texttt{nbnd} of bands.
If you are interested in calculating only the Kohn-Sham states
for the given set of k-points, use \texttt{calculation='bands'}.
If you are interested in further processing of the results of
non-SCF calculations (for instance, in DOS calculations) use
\texttt{calculation='nscf'}.
Specify \texttt{nosym=.true.} to avoid generation of additional
k-points in low symmetry cases. Variables \texttt{prefix} and
\texttt{outdir}, which determine the names of input or output
files, should be the same in the two runs. See Example~01 and the
band-structure input sketch after this list.
\item [structural optimization.]
\hyphenation{name-list}
Specify \texttt{calculation='relax'} and add namelist \texttt{\&IONS}.
All options for a single SCF calculation apply, plus a few others.
You may follow a structural optimization with a non-SCF
band-structure calculation, but do not forget to update the input
ionic coordinates. See Example 03.
\item [molecular dynamics.]
Specify \texttt{calculation='md'} and time step \texttt{dt}.
Use variable \texttt{ion\_dynamics} in namelist \texttt{\&IONS}
for a fine-grained control of the kind of dynamics. Other options
for setting the initial temperature and for thermalization using
velocity rescaling are available. Remember: this is MD on the
electronic ground state, not Car-Parrinello MD. See Example 04.
\item [polarization via Berry Phase.]
See Example 10, its \texttt{README}, and the documentation in the
header of \texttt{PW/bp\_c\_phase.f90}.
\item [Nudged Elastic Band calculation.]
\hfill Specify \texttt{calculation='neb'} and add namelist
\texttt{\&IONS}.
All options for a single SCF calculation apply, plus a few others.
In the namelist \texttt{\&IONS} the number of images used to
discretize the elastic band must be specified. All other
variables have a default value. Coordinates of the initial and
final image of the elastic band have to be specified in the
\texttt{ATOMIC\_POSITIONS} card. A detailed description of all
input variables is contained in the file \texttt{Doc/INPUT\_PW}.
See also Example 17.
\end{description}
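For illustration, here is the minimal band-structure input sketch
announced above: the second, non-self-consistent step along a few
k-points, after an SCF run with the same \texttt{prefix} and
\texttt{outdir} (values and pseudopotential file name are, again,
illustrative assumptions):
\begin{verbatim}
 &control
    calculation = 'bands',
    prefix = 'silicon',
    pseudo_dir = './',
    outdir = './tmp/',
 /
 &system
    ibrav = 2, celldm(1) = 10.2,
    nat = 2, ntyp = 1,
    ecutwfc = 18.0, nbnd = 8,
 /
 &electrons
 /
ATOMIC_SPECIES
 Si  28.086  Si.vbc.UPF
ATOMIC_POSITIONS
 Si 0.00 0.00 0.00
 Si 0.25 0.25 0.25
K_POINTS
  3
  0.00 0.00 0.00  1.0
  0.50 0.50 0.50  1.0
  1.00 0.00 0.00  1.0
\end{verbatim}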
The output data files are written in the directory specified by
variable \texttt{outdir}, with names specified by variable
\texttt{prefix} (a string that is prepended to all file names,
whose default value is: \texttt{prefix='pwscf'}).
The execution stops if you create a file \texttt{prefix.EXIT} in the
working directory. Note that just killing the process may leave the
output files in an unusable state.
\subsection{Phonon calculations}
The phonon code \texttt{ph.x} calculates normal modes at a given
\textbf{q}-vector, starting from data files produced by \texttt{pw.x}.
If $\mathbf{q}=0$, the data files can be produced directly by a simple
SCF calculation.
For phonons at a generic \textbf{q}-vector, you need to perform first
a SCF calculation, then a band-structure calculation (see above)
with
\texttt{calculation = 'phonon'}, specifying the \textbf{q}-vector
in variable \texttt{xq} of namelist \texttt{\&PHONON}.
The output data files appear in the directory specified by variable
\texttt{outdir}, with names specified by variable \texttt{prefix}.
After the output file(s) have been produced (do not remove any of the
files, unless you know which are used and which are not), you can run
\texttt{ph.x}.
The first input line of \texttt{ph.x} is a job identifier.
At the second line the namelist \texttt{\&INPUTPH} starts.
The meaning of the variables in the namelist (most of them having a
default value) is described in file \texttt{INPUT\_PH}.
Variables \texttt{outdir} and \texttt{prefix} must be the same as in
the input data of \texttt{pw.x}.
Presently you must also specify \texttt{amass} (real, dimension
\texttt{ntyp}): the atomic mass of each atomic type.
After the namelist you must specify the \textbf{q}-vector of the
phonon mode.
This must be the same \textbf{q}-vector given in the input of
\texttt{pw.x}.
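Putting the pieces together, a minimal \texttt{ph.x} input sketch for a
$\mathbf{q}=0$ calculation could look as follows (the title line, the
\texttt{fildyn} file name and the numerical values are illustrative
assumptions; see \texttt{INPUT\_PH} for the complete list of variables):
\begin{verbatim}
phonons of Si at Gamma
 &inputph
    prefix = 'silicon',
    outdir = './tmp/',
    amass(1) = 28.086,
    fildyn = 'si.dynG',
 /
0.0 0.0 0.0
\end{verbatim}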
Notice that the dynamical matrix calculated by \texttt{ph.x}
at $\mathbf{q}=0$ does not contain the non-analytic term
occurring in polar materials, i.e. there is no LO-TO splitting
in insulators. Moreover no Acoustic Sum Rule (ASR) is applied.
In order to have the complete dynamical matrix at $\mathbf{q}=0$
including the non-analytic terms, you need to calculate effective
charges by specifying option \texttt{epsil=.true.} to \texttt{ph.x}.
Use program \texttt{dynmat.x} to calculate the correct LO-TO
splitting, IR cross sections, and to impose various forms
of ASR. If \texttt{ph.x} was instructed to calculate Raman
coefficients, \texttt{dynmat.x} will also calculate Raman cross
sections for a typical experimental setup.
A sample phonon calculation is performed in Example 02.
\subsubsection{Calculation of interatomic force constants in real
space}
First, dynamical matrices $D(\mathbf{q})$ are calculated and saved
for a suitable uniform grid of \textbf{q}-vectors (only those in the
Irreducible Brillouin Zone of the crystal are needed). Although
this can be done one \textbf{q}-vector at a time, a simpler procedure
is to specify variable \texttt{ldisp=.true.} and to set variables
\texttt{nq1,nq2,nq3} to some suitable Monkhorst-Pack grid, which
will be automatically generated, centered at $\mathbf{q}=0$.
Do not forget to specify \texttt{epsil=.true.} in the input data
of \texttt{ph.x} if you want the correct TO-LO splitting in
polar materials.
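As an illustrative sketch of such a dispersion run (the grid and the
file names are assumptions to be adapted; with \texttt{ldisp=.true.}
no \textbf{q}-vector is given after the namelist):
\begin{verbatim}
phonon dispersions of Si
 &inputph
    prefix = 'silicon',
    outdir = './tmp/',
    amass(1) = 28.086,
    ldisp = .true.,
    nq1 = 4, nq2 = 4, nq3 = 4,
    fildyn = 'si.dyn',
 /
\end{verbatim}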
Second, code \texttt{q2r.x} reads the $D(\mathbf{q})$ dynamical
matrices produced in the preceding step and Fourier-transforms them,
writing a file of Interatomic Force Constants in real space, up
to a distance that depends on the size of the grid of
\textbf{q}-vectors.
Program \texttt{matdyn.x} may be used to produce phonon modes and
frequencies at any \textbf{q} using the Interatomic Force Constants
file as input.
See Example 06.
\subsubsection{Calculation of electron-phonon interaction
coefficients}
The calculation of electron-phonon coefficients in metals is made
difficult by the slow convergence of the sum at the Fermi energy.
It is convenient to calculate phonons, for each \textbf{q}-vector of a
suitable grid, using a smaller k-point grid, saving the dynamical
matrix and the self-consistent first-order variation of the potential
(variable \texttt{fildvscf}).
Then a non-SCF calculation with a larger k-point grid is performed.
Finally the electron-phonon calculation is performed by specifying
\texttt{elph=.true.}, \texttt{trans=.false.}, and the input files
\texttt{fildvscf}, \texttt{fildyn}.
The electron-phonon coefficients are calculated using several values
of gaussian broadening (see \texttt{PH/elphon.f90}) because this
quickly shows whether results are converged or not with respect to the
k-point grid and Gaussian broadening. See Example 07.
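As a rough, illustrative sketch of that final step for one
\textbf{q}-vector (element, mass and file names are assumptions and
must match those used in the preceding phonon run):
\begin{verbatim}
electron-phonon coefficients for Al
 &inputph
    prefix = 'al',
    outdir = './tmp/',
    amass(1) = 26.98,
    trans = .false.,
    elph = .true.,
    fildvscf = 'aldv',
    fildyn = 'al.dyn',
 /
0.0 0.0 0.0
\end{verbatim}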
All of the above must be repeated for all desired \textbf{q}-vectors
and the final result is summed over all \textbf{q}-vectors, using
\texttt{pwtools/lambda.x}. The input data for the latter is
described in the header of \texttt{pwtools/lambda.f90}.
\subsection{Post-processing}
There are a number of auxiliary codes performing postprocessing tasks
such as plotting, averaging, and so on, on the various quantities
calculated by \texttt{pw.x}.
Such quantities are saved by \texttt{pw.x} into the output data
file(s).
The main postprocessing code \texttt{pp.x} reads data file(s),
extracts or calculates the selected quantity, writes it into
a format that is suitable for plotting. Quantities that can
be read or calculated are:
\begin{quote}
charge density\\
spin polarization\\
various potentials\\
local density of states at $E_F$\\
local density of electronic entropy\\
STM images\\
wavefunction squared\\
electron localization function\\
planar averages\\
integrated local density of states
\end{quote}
Various types of plotting (along a line, on a plane, three-dimensional,
polar) and output formats (including the popular {\tt cube} format) can
be specified. The output files can be directly read by the free plotting
system Gnuplot (1D or 2D plots),
or by code \texttt{plotrho.x} that comes with PWscf (2D plots), or
by advanced plotting software XCrySDen and gOpenMol (3D plots).
See file \texttt{INPUT\_PP} for a detailed description of the input
for code \texttt{pp.x}.
See Example 05 for a charge density plot.
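For illustration only, an input for \texttt{pp.x} extracting the charge
density (\texttt{plot\_num=0}) and producing a 2D plot in
\texttt{plotrho.x} format might look like this (prefix, directories,
plotting vectors and grid sizes are assumptions to be adapted):
\begin{verbatim}
 &inputpp
    prefix = 'silicon',
    outdir = './tmp/',
    filplot = 'sicharge',
    plot_num = 0,
 /
 &plot
    nfile = 1,
    filepp(1) = 'sicharge',
    weight(1) = 1.0,
    iflag = 2,
    output_format = 2,
    fileout = 'si.rho.dat',
    e1(1) = 1.0, e1(2) = 1.0, e1(3) = 0.0,
    e2(1) = 0.0, e2(2) = 0.0, e2(3) = 1.0,
    nx = 56, ny = 40,
 /
\end{verbatim}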
The postprocessing code \texttt{bands.x} reads data file(s), extracts
eigenvalues, regroups them into bands (the algorithm used to order
bands and to resolve crossings may not work in all circumstances,
though).
The output is written to a file in a simple format that can be
directly read by plotting program \texttt{plotband.x}.
Unpredictable plots may result if \textbf{k}-points are not in
sequence along lines.
See Example 05 for a simple band plot.
The postprocessing code \texttt{projwfc.x} calculates projections of
wavefunctions onto atomic orbitals.
The atomic wavefunctions are those contained in the pseudopotential
file(s).
The L\"owdin population analysis (similar to Mulliken analysis) is
presently implemented.
The projected DOS (PDOS, the DOS projected onto atomic orbitals) can
also be calculated and written to file(s).
More details on the input data are found in the header of file
\texttt{PP/projwfc.f90}. The auxiliary code \texttt{sumpdos.x}
(courtesy of Andrea Ferretti) can be used to sum selected PDOS,
by specifying the names of files containing the desired PDOS.
Type \texttt{sumpdos.x -h} or look into the source code for
more details.
The total electronic DOS is instead calculated by code
\texttt{PP/dos.x}.
See Example 08 for total and projected electronic DOS calculations.
The postprocessing code \texttt{path\_int.x} is intended to be used in
the framework of NEB calculations.
It is a tool to generate a new path (what is actually generated is the
restart file) starting from an old one through interpolation (cubic
splines).
The new path can be discretized with a different number of images
(this is its main purpose); images are equispaced, and the
interpolation can also be performed on a subsection of the old path.
The input file needed by \texttt{path\_int.x} can be easily set up
with the help of the self-explanatory \texttt{path\_int.sh} shell
script.
\clearpage
\section{Using CP}
This section is intended to explain how to perform basic
Car-Parrinello (CP) simulations using the CP codes.
It is important to understand that a CP simulation is a sequence of
different runs, some of them used to ``prepare'' the initial state
of the system, and others performed to collect statistics, or to
modify the state of the system itself, i.e. to change the temperature
or the pressure.
To prepare and run a CP simulation you should:
\begin{enumerate}
\item
define the system:
\begin{enumerate}
\item atomic positions
\item system cell
\item pseudopotentials
\item number of electrons and bands
\item cut-offs
\item FFT grids (CP code only)
\end{enumerate}
\item
The first run, when starting from scratch, is always an electronic
minimization, with fixed ions and cell, to bring the electronic
system on the ground state (GS) relative to the starting atomic
configuration.
Example of input file (Benzene Molecule):
\begin{verbatim}
&control
title = ' Benzene Molecule ',
calculation = 'cp',
restart_mode = 'from_scratch',
ndr = 51,
ndw = 51,
nstep = 100,
iprint = 10,
isave = 100,
tstress = .TRUE.,
tprnfor = .TRUE.,
dt = 5.0d0,
etot_conv_thr = 1.d-9,
ekin_conv_thr = 1.d-4,
prefix = 'c6h6'
pseudo_dir='/scratch/acv0/benzene/',
outdir='/scratch/acv0/benzene/Out/'
/
&system
ibrav = 14,
celldm(1) = 16.0,
celldm(2) = 1.0,
celldm(3) = 0.5,
celldm(4) = 0.0,
celldm(5) = 0.0,
celldm(6) = 0.0,
nat = 12,
ntyp = 2,
nbnd = 15,
nelec = 30,
ecutwfc = 40.0,
nr1b= 10, nr2b = 10, nr3b = 10,
xc_type = 'BLYP'
/
&electrons
emass = 400.d0,
emass_cutoff = 2.5d0,
electron_dynamics = 'sd',
/
&ions
ion_dynamics = 'none',
/
&cell
cell_dynamics = 'none',
press = 0.0d0,
/
ATOMIC_SPECIES
C 12.0d0 c_blyp_gia.pp
H 1.00d0 h.ps
ATOMIC_POSITIONS (bohr)
C 2.6 0.0 0.0
C 1.3 -1.3 0.0
C -1.3 -1.3 0.0
C -2.6 0.0 0.0
C -1.3 1.3 0.0
C 1.3 1.3 0.0
H 4.4 0.0 0.0
H 2.2 -2.2 0.0
H -2.2 -2.2 0.0
H -4.4 0.0 0.0
H -2.2 2.2 0.0
H 2.2 2.2 0.0
\end{verbatim}
You can find the description of the input variables in file
\texttt{INPUT\_CP} in the \texttt{Doc/}
directory. A short description of the logic behind the choice
of parameters is contained in \texttt{INPUT.HOWTO}.
\item
Sometimes a single run is not enough to reach the GS.
In this case, you need to re-run the electronic minimization
stage.
Use the input of the first run, changing \texttt{restart\_mode =
'from\_scratch'} to \texttt{restart\_mode = 'restart'}.
Important: unless you are already experienced with the system you
are studying or with the code internals, usually you need to tune
some input parameters, like \texttt{emass}, \texttt{dt}, and
cut-offs.
For this purpose, a few trial runs could be useful: you can
perform short minimizations (say, 10 steps) changing and adjusting
these parameters to your need.
You can specify the degree of convergence with these two
thresholds:
\begin{itemize}
\item
\texttt{etot\_conv\_thr}: total energy difference between two
consecutive steps;
\item
\texttt{ekin\_conv\_thr}: value of the fictitious kinetic energy
of the electrons.
\end{itemize}
Usually we consider the system to be in the GS when
\texttt{ekin\_conv\_thr} is smaller than $\sim 10^{-5}$.
You could check the value of the fictitious kinetic energy on the
standard output (column EKINC).
Different strategies are available to minimize electrons, but the
most used ones are:
\begin{itemize}
\item
steepest descent:
\begin{verbatim}
electron_dynamics = 'sd'
\end{verbatim}
\item
damped dynamics:
\begin{verbatim}
electron_dynamics = 'damp',
electron_damping = 0.1,
\end{verbatim}
See the input description for how to choose the damping factor;
usually the value is between 0.1 and 0.5.
\end{itemize}
\item
Once your system is in the GS, depending on how you have prepared
the starting atomic configuration, you should do one of the following:
\begin{itemize}
\item
if you have set the atomic positions ``by hand'' and/or from a
classical code, check the forces on atoms, and if they are
large ($\sim 0.1 - 1.0$ atomic units), you should perform an
ionic minimization, otherwise the system could break up during
the dynamics.
\item
if you have taken the positions from a previous run or a
previous ab-initio simulation, check the forces, and if they
are too small ($\sim 10^{-4}$ atomic units), this means that
atoms are already in equilibrium positions and, even if left
free, they will not move.
Then you need to randomize the positions a little bit; see below.
\end{itemize}
\item
Minimize ionic positions.
As we pointed out in point 4), if the interatomic forces are too high,
the system could ``explode'' if we switch on the ionic dynamics.
To avoid that we need to relax the system.
Again there are different strategies to relax the system, but the
most used are again steepest descent or damped dynamics for ions
and electrons.
You could also mix electronic and ionic minimization schemes
freely, i.e. ions in steepest descent and electrons in damped
dynamics, or vice versa.
\begin{enumerate}
\item
suppose we want to perform a steepest descent minimization for the ions.
Then we should specify the following section for ions:
\begin{verbatim}
&ions
ion_dynamics = 'sd',
/
\end{verbatim}
Change also the ionic masses to accelerate the minimization:
\begin{verbatim}
ATOMIC_SPECIES
C 2.0d0 c_blyp_gia.pp
H 2.00d0 h.ps
\end{verbatim}
while leaving the other input parameters unchanged.
Note that if the forces are really high ($> 1.0$ atomic
units), you should always use steepest descent for the first
relaxation steps ($\sim 100$).
\item
as the system approaches the equilibrium positions, the
steepest descent scheme slows down, so it is better to switch to
damped dynamics:
\begin{verbatim}
&ions
ion_dynamics = 'damp',
ion_damping = 0.2,
ion_velocities = 'zero',
/
\end{verbatim}
A value of \texttt{ion\_damping} between 0.05 and 0.5 is
usually used for many systems.
It is also better to specify to restart with zero ionic and
electronic velocities, since we have changed the masses.
Change further the ionic masses to accelerate the
minimization:
\begin{verbatim}
ATOMIC_SPECIES
C 0.1d0 c_blyp_gia.pp
H 0.1d0 h.ps
\end{verbatim}
\item
when the system is really close to the equilibrium, the damped
dynamics slows down too, mainly because, since we are moving
electrons and ions together, the ionic forces are not fully
accurate. It is then often better to perform an ionic step every
$N$ electronic steps, or to move the ions only when the electrons
are in their GS (within the chosen threshold).
This can be specified by adding the \texttt{ion\_nstepe} parameter
to the ionic section, which then becomes:
\begin{verbatim}
&ions
ion_dynamics = 'damp',
ion_damping = 0.2,
ion_velocities = 'zero',
ion_nstepe = 10,
/
\end{verbatim}
Then we specify in the control input section:
\begin{verbatim}
etot_conv_thr = 1.d-6,
ekin_conv_thr = 1.d-5,
forc_conv_thr = 1.d-3
\end{verbatim}
As a result, the code checks every 10 electronic steps whether
the electronic system satisfies the two thresholds
\texttt{etot\_conv\_thr}, \texttt{ekin\_conv\_thr}: if it
does, the ions are advanced by one step.
The process thus continues until the forces become smaller
than \texttt{forc\_conv\_thr}.
Note that to fully relax the system you need many runs and
different strategies, which you should mix and change in order
to speed up the convergence.
The process is not automatic, but is strongly based on
experience, and trial and error.
Remember also that the convergence to the equilibrium
positions depends on the energy threshold for the electronic
GS: correct forces (required to move ions toward the
minimum) are obtained only when electrons are in their GS.
Hence a small threshold on forces may not be satisfied
unless you also require an even smaller threshold on the total energy.
\end{enumerate}
\item
randomization of positions.
If you have relaxed the system or if the starting system is
already in the equilibrium positions, then you need to move ions
from the equilibrium positions, otherwise they won't move in a
dynamics simulation.
After the randomization you should bring the electrons to the GS
again, in order to start the dynamics with the correct forces and
with electrons in the GS.
To perform the randomization, switch off the ionic dynamics and activate the
randomization for each species, specifying the amplitude of the
randomization itself.
This could be done with the following ionic input section:
\begin{verbatim}
&ions
ion_dynamics = 'none',
tranp(1) = .TRUE.,
tranp(2) = .TRUE.,
amprp(1) = 0.01
amprp(2) = 0.01
/
\end{verbatim}
In this way a random displacement (of max 0.01 a.u.) is added to
atoms of species 1 and 2.
All other input parameters could remain the same.
Note that the difference in the total energy (\texttt{etot})
between relaxed and randomized positions can be used to estimate
the temperature that will be reached by the system.
In fact, starting with zero ionic velocities, all of the difference
is potential energy, but in a dynamics simulation the energy will
be equipartitioned between kinetic and potential. To estimate
the temperature, take the difference in energy (de), convert it
into Kelvin, divide by the number of atoms and multiply by 2/3.
Randomization could be useful also while we are relaxing the
system, especially when we suspect that the ions are in a local
minimum or in an energy plateau.
\item
Start the Car-Parrinello dynamics.
At this point after having minimized the electrons, and with ions
displaced from their equilibrium positions, we are ready to start
a CP dynamics.
We need to specify \texttt{'verlet'} both in ionic and electronic
dynamics.
The thresholds in the control input section will be ignored, as will
any other parameter related to the minimization strategy.
The first time we perform a CP run after a minimization, it is
always better to set the velocities to zero, unless we have
velocities, from a previous simulation, to specify in the input
file.
Restore the proper masses for the ions.
In this way we will sample the microcanonical ensemble.
The input sections change as follows:
\begin{verbatim}
&electrons
emass = 400.d0,
emass_cutoff = 2.5d0,
electron_dynamics = 'verlet',
electron_velocities = 'zero',
/
&ions
ion_dynamics = 'verlet',
ion_velocities = 'zero',
/
ATOMIC_SPECIES
C 12.0d0 c_blyp_gia.pp
H 1.00d0 h.ps
\end{verbatim}
If you want to specify the initial velocities for ions, you have
to set \texttt{ion\_velocities = 'from\_input'}, and add the
\texttt{IONIC\_VELOCITIES}\break
card, with the list of velocities in atomic units.
IMPORTANT: in restarting the dynamics after the first CP run,
remember to remove or comment the velocities parameters:
\begin{verbatim}
&electrons
emass = 400.d0,
emass_cutoff = 2.5d0,
electron_dynamics = 'verlet',
! electron_velocities = 'zero',
/
&ions
ion_dynamics = 'verlet',
! ion_velocities = 'zero',
/
\end{verbatim}
otherwise you will quench the system interrupting the sampling of
the microcanonical ensemble.
\item
Changing the temperature of the system.
It is possible to change the temperature of the system, or to
sample the canonical ensemble by fixing the average temperature;
this is done using the Nos\`e thermostat.
To activate this thermostat for the ions you have to specify, in the
ions input section:
\begin{verbatim}
&ions
ion_dynamics = 'verlet',
ion_temperature = 'nose',
fnosep = 60.0,
tempw = 300.0,
! ion_velocities = 'zero',
/
\end{verbatim}
where \texttt{fnosep} is the frequency of the thermostat in THz;
it should be chosen to be comparable with the center of the
vibrational spectrum of the system, in order to excite as many
vibrational modes as possible.
\texttt{tempw} is the desired average temperature in Kelvin.
It is also possible to specify a thermostat for the electrons;
this is usually activated in metals, or in systems where there is a
transfer of energy between ionic and electronic degrees of
freedom. Beware: the usage of electronic thermostats is quite
delicate. The following information comes from K. Kudin:
{\em The main issue is that there is usually some ``natural'' fictitious
kinetic energy that electrons gain from the ionic motion (``drag''). One
could easily quantify how much of the fictitious energy comes from this
drag by doing a CP run, then a couple of CG (same as BO) steps, and
then going back to CP. The fictitious electronic energy at the last CP
restart will be purely due to the drag effect.
The thermostat on electrons will either try to overexcite the
otherwise ``cold'' electrons, or will try to take them down to an
unnaturally cold state where their fictitious kinetic energy is even
below what would be due to pure drag. Neither of these is good.
I think the only workable regime with an electronic thermostat is a
mild overexcitation of the electrons; however, to do this one will need
to know rather precisely what the fictitious kinetic energy due to
the drag is.}
\end{enumerate}
\clearpage
\section{Performance issues (PWscf)}
\label{performance}
\subsection{CPU time requirements}
The following holds for code {\tt pw.x} and for non-US PPs.
For US PPs there are additional terms to be calculated.
For phonon calculations, each of the $3 N_{at}$ modes requires a CPU
time of the same order of that required by a self-consistent
calculation in the same system.
The computer time required for the self-consistent solution at fixed
ionic positions, $T_{scf}$, is:
$$
T_{scf} = N_{iter} \cdot T_{iter} + T_{init}
$$
where $N_{iter}=\mathtt{niter}=$ number of self-consistency
iterations, $T_{iter}=$ CPU time for a single iteration,
$T_{init}=$ initialization time.
Usually $T_{init} \ll N_{iter} \cdot T_{iter}$.
The time required for a single self-consistency iteration
$T_{iter}$ is:
$$
T_{iter} = N_k \cdot T_{diag} + T_{rho} + T_{scf}
$$
where $N_k=$ number of k-points, $T_{diag}=$ CPU time per hamiltonian
iterative diagonalization, $T_{rho}=$ CPU time for charge density
calculation, $T_{scf}=$ CPU time for Hartree and exchange-correlation
potential calculation.
The time for a Hamiltonian iterative diagonalization $T_{diag}$ is:
$$
T_{diag} = N_h \cdot T_h + T_{orth} + T_{sub}
$$
where $N_h=$ number of $H\psi$ products needed by iterative
diagonalization, $T_h=$ CPU time per $H\psi$ product, $T_{orth}=$ CPU
time for orthonormalization, $T_{sub}=$ CPU time for subspace
diagonalization.
The time $T_h$ required for a $H\psi$ product is
$$
T_h = a_1 \cdot M \cdot N
+ a_2 \cdot M \cdot N_1 \cdot N_2 \cdot N_3 \cdot
\log(N_1 \cdot N_2 \cdot N_3)
+ a_3 \cdot M \cdot P \cdot N.
$$
The first term comes from the kinetic term and is usually much smaller
than the others.
The second and third terms come respectively from local and nonlocal
potential.
$a_1$, $a_2$, $a_3$ are prefactors, $M=$ number of valence bands,
$N=$ number of plane waves (basis set dimension),
$N_1$, $N_2$, $N_3=$ dimensions of the FFT grid for wavefunctions
($N_1 \cdot N_2 \cdot N_3 \sim 8N$), $P=$ number of projectors for PPs
(summed over all atoms, over all values of the angular momentum $l$, and
$m=1,\dots,2l+1$).
The time $T_{orth}$ required by orthonormalization is
$$
T_{orth}=b_1 \cdot M_x^2 \cdot N
$$
and the time $T_{sub}$ required by subspace diagonalization is
$$
T_{sub}=b_2 \cdot M_x^3
$$
where $b_1$ and $b_2$ are prefactors, $M_x=$ number of trial
wavefunctions (this will vary between $M$ and a few times $M$,
depending on the algorithm).
The time $T_{rho}$ for the calculation of charge density from
wavefunctions is
$$
T_{rho} = c_1 \cdot M \cdot Nr_1 \cdot Nr_2 \cdot Nr_3 \cdot
\log(Nr_1 \cdot Nr_2 \cdot Nr_3)
+ c_2 \cdot M \cdot Nr_1 \cdot Nr_2 \cdot Nr_3 + T_{us}
$$
where $c_1$ and $c_2$ are prefactors,
$Nr_1$, $Nr_2$, $Nr_3=$ dimensions of the FFT grid for charge density
($Nr_1 \cdot Nr_2 \cdot Nr_3 \sim 8N_g$, where $N_g=$ number of
G-vectors for the charge density), and $T_{us}=$ CPU time required by
ultrasoft contribution (if any).
The time $T_{scf}$ for calculation of potential from charge density is
$$
T_{scf} = d_2 \cdot Nr_1 \cdot Nr_2 \cdot Nr_3 + d_3 \cdot
Nr_1 \cdot Nr_2 \cdot Nr_3 \cdot
\log(Nr_1 \cdot Nr_2 \cdot Nr_3)
$$
where $d_2$ and $d_3$ are prefactors.
\subsection{Memory requirements}
A typical self-consistency or molecular-dynamics run requires
a maximum memory in the order
of $O$ double precision complex numbers, where
$$
O = m \cdot M \cdot N + P \cdot N + p \cdot N_1 \cdot N_2 \cdot N_3
+ q \cdot Nr_1 \cdot Nr_2 \cdot Nr_3
$$
with $m$, $p$, $q=$ small factors; all other variables have the same
meaning as above.
Note that if only the $\Gamma$ point ($\mathbf{k}=0$) is used to
sample the Brillouin Zone, the value of $N$ is halved.
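To get a feeling for the numbers (a purely illustrative estimate, not
taken from an actual run): with $M=100$ bands, $N=10^4$ plane waves,
$P=200$ projectors, $N_1 \cdot N_2 \cdot N_3 \sim 8N$,
$Nr_1 \cdot Nr_2 \cdot Nr_3 \sim 2\cdot 10^5$, and small factors of
order a few,
$$
O \approx 3\cdot 100\cdot 10^4 + 200\cdot 10^4 + 4\cdot 8\cdot 10^4
+ 4\cdot 2\cdot 10^5 \approx 6\cdot 10^6
$$
double precision complex numbers, i.e. of the order of 100 MB
(16 bytes each).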
Code \texttt{memory.x} yields a rough estimate of the memory required
by \texttt{pw.x} and checks for the validity of the input data file as
well. Use it exactly as \texttt{pw.x}.
The memory required by the phonon code follows the same patterns,
with somewhat larger factors $m$, $p$, $q$.
\subsection{File space requirements}
A typical \texttt{pw.x} run will require an amount of temporary disk
space in the order of $O$ double precision complex numbers:
$$
O = N_k \cdot M \cdot N + q \cdot Nr_1 \cdot Nr_2 \cdot Nr_3
$$
where $q=2 \cdot \mathtt{mixing\_ndim}$ (number of iterations used in
self-consistency, default value $=8$) if \texttt{disk\_io} is set to
\texttt{'high'} or not specified;
$q=0$ if \texttt{disk\_io='low'} or \texttt{'minimal'}.
\subsection{Parallelization issues}
\label{parissues}
\texttt{pw.x} can run in principle on any number of processors (up to
\texttt{maxproc}, presently fixed at 128 in \texttt{PW/para.f90}).
The $N_p$ processors can be divided into $N_{pk}$ pools of $N_{pr}$
processors, $N_p=N_{pk}\cdot N_{pr}$.
The k-points are divided across $N_{pk}$ pools (``k-point
parallelization''), while both R- and G-space grids are divided across
the $N_{pr}$ processors of each pool (``PW parallelization'').
A third level of parallelization, on the number of bands, is
currently confined to the calculation of a few quantities that
would not be parallelized at all otherwise.
A fourth level of parallelization, on the number of NEB images,
is available for NEB calculation only.
The effectiveness of parallelization depends on the size and type of
the system and on a judicious choice of the $N_{pk}$ and $N_{pr}$:
\begin{itemize}
\item
k-point parallelization is very effective if $N_{pk}$ is a divisor
of the number of k-points (linear speedup guaranteed), \emph{but}
it does not reduce the amount of memory per processor taken by the
calculation.
As a consequence, large systems may not fit into memory.
The same applies to parallelization over NEB images.
\item
PW parallelization works well if $N_{pr}$ is a divisor of both
dimensions along the $z$ axis of the FFT grids, $N_3$ and $Nr_3$
(which may coincide).
It does not scale so well as k-point parallelization, but it
reduces both CPU time AND memory (the latter almost linearly).
\item
Optimal serial performance is achieved when the data are kept
as much as possible in the cache.
As a side effect, one can achieve better than linear scaling with
the number of processors, thanks to the increase in serial speed
coming from the reduction of data size (making it easier for the
machine to keep data in the cache).
\end{itemize}
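As a purely illustrative example of such a choice: for a job with 12
k-points and FFT dimensions $N_3=Nr_3=90$, running on 24 processors, a
sensible partition could be $N_{pk}=4$ pools of $N_{pr}=6$ processors
each, since 4 divides 12 and 6 divides 90. With \texttt{mpirun} this
might be launched as
\begin{verbatim}
mpirun -np 24 pw.x -npool 4 -in input > output
\end{verbatim}
(the launcher and its options depend on your installation, as explained
in section \ref{runparallel}).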
Note that for each system there is an optimal range of number of
processors on which to run the job.
A too large number of processors will yield performance degradation,
or may cause the parallelization algorithm to fail in distributing
properly R- and G-space grids.
Actual parallel performances will also depend a lot on the available
software (MPI libraries) and on the available communication hardware.
For Beowulf-style machines (clusters of PC) the newest version 1.1
of the OpenMPI libraries (\htmladdnormallink{\texttt{http://www.openmpi.org/}}%
{http://www.openmpi.org/}) seems to yield better performance
than other implementations (info by Kostantin Kudin).
Note however that you need decent communication hardware (at least
Gigabit ethernet) in order to have acceptable performance with PW
parallelization.
Do not expect good scaling with cheap hardware: plane-wave
calculations are by no means an ``embarrassingly parallel'' problem.
Also note that multiprocessor motherboards for Intel Pentium CPUs
typically have just one memory bus for all processors. This dramatically
slows down any code doing massive access to memory (as most codes in the
Quantum-ESPRESSO package do) that runs on processors of the same motherboard.
\clearpage
\section{Troubleshooting (PWscf)}
Almost all problems in PWscf arise from incorrect input data and
result in error stops. Error messages should be self-explanatory,
but unfortunately this is not always true. If the code issues a
warning message and continues, pay attention to it, but do not
assume that something is necessarily wrong in your calculation:
most warning messages signal harmless problems.
Typical \texttt{pw.x} and/or \texttt{ph.x} (mis-)behavior:
\paragraph{\texttt{pw.x} yields a message like ``error while loading
shared libraries: \dots{} cannot open shared object file''
and does not start.}
Possible reasons:
\begin{itemize}
\item
If you are running on the same machines on which the code was
compiled, this is a library configuration problem.
The solution is machine-dependent.
On Linux, find the path to the missing libraries; then either add
it to file \texttt{/etc/ld.so.conf} and run \texttt{ldconfig}
(must be done as root), or add it to variable
\texttt{LD\_LIBRARY\_PATH} and export it (a one-line sketch is
shown after this list).
Another possibility is to load the non-shared versions of the libraries
(ending with \texttt{.a}) instead of shared ones (ending with
\texttt{.so}).
\item
If you are \emph{not} running on the same machines on which the
code was compiled: you need either to have the same shared
libraries installed on both machines, or to load statically all
libraries (using appropriate \texttt{configure} or loader options).
The same applies to Beowulf-style parallel machines: the needed
shared libraries must be present on all PC's.
\end{itemize}
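The one-line sketch announced above: assuming, purely as an example,
that the missing libraries live in \texttt{/opt/intel/mkl/lib}, a
bash user would add something like
\begin{verbatim}
export LD_LIBRARY_PATH=/opt/intel/mkl/lib:$LD_LIBRARY_PATH
\end{verbatim}
to the job script or shell startup file, using the actual library path
on your machine.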
\paragraph{errors in examples with parallel execution}
If you get error messages in the example scripts -- i.e. not errors
in the codes -- on a parallel machine, such as e.g.:
``\texttt{run\_example: -n: command not found}''
you have forgotten the quotes (\texttt{"}) in the definitions of
\texttt{PARA\_PREFIX} and \texttt{PARA\_POSTFIX}.
\paragraph{\texttt{pw.x} prints the first few lines and then nothing
happens (parallel execution).}
If the code looks like it is not reading from input, maybe it isn't:
the MPI libraries need to be properly configured to accept input
redirection. See section ``Running on parallel machines'', or inquire
with your local computer wizard (if any).
\paragraph{\texttt{pw.x} stops with error in reading.}
There is an error in the input data.
Usually it is a misspelled namelist variable, or an empty input file.
Note that out-of-bound indices in dimensioned variables read in the
namelist may cause the code to crash with really mysterious error
messages.
Also note that input data files containing \texttt{\^{}M} (Control-M)
characters at the end of lines (typically, files coming from Windows
PC) may yield error in reading.
If none of the above applies and the code stops at the first namelist
(``control'') and you are running in parallel: your MPI libraries
might not be properly configured to allow input redirection, so that
what you are effectively reading is an empty file.
See section ``Running on parallel machines'', or inquire with your
local computer wizard (if any).
\paragraph{\texttt{pw.x} mumbles something like ``cannot recover'' or
``error reading recover file''.}
You are trying to restart from a previous job that either produced
corrupted files, or did not do what you think it did. No luck:
you have to restart from scratch.
\paragraph{\texttt{pw.x} stops with ``inconsistent DFT'' error.}
As a rule, the flavor of DFT used in the calculation should be the
same as the one used in the generation of PP's, and all PP's should
be generated using the same flavor of DFT. This is actually enforced:
the type of DFT is read from PP files and it is checked that the same
DFT is read from all PP's. If this does not hold, the code stops with
the above error message.
If you really want to use PP's generated with different DFT, or
to perform a calculation with a DFT that differs from what used in
PP generation, change the appropriate field in the PP file(s), at
your own risk.
\paragraph{\texttt{pw.x} stops with error in cdiaghg or rdiaghg.}
Possible reasons for such behavior are not always clear, but they
typically fall into one of the following cases:
\begin{itemize}
\item
serious error in data, such as bad atomic positions or bad crystal
structure/supercell;
\item
a bad PP, typically with a ghost, but also a US-PP with non-positive
charge density, leading to a violation of positiveness of the S
matrix appearing in the US-PP formalism;
\item
a failure of the algorithm performing subspace diagonalization.
The LAPACK algorithms used by cdiaghg/rdiaghg are very robust
and extensively tested. Still, it may seldom happen that such
algorithms fail. Try to use conjugate-gradient diagonalization
(\texttt{diagonalization='cg'}), a slower but very robust
algorithm, and see what happens.
\item
buggy libraries. Machine-optimized mathematical libraries are
very fast but sometimes not so robust from a numerical point
of view. Suspicious behavior: you get an error that is not
reproducible on other architectures or that disappears if the
calculation is repeated with even minimal changes in parameters.
One known case: HP-Compaq alphas with \texttt{cxml} libraries.
Try to use compiled BLAS and LAPACK (or better, ATLAS) instead of
machine-optimized libraries.
\end{itemize}
\paragraph{\texttt{pw.x} crashes with ``floating invalid'' or
``floating divide by zero''.}
If this happens on HP-Compaq True64 Alpha machines with an old
version of the compiler: the compiler is most likely buggy.
Otherwise, move to next item.
\paragraph{\texttt{pw.x} crashes with no error message at all.}
This happens quite often in parallel execution, or under a batch
queue, or if you are writing the output to a file.
When the program crashes, part of the output, including the error
message, may be lost or hidden in error files that nobody looks
into.
It is the fault of the operating system, not of the code.
Try to run interactively and to write to the screen.
If this doesn't help, move to next point.
\paragraph{\texttt{pw.x} crashes with ``segmentation fault'' or
similarly obscure messages.}
Possible reasons:
\begin{itemize}
\item
too much RAM memory requested (see next item).
\item
if you are using highly optimized mathematical libraries, verify
that they are designed for your hardware.
In particular, for Intel compiler and MKL libraries, verify that
you loaded the correct set of CPU-specific MKL libraries.
\item
buggy compiler.
If you are using Portland or Intel compilers on Linux PC's or
clusters, see section \ref{installissues}, ``Installation
issues''.
\end{itemize}
\paragraph{\texttt{pw.x} works for simple systems, but not for large
systems or whenever more RAM is needed.}
Possible solutions:
\begin{itemize}
\item
increase the amount of RAM you are authorized to use (which may be
much smaller than the available RAM).
Ask your system administrator if you don't know what to do.
\item
reduce \texttt{nbnd} to the strict minimum, or reduce the cutoffs,
or the cell size.
\item
use conjugate-gradient (\texttt{diagonalization='cg'}: slow
but very robust): it requires less memory than the default
Davidson algorithm.
\item
in parallel execution, use more processors, or use the same number
of processors with less pools.
Remember that parallelization with respect to k-points (pools)
does not distribute memory: parallelization with respect to
\textbf{R}- (and \textbf{G}-) space does.
\item
IBM only (32-bit machines): if you need more than 256 MB you must
specify it at link time (option \texttt{-bmaxdata}).
\item
buggy or weird-behaving compiler.
Some versions of the Portland and Intel compilers on Linux PC's
or clusters have this problem. For Intel ifort 8.1, the problem
seems to be due to the allocation of large automatic arrays
that exceeds the available stack. Increasing the stack size
(with commands \texttt{limit} or \texttt{ulimit}) may (or may
not) solve the problem. In particular, if you try to run \texttt{ph.x}
on a PC with ifort you will get segmentation faults unless you run
small systems.
It is a compiler problem and the only solution is to reduce the
size of arrays if you can (for instance by running in parallel
or on more processors) or to find a different machine or compiler.
\end{itemize}
\paragraph{\texttt{pw.x} crashes in parallel execution with an obscure
message related to MPI errors.}
With LAM-MPI, add \texttt{-D\_\_LAM} to preprocessing options in
\texttt{make.sys} and recompile.
See info from Axel Kohlmeyer:\hfill\break
\htmladdnormallink%
{{\small\texttt{http://www.democritos.it/pipermail/pw\_forum/2005-April/002338.html}}}%
{http://www.democritos.it/pipermail/pw_forum/2005-April/002338.html}
Random crashes due to MPI errors have often been reported in Linux PC
clusters. We cannot rule out the possibility that bugs in Quantum-ESPRESSO
cause such behavior, but we are quite confident that the likely explanation
is a hardware problem (defective RAM for instance) or a software bug (in MPI
libraries, compiler, operating system).
\paragraph{\texttt{pw.x} runs but nothing happens.}
Possible reasons:
\begin{itemize}
\item
in parallel execution, the code died on just one processor.
Unpredictable behavior may follow.
\item
in serial execution, the code encountered a floating-point error
and goes on producing NaN's (Not a Number) forever unless
exception handling is on (and usually it isn't).
In both cases, look for one of the reasons given above.
\item
maybe your calculation will take more time than you expect.
\end{itemize}
\paragraph{\texttt{pw.x} yields weird results.}
Possible solutions:
\begin{itemize}
\item
if this happens after a change in the code or in compilation or
preprocessing options, try \texttt{make clean} and recompile.
The \texttt{make} command should take care of all dependencies,
but do not rely too heavily on it.
If the problem persists, \texttt{make clean} and recompile with
reduced optimization level.
\item
maybe your input data are weird.
\end{itemize}
\paragraph{\texttt{pw.x} stops with error message ``the system is
metallic, specify occupations''.}
You did not specify state occupations, but you need to, since your
system appears to have an odd number of electrons.
The variable controlling how metallicity is treated is
\texttt{occupations} in namelist \texttt{\&SYSTEM}.
The default, \texttt{occupations='fixed'}, occupies the lowest
\texttt{nelec/2} states and works only for insulators with a gap.
In all other cases, use \texttt{'smearing'} or \texttt{'tetrahedra'}.
See file \texttt{INPUT\_PW} for more details.
\paragraph{\texttt{pw.x} stops with ``internal error: cannot braket Ef'' in
\texttt{efermig}.}
Possible reasons:
\begin{itemize}
\item
serious error in data, such as bad number of electrons,
insufficient number of bands, absurd value of broadening;
\item
the Fermi energy is found by bisection assuming that the
integrated DOS $N(E)$ is an increasing function of the energy.
This is {\em not} guaranteed for Methfessel-Paxton smearing of
order 1 and can give problems when very few k-points are used.
Use some other smearing function: simple Gaussian broadening or,
better, Marzari-Vanderbilt ``cold smearing''.
\end{itemize}
\paragraph{\texttt{pw.x} yields ``internal error: cannot braket Ef'' message
in \texttt{efermit}, then stops because ``charge is incorrect''.}
There is either a serious error in data (bad number of electrons,
insufficient number of bands), or too few tetrahedra (i.e. k-points).
The tetrahedron method may become unstable in the latter case, especially
if the bands are very narrow. Remember that tetrahedra should be used only
in conjunction with uniform k-point grids.
\paragraph{\texttt{pw.x} yields ``internal error: cannot braket Ef'' message
in \texttt{efermit} but doesn't stop.}
This may happen under special circumstances when you are calculating the band
structure for selected high-symmetry lines. The message signals that
occupations and Fermi energy are not correct (but eigenvalues and eigenvectors
are). Remove \texttt{occupations='tetrahedra'} in the input data to get rid of
the message.
\paragraph{in parallel execution, \texttt{pw.x} stops complaining that
``some processors have no planes'' or ``smooth planes'' or
some other strange error.}
Your system does not require that many processors: reduce the number
of processors to a more sensible value.
In particular, both $N_3$ and $Nr_3$ must be $\geq N_{pr}$ (see
section \ref{performance}, ``Performance Issues'', and in particular
section \ref{parissues}, ``Parallelization issues'', for the meaning
of these variables).
\paragraph{the FFT grids in \texttt{pw.x} are machine-dependent.}
Yes, they are!
The code automatically chooses the smallest grid that is compatible
with the specified cutoff in the specified cell, \emph{and} is an
allowed value for the FFT library used.
Most FFT libraries are implemented, or perform well, only with
dimensions that factor into products of small numbers (2, 3, 5
typically, sometimes 7 and 11).
Different FFT libraries follow different rules and thus different
dimensions can result for the same system on different machines (or
even on the same machine, with a different FFT).
See function \texttt{allowed} in \texttt{Modules/fft\_scalar.f90}.
As a consequence, the energy may be slightly different on different
machines.
The only piece that depends explicitly on the grid parameters is the
XC part of the energy, which is computed numerically on the grid.
The differences should be small, though, especially for LDA
calculations.
Manually setting the FFT grids to a desired value is possible, but
slightly tricky, using input variables \texttt{nr1, nr2, nr3} and
\texttt{nr1s, nr2s, nr3s}.
The code will still increase them if not acceptable.
Automatic FFT grid dimensions are slightly overestimated, so one may
try --- very carefully --- to reduce them a little bit.
The code will stop if the requested values are too small; too large
values merely waste CPU time and memory.
Note that in parallel execution, it is very convenient to have FFT
grid dimensions along $z$ that are a multiple of the number of
processors.
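If you really need to force the grids by hand, the relevant lines of
namelist \texttt{\&SYSTEM} could look like this (a sketch with purely
illustrative values; start from the automatically chosen dimensions
printed at the beginning of the output, and remember that the values
must be allowed by the FFT library):
\begin{verbatim}
 &SYSTEM
    nr1  = 48,  nr2  = 48,  nr3  = 64
    nr1s = 36,  nr2s = 36,  nr3s = 48
 /
\end{verbatim}
The smooth-grid dimensions \texttt{nr1s, nr2s, nr3s} differ from the
dense-grid ones only when \texttt{ecutrho} is larger than its default
value (i.e. in the ultrasoft case).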
\paragraph{``warning: symmetry operation \# N not allowed''.}
This is not an error.
\texttt{pw.x} determines first the symmetry operations (rotations)
of the Bravais lattice; then checks which of these are symmetry
operations of the system (including if needed fractional
translations).
This is done by rotating (and translating if needed) the atoms in
the unit cell and verifying if the rotated unit cell coincides
with the original one.
If a symmetry operation contains a
fractional translation that is incompatible with the FFT grid,
it is discarded in order to prevent problems with symmetrization.
Typical fractional translations are 1/2 or 1/3 of a lattice
vector. If the FFT grid dimension along that direction is not
divisible respectively by 2 or by 3, the symmetry operation will
not transform the FFT grid into itself.
\paragraph{\texttt{pw.x} doesn't find all the symmetries you
expected.}
See above to learn how PWscf finds symmetry operations.
Some of them might be missing because:
\begin{itemize}
\item
the number of significant figures in the atomic positions is not
large enough.
In file \texttt{PW/eqvect.f90}, the variable \texttt{accep} is
used to decide whether a rotation is a symmetry operation.
Its current value ($10^{-5}$) is quite strict: a rotated atom must
coincide with another atom to 5 significant digits.
You may change the value of \texttt{accep} and recompile.
\item
they are not acceptable symmetry operations of the Bravais
lattice.
This is the case for C$_{60}$, for instance: the $I_h$ icosahedral
group of C$_{60}$ contains 5-fold rotations that are incompatible
with translation symmetry.
\item
the system is rotated with respect to the symmetry axes.
For instance: a C$_{60}$ molecule in the fcc lattice will have 24
symmetry operations ($T_h$ group) only if the double bond is
aligned along one of the crystal axes; if C$_{60}$ is rotated in
some arbitrary way, \texttt{pw.x} may not find any symmetry, apart
from inversion.
\item
they contain a fractional translation that is incompatible with
the FFT grid (see previous paragraph).
Note that if you change cutoff or unit cell volume, the
automatically computed FFT grid changes, and this may explain
changes in symmetry (and in the number of k-points as a
consequence) for no apparent good reason (only if you have
fractional translations in the system, though).
\item
a fractional translation, without rotation, is a symmetry
operation of the system. This means that the cell is actually
a supercell. In this case, all symmetry operations containing
fractional translations are disabled.
The reason is that in this rather exotic case there is no simple
way to select those symmetry operations forming a true group, in
the mathematical sense of the term.
\end{itemize}
\paragraph{I don't get the same results in different machines!}
If the difference is small, do not panic. It is quite normal for iterative
methods to reach convergence through different paths as soon as anything
changes. In particular, between serial and parallel execution there are
operations that are not performed in the same order. As the numerical
accuracy of computer numbers is finite, this can yield slightly different
results.
It is also normal that the total energy converges to a better accuracy
than the parts it is composed of. Thus if the convergence threshold is
for instance $10^{-8}$, you get 8-digit accuracy on the total energy,
but one or two digits fewer on the other terms. This is not a problem;
if it bothers you, reduce the threshold, for instance to $10^{-10}$ or $10^{-12}$.
The differences should go away (but it will probably take a few more
iterations to converge).
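The threshold is the \texttt{conv\_thr} variable of namelist
\texttt{\&ELECTRONS}; for instance (illustrative value):
\begin{verbatim}
 &ELECTRONS
    conv_thr = 1.0d-10
 /
\end{verbatim}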
\paragraph{the CPU time is time-dependent!}
Yes it is!
On most machines and on most operating systems, depending on machine
load, on communication load (for parallel machines), on various other
factors (including maybe the phase of the moon), reported CPU times
may vary quite a lot for the same job.
Also note that what is printed is supposed to be the CPU time per
process, but with some compilers it is actually the wall time.
\paragraph{``warning : N eigenvectors not converged ...''}
This is a warning message that can be safely ignored if it
is not present in the last steps of self-consistency. If it
is still present in the last steps of self-consistency, and
if the number of unconverged eigenvectors is a significant
part of the total, it may signal serious trouble in self-consistency
(see next point) or something badly wrong in input data.
\paragraph{``warning : negative or imaginary charge...'', or
``...core charge ...'', or ``npt with rhoup$<$0...'' or ``rhodw$<$0...''}
These are warning messages that can be safely ignored unless the
negative or imaginary charge is sizable,
let us say ${\cal O}(0.1)$. If it is, something seriously
wrong is going on. Otherwise, the origin of the negative
charge is the following. When one transforms a positive
function in real space to Fourier space and truncates at
some finite cutoff, the positive function is no longer
guaranteed to be positive when transformed back to real
space. This happens only with core corrections and with
ultrasoft pseudopotentials. In some cases it may be a
source of trouble (see next point) but it is usually
solved by increasing the cutoff for the charge density.
\paragraph{self-consistency is slow or does not converge.}
Reduce \texttt{mixing\_beta} from the default value (0.7) to $\sim
0.3-0.1$ or smaller. Try the \texttt{mixing\_mode} value that is
more appropriate for your problem. For slab geometries used in surface
problems or for elongated cells, \texttt{mixing\_mode='local-TF'} should
be a better choice, as it damps ``charge sloshing''. You may also try to
increase \texttt{mixing\_ndim} to more than 8 (default value). Beware:
the larger \texttt{mixing\_ndim}, the larger the amount of memory you need.
If the above doesn't help: verify if your system is metallic or is
close to a metallic state, especially if you have few k-points.
If the highest occupied and lowest unoccupied state(s) keep exchanging
place during self-consistency, forget about reaching convergence. A
typical sign of such behavior is that the self-consistency error
goes down, down, down, then all of a sudden up again, and so on.
Usually one can solve the problem by adding a few empty bands and a
broadening.
Specific to US PP: the presence of negative charge density regions due
to either the pseudization procedure of the augmentation part or to
truncation at finite cutoff may give convergence problems.
Raising the \texttt{ecutrho} cutoff for charge density will usually
help, especially in gradient-corrected calculations.
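As an illustration, an \texttt{\&ELECTRONS} fragment implementing the
suggestions above might look like this (a sketch only: the values are
starting points for experimentation, not recommendations):
\begin{verbatim}
 &ELECTRONS
    mixing_mode = 'local-TF'
    mixing_beta = 0.2
    mixing_ndim = 12
 /
\end{verbatim}
For the metallic-like case, also add a few empty bands (\texttt{nbnd}
in \texttt{\&SYSTEM}) and a smearing, as in the example shown earlier
for metallic systems.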
\paragraph{structural optimization is slow or does not converge.}
Typical structural optimizations, based on the BFGS algorithm, converge to
the default thresholds ( \texttt{etot\_conv\_thr} and
\texttt{forc\_conv\_thr} ) in 15-25 BFGS steps (depending on the starting
configuration). This may not happen when your system is characterized by
``floppy'' low-energy modes, which make it very difficult (and of little
use anyway) to reach a well-converged structure, no matter what. Other
possible reasons for a problematic convergence are listed below.
Close to convergence the self-consistency error in forces may become
large with respect to the value of forces. The resulting mismatch
between forces and energies may confuse the line minimization
algorithm, which assumes consistency between the two. The code
reduces the starting self-consistency threshold
\texttt{conv\_thr} when approaching the minimum energy configuration,
up to a factor defined by \texttt{upscale}. Reducing
\texttt{conv\_thr} (or increasing \texttt{upscale}) yields a smoother
structural optimization, but if \texttt{conv\_thr} becomes too small,
electronic self-consistency may not converge. You may also increase
variables \texttt{etot\_conv\_thr} and
\texttt{forc\_conv\_thr} that determine the threshold for convergence
(the default values are quite strict).
A limitation to the accuracy of forces comes from the absence of
perfect translational invariance. If we had only the Hartree
potential, our PW calculation would be translationally invariant to
machine precision. The presence of an exchange-correlation potential
introduces Fourier components in the potential that are not in our
basis set. This loss of precision (more serious for
gradient-corrected functionals) translates into a slight but
detectable loss of translational invariance (the energy changes if all
atoms are displaced by the same quantity, not commensurate with the
FFT grid). This sets a limit to the accuracy of forces. The
situation improves somewhat by increasing the \texttt{ecutrho} cutoff.
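As a sketch of where the variables mentioned above live (the values are
only meant to indicate the direction in which each parameter can be
moved, not recommended settings):
\begin{verbatim}
 &CONTROL
    calculation   = 'relax'
    etot_conv_thr = 1.0d-3
    forc_conv_thr = 1.0d-2
 /
 &ELECTRONS
    conv_thr = 1.0d-8
 /
 &IONS
    upscale = 100
 /
\end{verbatim}
Here \texttt{etot\_conv\_thr} and \texttt{forc\_conv\_thr} have been
loosened and \texttt{conv\_thr} tightened with respect to their
defaults; see \texttt{INPUT\_PW} for the exact meaning of
\texttt{upscale}.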
\paragraph{\texttt{pw.x} stops during variable-cell optimization
in \texttt{checkallsym} with ``non orthogonal operation'' error.}
Variable-cell optimization may occasionally break the starting
symmetry of the cell. When this happens, the run is stopped
because the number of k-points calculated for the starting
configuration may no longer be suitable. Possible solutions:
\begin{itemize}
\item start with a nonsymmetric cell
\item use a symmetry-conserving algorithm: the Wentzcovitch
      algorithm \\
      (\texttt{cell\_dynamics='damp-w'}) shouldn't break the symmetry
      (see the example after this list).
\end{itemize}
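A minimal sketch of the second option (illustrative only; see
\texttt{INPUT\_PW} for the other variable-cell variables):
\begin{verbatim}
 &CONTROL
    calculation = 'vc-relax'
 /
 &CELL
    cell_dynamics = 'damp-w'
 /
\end{verbatim}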
\paragraph{Why are codes in PP/ complaining that they do not
find some files?}
For Linux PC clusters in parallel execution: in at least some
versions of MPICH, the current directory is set to the directory where
the \emph{executable code} resides, instead of being set to the
directory where the code is executed.
This MPICH weirdness may cause unexpected failures in some
postprocessing codes that expect a data file in the current directory.
Workaround: use symbolic links, or copy the executable to the current
directory.
\paragraph{\texttt{ph.x} stops with ``error reading file''.}
The data file produced by \texttt{pw.x} is bad or incomplete or
produced by an incompatible version of the code.
In parallel execution: if you did not set \texttt{wf\_collect=.true.},
the number of processors and pools for the phonon run should be the
same as for the self-consistent run; all files must be visible to all
processors.
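If you plan to change the number of processors or pools between the two
runs, a convenient option is to collect wavefunctions in the preceding
\texttt{pw.x} run; a sketch of the relevant \texttt{\&CONTROL} lines:
\begin{verbatim}
 &CONTROL
    calculation = 'scf'
    wf_collect  = .true.
 /
\end{verbatim}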
\paragraph{\texttt{ph.x} mumbles something like ``cannot recover'' or
``error reading recover file''.}
You have a bad restart file from a preceding failed execution.
Remove all files \texttt{recover*} in \texttt{outdir}.
\paragraph{\texttt{ph.x} says ``occupation numbers probably wrong''
and continues; or ``phonon + tetrahedra not implemented'' and stops}
You have a metallic or spin-polarized system but occupations are not
set to ``smearing''. Note that the correct way to calculate occupancies
must be specified in the input data of the non-selfconsistent
calculation, if the phonon code reads data from it. The non-selfconsistent
calculation will not use this information but the phonon code will.
\paragraph{\texttt{ph.x} does not yield acoustic modes with $\omega=0$
at $\mathbf{q}=0$.}
This may not be an error: the Acoustic Sum Rule (ASR) is never exactly
verified, because the system is never exactly translationally
invariant as it should be (see the discussion above).
The calculated frequency of the acoustic mode is typically less than
10 cm$^{-1}$, but in some cases it may be much higher, up to 100
cm$^{-1}$.
The ultimate test is to diagonalize the dynamical matrix with program
\texttt{dynmat.x}, imposing the ASR.
If you obtain an acoustic mode with a much smaller $\omega$ (let's say
$<1$ cm$^{-1}$) with all other modes virtually unchanged, you
can trust your results.
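A sketch of a \texttt{dynmat.x} input imposing the ASR (the
dynamical-matrix file name is a placeholder, and the accepted
\texttt{asr} values should be checked in the documentation of your
version):
\begin{verbatim}
 &input
    fildyn = 'mysystem.dyn'
    asr    = 'simple'
 /
\end{verbatim}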
\paragraph{\texttt{ph.x} yields really lousy phonons, with bad or
negative frequencies or wrong symmetries or gross ASR
violations.}
Possible reasons:
\begin{itemize}
\item
wrong data file read.
\item
wrong atomic masses given in input will yield wrong frequencies
(but the content of file {\tt fildyn} should be valid, since the
force constants, not the dynamical matrix, are written to file).
\item
convergence threshold for either the SCF ({\tt conv\_thr}) or the phonon
calculation ({\tt tr2\_ph}) too large: try to reduce them (see the
input sketch after this list).
\item
maybe your system \emph{does} have negative or strange phonon
frequencies, with the approximations you used.
A negative frequency signals a mechanical instability of the
chosen structure.
Check that the structure is reasonable, and check the following
parameters:
\begin{itemize}
\item The cutoff for wavefunctions, \texttt{ecutwfc}
\item For US PP: the cutoff for the charge density,
\texttt{ecutrho}
\item The k-point grid, especially for metallic systems!
\end{itemize}
\end{itemize}
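For reference, the phonon threshold is the \texttt{tr2\_ph} variable of
namelist \texttt{\&INPUTPH}; a sketch of a $\Gamma$-point input with an
illustrative tighter value follows (the prefix and file names are
placeholders; masses and the other variables of your usual phonon input
stay as they are):
\begin{verbatim}
 phonons at Gamma, tighter threshold
 &inputph
    prefix = 'mysystem'
    tr2_ph = 1.0d-16
    fildyn = 'mysystem.dyn'
 /
 0.0 0.0 0.0
\end{verbatim}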
\paragraph{``Wrong degeneracy'' error in star\_q.}
Verify the \textbf{q}-point for which you are calculating phonons.
In order to check whether a symmetry operation belongs to the small
group of \textbf{q}, the code compares \textbf{q} and the rotated
\textbf{q}, with an acceptance tolerance of $10^{-5}$ (set in routine
\texttt{PW/eqvect.f90}).
You may run into trouble if your \textbf{q}-point differs from a
high-symmetry point by an amount of that order of magnitude.
\section{Frequently Asked Questions}
\subsection{Compilation/Installation}
Most compilation problems have obvious origins and can be solved by
reading error messages and acting accordingly. Sometimes the reason
for a failure is less obvious. In such a case, you should look into
this guide, in the ``Installation Issues'' section, and into the
\texttt{pw\_forum} archive to see if a similar problem (with
solution) is described. If you get really weird error messages
during installation, look for them with your preferred Internet
search engine (such as Google).
\begin{itemize}
\item {\texttt{configure} \em says I have no Fortran compiler!}
You really don't have one. More exactly, none of the Fortran
compilers that \texttt{configure} tries is in your execution path. If
your hardware/software combination is supported, fix your
execution path.
\item {\texttt{configure} \em complains that it has no permission
to run /usr/bin/oslevel and stops!} On some IBM AIX machines, the command
\texttt{/usr/bin/oslevel} used by \texttt{configure} to get info about
the type of system is not executable by normal users. Complain to
your system manager.
\item {\texttt{configure} \em says ``unsupported C/Fortran compilers
combination''!}
Unless you have trouble in compilation/linking, never mind.
\item {\texttt{configure} \em says ``unsupported architecture''!}
If compilation/linking still works, never mind. Otherwise, see the
instructions in \texttt{README.configure} on what to do. Note that
in most cases you may use \texttt{configure} to produce dependencies,
then edit the file \texttt{make.sys}.
\item {\texttt{configure} \em doesn't find my (parallel/mathematical)
libraries!}
\texttt{configure} tries to locate libraries (both mathematical and
parallel libraries) in logical places with logical names, but if they
have strange names or strange locations, you will have to rename/move
them, or to instruct \texttt{configure} to find them (see subsection
``Libraries''). Note that if MPI libraries are not found, parallel
compilation is disabled.
\item {\texttt{configure} \em doesn't recognize that I have a parallel
machine!}
You do not have a properly configured parallel environment (libraries and
compiler). \texttt{configure} tries to locate a parallel compiler in a
logical place with a logical name, but if it has a strange name or
is in an unusual location, you will have to instruct
\texttt{configure} to find it. Note that in most PC clusters (Beowulf),
there is no parallel Fortran-95 compiler: you have to configure an
appropriate script, such as \texttt{mpif90}. For libraries, see above.
\end{itemize}
\subsection{In general}
\begin{itemize}
\item {\em How can I choose parameters for variable-cell
molecular dynamics?}
``A common mistake many new users make is to set the time step
\texttt{dt} improperly to the same order of magnitude as for the CP
algorithm, or not to set \texttt{dt} at all. This will produce
a `not evolving dynamics'. Good values for the original RMW
(RM Wentzcovitch) dynamics are \texttt{dt}$=50\div70$.
The choice of the cell mass is a delicate matter. An off-optimal mass
will make convergence slower. Too small masses, as well as too long time
steps, can make the algorithm unstable. A good cell mass will make the
oscillation times for the internal degrees of freedom comparable to those
of the cell degrees of freedom in non-damped variable-cell MD. Test
calculations are advisable before extensive calculations.''
``I have tested the damping algorithm that I have developed and it has
worked well so far. It allows for a much longer time step
(\texttt{dt}=$100\div150$) than the RMW one and is much more stable
with very small cell masses, which is useful when the cell shape,
not the internal degrees of freedom, is far out of equilibrium.
It also converges in a smaller number of steps than RMW.''
(Info from Cesar Da Silva: the new damping algorithm is the default
since v. 3.1). An illustrative input sketch for a damped variable-cell
run is given at the end of this list.
% \item {\em How can I optimize the structural parameters of a
% low-symmetry lattice? should I use $E(v)$ curves, the stress,
% variable-cell molecular dynamics?}
\item {\em How is the charge density (the potential, etc.) stored?
What position in real space corresponds to an array value? }
The index of arrays used to store functions defined on 3D meshes is
actually a shorthand for three indices, following the FORTRAN
convention (``leftmost index runs faster''). An example will explain
this better. Suppose you have a 3D array of dimension \texttt{(nr1,nr2,nr3)},
say \texttt{psi(nr1,nr2,nr3)}. FORTRAN compilers store this array
sequentially in the computer RAM in the following way:
\begin{quote}
\texttt{psi(1,1,1)}\\
\texttt{psi(2,1,1)}\\
...\\
\texttt{psi(nr1,1,1)}\\
\texttt{psi(1,2,1)}\\
\texttt{psi(2,2,1)}\\
...\\
\texttt{psi(nr1,2,1)}\\
...\\
\texttt{psi(nr1,nr2,1)}\\
\texttt{psi(1,1,2)}\\
...\\
\texttt{psi(nr1,nr2,nr3)}
\end{quote}
Let \texttt{ind} be the position of the \texttt{(i,j,k)} element
in the above list: the relation between \texttt{ind} and \texttt{(i,j,k)}
is:
\begin{equation}
ind = i + (j-1)*nr1 + (k-1)*nr2*nr1
\end{equation}
This should clarify the relation between 1D and 3D indexing. In real
space, the \texttt{(i,j,k)} point of the mesh is
\begin{equation}
{\bf r}_{ijk} = {i-1\over nr1}*\tau_1
+ {j-1\over nr2}*\tau_2
+ {k-1\over nr3}*\tau_3
\end{equation}
where the $\tau$'s are the basis vectors of the Bravais lattice. The
latter are stored as the columns of the \texttt{at} array:
\begin{equation}
\tau_1 = at(:,1), \tau_2 = at(:,2), \tau_3 = at(:,3)
\end{equation}
(info by Stefano Baroni). A small self-contained Fortran sketch of this
indexing is given at the end of this list.
\item {\em Is there a simple way to determine the symmetry
of a given phonon mode?}
In some cases, degeneracy will help. In other cases, the character of a
mode can be easily determined by direct inspection. In general, one needs
to perform a group-symmetry analysis of the phonon mode, and this is
presently not implemented. So the short answer is: no, only not-so-simple
ways.
You might find the ISOTROPY package useful:\\
\htmladdnormallink{\texttt{http://stokes.byu.edu/iso/isotropy.html}}%
{http://stokes.byu.edu/iso/isotropy.html}.
You might also find the following info from Pascal Thibeadeau useful:\\
``please follow
\htmladdnormallink{\texttt{http://dx.doi.org/10.1016/0010-4655(94)00164-W}}%
{http://dx.doi.org/10.1016/0010-4655(94)00164-W}
and
\htmladdnormallink{\texttt{http://dx.doi.org/10.1016/0010-4655(74)90057-5}}%
{http://dx.doi.org/10.1016/0010-4655(74)90057-5}.
These are connected to some programs found in the Computer Physics
Communications Program Library
(\htmladdnormallink{\texttt{http://www.cpc.cs.qub.ac.uk}}%
{http://www.cpc.cs.qub.ac.uk} )
which are described in the articles:\\
ACKJ\_v1.0 {\em Normal coordinate analysis of crystals,}
J.Th.M. de Hosson.\\
ACMI\_v1.0 {\em Group-theoretical analysis of lattice vibrations},
T.G. Worlton, J.L. Warren. See erratum Comp. Phys. Commun. 4(1972)382.\\
ACMM\_v1.0 {\em Improved version of group-theoretical analysis of lattice
dynamics}, J.L. Warren, T.G. Worlton.''
\item {\em What are the \texttt{nr1b}, \texttt{nr2b}, \texttt{nr3b}?}
``\texttt{ecutrho} defines the resolution on the real space FFT mesh
(as expressed by \texttt{nr1}, \texttt{nr2} and \texttt{nr3}, which
the code, if left on its own, sets automatically). In the ultrasoft
case we refer to this mesh as the ``hard'' mesh, since
it is denser than the smooth mesh that is needed to
represent the square of the non-norm-conserving wavefunctions.
On this ``hard'', fine-spaced mesh, you need to determine the size
of the cube that will encompass the largest of the augmentation
charges - this is what \texttt{nr1b}, \texttt{nr2b}, \texttt{nr3b} are.
So, \texttt{nr1b} is independent of the system size, but dependent on the
size of the augmentation charge (that doesn't vary that much)
and on the real-space resolution needed by augmentation charges
(rule of thumb: \texttt{ecutrho} is between 6 and 12 times \texttt{ecutwfc}).
In practice, \texttt{nr1b} et al. are often in the region of 20-24-28;
testing seems again a necessity (unless the code started
automagically to estimate these).
The core charge is in principle finite only at the core region (as
defined by $r_{cut}$) and vanishes outside the core. Numerically the charge
is represented in a Fourier series which may give rise to small charge
oscillations outside the core and even to negative charge density, but
only if the cut-off is too low. Having these small boxes removes the
charge oscillations problem (at least outside the box) and also offers
some numerical advantages in going to higher cut-offs.
The small boxes should be set as small as possible, but large enough to
contain the core of the largest element in your system. The formula for
determining the box size is quite simple:
$nr1b=(2*r_{cut})/L_x*nr1$,
where $r_{cut}$ is the cut-off radius for the largest element and $L_x$
is the physical length of your box along the $x$ axis. You have to round
your result to the nearest larger integer.'' (info by Nicola Marzari)
\end{itemize}
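The input sketch referenced in the variable-cell item above could look
as follows (only the variables discussed there are shown; the time step
value is illustrative, and the cell mass, to be chosen by testing, is
set with the \texttt{wmass} variable, see \texttt{INPUT\_PW}):
\begin{verbatim}
 &CONTROL
    calculation = 'vc-relax'
    dt          = 100
 /
 &CELL
    cell_dynamics = 'damp-w'
 /
\end{verbatim}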
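The Fortran sketch referenced in the item on the storage of the charge
density is shown below. It is not part of the distribution, just a
self-contained illustration of the index relations given above:
\begin{verbatim}
! illustrative only: map the 1D index of a (nr1,nr2,nr3) mesh back to
! (i,j,k) and to crystal (fractional) coordinates
program mesh_index
  implicit none
  integer, parameter :: nr1 = 4, nr2 = 5, nr3 = 6
  integer :: ind, i, j, k
  real    :: s1, s2, s3
  ind = 37                              ! any value between 1 and nr1*nr2*nr3
  k = (ind - 1)/(nr1*nr2) + 1           ! slowest index
  j = (ind - 1 - (k-1)*nr1*nr2)/nr1 + 1
  i = ind - (j-1)*nr1 - (k-1)*nr1*nr2   ! fastest index
  ! fractional coordinates of the mesh point; the Cartesian position is
  ! s1*tau_1 + s2*tau_2 + s3*tau_3 with tau_n = at(:,n)
  s1 = real(i-1)/nr1
  s2 = real(j-1)/nr2
  s3 = real(k-1)/nr3
  print *, 'ind =', ind, ' (i,j,k) =', i, j, k
  print *, 'crystal coordinates:', s1, s2, s3
end program mesh_index
\end{verbatim}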
\end{document}