\documentclass[12pt,a4paper]{article}
\def\version{4.3}
\def\qe{{\sc Quantum ESPRESSO}}
\usepackage{html}
% BEWARE: don't revert from graphicx for epsfig, because latex2html
% doesn't handle epsfig commands !!!
\usepackage{graphicx}
\textwidth = 17cm
\textheight = 24cm
\topmargin =-1 cm
\oddsidemargin = 0 cm
\def\pw.x{\texttt{pw.x}}
\def\cp.x{\texttt{cp.x}}
\def\ph.x{\texttt{ph.x}}
\def\configure{\texttt{configure}}
\def\PWscf{\texttt{PWscf}}
\def\PHonon{\texttt{PHonon}}
\def\CP{\texttt{CP}}
\def\PostProc{\texttt{PostProc}}
\def\make{\texttt{make}}
\begin{document}
\author{}
\date{}
\def\qeImage{quantum_espresso.pdf}
\def\democritosImage{democritos.pdf}
\begin{htmlonly}
\def\qeImage{quantum_espresso.png}
\def\democritosImage{democritos.png}
\end{htmlonly}
\title{
\includegraphics[width=5cm]{\qeImage} \hskip 2cm
\includegraphics[width=6cm]{\democritosImage}\\
\vskip 1cm
% title
\Huge User's Guide for \qe\ \smallskip
\Large (version \version)
}
%\endhtmlonly
%\latexonly
%\title{
% \epsfig{figure=quantum_espresso.png,width=5cm}\hskip 2cm
% \epsfig{figure=democritos.png,width=6cm}\vskip 1cm
% % title
% \Huge User's Guide for \qe \smallskip
% \Large (version \version)
%}
%\endlatexonly
\maketitle
\tableofcontents
\section{Introduction}
This guide covers the installation and usage of \qe\ (opEn-Source
Package for Research in Electronic Structure, Simulation,
and Optimization), version \version.
The \qe\ distribution contains the following core packages
for the calculation of electronic-structure properties within
Density-Functional Theory (DFT), using a Plane-Wave (PW) basis set
and pseudopotentials (PP):
\begin{itemize}
\item \PWscf\ (Plane-Wave Self-Consistent Field).
\item \CP\ (Car-Parrinello).
\end{itemize}
It also includes the following more specialized packages:
\begin{itemize}
\item \texttt{PWneb}:
energy barriers and reaction pathways.
\item \PHonon:
phonons with Density-Functional Perturbation Theory.
\item \PostProc: various utilities for data postprocessing.
\item \texttt{PWcond}:
ballistic conductance.
\item \texttt{GIPAW}
(Gauge-Independent Projector Augmented Waves):
EPR g-tensor and NMR chemical shifts.
\item \texttt{XSPECTRA}:
K-edge X-ray absorption spectra.
\item \texttt{vdW}:
(experimental) dynamic polarizability.
\item \texttt{GWW}:
(experimental) GW calculation using Wannier functions.
% \item \texttt{TD-DFPT}:
% calculations of spectra using Time-Dependent
% Density-Functional Perturbation Theory.
\end{itemize}
The following auxiliary codes are included as well:
\begin{itemize}
\item \texttt{PWgui}:
a Graphical User Interface, producing input data files for
\PWscf.
\item \texttt{atomic}:
a program for atomic calculations and generation of pseudopotentials.
\item \texttt{QHA}:
utilities for the calculation of projected density of states (PDOS)
and of the free energy in the Quasi-Harmonic Approximation (to be
used in conjunction with \PHonon).
\item \texttt{PlotPhon}:
phonon dispersion plotting utility (to be
used in conjunction with \PHonon).
\end{itemize}
A copy of the following required external libraries is included:
\begin{itemize}
\item \texttt{iotk}:
an Input-Output ToolKit.
\item PMG:
a multigrid solver for the Poisson equation.
\item BLAS and LAPACK
\end{itemize}
Finally, several additional packages that exploit data produced by \qe\
can be installed as {\em plug-ins}:
\begin{itemize}
\item \texttt{Wannier90}:
maximally localized Wannier functions
(\texttt{http://www.wannier.org/}), written by A. Mostofi,
J. Yates, Y.-S. Lee.
\item \texttt{WanT}:
quantum transport properties with Wannier functions.
\item \texttt{YAMBO}:
optical excitations with Many-Body Perturbation Theory.
\end{itemize}
This guide documents \PWscf, \CP, \PHonon, \PostProc.
The remaining packages have separate documentation.
The \qe\ codes work on many different types of Unix machines,
including parallel machines using both OpenMP and MPI
(Message Passing Interface).
Running \qe\ on Mac OS X and MS-Windows is also possible:
see section \ref{Sec:Installation}.
Further documentation, beyond what is provided in this guide, can be found in:
\begin{itemize}
\item the \texttt{pw\_forum} mailing list (\texttt{pw\_forum@pwscf.org}).
You can subscribe to this list, browse and search its archives
(links in \texttt{http://www.quantum-espresso.org/contacts.php}).
See section \ref{SubSec:Contacts}, ``Contacts'', for more info.
\item the \texttt{Doc/} directory of the \qe\ distribution,
containing a detailed description of input data for most codes
in files \texttt{INPUT\_*.txt} and \texttt{INPUT\_*.html},
plus a few additional pdf documents;
\item the \qe\ web site:\\
\texttt{http://www.quantum-espresso.org};
\item the \qe\ Wiki:\\
\texttt{http://www.quantum-espresso.org/wiki/index.php/Main\_Page}.
\end{itemize}
People who want to contribute to \qe\ should read the
Developer Manual: \texttt{Doc/developer\_man.pdf}.
This guide does not explain the basic Unix concepts (shell, execution
path, directories etc.) and utilities needed to run \qe; nor does it
explain solid-state physics and its computational methods.
If you want to learn the latter, you should read a good textbook,
such as e.g. the book by Richard Martin:
{\em Electronic Structure: Basic Theory and Practical Methods},
Cambridge University Press (2004). See also the ``Learn'' section in
the \qe\ web site; the ``Reference Papers''
section in the Wiki.
All trademarks mentioned in this guide belong to their respective owners.
\subsection{What can \qe\ do}
\PWscf\ can currently perform the following kinds of calculations:
\begin{itemize}
\item ground-state energy and one-electron (Kohn-Sham) orbitals;
\item atomic forces, stresses, and structural optimization;
\item molecular dynamics on the ground-state Born-Oppenheimer surface, also with variable cell;
\item macroscopic polarization and finite electric fields via
the modern theory of polarization (Berry Phases).
\end{itemize}
All of the above works for both insulators and metals,
in any crystal structure, for many exchange-correlation (XC) functionals
(including spin polarization, DFT+U, nonlocal VdW functionals,
hybrid functionals), for
norm-conserving (Hamann-Schluter-Chiang) PPs (NCPPs) in
separable form or Ultrasoft (Vanderbilt) PPs (USPPs)
or Projector Augmented Waves (PAW) method.
Non-collinear magnetism and spin-orbit interactions
are also implemented. An implementation of finite electric
fields with a sawtooth potential in a supercell is also available.
Note that the calculation of reaction pathways and energy barriers
using the Nudged Elastic Band (NEB) and Fourier String Method Dynamics
(SMD) methods is no longer performed by \PWscf. It is now performed
by a different executable, contained in the subpackage \texttt{PWneb}.
\PHonon\ can perform the following types of calculations:
\begin{itemize}
\item phonon frequencies and eigenvectors at a generic wave vector,
using Density-Functional Perturbation Theory;
\item effective charges and dielectric tensors;
\item electron-phonon interaction coefficients for metals;
\item interatomic force constants in real space;
\item third-order anharmonic phonon lifetimes;
\item infrared and Raman (nonresonant) cross sections.
\end{itemize}
\PHonon\ can be used whenever \PWscf\ can be
used, with the exceptions of DFT+U, nonlocal VdW and hybrid functionals.
PAW is not implemented for higher-order response calculations.
Calculations of the vibrational free energy in the Quasi-Harmonic
Approximation can be performed using the \texttt{QHA} package.
\PostProc\ can perform the following types of calculations:
\begin{itemize}
\item Scanning Tunneling Microscopy (STM) images;
\item plots of Electron Localization Functions (ELF);
\item Density of States (DOS) and Projected DOS (PDOS);
\item L\"owdin charges;
\item planar and spherical averages;
\end{itemize}
plus interfacing with a number of graphical utilities and with
external codes.
\CP\ can perform Car-Parrinello molecular dynamics, including
variable-cell dynamics.
\subsection{People}
In the following, the cited affiliation is either the current one
or the one where the last known contribution was done.
The maintenance and further development of the \qe\ distribution
is promoted by the DEMOCRITOS National Simulation Center
of IOM-CNR under the coordination of
Paolo Giannozzi (Univ.Udine, Italy) and Layla Martin-Samos
(Democritos) with the strong support
of the CINECA National Supercomputing Center in Bologna under
the responsibility of Carlo Cavazzoni.
The \PWscf\ package (which included \PHonon\ and \PostProc\
in earlier releases)
was originally developed by Stefano Baroni, Stefano
de Gironcoli, Andrea Dal Corso (SISSA), Paolo Giannozzi, and many others.
We quote in particular:
\begin{itemize}
\item Matteo Cococcioni (Univ. Minnesota) for DFT+U implementation;
\item David Vanderbilt's group at Rutgers for Berry's phase
calculations;
\item Ralph Gebauer (ICTP, Trieste) and Adriano Mosca Conte
(SISSA, Trieste) for noncollinear magnetism;
\item Andrea Dal Corso for spin-orbit interactions;
\item Carlo Sbraccia (Princeton) for NEB, Strings method,
for improvements to structural optimization
and to many other parts;
\item Paolo Umari (Democritos) for finite electric fields;
\item Renata Wentzcovitch and collaborators (Univ. Minnesota)
for variable-cell molecular dynamics;
\item Lorenzo Paulatto (Univ.Paris VI) for PAW implementation,
built upon previous work by Guido Fratesi (Univ.Milano Bicocca)
and Riccardo Mazzarello (ETHZ-USI Lugano);
\item Ismaila Dabo (INRIA, Palaiseau) for electrostatics with
free boundary conditions.
\end{itemize}
For \PHonon, we mention in particular:
\begin{itemize}
\item Michele Lazzeri (Univ.Paris VI) for the 2n+1 code and Raman
cross section calculation with 2nd-order response;
\item Andrea Dal Corso for USPP, noncollinear, spin-orbit
extensions to \PHonon.
\end{itemize}
For \PostProc, we mention:
\begin{itemize}
\item Andrea Benassi (SISSA) for the \texttt{epsilon} utility;
\item Norbert Nemec (U.Cambridge) for the \texttt{pw2casino}
utility;
\item Dmitry Korotin (Inst. Met. Phys. Ekaterinburg) for the
\texttt{wannier\_ham} utility.
\end{itemize}
The \CP\ package is based on the original code written by
Roberto Car
and Michele Parrinello. \CP\ was developed by Alfredo Pasquarello
(IRRMA, Lausanne), Kari Laasonen (Oulu), Andrea Trave, Roberto
Car (Princeton), Nicola Marzari (Univ. Oxford), Paolo Giannozzi, and others.
FPMD, later merged with \CP, was developed by Carlo
Cavazzoni,
Gerardo Ballabio (CINECA), Sandro Scandolo (ICTP),
Guido Chiarotti (SISSA), Paolo Focher, and others.
We quote in particular:
\begin{itemize}
\item Manu Sharma (Princeton) and Yudong Wu (Princeton) for
maximally localized Wannier functions and dynamics with
Wannier functions;
\item Paolo Umari (Democritos) for finite electric fields and conjugate
gradients;
\item Paolo Umari and Ismaila Dabo for ensemble-DFT;
\item Xiaofei Wang (Princeton) for META-GGA;
\item The Autopilot feature was implemented by Targacept, Inc.
\end{itemize}
Other packages in \qe:
\begin{itemize}
\item
\texttt{PWcond} was written by Alexander Smogunov (SISSA) and Andrea
Dal Corso. For an introduction, see
\texttt{http://people.sissa.it/\~{}smogunov/PWCOND/pwcond.html}
\item
\texttt{GIPAW} (\texttt{http://www.gipaw.net})
was written by Davide Ceresoli (MIT), Ari Seitsonen (Univ.Zurich),
Uwe Gerstmann, Francesco Mauri (Univ. Paris VI).
\item
\texttt{PWgui} was written by Anton Kokalj (IJS Ljubljana) and is
based on his GUIB concept (\texttt{http://www-k3.ijs.si/kokalj/guib/}).
\item
\texttt{atomic} was written by Andrea Dal Corso and it is the result
of many additions to the original code by Paolo Giannozzi
and others. Lorenzo Paulatto wrote the PAW extension.
\item
\texttt{iotk} (\texttt{http://www.s3.infm.it/iotk}) was written by Giovanni Bussi (SISSA).
\item
\texttt{XSPECTRA} was written by Matteo Calandra (Univ. Paris VI)
and collaborators.
\item \texttt{VdW} was contributed by Huy-Viet Nguyen (SISSA).
\item \texttt{GWW} was written by Paolo Umari and Geoffrey Stenuit (Democritos).
\item
\texttt{QHA} and \texttt{PlotPhon} were contributed by Eyvaz Isaev
(Moscow Steel and Alloy Inst. and Linkoping and Uppsala Univ.).
\end{itemize}
Other relevant contributions to \qe:
\begin{itemize}
\item Andrea Ferretti (MIT) contributed the \texttt{qexml} and
\texttt{sumpdos} utility,
helped with file formats and with various problems;
\item Hannu-Pekka Komsa (CSEA/Lausanne) contributed
the HSE functional;
\item Dispersion interactions in the framework of DFT-D were
contributed by Daniel Forrer (Padua Univ.) and Michele Pavone
(Naples Univ. Federico II);
\item Filippo Spiga (Univ. Milano Bicocca) contributed the
mixed MPI-OpenMP parallelization;
\item The initial BlueGene porting was done by Costas Bekas and
Alessandro Curioni (IBM Zurich);
\item Gerardo Ballabio wrote the first \configure\ for \qe;
\item Audrius Alkauskas (IRRMA), Uli Aschauer (Princeton),
Simon Binnie (Univ. College London), Guido Fratesi, Axel Kohlmeyer (UPenn),
Konstantin Kudin (Princeton), Sergey Lisenkov (Univ.Arkansas),
Nicolas Mounet (MIT), William Parker (Ohio State Univ),
Guido Roma (CEA), Gabriele Sclauzero (SISSA), Sylvie Stucki (IRRMA),
Pascal Thibaudeau (CEA), Vittorio Zecca, Federico Zipoli (Princeton)
answered questions on the mailing list, found bugs, helped in
porting to new architectures, wrote some code.
\end{itemize}
An alphabetical list of further contributors includes: Dario Alf\`e,
Alain Allouche, Francesco Antoniella, Francesca Baletto,
Mauro Boero, Nicola Bonini, Claudia Bungaro,
Paolo Cazzato, Gabriele Cipriani, Jiayu Dai, Cesar Da Silva,
Alberto Debernardi, Gernot Deinzer, Yves Ferro,
Martin Hilgeman, Yosuke Kanai, Nicolas Lacorne, Stephane Lefranc,
Kurt Maeder, Andrea Marini,
Pasquale Pavone, Mickael Profeta, Kurt Stokbro,
Paul Tangney,
Antonio Tilocca, Jaro Tobik,
Malgorzata Wierzbowska, Silviu Zilberman,
and let us apologize to everybody we have forgotten.
This guide was mostly written by Paolo Giannozzi.
Gerardo Ballabio and Carlo Cavazzoni wrote the section on \CP.
\subsection{Contacts}
\label{SubSec:Contacts}
The web site for \qe\ is \texttt{http://www.quantum-espresso.org/}.
Releases and patches can be downloaded from this
site or following the links contained in it. The main entry point for
developers is the QE-forge web site:
\texttt{http://www.qe-forge.org/}.
The recommended place where to ask questions about installation
and usage of \qe, and to report bugs, is the \texttt{pw\_forum}
mailing list: \texttt{pw\_forum@pwscf.org}. Here you can receive
news about \qe\ and obtain help from the developers and from
knowledgeable users. Please read the guidelines for posting,
section \ref{SubSec:Guidelines}!
You have to be subscribed in order to post to the \texttt{pw\_forum}
list. NOTA BENE: only messages that appear to come from the
registered user's e-mail address, in its {\em exact form}, will be
accepted. Messages "waiting for moderator approval" are
automatically deleted with no further processing (sorry, too
much spam). In case of trouble, carefully check that your return
e-mail is the correct one (i.e. the one you used to subscribe).
Since \texttt{pw\_forum} averages $\sim 10$ messages a day, an alternative
low-traffic mailing list,\\
\texttt{pw\_users@pwscf.org}, is provided for
those interested only in \qe-related news, such as e.g. announcements
of new versions, tutorials, etc. You can subscribe (but not post) to
this list from the \qe\ web site (``Contacts'' section).
If you need to contact the developers for {\em specific} questions
about coding, proposals, offers of help, etc., send a message to the
developers' mailing list: user \texttt{q-e-developers}, address
\texttt{qe-forge.org}.
\subsubsection{Guidelines for posting to the mailing list}
\label{SubSec:Guidelines}
Life for subscribers of \texttt{pw\_forum} will be easier if everybody
complies with the following guidelines:
\begin{itemize}
\item Before posting, {\em please}: browse or search the archives --
links are available in the "Contacts" page of the \qe\ web site:\\
\texttt{http://www.quantum-espresso.org/contacts.php}. Most questions
are asked over and over again. Also: make an attempt to search the
available documentation, notably the FAQs and the User Guide.
The answer to most questions is already there.
\item Sign your post with your name and affiliation.
\item Choose a meaningful subject. Do not use "reply" to start a new
thread:
it will break the threading of messages that most mail readers
perform. In particular, do not use "reply" to a Digest!
\item Be short: no need to send 128 copies of the same error message just
because that is what came out of your 128-processor run. No need to
send the entire compilation log for a single error appearing at the end.
\item Avoid excessive or irrelevant quoting of previous messages. Your
message must be immediately visible and easily readable, not hidden
into a sea of quoted text.
\item Remember that even experts cannot guess where a problem lies in
the absence of sufficient information.
\item Remember that the mailing list is a voluntary endeavour: nobody is
entitled to an answer, even less to an immediate answer.
\item Finally, please note that the mailing list is not a replacement
for your own work, nor is it a replacement for your thesis director's work.
\end{itemize}
\subsection{Terms of use}
\qe\ is free software, released under the
GNU General Public License (see
\texttt{http://www.gnu.org/licenses/old-licenses/gpl-2.0.txt},
or the file \texttt{License} in the distribution).
We would greatly appreciate it if scientific work done using this code
contained an explicit acknowledgment and the following reference:
\begin{quote}
P. Giannozzi, S. Baroni, N. Bonini, M. Calandra, R. Car, C. Cavazzoni,
D. Ceresoli, G. L. Chiarotti, M. Cococcioni, I. Dabo, A. Dal Corso,
S. Fabris, G. Fratesi, S. de Gironcoli, R. Gebauer, U. Gerstmann,
C. Gougoussis, A. Kokalj, M. Lazzeri, L. Martin-Samos, N. Marzari,
F. Mauri, R. Mazzarello, S. Paolini, A. Pasquarello, L. Paulatto,
C. Sbraccia, S. Scandolo, G. Sclauzero, A. P. Seitsonen, A. Smogunov,
P. Umari, R. M. Wentzcovitch, J.Phys.:Condens.Matter 21, 395502 (2009),
http://arxiv.org/abs/0906.2569
\end{quote}
Note the form \qe\ for textual citations of the code.
Pseudopotentials should be cited as (for instance)
\begin{quote}
[ ] We used the pseudopotentials C.pbe-rrjkus.UPF
and O.pbe-vbc.UPF from\\
\texttt{http://www.quantum-espresso.org}.
\end{quote}
\section{Installation}
\subsection{Download}
Presently, \qe\ is only distributed in source form;
some precompiled executables (binary files) are provided only for
\texttt{PWgui}.
Stable releases of the \qe\ source package (current version
is \version) can be downloaded from this URL: \\
\texttt{http://www.quantum-espresso.org/download.php}.
Uncompress and unpack the core distribution using the command:
\begin{verbatim}
tar zxvf espresso-X.Y.Z.tar.gz
\end{verbatim}
(a hyphen before "zxvf" is optional) where \texttt{X.Y.Z} stands for the
version number. If your version of \texttt{tar}
doesn't recognize the "z" flag:
\begin{verbatim}
gunzip -c espresso-X.Y.Z.tar.gz | tar xvf -
\end{verbatim}
A directory \texttt{espresso-X.Y.Z/} will be created. Given the size
of the complete distribution, you may need to download more packages
and to unpack them following the same procedure (they will unpack into
the same directory). Plug-ins should instead be downloaded into
subdirectory \texttt{plugin/archive} but not unpacked or uncompressed:
command \texttt{make} will take care of this during installation.
Occasionally, patches for the current version, fixing some errors and bugs,
may be distributed as a "diff" file. In order to install a patch (for
instance):
\begin{verbatim}
cd espresso-X.Y.Z/
patch -p1 < /path/to/the/diff/file/patch-file.diff
\end{verbatim}
If more than one patch is present, they should be applied in the correct order.
Daily snapshots of the development version can be downloaded from the
developers' site \texttt{qe-forge.org}: follow the link ``Quantum ESPRESSO'',
then ``SCM''. Beware: the development version
is, well, under development: use at your own risk! The bravest
may access the development version via anonymous CVS
(Concurrent Version System): see the Developer Manual
(\texttt{Doc/developer\_man.pdf}), section ``Using CVS''.
The \qe\ distribution contains several directories. Some of them are
common to all packages:
\begin{tabular}{ll}
\texttt{Modules/} & source files for modules that are common to all programs\\
\texttt{include/} & files *.h included by fortran and C source files\\
\texttt{clib/} & external libraries written in C\\
\texttt{flib/} & external libraries written in Fortran\\
\texttt{iotk/ } & Input/Output Toolkit\\
\texttt{install/} & installation scripts and utilities\\
\texttt{pseudo/} & pseudopotential files used by examples\\
\texttt{upftools/}& converters to unified pseudopotential format (UPF)\\
\texttt{examples/}& sample input and output files\\
\texttt{Doc/} & general documentation\\
\end{tabular}
\\
while others are specific to a single package:
\begin{tabular}{ll}
\texttt{PW/} &\PWscf: source files for scf calculations (\pw.x)\\
\texttt{pwtools/} &\PWscf: source files for miscellaneous analysis programs\\
\texttt{tests/} &\PWscf: automated tests\\
\texttt{NEB/} &\texttt{PWneb}: source files for NEB calculations (\texttt{neb.x})\\
\texttt{PP/} &\PostProc: source files for post-processing of \pw.x\
data file\\
\texttt{PH/} &\PHonon: source files for phonon calculations (\ph.x)
and analysis\\
\texttt{Gamma/} &\PHonon: source files for Gamma-only phonon calculation
(\texttt{phcg.x})\\
\texttt{D3/} &\PHonon: source files for third-order derivative
calculations (\texttt{d3.x})\\
\texttt{PWCOND/} &\texttt{PWcond}: source files for conductance calculations
(\texttt{pwcond.x})\\
\texttt{vdW/} &\texttt{VdW}: source files for molecular polarizability
calculation at finite frequency\\
\texttt{CPV/} &\CP: source files for Car-Parrinello code (\cp.x)\\
\texttt{atomic/} &\texttt{atomic}: source files for the pseudopotential
generation package (\texttt{ld1.x})\\
\texttt{atomic\_doc/} &Documentation, tests and examples for \texttt{atomic}\\
\texttt{GUI/} & \texttt{PWGui}: Graphical User Interface\\
\end{tabular}
\subsection{Prerequisites}
\label{Sec:Installation}
To install \qe\ from source, you need first of all a minimal Unix
environment: basically, a command shell (e.g.,
bash or tcsh) and the utilities \make, \texttt{awk}, \texttt{sed}.
MS-Windows users need
to have Cygwin (a UNIX environment which runs under Windows) installed:
see \texttt{http://www.cygwin.com/}. Note that the scripts contained in the distribution
assume that the local language is set to the standard, i.e. "C"; other
settings
may break them. Use \texttt{export LC\_ALL=C} (sh/bash) or
\texttt{setenv LC\_ALL C} (csh/tcsh) to prevent any problem
when running scripts (including installation scripts).
Second, you need C and Fortran-95 compilers. For parallel
execution, you will also need MPI libraries and a ``parallel''
(i.e. MPI-aware) compiler. For massively parallel machines, or
for simple multicore parallelization, an OpenMP-aware compiler
and libraries are also required.
Big machines with
specialized hardware (e.g. IBM SP, CRAY, etc) typically have a
Fortran-95 compiler with MPI and OpenMP libraries bundled with
the software. Workstations or ``commodity'' machines, using PC
hardware, may or may not have the needed software. If not, you need
either to buy a commercial product (e.g. Portland) or to install
an open-source compiler like gfortran or g95.
Note that several commercial compilers are available free of charge
under some license for academic or personal usage (e.g. Intel, Sun).
\subsection{\configure}
To install the \qe\ source package, run the \configure\
script. This is actually a wrapper to the true \configure,
located in the \texttt{install/} subdirectory. \configure\
will (try to) detect compilers and libraries available on
your machine, and set up things accordingly. Presently it is expected
to work on most Linux 32- and 64-bit PCs (all Intel and AMD CPUs) and PC clusters, SGI Altix, IBM SP machines, NEC SX, Cray XT
machines, Mac OS X, MS-Windows PCs. It may work with
some assistance also on other architectures (see below).
Instructions for the impatient:
\begin{verbatim}
cd espresso-X.Y.Z/
./configure
make all
\end{verbatim}
Symlinks to executable programs will be placed in the
\texttt{bin/}
subdirectory. Note that both C and Fortran compilers must be in your execution
path, as specified in the PATH environment variable.
Additional instructions for special machines:
\begin{tabular}{ll}
\texttt{./configure ARCH=crayxt4r}& for CRAY XT machines \\
\texttt{./configure ARCH=necsx} & for NEC SX machines \\
\texttt{./configure ARCH=ppc64-mn}& PowerPC Linux + xlf (Marenostrum) \\
\texttt{./configure ARCH=ppc64-bg}& IBM BG/P (BlueGene)
\end{tabular}
\configure\ generates the following files:
\begin{tabular}{ll}
\texttt{install/make.sys} & compilation rules and flags (used by \texttt{Makefile})\\
\texttt{install/configure.msg} & a report of the configuration run (not needed for compilation)\\
\texttt{install/config.log} & detailed log of the configuration run (may be needed for debugging)\\
\texttt{include/fft\_defs.h} & defines fortran variable for C pointer (used only by FFTW)\\
\texttt{include/c\_defs.h} & defines C to fortran calling convention\\
& and a few more definitions used by C files\\
\end{tabular}\\
NOTA BENE: unlike previous versions, \configure\ no longer runs the
\texttt{makedeps.sh} shell script that updates dependencies. If you modify the
sources, run \texttt{./install/makedeps.sh} or type \texttt{make depend}
to update files \texttt{make.depend} in the various subdirectories.
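As an illustration, after modifying or adding source files you might refresh the dependencies and rebuild with something like the following (a minimal sketch, assuming you start from the top-level \texttt{espresso-X.Y.Z/} directory):
\begin{verbatim}
cd espresso-X.Y.Z/
./install/makedeps.sh     # or, equivalently: make depend
make all                  # recompile using the updated make.depend files
\end{verbatim}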
You should always be able to compile the \qe\ suite
of programs without having to edit any of the generated files. However you
may have to tune \configure\ by specifying appropriate environment variables
and/or command-line options. Usually the tricky part is to get external
libraries recognized and used: see Sec.\ref{Sec:Libraries}
for details and hints.
Environment variables may be set in any of these ways:
\begin{verbatim}
export VARIABLE=value; ./configure # sh, bash, ksh
setenv VARIABLE value; ./configure # csh, tcsh
./configure VARIABLE=value # any shell
\end{verbatim}
Some environment variables that are relevant to \configure\ are:
\begin{tabular}{ll}
\texttt{ARCH}& label identifying the machine type (see below)\\
\texttt{F90, F77, CC} &names of Fortran 95, Fortran 77, and C compilers\\
\texttt{MPIF90} & name of parallel Fortran 95 compiler (using MPI)\\
\texttt{CPP} & source file preprocessor (defaults to \$CC -E)\\
\texttt{LD} & linker (defaults to \$MPIF90)\\
\texttt{(C,F,F90,CPP,LD)FLAGS}& compilation/preprocessor/loader flags\\
\texttt{LIBDIRS}& extra directories where to search for libraries\\
\end{tabular}\\
For example, the following command line:
\begin{verbatim}
./configure MPIF90=mpf90 FFLAGS="-O2 -assume byterecl" \
CC=gcc CFLAGS=-O3 LDFLAGS=-static
\end{verbatim}
instructs \configure\ to use \texttt{mpf90} as Fortran 95 compiler
with flags \texttt{-O2 -assume byterecl}, \texttt{gcc} as C compiler with
flags \texttt{-O3}, and to link with flag \texttt{-static}.
Note that the value of \texttt{FFLAGS} must be quoted, because it contains
spaces. NOTA BENE: do not pass compiler names with the leading path
included. \texttt{F90=f90xyz} is ok, \texttt{F90=/path/to/f90xyz} is not.
Do not use
environment variables with \configure\ unless they are needed: try
\configure\ with no options as a first step.
If your machine type is unknown to \configure, you may use the
\texttt{ARCH}
variable to suggest an architecture among supported ones. Some large
parallel machines using a front-end (e.g. Cray XT) will actually
need it, or else \configure\ will correctly recognize the front-end
but not the specialized compilation environment of those
machines. In some cases, cross-compilation requires you to specify the target machine with the
\texttt{--host} option. This feature has not been extensively
tested, but we had at least one successful report (compilation
for NEC SX6 on a PC). Currently supported architectures are:\\
\begin{tabular}{ll}
\texttt{ia32}& Intel 32-bit machines (x86) running Linux\\
\texttt{ia64}& Intel 64-bit (Itanium) running Linux\\
\texttt{x86\_64}& Intel and AMD 64-bit running Linux - see note below\\
\texttt{aix}& IBM AIX machines\\
\texttt{solaris}& PCs running Sun Solaris\\
\texttt{sparc}& Sun SPARC machines\\
\texttt{crayxt4}& Cray XT4/5 machines\\
\texttt{macppc}& Apple PowerPC machines running Mac OS X\\
\texttt{mac686}& Apple Intel machines running Mac OS X\\
\texttt{cygwin}& MS-Windows PCs with Cygwin\\
\texttt{necsx}& NEC SX-6 and SX-8 machines\\
\texttt{ppc64}& Linux PowerPC machines, 64 bits\\
\texttt{ppc64-mn}&as above, with IBM xlf compiler\\
\texttt{ppc64-bg}&IBM BlueGene
\end{tabular}\\
{\em Note}: \texttt{x86\_64} replaces \texttt{amd64} since v.4.1.
Cray Unicos machines, SGI
machines with MIPS architecture, HP-Compaq Alphas are no longer supported
since v.\version.
Finally, \configure\ recognizes the following command-line options:\\
\begin{tabular}{ll}
\texttt{--enable-parallel}& compile for parallel execution if possible (default: yes)\\
\texttt{--enable-openmp}& compile for openmp execution if possible (default: no)\\
\texttt{--enable-shared}& use shared libraries if available (default: yes)\\
\texttt{--disable-wrappers}& disable C to fortran wrapper check (default: enabled)\\
\texttt{--enable-signals}& enable signal trapping (default: disabled)\\
\end{tabular}\\
and the following optional packages:\\
\begin{tabular}{ll}
\texttt{--with-internal-blas}& compile with internal BLAS (default: no)\\
\texttt{--with-internal-lapack}& compile with internal LAPACK (default: no)\\
\texttt{--with-scalapack}& use ScaLAPACK if available (default: yes)\\
\end{tabular}\\
If you want to modify the \configure\ script (advanced users only!),
see the Developer Manual.
\subsubsection{Manual configuration}
\label{SubSec:manconf}
If \configure\ stops before the end, and you don't find a way to fix
it, you have to write working \texttt{make.sys}, \texttt{include/fft\_defs.h}
and \texttt{include/c\_defs.h} files.
For the latter two files, follow the explanations in
\texttt{include/defs.h.README}.
If \configure\ has run till the end, you should only need to
edit \texttt{make.sys}. A few templates (each for a different
machine type)
are provided in the \texttt{install/} directory: they have names of the
form \texttt{Make.}{\em system}, where {\em system} is a string identifying the
architecture and compiler. The template used by \configure\ is also found
there as \texttt{make.sys.in} and contains explanations of the meaning
of the various variables. The difficult part will be to locate libraries.
Note that you will need to select appropriate preprocessing flags
in conjunction with the desired or available
libraries (e.g. you need to add \texttt{-D\_\_FFTW} to \texttt{DFLAGS}
if you want to link the internal FFTW). For a correct choice of preprocessing
flags, refer to the documentation in \texttt{include/defs.h.README}.
NOTA BENE: If you change any settings (e.g. preprocessing,
compilation flags)
after a previous (successful or failed) compilation, you must run
\texttt{make clean} before recompiling, unless you know exactly which
routines are affected by the changed settings and how to force their recompilation.
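For example, after changing preprocessing or compilation flags in \texttt{make.sys}, a safe (if not the fastest) way to rebuild is:
\begin{verbatim}
make clean
make all
\end{verbatim}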
\subsection{Libraries}
\label{Sec:Libraries}
\qe\ makes use of the following external libraries:
\begin{itemize}
\item BLAS (\texttt{http://www.netlib.org/blas/}) and
\item LAPACK (\texttt{http://www.netlib.org/lapack/}) for linear algebra
\item FFTW (\texttt{http://www.fftw.org/}) for Fast Fourier Transforms
\end{itemize}
A copy of the needed routines is provided with the distribution. However,
when available, optimized vendor-specific libraries should be used: this
often yields huge performance gains.
\paragraph{BLAS and LAPACK}
\qe\ can use the following architecture-specific replacements for BLAS and LAPACK:\\
\begin{quote}
MKL for Intel Linux PCs\\
ACML for AMD Linux PCs\\
ESSL for IBM machines\\
SCSL for SGI Altix\\
SUNperf for Sun
\end{quote}
If none of these is available, we suggest that you use the optimized ATLAS library: see \\
\texttt{http://math-atlas.sourceforge.net/}. Note that ATLAS is not
a complete replacement for LAPACK: it contains all of the BLAS, plus the
LU code, plus the full storage Cholesky code. Follow the instructions in the
ATLAS distributions to produce a full LAPACK replacement.
Sergei Lisenkov reported success and good performance with the optimized
BLAS by Kazushige Goto. They can be freely downloaded,
but not redistributed. See the "GotoBLAS2" item at\\
\texttt{http://www.tacc.utexas.edu/tacc-projects/}.
\paragraph{FFT}
\qe\ has an internal copy of an old FFTW version, and it
can use the following vendor-specific FFT libraries:
\begin{quote}
IBM ESSL\\
SGI SCSL\\
SUN sunperf\\
NEC ASL\\
AMD ACML
\end{quote}
\configure\ will first search for vendor-specific FFT libraries;
if none is found, it will search for an external FFTW v.3 library;
if none is found, it will fall back to the internal copy of FFTW.
If you have recent versions of MKL installed, you may try the
FFTW interface provided with MKL. You will have to compile them
(only sources are distributed with the MKL library)
and to modify file \texttt{make.sys} accordingly (MKL must be linked
{\em after} the FFTW-MKL interface).
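The following is a sketch only of the relevant \texttt{make.sys} line: the library names and paths are examples and depend on your MKL version and installation; the point is that the compiled FFTW-MKL interface must precede the MKL libraries proper on the link line.
\begin{verbatim}
# hypothetical names and paths -- adapt to your MKL installation
FFT_LIBS = /opt/intel/mkl/lib/libfftw3xf_intel.a \
           -L/opt/intel/mkl/lib -lmkl_intel_lp64 -lmkl_sequential -lmkl_core
\end{verbatim}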
\paragraph{MPI libraries}
MPI libraries are usually needed for parallel execution
(unless you are happy with OpenMP multicore parallelization).
On well-configured machines, \configure\ should find the appropriate
parallel compiler for you, and this in turn should find the appropriate
libraries. Since this often doesn't
happen, especially on PC clusters, see Sec.\ref{SubSec:LinuxPCMPI}.
\paragraph{Other libraries}
\qe\ can use the MASS vector math
library from IBM, if available (only on AIX).
\subsubsection{If optimized libraries are not found}
The \configure\ script attempts to find optimized libraries, but may fail
if they have been installed in non-standard places. You should examine
the final value of \texttt{BLAS\_LIBS, LAPACK\_LIBS, FFT\_LIBS, MPI\_LIBS} (if needed),
\texttt{MASS\_LIBS} (IBM only), either in the output of \configure\ or in the generated
\texttt{make.sys}, to check whether it found all the libraries that you intend to use.
If some library was not found, you can specify a list of directories to search
in the environment variable \texttt{LIBDIRS},
and rerun \configure; directories in the
list must be separated by spaces. For example:
\begin{verbatim}
./configure LIBDIRS="/opt/intel/mkl70/lib/32 /usr/lib/math"
\end{verbatim}
If this still fails, you may set some or all of the \texttt{*\_LIBS} variables manually
and retry. For example:
\begin{verbatim}
./configure BLAS_LIBS="-L/usr/lib/math -lf77blas -latlas_sse"
\end{verbatim}
Beware that in this case, \configure\ will blindly accept the specified value,
and won't do any extra search.
\subsection{Compilation}
There are a few adjustable parameters in \texttt{Modules/parameters.f90}.
The
present values will work for most cases. All other variables are dynamically
allocated: you do not need to recompile your code for a different system.
At your option, you may compile the complete \qe\ suite of programs
(with \texttt{make all}), or only some specific programs.
\texttt{make} with no arguments yields a list of valid compilation targets.
Here is a list:
\begin{itemize}
\item \texttt{make pw} produces \texttt{PW/pw.x} \\
\pw.x\ calculates electronic structure, structural optimization, molecular dynamics.
\item \texttt{make neb} produces the following codes in \texttt{NEB/}
for NEB calculations:
\begin{itemize}
\item \texttt{neb.x}: calculates reaction barriers and pathways using NEB.
\item \texttt{path\_int.x}: used by utility \texttt{path\_int.sh}
that generates, starting from a path (a set of images), a new one with a
different number of images. The initial and final points of the new
path can differ from those in the original one.
\end{itemize}
\item \texttt{make ph} produces the following codes in \texttt{PH/}
for phonon calculations:
\begin{itemize}
\item \ph.x\ : Calculates phonon frequencies and displacement patterns,
dielectric tensors, effective charges (uses data produced by \pw.x\ ).
\item \texttt{dynmat.x}: applies various kinds of Acoustic Sum Rule (ASR),
calculates LO-TO splitting at ${\bf q} = 0$ in insulators, IR and Raman
cross sections (if the coefficients have been properly calculated),
from the dynamical matrix produced by \ph.x\
\item \texttt{q2r.x}: calculates Interatomic Force Constants (IFC) in real space
from dynamical matrices produced by \ph.x on a regular {\bf q}-grid
\item \texttt{matdyn.x}: produces phonon frequencies at a generic wave vector
using the IFC file calculated by \texttt{q2r.x}; may also calculate phonon DOS,
the electron-phonon coefficient $\lambda$, the function $\alpha^2F(\omega)$
\item \texttt{lambda.x}: also calculates $\lambda$ and $\alpha^2F(\omega)$,
plus $T_c$ for superconductivity using the McMillan formula
\end{itemize}
\item \texttt{make d3} produces \texttt{D3/d3.x}:
calculates anharmonic phonon lifetimes (third-order derivatives
of the energy), using data produced by \pw.x and \ph.x (USPP
and PAW not supported).
\item \texttt{make gamma} produces \texttt{Gamma/phcg.x}:
a version of \ph.x that calculates phonons at ${\bf q} = 0$ using
conjugate-gradient minimization of the density functional expanded to
second-order. Only the $\Gamma$ (${\bf k} = 0$) point is used for Brillouin zone
integration. It is faster and takes less memory than \ph.x, but does
not support USPP and PAW.
\item \texttt{make pp} produces several codes for data postprocessing, in
\texttt{PP/} (see list below).
\item \texttt{make tools} produces several utility programs in \texttt{pwtools/} (see
list below).
\item \texttt{make pwcond} produces \texttt{PWCOND/pwcond.x}
for ballistic conductance calculations.
\item \texttt{make pwall} produces all of the above.
\item \texttt{make ld1} produces code \texttt{atomic/ld1.x} for pseudopotential
generation (see specific documentation in \texttt{atomic\_doc/}).
\item \texttt{make upf} produces utilities for pseudopotential conversion in
directory \texttt{upftools/}.
\item \texttt{make cp} produces the Car-Parrinello code \texttt{CPV/cp.x}
and the postprocessing code \texttt{CPV/cppp.x}.
\item \texttt{make all} produces all of the above.
\end{itemize}
For the setup of the GUI, refer to the \texttt{PWgui-X.Y.Z/INSTALL} file, where
X.Y.Z stands for the version number of the GUI (should be the same as the
general version number). If you are using the CVS sources, see
the \texttt{GUI/README} file instead.
The codes for data postprocessing in \texttt{PP/} are:
\begin{itemize}
\item \texttt{pp.x} extracts the specified data from files produced by \pw.x,
prepares data for plotting by writing them into formats that can be
read by several plotting programs.
\item \texttt{bands.x} extracts and reorders eigenvalues from files produced by
\pw.x for band structure plotting
\item \texttt{projwfc.x} calculates projections of wavefunction over atomic
orbitals, performs L\"owdin population analysis and calculates
projected density of states. These can be summed using auxiliary
code \texttt{sumpdos.x}.
\item \texttt{plotrho.x} produces PostScript 2-d contour plots
\item \texttt{plotband.x} reads the output of \texttt{bands.x}, produces
PostScript plots of the band structure
\item \texttt{average.x} calculates planar averages of quantities produced by
\texttt{pp.x} (potentials, charge, magnetization densities,...)
\item \texttt{dos.x} calculates electronic Density of States (DOS)
\item \texttt{epsilon.x} calculates RPA frequency-dependent complex dielectric function
\item \texttt{pw2wannier.x}: interface with Wannier90 package
\item \texttt{wannier\_ham.x}: generate a model Hamiltonian
in Wannier functions basis
\item \texttt{pmw.x} generates Poor Man's Wannier functions, to be used in
DFT+U calculations
\item \texttt{pw2casino.x}: interface with CASINO code for Quantum Monte Carlo
calculation \\
(\texttt{http://www.tcm.phy.cam.ac.uk/\~{}mdt26/casino.html}).
See the header of \texttt{PP/pw2casino.f90} for instructions on how to use it.
\end{itemize}
Note about Bader's analysis: on
\texttt{http://theory.cm.utexas.edu/bader/} one can find software that performs
Bader's analysis starting from the charge density on a regular grid. The required
"cube" format can be produced by \qe\ using \texttt{pp.x} (info by G. Lapenna
who has successfully used this technique, but adds: ``Problems occur with polar
X-H bonds or in all cases where the zero-flux of density comes too close to
atoms described with pseudo-potentials''). This code should perform
decomposition into Voronoi polyhedra as well, in place of obsolete
code \texttt{voronoy.x} (removed from distribution since v.4.2).
The utility programs in \texttt{pwtools/} are:
\begin{itemize}
\item \texttt{dist.x} calculates distances and angles between atoms in a cell,
taking into account periodicity
\item \texttt{ev.x} fits energy-vs-volume data to an equation of state
\item \texttt{kpoints.x} produces lists of k-points
\item \texttt{pwi2xsf.sh}, \texttt{pwo2xsf.sh} process respectively input and output
files (not data files!) for \pw.x and produce an XSF-formatted file
suitable for plotting with XCrySDen, a powerful crystalline and
molecular structure visualization program
( \texttt{http://www.xcrysden.org/}). BEWARE: the \texttt{pwi2xsf.sh} shell script
requires the \texttt{pwi2xsf.x} executables to be located somewhere in your PATH.
\item \texttt{band\_plot.x}: undocumented and possibly obsolete
\item \texttt{bs.awk}, \texttt{mv.awk} are scripts that process the output of \pw.x (not
data files!). Usage:
\begin{verbatim}
awk -f bs.awk < my-pw-file > myfile.bs
awk -f mv.awk < my-pw-file > myfile.mv
\end{verbatim}
The files so produced are suitable for use with \texttt{xbs}, a very simple
X-windows utility to display molecules, available at:\\
\texttt{http://www.ccl.net/cca/software/X-WINDOW/xbsa/README.shtml}
\item \texttt{kvecs\_FS.x}, \texttt{bands\_FS.x}: utilities for Fermi Surface plotting
using XCrySDen
\end{itemize}
\paragraph{Other utilities}
\texttt{VdW/} contains the sources for the calculation of the molecular
polarizability at finite (imaginary) frequency, using the approximate
Thomas-Fermi + von Weizs\"acker scheme, contributed by H.-V. Nguyen
(SISSA and Hanoi University). Compile with \texttt{make vdw}; the executable
is \texttt{VdW/vdw.x}. There is no documentation yet, but an example is
available in \texttt{examples/example34}.
\subsection{Running examples}
\label{SubSec:Examples}
As a final check that compilation was successful, you may want to run some or
all of the examples. You should first of all ensure that you have downloaded
and correctly unpacked the package containing examples (since v.4.1 in a
separate package):
\begin{verbatim}
tar -zxvf /path/to/package/espresso-X.Y.Z-examples.tar.gz
\end{verbatim}
will unpack several subdirectories into \texttt{espresso-X.Y.Z/}.
There are two different types of examples:
\begin{itemize}
\item automated tests (in directories \texttt{tests/}
and \texttt{cptests/}). Quick and exhaustive, but not
meant to be realistic, implemented only for \pw.x and \cp.x.
\item examples (in directory \texttt{examples/}).
Cover many more programs and features of the \qe\ distribution,
but they require manual inspection of the results.
\end{itemize}
Let us first consider the tests. Automated tests for \pw.x\ are in directory
\texttt{tests/}. File \texttt{tests/README} contains a list of what is tested.
To run the tests, follow the directions in the header of file
\texttt{check\_pw.x.j}, editing variables PARA\_PREFIX and PARA\_POSTFIX
if needed (see below). The same applies to \cp.x, this time in directory
\texttt{cptests/}.
Let us now consider examples. A list of examples and of what each example
does is contained in \texttt{examples/README}.
For details, see the \texttt{README} file in each example's directory.
If you find that any relevant feature isn't being tested, please contact us
(or even better, write and send us a new example yourself !).
To run the examples, you should follow this procedure:
\begin{enumerate}
\item Go to the \texttt{examples/} directory and edit the
\texttt{environment\_variables} file, setting the following variables as needed:
\begin{quote}
BIN\_DIR: directory where executables reside\\
PSEUDO\_DIR: directory where pseudopotential files reside\\
TMP\_DIR: directory to be used as temporary storage area
\end{quote}
The default values of BIN\_DIR and PSEUDO\_DIR should be fine,
unless you have installed things in nonstandard places. TMP\_DIR
must be a directory to which you have read and write access, with
enough available space to host the temporary files produced by the
example runs, and possibly offering high I/O performance (i.e., don't
use an NFS-mounted directory). NOTA BENE: do not use a
directory containing other data: the examples will clean it!
A minimal command sketch covering the whole procedure is given after this list.
\item If you have compiled the parallel version of \qe\ (this
is the default if parallel libraries are detected), you will usually
have to specify a driver program (such as \texttt{mpirun} or \texttt{mpiexec})
and the number of processors: see Sec.\ref{SubSec:para} for
details. In order to do that, edit again the \texttt{environment\_variables}
file
and set the PARA\_PREFIX and PARA\_POSTFIX variables as needed.
Parallel executables will be run by a command like this:
\begin{verbatim}
$PARA_PREFIX pw.x $PARA_POSTFIX < file.in > file.out
\end{verbatim}
For example, if the command line is like this (as for an IBM SP):
\begin{verbatim}
poe pw.x -procs 4 < file.in > file.out
\end{verbatim}
you should set PARA\_PREFIX="poe", PARA\_POSTFIX="-procs
4". Furthermore, if your machine does not support interactive use, you
must run the commands specified below through the batch queuing
system installed on that machine. Ask your system administrator for
instructions.
\item To run a single example, go to the corresponding directory (e.g.
\texttt{examples/example01}) and execute:
\begin{verbatim}
./run_example
\end{verbatim}
This will create a subdirectory results, containing the input and
output files generated by the calculation. Some examples take only a
few seconds to run, while others may require several minutes depending
on your system. To run all the examples in one go, execute:
\begin{verbatim}
./run_all_examples
\end{verbatim}
from the examples directory. On a single-processor machine, this
typically takes a few hours. The \texttt{make\_clean} script cleans the
examples tree, by removing all the results subdirectories. However, if
additional subdirectories have been created, they aren't deleted.
\item In each example's directory, the \texttt{reference/} subdirectory contains
verified output files, that you can check your results against. They
were generated on a Linux PC using the Intel compiler. On different
architectures the precise numbers could be slightly different, in
particular if different FFT dimensions are automatically selected. For
this reason, a plain diff of your results against the reference data
doesn't work, or at least, it requires human inspection of the
results.
\end{enumerate}
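The following is a minimal sketch of the above procedure for a serial run (the paths and the chosen example are illustrative only; edit \texttt{environment\_variables} according to your setup):
\begin{verbatim}
cd espresso-X.Y.Z/examples
# edit environment_variables: set BIN_DIR, PSEUDO_DIR, TMP_DIR
#   (and PARA_PREFIX, PARA_POSTFIX for parallel execution)
cd example01
./run_example
# compare the files in results/ with those in reference/
\end{verbatim}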
\subsection{Installation tricks and problems}
\subsubsection{All architectures}
Working Fortran-95 and C compilers are needed in order
to compile \qe. Most ``Fortran-90'' compilers actually
implement the Fortran-95 standard, but older versions
may not be Fortran-95 compliant. Moreover,
C and Fortran compilers must be in your PATH.
If \configure\ says that you have no working compiler, well,
you have no working compiler, at least not in your PATH, and
not among those recognized by \configure.
If you get {\em Compiler Internal Error} or similar messages: your
compiler version is buggy. Try to lower the optimization level, or to
remove optimization just for the routine that has problems. If it
doesn't work, or if you experience weird problems at run time, try to
install patches for your version of the compiler (most vendors release
at least a few patches for free), or to upgrade to a more recent
compiler version.
If you get error messages at the loading phase that look like
{\em file XYZ.o: unknown / not recognized / invalid / wrong
file type / file format / module version},
one of the following things has happened:
\begin{enumerate}
\item you have leftover object files from a compilation with another
compiler: run \texttt{make clean} and recompile.
\item \make\ did not stop at the first compilation error (it may
happen in some software configurations). Remove the file *.o
that triggers the error message, recompile, look for a
compilation error.
\end{enumerate}
If many symbols are missing in the loading phase: you did not specify the
location of all needed libraries (LAPACK, BLAS, FFTW, machine-specific
optimized libraries), in the needed order.
If only symbols from \texttt{clib/} are missing, verify that
you have the correct C-to-Fortran bindings, defined in
\texttt{include/c\_defs.h}.
Note that \qe\ is self-contained (with the exception of MPI libraries for
parallel compilation): if system libraries are missing, the problem is in
your compiler/library combination or in their usage, not in \qe.
If you get mysterious errors in the provided tests and examples:
your compiler, or your mathematical libraries, or MPI libraries,
or a combination thereof, is very likely buggy. Although the
presence of subtle bugs in \qe\ that are not revealed during
the testing phase can never be ruled out, it is very unlikely
that this happens on the provided tests and examples.
\subsubsection{Cray XT machines}
Use \texttt{./configure ARCH=crayxt4} or else \configure\ will
not recognize the Cray-specific software environment. Older Cray
machines: T3D, T3E, X1, are no longer supported.
\subsubsection{IBM AIX}
On IBM machines with ESSL libraries installed, there is a
potential conflict between a few LAPACK routines that are also part of ESSL,
but with a different calling sequence. The appearance of run-time errors like {\em
ON ENTRY TO ZHPEV PARAMETER NUMBER 1 HAD AN ILLEGAL VALUE}
is a signal that you are calling the bad routine. If you have defined
\texttt{-D\_\_ESSL} you should load ESSL before LAPACK: see
variable LAPACK\_LIBS in make.sys.
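A sketch of what this may look like in \texttt{make.sys} follows; the library names and paths are examples only and must be adapted to your AIX installation:
\begin{verbatim}
# ESSL must come before the LAPACK library (path below is hypothetical)
LAPACK_LIBS = -lessl /path/to/liblapack.a
\end{verbatim}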
\subsubsection{IBM BlueGene}
The current \configure\ is tested and works only on the machine at
J\"ulich. For other sites, you should try something like
\begin{verbatim}
./configure ARCH=ppc64-bg BLAS_LIBS=... LAPACK_LIBS=... \
    SCALAPACK_DIR=... BLACS_DIR=...
\end{verbatim}
where the various *\_LIBS and *\_DIR "suggest" where the various libraries
are located.
\subsubsection{Linux PC}
Both AMD and Intel CPUs, 32-bit and 64-bit, are supported and work,
either in 32-bit emulation or in 64-bit mode. 64-bit executables
can address a much larger memory space than 32-bit executables, but
there is no gain in speed.
Beware: the default integer type on 64-bit machines is typically
32 bits long. You should be able to use 64-bit integers as well,
but it will not give you any advantage and you may run into trouble.
Currently the following compilers are supported by \configure:
Intel (ifort), Portland (pgf90), g95, gfortran, Pathscale (pathf95),
Sun Studio (sunf95), AMD Open64 (openf95). The ordering approximately
reflects the quality of support. Both Intel MKL and AMD acml mathematical
libraries are supported. Some combinations of compilers and of libraries
may however require manual editing of \texttt{make.sys}.
It is usually convenient to create semi-statically linked executables (with only
libc, libm, libpthread dynamically linked). If you want to produce a binary
that runs on different machines, compile it on the oldest machine you have
(i.e. the one with the oldest version of the operating system).
If you get errors like {\em IPO Error: unresolved : \_\_svml\_cos2}
at the linking stage, your compiler is optimized to use the SSE
version of sine, cosine etc. contained in the SVML library. Append
\texttt{-lsvml} to the list of libraries in your \texttt{make.sys} file (info by Axel
Kohlmeyer, oct.2007).
\paragraph{Linux PCs with Portland compiler (pgf90)}
\qe\ does not work reliably, or not at all, with many old
versions ($< 6.1$) of the Portland Group compiler (pgf90).
Use the latest version of each
release of the compiler, with patches if available (see
the Portland Group web site, \texttt{http://www.pgroup.com/}).
\paragraph{Linux PCs with Pathscale compiler}
Version 2.99 of the Pathscale EKO compiler (web site
\texttt{http://www.pathscale.com/})
works and is recognized by
\configure, but the preprocessing command, \texttt{pathcc -E},
causes a mysterious error in compilation of iotk and should be replaced by
\begin{verbatim}
/lib/cpp -P --traditional
\end{verbatim}
The MVAPICH parallel environment with Pathscale compilers also works.
(info by Paolo Giannozzi, July 2008)
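Since CPP is one of the environment variables recognized by \configure\ (see above), one way to apply this replacement, for example, is at configuration time:
\begin{verbatim}
./configure CPP="/lib/cpp -P --traditional"
\end{verbatim}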
\paragraph{Linux PCs with gfortran}
gfortran v.4.1.2 and later are supported. Earlier gfortran versions used to produce nonfunctional phonon executables (segmentation faults and the like), but more recent versions should be fine.
If you experience problems in reading files produced by previous versions
of \qe: ``gfortran used 64-bit record markers to allow writing of records
larger than 2 GB. Before with 32-bit record markers only records $<$2GB
could be written. However, this caused problems with older files and
inter-compiler operability. This was solved in GCC 4.2 by using 32-bit
record markers but such that one can still store $>$2GB records (following
the implementation of Intel). Thus this issue should be gone. See 4.2
release notes (item ``Fortran") at
\texttt{http://gcc.gnu.org/gcc-4.2/changes.html}."
(Info by Tobias Burnus, March 2010).
``Using gfortran v.4.4 (after May 27, 2009) and 4.5 (after May 5, 2009) can
produce wrong results, unless the environment variable
GFORTRAN\_UNBUFFERED\_ALL=1 is set. Newer 4.4/4.5 versions
(later than April 2010) should be OK. See\\
\texttt{http://gcc.gnu.org/bugzilla/show\_bug.cgi?id=43551}."
(Info by Tobias Burnus, March 2010).
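For example, depending on your shell:
\begin{verbatim}
export GFORTRAN_UNBUFFERED_ALL=1   # sh/bash/ksh
setenv GFORTRAN_UNBUFFERED_ALL 1   # csh/tcsh
\end{verbatim}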
\paragraph{Linux PCs with g95}
g95 v.0.91 and later versions (\texttt{http://www.g95.org}) work.
The executables it produces are however slower (let us say 20\% or so)
than those produced by gfortran, which in turn are slower
(by another 20\% or so) than those produced by ifort.
\paragraph{Linux PCs with Sun Studio compiler}
``The Sun Studio compiler, sunf95, is free (web site:
\texttt{http://developers.sun.com/sunstudio/}) and comes
with a set of algebra libraries that can be used in place of the slow
built-in libraries. It also supports OpenMP, which g95 does not. On the
other hand, it is a pain to compile MPI with it. Furthermore the most
recent version has a terrible bug that totally miscompiles the iotk
input/output library (you'll have to compile it with reduced optimization).''
(info by Lorenzo Paulatto, March 2010).
\paragraph{Linux PCs with AMD Open64 suite}
The AMD Open64 compiler suite, openf95 (web site:
\texttt{http://developer.amd.com/cpu/open64/pages/default.aspx})
can be freely downloaded from the AMD site.
It is recognized by \configure\ but little tested. It sort of works
but it fails to pass several tests.
(info by Paolo Giannozzi, March 2010).
\paragraph{Linux PCs with Intel compiler (ifort)}
The Intel compiler, ifort, is available for free for personal
usage (\texttt{http://software.intel.com/}). It seems to produce the fastest executables,
at least on Intel CPUs, but not all versions work as expected.
ifort versions $<9.1$ are not recommended, due to the presence of subtle
and insidious bugs. In case of trouble, update your version with
the most recent patches,
available via Intel Premier support (registration free of charge for Linux):
\texttt{http://software.intel.com/en-us/articles/intel-software-developer-support}.
If \configure\ doesn't find the compiler, or if you get
{\em Error loading shared libraries} at run time, you may have
forgotten to execute the script that
sets up the correct PATH and library path. Unless your system manager has
done this for you, you should execute the appropriate script -- located in
the directory containing the compiler executable -- in your
initialization files. Consult the documentation provided by Intel.
The warning: {\em feupdateenv is not implemented and will always fail},
showing up in recent versions, can be safely ignored.
Since each major release of ifort
differs a lot from the previous one, compiled objects from different
releases may be incompatible and should not be mixed.
{\bf ifort v.12}: release 12.0.0 miscompiles iotk, leading to
mysterious errors when reading data files. Workaround: increasing
the parameter BLOCKSIZE to e.g. 131072*1024 when opening files in
\texttt{iotk/src/iotk\_files.f90} seems to work. (info by Lorenzo Paulatto,
Nov. 2010)
{\bf ifort v.11}: Segmentation faults were reported for the combination
ifort 11.0.081, MKL 10.1.1.019, OpenMP 1.3.3. The problem disappeared
with ifort 11.1.056 and MKL 10.2.2.025 (Carlo Nervi, Oct. 2009).
{\bf ifort v.10}: On 64-bit AMD CPUs, at least some versions of ifort 10.1
miscompile subroutine \texttt{write\_rho\_xml} in
\texttt{Module/xml\_io\_base.f90} with -O2
optimization. Using -O1 instead solves the problem (info by Carlo
Cavazzoni, March 2008).
"The intel compiler version 10.1.008 miscompiles a lot of codes (I have proof
for CP2K and CPMD) and needs to be updated in any case" (info by Axel
Kohlmeyer, May 2008).
{\bf ifort v.9}: The latest (July 2006) 32-bit version of ifort 9.1
works. Earlier versions yielded {\em Compiler Internal Error}.
\paragraph{Linux PCs with MKL libraries}
On Intel CPUs it is very convenient to use Intel MKL libraries. They can
also be used for AMD CPUs, selecting the appropriate machine-optimized
libraries, and also together with non-Intel compilers. Note however
that recent versions of MKL (10.2 and following) do not perform
well on AMD machines.
\configure\ should recognize properly installed MKL libraries.
By default the non-threaded version of MKL is linked, unless option
\texttt{configure --with-openmp} is specified. In case of trouble,
refer to the following web page to find the correct way to link MKL:\\
\texttt{http://software.intel.com/en-us/articles/intel-mkl-link-line-advisor/}.
MKL contains optimized FFT routines and a FFTW interface, to be separately
compiled. For 64-bit Intel Core2 processors, they are slightly faster than
FFTW (MKL v.10, FFTW v.3 fortran interface, reported by P. Giannozzi,
November 2008).
For parallel (MPI) execution on multiprocessor (SMP) machines, set the
environmental variable OMP\_NUM\_THREADS to 1 unless you know what you
are doing. See Sec.\ref{Sec:para} for more info on this
and on the difference between MPI and OpenMP parallelization.
\paragraph{Linux PCs with ACML libraries}
For AMD CPUs, especially recent ones, you may find it convenient to
link the AMD ACML libraries (freely downloadable from the AMD web site).
\configure\ should recognize properly installed acml libraries,
together with the compilers most frequently used on AMD systems:
pgf90, pathscale, openf95, sunf95.
\subsubsection{Linux PC clusters with MPI}
\label{SubSec:LinuxPCMPI}
PC clusters running some version of MPI are a very popular
computational platform nowadays. \qe\ is known to work
with at least two of the major MPI implementations (MPICH, LAM-MPI),
as well as with the newer MPICH2 and OpenMPI implementations.
\configure\ should automatically recognize a properly installed
parallel environment and prepare for parallel compilation.
Unfortunately this does not always happen. In fact:
\begin{itemize}
\item \configure\ tries to locate a parallel compiler in a logical
place with a logical name, but if it has a strange name or is
located in an unusual place, you will have to instruct \configure\
to find it. Note that in many PC clusters (Beowulf), there is no
parallel Fortran-95 compiler in default installations: you have to
configure an appropriate script, such as mpif90.
\item \configure\ tries to locate libraries (both mathematical and
parallel libraries) in the usual places with usual names, but if
they have strange names or strange locations, you will have to
rename/move them, or to instruct \configure\ to find them. If MPI
libraries are not found,
parallel compilation is disabled.
\item \configure\ tests that the compiler and the libraries are
compatible (i.e. the compiler may link the libraries without
conflicts and without missing symbols). If they aren't and the
compilation fails, \configure\ will revert to serial compilation.
\end{itemize}
Apart from such problems, \qe\ compiles and works on all non-buggy, properly
configured hardware and software combinations. You may have to
recompile MPI libraries: not all MPI installations contain support for
the fortran-90 compiler of your choice (or for any fortran-90 compiler
at all!). Useful step-by-step instructions for MPI compilation can be
found in the following post by Javier Antonio Montoya:\\
\texttt{http://www.democritos.it/pipermail/pw\_forum/2008-April/008818.htm}.
If \qe\ does not work for some reason on a PC cluster,
first check whether it works in serial execution. A frequent problem with parallel
execution is that \qe\ does not read from standard input,
due to the configuration of MPI libraries: see Sec.\ref{SubSec:para}.
If you are dissatisfied with the performances in parallel execution,
see Sec.\ref{Sec:para} and in particular Sec.\ref{SubSec:badpara}.
See also the following post from Axel Kohlmeyer:\\
\texttt{http://www.democritos.it/pipermail/pw\_forum/2008-April/008796.html}
\subsubsection{Intel Mac OS X}
Newer Mac OS-X machines (10.4 and later) with Intel CPUs are supported
by \configure,
with gcc4+g95, gfortran, and the Intel compiler ifort with MKL libraries.
Parallel compilation with OpenMPI also works.
\paragraph{Intel Mac OS X with ifort}
"Uninstall darwin ports, fink and developer tools. The presence of all of
those at the same time generates many spooky events in the compilation
procedure. I installed just the developer tools from apple, the intel
fortran compiler and everything went on great" (Info by Riccardo Sabatini,
Nov. 2007)
\paragraph{Intel Mac OS X 10.4 with g95 and gfortran}
An updated version of Developer Tools (XCode 2.4.1 or 2.5), that can be
downloaded from Apple, may be needed. Some tests fail with mysterious
errors, which disappear if
Fortran BLAS libraries are linked instead of the system Atlas libraries. Use:
\begin{verbatim}
BLAS_LIBS_SWITCH = internal
BLAS_LIBS = /path/to/espresso/BLAS/blas.a -latlas
\end{verbatim}
(Info by Paolo Giannozzi, jan.2008, updated April 2010)
\paragraph{Detailed installation instructions for Mac OS X 10.6}
(Instructions for 10.6.3 by Osman Baris Malcioglu, tested as of May 2010)
Summary for the hasty:
\begin{itemize}
\item GNU: install the macports compilers, install an MPI environment,
then configure \qe\ using
\begin{verbatim}
./configure CC=gcc-mp-4.3 CPP=cpp-mp-4.3 CXX=g++-mp-4.3 F77=g95 FC=g95
\end{verbatim}
\item Intel: use a version $>$11.1.088, use the 32-bit compilers,
install an MPI environment, install the macports-provided cpp (optional),
then configure \qe\ using
\begin{verbatim}
./configure CC=icc CXX=icpc F77=ifort F90=ifort FC=ifort CPP=cpp-mp-4.3
\end{verbatim}
\end{itemize}
Compilation with GNU compilers:
The following instructions use the macports version of the GNU compilers, due to
issues in mixing GNU-supplied Fortran compilers with the Apple-modified GNU
compiler collection. For more information about macports, please refer to:
\texttt{http://www.macports.org/}
First install necessary compilers from macports
\begin{verbatim}
port install gcc43
port install g95
\end{verbatim}
The Apple-supplied MPI environment has to be overridden, since there is now
a new set of compilers (the Apple-provided mpif90 is just an empty
placeholder, since Apple does not provide Fortran compilers). I have used
OpenMPI in this case. The recommended minimum configuration line is:
\begin{verbatim}
./configure CC=gcc-mp-4.3 CPP=cpp-mp-4.3 CXX=g++-mp-4.3 F77=g95 FC=g95
\end{verbatim}
Of course, the installation directory should be set accordingly if a
multiple-compiler environment is desired. The default installation directory of
OpenMPI permanently overwrites the Apple-supplied MPI!\\
Next step is \qe\ itself. Sadly, the Apple supplied optimized BLAS/LAPACK
libraries tend to misbehave under different tests, and it is much safer to
use internal libraries. The minimum recommended configuration
line is (presuming the environment is set correctly):
\begin{verbatim}
./configure CC=gcc-mp-4.3 CXX=g++-mp-4.3 F77=g95 F90=g95 FC=g95 CPP=cpp-mp-4.3 --with-internal-blas --with-internal-lapack
\end{verbatim}
Compilation with Intel compilers:
Newer versions of the Intel compiler ($>$11.1.067) support Mac OS X 10.6, and furthermore
they are bundled with Intel MKL. 32-bit binaries obtained using 11.1.088 have been tested
and no problems have been encountered so far. Sadly, as of 11.1.088 the 64-bit binaries
misbehave under some tests. Any attempt to compile a 64-bit binary using a version
$<$11.1.088 will result in very strange compilation errors.
As in the previous section, I recommend installing the macports compiler suite.
First, make sure that you are using the 32 bit version of the compilers,
i.e.
\begin{verbatim}
. /opt/intel/Compiler/11.1/088/bin/ifortvars.sh ia32
\end{verbatim}
\begin{verbatim}
. /opt/intel/Compiler/11.1/088/bin/iccvars.sh ia32
\end{verbatim}
will set the environment for 32 bit compilation in my case.
Then, the MPI environment has to be set up for the Intel compilers, as in the
previous section.
The recommended configuration line for \qe\ is:
\begin{verbatim}
./configure CC=icc CXX=icpc F77=ifort F90=ifort FC=ifort CPP=cpp-mp-4.3
\end{verbatim}
MKL libraries will be detected automatically if they are in their default locations.
Otherwise, mklvars32 has to be sourced before the configuration script.
Security issues:
Mac OS X 10.6 comes with a disabled firewall. Setting up an ipfw-based firewall
is recommended. Free and open-source GUIs such as "WaterRoof" and "NoobProof"
are available and may help you in the process.
\subsubsection{SGI, Alpha}
SGI Mips machines (e.g. Origin) and HP-Compaq Alpha machines are
no longer supported since v.4.2.
\newpage
\section{Parallelism}
\label{Sec:para}
\subsection{Understanding Parallelism}
Two different parallelization paradigms are currently implemented
in \qe:
\begin{enumerate}
\item {\em Message-Passing (MPI)}. A copy of the executable runs
on each CPU; each copy lives in a different world, with its own
private set of data, and communicates with other executables only
via calls to MPI libraries. MPI parallelization requires compilation
for parallel execution, linking with MPI libraries, execution using
a launcher program (depending upon the specific machine). The number of CPUs used
is specified at run-time either as an option to the launcher or
by the batch queue system.
\item {\em OpenMP}. A single executable spawns subprocesses
(threads) that perform in parallel specific tasks.
OpenMP can be implemented via compiler directives ({\em explicit}
OpenMP) or via {\em multithreading} libraries ({\em library} OpenMP).
Explicit OpenMP requires compilation for OpenMP execution;
library OpenMP requires only linking to a multithreading
version of mathematical libraries, e.g.:
ESSLSMP, ACML\_MP, MKL (the latter is natively multithreading).
The number of threads is specified at run-time in the environment
variable OMP\_NUM\_THREADS.
\end{enumerate}
MPI is the well-established, general-purpose parallelization.
In \qe\ several parallelization levels, specified at run-time
via command-line options to the executable, are implemented
with MPI. This is your first choice for execution on a parallel
machine.
Library OpenMP is a low-effort parallelization suitable for
multicore CPUs. Its effectiveness relies upon the quality of
the multithreading libraries and the availability of
multithreading FFTs. If you are using MKL,\footnote{Beware:
MKL v.10.2.2 has a buggy \texttt{dsyev} yielding wrong results
with more than one thread; fixed in v.10.2.4}
you may want to select FFTW3 (set \texttt{CPPFLAGS=-D\_\_FFTW3...}
in \texttt{make.sys}) and to link with the MKL interface to FFTW3.
You will get a decent speedup ($\sim 25$\%) on two cores.
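As a rough, purely indicative sketch (the exact preprocessing flags, paths and
library names depend on your compiler and MKL version; \texttt{/path/to/mkl}
is a placeholder), the relevant lines of \texttt{make.sys} could look like:
\begin{verbatim}
DFLAGS   = -D__INTEL -D__FFTW3 -D__MPI -D__PARA
FFT_LIBS = /path/to/mkl/interfaces/fftw3xf/libfftw3xf_intel.a
\end{verbatim}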
Explicit OpenMP is a very recent addition, still at an
experimental stage, devised to increase scalability on
large multicore parallel machines. Explicit OpenMP is
devised to be run together with MPI and also together
with multithreaded libraries. BEWARE: you have to be VERY
careful to prevent conflicts between the various kinds of
parallelization. If you don't know how to run MPI processes
and OpenMP threads in a controlled manner, forget about mixed
OpenMP-MPI parallelization.
\subsection{Running on parallel machines}
\label{SubSec:para}
Parallel execution is strongly system- and installation-dependent.
Typically one has to specify:
\begin{enumerate}
\item a launcher program (not always needed),
such as \texttt{poe}, \texttt{mpirun}, \texttt{mpiexec},
with the appropriate options (if any);
\item the number of processors, typically as an option to the launcher
program, but in some cases to be specified after the name of the
program to be
executed;
\item the program to be executed, with the proper path if needed: for
instance, \pw.x, or \texttt{./pw.x}, or \texttt{\$HOME/bin/pw.x}, or
whatever applies;
\item other \qe-specific parallelization options, to be
read and interpreted by the running code:
\begin{itemize}
\item the number of ``images'' used by NEB or phonon calculations;
\item the number of ``pools'' into which processors are to be grouped
(\pw.x only);
\item the number of ``task groups'' into which processors are to be
grouped;
\item the number of processors performing iterative diagonalization
(for \pw.x) or orthonormalization (for \cp.x).
\end{itemize}
\end{enumerate}
Items 1) and 2) are machine- and installation-dependent, and may be
different for interactive and batch execution. Note that large
parallel machines are often configured so as to disallow interactive
execution: if in doubt, ask your system administrator.
Item 3) also depends on your specific configuration (shell, execution
path, etc).
Item 4) is optional but may be important: see the following section
for the meaning of the various options.
For illustration, here is how to run \pw.x on 16 processors partitioned into
8 pools (2 processors each), for several typical cases.
IBM SP machines, batch:
\begin{verbatim}
pw.x -npool 8 < input
\end{verbatim}
This should also work interactively, with environment variables NPROC
set to 16, MP\_HOSTFILE set to the file containing a list of processors.
IBM SP machines, interactive, using \texttt{poe}:
\begin{verbatim}
poe pw.x -procs 16 -npool 8 < input
\end{verbatim}
PC clusters using \texttt{mpiexec}:
\begin{verbatim}
mpiexec -n 16 pw.x -npool 8 < input
\end{verbatim}
SGI Altix and PC clusters using \texttt{mpirun}:
\begin{verbatim}
mpirun -np 16 pw.x -npool 8 < input
\end{verbatim}
IBM BlueGene using \texttt{mpirun}:
\begin{verbatim}
mpirun -np 16 -exe /path/to/executable/pw.x -args "-npool 8" \
-in /path/to/input -cwd /path/to/work/directory
\end{verbatim}
If you want to run in parallel the examples distributed with \qe\
(see Sec.\ref{SubSec:Examples}), set PARA\_PREFIX to everything
before the executable (\pw.x in the above examples),
PARA\_POSTFIX to what follows it until the first redirection sign
($<, >, |,..$), if any. For execution using OpenMP on N threads,
set PARA\_PREFIX to \texttt{env OMP\_NUM\_THREADS=N}.
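For instance, for the 16-processor, 8-pool runs shown above one might set
(bash-like syntax; adapt the launcher name and options to your system):
\begin{verbatim}
export PARA_PREFIX="mpirun -np 16"
export PARA_POSTFIX="-npool 8"
\end{verbatim}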
\subsection{Parallelization levels}
Data structures are distributed across processors.
Processors are organized in a hierarchy of groups,
which are identified by different MPI communicator levels.
The group hierarchy is as follows:
\begin{verbatim}
/ pools _ task groups
world _ images
\ linear-algebra groups
\end{verbatim}
{\bf world}: is the group of all processors (MPI\_COMM\_WORLD).
{\bf images}: Processors can then be divided into different "images",
corresponding to a point in configuration space (i.e. to
a different set of atomic positions) for NEB calculations;
to one (or more than one) "irrep" or wave-vector in phonon
calculations.
{\bf pools}: When k-point sampling is used, each image group can be
subpartitioned into "pools", and k-points can be distributed across pools.
Within each pool, reciprocal space basis set (PWs)
and real-space grids are distributed across processors.
This is usually referred to as "PW parallelization".
All linear-algebra operations on arrays of PW /
real-space grid data are automatically and effectively parallelized.
3D FFT is used to transform electronic wave functions from
reciprocal to real space and vice versa. The 3D FFT is
parallelized by distributing planes of the 3D grid in real
space to processors (in reciprocal space, it is columns of
G-vectors that are distributed to processors).
{\bf task groups}:
In order to allow good parallelization of the 3D FFT when
the number of processors exceeds the number of FFT planes,
data can be redistributed to "task groups" so that each group
can process several wavefunctions at the same time.
{\bf linear-algebra group}:
A further level of parallelization, independent of
PW or k-point parallelization, is the parallelization of
subspace diagonalization (\pw.x) or iterative orthonormalization
(\cp.x). Both operations require the diagonalization of
arrays whose dimension is the number of Kohn-Sham states
(or a small multiple). All such arrays are distributed block-like
across the ``linear-algebra group'', a subgroup of the pool of processors,
organized in a square 2D grid. As a consequence the number of processors
in the linear-algebra group is given by $n^2$, where $n$ is an integer;
$n^2$ must be smaller than the number of processors of a single pool.
The diagonalization is then performed
in parallel using standard linear algebra operations.
(This diagonalization is used by, but should not be confused with,
the iterative Davidson algorithm). One can choose to compile
ScaLAPACK if available, internal built-in algorithms otherwise.
{\bf Communications}:
Images and pools are loosely coupled and processors communicate
between different images and pools only once in a while, whereas
processors within each pool are tightly coupled and communications
are significant. This means that Gigabit ethernet (typical for
cheap PC clusters) is ok up to 4-8 processors per pool, but {\em fast}
communication hardware (e.g. Myrinet or comparable) is absolutely
needed beyond 8 processors per pool.
{\bf Choosing parameters}:
To control the number of processors in each group,
command line switches: \texttt{-nimage}, \texttt{-npools},
\texttt{-ntg}, \texttt{-northo} (for \cp.x) or \texttt{-ndiag}
(for \pw.x) are used.
As an example consider the following command line:
\begin{verbatim}
mpirun -np 4096 ./pw.x -nimage 8 -npool 2 -ntg 8 -ndiag 144 -input my.input
\end{verbatim}
This executes \PWscf\ on 4096 processors, to simulate a system
with 8 images, each of which is distributed across 512 processors.
k-points are distributed across 2 pools of 256 processors each,
3D FFT is performed using 8 task groups (64 processors each, so
the 3D real-space grid is cut into 64 slices), and the diagonalization
of the subspace Hamiltonian is distributed to a square grid of 144
processors (12x12).
Default values are \texttt{-nimage 1 -npool 1 -ntg 1};
\texttt{-ndiag} is set to 1 if ScaLAPACK is not compiled,
otherwise it is set to the largest square integer smaller than or equal
to half the number of processors of each pool.
\paragraph{Massively parallel calculations}
For very large jobs (i.e. O(1000) atoms or so) or for very long jobs
to be run on massively parallel machines (e.g. IBM BlueGene) it is
crucial to use in an effective way both the "task group" and the
"linear-algebra" parallelization. Without a judicious choice of
parameters, large jobs will find a stumbling block in either memory or
CPU requirements. In particular, the linear-algebra parallelization is
used in the diagonalization of matrices in the subspace of Kohn-Sham
states (whose dimension is, as a strict minimum, equal to the number of
occupied states). These are stored as block-distributed matrices
(distributed across processors) and diagonalized using custom-tailored
diagonalization algorithms that work on block-distributed matrices.
Since v.4.1, ScaLAPACK can be used to diagonalize block distributed
matrices, yielding better speed-up than the default algorithms for
large ($ > 1000$) matrices, when using a large number of processors
($> 512$). If you want to test ScaLAPACK,
use \texttt{configure --with-scalapack}. This
will add
\texttt{-D\_\_SCALAPACK} to DFLAGS in \texttt{make.sys} and set LAPACK\_LIBS to something
like:
\begin{verbatim}
LAPACK_LIBS = -lscalapack -lblacs -lblacsF77init -lblacs -llapack
\end{verbatim}
The repeated \texttt{-lblacs} is not an error, it is needed! If \configure\ does not recognize
ScaLAPACK, inquire with your system manager
about the correct way to link it.
A further possibility to expand scalability, especially on machines
like IBM BlueGene, is to use mixed MPI-OpenMP. The idea is to have
one (or more) MPI process(es) per multicore node, with OpenMP
parallelization inside a same node. This option is activated by \texttt{configure --with-openmp},
which adds preprocessing flag -D\_\_OPENMP
and one of the following compiler options:
\begin{quote}
ifort: \texttt{-openmp}\\
xlf: \texttt{-qsmp=omp}\\
PGI: \texttt{-mp}\\
ftn: \texttt{-mp=nonuma}
\end{quote}
OpenMP parallelization is currently implemented and tested for the following combinations of FFTs
and libraries:
\begin{quote}
internal FFTW copy: \texttt{-D\_\_FFTW}\\
ESSL: \texttt{-D\_\_ESSL} or \texttt{-D\_\_LINUX\_ESSL}, link
with \texttt{-lesslsmp}\\
ACML: \texttt{-D\_\_ACML}, link with \texttt{-lacml\_mp}.
\end{quote}
Currently, ESSL (when available) is faster than the internal FFTW,
which in turn is faster than ACML.
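As a purely illustrative example (launcher names and options are
system-dependent), a mixed MPI-OpenMP run with 4 threads per MPI process
could be started as:
\begin{verbatim}
export OMP_NUM_THREADS=4
mpirun -np 64 ./pw.x -in my.input > my.output
\end{verbatim}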
\subsubsection{Understanding parallel I/O}
In parallel execution, each processor has its own slice of wavefunctions,
to be written to temporary files during the calculation. The way wavefunctions
are written by \pw.x\ is governed by variable \texttt{wf\_collect},
in namelist \&CONTROL.
If \texttt{wf\_collect=.true.}, the final wavefunctions are collected into a single
directory, written by a single processor, whose format is independent of
the number of processors. If \texttt{wf\_collect=.false.} (default) each processor
writes its own slice of the final
wavefunctions to disk in the internal format used by \PWscf.
The former case requires more
disk I/O and disk space, but produces portable data files; the latter case
requires less I/O and disk space, but the data so produced can be read only
by a job running on the same number of processors and pools, and if
all files are on a file system that is visible to all processors
(i.e., you cannot use local scratch directories: there is presently no
way to ensure that the distribution of processes on processors will
follow the same pattern for different jobs).
\cp.x\ instead always collects the final wavefunctions into a single directory.
Files written by \pw.x\ can be read by \cp.x\ only if \texttt{wf\_collect=.true.} (and if
produced for $k=0$ case).
The directory for data is specified in input variables
\texttt{outdir} and \texttt{prefix} (the former can be specified
as well in environment variable ESPRESSO\_TMPDIR):
\texttt{outdir/prefix.save}. A copy of pseudopotential files
is also written there. If some processor cannot access the
data directory, the pseudopotential files are read instead
from the pseudopotential directory specified in input data.
Unpredictable results may follow if those files
are not the same as those in the data directory!
{\em IMPORTANT:}
Avoid I/O to network-mounted disks (via NFS) as much as you can!
Ideally the scratch directory \texttt{outdir} should be a modern
Parallel File System. If you do not have any, you can use local
scratch disks (i.e. each node is physically connected to a disk
and writes to it) but you may run into trouble anyway if you
need to access your files that are scattered in an unpredictable
way across disks residing on different nodes.
You can use input variable \texttt{disk\_io='minimal'}, or even
\texttt{'none'}, if you run
into trouble (or into angry system managers) with excessive I/O with \pw.x.
The code will store wavefunctions into RAM during the calculation.
Note however that this will increase your memory usage and may limit
or prevent restarting from interrupted runs.
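Schematically, the relevant \&CONTROL variables could be set as in the
following sketch (\texttt{prefix} and \texttt{outdir} values are placeholders):
\begin{verbatim}
&control
   calculation = 'scf'
   prefix      = 'mysystem'
   outdir      = '/scratch/myaccount/tmp/'
   wf_collect  = .true.
   disk_io     = 'minimal'
/
\end{verbatim}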
\paragraph{Cray XT3}
On the Cray XT3 there is a special hack to keep files in
memory instead of writing them, without any changes to the code.
You have to execute \texttt{module load iobuf}
before compiling, and then add \texttt{-liobuf} at link time.
When you run a job, set the environment variable
IOBUF\_PARAMS to proper values and you can gain a lot.
Here is one example:
\begin{verbatim}
env IOBUF_PARAMS='*.wfc*:noflush:count=1:size=15M:verbose,\
*.dat:count=2:size=50M:lazyflush:lazyclose:verbose,\
*.UPF*.xml:count=8:size=8M:verbose' pbsyod =\
\~{}/pwscf/pwscfcvs/bin/pw.x npool 4 in si64pw2x2x2.inp > & \
si64pw2x2x232moreiobuf.out &
\end{verbatim}
This will ignore all flushes on the \texttt{*wfc*} (scratch) files, using a
single I/O buffer large enough to contain the whole file ($\sim 12$ MB here);
this way they are actually never(!) written to disk.
The \texttt{*.dat} files are part of the restart, so they are needed, but you
can be 'lazy' since they are write-only. The \texttt{.xml} files see a lot of
accesses (due to iotk), but with a few rather small buffers this can be
handled as well. Pay attention not to make the buffers too large if the code
itself also needs a lot of memory; in this example there is still a lot of
room for improvement. After you have tuned those parameters, you can remove
the 'verbose' keywords and enjoy the fast execution.
Apart from the I/O issues, the Cray XT3 is a really nice and fast machine.
(Info by Axel Kohlmeyer, maybe obsolete)
\subsection{Tricks and problems}
\paragraph{Trouble with input files}
Some implementations of the MPI library have problems with input
redirection in parallel. This typically shows up under the form of
mysterious errors when reading data. If this happens, use the option
\texttt{-in} (or \texttt{-inp} or \texttt{-input}), followed by the input file name.
Example:
\begin{verbatim}
pw.x -in inputfile -npool 4 > outputfile
\end{verbatim}
Of course the
input file must be accessible by the processor that must read it
(only one processor reads the input file and subsequently broadcasts
its contents to all other processors).
Apparently the LSF implementation of MPI libraries manages to ignore or to
confuse even the \texttt{-in/inp/input} mechanism that is present in all
\qe\ codes. In this case, use the \texttt{-i} option of \texttt{mpirun.lsf}
to provide an input file.
\paragraph{Trouble with MKL and MPI parallelization}
If you notice very bad parallel performances with MPI and MKL libraries,
it is very likely that the OpenMP parallelization performed by the latter
is colliding with MPI. Recent versions of MKL enable autoparallelization
by default on multicore machines. You must set the environmental variable
OMP\_NUM\_THREADS to 1 to disable it.
Note that if for some reason the correct setting of variable
OMP\_NUM\_THREADS
does not propagate to all processors, you may equally run into trouble.
Lorenzo Paulatto (Nov. 2008) suggests to use the \texttt{-x} option to \texttt{mpirun} to
propagate OMP\_NUM\_THREADS to all processors.
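With OpenMPI, for example, this could be done along the following lines
(illustrative only; check the documentation of your MPI launcher):
\begin{verbatim}
mpirun -np 8 -x OMP_NUM_THREADS=1 ./pw.x -in my.input > my.output
\end{verbatim}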
Axel Kohlmeyer suggests the following (April 2008):
"(I've) found that Intel is now turning on multithreading without any
warning and that is for example why their FFT seems faster than
FFTW. For serial and OpenMP based runs this makes no difference (in
fact the multi-threaded FFT helps), but if you run MPI locally, you
actually lose performance. Also if you use the 'numactl' tool on linux
to bind a job to a specific cpu core, MKL will still try to use all
available cores (and slow down badly). The cleanest way of avoiding
this mess is to either link with
\begin{quote}
\texttt{-lmkl\_intel\_lp64 -lmkl\_sequential -lmkl\_core} (on 64-bit:
x86\_64, ia64)\\
\texttt{-lmkl\_intel -lmkl\_sequential -lmkl\_core} (on 32-bit, i.e. ia32 )
\end{quote}
or edit the \texttt{libmkl\_'platform'.a} file. I'm using now a file
\texttt{libmkl10.a} with:
\begin{verbatim}
GROUP (libmkl_intel_lp64.a libmkl_sequential.a libmkl_core.a)
\end{verbatim}
It works like a charm". UPDATE: Since v.4.2, \configure\ links by
default MKL without multithreaded support.
\paragraph{Trouble with compilers and MPI libraries}
Many users of \qe, in particular those working on PC clusters,
have to rely on themselves (or on less-than-adequate system managers) for
the correct configuration of software for parallel execution. Mysterious and
irreproducible crashes in parallel execution are sometimes due to bugs
in \qe, but more often than not are a consequence of buggy
compilers or of buggy or miscompiled MPI libraries. Very useful step-by-step
instructions to compile and install MPI libraries
can be found in the following post by Javier Antonio Montoya:\\
\texttt{http://www.democritos.it/pipermail/pw\_forum/2008-April/008818.htm}.
On a Xeon quadriprocessor cluster, erratic crashes in parallel
execution have been reported, apparently correlated with ifort 10.1
(info by Nathalie Vast and Jelena Sjakste, May 2008).
\newpage
\section{Using \qe}
Input files for \PWscf\ codes may be either written by hand
or produced via the \texttt{PWgui} graphical interface by Anton Kokalj,
included in the \qe\ distribution. See \texttt{PWgui-x.y.z/INSTALL}
(where x.y.z is the version number) for more info on \texttt{PWgui},
or \texttt{GUI/README} if you are using CVS sources.
You may take the examples distributed with \qe\ as
templates for writing your own input files: see Sec.\ref{SubSec:Examples}.
In the following, whenever we mention "Example N", we refer to those.
Input files are those in the \texttt{results/} subdirectories, with names ending
with \texttt{.in}
(they will appear after you have run the examples).
\subsection{Input data}
Input data for the basic codes of the \qe\ distribution, \pw.x\ and \cp.x,
is organized as several namelists, followed by other fields
introduced by keywords. The namelists are
\begin{tabular}{ll}
\&CONTROL:& general variables controlling the run\\
\&SYSTEM: &structural information on the system under investigation\\
\&ELECTRONS: &electronic variables: self-consistency, smearing\\
\&IONS (optional): &ionic variables: relaxation, dynamics\\
\&CELL (optional): &variable-cell dynamics\\
\&EE (optional): &for density counter charge electrostatic corrections
\end{tabular} \\
Optional namelists may be omitted if the calculation to be performed
does not require them. This depends on the value of variable \texttt{calculation}
in namelist \&CONTROL. Most variables in namelists have default values. Only
the following variables in \&SYSTEM must always be specified:
\begin{tabular}{lll}
\texttt{ibrav} & (integer)& Bravais-lattice index\\
\texttt{celldm} &(real, dimension 6)& crystallographic constants\\
\texttt{nat} &(integer)& number of atoms in the unit cell\\
\texttt{ntyp} &(integer)& number of types of atoms in the unit cell\\
\texttt{ecutwfc} &(real)& kinetic energy cutoff (Ry) for wavefunctions.
\end{tabular} \\
For metallic systems, you have to specify how metallicity is treated
in
variable \texttt{occupations}. If you choose \texttt{occupations='smearing'},
you have
to specify the smearing width \texttt{degauss} and optionally the smearing
type
\texttt{smearing}. Spin-polarized systems must be treated as metallic systems, except in the
special case of a single k-point, for which occupation numbers can be fixed
(\texttt{occupations='from\_input'} and card OCCUPATIONS).
Explanations for the meaning of variables \texttt{ibrav} and \texttt{celldm},
as well as on alternative ways to input structural data,
are in files \texttt{Doc/INPUT\_PW.*} (for \pw.x) and \texttt{Doc/INPUT\_CP.*}
(for \cp.x). These files are the reference for input data and describe
a large number of other variables as well. Almost all variables have default
values, which may or may not fit your needs.
After the namelists, you have several fields (``cards'')
introduced by keywords with self-explanatory names:
\begin{quote}
ATOMIC\_SPECIES\\
ATOMIC\_POSITIONS\\
K\_POINTS\\
CELL\_PARAMETERS (optional)\\
OCCUPATIONS (optional)\\
CLIMBING\_IMAGES (optional)
\end{quote}
The keywords may be followed on the same line by an option. Unknown
fields (including some that are specific to \CP) are ignored by
\PWscf\ (and vice versa, \CP\ ignores \PWscf-specific fields).
See the files mentioned above for details on the available ``cards''.
Note about k points: The k-point grid can be either automatically generated
or manually provided as a list of k-points and weights; in the latter case,
only points within the Irreducible Brillouin Zone of the Bravais lattice of
the crystal are needed. The code will
generate (unless instructed not to do so: see variable \texttt{nosym}) all the
required k-points and weights if the symmetry of the system is lower than the symmetry of the
Bravais lattice. The automatic generation of k-points follows the convention
of Monkhorst and Pack.
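For orientation only, a minimal \pw.x\ input file for bulk silicon could look
like the following sketch (pseudopotential file name, cutoff and k-point grid
are illustrative choices, not converged or recommended values):
\begin{verbatim}
&control
   calculation = 'scf'
   prefix      = 'silicon'
   pseudo_dir  = './'
   outdir      = './tmp/'
/
&system
   ibrav = 2, celldm(1) = 10.20
   nat = 2, ntyp = 1
   ecutwfc = 18.0
/
&electrons
/
ATOMIC_SPECIES
 Si  28.086  Si.vbc.UPF
ATOMIC_POSITIONS alat
 Si 0.00 0.00 0.00
 Si 0.25 0.25 0.25
K_POINTS automatic
 4 4 4 1 1 1
\end{verbatim}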
\subsection{Data files}
The output data files are written in the directory specified by variable
\texttt{outdir}, with names specified by variable \texttt{prefix} (a string that is prepended
to all file names, whose default value is: \texttt{prefix='pwscf'}). The \texttt{iotk}
toolkit is used to write the file in an XML format, whose definition can
be found in the Developer Manual. In order to use the data directory
on a different machine, you need to convert the binary files to formatted
and back, using the \texttt{bin/iotk} script.
The execution stops if you create a file \texttt{prefix.EXIT} in the working
directory. NOTA BENE: this is the directory where the program
is executed, NOT the directory \texttt{outdir} defined in input, where files
are written. Note that with some versions of MPI, the working directory
is the directory where the \pw.x\ executable is! The advantage of this
procedure is that all files are properly closed, whereas just killing
the process may leave data and output files in unusable state.
\subsection{Format of arrays containing charge density, potential, etc.}
The index of arrays used to store functions defined on 3D meshes is
actually a shorthand for three indices, following the FORTRAN convention
("leftmost index runs faster"). An example will explain this better.
Suppose you have a 3D array \texttt{psi(nr1x,nr2x,nr3x)}. FORTRAN
compilers store this array sequentially in the computer RAM in the following way:
\begin{verbatim}
psi( 1, 1, 1)
psi( 2, 1, 1)
...
psi(nr1x, 1, 1)
psi( 1, 2, 1)
psi( 2, 2, 1)
...
psi(nr1x, 2, 1)
...
...
psi(nr1x,nr2x, 1)
...
psi(nr1x,nr2x,nr3x)
etc
\end{verbatim}
Let \texttt{ind} be the position of the \texttt{(i,j,k)} element in the above list:
the following relation
\begin{verbatim}
ind = i + (j - 1) * nr1x + (k - 1) * nr2x * nr1x
\end{verbatim}
holds. This should clarify the relation between 1D and 3D indexing. In real
space, the \texttt{(i,j,k)} point of the FFT grid with dimensions
\texttt{nr1} ($\le$\texttt{nr1x}),
\texttt{nr2} ($\le$\texttt{nr2x}), \texttt{nr3} ($\le$\texttt{nr3x}), is
$$
r_{ijk}=\frac{i-1}{nr1} \tau_1 + \frac{j-1}{nr2} \tau_2 +
\frac{k-1}{nr3} \tau_3
$$
where the $\tau_i$ are the basis vectors of the Bravais lattice.
The latter are stored row-wise in the \texttt{at} array:
$\tau_1 = $ \texttt{at(:, 1)},
$\tau_2 = $ \texttt{at(:, 2)},
$\tau_3 = $ \texttt{at(:, 3)}.
The distinction between the dimensions of the FFT grid,
\texttt{(nr1,nr2,nr3)} and the physical dimensions of the array,
\texttt{(nr1x,nr2x,nr3x)} is done only because it is computationally
convenient in some cases that the two sets are not the same.
In particular, it is often convenient to have \texttt{nr1x}=\texttt{nr1}+1
to reduce memory conflicts.
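Conversely, given \texttt{ind}, the three indices can be recovered with
integer (truncating) arithmetic, as in the following sketch using the
conventions above:
\begin{verbatim}
k = (ind - 1) / (nr1x*nr2x) + 1
j = (ind - 1 - (k-1)*nr1x*nr2x) / nr1x + 1
i =  ind - (j-1)*nr1x - (k-1)*nr1x*nr2x
\end{verbatim}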
\subsection{Pseudopotential files}
\label{SubSec:pseudo}
Pseudopotential files for tests and examples are found in the
\texttt{pseudo/}
subdirectory. A much larger set of PP's can be found under
the "pseudo'' link of the web site. \qe\ uses a unified
pseudopotential format (UPF) for all types of pseudopotentials,
but still accepts a number of older formats. If you do not find
what you need, you may
\begin{itemize}
\item Convert pseudopotentials written in a different format,
using the converters listed in \texttt{upftools/UPF} (compile with
\texttt{make upf}).
\item Generate it, using \texttt{atomic}. See the documentation in
\texttt{atomic\_doc/} and in particular the library of input files
in \texttt{pseudo\_library/}.
\item Generate it, using other packages:
\begin{itemize}
\item David Vanderbilt's code (UltraSoft and Norm-Conserving)
\item OPIUM (Norm-Conserving)
\item The Fritz Haber code (Norm-Conserving)
\end{itemize}
The first two codes produce pseudopotentials in one of the
supported formats; the third, in a format that can be converted
to UPF.
\end{itemize}
Remember: {\em always} test the pseudopotentials on simple test
systems before proceeding to serious calculations.
Note that the type of XC used in the calculation is read from
pseudopotential files. As a rule, you should use only
pseudopotentials that have been generated using the same
XC that you are using in your simulation. You can override
this restriction by setting input variable \texttt{input\_dft}. The list of
allowed XC functionals and of their acronyms can be found in
\texttt{Modules/funct.f90}.
More documentation on pseudopotentials and on the UPF format
can be found in the wiki.
\section{Using \PWscf}
Code \pw.x\ performs various kinds of electronic and ionic structure
calculations.
We may distinguish the following typical cases of usage for \pw.x:
\subsection{Electronic structure calculations}
\paragraph{Single-point (fixed-ion) SCF calculation}
Set \texttt{calculation='scf'} (this is actually the default).
Namelists \&IONS and \&CELL will be ignored. See Example 01.
\paragraph{Band structure calculation}
First perform a SCF calculation as above;
then do a non-SCF calculation with the desired k-point grid and
number \texttt{nbnd} of bands.
Specify \texttt{calculation='bands'} if you are interested in calculating
only the Kohn-Sham states for the given set of k-points; specify
\texttt{calculation='nscf'} if you are interested in further processing
of the results of non-SCF calculations (for instance, in DOS calculations).
In the latter case, you should specify a uniform grid of points.
For DOS calculations you should choose \texttt{occupations='tetrahedra'},
together with an automatically generated uniform k-point grid
(card K\_POINTS with option ``automatic'').
Specify \texttt{nosym=.true.} to avoid generation of additional k-points in
low symmetry cases. Variables \texttt{prefix} and \texttt{outdir}, which determine
the names of input or output files, should be the same in the two runs.
See Examples 01, 05, 08.
NOTA BENE: until v.4.0, atomic positions for a non-scf calculation
were read from input, while the scf potential was read from the data file
of the scf calculation. Since v.4.1, both atomic positions and the scf
potential are read from the data file so that consistency is guaranteed.
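Schematically, the input for the second (band) step reuses the structural data
of the SCF run and differs only in a few variables, e.g. (values and the
k-point list are purely illustrative):
\begin{verbatim}
&control
   calculation = 'bands'
   prefix      = 'silicon'
   outdir      = './tmp/'
/
&system
   ibrav = 2, celldm(1) = 10.20, nat = 2, ntyp = 1
   ecutwfc = 18.0
   nbnd = 8
/
&electrons
/
ATOMIC_SPECIES
 Si  28.086  Si.vbc.UPF
ATOMIC_POSITIONS alat
 Si 0.00 0.00 0.00
 Si 0.25 0.25 0.25
K_POINTS
 2
  0.000 0.000 0.000  1.0
  0.500 0.500 0.500  1.0
\end{verbatim}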
\paragraph{Noncollinear magnetization, spin-orbit interactions}
The following input variables are relevant for noncollinear and
spin-orbit calculations:
\begin{quote}
\texttt{noncolin}\\
\texttt{lspinorb}\\
\texttt{starting\_magnetization} (one for each type of atoms)
\end{quote}
To make a spin-orbit calculation \texttt{noncolin} must be true.
If \texttt{starting\_magnetization} is set to zero (or not given)
the code makes a spin-orbit calculation without spin magnetization
(it assumes that time reversal symmetry holds and it does not calculate
the magnetization). The states are still two-component spinors but the
total magnetization is zero.
If \texttt{starting\_magnetization} is different from zero, it makes a non
collinear spin polarized calculation with spin-orbit interaction. The
final spin magnetization might be zero or different from zero depending
on the system.
Furthermore, to make a spin-orbit calculation you must use fully
relativistic pseudopotentials at least for the atoms in which you
think that spin-orbit interaction is large. If all the pseudopotentials
are scalar
relativistic, the calculation becomes equivalent to a noncollinear
calculation without spin orbit. (Andrea Dal Corso, 2007-07-27)
See Example 13 for non-collinear magnetism, Example 22
for spin-orbit interactions.
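Schematically, the relevant part of the \&SYSTEM namelist for a spin-orbit
calculation with a nonzero starting magnetization might read (the value is
just an example, fully relativistic pseudopotentials are assumed, and the
dots stand for the usual structural variables):
\begin{verbatim}
&system
   ...
   noncolin = .true.
   lspinorb = .true.
   starting_magnetization(1) = 0.5
/
\end{verbatim}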
\paragraph{DFT+U}
DFT+U (formerly known as LDA+U) calculation can be
performed within a simplified rotationally invariant form
of the $U$ Hubbard correction. See Example 25 and references
quoted therein.
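As a schematic example (the value of $U$, in eV, is purely illustrative and
system-dependent; the dots stand for the usual structural variables), the
\&SYSTEM namelist would contain something like:
\begin{verbatim}
&system
   ...
   lda_plus_u   = .true.
   Hubbard_U(1) = 4.0
/
\end{verbatim}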
\paragraph{Dispersion Interactions (DFT-D)}
For DFT-D (DFT + semiempirical dispersion interactions), see the
description of input variables \texttt{london*}, sample files
\texttt{tests/vdw.*}, and the comments in source file
\texttt{Modules/mm\_dispersion.f90}.
\paragraph{Hartree-Fock and Hybrid functionals}
Calculations in the Hartree-Fock approximation, or using hybrid XC functionals
that include some Hartree-Fock exchange, currently require that
\texttt{-DEXX} is added to the preprocessing options \texttt{DFLAGS} in file
\texttt{make.sys} before compilation (if you change this after the first
compilation, \texttt{make clean}, recompile).
Documentation on usage can be found in subdirectory
\texttt{examples/EXX\_example/}.
The algorithm is quite standard: see for instance Chawla and Voth,
JCP {\bf 108}, 4697 (1998); Sorouri, Foulkes and Hine, JCP {\bf 124},
064105 (2006); Spencer and Alavi, PRB {\bf 77}, 193110 (2008).
Basically, one generates auxiliary densities $\rho_{-q}=\phi^{*}_{k+q}*\psi_k$
in real space and transforms them to reciprocal space using FFT;
the Poisson equation is solved and the resulting potential is transformed
back to real space using FFT, then multiplied by $\phi_{k+q}$ and the
results are accumulated.
The only tricky point is the treatment of the $q\rightarrow 0$ limit,
which is described in the Appendix A.5 of the \qe\ paper mentioned
in the Introduction (note the reference to the Gygi and Baldereschi paper).
See also J. Comp. Chem. {\bf 29}, 2098 (2008);
JACS {\bf 129}, 10402 (2007) for examples of applications.
\paragraph{Polarization via Berry Phase}
See Example 10, file \texttt{example10/README}, and the documentation
in the header of \texttt{PW/bp\_c\_phase.f90}.
\paragraph{Finite electric fields}
There are two different implementations of macroscopic electric fields
in \pw.x: via an external sawtooth potential (input variable
\texttt{tefield=.true.}) and via the modern theory of polarizability
(\texttt{lelfield=.true.}).
The former is useful for surfaces, especially in conjunction
with dipolar corrections (\texttt{dipfield=.true.}):
see \texttt{examples/dipole\_example} for an example of application.
Electric fields via modern theory of polarization are documented in
example 31. The exact meaning of the related variables, for both
cases, is explained in the general input documentation.
\subsection{Optimization and dynamics}
\paragraph{Structural optimization}
For fixed-cell optimization, specify \texttt{calculation='relax'} and
add namelist \&IONS. All options for a single SCF calculation apply,
plus a few others. You
may follow a structural optimization with a non-SCF band-structure
calculation (since v.4.1, you do not need any longer to update the
atomic positions in the input file for the non-scf calculation).\\
See Example 03.
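A minimal sketch of the additional input with respect to a plain SCF run
(BFGS is just one of the available choices for \texttt{ion\_dynamics}; the
dots stand for the remaining variables of an ordinary SCF input):
\begin{verbatim}
&control
   calculation = 'relax'
   ...
/
&ions
   ion_dynamics = 'bfgs'
/
\end{verbatim}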
\paragraph{Molecular Dynamics}
Specify \texttt{calculation='md'}, the time step \texttt{dt}, and possibly the number of MD steps \texttt{nstep}.
Use variable \texttt{ion\_dynamics} in namelist \&IONS for a fine-grained control
of the kind of dynamics. Other options for setting the initial
temperature and for thermalization using velocity rescaling are
available. Remember: this is MD on the electronic ground state, not
Car-Parrinello MD.
See Example 04.
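Schematically (the time step is in Rydberg atomic units, all values are purely
illustrative, and the dots stand for the remaining variables of an ordinary
SCF input):
\begin{verbatim}
&control
   calculation = 'md'
   dt    = 20
   nstep = 100
   ...
/
&ions
   ion_dynamics = 'verlet'
/
\end{verbatim}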
\paragraph{Variable-cell molecular dynamics}
"A common mistake many new users make is to set the time step \texttt{dt}
improperly to the same order of magnitude as for CP algorithm, or
not setting \texttt{dt} at all. This will produce a ``not evolving dynamics''.
Good values for the original RMW (RM Wentzcovitch) dynamics are
\texttt{dt} $ = 50 \div 70$. The choice of the cell mass is a delicate matter. An
off-optimal mass will make convergence slower. Too small masses, as
well as too long time steps, can make the algorithm unstable. A good
cell mass will make the oscillation times for internal degrees of
freedom comparable to cell degrees of freedom in non-damped
Variable-Cell MD. Test calculations are advisable before extensive
calculation. I have tested the damping algorithm that I have developed
and it has worked well so far. It allows for a much longer time step
(dt=$100 \div 150$) than the RMW one and is much more stable with very
small cell masses, which is useful when the cell shape, not the
internal degrees of freedom, is far out of equilibrium. It also
converges in a smaller number of steps than RMW." (Info from Cesar Da
Silva: the new damping algorithm is the default since v. 3.1).
See also \texttt{examples/VCSexample}.
\section{NEB calculations}
Reminder: NEB calculations are no longer performed by \pw.x.
In order to perform a NEB calculation, you should compile
\texttt{NEB/neb.x} (command \texttt{make neb}). {\bf the
rest of the section is obsolete}.
Specify \texttt{calculation='neb'} and add namelist \&IONS.
All options for a single SCF calculation apply, plus a few others. In the
namelist \&IONS the number of images used to discretize the elastic band
must be specified. All other variables have a default value. Coordinates
of the initial and final image of the elastic band have to be specified
in the ATOMIC\_POSITIONS card. A detailed description of all input
variables is contained in files \texttt{Doc/INPUT\_PW.*}. See Example 17.
A NEB calculation will produce a number of files in the current directory
(i.e. in the directory where the code is run) containing additional information
on the minimum-energy path. The files are organized as follows
(where \texttt{prefix} is specified in the input file):
\begin{description}
\item[\texttt{prefix.dat}]
is a three-column file containing the position of each image on the reaction
coordinate (arb. units), its energy in eV relative to the energy of the first image
and the residual error for the image in eV/$a_0$.
\item[\texttt{prefix.int}]
contains an interpolation of the path energy profile that passes exactly through each
image; it is computed using both the image energies and their derivatives
\item[\texttt{prefix.path}]
information used by \qe\
to restart a path calculation, its format depends on the input
details and is undocumented
\item[\texttt{prefix.axsf}]
atomic positions of all path images in the XCrySDen animation format:
to visualize it, use \texttt{xcrysden -\--axsf prefix.axsf}
\item[\texttt{prefix.xyz}]
atomic positions of all path images in the generic xyz format, used by
many quantum-chemistry codes
\item[\texttt{prefix.crd}]
path information in the input format used by \pw.x, suitable for a manual
restart of the calculation
\end{description}
"NEB calculation are a bit tricky in general and require extreme care to be
setup correctly. NEB also takes easily hundreds of iteration to converge,
of course depending on the number of atoms and of images. Here is some
free advice:
\begin{enumerate}
\item
Don't use Climbing Image (CI) from the beginning. It makes convergence slower,
especially if the special image changes during the convergence process (this
may happen if \texttt{CI\_scheme='auto'} and if it does it may mess up everything).
Converge your calculation, then restart from the last configuration with
CI option enabled (note that this will {\em increase} the barrier).
\item
Carefully choose the initial path. Remember that \qe\ assumes continuity
between the first and the last image at the initial condition. In other
words, periodic images are NOT used; you may have to manually translate
an atom by one or more unit cell base vectors in order to have a meaningful
initial path. You can visualize NEB input files with XCrySDen as animations,
take some time to check if any atoms overlap or get very close in the initial
path (you will have to add intermediate images, in this case).
\item
Try to start the NEB process with most atomic positions fixed,
in order to converge the more "problematic" ones, before letting
all atoms move.
\item
Especially for larger systems, you can start NEB with lower accuracy
(less k-points, lower cutoff) and then increase it when it has
converged to refine your calculation.
\item
Use the Broyden algorithm instead of the default one: it is a bit more
fragile, but it removes the problem of "oscillations" in the calculated
activation energies. If these oscillations persist, and you cannot afford
more images, focus on a smaller problem and decompose it into pieces.
\item
A gross estimate of the required number of iterations is
(number of images) * (number of atoms) * 3. Atoms that do not
move should not be counted. It may take half that many iterations,
or twice as many, but more or less that's the order of magnitude,
unless one starts from a very good or very bad initial guess.
\end{enumerate}
(Courtesy of Lorenzo Paulatto)
\section{Phonon calculations}
Phonon calculation is presently a two-step process.
First, you have to find the ground-state atomic and electronic configuration;
Second, you can calculate phonons using Density-Functional Perturbation Theory.
Further processing to calculate Interatomic Force Constants, to add macroscopic
electric field and impose Acoustic Sum Rules at q=0 may be needed.
In the following, we will indicate by $q$ the phonon wavevectors,
while $k$ will indicate Bloch vectors used for summing over the Brillouin Zone.
Since version 4.0 it is possible to safely stop execution of
\ph.x\ code using
the same mechanism of the \pw.x\ code, i.e. by creating a file \texttt{prefix.EXIT} in the
working directory. Execution can be resumed by setting \texttt{recover=.true.}
in the subsequent input data.
\subsection{Single-q calculation}
The phonon code \ph.x\ calculates normal modes at a given q-vector, starting
from data files produced by \pw.x with a simple SCF calculation.
NOTE: the alternative procedure in which a band-structure calculation
with \texttt{calculation='phonon'} was performed as an intermediate step is no
longer implemented since version 4.1. It is also no longer needed to
specify \texttt{lnscf=.true.} for $q\ne 0$.
The output data files appear in the directory specified by variable \texttt{outdir},
with names specified by variable \texttt{prefix}. After the output file(s) have been
produced (do not remove any of the files, unless you know which are used
and which are not), you can run \ph.x.
The first input line of \ph.x is a job identifier. At the second line the
namelist \&INPUTPH starts. The meaning of the variables in the namelist
(most of them having a default value) is described in file
\texttt{Doc/INPUT\_PH.*}. Variables \texttt{outdir} and \texttt{prefix}
must be the same as in the input data of \pw.x. Presently
you must also specify \texttt{amass(i)} (a real variable): the atomic mass
of atomic type $i$.
After the namelist you must specify the q-vector of the phonon mode.
This must be the same q-vector given in the input of \pw.x.
Notice that the dynamical matrix calculated by \ph.x at $q=0$ does not
contain the non-analytic term occurring in polar materials, i.e. there is no
LO-TO splitting in insulators. Moreover no Acoustic Sum Rule (ASR) is
applied. In order to have the complete dynamical matrix at $q=0$ including
the non-analytic terms, you need to calculate effective charges by specifying
option \texttt{epsil=.true.} to \ph.x. This is however not possible (because
not physical!) for metals (i.e. any system subject to a broadening).
At $q=0$, use program \texttt{dynmat.x} to calculate the correct LO-TO
splitting, IR cross sections, and to impose various forms of ASR.
If \ph.x\ was instructed to calculate Raman coefficients,
\texttt{dynmat.x} will also calculate Raman cross sections
for a typical experimental setup.
Input documentation in the header of \texttt{PH/dynmat.f90}.
A sample phonon calculation is performed in Example 02.
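For orientation, a schematic \ph.x\ input for a calculation at $q=0$ might
look like the following (\texttt{prefix} and \texttt{outdir} must match those
of the preceding \pw.x\ run; values are illustrative):
\begin{verbatim}
phonons of silicon at Gamma
&inputph
   prefix   = 'silicon'
   outdir   = './tmp/'
   amass(1) = 28.086
   epsil    = .true.
   fildyn   = 'si.dynG'
/
0.0 0.0 0.0
\end{verbatim}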
\subsection{Calculation of interatomic force constants in real space}
First, dynamical matrices are calculated and saved for a suitable uniform
grid of q-vectors (only those in the Irreducible Brillouin Zone of the
crystal are needed). Although this can be done one q-vector at the time, a
simpler procedure is to specify variable \texttt{ldisp=.true.} and to set
variables \texttt{nq1}, \texttt{nq2}, \texttt{nq3} to some suitable
Monkhorst-Pack grid, that will be automatically generated, centered at $q=0$.
Do not forget to specify \texttt{epsil=.true.} in the input data of \ph.x
if you want the correct TO-LO splitting in polar
materials.
Second, code \texttt{q2r.x} reads the dynamical matrices produced in the
preceding step and Fourier-transforms them, writing a file of Interatomic Force
Constants in real space, up to a distance that depends on the size of the grid
of q-vectors. Input documentation in the header of \texttt{PH/q2r.f90}.
Program \texttt{matdyn.x} may be used to produce phonon modes and
frequencies at any q using the Interatomic Force Constants file as input.
Input documentation in the header of \texttt{PH/matdyn.f90}.
For more details, see Example 06.
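Schematically, the \ph.x\ input for the first step might contain (the q-point
grid size is illustrative):
\begin{verbatim}
phonon dispersions of silicon
&inputph
   prefix   = 'silicon'
   outdir   = './tmp/'
   amass(1) = 28.086
   epsil    = .true.
   ldisp    = .true.
   nq1 = 4, nq2 = 4, nq3 = 4
   fildyn   = 'si.dyn'
/
\end{verbatim}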
\subsection{Calculation of electron-phonon interaction coefficients}
The calculation of electron-phonon coefficients in metals is made difficult
by the slow convergence of the sum at the Fermi energy. It is convenient to
use a coarse k-point grid to calculate phonons on a suitable wavevector grid;
a dense k-point grid to calculate the sum at the Fermi energy. The calculation
proceeds in this way:
\begin{enumerate}
\item a scf calculation for the dense k-point grid (or a scf calculation
followed by a non-scf one on the dense k-point grid); specify
option \texttt{la2f=.true.} to \pw.x\ in order to save a file with
the eigenvalues on the dense k-point grid. The latter MUST contain
all k and k+q grid points used in the subsequent electron-phonon
calculation. All grids MUST be unshifted, i.e. include $k=0$.
\item a normal scf + phonon dispersion calculation on the coarse k-point
grid, specifying option \texttt{elph=.true.} and the file name where
the self-consistent first-order variation of the potential is to be
stored (variable \texttt{fildvscf}).
The electron-phonon coefficients are calculated using several
values of Gaussian broadening (see \texttt{PH/elphon.f90}) because this quickly
shows whether results are converged or not with respect to the k-point grid
and Gaussian broadening.
\item Finally, you can use \texttt{matdyn.x} and \texttt{lambda.x}
(input documentation in the header of \texttt{PH/lambda.f90})
to get the $\alpha^2F(\omega)$ function, the electron-phonon coefficient
$\lambda$, and an estimate of the critical temperature $T_c$.
\end{enumerate}
For more details, see Example 07.
\subsection{Distributed Phonon calculations}
A complete phonon dispersion calculation can be quite long and
expensive, but it can be split into a number of semi-independent
calculations, using options \texttt{start\_q}, \texttt{last\_q},
\texttt{start\_irr}, \texttt{last\_irr}. An example on how to
distribute the calculations and collect the results can be found
in \texttt{examples/GRID\_example}. Reference:\\
{\it Calculation of Phonon Dispersions on the GRID using Quantum
ESPRESSO},
R. di Meo, A. Dal Corso, P. Giannozzi, and S. Cozzini, in
{\it Chemistry and Material Science Applications on Grid Infrastructures},
editors: S. Cozzini, A. Lagan\`a, ICTP Lecture Notes Series,
Vol. 24, pp.165-183 (2009).
\section{Post-processing}
There are a number of auxiliary codes performing postprocessing tasks such
as plotting, averaging, and so on, on the various quantities calculated by
\pw.x. Such quantities are saved by \pw.x\ into the output data file(s).
Postprocessing codes are in the \texttt{PP/} directory. All codes for
which input documentation is not explicitly mentioned have documentation
in the header of the fortran sources.
\subsection{Plotting selected quantities}
The main postprocessing code \texttt{pp.x} reads data file(s), extracts or calculates
the selected quantity, and writes it into a format suitable for plotting.
Quantities that can be read or calculated are:
\begin{quote}
charge density\\
spin polarization\\
various potentials\\
local density of states at $E_F$\\
local density of electronic entropy\\
STM images\\
selected squared wavefunction\\
ELF (electron localization function)\\
planar averages\\
integrated local density of states
\end{quote}
Various types of plotting (along a line, on a plane, three-dimensional, polar)
and output formats (including the popular cube format) can be specified.
The output files can be directly read by the free plotting system Gnuplot
(1D or 2D plots), or by code \texttt{plotrho.x} that comes with \PostProc\ (2D plots),
or by advanced plotting software XCrySDen and gOpenMol (3D plots).
See file \texttt{Doc/INPUT\_PP.*} for a detailed description of the input for code \texttt{pp.x}.
See Example 05 for an example of a charge density plot, Example 16
for an example of STM image simulation.
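As a hedged illustration (the prefix, directories and file names below are
hypothetical), a charge-density plot in the cube format could be produced
with an input along these lines:
\begin{verbatim}
 &inputpp
  prefix   = 'pwscf',
  outdir   = '/scratch/tmp/',
  filplot  = 'charge',
  plot_num = 0
 /
 &plot
  nfile = 1,
  filepp(1) = 'charge',
  iflag = 3,
  output_format = 6,
  fileout = 'charge.cube'
 /
\end{verbatim}
Here \texttt{plot\_num=0} selects the charge density, \texttt{iflag=3} a
three-dimensional plot and \texttt{output\_format=6} the cube format; verify
the allowed values in \texttt{Doc/INPUT\_PP.*} for your version.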
\subsection{Band structure, Fermi surface}
The code \texttt{bands.x} reads data file(s), extracts the eigenvalues, and
regroups them into bands (the algorithm used to order bands and to resolve
crossings may not work in all circumstances, though). The output is written
to a file in a simple format that can be directly read by plotting program
\texttt{plotband.x}. Unpredictable plots may result if k-points are not in sequence
along lines. See Example 05 directory for a simple band plot.
The code \texttt{bands.x} performs as well a symmetry analysis of the band structure:
see Example 01.
The calculation of Fermi surface can be performed using
\texttt{kvecs\_FS.x} and
\texttt{bands\_FS.x}. The resulting file in .xsf format can be read and plotted
using XCrySDen. See Example 08 for an example of Fermi surface
visualization (Ni, including the spin-polarized case).
\subsection{Projection over atomic states, DOS}
The code \texttt{projwfc.x} calculates projections of wavefunctions
over atomic orbitals. The atomic wavefunctions are those contained
in the pseudopotential file(s). The L\"owdin population analysis (similar to
Mulliken analysis) is presently implemented. The projected DOS (or PDOS:
the DOS projected onto atomic orbitals) can also be calculated and written
to file(s). More details on the input data are found in file
\texttt{Doc/INPUT\_PROJWFC.*}. The ordering of the various
angular momentum components (defined in routine \texttt{flib/ylmr2.f90})
is as follows:
$P_{0,0}(t)$, $P_{1,0}(t)$, $P_{1,1}(t)\cos\phi$, $P_{1,1}(t)\sin\phi$,
$P_{2,0}(t)$, $P_{2,1}(t)\cos\phi$, $P_{2,1}(t)\sin\phi$,
$P_{2,2}(t)\cos 2\phi$, $P_{2,2}(t)\sin 2\phi$,
and so on, where the $P_{l,m}$ are Legendre polynomials,
$t = \cos\theta = z/r$, $\phi = \arctan(y/x)$.
The total electronic DOS is instead calculated by code
\texttt{dos.x}. See Example 08 for total and projected
electronic DOS calculations.
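A minimal, hypothetical input sketch for a PDOS calculation might look as
follows (the namelist name and exact variable list depend on the version:
check \texttt{Doc/INPUT\_PROJWFC.*}; the prefix, directory and energy window
are placeholders):
\begin{verbatim}
 &projwfc
  prefix  = 'pwscf',
  outdir  = '/scratch/tmp/',
  ngauss  = 0,
  degauss = 0.01,
  Emin = -10.0, Emax = 10.0, DeltaE = 0.05,
  filpdos = 'pwscf'
 /
\end{verbatim}
The resulting PDOS files can then be summed with \texttt{sumpdos.x}
(see below).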
\subsection{Wannier functions}
There are several Wannier-related utilities in \PostProc:
\begin{enumerate}
\item The "Poor Man Wannier" code \texttt{pmw.x}, to be used
in conjunction with DFT+U calculations (see Example 25)
\item The interface with Wannier90 code, \texttt{pw2wannier.x}:
see the documentation in \texttt{W90/} (you have to install the
Wannier90 plug-in)
\item The \texttt{wannier\_ham.x} code generates a model Hamiltonian
in Wannier functions basis: see \texttt{examples/WannierHam\_example/}.
\end{enumerate}
\subsection{Other tools}
Code \texttt{sumpdos.x} can be used to sum selected PDOS, produced by
\texttt{projwfc.x}, by specifying the names of the files
containing the desired PDOS. Type \texttt{sumpdos.x -h} or look into the source
code for more details.
Code \texttt{epsilon.x} calculates the RPA frequency-dependent complex dielectric function. Documentation is in \texttt{Doc/eps\_man.tex}.
The code \texttt{path\_int.x} is intended to be used in the framework of NEB
calculations. It is a tool to generate a new path (what is actually
generated is the restart file) starting from an old one through
interpolation (cubic splines). The new path can be discretized with a
different number of images (this is its main purpose), images are
equispaced and the interpolation can be also
performed on a subsection of the old path. The input file needed by
\texttt{path\_int.x} can be easily set up with the help of the self-explanatory
\texttt{path\_int.sh} shell script.
\section{Using CP}
This section is intended to explain how to perform basic Car-Parrinello (CP)
simulations using the \CP\ package.
It is important to understand that a CP simulation is a sequence of different
runs, some of them used to "prepare" the initial state of the system, and
others performed to collect statistics, or to modify the state of the system
itself, e.g. to change the temperature or the pressure.
To prepare and run a CP simulation you should first of all
define the system:
\begin{quote}
atomic positions\\
system cell\\
pseudopotentials\\
cut-offs\\
number of electrons and bands (optional)\\
FFT grids (optional)
\end{quote}
An example of input file (Benzene Molecule):
\begin{verbatim}
&control
title = 'Benzene Molecule',
calculation = 'cp',
restart_mode = 'from_scratch',
ndr = 51,
ndw = 51,
nstep = 100,
iprint = 10,
isave = 100,
tstress = .TRUE.,
tprnfor = .TRUE.,
dt = 5.0d0,
etot_conv_thr = 1.d-9,
ekin_conv_thr = 1.d-4,
prefix = 'c6h6',
pseudo_dir='/scratch/benzene/',
outdir='/scratch/benzene/Out/'
/
&system
ibrav = 14,
celldm(1) = 16.0,
celldm(2) = 1.0,
celldm(3) = 0.5,
celldm(4) = 0.0,
celldm(5) = 0.0,
celldm(6) = 0.0,
nat = 12,
ntyp = 2,
nbnd = 15,
ecutwfc = 40.0,
nr1b= 10, nr2b = 10, nr3b = 10,
input_dft = 'BLYP'
/
&electrons
emass = 400.d0,
emass_cutoff = 2.5d0,
electron_dynamics = 'sd'
/
&ions
ion_dynamics = 'none'
/
&cell
cell_dynamics = 'none',
press = 0.0d0,
/
ATOMIC_SPECIES
C 12.0d0 c_blyp_gia.pp
H 1.00d0 h.ps
ATOMIC_POSITIONS (bohr)
C 2.6 0.0 0.0
C 1.3 -1.3 0.0
C -1.3 -1.3 0.0
C -2.6 0.0 0.0
C -1.3 1.3 0.0
C 1.3 1.3 0.0
H 4.4 0.0 0.0
H 2.2 -2.2 0.0
H -2.2 -2.2 0.0
H -4.4 0.0 0.0
H -2.2 2.2 0.0
H 2.2 2.2 0.0
\end{verbatim}
You can find the description of the input variables in file
\texttt{Doc/INPUT\_CP.*}.
\subsection{Reaching the electronic ground state}
The first run, when starting from scratch, is always an electronic
minimization, with fixed ions and cell, to bring the electronic system to the ground state (GS) relative to the starting atomic configuration. This step is conceptually very similar to
self-consistency in a \pw.x\ run.
Sometimes a single run is not enough to reach the GS. In this case,
you need to re-run the electronic minimization stage. Use the input
of the first run, changing \texttt{restart\_mode = 'from\_scratch'}
to \texttt{restart\_mode = 'restart'}.
NOTA BENE: Unless you are already experienced with the system
you are studying or with the internals of the code, you will usually need
to tune some input parameters, like \texttt{emass}, \texttt{dt}, and cut-offs. For this
purpose, a few trial runs could be useful: you can perform short
minimizations (say, 10 steps) changing and adjusting these parameters
to fit your needs. You can specify the degree of convergence with these
two thresholds:
\begin{quote}
\texttt{etot\_conv\_thr}: total energy difference between two consecutive steps\\
\texttt{ekin\_conv\_thr}: value of the fictitious kinetic energy of the electrons.
\end{quote}
Usually we consider the system to be in the GS when
\texttt{ekin\_conv\_thr} $ < 10^{-5}$.
You could check the value of the fictitious kinetic energy on the standard
output (column EKINC).
Different strategies are available to minimize electrons, but the most used
ones are:
\begin{itemize}
\item steepest descent: \texttt{electron\_dynamics = 'sd'}
\item damped dynamics: \texttt{electron\_dynamics = 'damp'},
\texttt{electron\_damping} = a number typically ranging from 0.1 to 0.5
\end{itemize}
See the input description on how to compute the optimal damping factor.
\subsection{Relax the system}
Once your system is in the GS, depending on how you have prepared the starting
atomic configuration:
\begin{enumerate}
\item
if you have set the atomic positions "by hand" and/or from a classical code,
check the forces on atoms, and if they are large ($\sim 0.1 \div 1.0$
atomic units), you should perform an ionic minimization, otherwise the
system could break up during the dynamics.
\item
if you have taken the positions from a previous run or a previous ab-initio
simulation, check the forces, and if they are too small ($\sim 10^{-4}$
atomic units), this means that atoms are already in equilibrium positions
and, even if left free, they will not move. Then you need to randomize
positions a little bit (see below).
\end{enumerate}
Let us consider case 1). There are
different strategies to relax the system, but the most used
are again steepest-descent or damped-dynamics for ions and electrons.
You can also mix electronic and ionic minimization schemes freely,
e.g. steepest descent for ions and damped dynamics for electrons, or vice versa.
\begin{itemize}
\item[(a)] suppose we want to perform steepest-descent for ions. Then we should specify
the following section for ions:
\begin{verbatim}
&ions
ion_dynamics = 'sd'
/
\end{verbatim}
Change also the ionic masses to accelerate the minimization:
\begin{verbatim}
ATOMIC_SPECIES
C 2.0d0 c_blyp_gia.pp
H 2.00d0 h.ps
\end{verbatim}
while leaving other input parameters unchanged.
{\em Note} that if the forces are really high ($> 1.0$ atomic units), you
should always use steepest descent for the first $\sim 100$
relaxation steps.
\item[(b)] As the system approaches the equilibrium positions, the steepest
descent scheme slows down, so it is better to switch to damped dynamics:
\begin{verbatim}
&ions
ion_dynamics = 'damp',
ion_damping = 0.2,
ion_velocities = 'zero'
/
\end{verbatim}
A value of \texttt{ion\_damping} around 0.05 is good for many systems.
It is also better to specify that the run restarts with zero ionic and
electronic velocities, since we have changed the masses.
Further reduce the ionic masses to accelerate the minimization:
\begin{verbatim}
ATOMIC_SPECIES
C 0.1d0 c_blyp_gia.pp
H 0.1d0 h.ps
\end{verbatim}
\item[(c)] when the system is really close to the equilibrium, the damped dynamics
slows down too, mainly because, since electrons and ions are moved
together, the ionic forces are not fully accurate. It is then often better
to perform an ionic step every N electronic steps, or to move the ions only
when the electrons are in their GS (within the chosen threshold).
This can be specified by adding the
\texttt{ion\_nstepe}
parameter to the ionic section, so that the \&IONS namelist becomes:
\begin{verbatim}
&ions
ion_dynamics = 'damp',
ion_damping = 0.2,
ion_velocities = 'zero',
ion_nstepe = 10
/
\end{verbatim}
Then we specify in the \&CONTROL namelist:
\begin{verbatim}
etot_conv_thr = 1.d-6,
ekin_conv_thr = 1.d-5,
forc_conv_thr = 1.d-3
\end{verbatim}
As a result, the code checks every 10 electronic steps whether
the electronic system satisfies the two thresholds
\texttt{etot\_conv\_thr}, \texttt{ekin\_conv\_thr}: if it does,
the ions are advanced by one step.
The process thus continues until the forces become smaller than
\texttt{forc\_conv\_thr}.
{\em Note} that to fully relax the system you need many runs, and different
strategies, that you should mix and change in order to speed up convergence.
The process is not automatic, but is strongly based on experience and trial
and error.
Remember also that convergence to the equilibrium positions depends on
the energy threshold for the electronic GS: correct forces (required
to move ions toward the minimum) are obtained only when the electrons are in
their GS. A small threshold on forces therefore cannot be satisfied unless
an even smaller threshold is required on the total energy.
\end{itemize}
Let us now move to case 2: randomization of positions.
If you have relaxed the system or if the starting system is already in
the equilibrium positions, then you need to displace ions from the equilibrium
positions, otherwise they will not move in a dynamics simulation.
After the randomization you should bring the electrons to the GS again,
in order to start the dynamics with correct forces and with electrons
in the GS. To randomize, switch off the ionic dynamics and activate
the randomization for each species, specifying the amplitude of the
randomization itself. This can be done with the following
\&IONS namelist:
\begin{verbatim}
&ions
ion_dynamics = 'none',
tranp(1) = .TRUE.,
tranp(2) = .TRUE.,
amprp(1) = 0.01
amprp(2) = 0.01
/
\end{verbatim}
In this way a random displacement (of max 0.01 a.u.) is added to atoms of
species 1 and 2. All other input parameters could remain the same.
Note that the difference in the total energy (etot) between relaxed and
randomized positions can be used to estimate the temperature that will
be reached by the system. In fact, starting with zero ionic velocities,
all the difference is potential energy, but in a dynamics simulation the
energy will be equipartitioned between kinetic and potential. To estimate
the temperature, take the energy difference, convert it to Kelvin,
divide by the number of atoms and multiply by 2/3.
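Equivalently, denoting by $\Delta E$ the energy difference, by $N_{at}$ the
number of atoms and by $k_B$ the Boltzmann constant, the rule above reads
$$ T \simeq \frac{2}{3}\, \frac{\Delta E}{N_{at}\, k_B}. $$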
Randomization could be useful also while we are relaxing the system,
especially when we suspect that the ions are in a local minimum or on
an energy plateau.
\subsection{CP dynamics}
At this point, after having minimized the electrons and with the ions
displaced from their equilibrium positions, we are ready to start a CP
dynamics. We need to specify \texttt{'verlet'} for both ionic and electronic
dynamics. The thresholds in the control input section will be ignored, as will
any parameter related to the minimization strategy. The first time we perform
a CP run after a minimization, it is always better to put velocities equal
to zero, unless we have velocities, from a previous simulation, to
specify in the input file. Restore the proper masses for the ions. In this
way we will sample the microcanonical ensemble. The input section
changes as follows:
\begin{verbatim}
&electrons
emass = 400.d0,
emass_cutoff = 2.5d0,
electron_dynamics = 'verlet',
electron_velocities = 'zero'
/
&ions
ion_dynamics = 'verlet',
ion_velocities = 'zero'
/
ATOMIC_SPECIES
C 12.0d0 c_blyp_gia.pp
H 1.00d0 h.ps
\end{verbatim}
If you want to specify the initial velocities for ions, you have to set
\texttt{ion\_velocities ='from\_input'}, and add the IONIC\_VELOCITIES
card, after the ATOMIC\_POSITIONS card, with the list of velocities in
atomic units.
NOTA BENE: when restarting the dynamics after the first CP run,
remember to remove or comment out the velocity parameters:
\begin{verbatim}
&electrons
emass = 400.d0,
emass_cutoff = 2.5d0,
electron_dynamics = 'verlet'
! electron_velocities = 'zero'
/
&ions
ion_dynamics = 'verlet'
! ion_velocities = 'zero'
/
\end{verbatim}
otherwise you will quench the system interrupting the sampling of the
microcanonical ensemble.
\paragraph{ Varying the temperature }
It is possible to change the temperature of the system, or to sample the
canonical ensemble by fixing the average temperature: this is done using
the Nos\'e thermostat. To activate this thermostat for ions you have
to specify in namelist \&IONS:
\begin{verbatim}
&ions
ion_dynamics = 'verlet',
ion_temperature = 'nose',
fnosep = 60.0,
tempw = 300.0
/
\end{verbatim}
where \texttt{fnosep} is the frequency of the thermostat in THz, that should be
chosen to be comparable with the center of the vibrational spectrum of
the system, in order to excite as many vibrational modes as possible.
\texttt{tempw} is the desired average temperature in Kelvin.
{\em Note:} to avoid a strong coupling between the Nos\'e thermostat
and the system, proceed step by step. Don't switch on the thermostat
from a completely relaxed configuration: adding a random displacement
is strongly recommended. Check which is the average temperature via a
few steps of a microcanonical simulation. Don't increase the temperature
too much. Finally switch on the thermostat. In the case of molecular systems,
different modes have to be thermalized: it is better to use a chain of
thermostats or, equivalently, to run different simulations with different
frequencies.
\paragraph{ Nos\'e thermostat for electrons }
It is possible to specify also the thermostat for the electrons. This is
usually activated in metals or in systems where we have a transfer of
energy between ionic and electronic degrees of freedom. Beware: the
usage of electronic thermostats is quite delicate. The following information
comes from K. Kudin:
''The main issue is that there is usually some "natural" fictitious kinetic
energy that electrons gain from the ionic motion ("drag"). One could easily
quantify how much of the fictitious energy comes from this drag by doing a CP
run, then a couple of CG (same as BO) steps, and then going back to CP.
The fictitious electronic energy at the last CP restart will be purely
due to the drag effect.''
''The thermostat on electrons will either try to overexcite the otherwise
"cold" electrons, or it will try to take them down to an unnaturally cold
state where their fictitious kinetic energy is even below what would be
just due to pure drag. Neither of these is good.''
''I think the only workable regime with an electronic thermostat is a
mild overexcitation of the electrons, however, to do this one will need
to know rather precisely what is the fictitious kinetic energy due to the
drag.''
\subsection{Advanced usage}
\subsubsection{ Self-interaction Correction }
The self-interaction correction (SIC) included in the \CP\
package is based
on the Constrained Local-Spin-Density approach proposed by F. Mauri and
coworkers (M. D'Avezac et al. PRB 71, 205210 (2005)). It was used for
the first time in \qe\ by F. Baletto, C. Cavazzoni
and S. Scandolo (PRL 95, 176801 (2005)).
This approach is a simple and nice way to treat ONE, and only one,
excess charge. It is moreover necessary to check a priori that
the spin-up and spin-down eigenvalues are not too different, for the
corresponding neutral system, working in the Local-Spin-Density
Approximation (setting \texttt{nspin = 2}). If these two conditions are satisfied
and you are interested in charged systems, you can apply the SIC.
This approach is an on-the-fly method to correct the self-interaction
of the excess charge with itself.
Briefly, both the Hartree and the XC parts have been
corrected to avoid the interaction of the excess charge with itself.
For example, for the Boron atom, where we have an odd number of
valence electrons (3), the parameters for working with
the SIC are:
\begin{verbatim}
&system
nbnd= 2,
total_magnetization=1,
sic_alpha = 1.d0,
sic_epsilon = 1.0d0,
sic = 'sic_mac',
force_pairing = .true.
/
&ions
ion_dynamics = 'none',
ion_radius(1) = 0.8d0,
sic_rloc = 1.0
/
ATOMIC_POSITIONS (bohr)
B 0.00 0.00 0.00 0 0 0 1
\end{verbatim}
The two main parameters are:
\begin{quote}
\texttt{force\_pairing = .true.}, which forces the paired electrons to be the same;\\
\texttt{sic='sic\_mac'}, which instructs the code to use Mauri's correction.
\end{quote}
Remember to add an extra column in ATOMIC\_POSITIONS with "1" to activate
SIC for those atoms.
{\bf Warning}:
This approach has known problems for dissociation mechanism
driven by excess electrons.
Comment 1:
Two parameters, \texttt{sic\_alpha} and \texttt{sic\_epsilon}, have been introduced
following the suggestion of M. Sprik (ICR(05)) to treat the radical
(OH)-H$_2$O. In any case, a complete ab-initio approach is followed
using \texttt{sic\_alpha=1}, \texttt{sic\_epsilon=1}.
Comment 2:
When you apply this SIC scheme to a neutral molecule or atom,
remember to add the correction to the energy levels as proposed by Landau:
in a neutral system, when the self-interaction is subtracted, the unpaired
electron feels a charged system, even if a compensating positive background
is used.
For a cubic box, the correction term due to the Madelung energy is approx.
given by $1.4186/L_{box} - 1.047/(L_{box})^3$, where $L_{box}$ is the
linear dimension of your box (=celldm(1)). The Madelung coefficient is
taken from I. Dabo et al. PRB 77, 115139 (2007).
(info by F. Baletto, francesca.baletto@kcl.ac.uk)
% \subsubsection{ Variable-cell MD }
%The variable-cell MD is when the Car-Parrinello technique is also applied
%to the cell. This technique is useful to study system at very high pressure.
\subsubsection{ ensemble-DFT }
The ensemble-DFT (eDFT) is a robust method to simulate metals in the
framework of ''ab-initio'' molecular dynamics. It was introduced in 1997
by Marzari et al.
The specific subroutines for the eDFT are in
\texttt{CPV/ensemble\_dft.f90} where you
define all the quantities of interest. The subroutine
\texttt{CPV/inner\_loop\_cold.f90}
called by \texttt{cg\_sub.f90}, controls the inner loop, i.e. the minimization of
the free energy $A$ with respect to the occupation matrix.
To select an eDFT calculation, the user has to set:
\begin{verbatim}
calculation = 'cp'
occupations= 'ensemble'
tcg = .true.
passop= 0.3
maxiter = 250
\end{verbatim}
to use the conjugate-gradient (CG) procedure. In eDFT this is also the outer
loop, in which the energy is minimized with respect to the wavefunctions while
keeping the occupation matrix fixed; the parameters below control
the inner loop.
Since eDFT was devised to treat metals, keep in mind that we want to describe
the broadening of the occupations around the Fermi energy.
The new parameters in the \&ELECTRONS namelist are listed below.
\begin{itemize}
\item \texttt{smearing}: used to select the occupation distribution;
there are two options: Fermi-Dirac smearing='fd', cold-smearing
smearing='cs' (recommended)
\item \texttt{degauss}: is the electronic temperature; it controls the broadening
of the occupation numbers around the Fermi energy.
\item \texttt{ninner}: is the number of iterative cycles in the inner loop,
done to minimize the free energy $A$ with respect to the occupation numbers.
The typical range is 2-8.
\item \texttt{conv\_thr}: is the threshold value to stop the search of the 'minimum'
free energy.
\item \texttt{niter\_cold\_restart}: controls the frequency at which a full iterative
inner cycle is done. It is in the range $1\div$\texttt{ninner}. It is a trick to speed up
the calculation.
\item \texttt{lambda\_cold}: is the step length along the search line for the best
value of $A$, when the iterative cycle is not performed. The value is close
to 0.03, smaller for large and complicated metallic systems.
\end{itemize}
{\em NOTE:} \texttt{degauss} is in Hartree, while in \PWscf\ it is in Ry (!!!).
The typical range is 0.01-0.02 Ha.
The input for an Al surface is:
\begin{verbatim}
&CONTROL
calculation = 'cp',
restart_mode = 'from_scratch',
nstep = 10,
iprint = 5,
isave = 5,
dt = 125.0d0,
prefix = 'Aluminum_surface',
pseudo_dir = '~/UPF/',
outdir = '/scratch/'
ndr=50
ndw=51
/
&SYSTEM
ibrav= 14,
celldm(1)= 21.694d0, celldm(2)= 1.00D0, celldm(3)= 2.121D0,
celldm(4)= 0.0d0, celldm(5)= 0.0d0, celldm(6)= 0.0d0,
nat= 96,
ntyp= 1,
nspin=1,
ecutwfc= 15,
nbnd=160,
input_dft = 'pbe'
occupations= 'ensemble',
smearing='cs',
degauss=0.018,
/
&ELECTRONS
orthogonalization = 'Gram-Schmidt',
startingwfc = 'random',
ampre = 0.02,
tcg = .true.,
passop= 0.3,
maxiter = 250,
emass_cutoff = 3.00,
conv_thr=1.d-6
n_inner = 2,
lambda_cold = 0.03,
niter_cold_restart = 2,
/
&IONS
ion_dynamics = 'verlet',
ion_temperature = 'nose'
fnosep = 4.0d0,
tempw = 500.d0
/
ATOMIC_SPECIES
Al 26.89 Al.pbe.UPF
\end{verbatim}
{\em NOTA1} remember that the time step is used to integrate the ionic dynamics,
so you can choose something in the range of 1-5 fs. \\
{\em NOTA2} with eDFT you are simulating metals or systems for which the
occupation numbers are fractional, so the number of bands, \texttt{nbnd}, has to
be chosen so as to have some empty states. As a rule of thumb, start
with an initial occupation number of about 1.6-1.8 (the more bands you
consider, the more the calculation is accurate, but it also takes longer.
The CPU time scales almost linearly with the number of bands.) \\
{\em NOTA3} the parameter \texttt{emass\_cutoff} is used in the preconditioning
and it has a completely different meaning with respect to plain CP.
It ranges between 4 and 7.
All the other parameters have the same meaning in the usual \CP\ input,
and they are discussed above.
\subsubsection{Treatment of USPPs}
The cutoff \texttt{ecutrho} defines the resolution on the real space FFT mesh (as expressed
by \texttt{nr1}, \texttt{nr2} and \texttt{nr3}, which the code, if left on its own, sets automatically).
In the USPP case we refer to this mesh as the "hard" mesh, since it
is denser than the smooth mesh that is needed to represent the square
of the non-norm-conserving wavefunctions.
On this "hard", fine-spaced mesh, you need to determine the size of the
cube that will encompass the largest of the augmentation charges - this
is what \texttt{nr1b}, \texttt{nr2b}, \texttt{nr3b} are. They are independent
of the system size, but dependent on the size
of the augmentation charge (an atomic property that doesn't vary
that much for different systems) and on the
real-space resolution needed by augmentation charges (rule of thumb:
\texttt{ecutrho} is between 6 and 12 times \texttt{ecutwfc}).
The small boxes should be set as small as possible, but large enough
to contain the core of the largest element in your system.
The formula for estimating the box size is quite simple:
\begin{quote}
\texttt{nr1b} = $2 R_c / L_x \times$ \texttt{nr1}
\end{quote}
and the like, where $R_c$ is the largest cut-off radius among the various atom
types present in the system, $L_x$ is the
physical length of your box along the $x$ axis. You have to round your
result to the nearest larger integer.
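As a purely hypothetical numerical example: with $R_c = 1.3$ a.u.,
$L_x = 12$ a.u. and \texttt{nr1} $= 90$, the formula above gives
$$ \texttt{nr1b} = 2 \times 1.3 / 12 \times 90 = 19.5 \longrightarrow 20 ,$$
rounded, as said, to the nearest larger integer.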
In practice, \texttt{nr1b} etc. are often in the region of 20-24-28; testing seems
again a necessity.
The core charge is in principle finite only in the core region (as defined
by some $R_{cut}$) and vanishes outside the core. Numerically the charge is
represented in a Fourier series which may give rise to small charge
oscillations outside the core and even to negative charge density, but
only if the cut-off is too low. Having these small boxes removes the
charge oscillations problem (at least outside the box) and also offers
some numerical advantages in going to higher cut-offs (info by Nicola Marzari).
\section{Performances}
\subsection{Execution time}
Since v.4.2 \qe\ prints real (wall) time instead of CPU time.
The following is a rough estimate of the complexity of a plain
scf calculation with \pw.x, for NCPPs. USPPs and PAW
give rise to additional terms to be calculated, which may add from a
few percent
up to 30-40\% to the execution time. For phonon calculations, each of the
$3N_{at}$ modes requires a time of the same order of magnitude as a
self-consistent calculation on the same system (possibly times a small multiple).
For \cp.x, each time step takes something of the order of
$T_h + T_{orth} + T_{sub}$ defined below.
The time required for the self-consistent solution at fixed ionic
positions, $T_{scf}$ , is:
$$T_{scf} = N_{iter} T_{iter} + T_{init}$$
where $N_{iter}$ = number of self-consistency iterations (\texttt{niter}),
$T_{iter}$ =
time for a single iteration, $T_{init}$ = initialization time
(usually much smaller than the first term).
The time required for a single self-consistency iteration $T_{iter}$ is:
$$T_{iter} = N_k T_{diag} +T_{rho} + T_{scf}$$
where $N_k$ = number of k-points, $T_{diag}$ = time per
Hamiltonian iterative diagonalization, $T_{rho}$ = time for charge density
calculation, $T_{scf}$ = time for Hartree and XC potential
calculation.
The time for a Hamiltonian iterative diagonalization $T_{diag}$ is:
$$T_{diag} = N_h T_h + T_{orth} + T_{sub}$$
where $N_h$ = number of $H\psi$ products needed by iterative diagonalization,
$T_h$ = time per $H\psi$ product, $T_{orth}$ = CPU time for
orthonormalization, $T_{sub}$ = CPU time for subspace diagonalization.
The time $T_h$ required for a $H\psi$ product is
$$T_h = a_1 M N + a_2 M N_1 N_2 N_3 \log(N_1 N_2 N_3) + a_3 M P N. $$
The first term comes from the kinetic term and is usually much smaller
than the others. The second and third terms come respectively from local
and nonlocal potential. $a_1, a_2, a_3$ are prefactors (i.e.
small numbers ${\cal O}(1)$), M = number of valence
bands (\texttt{nbnd}), N = number of PW (basis set dimension: \texttt{npw}), $N_1, N_2, N_3$ =
dimensions of the FFT grid for wavefunctions (\texttt{nr1s}, \texttt{nr2s},
\texttt{nr3s}; $N_1 N_2 N_3 \sim 8N$ ),
P = number of pseudopotential projectors, summed on all atoms, on all values of the
angular momentum $l$, and $m = 1, . . . , 2l + 1$.
The time $T_{orth}$ required by orthonormalization is
$$T_{orth} = b_1 N M_x^2$$
and the time $T_{sub}$ required by subspace diagonalization is
$$T_{sub} = b_2 M_x^3$$
where $b_1$ and $b_2$ are prefactors, $M_x$ = number of trial wavefunctions
(this will vary between $M$ and $2\div4 M$, depending on the algorithm).
The time $T_{rho}$ for the calculation of charge density from wavefunctions is
$$T_{rho} = c_1 M N_{r1} N_{r2} N_{r3} \log(N_{r1} N_{r2} N_{r3}) +
c_2 M N_{r1} N_{r2} N_{r3} + T_{us}$$
where $c_1, c_2$ are prefactors, $N_{r1}, N_{r2}, N_{r3}$ =
dimensions of the FFT grid for charge density (\texttt{nr1},
\texttt{nr2}, \texttt{nr3}; $N_{r1} N_{r2} N_{r3} \sim 8N_g$,
where $N_g$ = number of G-vectors for the charge density,
\texttt{ngm}), and
$T_{us}$ = time required by PAW/USPPs contribution (if any).
Note that for NCPPs the FFT grids for charge and
wavefunctions are the same.
The time $T_{scf}$ for calculation of potential from charge density is
$$T_{scf} = d_1 N_{r1} N_{r2} N_{r3} + d_2 N_{r1} N_{r2} N_{r3}
\log(N_{r1} N_{r2} N_{r3})$$
where $d_1, d_2$ are prefactors.
The above estimates are for serial execution. In parallel execution,
each contribution may scale in a different manner with the number of processors (see below).
\subsection{Memory requirements}
A typical self-consistency or molecular-dynamics run requires a maximum
memory of the order of $O$ double precision complex numbers, where
$$ O = m M N + P N + p N_1 N_2 N_3 + q N_{r1} N_{r2} N_{r3}$$
with $m, p, q$ = small factors; all other variables have the same meaning as
above. Note that if the $\Gamma-$point only ($k=0$) is used to sample the
Brillouin Zone, the value of N will be cut in half.
The memory required by the phonon code follows the same patterns, with
somewhat larger factors $m, p, q$.
\subsection{File space requirements}
A typical \pw.x\ run will require an amount of temporary disk space of the
order of $O$ double precision complex numbers:
$$O = N_k M N + q N_{r1} N_{r2}N_{r3}$$
where $q = 2\times$ \texttt{mixing\_ndim} (number of iterations used in
self-consistency, default value = 8) if \texttt{disk\_io} is set to 'high'; q = 0
otherwise.
\subsection{Parallelization issues}
\label{SubSec:badpara}
\pw.x\ and \cp.x\ can run in principle on any number of processors.
The effectiveness of parallelization is ultimately judged by the
''scaling'', i.e. how the time needed to perform a job scales
with the number of processors, and depends upon:
\begin{itemize}
\item the size and type of the system under study;
\item the judicious choice of the various levels of parallelization
(detailed in Sec.\ref{SubSec:para});
\item the availability of fast interprocess communications (or lack of it).
\end{itemize}
Ideally one would like to have linear scaling, i.e. $T \sim T_0/N_p$ for
$N_p$ processors, where $T_0$ is the estimated time for serial execution.
In addition, one would like to have linear scaling of
the RAM per processor: $O_N \sim O_0/N_p$, so that large-memory systems
fit into the RAM of each processor.
As a general rule, image parallelization:
\begin{itemize}
\item may give good scaling, but the slowest image will determine
the overall performances (''load balancing'' may be a problem);
\item requires very little communications (suitable for ethernet
communications);
\item does not reduce the required memory per processor (unsuitable for
large-memory jobs).
\end{itemize}
Parallelization on k-points:
\begin{itemize}
\item guarantees (almost) linear scaling if the number of k-points
is a multiple of the number of pools;
\item requires little communications (suitable for ethernet communications);
\item does not reduce the required memory per processor (unsuitable for
large-memory jobs).
\end{itemize}
Parallelization on PWs:
\begin{itemize}
\item yields good to very good scaling, especially if the number of processors
in a pool is a divisor of $N_3$ and $N_{r3}$ (the dimensions along the z-axis
of the FFT grids, \texttt{nr3} and \texttt{nr3s}, which coincide for NCPPs);
\item requires heavy communications (suitable for Gigabit ethernet up to
4, 8 CPUs at most, specialized communication hardware needed for 8 or more
processors);
\item yields almost linear reduction of memory per processor with the number
of processors in the pool.
\end{itemize}
A note on scaling: optimal serial performances are achieved when the data are
kept in the cache as much as possible. As a side effect, PW
parallelization may yield superlinear (better than linear) scaling,
thanks to the increase in serial speed coming from the reduction of data size
(making it easier for the machine to keep data in the cache).
VERY IMPORTANT: For each system there is an optimal range of number of processors on which to
run the job. Too large a number of processors will yield performance
degradation. The size of pools is especially delicate: $N_p$ should not
exceed $N_3$ and $N_{r3}$, and should ideally be no larger than
$1/2\div1/4$ of $N_3$ and/or $N_{r3}$. In order to increase scalability,
it is often convenient to
further subdivide a pool of processors into ''task groups''.
When the number of processors exceeds the number of FFT planes,
data can be redistributed to "task groups" so that each group
can process several wavefunctions at the same time.
The optimal number of processors for "linear-algebra"
parallelization, taking care of multiplication and diagonalization
of $M\times M$ matrices, should be determined by observing the
performances of \texttt{cdiagh/rdiagh} (\pw.x) or \texttt{ortho} (\cp.x)
for different numbers of processors in the linear-algebra group
(must be a square integer).
Actual parallel performances will also depend on the available software
(MPI libraries) and on the available communication hardware. For
PC clusters, OpenMPI (\texttt{http://www.openmpi.org/}) seems to yield better
performances than other implementations (info by Konstantin Kudin).
Note however that you need a decent communication hardware (at least
Gigabit ethernet) in order to have acceptable performances with
PW parallelization. Do not expect good scaling with cheap hardware:
PW calculations are by no means an "embarrassingly parallel" problem.
Also note that multiprocessor motherboards for Intel Pentium CPUs typically
have just one memory bus for all processors. This dramatically
slows down any code doing massive access to memory (as most codes
in the \qe\ distribution do) that runs on processors of the same
motherboard.
\section{Troubleshooting}
Almost all problems in \qe\ arise from incorrect input data
and result in
error stops. Error messages should be self-explanatory, but unfortunately
this is not always true. If the code issues a warning message and continues,
pay attention to it but do not assume that something is necessarily wrong in
your calculation: most warning messages signal harmless problems.
\subsection{pw.x problems}
\paragraph{pw.x says 'error while loading shared libraries' or
'cannot open shared object file' and does not start}
Possible reasons:
\begin{itemize}
\item If you are running on the same machines on which the code was
compiled, this is a library configuration problem. The solution is
machine-dependent. On Linux, find the path to the missing libraries;
then either add it to file \texttt{/etc/ld.so.conf} and run \texttt{ldconfig}
(must be
done as root), or add it to variable LD\_LIBRARY\_PATH and export
it. Another possibility is to load non-shared version of libraries
(ending with .a) instead of shared ones (ending with .so).
\item If you are {\em not} running on the same machines on which the
code was compiled: you need either to have the same shared libraries
installed on both machines, or to load statically all libraries
(using appropriate \configure\ or loader options). The same applies to
Beowulf-style parallel machines: the needed shared libraries must be
present on all PCs.
\end{itemize}
\paragraph{errors in examples with parallel execution}
If you get error messages in the example scripts -- i.e. not errors in
the codes -- on a parallel machine, such as e.g.:
{\em run example: -n: command not found}
you may have forgotten
the " " in the definitions of PARA\_PREFIX and PARA\_POSTFIX.
\paragraph{pw.x prints the first few lines and then nothing happens
(parallel execution)}
If the code looks like it is not reading from input, maybe
it isn't: the MPI libraries need to be properly configured to accept input
redirection. Use \texttt{pw.x -inp} and the input file name (see Sec.\ref{SubSec:para}), or inquire with
your local computer wizard (if any). Since v.4.2, this is for sure the
reason if the code stops at {\em Waiting for input...}.
\paragraph{pw.x stops with error while reading data}
There is an error in the input data, typically a misspelled namelist
variable, or an empty input file.
Unfortunately with most compilers the code just reports {\em Error while
reading XXX namelist} and no further useful information.
Here are some more subtle sources of trouble:
\begin{itemize}
\item Out-of-bound indices in dimensioned variables read in the namelists;
\item Input data files containing \^{}M (Control-M) characters at the end
of lines, or non-ASCII characters (e.g. non-ASCII quotation marks,
that at a first glance may look the same as the ASCII
character). Typically, this happens with files coming from Windows
or produced with "smart" editors.
\end{itemize}
Both may cause the code to crash with rather mysterious error messages.
If none of the above applies and the code stops at the first namelist
(\&CONTROL) and you are running in parallel, see the previous item.
\paragraph{pw.x mumbles something like {\em cannot recover} or
{\em error reading recover file}}
You are trying to restart from a previous job that either
produced corrupted files, or did not do what you think it did. No luck: you
have to restart from scratch.
\paragraph{pw.x stops with {\em inconsistent DFT} error}
As a rule, the flavor of DFT used in the calculation should be the
same as the one used in the generation of pseudopotentials, which
should all be generated using the same flavor of DFT. This is actually enforced: the
type of DFT is read from pseudopotential files and it is checked that the same DFT
is read from all PPs. If this does not hold, the code stops with the
above error message. Use -- at your own risk -- input variable
\texttt{input\_dft} to force the usage of the DFT you like.
\paragraph{pw.x stops with error in cdiaghg or rdiaghg}
Possible reasons for such behavior are not always clear, but they
typically fall into one of the following cases:
\begin{itemize}
\item serious error in data, such as bad atomic positions or bad
crystal structure/supercell;
\item a bad pseudopotential, typically with a ghost, or a USPP giving
non-positive charge density, leading to a violation of positiveness
of the S matrix appearing in the USPP formalism;
\item a failure of the algorithm performing subspace
diagonalization. The LAPACK algorithms used by \texttt{cdiaghg}
(for generic k-points) or \texttt{rdiaghg} (for $\Gamma-$only case)
are
very robust and extensively tested. Still, it may seldom happen that
such algorithms fail. Try to use conjugate-gradient diagonalization
(\texttt{diagonalization='cg'}), a slower but very robust algorithm, and see
what happens.
\item buggy libraries. Machine-optimized mathematical libraries are
very fast but sometimes not so robust from a numerical point of
view. Suspicious behavior: you get an error that is not
reproducible on other architectures or that disappears if the
calculation is repeated with even minimal changes in
parameters. Known cases: HP-Compaq alphas with cxml libraries, Mac
OS-X with system BLAS/LAPACK. Try to use compiled BLAS and LAPACK
(or better, ATLAS) instead of machine-optimized libraries.
\end{itemize}
\paragraph{pw.x crashes with no error message at all}
This happens quite often in parallel execution, or under a batch
queue, or if you are writing the output to a file. When the program
crashes, part of the output, including the error message, may be lost,
or hidden in error files that nobody looks into. It is the fault of
the operating system, not of the code. Try to run interactively
and to write to the screen. If this doesn't help, move to next point.
\paragraph{pw.x crashes with {\em segmentation fault} or similarly
obscure messages}
Possible reasons:
\begin{itemize}
\item too much RAM or stack requested (see next item).
\item if you are using highly optimized mathematical libraries, verify
that they are designed for your hardware.
\item If you are using aggressive optimization in compilation, verify
that you are using the appropriate options for your machine.
\item The executable was not properly compiled, or was compiled on
a different and incompatible environment.
\item buggy compiler or libraries: this is the default explanation if you
have problems with the provided tests and examples.
\end{itemize}
\paragraph{pw.x works for simple systems, but not for large systems
or whenever more RAM is needed}
Possible solutions:
\begin{itemize}
\item increase the amount of RAM you are authorized to use (which may
be much smaller than the available RAM). Ask your system
administrator if you don't know what to do.
\item reduce \texttt{nbnd} to the strict minimum, or reduce the cutoffs, or the
cell size, or a combination of them;
\item use conjugate-gradient (\texttt{diagonalization='cg'}: slow but very
robust): it requires less memory than the default Davidson
algorithm. If you stick to the latter, use \texttt{diago\_david\_ndim=2}
(see the sketch after this list).
\item in parallel execution, use more processors, or use the same
number of processors with less pools. Remember that parallelization
with respect to k-points (pools) does not distribute memory:
parallelization with respect to R- (and G-) space does.
\item IBM only (32-bit machines): if you need more than 256 MB you
must specify it at link time (option \texttt{-bmaxdata}).
\item buggy or weird-behaving compiler. Some versions of the Portland
and Intel compilers on Linux PCs or clusters have this problem. For
Intel ifort 8.1 and later, the problem seems to be due to the
allocation of large automatic arrays that exceeds the available
stack. Increasing the stack size (with command \texttt{limit} or \texttt{ulimit})
may solve the problem. Versions $> 3.2$ try to avoid this
problem by removing the stack size limit at startup. See:\\
\texttt{http://www.democritos.it/pipermail/pw\_forum/2007-September/007176.html},\\
\texttt{http://www.democritos.it/pipermail/pw\_forum/2007-September/007179.html}.
\end{itemize}
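For instance, a less memory-hungry \&ELECTRONS namelist could contain one of
the two settings sketched below (a sketch, not a recipe; pick only one of
them):
\begin{verbatim}
&electrons
 diagonalization  = 'david',
 diago_david_ndim = 2
! or, slower but requiring even less memory:
! diagonalization = 'cg'
/
\end{verbatim}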
\paragraph{pw.x crashes with {\em error in davcio}}
\texttt{davcio} is the routine that performs most of the I/O operations (read
from disk and write to disk) in \pw.x; {\em error in davcio} means a
failure of an I/O operation.
\begin{itemize}
\item If the error is reproducible and happens at the beginning of a
calculation: check if you have read/write permission to the scratch
directory specified in variable \texttt{outdir}. Also: check if there is
enough free space available on the disk you are writing to, and
check your disk quota (if any).
\item If the error is irreproducible: you might have flaky disks; if
you are writing via the network using NFS (which you shouldn't do
anyway), your network connection might not be stable, or your
NFS implementation might be unable to work under heavy load.
\item If it happens while restarting from a previous calculation: you
might be restarting from the wrong place, or from wrong data, or
the files might be corrupted.
\item If you are running two or more instances of \texttt{pw.x} at
the same time, check if you are using the same file names in the
same temporary directory. For instance, if you submit a series of
jobs to a batch queue, do not use the same \texttt{outdir} and
the same \texttt{prefix}, unless you are sure that one job doesn't
start before a preceding one has finished.
\end{itemize}
\paragraph{pw.x crashes in parallel execution with an obscure message
related to MPI errors}
Random crashes due to MPI errors have often been reported, typically
in Linux PC clusters. We cannot rule out the possibility that bugs in
\qe\ cause such behavior, but we are quite confident that
the most likely explanation is a hardware problem (defective RAM
for instance) or a software bug (in MPI libraries, compiler, operating
system).
Debugging a parallel code may be difficult, but you should at least
verify if your problem is reproducible on different
architectures/software configurations/input data sets, and if
there is some particular condition that activates the bug. If this
doesn't seem to happen, the odds are that the problem is not in
\qe. You may still report your problem,
but consider that reports like {\em it crashes with...(obscure MPI error)}
contain 0 bits of information and are likely to get 0 bits of answers.
\paragraph{pw.x stops with error message {\em the system is metallic,
specify occupations}}
You did not specify state occupations, but you need to, since your
system appears to have an odd number of electrons. The variable
controlling how metallicity is treated is \texttt{occupations} in namelist
\&SYSTEM. The default, \texttt{occupations='fixed'}, occupies the lowest
(N electrons)/2 states and works only for insulators with a gap. In all other
cases, use \texttt{'smearing'} (\texttt{'tetrahedra'} for DOS calculations).
See input reference documentation for more details.
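For example, for a metallic system one could add to the \&SYSTEM namelist
something like the following (the \texttt{degauss} value, in Ry, is just a
placeholder, and extra bands via \texttt{nbnd} may also be needed):
\begin{verbatim}
&system
! ... other &SYSTEM variables as before ...
 occupations = 'smearing',
 smearing    = 'marzari-vanderbilt',
 degauss     = 0.02
/
\end{verbatim}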
\paragraph{pw.x stops with {\em internal error: cannot bracket Ef}}
Possible reasons:
\begin{itemize}
\item serious error in data, such as bad number of electrons,
insufficient number of bands, absurd value of broadening;
\item the Fermi energy is found by bisection assuming that the
integrated DOS N(E ) is an increasing function of the energy. This
is not guaranteed for Methfessel-Paxton smearing of order 1 and can
give problems when very few k-points are used. Use some other
smearing function: simple Gaussian broadening or, better,
Marzari-Vanderbilt 'cold smearing'.
\end{itemize}
\paragraph{pw.x yields {\em internal error: cannot bracket Ef} message
but does not stop}
This may happen under special circumstances when you are calculating
the band structure for selected high-symmetry lines. The message
signals that occupations and Fermi energy are not correct (but
eigenvalues and eigenvectors are). Remove \texttt{occupations='tetrahedra'}
in the input data to get rid of the message.
\paragraph{pw.x runs but nothing happens}
Possible reasons:
\begin{itemize}
\item in parallel execution, the code died on just one
processor. Unpredictable behavior may follow.
\item in serial execution, the code encountered a floating-point error
and goes on producing NaNs (Not a Number) forever unless exception
handling is on (and usually it isn't). In both cases, look for one
of the reasons given above.
\item maybe your calculation will take more time than you expect.
\end{itemize}
\paragraph{pw.x yields weird results}
If results are really weird (as opposed to misinterpreted):
\begin{itemize}
\item if this happens after a change in the code or in compilation or
preprocessing options, try \texttt{make clean}, recompile. The \texttt{make}
command should take care of all dependencies, but do not rely too
heavily on it. If the problem persists, recompile with
reduced optimization level.
\item maybe your input data are weird.
\end{itemize}
\paragraph{FFT grid is machine-dependent}
Yes, they are! The code automatically chooses the smallest grid that
is compatible with the
specified cutoff in the specified cell, and is an allowed value for the FFT
library used. Most FFT libraries are implemented, or perform well, only
with dimensions that factor into products of small numbers (2, 3, 5 typically,
sometimes 7 and 11). Different FFT libraries follow different rules and thus
different dimensions can result for the same system on different machines (or
even on the same machine, with a different FFT). See function \texttt{allowed} in
\texttt{Modules/fft\_scalar.f90}.
As a consequence, the energy may be slightly different on different machines.
The only piece that explicitly depends on the grid parameters is
the XC part of the energy that is computed numerically on the grid. The
differences should be small, though, especially for LDA calculations.
Manually setting the FFT grids to a desired value is possible, but slightly
tricky, using input variables \texttt{nr1}, \texttt{nr2}, \texttt{nr3} and
\texttt{nr1s}, \texttt{nr2s}, \texttt{nr3s}. The
code will still increase them if not acceptable. Automatic FFT grid
dimensions are slightly overestimated, so one may try {\em very carefully}
to reduce
them a little bit. The code will stop if the values are too small; it will
waste CPU time and memory if they are too large.
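For instance, one might force the grids from input as follows (the values
below are hypothetical and must be compatible with the cutoffs; the code will
increase them if they are not acceptable):
\begin{verbatim}
&system
! ... other &SYSTEM variables as before ...
 nr1  = 64, nr2  = 64, nr3  = 72,
 nr1s = 45, nr2s = 45, nr3s = 48
/
\end{verbatim}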
Note that in parallel execution, it is very convenient to have FFT grid
dimensions along $z$ that are a multiple of the number of processors.
\paragraph{pw.x does not find all the symmetries you expected}
\pw.x determines first the symmetry operations (rotations) of the
Bravais lattice; then checks which of these are symmetry operations of
the system (including if needed fractional translations). This is done
by rotating (and translating if needed) the atoms in the unit cell and
verifying if the rotated unit cell coincides with the original one.
Assuming that your coordinates are correct (please carefully check!),
you may not find all the symmetries you expect because:
\begin{itemize}
\item the number of significant figures in the atomic positions is not
large enough. In file \texttt{PW/eqvect.f90}, the variable \texttt{accep} is used to
decide whether a rotation is a symmetry operation. Its current value
($10^{-5}$) is quite strict: a rotated atom must coincide with
another atom to 5 significant digits. You may change the value of
\texttt{accep} and recompile.
\item they are not acceptable symmetry operations of the Bravais
lattice. This is the case for C$_{60}$, for instance: the $I_h$
icosahedral group of C$_{60}$ contains 5-fold rotations that are
incompatible with translation symmetry.
\item the system is rotated with respect to the symmetry axes. For
instance: a C$_{60}$ molecule in the fcc lattice will have 24
symmetry operations ($T_h$ group) only if the double bond is
aligned along one of the crystal axes; if C$_{60}$ is rotated
in some arbitrary way, \pw.x may not find any symmetry, apart from
inversion.
\item they contain a fractional translation that is incompatible with
the FFT grid (see next paragraph). Note that if you change cutoff or
unit cell volume, the automatically computed FFT grid changes, and
this may explain changes in symmetry (and in the number of k-points
as a consequence) for no apparent good reason (only if you have
fractional translations in the system, though).
\item a fractional translation, without rotation, is a symmetry
operation of the system. This means that the cell is actually a
supercell. In this case, all symmetry operations containing
fractional translations are disabled. The reason is that in this
rather exotic case there is no simple way to select those symmetry
operations forming a true group, in the mathematical sense of the
term.
\end{itemize}
\paragraph{{\em Warning: symmetry operation \# N not allowed}}
This is not an error. If a symmetry operation contains a fractional
translation that is incompatible with the FFT grid, it is discarded in
order to prevent problems with symmetrization. Typical fractional
translations are 1/2 or 1/3 of a lattice vector. If the FFT grid
dimension along that direction is not divisible respectively by 2 or
by 3, the symmetry operation will not transform the FFT grid into
itself.
\paragraph{Self-consistency is slow or does not converge at all}
Bad input data will often result in bad scf convergence. Please
carefully check your structure first, e.g. using XCrySDen.
Assuming that your input data is sensible:
\begin{enumerate}
\item Verify if your system is metallic or is close to a metallic
state, especially if you have few k-points. If the highest occupied
and lowest unoccupied state(s) keep exchanging place during
self-consistency, forget about reaching convergence. A typical sign
of such behavior is that the self-consistency error goes down, down,
down, then all of a sudden up again, and so on. Usually one can
solve the problem by adding a few empty bands and a small
broadening.
\item Reduce \texttt{mixing\_beta} to $\sim 0.3\div
0.1$ or smaller. Try the \texttt{mixing\_mode} value that is more
appropriate for your problem. For slab geometries used in surface
problems or for elongated cells, \texttt{mixing\_mode='local-TF'}
should be the better choice, dampening "charge sloshing". You may
also try to increase \texttt{mixing\_ndim} to more than 8 (default
value). Beware: this will increase the amount of memory you need
(see also the sketch after this list).
\item Specific to USPP: the presence of negative charge density
regions due to either the pseudization procedure of the augmentation
part or to truncation at finite cutoff may give convergence
problems. Raising the \texttt{ecutrho} cutoff for charge density will
usually help.
\end{enumerate}
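A sketch of the mixing-related settings of point 2 (the numbers are just a
reasonable starting point, not a recipe):
\begin{verbatim}
&electrons
 mixing_mode = 'local-TF',
 mixing_beta = 0.2,
 mixing_ndim = 12
/
\end{verbatim}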
\paragraph{I do not get the same results in different machines!}
If the difference is small, do not panic. It is quite normal for
iterative methods to reach convergence through different paths as soon
as anything changes. In particular, between serial and parallel
execution there are operations that are not performed in the same
order. As the numerical accuracy of computer numbers is finite, this
can yield slightly different results.
It is also normal that the total energy converges to a better accuracy
than its terms, since only the sum is variational, i.e. has a minimum
in correspondence to ground-state charge density. Thus if the
convergence threshold is for instance $10^{-8}$, you get 8-digit
accuracy on the total energy, but one or two less on other terms
(e.g. XC and Hartree energy). If this is a problem for you, reduce the
convergence threshold for instance to $10^{-10}$ or $10^{-12}$. The
differences should go away (but it will probably take a few more
iterations to converge).
\paragraph{Execution time is time-dependent!}
Yes it is! On most machines and on
most operating systems, depending on machine load, on communication load
(for parallel machines), on various other factors (including maybe the phase
of the moon), reported execution times may vary quite a lot for the same job.
\paragraph{{\em Warning : N eigenvectors not converged}}
This is a warning message that can be safely ignored if it is not
present in the last steps of self-consistency. If it is still present
in the last steps of self-consistency, and if the number of
unconverged eigenvectors is a significant part of the total, it may
signal serious trouble in self-consistency (see next point) or
something badly wrong in input data.
\paragraph{{\em Warning : negative or imaginary charge...}, or
{\em ...core charge ...}, or {\em npt with rhoup$<0$...} or {\em rho dw$<0$...}}
These are warning messages that can be safely ignored unless the
negative or imaginary charge is sizable, let us say of the order of
0.1. If it is, something seriously wrong is going on. Otherwise, the
origin of the negative charge is the following. When one transforms a
positive function in real space to Fourier space and truncates at some
finite cutoff, the positive function is no longer guaranteed to be
positive when transformed back to real space. This happens only with
core corrections and with USPPs. In some cases it
may be a source of trouble (see next point) but it is usually solved
by increasing the cutoff for the charge density.
\paragraph{Structural optimization is slow or does not converge or ends
with a mysterious bfgs error}
Typical structural optimizations, based on the BFGS algorithm,
converge to the default thresholds (\texttt{etot\_conv\_thr} and
\texttt{forc\_conv\_thr}) in 15-25 BFGS steps (depending on the
starting configuration). This may not happen when your
system is characterized by "floppy" low-energy modes, that make it very
difficult (and of little use anyway) to reach a well converged structure, no
matter what. Other possible reasons for a problematic convergence are listed
below.
Close to convergence the self-consistency error in forces may become large
with respect to the value of forces. The resulting mismatch between forces
and energies may confuse the line minimization algorithm, which assumes
consistency between the two. The code reduces the starting self-consistency
threshold \texttt{conv\_thr} when approaching the minimum energy configuration, up
to a factor defined by \texttt{upscale}. Reducing \texttt{conv\_thr}
(or increasing \texttt{upscale})
yields a smoother structural optimization, but if \texttt{conv\_thr} becomes too small,
electronic self-consistency may not converge. You may also increase variables
\texttt{etot\_conv\_thr} and \texttt{forc\_conv\_thr} that determine the threshold for
convergence (the default values are quite strict).
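As an illustration of the thresholds mentioned above (the numbers are
hypothetical and must be adapted to your system):
\begin{verbatim}
&control
 calculation   = 'relax',
 etot_conv_thr = 1.0d-4,
 forc_conv_thr = 1.0d-3
/
&electrons
 conv_thr = 1.0d-8
/
&ions
 upscale = 100.d0
/
\end{verbatim}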
A limitation to the accuracy of forces comes from the absence of perfect
translational invariance. If we had only the Hartree potential, our PW
calculation would be translationally invariant to machine
precision. The presence of an XC potential
introduces Fourier components in the potential that are not in our
basis set. This loss of precision (more serious for gradient-corrected
functionals) translates into a slight but detectable loss
of translational invariance (the energy changes if all atoms are displaced by
the same quantity, not commensurate with the FFT grid). This sets a limit
to the accuracy of forces. The situation improves somewhat by increasing
the \texttt{ecutrho} cutoff.
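The thresholds mentioned above are set in the \pw.x\ input. A minimal sketch
of the relevant entries follows; the numerical values are purely illustrative
and must be adapted to your system:
\begin{verbatim}
&CONTROL
   calculation   = 'relax'
   etot_conv_thr = 1.0d-5
   forc_conv_thr = 1.0d-4
/
&ELECTRONS
   conv_thr = 1.0d-8
/
&IONS
   upscale = 100.d0
/
\end{verbatim}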
\paragraph{pw.x stops during variable-cell optimization in
checkallsym with {\em non orthogonal operation} error}
Variable-cell optimization may occasionally break the starting
symmetry of the cell. When this happens, the run is stopped because
the number of k-points calculated for the starting configuration may
no longer be suitable. Possible solutions:
\begin{itemize}
\item start with a nonsymmetric cell;
\item use a symmetry-conserving algorithm: the Wentzcovitch algorithm
(\texttt{cell\_dynamics='damp-w'}, see the sketch after this list) should not break the symmetry.
\end{itemize}
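A minimal sketch of the namelist entries involved (all other required input --
\&SYSTEM, \&ELECTRONS, cards, etc. -- is omitted; check the \pw.x\ input
documentation for the allowed combinations of \texttt{ion\_dynamics} and
\texttt{cell\_dynamics}):
\begin{verbatim}
&CONTROL
   calculation = 'vc-relax'
/
&IONS
   ion_dynamics = 'damp'
/
&CELL
   cell_dynamics = 'damp-w'
/
\end{verbatim}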
\subsection{PostProc}
\paragraph{Some postprocessing codes complain that they do not find some files}
For Linux PC clusters in parallel execution: in at least some versions
of MPICH, the current directory is set to the directory where the executable
code resides, instead of being set to the directory where the code is executed.
This MPICH weirdness may cause unexpected failures in some postprocessing
codes that expect a data file in the current directory. Workaround: use
symbolic links, or copy the executable to the current directory.
\paragraph{{\em error in davcio} in postprocessing codes}
Most likely you are not reading the correct data files, or you are not
following the correct procedure for postprocessing. In parallel execution:
if you did not set \texttt{wf\_collect=.true.}, the number of processors and
pools for the post-processing run should be the same as for the
self-consistent run; all files must be visible to all processors.
\subsection{ph.x errors}
\paragraph{ph.x stops with {\em error reading file}}
The data file produced by \pw.x
is bad or incomplete or produced by an incompatible version of the code.
In parallel execution: if you did not set \texttt{wf\_collect=.true.}, the number
of processors and pools for the phonon run should be the same as for the
self-consistent run; all files must be visible to all processors.
\paragraph{ph.x mumbles something like {\em cannot recover} or {\em error
reading recover file}}
You have a bad restart file from a preceding failed execution.
Remove all files \texttt{recover*} in \texttt{outdir}.
\paragraph{ph.x says {\em occupation numbers probably wrong} and
continues} You have a
metallic or spin-polarized system but occupations are not set to
\texttt{'smearing'}.
\paragraph{ph.x does not yield acoustic modes with zero frequency at $q=0$}
This may not be an error: the Acoustic Sum Rule (ASR) is never exactly
verified, because the system is never exactly translationally
invariant as it should be. The calculated frequency of the acoustic
mode is typically less than 10 cm$^{-1}$, but in some cases it may be
much higher, up to 100 cm$^{-1}$. The ultimate test is to diagonalize
the dynamical matrix with program \texttt{dynmat.x}, imposing the ASR. If you
obtain an acoustic mode with a much smaller $\omega$ (let us say
$< 1 \mbox{cm}^{-1}$ )
with all other modes virtually unchanged, you can trust your results.
The residual violation originates in the XC term:
``The problem is [...] in the fact that the XC
energy is computed in real space on a discrete grid and hence the
total energy is invariant (...) only for translation in the FFT
grid. Increasing the charge density cutoff increases the grid density,
thus making the integral more exact, thus reducing the problem,
unfortunately rather slowly... This problem is usually more severe for
GGA than with LDA because the GGA functionals have functional forms
that vary more strongly with the position; particularly so for
isolated molecules or systems with significant portions of `vacuum'
because in the exponential tail of the charge density a) the finite
cutoff (hence there is an effect due to cutoff) induces oscillations
in rho and b) the reduced gradient is diverging.'' (info by Stefano de
Gironcoli, June 2008)
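As mentioned above, the ASR can be imposed a posteriori with
\texttt{dynmat.x}. A minimal input sketch (the dynamical matrix file name is a
placeholder; see the \texttt{dynmat.x} documentation for the available
\texttt{asr} options, e.g. \texttt{'crystal'}, or \texttt{'zero-dim'} for
molecules):
\begin{verbatim}
&INPUT
   fildyn = 'mydyn'
   asr    = 'simple'
/
\end{verbatim}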
\paragraph{ph.x yields really lousy phonons, with bad or negative
frequencies or wrong symmetries or gross ASR violations}
Possible reasons:
\begin{itemize}
\item if this happens only for acoustic modes at $q=0$ that should
have $\omega=0$: Acoustic Sum Rule violation, see the item before
this one.
\item wrong data file read.
\item wrong atomic masses given in input will yield wrong frequencies
(but the content of file fildyn should be valid, since the force
constants, not the dynamical matrix, are written to file).
\item convergence threshold for either SCF (\texttt{conv\_thr}) or phonon
calculation (\texttt{tr2\_ph}) too large: try to reduce them.
\item maybe your system does have negative or strange phonon
frequencies, with the approximations you used. A negative frequency
signals a mechanical instability of the chosen structure. Check that
the structure is reasonable, and check the following parameters:
\begin{itemize}
\item The cutoff for wavefunctions, \texttt{ecutwfc}
\item For USPP: the cutoff for the charge density, \texttt{ecutrho}
\item The k-point grid, especially for metallic systems.
\end{itemize}
\end{itemize}
Note that "negative" frequencies are actually imaginary: the negative
sign flags eigenvalues of the dynamical matrix for which $\omega^2 <
0$.
\paragraph{{\em Wrong degeneracy} error in star\_q}
Verify the q-vector for which you are calculating phonons. In order to
check whether a symmetry operation belongs to the small group of $q$,
the code compares $q$ and the rotated $q$, with an acceptance tolerance of
$10^{-5}$ (set in routine \texttt{PW/eqvect.f90}). You may run into trouble if
your q-vector differs from a high-symmetry point by an amount of that
order of magnitude.
\section{Frequently Asked Questions (FAQ)}
\subsection{General}
If you search for information on \qe, the best starting point is the web site
\texttt{http://www.quantum-espresso.org}. See in particular the
links ``learn'' for documentation and ``contacts'' if you need
somebody to talk with. The mailing list \texttt{pw\_forum} is
the typical place to ask questions about \qe.
%More FAQS:
% how/where to submit problems
% whom to contact for which problem (download, web, wiki, qeforge,
% mailing list, bug, help ...)
% how to contact maintainers
% how to submit a bug report
% which hardware for QE
% How to find E(V) for a noncubic crystal
\subsection{Installation}
Most installation problems have obvious origins and can be solved by reading
error messages and acting accordingly. Sometimes the reason for a failure
is less obvious. In such a case, you should look into
Sec.\ref{Sec:Installation}, and into the \texttt{pw\_forum} archive to
see if a similar problem (with solution) is described. If you get
really weird error messages during installation, look for them with
your preferred Internet search engine (such as Google): very often you
will find an explanation and a workaround.
\paragraph{What Fortran compiler do I need to compile \qe?}
Any non-buggy, or not-too-buggy, fortran-95 compiler should work,
with minimal or no changes to the code. \configure\ may not
be able to recognize your system, though.
\paragraph{Why is \configure\ saying that I have no fortran compiler?}
Because you haven't one (really!); or maybe you have one, but it is not
in your execution path; or maybe it has been given an unusual name by your
system manager. Install a compiler if you have none; if you have one, fix
your execution path, or define an alias if it has a strange name.
Do not pass an executable with the path as an argument to \configure,
as in e.g. \texttt{./configure F90=/some/strange/f95}: it doesn't work.
\paragraph{Why is \configure\ saying that my fortran compiler doesn't work?}
Because it doesn't work (really!); more exactly, \configure\ has tried
to compile a small test program and didn't succeed. Your compiler may not be
properly installed. For Intel compiler on PC's: you may have forgotten to run
the required initialization script for the compiler. See also above.
\paragraph{\configure\ doesn't recognize my system, what should I do?}
If compilation/linking works, never mind. Otherwise, try to supply a suitable
supported architecture, or/and manually edit the \texttt{make.sys} file.
Detailed instructions in Sec.\ref{Sec:Installation}.
\paragraph{Why doesn't \configure\ recognize that I have a parallel machine?}
You need a properly configured complete parallel environment. If any piece
is missing, \configure\ will revert to serial compilation.
Detailed instructions in Sec.\ref{Sec:Installation}.
\paragraph{Compilation fails with {\em internal error}, what should I do?}
Any message during compilation saying something like {\em internal compiler
error}
and the like means that your compiler is buggy. You should report the problem
to the compiler maker -- especially if you paid real money for it.
Sometimes reducing the optimization level, or rearranging the code in a
strategic place, will make the problem disappear. In other cases you
will need to move to a different compiler, or to a less buggy version
(or buggy in a different way that doesn't bug you) of the same compiler.
\paragraph{Compilation fails at linking stage: {\em symbol ... not found}}
If the missing symbols (i.e. routines that are called but not found)
are in the code itself: most likely the fortran-to-C conventions used
in file \texttt{include/c\_defs.h} are not appropriate. Edit this file
and retry.
If the missing symbols are in external libraries (BLAS, LAPACK, FFT,
MPI libraries):
there is a name mismatch between what the compiler expects and what the
library provides. See Sec.\ref{Sec:Installation}).
If the missing symbols aren't found anywhere either in the code or in the
libraries: they are system library symbols. i) If they are called by external
libraries, you need to add a missing system library, or to use a different
set of external libraries, compiled with the same compiler you are using.
ii) If you are using no external libraries and still getting missing symbols,
your compiler and compiler libraries are not correctly installed.
\subsection{Pseudopotentials}
\paragraph{Can I mix USPP/NCPP/PAW ?}
Yes, you can (if implemented, of course: a few kinds of calculations
are not available with USPP, a few more are not for PAW). A small
restriction exists in \texttt{cp.x}, which expects atoms with USPP listed before
those with NCPP, which in turn are expected before local PP's (if any).
A further restriction, which can be overridden,
is that all PP's should be generated with the same XC.
Otherwise, you can mix and match. Note that
it is the hardest atom that determines the cutoff.
\paragraph{Where can I find pseudopotentials for atom X?}
First, a general rule: when you ask for a pseudopotential, you should
always specify which kind of PP you need (NCPP, USPP,
PAW, full- or scalar-relativistic, for which XC functional,
and, for many elements, with how many electrons in valence).
If you do not find anything suitable in the ``pseudo'' page of the web
site, we have bad news for you: you have to produce it by yourself.
See \ref{SubSec:pseudo} for more.
\paragraph{Where can I find pseudopotentials for rare-earth X?}
Please consider first if DFT is suitable for your system! In many cases,
it isn't (at least ``plain'' DFT: GGA and the like). If you are still
convinced that it is, see above.
\paragraph{Is there a converter from format XYZ to UPF?}
What is available (no warranty) is in directory \texttt{upftools/}.
You are most welcome to contribute a new converter.
\subsection{Input data}
A large percentage of the problems reported to the mailing list are
caused by incorrect input data. Before reporting a problem with
strange crashes or strange results, {\em please} have
a look at your structure with XCrySDen. XCrySDen can directly
visualize the structure from \PWscf\ input data:
\begin{verbatim}
xcrysden --pwi "input-data-file"
\end{verbatim}
and from \PWscf\ output as well:
\begin{verbatim}
xcrysden --pwo "output-file"
\end{verbatim}
Unlike most other visualizers, XCrySDen is periodicity-aware: you can
easily visualize periodically repeated cells.
You are advised to always use XCrySDen to check your input data!
\paragraph{Where can I find the crystal structure/atomic positions of XYZ?}
The following site contains a lot of crystal structures:
\texttt{http://cst-www.nrl.navy.mil/lattice}.\\
"Since this seems to come up often, I'd like to point out that the
American Mineralogist Crystal Structure Database
(\texttt{http://rruff.geo.arizona.edu/AMS/amcsd})
is another excellent place to
find structures, though you will have to use it in conjunction with
the Bilbao crystallography server (\texttt{http://www.cryst.ehu.es}),
and have some understanding of space groups and Wyckoff positions".
See also:
\texttt{http://cci.lbl.gov/cctbx/index.html}.
\paragraph{How can I generate a supercell?}
If you need to create a supercell and are too lazy to create a
small program to translate atoms, you can
\begin{itemize}
\item ``use the 'spacegroup' program in EXCITING package
(http://exciting-code.org) to generate the supercell,
use 'fropho' (http://fropho.sourceforge.net) to check the symmetry''
(Kun Yin, April 2009)
\item ``use the PHON code: http://www.homepages.ucl.ac.uk/\~{}ucfbdxa/''
(Eyvaz Isaev, April 2009).
\end{itemize}
\paragraph{Where can I find the Brillouin Zone/high-symmetry
points/irreps for XYZ?}
"You might find this web site useful:
\texttt{http://www.cryst.ehu.es/cryst/get\_kvec.html}" (info by Cyrille
Barreteau, nov. 2007). Or else: in textbooks, such as e.g. {\em The
mathematical theory of symmetry in solids} by Bradley and Cracknell.
\paragraph{Where can I find Monkhorst-Pack grids of k-points?}
Auxiliary code \texttt{kpoints.x}, found in \texttt{pwtools/} and
produced by \texttt{make tools}, generates uniform grids of k-points
that are equivalent to Monkhorst-Pack grids.
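For instance, a shifted $4\times 4\times 4$ uniform grid can be requested
directly in the \pw.x\ input as follows (mesh size and shift are of course
system-dependent):
\begin{verbatim}
K_POINTS automatic
  4 4 4  1 1 1
\end{verbatim}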
\subsection{Parallel execution}
Effective usage of parallelism requires some basic knowledge on how
parallel machines work and how parallelism is implemented in
\qe. If you have no experience and no clear ideas (or no
idea at all), consider reading Sec.\ref{Sec:para}.
\paragraph{How do I choose the number of processors/how do I setup my parallel calculation?}
Please see above.
\paragraph{Why is my parallel job running in such a lousy way?}
A frequent reason for lousy parallel performance is a
conflict between MPI parallelization (implemented in \qe)
and the autoparallelizing feature of MKL libraries. Set the
environment variable \texttt{OMP\_NUM\_THREADS} to 1.
See Sec.\ref{Sec:para} for more info.
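With a bash-like shell, this means setting, before launching the job:
\begin{verbatim}
export OMP_NUM_THREADS=1
\end{verbatim}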
\paragraph{Why is my parallel job crashing when reading input data / doing nothing?}
If the same data work in serial execution, use
\texttt{code -inp input\_file} instead of \texttt{code $<$ input\_file}.
Some MPI libraries do not properly handle input redirection.
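For example (the \texttt{mpirun} launcher, file names, and number of
processors are just placeholders for whatever your system uses):
\begin{verbatim}
mpirun -np 8 pw.x < scf.in > scf.out        # may fail with some MPI libraries
mpirun -np 8 pw.x -inp scf.in > scf.out     # more robust
\end{verbatim}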
\paragraph{The code stops with an {\em error reading namelist xxxx}}
Most likely there is a misspelled variable in namelist xxxx.
If there isn't any (have you looked carefully? really?? REALLY???),
beware control characters like DOS control-M: they can confuse
the namelist-reading code. If this happens to the first namelist
to be read (usually "\&CONTROL") in parallel execution, see above.
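If you suspect stray DOS line terminators (control-M characters), you can
strip them with standard Unix tools, e.g. (file names are placeholders):
\begin{verbatim}
tr -d '\r' < input_with_CRLF.in > input.in
\end{verbatim}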
\paragraph{Why is my parallel job crashing with mysterious errors?}
Mysterious, unpredictable, erratic errors in parallel execution
almost always come from bugs in the compiler and/or in the MPI
libraries, and sometimes even from flaky hardware. Sorry, not our fault.
\subsection{Frequent errors during execution}
\paragraph{Why is the code saying {\em Wrong atomic coordinates}?}
Because they are: two or more atoms in the list of atoms have
overlapping, or in any case too close, positions. Can't you see why? Look more
carefully (or use XCrySDen: see above), and remember that the code checks
periodic images as well.
\paragraph{The code stops with an {\em error in davcio}}
Possible reasons: disk is full; \texttt{outdir} is not writable for
any reason; you changed some parameter(s) in the input (like
\texttt{wf\_collect}, or the number of processors/pools) without
doing a bit of cleanup in your temporary files; you were running
more than one instance of \texttt{pw.x} in the same temporary
directory with the same file names.
\paragraph{The code stops with a {\em wrong charge} error}
In most cases: you are treating a metallic system
as if it were insulating.
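For a metallic system, occupations must be smeared. A minimal sketch of the
relevant \&SYSTEM entries (the smearing type and width below are illustrative,
not recommendations):
\begin{verbatim}
&SYSTEM
   occupations = 'smearing'
   smearing    = 'marzari-vanderbilt'
   degauss     = 0.02
/
\end{verbatim}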
\paragraph{The code stops with a mysterious error in IOTK}
IOTK is a toolkit that reads/writes XML files. There are frequent
reports of mysterious errors with IOTK not finding some variable
in the XML data file. If this error has no obvious explanation
(e.g. the file is properly written and read, the searched variable
is present, etc) and if it appears to be erratic or irreproducible
(e.g. it occurs only with version X of compiler Y), it is almost
certainly due to a compiler bug. Try to reduce optimization level,
or use a different compiler. If you paid real money for your
compiler, complain to the vendor.
\subsection{Self Consistency}
\paragraph{What are the units for quantity XYZ?}
Unless otherwise specified, all \PWscf\ input and output
quantities are in atomic ``Rydberg'' units, i.e. energies in Ry, lengths
in Bohr radii, etc. Note that \CP\ uses instead atomic ``Hartree''
units: energies in Ha, lengths in Bohr radii.
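For reference: 1 Ry $= 0.5$ Ha $\simeq 13.6057$ eV, and 1 Bohr radius
$\simeq 0.529177$ \AA.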
\paragraph{Self-consistency is slow or does not converge at all}
In most cases: your input data is bad, or else your system is metallic
and you are treating it as an insulator. If this is not the case:
reduce \texttt{mixing\_beta} to $\sim 0.3\div 0.1$ or smaller, or
try the \texttt{mixing\_mode} value that is most
appropriate for your problem.
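A sketch of the \&ELECTRONS entries one typically adjusts (values are
illustrative; \texttt{'local-TF'} is often helpful for inhomogeneous systems
such as slabs):
\begin{verbatim}
&ELECTRONS
   mixing_beta = 0.2
   mixing_mode = 'local-TF'
/
\end{verbatim}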
\paragraph{What is the difference between total and absolute magnetization?}
The total magnetization is the integral of the magnetization
in the cell:
$$
M_T = \int (n_{up}-n_{down}) d^3r.
$$
The absolute magnetization is the integral of the absolute value of
the magnetization in the cell:
$$
M_A= \int |n_{up}-n_{down}| d^3r.
$$
In a simple ferromagnetic material they should be equal (except
possibly for an overall sign). In simple antiferromagnets (like FeO,
NiO) $M_T$ is zero and $M_A$ is twice the magnetization of each of the
two atoms. (info by Stefano de Gironcoli)
\paragraph{How can I calculate magnetic moments for each atom?}
There is no ``right'' way of defining the local magnetic moment
around an atom in a multi-atom system. However, an approximate way to define
it is via the projected density of states on the atomic orbitals (code
\texttt{projwfc.x}, see example08 for its use as a postprocessing tool). This
code generates many files with the density of states projected on each
atomic wavefunction of each atom, plus a large amount of data on the
standard output, the last few lines of which contain the decomposition
of Lowdin charges into angular momentum and spin components for each atom.
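A minimal \texttt{projwfc.x} input sketch (the namelist name and variables
may differ slightly between versions -- check the \texttt{projwfc.x} input
documentation; \texttt{prefix} and \texttt{outdir} must match those of the
preceding \pw.x\ run):
\begin{verbatim}
&PROJWFC
   prefix = 'mysystem'
   outdir = './tmp'
/
\end{verbatim}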
\paragraph{What is the order of $Y_{lm}$ components in projected
DOS / projection of atomic wavefunctions?}
See input data documentation for \texttt{projwfc.x}.
\paragraph{Why is the sum of partial Lowdin charges not equal to
the total charge?}
"Lowdin charges (as well as other conventional atomic charges) do not
satisfy any sum rule. You can easily convince yourself that ths is the
case because the atomic orbitals that are used to calculate them are
arbitrary to some extent. If yu like, you can think that the missing
charge is "delocalized" or "bonding" charge, but this would be another
way of naming the conventional (to some extent) character of Lowdin
charge." (Stefano Baroni, Sept. 2008).
See also the definition of "spilling parameter": Sanchez-Portal et
al., Sol. State Commun. 95, 685 (1995). The spilling parameter
measures the ability of the basis provided by the pseudo-atomic wfc to
represent the PW eigenstates, by measuring how much of the subspace of
the Hamiltonian eigenstates falls outside the subspace spanned by the
atomic basis.
\paragraph{I cannot find the Fermi energy, where is it?}
It is printed in the output. If not, the information on Gaussian smearing,
needed to calculate a sensible Fermi energy, was not provided in input.
In this case, \pw.x prints instead the highest occupied and lowest
unoccupied levels. If these are not printed either, the number of bands to be
calculated was not provided in input and \pw.x calculates occupied bands only.
\paragraph{What is the reference level for Kohn-Sham energies?
Why do I get positive values for Kohn-Sham levels?}
The reference level is an ill-defined quantity in calculations
in solids with periodic boundary conditions. Absolute values of
Kohn-Sham eigenvalues are meaningless.
\paragraph{Why do I get a strange value of the Fermi energy?}
"The value of the Fermi energy (as well as of any energy, for that
matter) depends of the reference level. What you are referring to is
probably the "Fermi energy referred to the vacuum level" (i.e.
the work function). In order to obtain that, you need to know what the
vacuum level is, which cannot be said from a bulk calculation only"
(Stefano Baroni, Sept. 2008).
\paragraph{Why don't I get zero pressure/stress at equilibrium?}
It depends. If you make a calculation with fixed cell parameters, you
will never get exactly zero pressure/stress, unless you use the cell
that yields perfect equilibrium for your pseudopotentials, cutoffs,
k-points, etc. Such a cell will in any case be slightly different from the
experimental one. Note however that pressures/stresses of the order of
a few kbar correspond to very small differences in lattice parameters.
If you obtain the equilibrium cell from a variable-cell optimization,
do not forget that the pressure/stress calculated with the modified
kinetic energy functional (very useful for variable-cell calculations)
slightly differ from those calculated without it. Also note that the
PW basis set used during variable-cell calculations is
determined by the given cutoff and the {\em initial} cell. If you
make a calculation with the final geometry at the same cutoff,
you may get slightly different results. The difference should
be small, though, unless you are using a too low cutoff for your
system.
\paragraph{Why do I get {\em negative starting charge}?}
Self-consistency requires an initial guess for the charge density in
order to bootstrap the iterative algorithm. This first guess is
usually built from a superposition of atomic charges, constructed from
pseudopotential data.
More often than not, these charges are slightly too hard to be
expanded very accurately in PWs, hence some aliasing error
will be introduced. Especially if the unit cell is big and mostly
empty, some small negative charge density will appear locally.
``This is NOT harmful at all, the negative charge density is handled
properly by the code and will disappear during the self-consistent
cycles'', but if it is very high (let's say more than 0.001 times the number
of electrons) it may be a symptom that your charge density cutoff is too
low. (L. Paulatto - November 2008)
\paragraph{How do I calculate the work function?}
Work function = (average potential in the vacuum) - (Fermi
Energy). The former is estimated in a supercell with the slab
geometry, by looking at the average of the electrostatic potential
(typically without the XC part). See the example in
examples/WorkFct\_example.
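In practice, after a \pw.x\ run on the slab supercell, the electrostatic
potential can be extracted with \texttt{pp.x} and then planar-averaged
(e.g. with \texttt{average.x}). A minimal \texttt{pp.x} input sketch, with
placeholder names (\texttt{plot\_num=11} should select the bare plus Hartree
potential; check the \texttt{pp.x} documentation of your version):
\begin{verbatim}
&INPUTPP
   prefix   = 'slab'
   outdir   = './tmp'
   filplot  = 'slab_pot'
   plot_num = 11
/
\end{verbatim}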
\subsection{ Phonons }
\paragraph{ Is there a simple way to determine the symmetry of a given
phonon mode?}
A symmetry analyzer was added in v.3.2 by Andrea Dal Corso.
Other packages that perform symmetry analysis of phonons and normal modes:\\
ISOTROPY package: http://stokes.byu.edu/iso/isotropy.html\\
ACKJ, ACMI packages: http://www.cpc.cs.qub.ac.uk.
\paragraph{I am not getting zero acoustic mode frequencies, why? }
Because the Acoustic Sum Rule (ASR), i.e. the translational invariance,
is violated in approximated calculations. In PW calculations,
the main and most irreducible violation comes from the discreteness
of the FFT grid. There may be other reasons, though, notably
insufficient convergence: "Recently I found that the parameters
\texttt{tr2\_ph} for the phonons and \texttt{conv\_thr} for the
ground state can affect the quality of the phonon calculation,
especially the "vanishing" frequencies for molecules."
(Info from Katalyn Gaal-Nagy). Anyway: if the nonzero frequencies are
small, you can impose the ASR on the dynamical matrix (e.g. with
\texttt{dynmat.x}, see above), usually with excellent results.
Nonzero frequencies for rotational modes of a molecule are a fictitious
effect of the finite supercell size, or else, of a less than perfect
convergence of the geometry of the molecule.
\paragraph{Why do I get negative phonon frequencies? }
"Negative" frequencies actually are "imaginary" frequencies
($\omega^2<0$). If these occur for acoustic frequencies at the Gamma point,
or for rotational modes of a molecule, see above.
In all other cases: it depends. It may be a problem of bad
convergence (see above), or it may signal a real instability.
\paragraph{Why do I get a message {\em no elec. field with metals}? }
If you want to calculate the contribution of macroscopic electric
fields to phonons -- a quantity that is well-defined in insulators
only -- you cannot use smearing in the scf calculation, or else the
code will complain.
\paragraph{How can I calculate Raman/IR coefficients in metals?}
You cannot: they are well defined only for insulators.
\paragraph{How can I calculate the electron-phonon coefficients
in insulators?}
You cannot: the current implementation is for metals only.
\end{document}