\chapter{Analyzing QMCPACK data}
\label{chap:analyzing}
\section{Using the qmca tool to obtain total energies and related quantities}
\label{sec:qmca}
The \texttt{qmca} tool is the primary means of analyzing scalar valued
data generated by QMCPACK. Output files that contain scalar valued data
are \texttt{*.scalar.dat} and \texttt{*.dmc.dat} (see chapter
\ref{chap:output_overview} for a detailed description of these files).
Quantities that are available for analysis in \texttt{*.scalar.dat} files
include the local energy and its variance, the kinetic energy,
the potential energy and its components,
the acceptance ratio, and the average CPU time spent per block, among
others. The \texttt{*.dmc.dat} files provide information regarding
the DMC walker population in addition to the local energy.
Basic capabilities of \texttt{qmca} include calculating mean values
and associated error bars, processing multiple files at once in batched
fashion, performing twist averaging, plotting mean values by series,
and plotting traces (per block or step) of the underlying data.
These capabilities are explained with accompanying examples in the
following subsections.
To use \texttt{qmca}, installations of Python and NumPy must be
present on the local machine. For graphical plotting, the matplotlib module
must also be available.
An overview of all supported input flags to \texttt{qmca} can be
obtained by typing ``\texttt{qmca}'' at the command line with no
other inputs (also try ``\texttt{qmca -x}'' for a short list of examples):
\begin{shade}
>qmca
no files provided, please see help info below
Usage: qmca [options] [file(s)]
Options:
  --version             show program's version number and exit
  -v, --verbose         Print detailed information (default=False).
  -q QUANTITIES, --quantities=QUANTITIES
                        Quantity or list of quantities to analyze. See names
                        and abbreviations below (default=all).
  -u UNITS, --units=UNITS
                        Desired energy units. Can be Ha (Hartree), Ry
                        (Rydberg), eV (electron volts), kJ_mol (k.
                        joule/mole), K (Kelvin), J (Joules) (default=Ha).
  -e EQUILIBRATION, --equilibration=EQUILIBRATION
                        Equilibration length in blocks (default=auto).
  -a, --average         Average over files in each series (default=False).
  -w WEIGHTS, --weights=WEIGHTS
                        List of weights for averaging (default=None).
  -b, --reblock         (pending) Use reblocking to calculate statistics
                        (default=False).
  -p, --plot            Plot quantities vs. series (default=False).
  -t, --trace           Plot a trace of quantities (default=False).
  -h, --histogram       (pending) Plot a histogram of quantities
                        (default=False).
  -o, --overlay         Overlay plots (default=False).
  --legend=LEGEND       Placement of legend. None for no legend, outside for
                        outside legend (default=upper right).
  --noautocorr          Do not calculate autocorrelation. Warning: error bars
                        are no longer valid! (default=False).
  --noac                Alias for --noautocorr (default=False).
  --sac                 Show autocorrelation of sample data (default=False).
  --sv                  Show variance of sample data (default=False).
  -i, --image           (pending) Save image files (default=False).
  -r, --report          (pending) Write a report (default=False).
  -s, --show_options    Print user provided options (default=False).
  -x, --examples        Print examples and exit (default=False).
  --help                Print help information and exit (default=False).
  -d DESIRED_ERROR, --desired_error=DESIRED_ERROR
                        Show number of samples needed for desired error bar
                        (default=none).
  -n PARTICLE_NUMBER, --enlarge_system=PARTICLE_NUMBER
                        Show number of samples needed to maintain error bar on
                        larger system: desired particle number first, current
                        particle number second (default=none)
\end{shade}
\subsection{Obtaining a statistically correct mean and error bar}
\label{sec:qmca_mean_error}
A rough guess at the mean and error bar of the local energy can be
obtained in the following way with \texttt{qmca}:
\begin{shade}
>qmca -q e qmc.s000.scalar.dat
qmc series 0 LocalEnergy = -45.876150 +/- 0.017688
\end{shade}
\noindent
In this case the VMC energy of an 8 atom cell of diamond is estimated
to be $-45.876(2)$ Hartrees. This rough guess should not be used
for production-level or publication quality estimates.
To obtain production-level results, the underlying data should first
be inspected visually to ensure that all data included in the averaging
can be attributed to a distribution sharing the same mean. The first
steps of essentially any Monte Carlo calculation (the ``equilibration
phase'') do not belong to the equilibrium distribution and should be
excluded from estimates of the mean and its error bar.
We can plot a data trace (``\texttt{-t}'') of the local energy in the
following way:
\begin{shade}
>qmca -t -q e -e 0 qmc.s000.scalar.dat
\end{shade}
\noindent
The ``\texttt{-e 0}'' part indicates that we do not want any data
to be excluded from the calculation of averages initially. The resulting
plot is shown in Fig. \ref{fig:qmca_mean_error_trace}. The unphysical
equilibration period is visible on the left side of the plot.
Most of the data fluctuates around a well defined mean (consistent
variations around a flat line). This property is important to verify
by plotting the trace for each QMC run.
\begin{figure}
\begin{center}
\ifdefined\HCode
\includegraphics[trim = 0mm 0mm 0mm 0mm,clip,width=0.75\textwidth]{./figures/qmca_mean_error_trace.dmn}
\else
\includegraphics[trim = 0mm 0mm 0mm 0mm,clip,width=0.75\textwidth]{./figures/qmca_mean_error_trace.pdf}
\fi
\caption{Trace of the VMC local energy for an 8 atom cell of diamond generated with \texttt{qmca}. The x-axis (``samples'') refers to the VMC block index in this case.}
\label{fig:qmca_mean_error_trace}
\end{center}
\end{figure}
If we exclude none of the equilibration data points, we get an
erroneous estimate of $-45.870(2)$ Ha for the local energy:
\begin{shade}
>qmca -q e -e 0 qmc.s000.scalar.dat
qmc series 0 LocalEnergy = -45.870071 +/- 0.018072
\end{shade}
\noindent
The equilibration period is typically estimated by eye, though one should
check a few conservative values to ensure that the mean remains
unaffected. In this dataset, equilibrium appears to have been
reached after 100 samples or so. After excluding the first 100
VMC blocks from the analysis we get:
\begin{shade}
>qmca -q e -e 100 qmc.s000.scalar.dat
qmc series 0 LocalEnergy = -45.877363 +/- 0.017432
\end{shade}
\noindent
This estimate ($-45.877(2)$ Ha) differs significantly from the
$-45.870(2)$ Ha figure obtained from the full set of data, but it
agrees with the rough estimate of $-45.876(2)$ Hartrees obtained
with the abbreviated command (``\texttt{qmca -q e qmc.s000.scalar.dat}'').
This is because \texttt{qmca} makes a heuristic guess at the
equilibration period and got it reasonably correct in this case.
There are many cases where the heuristic guess fails and it should not
be relied on for quality results.
We have so far obtained a statistically correct mean. To obtain
a statistically correct error bar it is best to include $\sim$100 or more
statistically independent samples. An estimate of the number
of independent samples can be obtained by considering the
autocorrelation time, which is essentially a measure of the number of
samples that must be traversed before an uncorrelated/independent sample
is reached. We can get an estimate of the autocorrelation time
in the following way:
\begin{shade}
>qmca -q e -e 100 qmc.s000.scalar.dat --sac
qmc series 0 LocalEnergy = -45.877363 +/- 0.017432 4.8
\end{shade}
\noindent
The flag ``\texttt{--sac}'' stands for (s)how (a)uto(c)orrelation.
In this case the autocorrelation estimate is $4.8\approx 5$ samples.
Since the total run contained 800 samples and we have excluded 100 of
them, we can estimate the number of independent samples as
$(800-100)/5=140$. In this case, the error bar is expected to be
estimated reasonably well.
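The estimate above can be summarized as a simple rule of thumb. The
symbols $N$ (total samples), $N_{\mathrm{eq}}$ (excluded equilibration
samples), and $\kappa$ (autocorrelation time in samples) are notation
introduced only for this illustration:
\[
N_{\mathrm{ind}} \approx \frac{N - N_{\mathrm{eq}}}{\kappa}
 = \frac{800 - 100}{4.8} \approx 146 ,
\]
which is consistent with the value of $140$ obtained after rounding
$\kappa$ up to $5$. Because $\kappa$ is itself a statistical estimate,
$N_{\mathrm{ind}}$ is best read as a rough guide rather than an exact count.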
Please keep in mind that the error bar represents the expected range
of the mean with a certainty of only $\sim 70\%$, i.e. it is a one
sigma error bar. The actual mean value will lie outside the range
indicated by the error bar in one out of every three runs and in a set
of 20 runs one value can be expected to deviate from its estimate by
twice the error bar.
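For reference, these statements follow from the standard coverage
probabilities of a normal distribution. As a sketch, assume the estimated
mean $\bar{E}$ is normally distributed about the true mean $\mu$ with
standard deviation equal to the reported error bar $\sigma$ (an
idealization that requires enough independent samples; the symbols are
introduced here for illustration only):
\[
P\left(|\bar{E}-\mu| < \sigma\right) \approx 68.3\% , \qquad
P\left(|\bar{E}-\mu| < 2\sigma\right) \approx 95.4\% , \qquad
P\left(|\bar{E}-\mu| < 3\sigma\right) \approx 99.7\% .
\]
The two sigma figure leaves roughly $4.6\%$, or about one run in twenty,
outside twice the error bar.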
\subsection{Judging wavefunction optimization}
\label{sec:qmca_judge_opt}
Wavefunction optimization is a highly non-linear and sometimes
sensitive process. As such, there is a risk that systematic
errors encountered at this stage of the QMC process can be propagated
into subsequent (expensive) DMC runs unless they are guarded against
with vigilance.
In this section we again consider an 8 atom cell of diamond, but
now in the context of Jastrow optimization (one- and two-body terms).
In optimization runs it is often preferable to use a large number
of \texttt{warmupsteps} ($\sim 100$) so that equilibration bias does
not propagate into the optimization process. We can check that
the added warmup has had its intended effect by again checking the
local energy trace:
\begin{shade}
>qmca -t -q e *scalar*
\end{shade}
\noindent
The resulting plot can be found in Fig. \ref{fig:qmca_judge_opt}.
In this case sufficient \texttt{warmupsteps} were used to exit
the equilibration period before samples were collected and we can
proceed without using the ``\texttt{-e}'' option with \texttt{qmca}.
\begin{figure}
\begin{center}
\ifdefined\HCode
\includegraphics[trim = 0mm 0mm 0mm 0mm, clip,width=0.9\textwidth]{./figures/qmca_judge_opt.dmn}
\else
\includegraphics[trim = 0mm 0mm 0mm 0mm, clip,width=0.9\columnwidth]{./figures/qmca_judge_opt.pdf}
\fi
\end{center}
\caption{Trace of the local energy during one- and two-body Jastrow optimization for an 8 atom cell of diamond generated with \texttt{qmca}. Data for each optimization cycle (QMCPACK series) is separated by a vertical black line.
}
\label{fig:qmca_judge_opt}
\end{figure}
After inspecting the trace, we should inspect the text output
from \texttt{qmca}, now including the total energy and its variance:
\begin{shade}
>qmca -q ev opt*scalar.dat
LocalEnergy Variance ratio
opt series 0 -44.823616 +/- 0.007430 7.054219 +/- 0.041998 0.1574
opt series 1 -45.877643 +/- 0.003329 1.095362 +/- 0.041154 0.0239
opt series 2 -45.883191 +/- 0.004149 1.077942 +/- 0.021555 0.0235
opt series 3 -45.877524 +/- 0.003094 1.074047 +/- 0.010491 0.0234
opt series 4 -45.886062 +/- 0.003750 1.061707 +/- 0.014459 0.0231
opt series 5 -45.877668 +/- 0.003475 1.091585 +/- 0.021637 0.0238
opt series 6 -45.877109 +/- 0.003586 1.069205 +/- 0.009387 0.0233
opt series 7 -45.882563 +/- 0.004324 1.058771 +/- 0.008651 0.0231
\end{shade}
\noindent
The flags ``\texttt{-q ev}'' requested the energy (\texttt{e}) and
the variance (\texttt{v}). For this combination of quantities, a
third column (``\texttt{ratio}'') is printed containing the ratio
of the variance and the absolute value of the local energy.
The variance/energy ratio is an intensive quantity and is useful
to inspect regardless of the system under study. Successful
optimization of molecules and solids of any size generally results
in comparable values for the variance/energy ratio.
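As a quick check of how the ``\texttt{ratio}'' column is formed, the
values printed above for series 0 and series 1 can be reproduced by hand
(writing $\sigma^2$ for the variance and $E$ for the local energy, a
notation assumed here for illustration):
\[
\frac{\sigma^2}{|E|}:\qquad
\frac{7.054219}{44.823616} \approx 0.1574 , \qquad
\frac{1.095362}{45.877643} \approx 0.0239 .
\]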
The first line of
the output (``\texttt{series 0}'') corresponds to the local energy
and variance of the system without a Jastrow factor (all Jastrow
coefficients were initialized to zero in this case), reflecting the
quality of the orbitals alone. For pseudopotential systems, a
variance/energy ratio $>0.20$ Ha generally indicates there is a problem
with the input orbitals that needs to be resolved prior to
performing wavefunction optimization.
The subsequent lines correspond to energies and variances of
intermediate parameterizations of the trial wavefunction during
the optimization process. The output line containing
``\texttt{opt series 1}'', for example, corresponds to the trial
wavefunction parameterized during the ``\texttt{series 0}'' step
(the parameters of this wavefunction would be found in an output
file matching \texttt{*s000*opt.xml}). The first thing to check
about the resulting optimization is again the variance/energy ratio.
For pseudopotential systems, a variance/energy ratio $<0.03$ Ha is
consistent with a trial wavefunction of production quality, and values
of $0.01$ Ha are rarely obtainable for standard Slater-Jastrow
wavefunctions. By this metric, all parameterizations obtained for
optimizations performed in series 0-6 are of comparable quality
(note that the quality of the wavefunction obtained during optimization
series 7 is effectively unknown).
A good way to further discriminate among the parameterizations is to
plot the energy and variance as a function of series with \texttt{qmca}:
\begin{shade}
>qmca -p -q ev opt*scalar.dat
\end{shade}
\noindent
The ``\texttt{-p}'' option results in plots of means plus error bars
vs. series for all requested quantities.
The resulting plots for the local energy and variance are shown
in Fig. \ref{fig:qmca_opt_ev}. In this case the resulting energies
and variances are statistically indistinguishable for all optimization
cycles.
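One way to make ``statistically indistinguishable'' concrete is to compare
an energy difference with its combined error bar. The following is only a
sketch: it assumes that the series are statistically independent and that
their errors are Gaussian. Taking the largest energy spread among series
1--7 in the text output above (series 4 versus series 6),
\[
\Delta E = -45.886062 - (-45.877109) = -0.008953~\mathrm{Ha} , \qquad
\sigma_{\Delta E} = \sqrt{0.003750^2 + 0.003586^2} \approx 0.0052~\mathrm{Ha} ,
\]
so that $|\Delta E|/\sigma_{\Delta E} \approx 1.7$. A separation below two
combined error bars is not usually regarded as significant.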
\begin{figure}
\centering
\ifdefined\HCode%
\begin{tabularx}{1024pt}{X X}
\includegraphics[trim=0mm 0mm 4mm 0mm,clip,width=512pt]{./figures/qmca_opt_energy.dmn}&
\includegraphics[trim=2mm 0mm 4mm 0mm,clip,width=512pt]{./figures/qmca_opt_variance.png}\\
\end{tabularx}
\else%
\begin{tabularx}{\textwidth}{X X}
\includegraphics[trim=0mm 0mm 4mm 0mm,clip,width=0.47\textwidth]{./figures/qmca_opt_energy.pdf}&
\includegraphics[trim=2mm 0mm 4mm 0mm,clip,width=0.47\textwidth]{./figures/qmca_opt_variance.png}\\
\end{tabularx}
\fi%
\caption{Energy and variance vs. optimization series for an 8 atom cell of diamond as plotted by \texttt{qmca}.}%
\label{fig:qmca_opt_ev}%
\end{figure}
A good way to choose the optimal wavefunction for use in DMC is to select
the one with lowest statistically significant energy within the set of
optimized wavefunctions with reasonable variance (\emph{e.g.} among
those with variance/energy ratio $<0.03$ Ha). For pseudopotential
calculations, minimizing according to the total energy is recommended
to reduce locality errors in DMC.
\subsection{Judging diffusion Monte Carlo runs}
\label{sec:qmca_judge_dmc}
Judging the quality of the DMC projection process requires more
care than is needed in VMC. In order to reduce bias, a small
timestep is required in the approximate projector but this also
leads to slow equilibration and long autocorrelation times.
Systematic errors in the projection process can also arise from
statistical fluctuations due to pseudopotentials or from trial
wavefunctions with larger than necessary variance.
\begin{figure}
\begin{center}
\ifdefined\HCode
\includegraphics[trim = 0mm 0mm 0mm 0mm,clip,width=0.75\columnwidth]{./figures/qmca_short_dmc.dmn}
\else
\includegraphics[trim = 0mm 0mm 0mm 0mm,clip,width=0.75\columnwidth]{./figures/qmca_short_dmc.pdf}
\fi
\end{center}
\caption{Trace of the local energy for VMC followed by DMC with a small timestep ($0.002$ Ha$^{-1}$) for an 8 atom cell of diamond generated with \texttt{qmca}.}
\label{fig:qmca_short_dmc}
\end{figure}
To illustrate the problems that can arise with respect to slow
equilibration and long autocorrelation times, we consider the
8 atom diamond system with VMC ($200$ blocks of $160$ steps) followed
by DMC ($400$ blocks of $5$ steps) with a small timestep ($0.002$ Ha$^{-1}$).
A good first step in assessing the quality of any DMC run is
to plot the trace of the local energy:
\begin{shade}
>qmca -t -q e -e 0 *scalar*
\end{shade}
\noindent
The resulting trace plot is shown in Fig. \ref{fig:qmca_short_dmc}.
As always, the DMC local energy decreases exponentially away from
the VMC value but in this case it takes a long time to do so.
At least half of the DMC run is inefficiently consumed by equilibration.
If we are not careful to inspect and remove the transient, the estimated
DMC energy will be strongly biased by the transient as shown by the
horizontal red line (estimated mean) in the figure. The autocorrelation
time is also large ($\sim 12$ blocks):
\begin{shade}
>qmca -q e -e 200 --sac *s001.scalar*
qmc series 1 LocalEnergy = -46.045720 +/- 0.004813 11.6
\end{shade}
\noindent
Of the included 200 blocks, fewer than 20 contribute to the estimated error
bar, indicating that we cannot trust the reported error bar.
This can also be demonstrated directly from the data. If we halve the number
of samples included to 100, we would expect from Gaussian statistics
that the error bar would grow by a factor of $\sqrt{2}$, but instead we
get
\begin{shade}
>qmca -q e -e 300 *s001.scalar*
qmc series 1 LocalEnergy = -46.048537 +/- 0.009280
\end{shade}
\noindent
which erroneously shows an estimated increase in the error bar by a factor
of about two. Overall this run is simply too short to gain meaningful
information.
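The arithmetic behind this comparison can be spelled out as follows. For
uncorrelated Gaussian data the error bar scales as $1/\sqrt{N}$, so
halving the included data from 200 to 100 blocks would be expected to give
\[
\sigma_{100} \approx \sigma_{200}\sqrt{\frac{200}{100}}
 = 0.004813 \times \sqrt{2} \approx 0.0068~\mathrm{Ha} ,
\]
whereas the reported value is $0.009280$ Ha, a growth factor of
$0.009280/0.004813 \approx 1.93$. Likewise, the number of independent
samples in the 200 included blocks is only about $200/11.6 \approx 17$.
The subscripted symbols are introduced here purely for illustration.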
Consider the case where we are interested in the cohesive energy of
diamond and, after having performed a timestep study of the cohesive
energy, we have found that the energy difference between bulk diamond
and atomic carbon converges to our required accuracy with a larger
timestep of $0.01$ Ha$^{-1}$. In a production setting, a small cell
could be used to determine the appropriate timestep while a larger
cell would subsequently be used to obtain a converged cohesive energy,
though for purposes of demonstration we still proceed with the 8 atom
cell here. The new timestep of $0.01$ Ha$^{-1}$ will result in a shorter
autocorrelation time than the smaller timestep used previously, but
we would like to shorten the equilibration time further still. This
can be achieved by using a larger timestep (say $0.02$ Ha$^{-1}$) in a
short intermediate DMC run used to walk down the transient. The
rapidly achieved equilibrium with the $0.02$ Ha$^{-1}$ timestep
projector will be much nearer to the $0.01$ Ha$^{-1}$ timestep one
we seek than the original VMC equilibrium, and so we can expect
a shortened secondary equilibration time in the production
$0.01$ Ha$^{-1}$ timestep run. Note that this procedure is fully
general, even if one has to deal with an even shorter
timestep--\emph{e.g.} $0.002$ Ha$^{-1}$--for a particular problem.
\begin{figure}
\begin{center}
\ifdefined\HCode
\includegraphics[trim = 0mm 0mm 0mm 0mm, clip,width=0.75\columnwidth]{./figures/qmca_accel_dmc.dmn}
\else
\includegraphics[trim = 0mm 0mm 0mm 0mm, clip,width=0.75\columnwidth]{./figures/qmca_accel_dmc.pdf}
\fi
\end{center}
\caption{Trace of the local energy for VMC followed by a short intermediate DMC with a large timestep ($0.02$ Ha$^{-1}$) and finally a production DMC run with a timestep of $0.01$ Ha$^{-1}$. Calculations were performed in an 8 atom cell of diamond.}
\label{fig:qmca_accel_dmc}
\end{figure}
We now rerun the prior example but with an intermediate DMC
calculation using $40$ blocks of $5$ steps with a timestep of
$0.02$ Ha$^{-1}$ followed by a production DMC calculation
using $400$ blocks of $10$ steps with a timestep of $0.01$ Ha$^{-1}$.
We again plot the local energy trace using \texttt{qmca}
\begin{shade}
>qmca -t -q e -e 0 *scalar*
\end{shade}
\noindent
with the result shown in Fig. \ref{fig:qmca_accel_dmc}.
The projection transient has been effectively contained in the
short DMC run with a larger timestep. As expected, the
production run contains only a short equilibration period.
Removing the first 20 blocks as a precaution, we obtain an estimate
of the total energy in VMC and DMC:
\begin{shade}
>qmca -q ev -e 20 --sac qmc.*.scalar.dat
LocalEnergy Variance ratio
qmc series 0 -45.881042 +/- 0.001283 1.0 1.076726 +/- 0.007013 1.0 0.0235
qmc series 1 -46.040814 +/- 0.005046 3.9 1.011303 +/- 0.016807 1.1 0.0220
qmc series 2 -46.032960 +/- 0.002077 5.2 1.014940 +/- 0.002547 1.0 0.0220
\end{shade}
\noindent
Notice that the variance/energy ratio in DMC ($0.0220$ Ha) is similar to, but
slightly smaller than, what is obtained with VMC ($0.0235$ Ha). If the DMC
variance/energy ratio is ever significantly larger than in VMC, this is
cause to be concerned about the correctness of the DMC run. Also notice
the estimated autocorrelation time ($\sim 5$ blocks). This leaves us with
an estimated $\sim 76$ independent samples, though we should recall that
the autocorrelation time is also a statistical estimate which can be improved
with more data. We can gain a better estimate of the autocorrelation
time by using the \texttt{*.dmc.dat} files which contain output data resolved
per step rather than per block (there are $10\times$ more steps than blocks
in this example case):
\begin{shade}
>qmca -q ev -e 200 --sac qmc.s002.dmc.dat
LocalEnergy Variance ratio
qmc series 2 -46.032909 +/- 0.002068 31.2 1.015781 +/- 0.002536 1.4 0.0221
\end{shade}
\noindent
This results in an estimated autocorrelation time of $\sim 31$ steps, or
$\sim 3$ blocks, indicating that we actually have $\sim 122$ independent
samples which should be sufficient to obtain a trustworthy error bar.
Our final DMC total energy is estimated to be $-46.033(2)$ Ha.
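The sample counts quoted above can be reproduced with the unrounded
autocorrelation times (the subscripted $N_{\mathrm{ind}}$ symbols are
notation assumed for this illustration). From the per-block data,
with 400 blocks and 20 excluded,
\[
N_{\mathrm{ind}}^{\mathrm{block}} \approx \frac{400 - 20}{5.2} \approx 73 ,
\]
which becomes $76$ when the autocorrelation time is rounded to $5$ blocks.
From the per-step data, with $400 \times 10 = 4000$ steps and 200 excluded,
\[
N_{\mathrm{ind}}^{\mathrm{step}} \approx \frac{4000 - 200}{31.2} \approx 122 ,
\]
where $31.2$ steps corresponds to $31.2/10 \approx 3.1$ blocks.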
\begin{figure}
\begin{center}
\ifdefined\HCode
\includegraphics[trim = 0mm 0mm 0mm 0mm, clip,width=0.75\columnwidth]{./figures/qmca_pop_trace.dmn}
\else
\includegraphics[trim = 0mm 0mm 0mm 0mm, clip,width=0.75\columnwidth]{./figures/qmca_pop_trace.pdf}
\fi
\end{center}
\caption{Trace of the DMC walker population for an 8 atom cell of diamond obtained with \texttt{qmca}.}
\label{fig:qmca_pop_trace}
\end{figure}
Another simulation property that should be explicitly monitored
is the behavior of the DMC walker population. Data regarding the
walker population is contained in the \texttt{*.dmc.dat} files.
In Fig. \ref{fig:qmca_pop_trace} we show the trace of the DMC
walker population for the current run:
\begin{shade}
>qmca -t -q nw *dmc.dat
qmc series 1 NumOfWalkers = 2056.905405 +/- 8.775527
qmc series 2 NumOfWalkers = 2050.164160 +/- 4.954850
\end{shade}
\noindent
Following a DMC run the walker population should be checked for
two qualities: 1) that the population is sufficiently large (a number
$>2000$ is generally sufficient to reduce population control bias) and
2) that the population fluctuates benignly around its intended target
value. In this case the target walker count (provided in the input file)
was $2048$ and we can confirm from the plot that the population is simply
fluctuating around this value. Also from the text output we have a dynamic
population estimate of 2050(5) walkers. Rapid population reductions or
increases--population explosions--are indicative of problems with a run.
These issues sometimes result from using a poor quality wavefunction
(see comments regarding variance/energy ratio above and in the preceding
subsections). QMCPACK has internal guards in place that prevent
the population from exceeding certain maximum and minimum bounds, so
in particularly faulty runs one might see the population ``stabilize''
to a constant value much larger or smaller than the target. In these
cases the cause(s) for the divergent population behavior need to
be investigated and resolved before proceeding further.
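As a simple quantitative check of the second criterion for the run shown
above (a sketch that uses the series 2 text output and the target of
$2048$ walkers), the mean population deviates from its target by
\[
\frac{2050.16 - 2048}{2048} \approx 0.1\% , \qquad
\frac{2050.16 - 2048}{4.95} \approx 0.4 ,
\]
that is, by about a tenth of a percent, or less than half of the reported
error bar.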
\subsection{Obtaining other quantities}
\label{sec:qmca_other_quantities}
A number of other scalar valued quantities are available with
\texttt{qmca}. To obtain text output for all quantities
available, simply exclude the ``\texttt{-q}'' option used in
the prior examples. Below is example output for a DMC calculation
of the 8 atom diamond system from the \texttt{scalar.dat} file:
\begin{shade}
>qmca -e 20 qmc.s002.scalar.dat
qmc series 2
LocalEnergy = -46.0330 +/- 0.0021
Variance = 1.0149 +/- 0.0025
Kinetic = 33.851 +/- 0.019
LocalPotential = -79.884 +/- 0.020
ElecElec = -11.4483 +/- 0.0083
LocalECP = -22.615 +/- 0.029
NonLocalECP = 5.2815 +/- 0.0079
IonIon = -51.10 +/- 0.00
LocalEnergy_sq = 2120.05 +/- 0.19
BlockWeight = 20514.27 +/- 48.38
BlockCPU = 1.4890 +/- 0.0038
AcceptRatio = 0.9963954 +/- 0.0000055
Efficiency = 71.88 +/- 0.00
TotalTime = 565.80 +/- 0.00
TotalSamples = 7795421 +/- 0
\end{shade}
\noindent
Similarly, for the \texttt{dmc.dat} file we get
\begin{shade}
>qmca -e 20 qmc.s002.dmc.dat
qmc series 2
LocalEnergy = -46.0329 +/- 0.0020
Variance = 1.0162 +/- 0.0025
TotalSamples = 8201275 +/- 0
TrialEnergy = -46.0343 +/- 0.0023
DiffEff = 0.9939150 +/- 0.0000088
Weight = 2050.23 +/- 4.82
NumOfWalkers = 2050 +/- 5
LivingFraction = 0.996427 +/- 0.000021
AvgSentWalkers = 0.2625 +/- 0.0011
\end{shade}
Any subset of desired quantities can be obtained by using the
``\texttt{-q}'' option with either the full names of the quantities
listed above
\begin{shade}
>qmca -q 'LocalEnergy Kinetic LocalPotential' -e 20 qmc.s002.scalar.dat
qmc series 2
LocalEnergy = -46.0330 +/- 0.0021
Kinetic = 33.851 +/- 0.019
LocalPotential = -79.884 +/- 0.020
\end{shade}
\noindent
or with their corresponding abbreviations
\begin{shade}
>qmca -q ekp -e 20 qmc.s002.scalar.dat
qmc series 2
LocalEnergy = -46.0330 +/- 0.0021
Kinetic = 33.851 +/- 0.019
LocalPotential = -79.884 +/- 0.020
\end{shade}
\noindent
Abbreviations for each quantity can be found by typing \texttt{qmca}
at the command line with no other input. A current list is provided
below:
\begin{shade}
Abbreviations and full names for quantities:
ar = AcceptRatio
bc = BlockCPU
bw = BlockWeight
ce = CorrectedEnergy
de = DiffEff
e = LocalEnergy
ee = ElecElec
eff = Efficiency
ii = IonIon
k = Kinetic
kc = KEcorr
l = LocalECP
le2 = LocalEnergy_sq
mpc = MPC
n = NonLocalECP
nw = NumOfWalkers
p = LocalPotential
sw = AvgSentWalkers
te = TrialEnergy
ts = TotalSamples
tt = TotalTime
v = Variance
w = Weight
\end{shade}
\noindent
Please see the output overview for \texttt{scalar.dat}
(Sec. \ref{sec:scalardat_file}) and \texttt{dmc.dat}
(Sec. \ref{sec:dmc_file}) for more information about
these quantities. The data analysis aspects for these
quantities are essentially the same as for the local
energy as covered in the preceding subsections.
Quantities that do not belong to an equilibrium distribution
(\emph{e.g.} \texttt{BlockCPU}) are somewhat different, though they
still exhibit statistical fluctuations.
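The \texttt{scalar.dat} values listed earlier in this subsection can be
cross-checked against one another. The relations below are a sketch: they
assume that \texttt{LocalEnergy} is the sum of \texttt{Kinetic} and
\texttt{LocalPotential}, that \texttt{LocalPotential} is the sum of the
listed potential components, and that \texttt{Variance} is approximately
the mean of the squared local energy minus the square of its mean:
\[
\begin{array}{rcl}
33.851 + (-79.884) & = & -46.033 , \\
-11.4483 - 22.615 + 5.2815 - 51.10 & \approx & -79.88 , \\
2120.05 - (46.0330)^2 & \approx & 1.01 .
\end{array}
\]
Each result agrees with the corresponding reported value
(\texttt{LocalEnergy}, \texttt{LocalPotential}, and \texttt{Variance},
respectively) to within the precision of the printed numbers.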
\subsection{Processing multiple files}
\label{sec:qmca_multiple_files}
Batch file processing is a common use case for \texttt{qmca}.
If we consider an ``equation of state'' calculation involving
the 8 atom diamond cell we have used so far, we might be interested
in the total energy for the various supercell volumes along the
trajectory from compression to expansion. After checking
the traces (``\texttt{qmca -t -q e scale\_*/vmc/*scalar*}'')
to settle on a sensible equilibration cutoff as discussed in
the preceding subsections we can obtain the total energies
all at once:
\begin{shade}
>qmca -q ev -e 40 scale_*/vmc/*scalar*
LocalEnergy Variance ratio
scale_0.80/vmc/qmc series 0 -44.670984 +/- 0.006051 2.542384 +/- 0.019902 0.0569
scale_0.82/vmc/qmc series 0 -44.982818 +/- 0.005757 2.413011 +/- 0.022626 0.0536
scale_0.84/vmc/qmc series 0 -45.228257 +/- 0.005374 2.258577 +/- 0.019322 0.0499
scale_0.86/vmc/qmc series 0 -45.415842 +/- 0.005532 2.204980 +/- 0.052978 0.0486
scale_0.88/vmc/qmc series 0 -45.570215 +/- 0.004651 2.061374 +/- 0.014359 0.0452
scale_0.90/vmc/qmc series 0 -45.683684 +/- 0.005009 1.988539 +/- 0.018267 0.0435
scale_0.92/vmc/qmc series 0 -45.751359 +/- 0.004928 1.913282 +/- 0.013998 0.0418
scale_0.94/vmc/qmc series 0 -45.791622 +/- 0.005026 1.843704 +/- 0.014460 0.0403
scale_0.96/vmc/qmc series 0 -45.809256 +/- 0.005053 1.829103 +/- 0.014536 0.0399
scale_0.98/vmc/qmc series 0 -45.806235 +/- 0.004963 1.775391 +/- 0.015199 0.0388
scale_1.00/vmc/qmc series 0 -45.783481 +/- 0.005293 1.726869 +/- 0.012001 0.0377
scale_1.02/vmc/qmc series 0 -45.741655 +/- 0.005627 1.681776 +/- 0.011496 0.0368
scale_1.04/vmc/qmc series 0 -45.685101 +/- 0.005353 1.682608 +/- 0.015423 0.0368
scale_1.06/vmc/qmc series 0 -45.615164 +/- 0.005978 1.652155 +/- 0.010945 0.0362
scale_1.08/vmc/qmc series 0 -45.543037 +/- 0.005191 1.646375 +/- 0.013446 0.0361
scale_1.10/vmc/qmc series 0 -45.450976 +/- 0.004794 1.707649 +/- 0.048186 0.0376
scale_1.12/vmc/qmc series 0 -45.371851 +/- 0.005103 1.686997 +/- 0.035920 0.0372
scale_1.14/vmc/qmc series 0 -45.265490 +/- 0.005311 1.631614 +/- 0.012381 0.0360
scale_1.16/vmc/qmc series 0 -45.161961 +/- 0.004868 1.656586 +/- 0.014788 0.0367
scale_1.18/vmc/qmc series 0 -45.062579 +/- 0.005971 1.671998 +/- 0.019942 0.0371
scale_1.20/vmc/qmc series 0 -44.960477 +/- 0.004888 1.651864 +/- 0.009756 0.0367
\end{shade}
\noindent
In this case, we are using a Jastrow factor optimized only at the
equilibrium geometry (``\texttt{scale\_1.00}'') but with radial
cutoffs restricted to the Wigner-Seitz radius of the most compressed
supercell (``\texttt{scale\_0.80}'') to avoid introducing wavefunction
cusps at the cell boundary (QMCPACK would have aborted with a warning in
this case, had we tried). It is clear that this restricted Jastrow factor
is not an optimal choice as it yields variance/energy ratios between $0.036$
and $0.057$ Ha. This issue is largely a result of our undersized (8 atom)
supercell and larger cells should always be used in real production
calculations.
Batch processing is also possible for multiple quantities. If multiple
quantities are requested, an additional line is inserted to separate
results from different runs:
\begin{shade}
>qmca -q 'e bc eff' -e 40 scale_*/vmc/*scalar*
scale_0.80/vmc/qmc series 0
LocalEnergy = -44.6710 +/- 0.0061
BlockCPU = 0.02986 +/- 0.00038
Efficiency = 38104.00 +/- 0.00
scale_0.82/vmc/qmc series 0
LocalEnergy = -44.9828 +/- 0.0058
BlockCPU = 0.02826 +/- 0.00013
Efficiency = 44483.91 +/- 0.00
scale_0.84/vmc/qmc series 0
LocalEnergy = -45.2283 +/- 0.0054
BlockCPU = 0.02747 +/- 0.00030
Efficiency = 52525.12 +/- 0.00
scale_0.86/vmc/qmc series 0
LocalEnergy = -45.4158 +/- 0.0055
BlockCPU = 0.02679 +/- 0.00013
Efficiency = 50811.55 +/- 0.00
scale_0.88/vmc/qmc series 0
LocalEnergy = -45.5702 +/- 0.0047
BlockCPU = 0.02598 +/- 0.00015
Efficiency = 74148.79 +/- 0.00
scale_0.90/vmc/qmc series 0
LocalEnergy = -45.6837 +/- 0.0050
BlockCPU = 0.02527 +/- 0.00011
Efficiency = 65714.98 +/- 0.00
...
\end{shade}
\subsection{Twist averaging}
\label{sec:qmca_twist_average}
Twist averaging can be performed straightforwardly for any
output quantity listed in Sec. \ref{sec:qmca_other_quantities}
with \texttt{qmca}. We illustrate these capabilities by
repeating the 8 atom diamond DMC runs performed in Sec.
\ref{sec:qmca_judge_dmc} at eight real valued supercell twist
angles (a $2\times 2\times 2$ Monkhorst-Pack grid centered at
the $\Gamma$-point). Data traces for each twist can be overlapped
on the same plot:
\begin{shade}
>qmca -to -q e -e '30 20 30' *scalar* --legend outside
\end{shade}
\noindent
The ``\texttt{-o}'' option requests the plots be overlapped;
eight separate plots would be generated otherwise. The
equilibration input ``\texttt{-e '30 20 30'}'' cuts out from
the analyzed data the first 30 blocks for series 0 (VMC),
20 blocks for series 1 (intermediate DMC), and 30 blocks for
series 2 (production DMC). The resulting plot is shown in
Fig. \ref{fig:qmca_twist_overlap}.
\begin{figure}
\begin{center}
\ifdefined\HCode
\includegraphics[trim = 0mm 0mm 0mm 0mm, clip,width=0.9\columnwidth]{./figures/qmca_twist_trace_overlap.dmn}
\else
\includegraphics[trim = 0mm 0mm 0mm 0mm, clip,width=0.9\columnwidth]{./figures/qmca_twist_trace_overlap.pdf}
\fi
\end{center}
\caption{Overlapped energy traces from VMC to DMC for an 8 atom supercell of diamond obtained with \texttt{qmca}. Data for each twist appears in a different color.}
\label{fig:qmca_twist_overlap}
\end{figure}
Twist averaging is performed by providing the ``\texttt{-a}''
option. If provided on its own, uniform weights are applied
to each twist angle. To obtain a trace plot with twist averaging
enforced, use a command similar to the following:
\begin{shade}
>qmca -a -t -q e -e '30 20 30' *scalar*
\end{shade}
\noindent
The resulting plot is shown in Fig. \ref{fig:qmca_twist_average}.
As can be seen from the trace plot, the chosen equilibration lengths
are appropriate and we proceed to obtain the twist averaged total energy
from the \texttt{scalar.dat} files
\begin{shade}
>qmca -a -q ev -e 30 --sac *s002.scalar*
LocalEnergy Variance ratio
avg series 2 -45.873369 +/- 0.000753 5.3 1.028751 +/- 0.001056 1.3 0.0224
\end{shade}
\noindent
and also from the \texttt{dmc.dat} files:
\begin{shade}
>qmca -a -q ev -e 300 --sac *s002.dmc*
LocalEnergy Variance ratio
avg series 2 -45.873371 +/- 0.000741 30.5 1.028843 +/- 0.000972 1.6 0.0224
\end{shade}
\noindent
yielding a twist averaged total energy of $-45.8734(8)$ Ha.
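As a quick consistency check of the ``ratio'' column (a sketch, assuming the ratio is the variance divided by the magnitude of the local energy), the \texttt{scalar.dat} values above give
\begin{align*}
\frac{\textrm{Variance}}{|\textrm{LocalEnergy}|} = \frac{1.028751}{45.873369} \approx 0.0224~\textrm{Ha},
\end{align*}
which reproduces the printed value.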
\begin{figure}
\begin{center}
\ifdefined\HCode
\includegraphics[trim = 0mm 0mm 0mm 0mm, clip,width=0.75\columnwidth]{./figures/qmca_twist_average_trace.dmn}
\else
\includegraphics[trim = 0mm 0mm 0mm 0mm, clip,width=0.75\columnwidth]{./figures/qmca_twist_average_trace.pdf}
\fi
\end{center}
\caption{Twist averaged energy trace from VMC to DMC for an 8 atom supercell of diamond obtained with \texttt{qmca}.}
\label{fig:qmca_twist_average}
\end{figure}
As can be seen from Fig. \ref{fig:qmca_twist_overlap}, some of the twist
angles are degenerate. This is seen more clearly in the text output:
\begin{shade}
>qmca -q ev -e 30 *s002.scalar*
LocalEnergy Variance ratio
qmc.g000 series 2 -45.264510 +/- 0.001942 1.057065 +/- 0.002318 0.0234
qmc.g001 series 2 -46.035511 +/- 0.001806 1.015992 +/- 0.002836 0.0221
qmc.g002 series 2 -46.035410 +/- 0.001538 1.015039 +/- 0.002661 0.0220
qmc.g003 series 2 -46.047285 +/- 0.001898 1.018219 +/- 0.002588 0.0221
qmc.g004 series 2 -46.034225 +/- 0.002539 1.013420 +/- 0.002835 0.0220
qmc.g005 series 2 -46.046731 +/- 0.002963 1.018337 +/- 0.004109 0.0221
qmc.g006 series 2 -46.047133 +/- 0.001958 1.021483 +/- 0.003082 0.0222
qmc.g007 series 2 -45.476146 +/- 0.002065 1.070456 +/- 0.003133 0.0235
\end{shade}
\noindent
The degenerate twists grouped by set are $\{0\}$, $\{1,2,4\}$, $\{3,5,6\}$,
$\{7\}$.
Alternatively, the run could have been performed at \emph{only} the four
unique (irreducible) twist angles. We will emulate this situation by
analyzing data for twists 0, 1, 3, and 7 only. In a production setting
with irreducibly weighted twists, runs would be performed on these twists
alone; we reuse the uniform twist data for illustration purposes only.
We can use \texttt{qmca} to perform twist averaging with different
weights applied to each twist:
\begin{shade}
>qmca -a -w '1 3 3 1' -q ev -e 30 *g000*2*sc* *g001*2*sc* *g003*2*sc* *g007*2*sc*
LocalEnergy Variance ratio
avg series 2 -45.873631 +/- 0.001044 1.028769 +/- 0.001520 0.0224
\end{shade}
\noindent
yielding a total energy value of $-45.874(1)$ Ha, in agreement with the
uniform weighted twist average performed above.
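The weighted average can be checked by hand from the per-twist values listed above. The following is a sketch, assuming the twist averaged mean is the weight-normalized sum and that the twists are statistically independent so that their errors add in quadrature. The symbols $w_i$, $E_i$, and $\sigma_i$ (weight, mean energy, and error bar of twist $i$) are introduced here for illustration only:
\begin{align*}
E_{TA} &= \frac{\sum_i w_i E_i}{\sum_i w_i}
= \frac{-45.264510+3(-46.035511)+3(-46.047285)-45.476146}{8}
\approx -45.87363~\textrm{Ha} \\
\sigma_{TA} &= \frac{\sqrt{\sum_i w_i^2\sigma_i^2}}{\sum_i w_i}
= \frac{\sqrt{(0.001942)^2+9(0.001806)^2+9(0.001898)^2+(0.002065)^2}}{8}
\approx 0.00104~\textrm{Ha}
\end{align*}
Both values agree with the \texttt{qmca} output shown above.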
The decision of whether or not to perform irreducible weighted twist
averaging should be made on the basis of efficiency. The relative
efficiency of irreducible vs. uniform weighted twist averaging
depends on the irreducible weights and the ratio of the lengths of
the available sampling and equilibration periods. A formula for
the relative efficiency of these two cases is derived and discussed
in more detail in Appendix \ref{sec:app_ta_efficiency}.
\subsection{Setting output units}
\label{sec:qmca_output_units}
Estimates outputted by \texttt{qmca} are in Hartree units by
default. The output units for energetic quantities can be
changed by using the ``\texttt{-u}'' option.
\vspace{3mm}
\noindent
Energy in Hartrees:
\begin{shade}
>qmca -q e -u Ha -e 20 qmc.s002.scalar.dat
qmc series 2 LocalEnergy = -46.032960 +/- 0.002077
\end{shade}
\noindent
Energy in electron volts:
\begin{shade}
>qmca -q e -u eV -e 20 qmc.s002.scalar.dat
qmc series 2 LocalEnergy = -1252.620565 +/- 0.056521
\end{shade}
\noindent
Energy in Rydbergs:
\begin{shade}
>qmca -q e -u rydberg -e 20 qmc.s002.scalar.dat
qmc series 2 LocalEnergy = -92.065919 +/- 0.004154
\end{shade}
\noindent
Energy in kilojoules per mole:
\begin{shade}
>qmca -q e -u kj_mol -e 20 qmc.s002.scalar.dat
qmc series 2 LocalEnergy = -120859.512998 +/- 5.453431
\end{shade}
\subsection{Speeding up trace plotting}
\label{sec:qmca_fast_trace_plot}
When working with many files or files with many entries,
\texttt{qmca} may take a long time to produce plots. The time
delay is actually due to the autocorrelation time estimate
used to calculate error bars. The calculation time for
the autocorrelation scales as $\mathcal{O}(M^2)$, with $M$ being
the number of statistical samples. If you are only interested
in plotting traces and not in the estimated error bars, the
autocorrelation time estimation can be turned off with the
``\texttt{--noac}'' option:
\begin{shade}
>qmca -t -q e -e 20 --noac qmc.s002.scalar.dat
\end{shade}
\noindent
Please note that the resulting error bars printed to the console
will be underestimated and are not meaningful. Do \emph{not}
use ``\texttt{--noac}'' in conjunction with the ``\texttt{-p}''
plotting option as these plots are of no use without meaningful
error bars.
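The size of the underestimate can be gauged from the standard relation between the error of the mean and the autocorrelation time. This is a sketch only: $\sigma$ denotes the standard deviation of the $M$ samples and $\kappa$ the autocorrelation time, and the exact estimator used inside \texttt{qmca} is not reproduced here.
\begin{align*}
\sigma_{\bar{x}} \approx \sigma\sqrt{\frac{\kappa}{M}}
\end{align*}
Neglecting autocorrelation amounts to setting $\kappa=1$, which shrinks the error bar by a factor of $\sqrt{\kappa}$; for example, $\kappa=5.3$ corresponds to an underestimate by a factor of $\sqrt{5.3}\approx 2.3$.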
\subsection{Short usage examples}
\label{sec:qmca_short_examples}
\noindent
Plotting a trace of the local energy:
\begin{shade}
>qmca -t -q e *scalar*
\end{shade}
\noindent
Applying an equilibration cutoff to VMC data (series 0):
\begin{shade}
>qmca -q e -e 30 *s000.scalar*
\end{shade}
\noindent
Applying the same equilibration cutoff to VMC and DMC data (series 0, 1, 2):
\begin{shade}
>qmca -q e -e 20 *scalar*
\end{shade}
\noindent
Applying different equilibration cutoffs to VMC and DMC data (series 0, 1, 2):
\begin{shade}
>qmca -q e -e '30 20 40' *scalar*
\end{shade}
\noindent
Obtaining the energy, variance, and variance/energy ratio for all series:
\begin{shade}
>qmca -q ev -e 30 *scalar*
\end{shade}
\noindent
Overlaying plots of mean + error bar for energy and variance for separate
two- and three- body Jastrow optimization runs:
\begin{shade}
>qmca -po -q ev ./optJ2/*scalar* ./optJ3/*scalar*
\end{shade}
\noindent
Obtaining the acceptance ratio:
\begin{shade}
>qmca -q ar -e 30 *scalar*
\end{shade}
\noindent
Obtaining the average DMC walker population:
\begin{shade}
>qmca -q nw -e 400 *s002.dmc.dat
\end{shade}
\noindent
Obtaining the Monte Carlo efficiency:
\begin{shade}
>qmca -q eff -e 30 *scalar*
\end{shade}
\noindent
Obtaining the total wallclock time per series:
\begin{shade}
>qmca -q tt -e 0 *scalar*
\end{shade}
\noindent
Obtaining the average wallclock time spent per block:
\begin{shade}
>qmca -q bc -e 0 *scalar*
\end{shade}
\noindent
Obtaining a subset of desired quantities:
\begin{shade}
>qmca -q 'e v ar eff' -e 30 *scalar*
\end{shade}
\noindent
Obtaining all available quantities:
\begin{shade}
>qmca -e 30 *scalar*
\end{shade}
\noindent
Obtaining the twist averaged total energy with uniform weights:
\begin{shade}
>qmca -a -q e -e 40 *g*s002.scalar.dat
\end{shade}
\noindent
Obtaining the twist averaged total energy with specific weights:
\begin{shade}
>qmca -a -w '1 3 3 1' -q e -e 40 *g*s002.scalar.dat
\end{shade}
\noindent
Obtaining the local, kinetic, and potential energies in eV:
\begin{shade}
>qmca -q ekp -e 30 -u eV *scalar*
\end{shade}
\subsection{Production quality checklist}
\label{sec:qmca_production_checklist}
\begin{enumerate}
\item{Inspect the trace plots (``\texttt{-t}'' option) for any
oddities in the data. Typical behavior is a short equilibration
period followed by benign fluctuations around a clear mean value.
There should not be any large spikes in the data. This applies
to \emph{all} runs (VMC, optimization, DMC, etc.).}
\item{Remove all equilibration steps (``\texttt{-e}'' option) from
the data by inspecting the trace plot.}
\item{Check the quality of the orbitals (standalone Jastrow-less
VMC or sometimes the first \texttt{scalar} file produced during
optimization) by inspecting the variance/energy ratio
``\texttt{qmca -q ev *scalar*}''. For pseudopotential systems
without a Jastrow, the variance/energy ratio should not exceed
$0.2$ Ha, otherwise there is a problem with the orbitals.}
\item{Check the quality of the optimized Jastrow factor by inspecting
the variance/energy ratio. For pseudopotential systems with a
Jastrow, the variance/energy ratio should not exceed $0.04$ Ha.
A good Jastrow is indicated by a
variance/energy ratio in the range $0.01-0.03$ Ha. A value less
than $0.01$ Ha is difficult to achieve.}
\item{Confirm that the optimization has converged by plotting the
energy and variance vs. optimization series
(``\texttt{qmca -p -q ev *scalar*}''). Do not assume that
optimization has converged in only a few cycles. Use at least
10 cycles with around 100,000 samples unless you already have
experience with the system in question.}
\item{Optimize Jastrow factors according to energy minimization to
reduce locality errors arising from the use of non-local
pseudopotentials in DMC. A good approach is to optimize with a
few cycles of variance minimization followed by several cycles of
energy minimization.}
\item{Occasionally try optimizing with more samples and/or cycles
to see if improved results are obtained.}
\item{If using a B-spline representation of the orbitals, converge
the VMC energy and variance with respect to the mesh size (controlled
via meshfactor). This is best done in the presence of any
Jastrow factor to reduce noise. Consider using the hybrid LMTO
representation of the orbitals as this can reduce both the VMC/DMC
variance and DMC timestep error in addition to saving memory.}
\item{Check the variance/energy ratio of all production VMC and DMC
calculations. In all cases the DMC ratio should be slightly
less than the VMC one and both should abide by the guidelines above,
\emph{i.e.} the ratio should be less than $0.04$ Ha for
pseudopotential systems. The production ratio should also be
consistent with what is observed during wavefunction optimization.}
\item{Be aware of population control bias in DMC. Run with a
population of $\sim 2000$ or greater. Occasionally repeat a run
using a larger population to explicitly confirm that population
control bias is small.}
\item{Check the stability of the DMC walker population by plotting
the trace of the population size (``\texttt{qmca -t -q nw *dmc.dat}'').
Verify that the average walker population is consistent with
the requested value provided in the input.}
\item{In DMC, perform a timestep study to either 1) obtain
extrapolated results, or 2) obtain a timestep for future
production where an energy difference shows convergence
(\emph{e.g.} a band gap or defect formation energy). For
pseudopotential systems, converged timesteps for many systems
are in the range $0.002-0.01$ Ha$^{-1}$, but the actual converged
timestep must be explicitly checked.}
\item{In periodic systems, converge the total energy with respect to
the size of the twist/k-point grid. Results for smaller systems
can easily be transferred to larger ones (\emph{e.g.} a 2x2x2 twist
grid in a 2x2x2 tiled cell is equivalent to a 1x1x1 twist grid in a
4x4x4 tiled cell).}
\item{In periodic systems, perform finite size extrapolation
including two body corrections (needed for cohesive energy/phase
stability studies) unless it can be shown that finite size effects
cancel for the energy difference in question (\emph{e.g.} some
defect formation energies).}
\end{enumerate}
\section{Using the qfit tool for statistical timestep extrapolation and curve fitting}
\label{sec:qfit}
The \texttt{qfit} tool is used to provide statistical estimates of
curve fitting parameters based on QMCPACK data. While \texttt{qfit}
will eventually support many types of fitted curves (\emph{e.g.} Morse
potential binding curves, various equation of state fitting curves, etc.),
it is currently limited to estimating fitting parameters related to
timestep extrapolation.
\subsection{The jack-knife statistical technique}
The \texttt{qfit} tool obtains estimates of fitting parameter
means and associated error bars via the ``jack-knife''
technique. The jack-knife method is a powerful and general tool
to obtain meaningful error bars for any quantity that is related
in a non-linear fashion to an underlying set of statistical data.
For this reason, we give a brief overview of the jack-knife
technique before proceeding with usage instructions for the
\texttt{qfit} tool.
Consider $N$ statistical variables $\{x_n\}_{n=1}^N$ that have
been outputted by one or more simulation runs. If we have
$M$ samples of each of the $N$ variables, then the mean values
of each of these variables can be estimated in the standard way,
i.e. $\bar{x}_n\approx \tfrac{1}{M}\sum_{m=1}^Mx_{nm}$.
Suppose we are interested in $P$ statistical quantities
$\{y_p\}_{p=1}^P$ that are related to the original $N$ variables
by a known multidimensional function $F$:
\begin{align}
y_1,y_2,\ldots,y_P &= F(x_1,x_2,\ldots,x_N)\quad \textrm{or} \nonumber \\
\vec{y} &= F(\vec{x})
\end{align}
The relationship implied by $F$ is completely general.
For example the $\{x_n\}$ might be elements of a matrix
with $\{y_p\}$ being the eigenvalues, or $F$ might be
a fitting procedure for $N$ energies at different timesteps
with $P$ fitting parameters. An approximate guess at the mean
value of $\vec{y}$ can be obtained by evaluating $F$ at the mean
value of $\vec{x}$ (i.e. $F(\bar{x}_1\ldots\bar{x}_N)$), but with
this approach we have no way to estimate the statistical error
bar of any $\bar{y}_p$.
In the jack-knife procedure, the statistical variability intrinsic
to the underlying data $\{x_n\}$ is used to obtain estimates of the
mean and error bar of $\{y_p\}$. We first construct a new set of $x$
statistical data by taking the average over all samples but one:
\begin{align}
\tilde{x}_{nm} = \frac{1}{M-1}(M\bar{x}_n-x_{nm})\qquad m\in [1,M]
\end{align}
The result is a distribution of approximate $x$ mean values. These
are used to construct a distribution of approximate means for $y$:
\begin{align}
\tilde{y}_{1m},\ldots,\tilde{y}_{Pm} = F(\tilde{x}_{1m},\ldots,\tilde{x}_{Nm}) \qquad m\in [1,M]
\end{align}
Estimates for the mean and error bar of the quantities of
interest can finally be obtained using the formulas below:
\begin{align}
\bar{y}_p &= \frac{1}{M}\sum_{m=1}^M\tilde{y}_{pm} \\
\sigma_{y_p} &= \sqrt{\frac{M-1}{M}\left(\sum_{m=1}^M\tilde{y}_{pm}^2-M\bar{y}_p^2\right)}
\end{align}
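As a consistency check of these formulas, consider the simplest case of a single variable with $F(x)=x$ (this worked example is an illustration and is not part of the \texttt{qfit} implementation). Writing $\bar{x}$ for the sample mean of $\{x_m\}_{m=1}^M$, the leave-one-out averages are $\tilde{y}_m=\tilde{x}_m=(M\bar{x}-x_m)/(M-1)$, so that
\begin{align*}
\bar{y} &= \frac{1}{M}\sum_{m=1}^M\frac{M\bar{x}-x_m}{M-1} = \bar{x} \\
\sigma_y^2 &= \frac{M-1}{M}\sum_{m=1}^M(\tilde{y}_m-\bar{y})^2
= \frac{M-1}{M}\sum_{m=1}^M\frac{(x_m-\bar{x})^2}{(M-1)^2}
= \frac{1}{M(M-1)}\sum_{m=1}^M(x_m-\bar{x})^2
\end{align*}
which is the familiar squared standard error of the mean for uncorrelated samples. The jack-knife therefore reduces to the usual error bar when $F$ is linear, and generalizes it when $F$ is not.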
\subsection{Performing timestep extrapolation}
In this section, we use a 32 atom supercell of MnO as an example
system for timestep extrapolation. Data for this system has been
collected in DMC using the following sequence of timesteps:
$0.04,~0.02,~0.01,~0.005,~0.0025,~0.00125$ Ha$^{-1}$. For a typical
production pseudopotential study, timesteps in the range
$0.02-0.002$ Ha$^{-1}$ are usually sufficient and it is recommended
to increase the number of steps/blocks by a factor of two when
the timestep is halved. In order to perform accurate statistical
fitting, we must first understand the equilibration and autocorrelation
properties of the inputted local energy data. After plotting the
local energy traces (\texttt{qmca -t -q e -e 0 ./qmc*/*scalar*})
it is clear that an equilibration period of $30$ blocks is reasonable.
Approximate autocorrelation lengths are also obtained with \texttt{qmca}:
\begin{shade}
>qmca -e 30 -q e --sac ./qmc*/qmc.g000.s002.scalar.dat
./qmc_tm_0.00125/qmc.g000 series 2 LocalEnergy = -3848.234513 +/- 0.055754 1.7
./qmc_tm_0.00250/qmc.g000 series 2 LocalEnergy = -3848.237614 +/- 0.055432 2.2
./qmc_tm_0.00500/qmc.g000 series 2 LocalEnergy = -3848.349741 +/- 0.069729 2.8
./qmc_tm_0.01000/qmc.g000 series 2 LocalEnergy = -3848.274596 +/- 0.126407 3.9
./qmc_tm_0.02000/qmc.g000 series 2 LocalEnergy = -3848.539017 +/- 0.075740 2.4
./qmc_tm_0.04000/qmc.g000 series 2 LocalEnergy = -3848.976424 +/- 0.075305 1.8
\end{shade}
\noindent
The autocorrelation must be removed from the data prior to jack-knifing.
Since the largest autocorrelation time listed above is about 4 blocks,
we will reblock the data by a factor of 4.
The \texttt{qfit} tool can be used in the following way to obtain
a linear timestep fit of the data:
\begin{shade}
>qfit ts -e 30 -b 4 -s 2 -t '0.00125 0.0025 0.005 0.01 0.02 0.04' ./qmc*/*scalar*
fit function : linear
fitted formula: (-3848.193 +/- 0.037) + (-18.95 +/- 1.95)*t
intercept : -3848.193 +/- 0.037 Ha
\end{shade}
The input arguments are as follows: \texttt{ts} indicates we are
performing a timestep fit, ``\texttt{-e 30}'' is the equilibration period
removed from each set of scalar data, ``\texttt{-b 4}'' indicates the data
will be reblocked by a factor of 4 (\emph{e.g.} a file containing 400
entries will be block averaged into a new set of 100 prior to jack-knife
fitting), ``\texttt{-s 2}'' indicates that the timestep data begins with
series 2 (scalar files matching \texttt{*s000*} or \texttt{*s001*} are
to be excluded), and ``\texttt{-t '0.00125 0.0025 0.005 0.01 0.02 0.04'}''
provides a list of timestep values corresponding to the inputted scalar
files. The ``\texttt{-e}'' and ``\texttt{-b}'' options can receive a
list of file-specific values (same format as ``\texttt{-t}'') if desired.
As can be seen from the text output, the parameters for the linear fit
are printed with error bars obtained with jack-knife resampling and
the zero timestep ``intercept'' is $-3848.19(4)$ Ha. In addition to
text output, the command above will result in a plot of the fit with
the zero timestep value shown as a red dot, as shown in the left
panel of Fig.~\ref{fig:qfit_timestep}.
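The procedure described above (equilibration removal, reblocking, and jack-knife resampling of a linear fit) can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions and is not the \texttt{qfit} implementation: the function names \texttt{reblock} and \texttt{jackknife\_linear\_fit} are hypothetical, the data are synthetic and uncorrelated, and every series is assumed to have the same length after reblocking.
\begin{shade}
import numpy as np

def reblock(x, b):
    # Average consecutive groups of b samples.
    # An incomplete trailing group is dropped.
    x = np.asarray(x, dtype=float)
    n = (len(x) // b) * b
    return x[:n].reshape(-1, b).mean(axis=1)

def jackknife_linear_fit(timesteps, series, equil=0, block=1):
    # Jack-knife mean and error bar of (a, b) in E(t) = a + b*t.
    # series: one 1D array of energies per timestep (hypothetical input).
    t = np.asarray(timesteps, dtype=float)
    x = np.array([reblock(np.asarray(s)[equil:], block) for s in series])
    M = x.shape[1]                       # samples per timestep
    xbar = x.mean(axis=1, keepdims=True)
    x_jk = (M * xbar - x) / (M - 1)      # leave-one-out means, shape (N, M)
    coef = np.polyfit(t, x_jk, 1)        # one fit per column, shape (2, M)
    y_jk = coef[::-1]                    # row 0: intercept a, row 1: slope b
    mean = y_jk.mean(axis=1)
    # Jack-knife error bar, written in terms of deviations from the mean
    dev2 = ((y_jk - mean[:, None]) ** 2).sum(axis=1)
    err = np.sqrt((M - 1) / M * dev2)
    return mean, err

# Synthetic data (NOT QMCPACK output): E(t) = -10 - 2 t + noise
rng = np.random.default_rng(0)
timesteps = [0.005, 0.01, 0.02, 0.04]
series = [-10.0 - 2.0 * t + 0.05 * rng.standard_normal(430)
          for t in timesteps]
mean, err = jackknife_linear_fit(timesteps, series, equil=30, block=4)
print("intercept:", mean[0], "+/-", err[0])
print("slope    :", mean[1], "+/-", err[1])
\end{shade}
\noindent
Applied to real data, each array in \texttt{series} would hold the local energy values read from one \texttt{scalar.dat} file, and \texttt{equil} and \texttt{block} would play the roles of the ``\texttt{-e}'' and ``\texttt{-b}'' options.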
\begin{figure}
\centering
\ifdefined\HCode%
\begin{tabularx}{1024pt}{X X}
\includegraphics[trim=0mm 0mm 4mm 0mm,clip,width=512pt]{./figures/qfit_timestep_linear.dmn}&
\includegraphics[trim=2mm 0mm 4mm 0mm,clip,width=512pt]{./figures/qfit_timestep_quadratic.dmn}\\
\end{tabularx}
\else%
\begin{tabularx}{\textwidth}{X X}
\includegraphics[trim=0mm 0mm 4mm 0mm,clip,width=0.47\textwidth]{./figures/qfit_timestep_linear.pdf}&
\includegraphics[trim=2mm 0mm 4mm 0mm,clip,width=0.47\textwidth]{./figures/qfit_timestep_quadratic.pdf}\\
\end{tabularx}
\fi%
\caption{Linear (left) and quadratic (right) timestep fits to DMC data for a 32 atom supercell of MnO obtained with \texttt{qfit}. Zero timestep estimates are indicated by the red data point on the left side of either panel.}
\label{fig:qfit_timestep}
\end{figure}
Different fitting functions are supported via the ``\texttt{-f}'' option.
Currently supported options include \texttt{linear} ($a+bt$),
\texttt{quadratic} ($a+bt+ct^2$), and \texttt{sqrt} ($a+b\sqrt{t}+ct$).
Results for a quadratic fit are shown below as well as in the right
panel of Fig.~\ref{fig:qfit_timestep}.
\begin{shade}
>qfit ts -f quadratic -e30 -b4 -s2 -t '0.00125 0.0025 0.005 0.01 0.02 0.04' ./qmc*/*scalar*
fit function : quadratic
fitted formula: (-3848.245 +/- 0.047) + (-7.25 +/- 8.33)*t + (-285.00 +/- 202.39)*t^2
intercept : -3848.245 +/- 0.047 Ha
\end{shade}
In this case we find a zero timestep estimate of $-3848.25(5)$ Ha.
A timestep of $0.04$ Ha$^{-1}$ might be on the large side to include in
timestep extrapolation and it is likely to have an outsize influence
in the case of linear extrapolation. Upon excluding this point, linear
extrapolation yields a zero timestep value of $-3848.22(4)$ Ha.
It should be noted that quadratic extrapolation can result in intrinsically
larger uncertainty in the extrapolated value. For example, when the $0.04$
Ha$^{-1}$ point is excluded, the uncertainty grows by 50\% and we obtain an
estimated value of $-3848.28(7)$ Ha instead.
\section{Densities and spin-densities}
\label{sec:densities}
TBD.
%\section{Energy densities}
%\label{sec:energydensities}