QMCPACK manual entry for timestep fitting tool

This commit is contained in:
Jaron Krogel 2017-08-29 12:17:52 -04:00
parent 532380beb9
commit c48aeb41f4
4 changed files with 154 additions and 6 deletions

View File

@ -1002,8 +1002,156 @@ Obtaining the local, kinetic, and potential energies in eV:
\end{enumerate}
%\section{Using the qfit tool for statistical timestep extrapolation and curve fitting}
%\label{sec:qfit}
\section{Using the qfit tool for statistical timestep extrapolation and curve fitting}
\label{sec:qfit}
The \texttt{qfit} tool is used to provide statistical estimates of
curve fitting parameters based on QMCPACK data. While \texttt{qfit}
will eventually support many types of fitted curves (\emph{e.g.} Morse
potential binding curves, various equation of state fitting curves, etc.),
it is currently limited to estimating fitting parameters related to
timestep extrapolation.
\subsection{The jack-knife statistical technique}
The \texttt{qfit} tool obtains estimates of fitting parameter
means and associated error bars via the ``jack-knife''
technique. The jack-knife method is a powerful and general tool
to obtain meaningful error bars for any quantity that is related
in a non-linear fashion to an underlying set of statistical data.
For this reason, we give a brief overview of the jack-knife
technique before proceeding with usage instructions for the
\texttt{qfit} tool.
Consider $N$ statistical variables $\{x_n\}_{n=1}^N$ that have
been outputted by one or more simulation runs. If we have
$M$ samples of each of the $N$ variables, then the mean values
of each these variables can be estimated in the standard way,
i.e. $\bar{x}_n\approx \tfrac{1}{M}\sum_{m=1}^Mx_{nm}$.
Suppose we are interested in $P$ statistical quantities
$\{y_p\}_{p=1}^P$ that are related to the original $N$ variables
by a known multidimensional function $F$:
\begin{align}
y_1,y_2,\ldots,y_P &= F(x_1,x_2,\ldots,x_N)\quad \textrm{or} \nonumber \\
\vec{y} &= F(\vec{x})
\end{align}
The relationship implied by $F$ is completely general.
For example the $\{x_n\}$ might be elements of a matrix
with $\{y_p\}$ being the eigenvalues, or $F$ might be
a fitting procedure for $N$ energies at different timesteps
with $P$ fitting parameters. An approximate guess at the mean
value of $\vec{y}$ can be obtained by evaluating $F$ at the mean
value of $\vec{x}$ (i.e. $F(\bar{x}_1\ldots\bar{x}_N)$), but with
this approach we have no way to estimate the statistical error
bar of any $\bar{y}_p$.
In the jack-knife procedure, the statistical variability intrinsic
to the underlying data $\{x_n\}$ is used to obtain estimates of the
mean and error bar of $\{y_p\}$. We first construct a new set of $x$
statistical data by taking the average over all samples but one:
\begin{align}
\tilde{x}_{nm} = \frac{1}{N-1}(N\bar{x}_n-x_{nm})\qquad m\in [1,M]
\end{align}
The result is a distribution of approximate $x$ mean values. These
are used to construct a distribution of approximate means for $y$:
\begin{align}
\tilde{y}_{1m},\ldots,\tilde{y}_{Pm} = F(\tilde{x}_{1m},\ldots,\tilde{x}_{Nm}) \qquad m\in [1,M]
\end{align}
Estimates for the mean and error bar of the quantities of
interest can finally be obtained using the formulas below:
\begin{align}
\bar{y}_p &= \frac{1}{M}\sum_{m=1}^M\tilde{y}_{pm} \\
\sigma_{y_p} &= \sqrt{\frac{M-1}{M}\left(\sum_{m=1}^M\tilde{y}_{pm}^2-M\bar{y}_p^2\right)}
\end{align}
\subsection{Performing timestep extrapolation}
In this section, we use a 32 atom supercell of MnO as an example
system for timestep extrapolation. Data for this system has been
collected in DMC using the following sequence of timesteps:
$0.04,~0.02,~0.01,~0.005,~0.0025,~0.00125$ Ha$^{-1}$. For a typical
production pseudopotential study, timesteps in the range
$0.02-0.002$ Ha$^{-1}$ are usually sufficient and it is recommended
to increase the number of steps/blocks by a factor of two when
the timestep is halved. In order to perform accurate statistical
fitting, we must first understand the equilibration and autocorrelation
properties of the inputted local energy data. After plotting the
local energy traces (\texttt{qmca -t -q e -e 0 ./qmc*/*scalar*})
it is clear that an equilibration period of $30$ blocks is reasonable.
Approximate autocorrelation lengths are also obtained with \texttt{qmca}:
\begin{shade}
>qmca -e 30 -q e --sac ./qmc*/qmc.g000.s002.scalar.dat
./qmc_tm_0.00125/qmc.g000 series 2 LocalEnergy = -3848.234513 +/- 0.055754 1.7
./qmc_tm_0.00250/qmc.g000 series 2 LocalEnergy = -3848.237614 +/- 0.055432 2.2
./qmc_tm_0.00500/qmc.g000 series 2 LocalEnergy = -3848.349741 +/- 0.069729 2.8
./qmc_tm_0.01000/qmc.g000 series 2 LocalEnergy = -3848.274596 +/- 0.126407 3.9
./qmc_tm_0.02000/qmc.g000 series 2 LocalEnergy = -3848.539017 +/- 0.075740 2.4
./qmc_tm_0.04000/qmc.g000 series 2 LocalEnergy = -3848.976424 +/- 0.075305 1.8
\end{shade}
\noindent
The autocorrelation must be removed from the data prior to jack-knifing
and so we will reblock the data by a factor of 4.
The \texttt{qfit} tool can be used in the following way to obtain
a linear timestep fit of the data:
\begin{shade}
>qfit ts -e 30 -b 4 -s 2 -t '0.00125 0.0025 0.005 0.01 0.02 0.04' ./qmc*/*scalar*
fit function : linear
fitted formula: (-3848.193 +/- 0.037) + (-18.95 +/- 1.95)*t
intercept : -3848.193 +/- 0.037 Ha
\end{shade}
The input arguments are as follows: \texttt{ts} indicates we are
performing a timestep fit, ``\texttt{-e 30}'' is the equilibration period
removed from each set of scalar data, ``\texttt{-b 4}'' indicates the data
will be reblocked by a factor of 4 (\emph{e.g.} a file containing 400 \
entries will be block averaged into a new set of 100 prior to jack-knife
fitting), ``\texttt{-s 2}'' indicates that the timestep data begins with
series 2 (scalar files matching \texttt{*s000*} or \texttt{*s001*} are
to be excluded), and ``\texttt{-t } '0.00125 0.0025 0.005 0.01 0.02 0.04' ''
provides a list of timestep values corresponding to the inputted scalar
files. The ``\texttt{-e}'' and ``\texttt{-b}'' options can receive a
list of file-specific values (same format as ``\texttt{-t}'') if desired.
As can be seen from the text output, the parameters for the linear fit
are printed with error bars obtained with jack-knife resampling and
the zero timestep ``intercept'' is $-3848.19(4)$ Ha. In addition to
text output, the command above will result in a plot of the fit with
the zero timestep value shown as a red dot, as shown in the left
panel of Fig.~\ref{fig:qfit_timestep}.
\begin{figure}
\centering
\begin{subfigure}[t]{0.47\textwidth}
\centering
\includegraphics[trim=0mm 0mm 4mm 0mm,clip,width=\linewidth]{figures/qfit_timestep_linear.pdf}
\end{subfigure}
\begin{subfigure}[t]{0.47\textwidth}
\centering
\includegraphics[trim=2mm 0mm 4mm 0mm,clip,width=\linewidth]{figures/qfit_timestep_quadratic.pdf}
\end{subfigure}
\caption{Linear (left) and quadratic (right) timestep fits to DMC data for a 32 atom supercell of MnO obtained with \texttt{qfit}. Zero timestep estimates are indicated by the red data point on the left side of either panel.\label{fig:qfit_timestep}}
\end{figure}
Different fitting functions are supported via the ``\texttt{-f}'' option.
Currently supported options include \texttt{linear} ($a+bt$),
\texttt{quadratic} ($a+bt+ct^2$), and \texttt{sqrt} ($a+b\sqrt{t}+ct$).
Results for a quadratic fit are shown below as well as in the right
panel of Fig.~\ref{fig:qfit_timestep}.
\begin{shade}
>qfit ts -f quadratic -e30 -b4 -s2 -t '0.00125 0.0025 0.005 0.01 0.02 0.04' ./qmc*/*scalar*
fit function : quadratic
fitted formula: (-3848.245 +/- 0.047) + (-7.25 +/- 8.33)*t + (-285.00 +/- 202.39)*t^2
intercept : -3848.245 +/- 0.047 Ha
\end{shade}
In this case we find a zero timestep estimate of $-3848.25(5)$ Ha$^{-1}$.
A timestep of $0.04$ Ha$^{-1}$ might be on the large side to include in
timestep extrapolation and it is likely to have an outsize influence
in the case of linear extrapolation. Upon excluding this point, linear
extrapolation yields a zero timestep value of $-3848.22(4)$ Ha$^{-1}$.
It should be noted that quadratic extrapolation can result in intrinsically
larger uncertainty in the extrapolated value. For example, when the $0.04$
Ha$^{-1}$ point is excluded the uncertainty grows by 50\% and we obtain an
estimated value of $-3848.28(7)$ instead.
\section{Densities and spin-densities}

Binary file not shown.

Binary file not shown.

View File

@ -9,7 +9,7 @@ import inspect
try:
from scipy.optimize import fmin
except ImportError:
print 'tsfit error: scipy is not present on your machine\n please install scipy and retry'
print 'qfit error: scipy is not present on your machine\n please install scipy and retry'
#end try
try:
import matplotlib
@ -96,13 +96,13 @@ def message(msg,header=None,post_header=' message:',indent=' ',logfile=devlog
#end def message
def warn(msg,header='qmcfit',indent=' ',logfile=devlog):
def warn(msg,header='qfit',indent=' ',logfile=devlog):
post_header=' warning:'
message(msg,header,post_header,indent,logfile)
#end def warn
def error(msg,header='qmcfit',exit=True,trace=False,indent=' ',logfile=devlog):
def error(msg,header='qfit',exit=True,trace=False,indent=' ',logfile=devlog):
post_header=' error:'
message(msg,header,post_header,indent,logfile)
if exit:
@ -1517,7 +1517,7 @@ def timestep_fit():
if __name__=='__main__':
fit_types = sorted(all_fit_functions.keys())
if len(sys.argv)<2:
error('first argument must be type of fit\ne.g. for a timestep fit, type "qmcfit ts ..."\nvalid fit types are: {0}'.format(fit_types))
error('first argument must be type of fit\ne.g. for a timestep fit, type "qfit ts ..."\nvalid fit types are: {0}'.format(fit_types))
#end if
fit_type = sys.argv[1]
if fit_type in fit_types: