mirror of https://github.com/QMCPACK/qmcpack.git
QMCPACK manual entry for timestep fitting tool
This commit is contained in:
parent
532380beb9
commit
c48aeb41f4
|
@ -1002,8 +1002,156 @@ Obtaining the local, kinetic, and potential energies in eV:
|
|||
\end{enumerate}
|
||||
|
||||
|
||||
%\section{Using the qfit tool for statistical timestep extrapolation and curve fitting}
|
||||
%\label{sec:qfit}
|
||||
\section{Using the qfit tool for statistical timestep extrapolation and curve fitting}
|
||||
\label{sec:qfit}
|
||||
|
||||
The \texttt{qfit} tool is used to provide statistical estimates of
|
||||
curve fitting parameters based on QMCPACK data. While \texttt{qfit}
|
||||
will eventually support many types of fitted curves (\emph{e.g.} Morse
|
||||
potential binding curves, various equation of state fitting curves, etc.),
|
||||
it is currently limited to estimating fitting parameters related to
|
||||
timestep extrapolation.
|
||||
|
||||
\subsection{The jack-knife statistical technique}
|
||||
The \texttt{qfit} tool obtains estimates of fitting parameter
|
||||
means and associated error bars via the ``jack-knife''
|
||||
technique. The jack-knife method is a powerful and general tool
|
||||
to obtain meaningful error bars for any quantity that is related
|
||||
in a non-linear fashion to an underlying set of statistical data.
|
||||
For this reason, we give a brief overview of the jack-knife
|
||||
technique before proceeding with usage instructions for the
|
||||
\texttt{qfit} tool.
|
||||
|
||||
Consider $N$ statistical variables $\{x_n\}_{n=1}^N$ that have
|
||||
been outputted by one or more simulation runs. If we have
|
||||
$M$ samples of each of the $N$ variables, then the mean values
|
||||
of each these variables can be estimated in the standard way,
|
||||
i.e. $\bar{x}_n\approx \tfrac{1}{M}\sum_{m=1}^Mx_{nm}$.
|
||||
|
||||
Suppose we are interested in $P$ statistical quantities
|
||||
$\{y_p\}_{p=1}^P$ that are related to the original $N$ variables
|
||||
by a known multidimensional function $F$:
|
||||
\begin{align}
|
||||
y_1,y_2,\ldots,y_P &= F(x_1,x_2,\ldots,x_N)\quad \textrm{or} \nonumber \\
|
||||
\vec{y} &= F(\vec{x})
|
||||
\end{align}
|
||||
The relationship implied by $F$ is completely general.
|
||||
For example the $\{x_n\}$ might be elements of a matrix
|
||||
with $\{y_p\}$ being the eigenvalues, or $F$ might be
|
||||
a fitting procedure for $N$ energies at different timesteps
|
||||
with $P$ fitting parameters. An approximate guess at the mean
|
||||
value of $\vec{y}$ can be obtained by evaluating $F$ at the mean
|
||||
value of $\vec{x}$ (i.e. $F(\bar{x}_1\ldots\bar{x}_N)$), but with
|
||||
this approach we have no way to estimate the statistical error
|
||||
bar of any $\bar{y}_p$.
|
||||
|
||||
In the jack-knife procedure, the statistical variability intrinsic
|
||||
to the underlying data $\{x_n\}$ is used to obtain estimates of the
|
||||
mean and error bar of $\{y_p\}$. We first construct a new set of $x$
|
||||
statistical data by taking the average over all samples but one:
|
||||
\begin{align}
|
||||
\tilde{x}_{nm} = \frac{1}{N-1}(N\bar{x}_n-x_{nm})\qquad m\in [1,M]
|
||||
\end{align}
|
||||
The result is a distribution of approximate $x$ mean values. These
|
||||
are used to construct a distribution of approximate means for $y$:
|
||||
\begin{align}
|
||||
\tilde{y}_{1m},\ldots,\tilde{y}_{Pm} = F(\tilde{x}_{1m},\ldots,\tilde{x}_{Nm}) \qquad m\in [1,M]
|
||||
\end{align}
|
||||
Estimates for the mean and error bar of the quantities of
|
||||
interest can finally be obtained using the formulas below:
|
||||
\begin{align}
|
||||
\bar{y}_p &= \frac{1}{M}\sum_{m=1}^M\tilde{y}_{pm} \\
|
||||
\sigma_{y_p} &= \sqrt{\frac{M-1}{M}\left(\sum_{m=1}^M\tilde{y}_{pm}^2-M\bar{y}_p^2\right)}
|
||||
\end{align}
|
||||
|
||||
|
||||
\subsection{Performing timestep extrapolation}
|
||||
In this section, we use a 32 atom supercell of MnO as an example
|
||||
system for timestep extrapolation. Data for this system has been
|
||||
collected in DMC using the following sequence of timesteps:
|
||||
$0.04,~0.02,~0.01,~0.005,~0.0025,~0.00125$ Ha$^{-1}$. For a typical
|
||||
production pseudopotential study, timesteps in the range
|
||||
$0.02-0.002$ Ha$^{-1}$ are usually sufficient and it is recommended
|
||||
to increase the number of steps/blocks by a factor of two when
|
||||
the timestep is halved. In order to perform accurate statistical
|
||||
fitting, we must first understand the equilibration and autocorrelation
|
||||
properties of the inputted local energy data. After plotting the
|
||||
local energy traces (\texttt{qmca -t -q e -e 0 ./qmc*/*scalar*})
|
||||
it is clear that an equilibration period of $30$ blocks is reasonable.
|
||||
Approximate autocorrelation lengths are also obtained with \texttt{qmca}:
|
||||
\begin{shade}
|
||||
>qmca -e 30 -q e --sac ./qmc*/qmc.g000.s002.scalar.dat
|
||||
./qmc_tm_0.00125/qmc.g000 series 2 LocalEnergy = -3848.234513 +/- 0.055754 1.7
|
||||
./qmc_tm_0.00250/qmc.g000 series 2 LocalEnergy = -3848.237614 +/- 0.055432 2.2
|
||||
./qmc_tm_0.00500/qmc.g000 series 2 LocalEnergy = -3848.349741 +/- 0.069729 2.8
|
||||
./qmc_tm_0.01000/qmc.g000 series 2 LocalEnergy = -3848.274596 +/- 0.126407 3.9
|
||||
./qmc_tm_0.02000/qmc.g000 series 2 LocalEnergy = -3848.539017 +/- 0.075740 2.4
|
||||
./qmc_tm_0.04000/qmc.g000 series 2 LocalEnergy = -3848.976424 +/- 0.075305 1.8
|
||||
\end{shade}
|
||||
\noindent
|
||||
The autocorrelation must be removed from the data prior to jack-knifing
|
||||
and so we will reblock the data by a factor of 4.
|
||||
|
||||
The \texttt{qfit} tool can be used in the following way to obtain
|
||||
a linear timestep fit of the data:
|
||||
\begin{shade}
|
||||
>qfit ts -e 30 -b 4 -s 2 -t '0.00125 0.0025 0.005 0.01 0.02 0.04' ./qmc*/*scalar*
|
||||
fit function : linear
|
||||
fitted formula: (-3848.193 +/- 0.037) + (-18.95 +/- 1.95)*t
|
||||
intercept : -3848.193 +/- 0.037 Ha
|
||||
\end{shade}
|
||||
The input arguments are as follows: \texttt{ts} indicates we are
|
||||
performing a timestep fit, ``\texttt{-e 30}'' is the equilibration period
|
||||
removed from each set of scalar data, ``\texttt{-b 4}'' indicates the data
|
||||
will be reblocked by a factor of 4 (\emph{e.g.} a file containing 400 \
|
||||
entries will be block averaged into a new set of 100 prior to jack-knife
|
||||
fitting), ``\texttt{-s 2}'' indicates that the timestep data begins with
|
||||
series 2 (scalar files matching \texttt{*s000*} or \texttt{*s001*} are
|
||||
to be excluded), and ``\texttt{-t } '0.00125 0.0025 0.005 0.01 0.02 0.04' ''
|
||||
provides a list of timestep values corresponding to the inputted scalar
|
||||
files. The ``\texttt{-e}'' and ``\texttt{-b}'' options can receive a
|
||||
list of file-specific values (same format as ``\texttt{-t}'') if desired.
|
||||
As can be seen from the text output, the parameters for the linear fit
|
||||
are printed with error bars obtained with jack-knife resampling and
|
||||
the zero timestep ``intercept'' is $-3848.19(4)$ Ha. In addition to
|
||||
text output, the command above will result in a plot of the fit with
|
||||
the zero timestep value shown as a red dot, as shown in the left
|
||||
panel of Fig.~\ref{fig:qfit_timestep}.
|
||||
|
||||
\begin{figure}
|
||||
\centering
|
||||
\begin{subfigure}[t]{0.47\textwidth}
|
||||
\centering
|
||||
\includegraphics[trim=0mm 0mm 4mm 0mm,clip,width=\linewidth]{figures/qfit_timestep_linear.pdf}
|
||||
\end{subfigure}
|
||||
\begin{subfigure}[t]{0.47\textwidth}
|
||||
\centering
|
||||
\includegraphics[trim=2mm 0mm 4mm 0mm,clip,width=\linewidth]{figures/qfit_timestep_quadratic.pdf}
|
||||
\end{subfigure}
|
||||
\caption{Linear (left) and quadratic (right) timestep fits to DMC data for a 32 atom supercell of MnO obtained with \texttt{qfit}. Zero timestep estimates are indicated by the red data point on the left side of either panel.\label{fig:qfit_timestep}}
|
||||
\end{figure}
|
||||
|
||||
Different fitting functions are supported via the ``\texttt{-f}'' option.
|
||||
Currently supported options include \texttt{linear} ($a+bt$),
|
||||
\texttt{quadratic} ($a+bt+ct^2$), and \texttt{sqrt} ($a+b\sqrt{t}+ct$).
|
||||
Results for a quadratic fit are shown below as well as in the right
|
||||
panel of Fig.~\ref{fig:qfit_timestep}.
|
||||
\begin{shade}
|
||||
>qfit ts -f quadratic -e30 -b4 -s2 -t '0.00125 0.0025 0.005 0.01 0.02 0.04' ./qmc*/*scalar*
|
||||
fit function : quadratic
|
||||
fitted formula: (-3848.245 +/- 0.047) + (-7.25 +/- 8.33)*t + (-285.00 +/- 202.39)*t^2
|
||||
intercept : -3848.245 +/- 0.047 Ha
|
||||
\end{shade}
|
||||
In this case we find a zero timestep estimate of $-3848.25(5)$ Ha$^{-1}$.
|
||||
A timestep of $0.04$ Ha$^{-1}$ might be on the large side to include in
|
||||
timestep extrapolation and it is likely to have an outsize influence
|
||||
in the case of linear extrapolation. Upon excluding this point, linear
|
||||
extrapolation yields a zero timestep value of $-3848.22(4)$ Ha$^{-1}$.
|
||||
It should be noted that quadratic extrapolation can result in intrinsically
|
||||
larger uncertainty in the extrapolated value. For example, when the $0.04$
|
||||
Ha$^{-1}$ point is excluded the uncertainty grows by 50\% and we obtain an
|
||||
estimated value of $-3848.28(7)$ instead.
|
||||
|
||||
|
||||
|
||||
\section{Densities and spin-densities}
|
||||
|
|
Binary file not shown.
Binary file not shown.
|
@ -9,7 +9,7 @@ import inspect
|
|||
try:
|
||||
from scipy.optimize import fmin
|
||||
except ImportError:
|
||||
print 'tsfit error: scipy is not present on your machine\n please install scipy and retry'
|
||||
print 'qfit error: scipy is not present on your machine\n please install scipy and retry'
|
||||
#end try
|
||||
try:
|
||||
import matplotlib
|
||||
|
@ -96,13 +96,13 @@ def message(msg,header=None,post_header=' message:',indent=' ',logfile=devlog
|
|||
#end def message
|
||||
|
||||
|
||||
def warn(msg,header='qmcfit',indent=' ',logfile=devlog):
|
||||
def warn(msg,header='qfit',indent=' ',logfile=devlog):
|
||||
post_header=' warning:'
|
||||
message(msg,header,post_header,indent,logfile)
|
||||
#end def warn
|
||||
|
||||
|
||||
def error(msg,header='qmcfit',exit=True,trace=False,indent=' ',logfile=devlog):
|
||||
def error(msg,header='qfit',exit=True,trace=False,indent=' ',logfile=devlog):
|
||||
post_header=' error:'
|
||||
message(msg,header,post_header,indent,logfile)
|
||||
if exit:
|
||||
|
@ -1517,7 +1517,7 @@ def timestep_fit():
|
|||
if __name__=='__main__':
|
||||
fit_types = sorted(all_fit_functions.keys())
|
||||
if len(sys.argv)<2:
|
||||
error('first argument must be type of fit\ne.g. for a timestep fit, type "qmcfit ts ..."\nvalid fit types are: {0}'.format(fit_types))
|
||||
error('first argument must be type of fit\ne.g. for a timestep fit, type "qfit ts ..."\nvalid fit types are: {0}'.format(fit_types))
|
||||
#end if
|
||||
fit_type = sys.argv[1]
|
||||
if fit_type in fit_types:
|
Loading…
Reference in New Issue