\section{Wavefunction optimization}
\label{sec:optimization}
Optimizing the wavefunction is critical in all kinds of real-space QMC calculations
because it significantly improves both the accuracy and efficiency of the computation.
However, it is very difficult to directly adopt deterministic minimization approaches because of the stochastic nature of evaluating quantities with MC.
Thanks to algorithmic breakthroughs during the first decade of this century and the tremendous computer power now available,
it is now feasible to optimize tens of thousands of parameters in a wavefunction for a solid or molecule.
QMCPACK has multiple optimizers implemented based on the state-of-the-art linear method.
We are continually improving our optimizers for robustness and ease of use and are trying to provide a single solution.
However, because wavefunction types vary widely and have distinct characteristics, several optimizers might be needed in some cases.
We strongly suggest reading the recommendations of the experts who maintain these optimizers.
A typical optimization block looks like the following. It starts with \ixml{method="linear"} and contains three blocks of parameters.
\begin{lstlisting}[style=QMCPXML]
<loop max="10">
<qmc method="linear" move="pbyp" gpu="yes">
<!-- Specify the VMC options -->
<parameter name="walkers"> 256 </parameter>
<parameter name="samples"> 2867200 </parameter>
<parameter name="stepsbetweensamples"> 1 </parameter>
<parameter name="substeps"> 5 </parameter>
<parameter name="warmupSteps"> 5 </parameter>
<parameter name="blocks"> 70 </parameter>
<parameter name="timestep"> 1.0 </parameter>
<parameter name="usedrift"> no </parameter>
<estimator name="LocalEnergy" hdf5="no"/>
...
<!-- Specify the correlated sampling options and define the cost function -->
<parameter name="minwalkers"> 0.3 </parameter>
<cost name="energy"> 0.95 </cost>
<cost name="unreweightedvariance"> 0.00 </cost>
<cost name="reweightedvariance"> 0.05 </cost>
...
<!-- Specify the optimizer options -->
<parameter name="MinMethod"> OneShiftOnly </parameter>
...
</qmc>
</loop>
\end{lstlisting}
\begin{itemize}
\item The \ixml{loop} element is useful for repeatedly executing identical optimization blocks.
\item The first part is nearly identical to a regular VMC block.
\item The second part specifies the correlated sampling options and defines the cost function.
\item The last part specifies the options of the chosen optimizer, which can differ substantially from one optimizer to another.
\end{itemize}
\subsection{VMC run for the optimization}
The VMC calculation for the wavefunction optimization has a strict requirement
that \ixml{samples} or \ixml{samplesperthread} must be specified because the optimizer needs the stored \ixml{samples}.
The input parameters of this part are identical to the VMC method.
Recommendations:
\begin{itemize}
\item Set up the included VMC calculation correctly and efficiently because it takes a significant amount of time during optimization.
For example, make sure the derived number of \ixml{steps} per block is 1 and use larger \ixml{substeps} to control the correlation between \ixml{samples} (a worked example follows this list).
\item A reasonable starting wavefunction is necessary. Many optimizations fail because of a bad starting wavefunction.
Signs of a bad initial wavefunction include, but are not limited to, a very long equilibration time, a low acceptance ratio, and a huge variance.
The first thing to do after a failed optimization is to check the information provided by the VMC calculation via the \texttt{*.scalar.dat} files.
\end{itemize}
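As a worked example of the first recommendation, consider the optimization block shown at the beginning of this section and assume a hypothetical run on 160 MPI tasks with 256 walkers per task (the exact bookkeeping depends on the driver and the parallel setup, but the idea is the same):
\begin{equation*}
\textrm{samples} \approx \textrm{tasks}\times\textrm{walkers}\times\textrm{blocks}\times\textrm{steps} = 160\times 256\times 70\times 1 = 2867200\:.
\end{equation*}
Requesting 2,867,200 samples with 70 blocks therefore yields a derived value of one \ixml{steps} per block, and the decorrelation of the stored samples is controlled entirely by \ixml{substeps}.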
\subsection{Correlated sampling and cost function}
After the samples are generated with VMC, the derivatives of the wavefunction with respect to the parameters are computed and the optimizer proposes a new set of parameters.
A correlated sampling calculation is then performed to quickly evaluate the cost function for both the old and the proposed parameter sets so that a decision can be made.
The input parameters are listed in the following table.
\begin{table}[h]
\begin{center}
\begin{tabularx}{\textwidth}{l l l l l X }
\hline
\multicolumn{6}{l}{\texttt{linear} method} \\
\hline
\multicolumn{2}{l}{parameters} & \multicolumn{4}{l}{}\\
& \bfseries name & \bfseries datatype & \bfseries values & \bfseries default & \bfseries description \\
& \texttt{nonlocalpp} & text & yes, no & no & Include nonlocal PP energy in the cost function\\
% & \texttt{GEVMethod} & text & mixed, H2 & mixed & methods of generalized eigenvalue problem\\
% & \texttt{beta} & real & any value & 0.0 & a parameter for GEVMethod\\
% & \texttt{use\_nonlocalpp\_deriv} & text & yes, no & no & include the derivatives of non-local PP\\
& \texttt{minwalkers} & real & 0--1 & 0.3 & Lower bound of the effective weight\\
& \texttt{maxWeight} & real & $>1$ & 1e6 & Maximum weight allowed in reweighting\\
\hline
\end{tabularx}
\end{center}
\end{table}
Additional information:
\begin{itemize}
\item \ixml{maxWeight}: The default value should be good.
\item \ixml{nonlocalpp}: The nonlocal PP contribution to the local energy depends on the wavefunction.
When a new set of parameters is proposed, this contribution needs to be updated if the cost function includes the local energy.
Fortunately, the nonlocal contribution is usually kept small when a pseudopotential is constructed to minimize the locality error,
so its change can be ignored to avoid the expensive computational cost.
An implementation issue with the GPU code is that a large amount of memory is consumed when this option is enabled.
\item \ixml{minwalkers}: This is a \textit{critical} parameter. When the ratio of effective samples to the actual number of samples in a reweighting step falls below \ixml{minwalkers},
the proposed set of parameters is invalid and the last acceptable set of parameters is kept (a common definition of this ratio is given after this list).
\end{itemize}
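For reference, a common definition of this effective-sample ratio (the expression used internally by QMCPACK may differ in detail) is
\begin{equation*}
\frac{N_{\textrm{eff}}}{N}=\frac{{\left(\sum_{i=1}^{N} w_i\right)}^2}{N\sum_{i=1}^{N} w_i^2}\:,
\end{equation*}
where $w_i$ is the reweighting factor of stored sample $i$ evaluated with the proposed parameters.
This ratio approaches 1 when the proposed parameters barely change the wavefunction and approaches $1/N$ when a single sample dominates the weights.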
The cost function consists of three components: energy, unreweighted variance, and reweighted variance.
\begin{lstlisting}[style=QMCPXML]
<cost name="energy"> 0.95 </cost>
<cost name="unreweightedvariance"> 0.00 </cost>
<cost name="reweightedvariance"> 0.05 </cost>
\end{lstlisting}
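Schematically, the optimizer minimizes a weighted sum of these components,
\begin{equation*}
C = w_{E}\left<E_L\right> + w_{uv}\,\sigma^2_{uv} + w_{rv}\,\sigma^2_{rv}\:,
\end{equation*}
where the weights $w$ are the values given in the \ixml{cost} entries. The choice above (0.95, 0.00, 0.05) optimizes mostly for the energy with a small admixture of reweighted variance.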
\subsection{Optimizers}
QMCPACK implements a few optimizers with different trade-offs and priorities.
The optimizer can be switched among ``OneShiftOnly'' (default), ``adaptive,'' and ``quartic'' (old) using the following line in the optimization block:
\begin{lstlisting}
<parameter name="MinMethod"> THE METHOD YOU LIKE </parameter>
\end{lstlisting}
\subsubsection{OneShiftOnly}
The OneShiftOnly optimizer targets a fast optimization by moving parameters more aggressively. It works with both the OpenMP and GPU implementations and can be considered for large systems.
This method relies on the effective weight of correlated sampling rather than the cost function value to accept or reject a new set of parameters.
If the effective weight is larger than \ixml{minwalkers}, the new set is taken whether or not the cost function value decreases.
If a proposed set is rejected, the standard output prints the measured ratio of effective samples to the total number of samples,
and \ixml{minwalkers} can be adjusted accordingly if needed.
\begin{table}[h]
\begin{center}
\begin{tabularx}{\textwidth}{l l l l l X }
\hline
\multicolumn{6}{l}{\texttt{linear} method} \\
\hline
\multicolumn{2}{l}{parameters} & \multicolumn{4}{l}{}\\
& \bfseries name & \bfseries datatype & \bfseries values & \bfseries default & \bfseries description \\
& \texttt{shift\_i} & real & $>0$ & 0.01 & Direct stabilizer added to the Hamiltonian matrix\\
& \texttt{shift\_s} & real & $>0$ & 1.00 & Initial stabilizer based on the overlap matrix\\
\hline
\end{tabularx}
\end{center}
\end{table}
Additional information:
\begin{itemize}
\item \ixml{shift_i}: This is the direct term added to the diagonal of the Hamiltonian matrix.
A larger value gives more stable but slower optimization.
\item \ixml{shift_s}: This is the initial value of the overlap-based stabilizer added to the Hamiltonian matrix.
A larger value gives more stable but slower optimization. The value actually used is automatically adjusted by the optimizer.
\end{itemize}
Recommendations:
\begin{itemize}
\item The default \ixml{shift_i} and \ixml{shift_s} should be fine.
\item For hard cases, increasing \ixml{shift_i} (by a factor of 5 or 10) can significantly stabilize the optimization by slowing the pace toward the optimal parameter set (an illustrative snippet follows this list).
\item If the VMC energy of the last optimization iterations grows significantly, increase \ixml{minwalkers} closer to 1 to make the optimization more stable.
\item If the first iterations of the optimization are rejected even with a reasonable initial wavefunction,
lower the \ixml{minwalkers} value based on the measured ratio printed in the standard output so that the move is accepted.
\end{itemize}
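As an illustration of these recommendations, a difficult case might use optimizer settings along the following lines (the values are illustrative, not prescriptive):
\begin{lstlisting}[style=QMCPXML]
<!-- Illustrative optimizer settings for a hard case -->
<parameter name="MinMethod"> OneShiftOnly </parameter>
<parameter name="shift_i"> 0.05 </parameter>   <!-- 5x the default of 0.01; slower but more stable updates -->
<parameter name="minwalkers"> 0.5 </parameter> <!-- tighter acceptance once the energy is close to converged -->
\end{lstlisting}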
We recommend running this optimizer in two stages with a very small \ixml{minwalkers} in the first and a large value in the second, as in the following example.
At the very beginning, the parameters are far from their optimal values and the optimizer proposes large changes.
A small \ixml{minwalkers} makes it much easier to accept these changes.
As the energy gradually converges, a larger \ixml{minwalkers} avoids risky parameter sets.
\begin{lstlisting}[style=QMCPXML]
<loop max="6">
<qmc method="linear" move="pbyp" gpu="yes">
<!-- Specify the VMC options -->
<parameter name="walkers"> 1 </parameter>
<parameter name="samples"> 10000 </parameter>
<parameter name="stepsbetweensamples"> 1 </parameter>
<parameter name="substeps"> 5 </parameter>
<parameter name="warmupSteps"> 5 </parameter>
<parameter name="blocks"> 25 </parameter>
<parameter name="timestep"> 1.0 </parameter>
<parameter name="usedrift"> no </parameter>
<estimator name="LocalEnergy" hdf5="no"/>
<!-- Specify the optimizer options -->
<parameter name="MinMethod"> OneShiftOnly </parameter>
<parameter name="minwalkers"> 1e-4 </parameter>
</qmc>
</loop>
<loop max="12">
<qmc method="linear" move="pbyp" gpu="yes">
<!-- Specify the VMC options -->
<parameter name="walkers"> 1 </parameter>
<parameter name="samples"> 20000 </parameter>
<parameter name="stepsbetweensamples"> 1 </parameter>
<parameter name="substeps"> 5 </parameter>
<parameter name="warmupSteps"> 2 </parameter>
<parameter name="blocks"> 50 </parameter>
<parameter name="timestep"> 1.0 </parameter>
<parameter name="usedrift"> no </parameter>
<estimator name="LocalEnergy" hdf5="no"/>
<!-- Specify the optimizer options -->
<parameter name="MinMethod"> OneShiftOnly </parameter>
<parameter name="minwalkers"> 0.5 </parameter>
</qmc>
</loop>
\end{lstlisting}
For each optimization step, you will see\\
\begin{lstlisting}
The new set of parameters is valid. Updating the trial wave function!
\end{lstlisting}\par
or\par
\begin{lstlisting}
The new set of parameters is not valid. Revert to the old set!
\end{lstlisting}
Occasional rejection is fine. Frequent rejection indicates potential problems, and users should inspect the VMC calculation or change the optimization strategy.
To track the progress of the optimization, use the command \texttt{qmca -q ev *.scalar.dat} to check the VMC energy and variance at each optimization step.
\subsubsection{adaptive}
The default setting of the adaptive optimizer is to construct the linear method Hamiltonian and overlap matrices explicitly and add different shifts to the Hamiltonian matrix
as ``stabilizers.''
The generalized eigenvalue problem is solved for each shift to obtain updates to the wavefunction parameters.
Then a correlated sampling is performed for each shift's updated wavefunction and the initial trial wavefunction
using the middle shift's updated wavefunction as the guiding function.
The cost function for these wavefunctions is compared, and the update corresponding to the best cost function is selected.
In the next iteration, the median magnitude of the stabilizers is set to the magnitude that generated the best update in the current iteration, thus adapting the magnitude of
the stabilizers automatically.
When the trial wavefunction contains more than 10,000 parameters, constructing and storing the linear method matrices could become a memory bottleneck.
To avoid explicit construction of these matrices, the adaptive optimizer implements the block linear method (BLM) approach \cite{Zhao:2017:blocked_lm}.
The BLM tries to find an approximate solution $\vec{c}_{opt}$ to the standard LM generalized eigenvalue problem by dividing the variable space into a number of blocks
and making intelligent estimates of which directions within those blocks will be most important for constructing $\vec{c}_{opt}$,
which is then obtained by solving a smaller, more memory-efficient
eigenproblem in the basis of these supposedly important blockwise directions.
\begin{table}[h]
\begin{center}
\begin{tabularx}{\textwidth}{l l l l l X }
\hline
\multicolumn{6}{l}{\texttt{linear} method} \\
\hline
\multicolumn{2}{l}{parameters} & \multicolumn{4}{l}{}\\
& \bfseries name & \bfseries datatype & \bfseries values & \bfseries default & \bfseries description \\
%& \texttt{stepsize} & real & 0--1 & 0.25 & Step size for moving parameters\\
& \texttt{max\_relative\_cost\_change} & real & $>0$ & 10.0 & Allowed relative change in the cost function\\
& \texttt{max\_param\_change} & real & $>0$ & 0.3 & Allowed change in wavefunction parameter\\
& \texttt{shift\_i} & real & $>0$ & 0.01 & Initial diagonal stabilizer added to the Hamiltonian matrix\\
& \texttt{shift\_s} & real & $>0$ & 1.00 & Initial overlap-based stabilizer added to the Hamiltonian matrix\\
& \texttt{target\_shift\_i} & real & any & -1.0 & Diagonal stabilizer value aimed for during adaptive method (disabled if $\leq$ 0)\\
& \texttt{cost\_increase\_tol} & real & $\geq 0$ & 0.0 & Tolerance for cost function increases\\
& \texttt{chase\_lowest} & text & yes, no & yes & Chase the lowest eigenvector in iterative solver\\
& \texttt{chase\_closest} & text & yes, no & no & Chase the eigenvector closest to initial guess\\
& \texttt{block\_lm} & text & yes, no & no & Use BLM\\
& \texttt{nblocks} & integer & $>0$ & & Number of blocks in BLM\\
& \texttt{nolds} & integer & $>0$ & & Number of old update vectors used in BLM\\
& \texttt{nkept} & integer & $>0$ & & Number of eigenvectors to keep per block in BLM\\
\hline
\end{tabularx}
\end{center}
\end{table}
Additional information:
\begin{itemize}
\item \ixml{shift_i}: This is the initial coefficient used to scale the diagonal stabilizer.
More stable but slower optimization is expected with a large value.
The adaptive method will automatically adjust this value after each linear method iteration.
\item \ixml{shift_s}: This is the initial coefficient used to scale the overlap-based stabilizer.
More stable but slower optimization is expected with a large value.
The adaptive method will automatically adjust this value after each linear method iteration.
\item \ixml{target_shift_i}: If set greater than zero, the adaptive method will choose the update whose shift\_i value is closest to
this target value so long as the associated cost is within cost\_increase\_tol of the lowest cost.
Disable this behavior by setting target\_shift\_i to a negative number.
\item \ixml{cost_increase_tol}: Tolerance for cost function increases when selecting the best shift.
\item \ixml{nblocks}: This is the number of blocks used in the BLM. The amount of memory required to store the LM matrices decreases
as the number of blocks increases, but the error introduced by the BLM increases as well.
\item \ixml{nolds}: In the BLM, the interblock correlation is accounted for by including a small number of wavefunction update vectors
from outside the block. A larger \ixml{nolds} includes more interblock correlation, giving more accurate results but
higher memory requirements.
\item \ixml{nkept}: This is the number of update directions retained from each block in the BLM. If all directions are retained in each block,
then the BLM becomes equivalent to the standard LM. Retaining five or fewer directions per block is often sufficient.
\end{itemize}
Recommendations:
\begin{itemize}
\item The default \ixml{shift_i} and \ixml{shift_s} should be fine.
\item When fewer than about 5,000 variables are being optimized, the traditional LM is preferred because it has a lower overhead than the BLM when the number of variables is small.
\item Initial experience with the BLM suggests that a few hundred blocks and a handful of \ixml{nolds} and \ixml{nkept} vectors
often provide a good balance between memory use and accuracy. In general, using fewer blocks is more accurate but requires more memory.
\end{itemize}
\begin{lstlisting}[style=QMCPXML]
<loop max="15">
<qmc method="linear" move="pbyp">
<!-- Specify the VMC options -->
<parameter name="walkers"> 1 </parameter>
<parameter name="samples"> 20000 </parameter>
<parameter name="stepsbetweensamples"> 1 </parameter>
<parameter name="substeps"> 5 </parameter>
<parameter name="warmupSteps"> 5 </parameter>
<parameter name="blocks"> 50 </parameter>
<parameter name="timestep"> 1.0 </parameter>
<parameter name="usedrift"> no </parameter>
<estimator name="LocalEnergy" hdf5="no"/>
<!-- Specify the correlated sampling options and define the cost function -->
<cost name="energy"> 1.00 </cost>
<cost name="unreweightedvariance"> 0.00 </cost>
<cost name="reweightedvariance"> 0.00 </cost>
<!-- Specify the optimizer options -->
<parameter name="MinMethod">adaptive</parameter>
<parameter name="max_relative_cost_change">10.0</parameter>
<parameter name="shift_i"> 1.00 </parameter>
<parameter name="shift_s"> 1.00 </parameter>
<parameter name="max_param_change"> 0.3 </parameter>
<parameter name="chase_lowest"> yes </parameter>
<parameter name="chase_closest"> yes </parameter>
<parameter name="block_lm"> no </parameter>
<!-- Specify the BLM specific options if needed
<parameter name="nblocks"> 100 </parameter>
<parameter name="nolds"> 5 </parameter>
<parameter name="nkept"> 3 </parameter>
-->
</qmc>
</loop>
\end{lstlisting}
%To activate this optimizer, add ``-D BUILD\_LMYENGINE\_INTERFACE=1'' in the CMake command line.
The adaptive optimizer is also able to optimize individual excited states directly. \cite{Zhao:2016:dir_tar}
In this case, it tries to minimize the following function:
\begin{equation*}
\Omega[\Psi]=\frac{\left<\Psi|\omega-H|\Psi\right>}{\left<\Psi|{\left(\omega-H\right)}^2|\Psi\right>}\:.
\end{equation*}
The global minimum of this function corresponds to the state whose energy lies immediately above the shift parameter $\omega$ in the energy spectrum.
For example, if $\omega$ were placed between the ground state energy and the first excited state energy and the wavefunction ansatz were capable of a good
description of the first excited state, then the wavefunction would be optimized for the first excited state.
Note that if the ansatz is not capable of a good description of the excited state in question, the optimization could converge to a different
state, as is known to occur in some circumstances for traditional ground state optimizations.
Note also that the ground state can be targeted by this method by choosing $\omega$ to be below the ground state energy, although we should stress that this
is not the same thing as a traditional ground state optimization and will in general give a slightly different wavefunction.
Excited state targeting requires two additional parameters, as shown in the following table.\\
\begin{table}[h]
\begin{center}
\begin{tabularx}{\textwidth}{l l l l l X }
\hline
\multicolumn{6}{l}{Excited state targeting} \\
\hline
\multicolumn{2}{l}{parameters} & \multicolumn{4}{l}{}\\
& \bfseries name & \bfseries datatype & \bfseries values & \bfseries default & \bfseries description \\
%& \texttt{stepsize} & real & 0--1 & 0.25 & Step size for moving parameters\\
& \texttt{targetExcited} & text & yes, no & no & Whether to use the excited state targeting optimization\\
& \texttt{omega} & real & real numbers & none & Energy shift used to target different excited states\\
\hline
\end{tabularx}
\end{center}
\end{table}
Excited state recommendations:
\begin{itemize}
\item Because of the finite variance in any approximate wavefunction, we recommend setting $\omega=\omega_0-\sigma$, where $\omega_0$ is placed just
below the energy of the targeted state and $\sigma^2$ is the energy variance (a minimal input sketch follows this list).
\item To obtain an unbiased excitation energy, the ground state should be optimized with the excited state variational principle as well by setting
\ixml{omega} below the ground state energy. Note that using the ground state variational principle for the ground state and the excited state variational
principle for the excited state creates a bias in favor of the ground state.
\end{itemize}
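A minimal sketch of the additional input, assuming the adaptive optimizer is already configured as shown earlier (the \ixml{omega} value below is purely illustrative; choose it for your system following the recommendations above):
\begin{lstlisting}[style=QMCPXML]
<!-- Excited-state targeting options; omega is system dependent -->
<parameter name="MinMethod"> adaptive </parameter>
<parameter name="targetExcited"> yes </parameter>
<parameter name="omega"> -11.5 </parameter>
\end{lstlisting}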
\subsubsection{quartic}
\textit{This is an older optimizer method retained for compatibility. We recommend starting with the newer OneShiftOnly or adaptive optimizers.}
The quartic optimizer fits a quartic polynomial to seven values of the cost function obtained using reweighting along the chosen direction and determines the optimal move.
This optimizer is very robust but is a bit conservative when accepting new steps, especially when large parameter changes are proposed.
\begin{table}[h]
\begin{center}
\begin{tabularx}{\textwidth}{l l l l l X }
\hline
\multicolumn{6}{l}{\texttt{linear} method} \\
\hline
\multicolumn{2}{l}{parameters} & \multicolumn{4}{l}{}\\
& \bfseries name & \bfseries datatype & \bfseries values & \bfseries default & \bfseries description \\
%& \texttt{stepsize} & real & 0--1 & 0.25 & Step size for moving parameters\\
& \texttt{bigchange} & real & $>0$ & 50.0 & Largest parameter change allowed\\
& \texttt{alloweddifference} & real & $>0$ & 1e-4 & Allowed increase in energy\\
& \texttt{exp0} & real & any value & -16.0 & Exponent of the initial stabilizer\\
& \texttt{stabilizerscale} & real & $>0$ & 2.0 & Increase in value of \texttt{exp0} between iterations\\
& \texttt{nstabilizers} & integer & $>0$ & 3 & Number of stabilizers to try\\
& \texttt{max\_its} & integer & $>0$ & 1 & Number of inner loops with same samples\\
\hline
\end{tabularx}
\end{center}
\end{table}
Additional information:
\begin{itemize}
\item \ixml{exp0}: This is the exponent of the initial stabilizer (the shift added to the diagonal of H); the actual value of the stabilizer is $10^{\textrm{exp0}}$. For example, \ixml{exp0} of $-6$ in the listing below corresponds to a stabilizer of $10^{-6}$.
\end{itemize}
Recommendations:
\begin{itemize}
\item{For hard cases (e.g., simultaneous optimization of a long MSD expansion and a three-body Jastrow), set \ixml{exp0}
to 0 and do a single inner iteration (\ixml{max_its} of 1) per sample of configurations.}
\end{itemize}
\begin{lstlisting}[style=QMCPXML]
<!-- Specify the optimizer options -->
<parameter name="MinMethod">quartic</parameter>
<parameter name="exp0">-6</parameter>
<parameter name="alloweddifference"> 1.0e-4 </parameter>
<parameter name="nstabilizers"> 1 </parameter>
<parameter name="bigchange">15.0</parameter>
\end{lstlisting}
\subsection{General recommendations}
Here are a few recommendations to make wavefunction optimization easier.
\begin{itemize}
\item All-electron wavefunctions are typically more difficult to optimize than pseudopotential wavefunctions because of the importance of the wavefunction near the nucleus.
\item The two-body Jastrow factor recovers the largest portion of the correlation energy missing from a bare Slater determinant. Consequently, the recommended order for optimizing wavefunction components is two-body, one-body, and three-body Jastrow factors, and then MSD coefficients.
\item For two-body spline Jastrows, always start from a reasonable initial guess. The lack of physically motivated constraints in the functional form at large distances can cause slow convergence when starting from zero.
\item A one-body spline Jastrow from an earlier calculation can be a good starting point.
\item A three-body polynomial Jastrow can start from zero. It is beneficial to first optimize the one-body and two-body Jastrow factors without the three-body terms and then add the three-body Jastrow and optimize all three components together.
\end{itemize}
\subsubsection{Optimization of CI coefficients}
When a CI wavefunction is stored in HDF5 format, the CI coefficients and the $\alpha$ and $\beta$ components of each determinant are not kept in the XML input file. When the CI coefficients are optimized, the updated coefficients are stored in HDF5 format as well.
The optimization header block must specify that the new CI coefficients are to be saved to HDF5; if the tag is not added, the coefficients will not be saved.
\begin{lstlisting}[style=QMCPXML]
<qmc method="linear" move="pbyp" gpu="no" hdf5="yes">
\end{lstlisting}
The rest of the optimization block remains the same.
When running the optimization, the new coefficients will be stored in a \texttt{*.sXXX.opt.h5} file, where XXX corresponds to the series number. The H5 file contains only the optimized coefficients. The corresponding \texttt{*.sXXX.opt.xml} file will be updated for each optimization block as follows:
\begin{lstlisting}[style=QMCPXML]
<detlist size="1487" type="DETS" nca="0" ncb="0" nea="2" neb="2" nstates="85" cutoff="1e-2" href="../LiH.orbs.h5" opt_coeffs="LiH.s001.opt.h5"/>
\end{lstlisting}
The \texttt{opt\_coeffs} tag will then reference where the new CI coefficients are stored.\\
When restarting the run with the newly optimized coefficients, you need to specify the previous HDF5 file containing the basis set, orbitals, and MSD, as well as the file with the new optimized coefficients. The code will read the previous data but will overwrite the optimized coefficients with the values found in the \texttt{*.sXXX.opt.h5} file.
\subsection{Avoiding bad calculations}
Here are a few recommendations to help avoid bad calculations:
\begin{itemize}
\item CI coefficients are optimized together with a given set of Jastrow factors. Make sure you keep the pair (Jastrow factors, CI coefficients) together as computed.
\end{itemize}