qmcpack/manual/spo_spline.tex

132 lines
8.4 KiB
TeX

\subsection{Spline basis sets}
\label{sec:spo_spline}
In this section we describe the use of spline basis sets to expand the \texttt{sposet}.
Spline basis sets are designed to work seamlessly with plane wave DFT code (e.g.,\ Quantum ESPRESSO as a trial wavefunction generator).
In QMC algorithms, all the SPOs $\{\phi(\vec{r})\}$ need to be updated every time a single electron moves.
Evaluating SPOs takes a very large portion of computation time.
In principle, PW basis set can be used to express SPOs directly in QMC, as in DFT.
But it introduces an unfavorable scaling because the basis set size increases linearly as the system size.
For this reason, it is efficient to use a localized basis with compact
support and a good transferability from the plane wave basis.
In particular, 3D tricubic B-splines provide a basis in which only
64 elements are nonzero at any given point in space~\cite{blips4QMC}.
The 1D cubic B-spline is given by
\begin{equation}
f(x) = \sum_{i'=i-1}^{i+2} b^{i'\!,3}(x)\,\, p_{i'},
\label{eq:SplineFunc}
\end{equation}
where $b^{i}(x)$ is the piecewise cubic polynomial basis functions
and $i = \text{floor}(\Delta^{-1} x)$ is the index of
the first grid point $\le x$. Constructing a tensor product in each Cartesian
direction, we can represent a 3D orbital as
\begin{equation}
\phi_n(x,y,z) =
\!\!\!\!\sum_{i'=i-1}^{i+2} \!\! b_x^{i'\!,3}(x)
\!\!\!\!\sum_{j'=j-1}^{j+2} \!\! b_y^{j'\!,3}(y)
\!\!\!\!\sum_{k'=k-1}^{k+2} \!\! b_z^{k'\!,3}(z) \,\, p_{i', j', k',n}.
\label{eq:TricubicValue}
\end{equation}
This allows the rapid evaluation of each orbital in constant time.
Furthermore, this basis is systematically improvable with a single spacing
parameter so that accuracy is not compromised compared with the plane wave basis.
The use of 3D tricubic B-splines greatly improves computational efficiency.
The gain in computation time from a plane wave basis set to an equivalent B-spline basis set
becomes increasingly large as the system size grows.
On the downside, this computational efficiency comes at
the expense of increased memory use, which is easily overcome, however, by the large
aggregate memory available per node through OpenMP/MPI hybrid QMC.
The input xml block for the spline SPOs is given in Listing~\ref{listing:splineSPOs}. A list of options is given in
Table~\ref{table:splineSPOs}.
\begin{table}[h]
\begin{center}
\begin{tabularx}{\textwidth}{l l l l l X }
\hline
\multicolumn{6}{l}{\texttt{determinantset} element} \\
\hline
\multicolumn{2}{l}{Parent elements:} & \multicolumn{4}{l}{\texttt{wavefunction}}\\
\multicolumn{2}{l}{Child elements:} & \multicolumn{4}{l}{\texttt{slaterdeterminant}}\\
\multicolumn{2}{l}{Attribute:} & \multicolumn{4}{l}{}\\
& \bfseries Name & \bfseries Datatype & \bfseries Values & \bfseries Default & \bfseries Description \\
& \texttt{type} & Text & Bspline & & Type of \texttt{sposet}. \\
& \texttt{href} & Text & & & Path to hdf5 file from pw2qmcpack.x. \\
& \texttt{tilematrix} & 9 integers & & & Tiling matrix used to expand supercell. \\
& \texttt{twistnum} & Integer & & & Index of the super twist. \\
& \texttt{twist} & 3 floats & & & Super twist. \\
& \texttt{meshfactor} & Float & $\le 1.0$ & & Grid spacing ratio. \\
& \texttt{precision} & Text & Single/double & & Precision of spline coefficients. \\
& \texttt{gpu} & Text & Yes/no & & GPU switch. \\
& \texttt{gpusharing} & Text & Yes/no & No & Share B-spline table across GPUs. \\
& \texttt{Spline\_Size\_Limit\_MB} & Integer & & & Limit B-spline table size on GPU. \\
& \texttt{check\_orb\_norm} & Text & Yes/no & Yes & Check norms of orbitals from h5 file. \\
& \texttt{save\_coefs} & Text & Yes/no & No & Save the spline coefficients to h5 file. \\
& \texttt{source} & Text & \textit{Any} & Ion0 & Particle set with atomic positions. \\
\hline
\end{tabularx}
\end{center}
\caption{Options for the \texttt{determinantset} xml-block associated with B-spline single particle orbital sets.}
\label{table:splineSPOs}
\end{table}
%%\begin{lstlisting}[caption=.]
%%<sposet_builder type="bspline" href="pwscf.h5" tilematrix="2 0 0 0 2 0 0 0 2" twistnum="0"
%% source="i" meshfactor="1.0" precision="float" truncate="no">
%% <sposet type="bspline" name="spo_ud" size="208" spindataset="0"/>
%%</sposet_builder>
%%<determinantset>
%% <slaterdeterminant>
%% <determinant id="updet" group="u" sposet="spo_ud" size="208"/>
%% <determinant id="downdet" group="d" sposet="spo_ud" size="208"/>
%% </slaterdeterminant>
%%</determinantset>
%%\end{lstlisting}
\begin{lstlisting}[style=QMCPXML,caption=Determinant set XML element.\label{listing:splineSPOs}]
<determinantset type="bspline" source="i" href="pwscf.h5"
tilematrix="1 1 3 1 2 -1 -2 1 0" twistnum="-1" gpu="yes" meshfactor="0.8"
twist="0 0 0" precision="double">
<slaterdeterminant>
<determinant id="updet" size="208">
<occupation mode="ground" spindataset="0">
</occupation>
</determinant>
<determinant id="downdet" size="208">
<occupation mode="ground" spindataset="0">
</occupation>
</determinant>
</slaterdeterminant>
</determinantset>
\end{lstlisting}
Additional information:
\begin{itemize}
\item \texttt{precision}. Only effective on CPU versions without mixed precision, ``single" is always imposed with mixed precision. Using single precision not only saves memory use but also speeds up the B-spline evaluation. We recommend using single precision since we saw little chance of really compromising the accuracy of calculation.
\item \texttt{meshfactor}. The ratio of actual grid spacing of B-splines used in QMC calculation with respect to the original one calculated from h5. A smaller meshfactor saves memory use but reduces accuracy. The effects are similar to reducing plane wave cutoff in DFT calculations. Use with caution!
\item \texttt{twistnum}. If positive, it is the index. We recommend not taking this way since the indexing might show some uncertainty. If negative, the super twist is referred by \texttt{twist}.
\item \texttt{save\_coefs}. If yes, dump the real-space B-spline coefficient table into an h5 file on the disk.
When the orbital transformation from k space to B-spline requires more than the available amount of scratch memory on the compute nodes,
users can perform this step on fat nodes and transfer back the h5 file for QMC calculations.
\item \texttt{gpusharing}. If enabled, spline data is shared across multiple GPUs on a given computational node. For example, on a
two-GPU-per-node system, each GPU would have half of the
orbitals. This enables larger overall spline tables than would normally fit in
the memory of individual GPUs to be used, potentially up to
the total GPU memory on a node. To obtain high performance, large
electron counts or a high-performing CPU-GPU interconnect is required.
To use this feature, the following needs to be done:
\begin{itemize}
\item The CUDA Multi-Process Service (MPS) needs to be used
(e.g., on Summit/SummitDev use ``-alloc\_flags gpumps" for
bsub). If MPS is not detected, sharing will be disabled.
\item CUDA\_VISIBLE\_DEVICES needs to be properly set to control each
rank's visible CUDA devices (e.g., on OLCF Summit/SummitDev one
needs to create a resource set containing all GPUs with the
respective number of ranks with ``jsrun --task-per-rs Ngpus -g
Ngpus").
\end{itemize}
\item \texttt{Spline\_Size\_Limit\_MB}. Allows distribution of the B-spline coefficient table between the host and GPU memory. The compute kernels access host memory via zero-copy. Although the performance penalty introduced by it is significant, it allows large calculations to go through.
\end{itemize}