\subsection{Spline basis sets}
\label{sec:spo_spline}
In this section we describe the use of spline basis sets to expand the \texttt{sposet}.
Spline basis sets are designed to work seamlessly with plane wave DFT codes (e.g.,\ Quantum ESPRESSO) used as trial wavefunction generators.

In QMC algorithms, all the SPOs $\{\phi(\vec{r})\}$ need to be updated every time a single electron moves.
Evaluating SPOs takes a very large portion of the computation time.
In principle, a plane wave (PW) basis set can be used to express SPOs directly in QMC, as in DFT.
However, this introduces an unfavorable scaling because the basis set size increases linearly with the system size.
For this reason, it is more efficient to use a localized basis with compact
support and good transferability from the plane wave basis.

In particular, 3D tricubic B-splines provide a basis in which only
64 elements are nonzero at any given point in space~\cite{blips4QMC}.
The 1D cubic B-spline is given by
\begin{equation}
f(x) = \sum_{i'=i-1}^{i+2} b^{i'\!,3}(x)\,\, p_{i'},
\label{eq:SplineFunc}
\end{equation}
where $b^{i'\!,3}(x)$ are the piecewise cubic polynomial basis functions,
$p_{i'}$ are the spline coefficients, $\Delta$ is the grid spacing,
and $i = \text{floor}(\Delta^{-1} x)$ is the index of
the nearest grid point $\le x$. Constructing a tensor product in each Cartesian
direction, we can represent a 3D orbital as
\begin{equation}
\phi_n(x,y,z) =
\!\!\!\!\sum_{i'=i-1}^{i+2} \!\! b_x^{i'\!,3}(x)
\!\!\!\!\sum_{j'=j-1}^{j+2} \!\! b_y^{j'\!,3}(y)
\!\!\!\!\sum_{k'=k-1}^{k+2} \!\! b_z^{k'\!,3}(z) \,\, p_{i', j', k',n}.
\label{eq:TricubicValue}
\end{equation}
This allows the rapid evaluation of each orbital in constant time, independent of the system size.
Furthermore, this basis is systematically improvable with a single spacing
parameter so that accuracy is not compromised compared with the plane wave basis.

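For concreteness, the four nonzero basis functions in Eq.~\ref{eq:SplineFunc} can be written out explicitly on a uniform grid.
The expressions below are the standard uniform cubic B-spline weights, given only as an illustrative sketch: the fractional coordinate $t = \Delta^{-1} x - i \in [0,1)$ is a symbol introduced here, and the index convention is assumed to follow Eq.~\ref{eq:SplineFunc} rather than any particular implementation.
\begin{equation}
\begin{aligned}
b^{i-1,3}(x) &= \tfrac{1}{6}\,(1-t)^3, \\
b^{i,3}(x)   &= \tfrac{1}{6}\,(3t^3 - 6t^2 + 4), \\
b^{i+1,3}(x) &= \tfrac{1}{6}\,(-3t^3 + 3t^2 + 3t + 1), \\
b^{i+2,3}(x) &= \tfrac{1}{6}\,t^3 .
\end{aligned}
\end{equation}
These four weights sum to one for any $t$, and the tensor product in Eq.~\ref{eq:TricubicValue} therefore involves $4^3 = 64$ coefficients per orbital at each evaluation point.
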
The use of 3D tricubic B-splines greatly improves computational efficiency.
The gain in computation time from a plane wave basis set to an equivalent B-spline basis set
becomes increasingly large as the system size grows.
On the downside, this computational efficiency comes at
the expense of increased memory use. This can, however, be readily overcome by the large
aggregate memory available per node through OpenMP/MPI hybrid QMC.

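A rough estimate of the memory cost can be obtained by counting coefficients.
The following is a back-of-the-envelope sketch, not a statement about the actual storage layout: the symbols $N_x, N_y, N_z$ (grid points per direction), $N_{\rm orb}$ (number of orbitals), and $s$ (bytes per coefficient) are introduced here for illustration, and padding and boundary points are ignored.
\begin{equation}
M \approx N_x N_y N_z \, N_{\rm orb} \, s .
\end{equation}
For a hypothetical $100 \times 100 \times 100$ grid with 208 real-valued orbitals in single precision ($s = 4$ bytes), this gives $M \approx 10^6 \times 208 \times 4~\text{bytes} \approx 0.83$~GB.
Double precision or complex-valued coefficients would each double this estimate.
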
The input XML block for the spline SPOs is given in Listing~\ref{listing:splineSPOs}. A list of options is given in
Table~\ref{table:splineSPOs}.

\begin{table}[h]
\begin{center}
\begin{tabularx}{\textwidth}{l l l l l X }
\hline
\multicolumn{6}{l}{\texttt{determinantset} element} \\
\hline
\multicolumn{2}{l}{Parent elements:} & \multicolumn{4}{l}{\texttt{wavefunction}}\\
\multicolumn{2}{l}{Child elements:} & \multicolumn{4}{l}{\texttt{slaterdeterminant}}\\
\multicolumn{2}{l}{Attributes:} & \multicolumn{4}{l}{}\\
& \bfseries Name & \bfseries Datatype & \bfseries Values & \bfseries Default & \bfseries Description \\
& \texttt{type} & Text & Bspline & & Type of \texttt{sposet}. \\
& \texttt{href} & Text & & & Path to HDF5 file from pw2qmcpack.x. \\
& \texttt{tilematrix} & 9 integers & & & Tiling matrix used to expand supercell. \\
& \texttt{twistnum} & Integer & & & Index of the super twist. \\
& \texttt{twist} & 3 floats & & & Super twist. \\
& \texttt{meshfactor} & Float & $\le 1.0$ & & Grid spacing ratio. \\
& \texttt{precision} & Text & Single/double & & Precision of spline coefficients. \\
& \texttt{gpu} & Text & Yes/no & & GPU switch. \\
& \texttt{gpusharing} & Text & Yes/no & No & Share B-spline table across GPUs. \\
& \texttt{Spline\_Size\_Limit\_MB} & Integer & & & Limit B-spline table size on GPU. \\
& \texttt{check\_orb\_norm} & Text & Yes/no & Yes & Check norms of orbitals from h5 file. \\
& \texttt{save\_coefs} & Text & Yes/no & No & Save the spline coefficients to h5 file. \\
& \texttt{source} & Text & \textit{Any} & Ion0 & Particle set with atomic positions. \\
\hline
\end{tabularx}
\end{center}
\caption{Options for the \texttt{determinantset} XML block associated with B-spline single particle orbital sets.}
\label{table:splineSPOs}
\end{table}

%%\begin{lstlisting}[caption=.]
%%<sposet_builder type="bspline" href="pwscf.h5" tilematrix="2 0 0 0 2 0 0 0 2" twistnum="0"
%% source="i" meshfactor="1.0" precision="float" truncate="no">
%% <sposet type="bspline" name="spo_ud" size="208" spindataset="0"/>
%%</sposet_builder>
%%<determinantset>
%% <slaterdeterminant>
%% <determinant id="updet" group="u" sposet="spo_ud" size="208"/>
%% <determinant id="downdet" group="d" sposet="spo_ud" size="208"/>
%% </slaterdeterminant>
%%</determinantset>
%%\end{lstlisting}

\begin{lstlisting}[style=QMCPXML,caption=Determinant set XML element.\label{listing:splineSPOs}]
<determinantset type="bspline" source="i" href="pwscf.h5"
                tilematrix="1 1 3 1 2 -1 -2 1 0" twistnum="-1" gpu="yes" meshfactor="0.8"
                twist="0 0 0" precision="double">
  <slaterdeterminant>
    <determinant id="updet" size="208">
      <occupation mode="ground" spindataset="0">
      </occupation>
    </determinant>
    <determinant id="downdet" size="208">
      <occupation mode="ground" spindataset="0">
      </occupation>
    </determinant>
  </slaterdeterminant>
</determinantset>
\end{lstlisting}

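The \texttt{tilematrix} attribute in Listing~\ref{listing:splineSPOs} can be read as a $3\times 3$ integer matrix whose determinant, in absolute value, gives the number of primitive cells contained in the supercell.
As a worked check, assuming the nine integers are listed row by row (the determinant is the same for the transposed ordering) and using the symbol $T$ only for this illustration:
\begin{equation}
T = \begin{pmatrix} 1 & 1 & 3 \\ 1 & 2 & -1 \\ -2 & 1 & 0 \end{pmatrix},
\qquad
\det T = 1\,(0 + 1) - 1\,(0 - 2) + 3\,(1 + 4) = 18 ,
\end{equation}
so the supercell in this example contains 18 primitive cells.
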
Additional information:
\begin{itemize}
\item \texttt{precision}. Only effective on CPU versions without mixed precision; ``single'' is always imposed with mixed precision. Using single precision not only reduces memory use but also speeds up the B-spline evaluation. We recommend using single precision because it is very unlikely to compromise the accuracy of the calculation.
\item \texttt{meshfactor}. The ratio of the actual grid spacing of the B-splines used in the QMC calculation with respect to the original one calculated from h5. A smaller meshfactor reduces memory use but also reduces accuracy. The effects are similar to reducing the plane wave cutoff in DFT calculations. Use with caution!
\item \texttt{twistnum}. If non-negative, it is the index of the super twist. We do not recommend this approach because the indexing can be ambiguous. If negative, the super twist is specified by \texttt{twist}.
\item \texttt{save\_coefs}. If yes, dump the real-space B-spline coefficient table into an h5 file on disk.
When the orbital transformation from k space to B-spline requires more than the available amount of scratch memory on the compute nodes,
users can perform this step on large-memory (fat) nodes and transfer the h5 file back for QMC calculations.
\item \texttt{gpusharing}. If enabled, spline data is shared across multiple GPUs on a given computational node. For example, on a
two-GPU-per-node system, each GPU would have half of the
orbitals. This enables larger overall spline tables than would normally fit in
the memory of individual GPUs to be used, potentially up to
the total GPU memory on a node. To obtain high performance, large
electron counts or a high-performing CPU-GPU interconnect is required.
To use this feature, the following are required:
\begin{itemize}
\item The CUDA Multi-Process Service (MPS) needs to be used
(e.g., on Summit/SummitDev use ``\texttt{-alloc\_flags gpumps}'' for
\texttt{bsub}). If MPS is not detected, sharing will be disabled.
\item \texttt{CUDA\_VISIBLE\_DEVICES} needs to be properly set to control each
rank's visible CUDA devices (e.g., on OLCF Summit/SummitDev one
needs to create a resource set containing all GPUs with the
respective number of ranks with ``\texttt{jsrun -{}-task-per-rs Ngpus -g
Ngpus}'').
\end{itemize}
\item \texttt{Spline\_Size\_Limit\_MB}. Allows distribution of the B-spline coefficient table between the host and GPU memory. The compute kernels access host memory via zero-copy. Although this introduces a significant performance penalty, it allows large calculations to run.
\end{itemize}
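
As an illustration of the recommendations above, the following sketch requests single precision, selects the super twist explicitly through \texttt{twist} with a negative \texttt{twistnum}, and saves the spline coefficients to an h5 file.
The attribute names are those listed in Table~\ref{table:splineSPOs}; the file name, tiling matrix, and determinant sizes are placeholders reused from Listing~\ref{listing:splineSPOs} and should be replaced with values appropriate to the actual calculation.
\begin{lstlisting}[style=QMCPXML,caption={Illustrative determinant set with single precision and an explicit twist.}]
<determinantset type="bspline" source="i" href="pwscf.h5"
                tilematrix="1 1 3 1 2 -1 -2 1 0" twistnum="-1" twist="0 0 0"
                meshfactor="1.0" precision="single" save_coefs="yes">
  <slaterdeterminant>
    <determinant id="updet" size="208">
      <occupation mode="ground" spindataset="0">
      </occupation>
    </determinant>
    <determinant id="downdet" size="208">
      <occupation mode="ground" spindataset="0">
      </occupation>
    </determinant>
  </slaterdeterminant>
</determinantset>
\end{lstlisting}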