mirror of https://github.com/abinit/abinit.git
369 lines
18 KiB
Plaintext
369 lines
18 KiB
Plaintext
This directory contains documentation of the development of the
|
|
routines which implement the nonlocal pseudopotential operations in
|
|
response-function (RF) strain calculations. These were written by D.
|
|
R. Hamann in 2002-2003, and the were first included in the 4.2 release.
|
|
While conceptually straightforward, the expressions to be evaluated are
|
|
highly complex, and the only practical way to develop the code was to
|
|
use a symbolic-manipulation program and automated post-processing.
|
|
These tools are collected here, and their structure and usage are
|
|
explained as an aid to future developers who embark on related
|
|
projects.
|
|
|
|
|
|
*********************** INTRODUCTION *********************************
|
|
|
|
The underlying theoretical approach to treating strain within
|
|
density-functional perturbation theory is presented in strainpert.pdf,
|
|
which is a preprint that has been submitted to Phys. Rev. B. It is
|
|
also posted on arXiv.org as cond-mat/0409269. Before proceeding
|
|
further, it would be well to read this at least through Sec. III-E.
|
|
The equation numbers below refer to this preprint.
|
|
|
|
The application of the nonlocal psp's in ground-state (GS)
|
|
calculations to find wavefunctions, energies, forces, and stresses was
|
|
developed by D. C. Allan. He introduced the tensor formulation, Eq.
|
|
(51), and routines implementing Eq. (52) for nonlocal wave-function
|
|
operations, and the ground-state expectation values of Eq. (54) for
|
|
stresses. His approach was extended to RF atomic-displacement
|
|
calculations by Allan and X. Gonze. The necessary expressions were
|
|
derived and coded by hand in routines metcon, cont13, cont22, cont24,
|
|
etc. all in directory src/03nonlocal. These are all called by nonlop,
|
|
which is used in both GS and RF calculations. The new RF strain
|
|
routines were designed to parallel these existing routines as closely
|
|
as possible in their operations, arguments, and structures within which
|
|
they are called. The new "machine-coded" routines are the following:
|
|
|
|
To evaluate the action of the first strain derivatives of the psp's on
|
|
wave functions, Eq. (54):
|
|
|
|
metstr.F90
|
|
|
|
To evaluate the zero-order wave function expectation values of strain-
|
|
strain second derivatives, Eq. (55):
|
|
|
|
contstr21.F90
|
|
contstr22.F90
|
|
contstr23.F90
|
|
contstr24.F90
|
|
contstr25a.F90
|
|
contstr25.F90
|
|
contstr26.F90
|
|
|
|
To evaluate the zero-order wave function expectation values of mixed
|
|
strain -- atomic-displacement second derivatives ("internal strain"),
|
|
Eq. (56):
|
|
|
|
contistr01.F90
|
|
contistr03.F90
|
|
contistr10.F90
|
|
contistr12.F90
|
|
contistr21.F90
|
|
contistr30.F90
|
|
|
|
As discussed in Sec. III-E, the derivative nonlocal operators
|
|
couple "right (|K>)" and "left (<K'|)" tensors of different ranks. The
|
|
"active" arguments aa and bb of all these routines are complex one-
|
|
dimensional arrays indexed by a dictionary order of the Eq. (51)
|
|
tensors, which have been summed over the wave-function c coefficients
|
|
in Eq. (33) with appropriate f form factors and their derivatives as in
|
|
Eqs. (48), (54-56). In metstr, the bb argument is an output array of
|
|
the same structure to be used in forming the argument "vectout" of
|
|
nonlop. Retaining the established calling structure, separate calling
|
|
statements are invoked for each particular right-to-left change in
|
|
tensor rank. (These are noted in the comments in nonlop.) For metstr,
|
|
the argument "iterm" determines the rank coupling. However, the
|
|
second-derivative expressions are so complicated that each different
|
|
rank coupling has been coded in a separate routine in the contstr*,
|
|
contistr* series.
|
|
|
|
|
|
********************** MATHEMATICA PROGRAMS **************************
|
|
|
|
The challenging task in creating these routines was to find the C
|
|
coefficients in Eqs. (54-56). For each set of subscripts indicating a
|
|
particular pair of left and right tensor components, C is a different
|
|
polynomial in certain subsets of the 3X3 metric-tensor and metric-
|
|
tensor-derivative elements. While hand computation of the many
|
|
thousands of required polynomials is essentially impossible, the
|
|
symbolic manipulation capabilities of Mathematica (henceforth "M") in
|
|
fact enable this to be carried out in just a few lines of "core" M
|
|
code. The following four M programs are included in this directory:
|
|
|
|
make_metstr3.m
|
|
make_met2str.m
|
|
make_metintstr.m
|
|
make_metintstrv.m
|
|
|
|
These routines are all very similar. I will discuss make_met2str.m in
|
|
detail, and note the differences for the others. Some experience with
|
|
M must be assumed at this point, and it would probably be well to have
|
|
a reference at hand (eg. S. Wolfram, Mathematica, 3rd ed.,
|
|
(Wolfram Media, Champaign, IL, 1996). A later edition should do. One
|
|
need not necessarily know how to write programs in M to follow the
|
|
subsequent discussion -- experience using it interactively should
|
|
suffice.
|
|
|
|
Make_met2str.m evaluates the C coefficients in Eq. (55), the
|
|
strain-strain second derivative term. First, let's discuss the use of
|
|
this program. From an interactive M window, one first loads the file,
|
|
executing the command
|
|
|
|
<<make_met2str.m
|
|
|
|
This defines a function MakeMet2str[iterm,outfile], and one then
|
|
executes the following series of commands,
|
|
|
|
MakeMet2str[1,"d2term1.out"]
|
|
MakeMet2str[2,"d2term2.out"]
|
|
...
|
|
MakeMet2str[6,"d2term6.out"]
|
|
|
|
(where the outfile names are obviously arbitrary). A few hundred lines
|
|
of d2term1.out are included in this directory to give an idea of what
|
|
the output looks like (there are 13,063 lines total!).
|
|
|
|
Turning to the program itself, I will refer to line numbers:
|
|
|
|
Lines 1-23
|
|
These are a comment which was carried over from the first routine
|
|
of the series, metstr3.m, and does not accurately reflect what the
|
|
current routine does. However, it gives the general idea.
|
|
|
|
Lines 25-28
|
|
The function name, some M program conventions, and a list of
|
|
{private variables}.
|
|
|
|
Lines 35-165
|
|
A dictionary of tensor products of k components, extending
|
|
Allan's scheme to (agh!) 7th rank. This is a two-index M List. The
|
|
indices following each line are comments for the programmer's benefit.
|
|
|
|
Lines 168-298
|
|
The same thing for kp (<K'| in Eq. (55)).
|
|
|
|
Lines 306-308
|
|
Definitions of the vectors k and kp as Lists of 3 components.
|
|
|
|
Lines 314-315
|
|
The metric tensor m is defined as a 3X3 List of components, each
|
|
of which is a function of two strain variables s1 and s2. There are
|
|
only 6 functions, not 9, building in the required symmetry of the
|
|
metric tensor. The functions m11[s1,s2], etc. have not been defined.
|
|
Therefore, M will represent their derivatives symbolically, which is
|
|
what we want.
|
|
|
|
Lines 317,319
|
|
These define dot products and squared magnitudes of our vectors
|
|
following Eq. (23). Recall that k, kp are reduced wave vectors, which
|
|
is why the m comes in.
|
|
|
|
Line 376
|
|
This List defines the modified Legendre polynomials in Eq. (49).
|
|
|
|
Line 326
|
|
The start of a do loop over rank (the loop control is at its end
|
|
in M, line 369). rank is the equivalent of the angular momentum l.
|
|
|
|
Lines 328-337
|
|
This is a List of 6 possible polynomials "poly," each formed from
|
|
derivatives of the squared magnitudes of k and/or kp and/or of the
|
|
Legendre's, grouped so that each of the 6 represents a particular
|
|
increment between input and output tensor rank. Each invocation of the
|
|
routine only uses one of these based on the "iterm" input variable for
|
|
reasons of code structure discussed in the introduction. This is where
|
|
the formal derivatives of the undefined metric tensor component
|
|
functions are introduced.
|
|
|
|
Lines 339,340
|
|
These 6-term lists specify the input-output rank couplings, once
|
|
again selected by the input iterm.
|
|
|
|
Lines 345-346
|
|
This outputs some text and indices for the Fortran code.
|
|
|
|
Lines 348-349
|
|
The number of elements of the input and output tensors for the
|
|
current rank and iterm.
|
|
|
|
Lines 354-357
|
|
The start of a double do loop on input and output tensor indices,
|
|
and THE ONLY SIGNIFICANT COMPUTATION IN THE ENTIRE PROGRAM! This is
|
|
where M does in a few lines (and a few to a few ten's of computer
|
|
minutes on a fairly old PIII machine) the work of a hundred graduate
|
|
students for a hundred years. From the horrendously complex poly, the
|
|
"Coefficient" function extracts the coefficient of the product of the
|
|
specified pair of k, kp tensor elements. It is smart enough to ignore
|
|
the order of the tensor components in poly, and probably does a pretty
|
|
good job of collecting terms and identifying common subexpressions.
|
|
Feeding the result to the "Simplify" function further cleans up and
|
|
shortens the expression, probably better than could be done by hand.
|
|
|
|
Lines 358-365
|
|
The results are written to the output file. Note that cm is the
|
|
C in Eq. (55), with the m, m' subscript indices. The l subscript and
|
|
nu, nu' pair are respectively rank and iterm. The superscript indices
|
|
belong to the metric-tensor derivative components of C, which will be
|
|
evaluated when the Fortran routines are executed. The # symbol here and
|
|
elsewhere in the output is a flag to be used in post-processing.
|
|
|
|
Lines 367-369
|
|
These are the controls at the ends of the triply-nested do loops.
|
|
|
|
Differences in the other M routines are as follows.
|
|
make_metintstr.m evaluates C's in Eq. (56) for mixed strain -- atomic-
|
|
displacement derivatives. This is similar to make_met2str.m in almost
|
|
all respects. The difference is that the List poly is defined differently.
|
|
Here, we have only one strain derivative. Since the atomic-displacement
|
|
derivative here is being taken with respect to a reduced atomic
|
|
coordinate, this derivative simply brings down a reduced k or kp
|
|
component (from the phase or "structure factor" exponential) without
|
|
any metric tensor components. This and the rank-coupling Lists above
|
|
have 5 members in this case. The C coefficients are output as members
|
|
of a 3-dimensional cm array here, their first index being the atomic-
|
|
displacement direction and the second and third m, m' as above.
|
|
|
|
make_metintstrv.m is the routine that was actually used for the
|
|
internal strain routines. The difference is explained in the comment
|
|
lines 31-36.
|
|
|
|
make_metstr3.m is historically the first of this set. It
|
|
evaluates C's in Eq. (54), the first strain derivative expression. The
|
|
structure here is slightly different in that "rank" is an input
|
|
variable and "iterm" is an (unrolled) internal loop. The 3 poly's are
|
|
defined on lines 333, 364, and 395. Since there were some differences
|
|
in the text to be output for the Fortran code, it was easier to write
|
|
the "iterm" loop as unrolled. The C coefficient cm is written as a 4-
|
|
index array here, since it will be saved in the Fortran routine metstr
|
|
and used for many executions. After loading the file, MakeMetstr
|
|
should be executed with the input variable rank set to 1, 2, and 3 (p,
|
|
d, and f) and the resulting output files concatenated befor post-
|
|
processing. It appears from the comments to have been necessary to do
|
|
rank = 0 by hand (something probably blew up). This was a finite task.
|
|
|
|
************************* POST PROCESSING ****************************
|
|
|
|
In case you haven't looked at the sample of d2term1.out, do so
|
|
now. The M output for the formal derivatives of the undefined metric
|
|
tensor component functions, "Derivative(1,0)(m11)(s1,s1)," etc. is not
|
|
Fortran. The shell scripts fort_fix3.scr, fort_2fix.scr, and
|
|
fort_fixis.scr post-process the raw M output for the C's in Eqs. (54),
|
|
(55), and (56) respectively. There are basically two sections to each
|
|
script. First, sed is invoked (with an in-line editing script) to
|
|
substitute Fortran-friendly (and much shorter) notations for the metric
|
|
tensor and metric-tensor derivative elements. Each of these scripts
|
|
replaces the file on which it operates (which is specified as an inline
|
|
argument, eg. "./fort_2fix.scr d2term1.out"), so it is a good idea to
|
|
make a copy first.
|
|
|
|
sed continues with some housekeeping. Multiplication by 1 as an
|
|
operation is eliminated. Where M has put floating-point number with no
|
|
decimal places, they are made into integers (eg. 6. -> 6). Where there
|
|
are decimals, "d0" is appended to make sure we don't get single-
|
|
precision. (Note that integers, of which there are lots, will not
|
|
interfere with double-precision evaluation of these expressions.) The
|
|
one place we don't want integers is as divisors, so we restore double
|
|
precision (eg. /48. -> /48 -> /48.d0).
|
|
|
|
Now we have cleaned-up, acceptable Fortran statements, but have
|
|
many short lines which do not have proper free-form F90 continuation.
|
|
The sed output is saved to a temporary file and further processed by
|
|
format_code.x, the executable compiled from format_code.c, to fix the
|
|
continuation and "compactify" the Fortran. format_code.c simply copies
|
|
its standard input to its standard output one character at a time, but
|
|
deletes input newline characters. It counts and tests characters, and
|
|
inserts newlines, continuation &'s, and spaces to produce reasonably
|
|
uniform line lengths and follow Abinit indentation rules. It breaks
|
|
lines after )'s or the last of several adjacent )'s in an attempt to
|
|
make the code at least somewhat readable, not that anyone would really
|
|
want to. It also makes use of the "#" characters inserted in the M
|
|
programs to flag the start of each new "cm(i,j)=" statement.
|
|
|
|
The post-processed result of the excerpt of d2term1.out is shown
|
|
in d2term1.fort, and starts at line 97 in contstr21. This and the
|
|
other full routines start by computing the needed metric tensor
|
|
derivatives, evaluate all the necessary cm terms, save them for re-use
|
|
where possible, and do the actual operations on the aa and bb array
|
|
arguments at the very end.
|
|
|
|
The post-processing produced nearly error-free code. There was a
|
|
very small amount of hand-editing, no more than a few lines in each
|
|
routine at most to fix a few things that caused compiler errors. While
|
|
the c program inserted a warning comment (for possible hand editing) in
|
|
places when the number of continuation lines exceeded the Fortran 90
|
|
Standard maximum, none of the compilers that have been used since this
|
|
has been part of Abinit seemed to mind, so these comments were deleted.
|
|
|
|
********************** INTERFACE WITH NONLOP *************************
|
|
|
|
nonlop is a long and complicated routine, so a broad-brush
|
|
outline of what it does is in order. Basically, it has two halves. If
|
|
the input variable signs==1, only the first half is executed and
|
|
expectation values of the nonlocal psp's are output. If signs=2, both
|
|
halves are executed; the first to evaluate the "right" side of the
|
|
separable nonlocal potential (or its derivatives) operating on the
|
|
input wave function, and the second half to deliver plane wave matrix
|
|
elements of the "left" side based on coefficients evaluated in the
|
|
first half.
|
|
|
|
The computation of (form factors X tensors), (FFXT), acting on the
|
|
input wave function is done in the opernl* routines. (These routines
|
|
all do the same thing, but are coded differently for better performance
|
|
with different computer architectures.) The same routines also have an
|
|
"output" mode if needed in the second half of nonlop, but the discussion
|
|
below will focus on the "input" mode.
|
|
|
|
The opernl3 routine has a loop over wave-vector blocks, and calls
|
|
the routine mkffkg3 inside this loop. mkffkg3 internally loops over the
|
|
wave vectors in the current block with index ipw. The required
|
|
products of tensors, form factors, and form factor derivatives vary
|
|
widely depending on what is being done (the "choice" input variable),
|
|
and the number of projectors for each angular momentum for the current
|
|
type of atom. The first thing mkffkg3 does is to compute the required
|
|
tensor products of components each wave vector in the block, the
|
|
kpgx(ipw,ii) array. This corresponds exactly to the tensor definitions
|
|
in make_met2str.m (regarding ii as a one-dimensional index of the tnk
|
|
List). The remainder of the program packs the leading dimension of the
|
|
output array ffkg(iffkg,ipw) with FFXT's in an order determined by a
|
|
number of if's and do loops. There is no simple mapping of the
|
|
implicit composite of indices into iffkg. However, referring to the
|
|
tnk definition at the start of make_met2str.m, ffkg has sequential
|
|
groups of elements corresponding to the full range of j in each (*i,j*)
|
|
set.
|
|
|
|
opernl3 forms dot products on the plane wave index of ffkg and
|
|
the wave function (including a phase factor), running over the entire
|
|
index range iffkg=1,nffkg without regard to the composite nature of the
|
|
iffkg index. This is the most time-consuming part, and justifies the
|
|
arcane packing of ffkg. The remainder of the routine PARTIALLY UNPACKS
|
|
the dot-product(iffkg) array into several output arrays, mimicking the
|
|
if and do structure of mkffkg to recreate the composite identity of the
|
|
iffkg index. The reader is directed to the OUTPUT description in
|
|
opernl3 for details. The lead index of these arrays generally
|
|
corresponds to a segment of 1,nffkg.
|
|
|
|
Now we come to the rationale for the detailed discussion of the
|
|
last few paragraphs. Segments of the output arrays of operln3 are the
|
|
aa, bb arguments of our new routines for strain-perturbation RF
|
|
calculations. It was necessary to expand mkffkg3 to include the new
|
|
FFXT's needed for these calculations, and to expand the set of output
|
|
arrays and the unpacking part of opernl3 accordingly. The calling
|
|
statements for the strain routines must comply with the iffkg structure
|
|
(in its partially unpacked form) to pass the proper array segments as
|
|
aa, bb arguments.
|
|
|
|
The other significant addition to implement the nonlocal strain
|
|
operations was the need to extend mkffnl, the routine which computes
|
|
the Eq. (48) form factors and their derivatives, to include the second
|
|
derivatives. This is called by a number of routines prior to calling
|
|
nonlop.
|
|
|
|
We have discussed the "input" actions of opernl3, which produce
|
|
expectation values. The "output" mode producing wave functions utilizes
|
|
only one of the new routines, metstr, and further extensions of opernl3
|
|
(in the sign==-1, choice==3 section) to do the required "partial
|
|
repacking" into the full iffkg index range.
|
|
|
|
The alternative routines opernl2 (calling mkffkg) and opernl4a,
|
|
opernl4b function in the same manner as opernl3 and mkffkg3, with
|
|
differences such as index ordering and loop unrolling.
|