-splitting rotate_wfc_* and adding rotate_Hpsi_* into a DENSE diagonalization dir
-removing cg_param, david_param, ... in favour of util_param
-implementation of ParO
-update of PW, UtilXlib, FFTXlib and install needed for compatibility
The general Davidson routine cegterg internally used wavefunction-like arrays
with three indices: plane waves, polarization, bands. This has no real
motivation (historical, maybe?) and differs from the rest of QE, where
wavefunctions with two indices (plane waves + polarization, bands) are used.
In my opinion, the "gap" between the two sets of plane waves/polarizations
should also be removed (that is: the 2*npw plane waves/polarizations should
be consecutive, not with a "gap" in the middle as it is now) but this is a
much more serious change, affecting many different parts of the code.
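A minimal sketch of the two layouts (npwx, npw, npol, nbnd are the usual QE names;
the values below are purely illustrative):

   PROGRAM layout_sketch
      IMPLICIT NONE
      INTEGER, PARAMETER :: npwx = 100, npw = 90, npol = 2, nbnd = 4
      COMPLEX(8), ALLOCATABLE :: psi3(:,:,:)   ! 3-index layout used inside cegterg
      COMPLEX(8), ALLOCATABLE :: psi2(:,:)     ! 2-index layout used in the rest of QE
      ALLOCATE( psi3(npwx, npol, nbnd) )
      ALLOCATE( psi2(npwx*npol, nbnd) )
      ! in psi2 the second polarization block of each band starts at row npwx+1,
      ! so rows npw+1 ... npwx are an unused "gap"; making the 2*npw coefficients
      ! consecutive would remove it, at the price of touching much of the code
      DEALLOCATE( psi3, psi2 )
   END PROGRAM layout_sketch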
=============================
Harmonization of three copies of desc_init (two more are in KS_Solvers/PPCG,
plus two slightly different ones in Davidson diagonalization), with some
changes for clarity (in my opinion); harmonization of various copies of
compute_distmat and of calbec_[dz]distmat.
In my opinion, all these routines used in parallel diagonalization, plus several
similar ones that are either present in multiple copies or can easily be
harmonized, should be moved somewhere else, preferably to LAXlib/.
The problem now is that they are CONTAINed, so they use and set variables of
the calling subroutine and may receive arrays as assumed-shape arguments (with :);
moving them to separate routines requires an interface, while moving them into a
module may lead to undesired dependencies. Ideally one should be able to set up
and diagonalize a distributed matrix without filling the code with calls to
low-level LAXlib routines and without too much voodoo.
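A tiny illustration of the host-association issue (hypothetical names, not actual
QE code): the CONTAINed helper silently reads a variable of its host, so extracting
it means turning host variables into extra arguments or module variables:

   SUBROUTINE diag_driver( )
      IMPLICIT NONE
      INTEGER :: nbase, nvecx
      REAL(8), ALLOCATABLE :: hr(:,:)
      nvecx = 8
      nbase = 4
      ALLOCATE( hr(nvecx,nvecx) )
      CALL fill_distmat( hr )          ! assumed-shape array passed down
      DEALLOCATE( hr )
   CONTAINS
      SUBROUTINE fill_distmat( dm )
         REAL(8), INTENT(OUT) :: dm(:,:)
         dm = 0.d0
         dm(1:nbase,1:nbase) = 1.d0    ! nbase comes from the host by host association:
      END SUBROUTINE fill_distmat      ! moving this routine out of the CONTAINS block
   END SUBROUTINE diag_driver          ! requires an extra argument or a module variable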
QE no longer USEs the LAXlib modules; it just includes interface blocks.
In principle they can now be compiled independently.
All this, barring possible errors.
Further clean-ups are now possible, both within LAXlib and in the QE sources.
threaded_memXXX contains a parallel do region
threaded_barrier_memXXX contains a do region without parallel (implicit barrier at the end do)
threaded_nowait_memXXX contains a do region without parallel and a nowait on the end do
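The three patterns, sketched on a memset-like loop (illustrative code only, relying
on standard OpenMP semantics; these are not the actual UtilXlib routines):

   SUBROUTINE memset_parallel( a, val, n )      ! threaded_memXXX style:
      INTEGER,  INTENT(IN)    :: n              ! opens its own parallel region
      REAL(8),  INTENT(IN)    :: val
      REAL(8),  INTENT(INOUT) :: a(n)
      INTEGER :: i
   !$omp parallel do
      DO i = 1, n
         a(i) = val
      END DO
   !$omp end parallel do
   END SUBROUTINE memset_parallel

   SUBROUTINE memset_barrier( a, val, n )       ! threaded_barrier_memXXX style:
      INTEGER,  INTENT(IN)    :: n              ! meant to be called from inside an
      REAL(8),  INTENT(IN)    :: val            ! existing parallel region; the
      REAL(8),  INTENT(INOUT) :: a(n)           ! end do carries an implicit barrier
      INTEGER :: i
   !$omp do
      DO i = 1, n
         a(i) = val
      END DO
   !$omp end do
   END SUBROUTINE memset_barrier

   SUBROUTINE memset_nowait( a, val, n )        ! threaded_nowait_memXXX style:
      INTEGER,  INTENT(IN)    :: n              ! same as above, but nowait removes
      REAL(8),  INTENT(IN)    :: val            ! the barrier at the end of the loop
      REAL(8),  INTENT(INOUT) :: a(n)
      INTEGER :: i
   !$omp do
      DO i = 1, n
         a(i) = val
      END DO
   !$omp end do nowait
   END SUBROUTINE memset_nowait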
Duplicate module mp_bands.f90 moved from KS_Solvers/XX to UtilXlib/mp_bands_util.f90
Makefiles and makedepend.sh updated; that should take care of the duplicate symbols.
git-svn-id: http://qeforge.qe-forge.org/svn/q-e/trunk/espresso@13712 c92efa57-630b-4861-b058-cf58834340f0
In real space, processors are organized in a 2D pattern.
Each processor owns data from a subset of Z-planes and a subset of Y-planes.
In reciprocal space each processor owns the Z-columns that belong to a subset of
X-values. This allows the processors to be split into two sets for communication
in the YZ and XY planes.
Alternatively, if the situation allows for it, task-group parallelization is used
(with ntg=nyfft), where complete XY planes of ntg wavefunctions are collected and
Fourier transformed in G space by different task groups. This is preferable to the
Z-proc + Y-proc parallelization when task groups can be used, because fewer but
larger blocks of data are transferred. Hence three types of FFT are implemented:
!
!! ... isgn = +-1 : parallel 3d fft for rho and for the potential
!
!! ... isgn = +-2 : parallel 3d fft for wavefunctions
!
!! ... isgn = +-3 : parallel 3d fft for wavefunctions with task group
!
!! ... isgn = + : G-space to R-space, output = \sum_G f(G)exp(+iG*R)
!! ... fft along z using pencils (cft_1z)
!! ... transpose across nodes (fft_scatter_yz)
!! ... fft along y using pencils (cft_1y)
!! ... transpose across nodes (fft_scatter_xy)
!! ... fft along x using pencils (cft_1x)
!
!! ... isgn = - : R-space to G-space, output = \int_R f(R)exp(-iG*R)/Omega
!! ... fft along x using pencils (cft_1x)
!! ... transpose across nodes (fft_scatter_xy)
!! ... fft along y using pencils (cft_1y)
!! ... transpose across nodes (fft_scatter_yz)
!! ... fft along z using pencils (cft_1z)
!
! If task_group_fft_is_active the FFT acts on a number of wfcs equal to
! dfft%nproc2, the number of Y-sections in which a plane is divided.
! Data are reshuffled by the fft_scatter_tg routine so that each of the
! dfft%nproc2 subgroups (made of dfft%nproc3 procs) deals with whole planes
! of a single wavefunction.
!
fft_type module heavily modified: a number of variables renamed with more intuitive
names (at least to me), and a number of additional variables introduced for the
Y-proc parallelization.
The Task_group module is now void: task-group management is reduced to the logical
component fft_desc%have_task_groups of the fft_type_descriptor variable fft_desc.
In terms of interfaces, the 'easy' calling sequence is
SUBROUTINE invfft/fwfft( grid_type, f, dfft, howmany )
!! where:
!!
!! **grid_type = 'Dense'** :
!! inverse/direct fourier transform of potentials and charge density f
!! on the dense grid (dfftp). On output, f is overwritten
!!
!! **grid_type = 'Smooth'** :
!! inverse/direct fourier transform of potentials and charge density f
!! on the smooth grid (dffts). On output, f is overwritten
!!
!! **grid_type = 'Wave'** :
!! inverse/direct fourier transform of wave functions f
!! on the smooth grid (dffts). On output, f is overwritten
!!
!! **grid_type = 'tgWave'** :
!! inverse/direct fourier transform of wave functions f with task group
!! on the smooth grid (dffts). On output, f is overwritten
!!
!! **grid_type = 'Custom'** :
!! inverse/direct fourier transform of potentials and charge density f
!! on a custom grid (dfft_exx). On output, f is overwritten
!!
!! **grid_type = 'CustomWave'** :
!! inverse/direct fourier transform of wave functions f
!! on a custom grid (dfft_exx). On output, f is overwritten
!!
!! **dfft = FFT descriptor**, IMPORTANT NOTICE: grid is specified only by dfft.
!! No check is performed on the correspondence between dfft and grid_type.
!! grid_type is now used only to distinguish cases 'Wave' / 'CustomWave'
!! from all other cases
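A usage sketch of the 'easy' interface described above (a hypothetical wrapper: the
module names fft_types / fft_interfaces and the omission of the optional howmany
argument are assumptions; array and descriptor names follow the usual QE conventions):

   SUBROUTINE fft_usage_sketch( rhog, rhos, psic, tg_psic, dfftp, dffts )
      USE fft_types,      ONLY : fft_type_descriptor
      USE fft_interfaces, ONLY : invfft, fwfft
      IMPLICIT NONE
      TYPE(fft_type_descriptor), INTENT(INOUT) :: dfftp, dffts
      COMPLEX(8), INTENT(INOUT) :: rhog(:), rhos(:), psic(:), tg_psic(:)
      !
      CALL invfft( 'Dense',  rhog, dfftp )        ! density/potential, dense grid, G -> R
      CALL invfft( 'Smooth', rhos, dffts )        ! density/potential, smooth grid
      IF ( dffts%have_task_groups ) THEN
         CALL invfft( 'tgWave', tg_psic, dffts )  ! dffts%nproc2 wavefunctions at once
      ELSE
         CALL invfft( 'Wave', psic, dffts )       ! one wavefunction, smooth grid
      END IF
      CALL fwfft( 'Wave', psic, dffts )           ! back to G space (R -> G)
   END SUBROUTINE fft_usage_sketch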
Many more files modified.
git-svn-id: http://qeforge.qe-forge.org/svn/q-e/trunk/espresso@13676 c92efa57-630b-4861-b058-cf58834340f0