2003-01-20 05:58:50 +08:00
|
|
|
!
|
2013-01-28 17:21:12 +08:00
|
|
|
! Copyright (C) 2013 Quantum ESPRESSO group
|
2003-01-20 05:58:50 +08:00
|
|
|
! This file is distributed under the terms of the
|
|
|
|
! GNU General Public License. See the file `License'
|
|
|
|
! in the root directory of the present distribution,
|
|
|
|
! or http://www.gnu.org/copyleft/gpl.txt .
|
|
|
|
!
|
2004-03-24 17:36:50 +08:00
|
|
|
!----------------------------------------------------------------------------
|
|
|
|
MODULE mp_global
|
|
|
|
!----------------------------------------------------------------------------
|
2021-02-25 18:42:39 +08:00
|
|
|
!! Wrapper module, for compatibility. Contains a few "leftover" variables
|
|
|
|
!! used for checks (all the *_file variables, read from data file),
|
|
|
|
!! plus the routine mp_startup initializing MPI and the command line,
|
|
|
|
!! plus the routine mp_global_end stopping MPI.
|
|
|
|
!! Do not use this module to reference variables (e.g. communicators)
|
|
|
|
!! belonging to each of the various parallelization levels:
|
|
|
|
!! use the specific modules instead.
|
2017-11-10 23:05:24 +08:00
|
|
|
! ... PLEASE DO NOT ADD NEW STUFF TO THIS MODULE. Removing stuff is ok.
|
2013-01-28 17:21:12 +08:00
|
|
|
!
|
2017-07-28 20:32:24 +08:00
|
|
|
USE mp_world, ONLY: world_comm, mp_world_start, mp_world_end
|
2013-01-28 17:21:12 +08:00
|
|
|
USE mp_images
|
|
|
|
USE mp_pools
|
|
|
|
USE mp_bands
|
band group parallelization slightly modified to make it more flexible, and little
more efficient.
subroutine init_index_over_band ( comm, nbnd ) that set ibnd_start and ibnd_end
variables requiring comm=inter_bgrp_comm is removed and replaced by
subroutine set_bgrp_indices ( nbnd, ibnd_start, ibnd_end ) implementing the same
relationships between its arguments but:
- forcing the use of inter_bgrp_comm from the same mp_bands module,
- returning ibnd_start and ibnd_end as explicit outputs that are not anymore kept
in the module. In this way other quantities can be distributes if needed in any
given routine without too many non-local effects.
For compatibility with TDDFPT, that uses the bgrp parallelization and loads
ibnd_start/ibnd_end trhough mp_global module, these two variables are moved in
a dedicated module mp_bands_TDDFPT included in Module/mp_bands.f90. This is done
to avoid too much invasive changes in a code i don't know well. In this way the
needed changes are very localized and transparent, the code compiles correctly
so I think it should work exactly as before.
In my opinion the two variables should be moved somewhere inside TDDFPT.
Band parallelization is extended to h_psi(lda,n,m,psi,hpsi) and s_psi routines
(only when .not.exx_is_active because otherwise it is already used inside vexx)
for generic values of m (of course it gives a speedup only when m is not too small
compared to nbgrp but it works also if m < nbgrp ).
Compatibility with task groups has not be explored but should not be conceptually
different from how it works in the exx case.
git-svn-id: http://qeforge.qe-forge.org/svn/q-e/trunk/espresso@11835 c92efa57-630b-4861-b058-cf58834340f0
2015-11-07 08:06:40 +08:00
|
|
|
USE mp_bands_TDDFPT
|
2017-02-08 01:39:25 +08:00
|
|
|
USE mp_exx
|
2016-10-25 17:37:19 +08:00
|
|
|
USE mp_orthopools
|
2004-03-24 17:36:50 +08:00
|
|
|
!
|
2004-03-25 00:30:44 +08:00
|
|
|
IMPLICIT NONE
|
2004-03-24 17:36:50 +08:00
|
|
|
SAVE
|
|
|
|
!
|
2013-01-28 17:21:12 +08:00
|
|
|
! ... number of processors for the various groups: values read from file
|
2011-01-19 04:07:00 +08:00
|
|
|
!
|
2013-01-28 17:21:12 +08:00
|
|
|
INTEGER :: nproc_file = 1
|
|
|
|
INTEGER :: nproc_image_file = 1
|
|
|
|
INTEGER :: nproc_pool_file = 1
|
|
|
|
INTEGER :: nproc_ortho_file = 1
|
|
|
|
INTEGER :: nproc_bgrp_file = 1
|
|
|
|
INTEGER :: ntask_groups_file= 1
|
MAJOR restructuring of the FFTXlib library
In real space processors are organized in a 2D pattern.
Each processor owns data from a sub-set of Z-planes and a sub-set of Y-planes.
In reciprocal space each processor owns Z-columns that belong to a sub set of
X-values. This allows to split the processors in two sets for communication
in the YZ and XY planes.
In alternative, if the situation allows for it, a task group paralelization is used
(with ntg=nyfft) where complete XY planes of ntg wavefunctions are collected and Fourier
trasnformed in G space by different task-groups. This is preferable to the Z-proc + Y-proc
paralleization if task group can be used because a smaller number of larger ammounts of
data are transferred. Hence three types of fft are implemented:
!
!! ... isgn = +-1 : parallel 3d fft for rho and for the potential
!
!! ... isgn = +-2 : parallel 3d fft for wavefunctions
!
!! ... isgn = +-3 : parallel 3d fft for wavefunctions with task group
!
!! ... isgn = + : G-space to R-space, output = \sum_G f(G)exp(+iG*R)
!! ... fft along z using pencils (cft_1z)
!! ... transpose across nodes (fft_scatter_yz)
!! ... fft along y using pencils (cft_1y)
!! ... transpose across nodes (fft_scatter_xy)
!! ... fft along x using pencils (cft_1x)
!
!! ... isgn = - : R-space to G-space, output = \int_R f(R)exp(-iG*R)/Omega
!! ... fft along x using pencils (cft_1x)
!! ... transpose across nodes (fft_scatter_xy)
!! ... fft along y using pencils (cft_1y)
!! ... transpose across nodes (fft_scatter_yz)
!! ... fft along z using pencils (cft_1z)
!
! If task_group_fft_is_active the FFT acts on a number of wfcs equal to
! dfft%nproc2, the number of Y-sections in which a plane is divided.
! Data are reshuffled by the fft_scatter_tg routine so that each of the
! dfft%nproc2 subgroups (made by dfft%nproc3 procs) deals with whole planes
! of a single wavefunciton.
!
fft_type module heavily modified, a number of variables renamed with more intuitive names
(at least to me), a number of more variables introduced for the Y-proc parallelization.
Task_group module made void. task_group management is now reduced to the logical component
fft_desc%have_task_groups of fft_type_descriptor type variable fft_desc.
In term of interfaces, the 'easy' calling sequences are
SUBROUTINE invfft/fwfft( grid_type, f, dfft, howmany )
!! where:
!!
!! **grid_type = 'Dense'** :
!! inverse/direct fourier transform of potentials and charge density f
!! on the dense grid (dfftp). On output, f is overwritten
!!
!! **grid_type = 'Smooth'** :
!! inverse/direct fourier transform of potentials and charge density f
!! on the smooth grid (dffts). On output, f is overwritten
!!
!! **grid_type = 'Wave'** :
!! inverse/direct fourier transform of wave functions f
!! on the smooth grid (dffts). On output, f is overwritten
!!
!! **grid_type = 'tgWave'** :
!! inverse/direct fourier transform of wave functions f with task group
!! on the smooth grid (dffts). On output, f is overwritten
!!
!! **grid_type = 'Custom'** :
!! inverse/direct fourier transform of potentials and charge density f
!! on a custom grid (dfft_exx). On output, f is overwritten
!!
!! **grid_type = 'CustomWave'** :
!! inverse/direct fourier transform of wave functions f
!! on a custom grid (dfft_exx). On output, f is overwritten
!!
!! **dfft = FFT descriptor**, IMPORTANT NOTICE: grid is specified only by dfft.
!! No check is performed on the correspondence between dfft and grid_type.
!! grid_type is now used only to distinguish cases 'Wave' / 'CustomWave'
!! from all other cases
Many more files modified.
git-svn-id: http://qeforge.qe-forge.org/svn/q-e/trunk/espresso@13676 c92efa57-630b-4861-b058-cf58834340f0
2017-08-02 04:31:02 +08:00
|
|
|
INTEGER :: nyfft_file= 1
|
2004-03-24 17:36:50 +08:00
|
|
|
!
|
2009-10-19 01:48:34 +08:00
|
|
|
CONTAINS
|
|
|
|
!
|
|
|
|
!-----------------------------------------------------------------------
|
2019-02-08 04:06:21 +08:00
|
|
|
SUBROUTINE mp_startup ( my_world_comm, start_images )
|
2009-10-19 01:48:34 +08:00
|
|
|
!-----------------------------------------------------------------------
|
2021-02-25 18:42:39 +08:00
|
|
|
!! This wrapper subroutine initializes all parallelization levels.
|
|
|
|
!! If option with_images=.true., processes are organized into images,
|
|
|
|
!! each performing a quasi-indipendent calculation, such as a point
|
|
|
|
!! in configuration space (NEB) or a phonon irrep (PHonon).
|
|
|
|
!! Within each image processes are further subdivided into various
|
|
|
|
!! groups and parallelization levels.
|
|
|
|
!
|
|
|
|
! IMPORTANT NOTICE 1: since the command line is read here, it may be
|
|
|
|
! convenient to call it in serial execution as well
|
|
|
|
! IMPORTANT NOTICE 2: most parallelization levels are initialized here
|
|
|
|
! but at least some will be moved to a later stage
|
2009-10-19 01:48:34 +08:00
|
|
|
!
|
2013-01-29 18:31:17 +08:00
|
|
|
USE command_line_options, ONLY : get_command_line, &
|
2019-02-08 04:06:21 +08:00
|
|
|
nimage_, npool_, nband_, ntg_, nyfft_
|
2013-09-23 20:57:14 +08:00
|
|
|
USE parallel_include
|
|
|
|
!
|
2009-10-19 01:48:34 +08:00
|
|
|
IMPLICIT NONE
|
2013-09-23 20:57:14 +08:00
|
|
|
INTEGER, INTENT(IN), OPTIONAL :: my_world_comm
|
2013-09-26 23:47:31 +08:00
|
|
|
LOGICAL, INTENT(IN), OPTIONAL :: start_images
|
2013-10-07 16:10:00 +08:00
|
|
|
LOGICAL :: do_images
|
2017-08-06 14:05:34 +08:00
|
|
|
INTEGER :: my_comm
|
2013-09-23 20:57:14 +08:00
|
|
|
!
|
|
|
|
my_comm = MPI_COMM_WORLD
|
|
|
|
IF ( PRESENT(my_world_comm) ) my_comm = my_world_comm
|
|
|
|
!
|
2013-10-27 17:09:10 +08:00
|
|
|
CALL mp_world_start( my_comm )
|
2013-01-28 17:21:12 +08:00
|
|
|
CALL get_command_line ( )
|
2013-01-19 23:02:11 +08:00
|
|
|
!
|
2013-09-23 20:57:14 +08:00
|
|
|
do_images = .FALSE.
|
|
|
|
IF ( PRESENT(start_images) ) do_images = start_images
|
2012-10-09 00:31:09 +08:00
|
|
|
IF ( do_images ) THEN
|
2013-01-29 18:31:17 +08:00
|
|
|
CALL mp_start_images ( nimage_, world_comm )
|
2012-10-09 00:31:09 +08:00
|
|
|
ELSE
|
2013-01-28 17:21:12 +08:00
|
|
|
CALL mp_init_image ( world_comm )
|
2009-10-19 01:48:34 +08:00
|
|
|
END IF
|
|
|
|
!
|
2015-08-21 01:29:18 +08:00
|
|
|
CALL mp_start_pools ( npool_, intra_image_comm )
|
2017-11-10 23:05:24 +08:00
|
|
|
! Init orthopools is done during EXX bootstrap but,
|
|
|
|
! if they become more used, do it here:
|
2016-10-25 17:37:19 +08:00
|
|
|
! CALL mp_start_orthopools ( intra_image_comm )
|
2019-07-10 00:10:55 +08:00
|
|
|
#if defined (__CUDA_OPTIMIZED)
|
|
|
|
CALL mp_start_bands ( 1 , ntg_, nyfft_, intra_pool_comm )
|
|
|
|
#else
|
|
|
|
CALL mp_start_bands ( nband_, ntg_, nyfft_, intra_pool_comm )
|
|
|
|
#endif
|
MAJOR restructuring of the FFTXlib library
In real space processors are organized in a 2D pattern.
Each processor owns data from a sub-set of Z-planes and a sub-set of Y-planes.
In reciprocal space each processor owns Z-columns that belong to a sub set of
X-values. This allows to split the processors in two sets for communication
in the YZ and XY planes.
In alternative, if the situation allows for it, a task group paralelization is used
(with ntg=nyfft) where complete XY planes of ntg wavefunctions are collected and Fourier
trasnformed in G space by different task-groups. This is preferable to the Z-proc + Y-proc
paralleization if task group can be used because a smaller number of larger ammounts of
data are transferred. Hence three types of fft are implemented:
!
!! ... isgn = +-1 : parallel 3d fft for rho and for the potential
!
!! ... isgn = +-2 : parallel 3d fft for wavefunctions
!
!! ... isgn = +-3 : parallel 3d fft for wavefunctions with task group
!
!! ... isgn = + : G-space to R-space, output = \sum_G f(G)exp(+iG*R)
!! ... fft along z using pencils (cft_1z)
!! ... transpose across nodes (fft_scatter_yz)
!! ... fft along y using pencils (cft_1y)
!! ... transpose across nodes (fft_scatter_xy)
!! ... fft along x using pencils (cft_1x)
!
!! ... isgn = - : R-space to G-space, output = \int_R f(R)exp(-iG*R)/Omega
!! ... fft along x using pencils (cft_1x)
!! ... transpose across nodes (fft_scatter_xy)
!! ... fft along y using pencils (cft_1y)
!! ... transpose across nodes (fft_scatter_yz)
!! ... fft along z using pencils (cft_1z)
!
! If task_group_fft_is_active the FFT acts on a number of wfcs equal to
! dfft%nproc2, the number of Y-sections in which a plane is divided.
! Data are reshuffled by the fft_scatter_tg routine so that each of the
! dfft%nproc2 subgroups (made by dfft%nproc3 procs) deals with whole planes
! of a single wavefunciton.
!
fft_type module heavily modified, a number of variables renamed with more intuitive names
(at least to me), a number of more variables introduced for the Y-proc parallelization.
Task_group module made void. task_group management is now reduced to the logical component
fft_desc%have_task_groups of fft_type_descriptor type variable fft_desc.
In term of interfaces, the 'easy' calling sequences are
SUBROUTINE invfft/fwfft( grid_type, f, dfft, howmany )
!! where:
!!
!! **grid_type = 'Dense'** :
!! inverse/direct fourier transform of potentials and charge density f
!! on the dense grid (dfftp). On output, f is overwritten
!!
!! **grid_type = 'Smooth'** :
!! inverse/direct fourier transform of potentials and charge density f
!! on the smooth grid (dffts). On output, f is overwritten
!!
!! **grid_type = 'Wave'** :
!! inverse/direct fourier transform of wave functions f
!! on the smooth grid (dffts). On output, f is overwritten
!!
!! **grid_type = 'tgWave'** :
!! inverse/direct fourier transform of wave functions f with task group
!! on the smooth grid (dffts). On output, f is overwritten
!!
!! **grid_type = 'Custom'** :
!! inverse/direct fourier transform of potentials and charge density f
!! on a custom grid (dfft_exx). On output, f is overwritten
!!
!! **grid_type = 'CustomWave'** :
!! inverse/direct fourier transform of wave functions f
!! on a custom grid (dfft_exx). On output, f is overwritten
!!
!! **dfft = FFT descriptor**, IMPORTANT NOTICE: grid is specified only by dfft.
!! No check is performed on the correspondence between dfft and grid_type.
!! grid_type is now used only to distinguish cases 'Wave' / 'CustomWave'
!! from all other cases
Many more files modified.
git-svn-id: http://qeforge.qe-forge.org/svn/q-e/trunk/espresso@13676 c92efa57-630b-4861-b058-cf58834340f0
2017-08-02 04:31:02 +08:00
|
|
|
CALL mp_start_exx ( nband_, ntg_, intra_pool_comm )
|
2015-11-23 08:50:43 +08:00
|
|
|
!
|
2009-10-19 01:48:34 +08:00
|
|
|
RETURN
|
|
|
|
!
|
|
|
|
END SUBROUTINE mp_startup
|
2010-12-29 08:16:26 +08:00
|
|
|
!
|
2013-10-27 17:09:10 +08:00
|
|
|
!-----------------------------------------------------------------------
|
|
|
|
SUBROUTINE mp_global_end ( )
|
|
|
|
!-----------------------------------------------------------------------
|
|
|
|
!
|
2013-12-28 17:29:14 +08:00
|
|
|
USE mp, ONLY : mp_comm_free
|
|
|
|
!
|
|
|
|
CALL mp_comm_free ( intra_bgrp_comm )
|
|
|
|
CALL mp_comm_free ( inter_bgrp_comm )
|
|
|
|
CALL mp_comm_free ( intra_pool_comm )
|
|
|
|
CALL mp_comm_free ( inter_pool_comm )
|
2017-08-24 22:55:40 +08:00
|
|
|
CALL mp_stop_orthopools( ) ! cleans orthopools if used in exx
|
2017-09-09 00:31:17 +08:00
|
|
|
CALL mp_world_end( )
|
2013-10-27 17:09:10 +08:00
|
|
|
!
|
2014-01-12 16:28:19 +08:00
|
|
|
RETURN
|
|
|
|
!
|
2013-10-27 17:09:10 +08:00
|
|
|
END SUBROUTINE mp_global_end
|
|
|
|
!
|
2004-03-24 17:36:50 +08:00
|
|
|
END MODULE mp_global
|