quantum-espresso/Modules/mp_global.f90

!
! Copyright (C) 2013 Quantum ESPRESSO group
! This file is distributed under the terms of the
! GNU General Public License. See the file `License'
! in the root directory of the present distribution,
! or http://www.gnu.org/copyleft/gpl.txt .
!
!----------------------------------------------------------------------------
MODULE mp_global
!----------------------------------------------------------------------------
!
! ... Wrapper module, kept for compatibility. It contains a few "leftover"
! ... variables used for consistency checks (the *_file variables, read
! ... from the data file), plus the routine mp_startup, which initializes
! ... MPI, and the routine mp_global_end, which finalizes it.
! ... Do not use this module to reference variables (e.g. communicators)
! ... belonging to the various parallelization levels:
! ... use the specific modules instead.
!
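! ... Typical usage (an illustrative sketch, not prescriptive): a main
! ... program initializes and finalizes all MPI levels through this module,
! ...
! ...    CALL mp_startup ( start_images = .TRUE. )
! ...    ! ... actual calculation ...
! ...    CALL mp_global_end ( )
! ...
! ... while communicators are referenced from the specific modules, e.g.
! ...
! ...    USE mp_pools, ONLY : intra_pool_comm
!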
USE mp_world, ONLY: world_comm, mp_world_start, mp_world_end
USE mp_images
USE mp_pools
USE mp_bands
USE mp_bands_TDDFPT
USE mp_exx
USE mp_diag
USE mp_orthopools
!
IMPLICIT NONE
SAVE
!
! ... number of processors for the various groups: values read from file
!
INTEGER :: nproc_file = 1
INTEGER :: nproc_image_file = 1
INTEGER :: nproc_pool_file = 1
INTEGER :: nproc_ortho_file = 1
INTEGER :: nproc_bgrp_file = 1
INTEGER :: ntask_groups_file = 1
INTEGER :: nyfft_file = 1
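!
! ... These values are compared with those of the current run to detect a
! ... restart with a different parallelization layout; an illustrative
! ... check (hypothetical caller name 'my_check'; nproc_pool from mp_pools,
! ... infomsg from QE's error handling) could read:
! ...
! ...    IF ( nproc_pool_file /= nproc_pool ) CALL infomsg( 'my_check', &
! ...         'pool parallelization differs from the one in the data file' )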
!
CONTAINS
!
!-----------------------------------------------------------------------
SUBROUTINE mp_startup ( my_world_comm, start_images, diag_in_band_group, what_band_group )
!-----------------------------------------------------------------------
! ... This wrapper subroutine initializes all parallelization levels.
! ... If option start_images=.true., processes are organized into images,
! ... each performing a quasi-independent calculation, such as a point
! ... in configuration space (NEB) or a phonon irrep (PHonon).
! ... Within each image, processes are further subdivided into various
! ... groups and parallelization levels.
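! ... Optional arguments (defaults as set in the body below):
! ...    my_world_comm       parent MPI communicator (default: MPI_COMM_WORLD)
! ...    start_images        if .TRUE., start image parallelization
! ...                        (default: .FALSE.)
! ...    diag_in_band_group  if .TRUE., create one diag group per band
! ...                        group instead of one per pool
! ...    what_band_group     band-group layout selector (default: 1)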
!
USE command_line_options, ONLY : get_command_line, &
nimage_, npool_, ndiag_, nband_, ntg_, nyfft_
USE parallel_include
!
IMPLICIT NONE
INTEGER, INTENT(IN), OPTIONAL :: my_world_comm
LOGICAL, INTENT(IN), OPTIONAL :: start_images
LOGICAL, INTENT(IN), OPTIONAL :: diag_in_band_group
INTEGER, INTENT(IN), OPTIONAL :: what_band_group
LOGICAL :: do_images
LOGICAL :: do_diag_in_band
INTEGER :: my_comm
INTEGER :: what_band_group_
LOGICAL :: do_distr_diag_inside_bgrp
!
my_comm = MPI_COMM_WORLD
IF ( PRESENT(my_world_comm) ) my_comm = my_world_comm
!
what_band_group_ = 1
IF( PRESENT( what_band_group ) ) THEN
what_band_group_ = what_band_group
END IF
!
CALL mp_world_start( my_comm )
CALL get_command_line ( )
!
do_images = .FALSE.
IF ( PRESENT(start_images) ) do_images = start_images
IF ( do_images ) THEN
CALL mp_start_images ( nimage_, world_comm )
ELSE
CALL mp_init_image ( world_comm )
END IF
!
CALL mp_start_pools ( npool_, intra_image_comm )
! Orthopools are initialized during the EXX bootstrap; if they come into
! wider use, initialize them here:
! CALL mp_start_orthopools ( intra_image_comm )
CALL mp_start_bands ( nband_, ntg_, nyfft_, intra_pool_comm )
CALL mp_start_exx ( nband_, ntg_, intra_pool_comm )
!
do_diag_in_band = .FALSE.
IF ( PRESENT(diag_in_band_group) ) do_diag_in_band = diag_in_band_group
!
IF( negrp.gt.1 .or. do_diag_in_band ) THEN
! used to be the default: one diag group per bgrp,
! with strict hierarchy POOL > BAND > DIAG;
! if exx band groups from mp_exx are in use, keep this diag layout
my_comm = intra_bgrp_comm
ELSE
! new default: one diag group per pool (individual k-point level),
! with band group and diag group both children of the POOL comm
my_comm = intra_pool_comm
END IF
do_distr_diag_inside_bgrp = (negrp.gt.1) .or. do_diag_in_band
CALL mp_start_diag ( ndiag_, my_comm, do_distr_diag_inside_bgrp )
!
call set_mpi_comm_4_cg( intra_pool_comm, intra_bgrp_comm, inter_bgrp_comm )
call set_mpi_comm_4_davidson( intra_pool_comm, intra_bgrp_comm, inter_bgrp_comm )
!
RETURN
!
END SUBROUTINE mp_startup
!
!-----------------------------------------------------------------------
SUBROUTINE mp_global_end ( )
!-----------------------------------------------------------------------
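! ... Frees the communicators created by mp_startup and shuts down MPI.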
!
USE mp, ONLY : mp_comm_free
!
! CALL clean_ortho_group ( )
CALL unset_mpi_comm_4_cg()
CALL unset_mpi_comm_4_davidson()
CALL mp_comm_free ( intra_bgrp_comm )
CALL mp_comm_free ( inter_bgrp_comm )
CALL mp_comm_free ( intra_pool_comm )
CALL mp_comm_free ( inter_pool_comm )
CALL mp_world_end( )
CALL mp_stop_orthopools( ) ! cleans orthopools if used in exx
!
RETURN
!
END SUBROUTINE mp_global_end
!
END MODULE mp_global