quantum-espresso/Modules/mp_bands.f90

!
! Copyright (C) 2013 Quantum ESPRESSO group
! This file is distributed under the terms of the
! GNU General Public License. See the file `License'
! in the root directory of the present distribution,
! or http://www.gnu.org/copyleft/gpl.txt .
!
!----------------------------------------------------------------------------
MODULE mp_bands
!----------------------------------------------------------------------------
!! Band groups (processors within a pool of bands).
!! Subdivision of pool group, used for parallelization over bands.
!
USE mp, ONLY : mp_barrier, mp_bcast, mp_size, mp_rank, mp_comm_split
USE parallel_include
!
IMPLICIT NONE
SAVE
!
INTEGER :: nbgrp = 1
!! number of band groups
INTEGER :: nproc_bgrp = 1
!! number of processors within a band group
INTEGER :: me_bgrp = 0
!! index of the processor within a band group
INTEGER :: root_bgrp = 0
!! index of the root processor within a band group
INTEGER :: my_bgrp_id = 0
!! index of my band group
INTEGER :: root_bgrp_id = 0
!! index of root band group
INTEGER :: inter_bgrp_comm = 0
!! inter band group communicator
INTEGER :: intra_bgrp_comm = 0
!! intra band group communicator
!
LOGICAL :: use_bgrp_in_hpsi = .FALSE.
!! TRUE if band parallelization is performed inside \( H\psi \)
!! and \( S\psi \), FALSE otherwise (band parallelization can be
!! performed outside \( H\psi \) and \( S\psi \) though).
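!
! A minimal usage sketch (hypothetical driver code): the flag can be cleared
! around a call that should not split bands internally, e.g.
!
!    use_bgrp_in_hpsi = .FALSE.            ! suspend band parallelization in H\psi
!    CALL h_psi( lda, n, m, psi, hpsi )    ! h_psi honours the flag internally
!    use_bgrp_in_hpsi = ( nbgrp > 1 )      ! restore the default set by mp_start_bands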
!
! ... "task" groups (for band parallelization of FFT)
!
INTEGER :: ntask_groups = 1
!! number of proc. in an orbital "task group"
!
! ... "nyfft" groups (to push FFT parallelization beyond the nz-planes limit)
!
INTEGER :: nyfft = 1
!! number of y-fft groups. By default =1, i.e. y-ffts are done by a
!! single proc.
!
CONTAINS
!
!----------------------------------------------------------------------------
SUBROUTINE mp_start_bands( nband_, ntg_, nyfft_, parent_comm )
!--------------------------------------------------------------------------
!! Divide the processors of the "parent_comm" group into nband_ band groups.
!
IMPLICIT NONE
!
INTEGER, INTENT(IN) :: nband_
!! it must have been previously read from the command line by a call
!! to routine \(\texttt{get\_command\_line}\).
INTEGER, INTENT(IN) :: parent_comm
!! typically processors of a k-point pool (intra\_pool\_comm)
INTEGER, INTENT(IN), OPTIONAL :: ntg_, nyfft_
!
INTEGER :: parent_nproc = 1, parent_mype = 0
!
#if defined (__MPI)
!
parent_nproc = mp_size( parent_comm )
parent_mype = mp_rank( parent_comm )
!
nbgrp = nband_
!
IF ( nbgrp < 1 .OR. nbgrp > parent_nproc ) CALL errore( 'mp_start_bands',&
'invalid number of band groups, out of range', 1 )
IF ( MOD( parent_nproc, nbgrp ) /= 0 ) CALL errore( 'mp_start_bands', &
'n. of band groups must be divisor of parent_nproc', 1 )
!
! set logical flag so that band parallelization in H\psi is allowed
! (can be disabled before calling H\psi if not desired)
!
use_bgrp_in_hpsi = ( nbgrp > 1 )
!
! ... Set number of processors per band group
!
nproc_bgrp = parent_nproc / nbgrp
!
! ... set index of band group for this processor ( 0 : nbgrp - 1 )
!
my_bgrp_id = parent_mype / nproc_bgrp
!
! ... set index of this processor within its band group ( 0 : nproc_bgrp - 1 )
!
me_bgrp = MOD( parent_mype, nproc_bgrp )
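! ... worked example (illustrative): with parent_nproc = 8 and nbgrp = 2 one
! ... gets nproc_bgrp = 4, so parent_mype = 5 falls in band group
! ... my_bgrp_id = 5/4 = 1 with local index me_bgrp = MOD(5,4) = 1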
!
CALL mp_barrier( parent_comm )
!
! ... the intra_bgrp_comm communicator is created
!
CALL mp_comm_split( parent_comm, my_bgrp_id, parent_mype, intra_bgrp_comm )
!
CALL mp_barrier( parent_comm )
!
! ... the inter_bgrp_comm communicator is created
!
CALL mp_comm_split( parent_comm, me_bgrp, parent_mype, inter_bgrp_comm )
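! ... for the illustrative case above (8 procs, 2 band groups) the splits give
! ...   intra_bgrp_comm : {0,1,2,3} and {4,5,6,7}    (color = my_bgrp_id)
! ...   inter_bgrp_comm : {0,4}, {1,5}, {2,6}, {3,7} (color = me_bgrp)
! ... i.e. processors with the same color end up in the same communicator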
!
IF ( PRESENT(ntg_) ) THEN
ntask_groups = ntg_
END IF
IF ( PRESENT(nyfft_) ) THEN
nyfft = nyfft_
END IF
! ... errore aborts only when its third argument is nonzero, i.e. only when
! ... nyfft does not evenly divide nproc_bgrp
CALL errore( 'mp_start_bands', ' nyfft value incompatible with nproc_bgrp ', MOD( nproc_bgrp, nyfft ) )
!
#endif
RETURN
!
END SUBROUTINE mp_start_bands
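!
! A minimal calling sketch (hypothetical caller; variable names are
! illustrative). Since ntg_ and nyfft_ are OPTIONAL and precede parent_comm,
! a caller that skips them must pass parent_comm by keyword:
!
!    USE mp_bands, ONLY : mp_start_bands
!    ...
!    CALL mp_start_bands( nband_from_cmdline, parent_comm = intra_pool_comm )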
!
END MODULE mp_bands
!
!
MODULE mp_bands_TDDFPT
!! Starting and ending band indices in bgrp parallelization.
! NB: These two variables used to be in mp_bands and are loaded from mp_global in TDDFPT.
! They would be better placed in a TDDFPT-specific module, but are kept here so as not
! to be too invasive on a code I don't know well. SdG
!
INTEGER :: ibnd_start = 0
!! starting band index used in bgrp parallelization.
INTEGER :: ibnd_end = 0
!! ending band index used in bgrp parallelization.
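!
! A minimal sketch (assumed block distribution; QE sets these via its
! divide-type utilities, of which this is only an approximation) of how the
! indices could be filled for nbnd bands over the band groups of mp_bands:
!
!    nbnd_per_bgrp = nbnd / nbgrp
!    rest = MOD( nbnd, nbgrp )
!    IF ( my_bgrp_id < rest ) THEN
!       ibnd_start = my_bgrp_id * ( nbnd_per_bgrp + 1 ) + 1
!       ibnd_end   = ibnd_start + nbnd_per_bgrp
!    ELSE
!       ibnd_start = my_bgrp_id * nbnd_per_bgrp + rest + 1
!       ibnd_end   = ibnd_start + nbnd_per_bgrp - 1
!    END IF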
!
END MODULE mp_bands_TDDFPT
!