quantum-espresso/Modules/mp_bands.f90

!
! Copyright (C) 2013 Quantum ESPRESSO group
! This file is distributed under the terms of the
! GNU General Public License. See the file `License'
! in the root directory of the present distribution,
! or http://www.gnu.org/copyleft/gpl.txt .
!
!----------------------------------------------------------------------------
MODULE mp_bands
!----------------------------------------------------------------------------
!
USE mp, ONLY : mp_barrier, mp_bcast, mp_size, mp_rank, mp_comm_split
USE parallel_include
!
IMPLICIT NONE
SAVE
!
! ... Band groups (processors within a pool of bands)
! ... Subdivision of pool group, used for parallelization over bands
!
INTEGER :: nbgrp = 1 ! number of band groups
INTEGER :: nproc_bgrp = 1 ! number of processors within a band group
INTEGER :: me_bgrp = 0 ! index of the processor within a band group
INTEGER :: root_bgrp = 0 ! index of the root processor within a band group
INTEGER :: my_bgrp_id = 0 ! index of my band group
INTEGER :: root_bgrp_id = 0 ! index of root band group
INTEGER :: inter_bgrp_comm = 0 ! inter band group communicator
INTEGER :: intra_bgrp_comm = 0 ! intra band group communicator
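!
! ... Example (illustrative numbers, not fixed by this file): with 8
! ... processors in the parent group and nbgrp = 2, processors 0-3 form
! ... band group 0 and processors 4-7 form band group 1; each
! ... intra_bgrp_comm then spans nproc_bgrp = 4 processors, while each
! ... inter_bgrp_comm connects the nbgrp = 2 processors sharing the same
! ... me_bgrp across band groups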
! The next variable is .TRUE. if band parallelization is performed inside
! H\psi and S\psi, .FALSE. otherwise (band parallelization can still be
! performed outside H\psi and S\psi, though)
LOGICAL :: use_bgrp_in_hpsi = .FALSE.
!
! ... "task" groups (for band parallelization of FFT)
!
INTEGER :: ntask_groups = 1 ! number of proc. in an orbital "task group"
!
! ... "nyfft" groups (to push FFT parallelization beyond the nz-planes limit)
INTEGER :: nyfft = 1 ! number of y-fft groups. By default =1, i.e. y-ffts are done by a single proc
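! ... When the task-group path of the wavefunction FFT is active, the number
! ... of task groups is taken equal to nyfft: complete XY planes of nyfft
! ... wavefunctions are collected and transformed by different task groups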
!
CONTAINS
!
!----------------------------------------------------------------------------
SUBROUTINE mp_start_bands( nband_, ntg_, nyfft_, parent_comm )
!---------------------------------------------------------------------------
!
! ... Divide the processors of the "parent_comm" group into nband_ band groups
! ... Requires: nband_, read from the command line
! ... parent_comm, typically the processors of a k-point pool
! ... (intra_pool_comm)
!
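! ... A minimal usage sketch (hypothetical values; note that the optional
! ... arguments ntg_ and nyfft_ precede parent_comm, so a keyword argument
! ... is needed when they are omitted):
! ...
! ...    CALL mp_start_bands( 2, parent_comm = intra_pool_comm )
! ...
! ... which splits e.g. an 8-processor pool into 2 band groups of 4
!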
IMPLICIT NONE
!
INTEGER, INTENT(IN) :: nband_, parent_comm
INTEGER, INTENT(IN), OPTIONAL :: ntg_, nyfft_
!
INTEGER :: parent_nproc = 1, parent_mype = 0
!
#if defined (__MPI)
!
parent_nproc = mp_size( parent_comm )
parent_mype = mp_rank( parent_comm )
!
! ... nband_ must have been previously read from the command line
! ... by a call to routine get_command_line
!
nbgrp = nband_
!
IF ( nbgrp < 1 .OR. nbgrp > parent_nproc ) CALL errore( 'mp_start_bands',&
'invalid number of band groups, out of range', 1 )
IF ( MOD( parent_nproc, nbgrp ) /= 0 ) CALL errore( 'mp_start_bands', &
'n. of band groups must be divisor of parent_nproc', 1 )
!
! set logical flag so that band parallelization in H\psi is allowed
! (can be disabled before calling H\psi if not desired)
!
use_bgrp_in_hpsi = ( nbgrp > 1 )
!
! ... Set number of processors per band group
!
nproc_bgrp = parent_nproc / nbgrp
!
! ... set index of band group for this processor ( 0 : nbgrp - 1 )
!
my_bgrp_id = parent_mype / nproc_bgrp
!
! ... set index of the processor within the band group ( 0 : nproc_bgrp - 1 )
!
me_bgrp = MOD( parent_mype, nproc_bgrp )
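!
! ... e.g. (illustrative): parent_nproc = 8, nbgrp = 2 gives nproc_bgrp = 4;
! ... parent ranks 0-3 get my_bgrp_id = 0, ranks 4-7 get my_bgrp_id = 1,
! ... and me_bgrp runs over 0,1,2,3 within each band group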
!
CALL mp_barrier( parent_comm )
!
! ... the intra_bgrp_comm communicator is created
!
CALL mp_comm_split( parent_comm, my_bgrp_id, parent_mype, intra_bgrp_comm )
!
CALL mp_barrier( parent_comm )
!
! ... the inter_bgrp_comm communicator is created
!
CALL mp_comm_split( parent_comm, me_bgrp, parent_mype, inter_bgrp_comm )
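!
! ... mp_comm_split( comm, color, key, newcomm ) puts processes with the
! ... same color into the same new communicator, ordered by key: splitting
! ... on my_bgrp_id yields one intra-group communicator per band group,
! ... while splitting on me_bgrp connects the processes holding the same
! ... local rank across all band groups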
!
IF ( PRESENT(ntg_) ) THEN
ntask_groups = ntg_
END IF
IF ( PRESENT(nyfft_) ) THEN
nyfft = nyfft_
END IF
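!
! ... errore aborts only when its last argument is nonzero, i.e. when
! ... nproc_bgrp is not a multiple of nyfft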
CALL errore( 'mp_bands', 'nyfft value incompatible with nproc_bgrp', MOD(nproc_bgrp, nyfft) )
!
#endif
RETURN
!
END SUBROUTINE mp_start_bands
!
END MODULE mp_bands
!
!
MODULE mp_bands_TDDFPT
!
! NB: These two variables used to be in mp_bands and are loaded from mp_global in TDDFPT
! They would better be placed in a TDDFPT-specific module, but are left here
! so as not to be too invasive on a code I don't know well. SdG
!
INTEGER :: ibnd_start = 0 ! starting band index used in bgrp parallelization
INTEGER :: ibnd_end = 0 ! ending band index used in bgrp parallelization
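!
! ... A sketch of the block distribution these indices typically hold
! ... (assuming nbnd bands and the nbgrp/my_bgrp_id set by mp_start_bands;
! ... the first MOD(nbnd,nbgrp) band groups get one extra band):
! ...
! ...    ibnd_start = my_bgrp_id*(nbnd/nbgrp) + MIN(my_bgrp_id, MOD(nbnd,nbgrp)) + 1
! ...    ibnd_end   = ibnd_start + nbnd/nbgrp - 1
! ...    IF ( my_bgrp_id < MOD(nbnd,nbgrp) ) ibnd_end = ibnd_end + 1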
!
END MODULE mp_bands_TDDFPT
!