Commit Graph

50 Commits

Author SHA1 Message Date
Stefano de Gironcoli 7a76e5b076 more threaded_backassignement (including optionally summing another vector) 2018-08-07 20:45:56 +02:00
Stefano de Gironcoli a091a2604a more threaded (back) assignments 2018-08-07 20:45:56 +02:00
Stefano de Gironcoli bb6889e8b2 more omp assignments 2018-08-07 20:45:56 +02:00
Stefano de Gironcoli 56197ff9ca more chunked omp parallel do loops 2018-08-07 20:45:56 +02:00
Stefano de Gironcoli a241241d27 updated dependencies 2018-08-05 16:52:11 +02:00
Stefano de Gironcoli d936f16226 export_gstart_2_* and set_mpi_comm_4_* moved to LAXLIB
their call corrected in init_run and mp_global
a recently added bug in ppcg_k when npol=1 corrected
2018-08-05 16:52:11 +02:00
Paolo Giannozzi db9228d819 make.depend updated 2018-08-05 11:08:40 +02:00
Paolo Giannozzi cd22b7fc54 Some compilers flag the presence of a comma as in WRITE( ), list-of-variables
as obsolete syntax
2018-08-05 11:05:47 +02:00
Stefano de Gironcoli ac8b63bd4c update of previous merge PPCG 2018-08-05 08:25:56 +00:00
Stefano de Gironcoli 90dafe5d29 timing of PPCG routines updated. 2018-08-03 10:20:51 +02:00
Stefano De Gironcoli b8f879e0d7 timing using start_clock/stop_clock 2018-08-03 09:27:57 +02:00
Stefano de Gironcoli 2c6d20ed77 updated versions of ppcg_gamma/k solvers. The generic-k
version also works in the case npol=2 (at least on my laptop with
mpirun -np 4 ...)
2018-08-03 04:15:56 +02:00
Stefano de Gironcoli 57ec56ed6b further changes to make npol=2 case work 2018-08-03 04:15:56 +02:00
Stefano de Gironcoli b013e79275 first attempt to generalize to non-collinear case. tests CRASH. 2018-08-03 04:15:56 +02:00
Stefano de Gironcoli 203126fd44 avg number of iteration in ppcg computed properly 2018-08-03 04:15:56 +02:00
Stefano de Gironcoli e177dce7da fixed (hopefully) the dependence for the stand-alone cp compilation 2018-08-03 04:15:56 +02:00
Stefano de Gironcoli 5ad3ee115a let's change something so that the server recompiles 2018-08-03 04:15:56 +02:00
Stefano de Gironcoli d55e74a4e4 more minor changes to deal with ppcg option.
PW/examples/example01 script modified to include ppcg; corresponding references added
2018-08-03 04:15:56 +02:00
Stefano De Gironcoli 854fe693e0 PPCG: renaming of a few files originating from the CG case and makefile update 2018-08-03 04:15:56 +02:00
Stefano de Gironcoli 82fc9fa868 adding PPCG to KS_Solvers directory. makedeps script updated 2018-08-03 04:15:56 +02:00
Ye Luo aa13725349 Need to clean up the garbage npw to npwx. 2018-06-14 19:58:00 -05:00
Ye Luo 6ac7f8c32a Merge branch 'bugfix-ndiag' into opt-threading-all-parts 2018-06-14 19:05:31 -05:00
Ye Luo 94a9c8ca6b Bugfix: need to protect the array range properly. 2018-06-14 18:21:54 -05:00
Ye Luo f91ec7499e Chunked innermost loop in collapse. 2018-06-03 09:24:05 -05:00
Ye Luo 8812c4085f Reverted to the old algorithm in hpsi_dot_v. 2018-06-02 16:24:36 -05:00
Ye Luo f0b9584bf8 Minor change 2018-06-02 13:19:17 -05:00
Ye Luo 9a94d4d047 Setting the chunk size as a constant 2018-06-02 12:30:11 -05:00
Ye Luo fa21b8d52a Add functions to do threaded memcpy and memset
threaded_memXXX contains a parallel do region
threaded_barrier_memXXX contains do region without parallel
threaded_nowait_memXXX contains do region without parallel and a nowait at the end do
2018-06-02 12:22:42 -05:00
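
The chunked copy and fill described in this commit can be illustrated with a minimal Python sketch. It is not the Fortran/OpenMP code from the commit: the names `chunk_ranges`, `threaded_memcpy`, `threaded_memset` and the value of `CHUNK` are assumptions made for illustration, and Python threads only show how the index range is split into fixed-size chunks, not a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor

CHUNK = 4  # hypothetical fixed chunk size; the real constant is not given in the log


def chunk_ranges(n, chunk=CHUNK):
    """Yield (start, stop) index pairs that cover range(n) in fixed-size chunks."""
    for start in range(0, n, chunk):
        yield start, min(start + chunk, n)


def threaded_memcpy(dest, src, n, workers=4):
    """Copy the first n elements of src into dest, one chunk per task."""
    def copy_chunk(bounds):
        start, stop = bounds
        dest[start:stop] = src[start:stop]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for every chunk, playing the role of the end-of-loop barrier
        list(pool.map(copy_chunk, chunk_ranges(n)))


def threaded_memset(dest, value, n, workers=4):
    """Set the first n elements of dest to value, one chunk per task."""
    def fill_chunk(bounds):
        start, stop = bounds
        dest[start:stop] = [value] * (stop - start)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(fill_chunk, chunk_ranges(n)))
```

Only the first `n` elements are touched, so a buffer allocated larger than the active range keeps its tail unchanged unless it is explicitly filled.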
Ye Luo 9c16309006 Chunked computing in cegterg. 2018-06-02 10:23:01 -05:00
Ye Luo c54ca024c6 Thread more in cegterg. 2018-06-01 00:31:26 -05:00
Ye Luo 14fef459bb Clean up threaded fill. 2018-05-28 19:36:01 -05:00
Ye Luo 14508b0810 Optimize hpsi_dot_v 2018-05-28 19:13:39 -05:00
Ye Luo 85f6e070d9 Add threaded copy. 2018-05-28 15:29:25 -05:00
Ye Luo 8b628c3f0a Clean up garbage when npw < npwx. 2018-05-28 15:14:05 -05:00
Ye Luo af2fac5ef9 Replace allgather with gather. 2018-05-28 08:34:42 -05:00
Ye Luo 2c6c859896 Remove all unnecessary mem ops in cegterg. 2018-05-27 21:54:46 -05:00
Ye Luo 0f340dd372 A few comments. 2018-05-24 19:29:51 -05:00
Ye Luo 8d563908a8 Merge remote-tracking branch 'gitlab/develop' into opt-threading-all-parts 2018-04-18 18:32:01 -05:00
Ye Luo fceb56cf0c Avoid filling ptmp zero in hpsi_dot_v of regterg. 2018-03-19 11:01:16 -05:00
Ye Luo 8925a803aa Better threading cegterg. 2018-03-18 18:45:45 -05:00
Ye Luo e421345814 Avoid filling ptmp zero in hpsi_dot_v of cegterg. 2018-03-18 16:09:45 -05:00
Paolo Giannozzi a06d380cf4 Replicated routine "set_bgrp_index" replaced by "divide" 2017-12-23 22:00:32 +01:00
degironc ae1805bb72 redundant duplicate module constants.f90 removed from KS_Solvers/CG
Mathematical constant PI defined as a local parameter when needed



git-svn-id: http://qeforge.qe-forge.org/svn/q-e/trunk/espresso@14010 c92efa57-630b-4861-b058-cf58834340f0
2017-11-25 21:07:59 +00:00
degironc aba852b428 order of input arguments in KS_Solver routines changed
bringing overlap logical flag close to the s_psi function it affects



git-svn-id: http://qeforge.qe-forge.org/svn/q-e/trunk/espresso@13800 c92efa57-630b-4861-b058-cf58834340f0
2017-08-29 08:09:06 +00:00
degironc 0d2d3d5721 minor aesthetic change
git-svn-id: http://qeforge.qe-forge.org/svn/q-e/trunk/espresso@13730 c92efa57-630b-4861-b058-cf58834340f0
2017-08-19 13:30:16 +00:00
degironc a8340b4d40 Duplicate routines cdiaghg and rdiaghg moved from KS_Solvers/XX to LAXlib.
Duplicate module mp_bands.f90 moved from KS_Solvers/XX to UtilXlib/mp_bands_util.f90
Makefiles and makedepend.sh updated
 
that should take care of the duplicate symbols




git-svn-id: http://qeforge.qe-forge.org/svn/q-e/trunk/espresso@13712 c92efa57-630b-4861-b058-cf58834340f0
2017-08-08 21:44:44 +00:00
degironc 3e6b4f8e76 MAJOR restructuring of the FFTXlib library
In real space processors are organized in a 2D pattern.

Each processor owns data from a sub-set of Z-planes and a sub-set of Y-planes.
In reciprocal space each processor owns Z-columns that belong to a subset of
X-values. This allows the processors to be split into two sets for communication
in the YZ and XY planes.
Alternatively, if the situation allows for it, a task group parallelization is used
(with ntg=nyfft) where complete XY planes of ntg wavefunctions are collected and Fourier
transformed in G space by different task groups. This is preferable to the Z-proc + Y-proc
parallelization if task groups can be used, because a smaller number of larger amounts of
data are transferred. Hence three types of fft are implemented:
 
  !
  !! ... isgn = +-1 : parallel 3d fft for rho and for the potential
  !
  !! ... isgn = +-2 : parallel 3d fft for wavefunctions
  !
  !! ... isgn = +-3 : parallel 3d fft for wavefunctions with task group
  !
  !! ... isgn = +   : G-space to R-space, output = \sum_G f(G)exp(+iG*R)
  !! ...              fft along z using pencils        (cft_1z)
  !! ...              transpose across nodes           (fft_scatter_yz)
  !! ...              fft along y using pencils        (cft_1y)
  !! ...              transpose across nodes           (fft_scatter_xy)
  !! ...              fft along x using pencils        (cft_1x)
  !
  !! ... isgn = -   : R-space to G-space, output = \int_R f(R)exp(-iG*R)/Omega
  !! ...              fft along x using pencils        (cft_1x)
  !! ...              transpose across nodes           (fft_scatter_xy)
  !! ...              fft along y using pencils        (cft_1y)
  !! ...              transpose across nodes           (fft_scatter_yz)
  !! ...              fft along z using pencils        (cft_1z)
  !
  ! If task_group_fft_is_active the FFT acts on a number of wfcs equal to 
  ! dfft%nproc2, the number of Y-sections in which a plane is divided. 
  ! Data are reshuffled by the fft_scatter_tg routine so that each of the 
  ! dfft%nproc2 subgroups (made by dfft%nproc3 procs) deals with whole planes 
  ! of a single wavefunction.
  !
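
The ordering of the 1D transforms listed above can be checked with a small serial sketch in pure Python. It is only an illustration under stated assumptions: the function names are hypothetical, a naive DFT stands in for the cft_1z/cft_1y/cft_1x pencil routines, the transposes across nodes are omitted because nothing is distributed, and no 1/Omega-style normalization is applied.

```python
import cmath


def dft_1d(values, sign):
    """Naive 1D DFT: out[k] = sum_j values[j] * exp(sign * 2*pi*i * j*k / n)."""
    n = len(values)
    return [
        sum(values[j] * cmath.exp(sign * 2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def transform_axis(f, axis, sign):
    """Apply dft_1d along one axis (0=x, 1=y, 2=z) of a nested list f[x][y][z]."""
    nx, ny, nz = len(f), len(f[0]), len(f[0][0])
    out = [[[0j] * nz for _ in range(ny)] for _ in range(nx)]
    if axis == 2:
        for x in range(nx):
            for y in range(ny):
                out[x][y] = dft_1d(f[x][y], sign)
    elif axis == 1:
        for x in range(nx):
            for z in range(nz):
                col = dft_1d([f[x][y][z] for y in range(ny)], sign)
                for y in range(ny):
                    out[x][y][z] = col[y]
    else:
        for y in range(ny):
            for z in range(nz):
                col = dft_1d([f[x][y][z] for x in range(nx)], sign)
                for x in range(nx):
                    out[x][y][z] = col[x]
    return out


def fft3d_by_pencils(f, sign):
    """sign=+1: z, then y, then x; sign=-1: x, then y, then z. Unnormalized."""
    order = (2, 1, 0) if sign > 0 else (0, 1, 2)
    for axis in order:
        f = transform_axis(f, axis, sign)
    return f
```

Applying `fft3d_by_pencils` with `sign=+1` and then `sign=-1` returns the input multiplied by `nx*ny*nz`, so in this sketch the caller divides by the number of grid points to recover the original data.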

fft_type module heavily modified, a number of variables renamed with more intuitive names
(at least to me), a number of additional variables introduced for the Y-proc parallelization.

Task_group module made void. task_group management is now reduced to the logical component
 fft_desc%have_task_groups of fft_type_descriptor type variable fft_desc.

In terms of interfaces, the 'easy' calling sequences are

SUBROUTINE invfft/fwfft( grid_type, f, dfft, howmany )

  !! where:
  !! 
  !! **grid_type = 'Dense'** : 
  !!   inverse/direct fourier transform of potentials and charge density f
  !!   on the dense grid (dfftp). On output, f is overwritten
  !! 
  !! **grid_type = 'Smooth'** :
  !!   inverse/direct fourier transform of  potentials and charge density f
  !!   on the smooth grid (dffts). On output, f is overwritten
  !! 
  !! **grid_type = 'Wave'** :
  !!   inverse/direct fourier transform of  wave functions f
  !!   on the smooth grid (dffts). On output, f is overwritten
  !!
  !! **grid_type = 'tgWave'** :
  !!   inverse/direct fourier transform of  wave functions f with task group
  !!   on the smooth grid (dffts). On output, f is overwritten
  !!
  !! **grid_type = 'Custom'** : 
  !!   inverse/direct fourier transform of potentials and charge density f
  !!   on a custom grid (dfft_exx). On output, f is overwritten
  !! 
  !! **grid_type = 'CustomWave'** :
  !!   inverse/direct fourier transform of  wave functions f
  !!   on a custom grid (dfft_exx). On output, f is overwritten
  !! 
  !! **dfft = FFT descriptor**, IMPORTANT NOTICE: grid is specified only by dfft.
  !!   No check is performed on the correspondence between dfft and grid_type.
  !!   grid_type is now used only to distinguish cases 'Wave' / 'CustomWave' 
  !!   from all other cases
                                                                                                 

Many more files modified.




git-svn-id: http://qeforge.qe-forge.org/svn/q-e/trunk/espresso@13676 c92efa57-630b-4861-b058-cf58834340f0
2017-08-01 20:31:02 +00:00
giannozz 15215e2262 Compiled modules shouldn't be under revision control!
git-svn-id: http://qeforge.qe-forge.org/svn/q-e/trunk/espresso@13663 c92efa57-630b-4861-b058-cf58834340f0
2017-07-31 16:43:50 +00:00
degironc 1b33777cbd remove some timing printing.
intra_pool_comm (the parent_comm of intra_bgrp_comm) should be the first argument
of set_mpi_comm_4_XX routines.



git-svn-id: http://qeforge.qe-forge.org/svn/q-e/trunk/espresso@13645 c92efa57-630b-4861-b058-cf58834340f0
2017-07-29 19:48:15 +00:00
degironc 4636bca635 KS_Solvers directory has been created with three subdirectories:
KS_Solvers/CG, KS_Solvers/Davidson, KS_Solvers/Davidson_RCI.
Two are currently used by QE, the third one implements the Davidson
diagonalization within the Reverse Communication Interface paradigm,
courtesy of Micael Oliveira.

KS_Solvers routines depend only on lower level libraries, notably UtilXlib, 
LAXlib, (SCA)LAPACK, and BLAS.

The reorganization can be improved. For instance, some duplicated routines like
cdiaghg and rdiaghg could/should be moved to LAXlib. This could reduce the need
to include KS_Solvers directories in the link step of many codes.

Minimal changes to the calling sequence have been made, essentially just adding
the h_psi, s_psi, g_psi and h_1psi, s_1psi routine names as arguments (with a
specific calling sequence hardcoded inside the routines that agrees with the PWSCF one).
This could be avoided by adopting the RCI paradigm.
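
The difference between passing operator routines as arguments and reverse communication can be sketched in Python. This is not the KS_Solvers interface: a plain power iteration stands in for the diagonalization, a generator stands in for the RCI loop, and all names (`power_iteration_callback`, `power_iteration_rci`, `drive_rci`, `apply_h`) are hypothetical.

```python
def power_iteration_callback(apply_h, v, steps=50):
    """Callback style: the solver calls the operator routine passed as an argument."""
    for _ in range(steps):
        w = apply_h(v)
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    hv = apply_h(v)
    return sum(a * b for a, b in zip(v, hv))  # Rayleigh quotient of the unit vector v


def power_iteration_rci(v, steps=50):
    """Reverse communication: the solver yields a vector and waits for H*v from the caller."""
    for _ in range(steps):
        w = yield v
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    hv = yield v
    return sum(a * b for a, b in zip(v, hv))


def drive_rci(solver, apply_h):
    """Caller-side loop: the caller owns the operator, so the solver takes no routine arguments."""
    request = next(solver)
    try:
        while True:
            request = solver.send(apply_h(request))
    except StopIteration as done:
        return done.value
```

In the reverse-communication version the solver never needs to know the operator's calling sequence; the caller applies it in whatever form its own code uses and hands back only the result.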

Compiled in serial and parallel, 177/182 pw tests passed (3 that were already failing
on my laptop: pw-berry, pw-langevin, pw-pawatom; plus 2 unknown, i.e. not tested),
12/17 cp tests passed (some o2-us-para-pbe-X fail, but the same was true for the
original version).

I assume the modified calling procedure is working and the problem lies somewhere else.
 
Randomly tested some examples in pw, ph, pwcond and it seems to work.

Please report any problem.





git-svn-id: http://qeforge.qe-forge.org/svn/q-e/trunk/espresso@13644 c92efa57-630b-4861-b058-cf58834340f0
2017-07-29 12:19:19 +00:00