Commit Graph

253 Commits

Author SHA1 Message Date
Ivan Carnimeo df92fa13ac minor fixes 2020-10-06 13:45:24 +02:00
Ivan Carnimeo 48207bc6cf fix: remove eigenvalues on cpu 2020-10-06 13:45:24 +02:00
Ivan Carnimeo 9923cd2f41 minor fixes 2020-10-06 13:45:24 +02:00
Ivan Carnimeo 2a317e1cec ParO fully implemented for K-POINTS 2020-10-06 13:45:24 +02:00
Ivan Carnimeo e51934525e bpcg_k_gpu done (needs some refinements) 2020-10-06 13:45:23 +02:00
Ivan Carnimeo 96d6fa496f some refinements to rotate_HSpsi_k_gpu.f90 2020-10-06 13:45:23 +02:00
Ivan Carnimeo fe0f307ccb rotate_HSpsi_k_gpu.f90 almost done (works but needs some refinements) 2020-10-06 13:45:23 +02:00
Ivan Carnimeo 637aa4c8b3 paro_k_new_gpu.f90 created
TODO: rotate_HSpsi_k and bpcg_k
2020-10-06 13:45:23 +02:00
Ivan Carnimeo 89dd2b8140 Paro fully implemented for Gamma and __MPI = false
TODO:
     1) __MPI = true (Scalapack + GPU needed)
     2) K points (work in progress)
2020-10-06 13:45:23 +02:00
Ivan Carnimeo 029b4a401c Gamma case almost finished.
TODO:
  fix c_band call to paro_gamma_new_gpu
2020-10-06 13:45:23 +02:00
Ivan Carnimeo 35d591253a bpcg 100% -- some cleanup still needed 2020-10-06 13:45:23 +02:00
Ivan Carnimeo 5ebcd59f6f 77% bpcg 2020-10-06 13:45:22 +02:00
Ivan Carnimeo 48be64f469 some more on bpcg_gamma 2020-10-06 13:41:48 +02:00
Ivan Carnimeo 18b816dc08 some work on bpcg and rotate_Hpsi 2020-10-06 13:41:47 +02:00
Ivan Carnimeo 84a6644246 some cleanup 2020-10-06 13:41:47 +02:00
Ivan Carnimeo aa96512c87 unused host variables removed from rotate_HSpsi_gamma_gpu 2020-10-06 13:41:47 +02:00
Ivan Carnimeo 191542dadc rotate_HSpsi_gamma_gpu done
TO DO:  protate_HSpsi_gamma_gpu
2020-10-06 13:41:47 +02:00
Ivan Carnimeo 0f2e27c66e rotate_HSpsi_gamma and bpcg_gamma needed 2020-10-06 13:41:47 +02:00
Ivan Carnimeo 37514ecfd7 minor changes 2020-10-06 13:41:47 +02:00
Ivan Carnimeo ea178f9401 some work done 2020-10-06 13:41:47 +02:00
Ivan Carnimeo e40913b2da paro_gamma_new_gpu added and Makefiles updated 2020-10-06 13:41:46 +02:00
Pietro Bonfa 1a4df64ffe Merge branch 'develop' into syncqe 2020-10-04 16:33:16 +02:00
Federico Ficarelli 921853902e Update build system to latest changes from upstream 2020-09-29 18:11:33 +02:00
Daniele Cesarini 2643568f60 Added cmake for scalapack 2020-09-29 18:11:32 +02:00
Daniele Cesarini e736e1c01c Fixed missing dependencies to OpenMP 2020-09-29 18:11:32 +02:00
Daniele Cesarini fc09ef40e4 Removed cmake function preprocessing and replaced with _qe_add_global_target 2020-09-29 18:11:32 +02:00
Daniele Cesarini 90840d6caf Fix preprocessor flags for Fortran files 2020-09-29 18:11:32 +02:00
Federico Ficarelli cf894bd132 Add separate target for shared module 'davidson_param' 2020-09-29 18:11:32 +02:00
Federico Ficarelli 9f89c3c2a6 Fix david_param.mod shared between Davidson/Davidson_RCI 2020-09-29 18:11:31 +02:00
Daniele Cesarini 9246f191ac Restricted dependency visibility for cmake targets 2020-09-29 18:11:31 +02:00
Daniele Cesarini d912e3905c Added missing QE packages to cmake 2020-09-29 18:11:31 +02:00
Federico Ficarelli 2adf2e3f44 Make qe_install_targets variadic 2020-09-29 18:11:30 +02:00
Federico Ficarelli 1b43e7ad64 Add QE::Solvers 2020-09-29 18:11:30 +02:00
Federico Ficarelli ce7c15c3b0 Make qe_install_targets variadic 2020-09-29 18:11:29 +02:00
Federico Ficarelli fc99bec2b6 Add QE::Solvers 2020-09-29 18:11:29 +02:00
Ivan Carnimeo a433f40f67 bug fix line 419 2020-09-08 16:28:12 +02:00
Pietro Bonfa f462e309ea Fixing some problems with old compilers 2020-07-14 08:56:06 +02:00
Ivan Carnimeo b1776be9b0 some useless host-device alignments removed 2020-07-07 11:51:21 +02:00
Ivan Carnimeo 41a2fef372 some truncation errors fixed 2020-07-03 16:37:51 +02:00
Ivan Carnimeo d7dc8b1541 indices change: ii,jj for cuf, i,j for cpu loops 2020-07-03 16:20:06 +02:00
Ivan Carnimeo 7c55ac66e6 it should now compile on CI 2020-07-03 15:55:21 +02:00
Ivan Carnimeo acd5a80ce3 generic_cublas should now compile on CI 2020-07-03 15:31:40 +02:00
Ivan Carnimeo 1a4a6e3038 arrays passed to ppcg* from c_bands are now passed as device arrays 2020-07-03 15:17:09 +02:00
Ivan Carnimeo d9fe63b601 PPCG on GPU, Gamma and K-points (some minor fixes required)
Committer: Ivan Carnimeo <icarnime@r033c01s04.galileo.cineca.it>
	modified:   KS_Solvers/Makefile
	modified:   KS_Solvers/PPCG/Makefile
	new file:   KS_Solvers/PPCG/generic_cublas.f90
	modified:   KS_Solvers/PPCG/make.depend
	new file:   KS_Solvers/PPCG/ppcg_gamma_gpu.f90
	new file:   KS_Solvers/PPCG/ppcg_k_gpu.f90
	modified:   PW/src/c_bands.f90
2020-06-19 11:15:03 +02:00
Pietro Bonfa 36915a4a6d A few more checks for memory allocation failures 2020-05-12 15:19:21 +02:00
Pietro Bonfa 05cc3dac4d Merge branch 'develop' into gpu-develop (first step) 2020-02-23 20:47:07 +01:00
Paolo Giannozzi d28b9cf06a Leftover test messages 2020-02-21 15:05:44 +00:00
Stefano de Gironcoli 88c1164d06 changes needed to update KS_Solver
-splitting rotate_wfc_* and adding rotate_Hpsi_* into a DENSE diagonalization dir
-removing  cg_param, david_param, ... in favour of util_param
-implementation of ParO
-update of PW, UtilXlib, FFTXlib and install  needed for compatibility
2020-02-17 12:19:53 +01:00
Pietro Bonfa ed83176255 Merge branch 'develop' into gpu-develop 2020-02-07 19:53:18 +01:00
Paolo Giannozzi 6dfebb7db6 Two indices for Davidson arrays
The general Davidson routine cegterg used internally wavefunction-like arrays
that have three indices: plane waves, polarization, bands. This has no real
motivation (historical maybe?) and differs from the rest of QE where
wavefunctions with two indices (plane waves+polarization, bands) are used.

In my opinion, the "gap" between the two sets of plane waves/polarizations
should also be removed (that is: the 2*npw plane waves/polarizations should
be consecutive, not with a "gap" in the middle as it is now) but this is a
much more serious change, affecting many different parts of the code.
2020-02-04 15:00:05 +00:00
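
The index change described in this commit can be illustrated with a small sketch. It is not the cegterg code: the helper names (`flat_row`, `to_two_index`, `gap_rows`) are hypothetical, indices are 0-based, and it assumes that each polarization block is padded to `npwx` rows while only `npw` of them carry data, which is one reading of the "gap" mentioned in the message.

```python
def flat_row(ig, ipol, npwx):
    """Combined (plane wave + polarization) row for 0-based ig and ipol."""
    return ipol * npwx + ig


def to_two_index(psi3, npwx, npol, nbnd):
    """Copy psi3[ibnd][ipol][ig] into a two-index array psi2[ibnd][row]."""
    psi2 = [[0j] * (npwx * npol) for _ in range(nbnd)]
    for ibnd in range(nbnd):
        for ipol in range(npol):
            for ig in range(npwx):
                psi2[ibnd][flat_row(ig, ipol, npwx)] = psi3[ibnd][ipol][ig]
    return psi2


def gap_rows(npw, npwx):
    """Padding rows between the two polarization blocks (assumed layout)."""
    return list(range(npw, npwx))
```

With `npwx = 5` and `npw = 3`, the second polarization starts at row 5 and rows 3 and 4 are the unused padding in the middle; making the `2*npw` entries consecutive would remove that padding.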
Paolo Giannozzi b89ca39069 Allocations moved inside desc_init 2020-02-03 22:06:52 +00:00
Paolo Giannozzi e265446d5d More desc_init harmonization: second version of desc_init moved to laxlib as
well. Not sure what is the difference between the two versions, though.
2020-02-03 20:21:20 +00:00
Paolo Giannozzi dee8f970d2 desc_init moved into LAXlib 2020-02-03 11:02:41 +00:00
Pietro Bonfa 4725c3f548 Merge branch 'develop' into gpu-develop 2020-02-02 15:02:57 +01:00
carcava 38f80cfaa1 - use the new descriptor initi subroutine 2020-02-02 00:56:40 +01:00
Paolo Giannozzi 537aecdcd3 LAXlib-related reorganization
=============================
Harmonization of three copies of desc_init (two more are in KS_Solvers/PPCG,
plus two slightly different ones in Davidson diagonalization), with some
changes for clarity (in my opinion); harmonization of various copies of
compute_distmat and of calbec_[dz]distmat.

In my opinion all these routines, plus several similar ones that are either
present in multiple copies or that can be easily harmonized, used in parallel
diagonalization, should be moved somewhere else, preferably LAXlib/.
The problem now is that they are CONTAINed so they use and set variables from
the calling subroutine and may use arrays passed as arrays (with :); moving
them to a separate routine requires an interface, moving them into a module may
lead to undesired dependencies. Ideally one should be able to set up and
diagonalize a distributed matrix without filling the code of calls to
 low-level LAXlib routines and without too much voodoo.
2020-01-29 20:05:02 +00:00
Paolo Giannozzi ffd53eb4da Dependencies updated
Two routines in KS_Solvers/PPCG aligned to latest LAXlib changes
2020-01-28 14:58:32 +00:00
carcava 45522b457e Merge branch 'develop' into laxlib
Conflicts:
	CPV/src/cglib.f90
	CPV/src/cplib.f90
	CPV/src/ldaU.f90
	CPV/src/ldaUpen.f90
	CPV/src/nl_base.f90
	CPV/src/ortho.f90
	CPV/src/wave.f90
2020-01-25 11:01:23 +01:00
Paolo Giannozzi 14ca48dd4d Missing comma in format, some compilers don't like it 2020-01-22 22:16:54 +01:00
Pietro Bonfa 14833ba14d Compile GPU code on the CPU 2019-12-12 22:45:26 +01:00
Pietro Bonfa 52bbfac655 More devicexlib 2019-12-02 15:15:49 +01:00
Pietro Bonfa 009e90a444 Updated DeviceXlib version 2019-11-27 08:05:06 +01:00
Pietro Bonfa 2c8b38d336 Merging develop with gpu-develop 2019-08-21 19:14:52 +02:00
Carlo Cavazzoni ef771b7d41 - forgotten to rename call to laxlib subs 2019-08-18 21:05:56 +02:00
Carlo Cavazzoni 004301add1 - re-factoring of LAXlib
now QE does not "use" modules of LAXlib any longer; it just includes interface blocks.
  In principle they can now be compiled independently.
  All this beside possible errors.
  Further clean-ups are now possible, within LAXlib and in QE source codes
2019-08-13 01:16:24 +02:00
Carlo Cavazzoni 27adf6d690 - more disentanglement with LAXlib, quite some change inside LAXlib, still few outside.
Next we have to deal with the removal of the use descriptors stuff
2019-08-10 18:49:26 +02:00
Carlo Cavazzoni 5fbc6ecc9c - LAXlib made independent from other module 2019-08-07 14:27:02 +02:00
Pietro Bonfa bbc62a53af Replaced cuda_util with the new device_util library from MaX. 2019-07-29 13:09:37 +02:00
Pietro Bonfa bd55264319 Merge branch 'develop' into gpu-develop 2019-07-01 10:45:08 +02:00
Paolo Giannozzi f423ffc216 Fixes for NAG compiler glitches, courtesy Themos Tsikas 2019-06-12 20:55:06 +02:00
Pietro Bonfa 284c1cd23e Merge branch 'develop' into gpu-develop 2019-04-01 11:23:33 +02:00
Paolo Giannozzi 6834a502ef [Skip-CI] Obsolete version 'svn' replaced by 'git'; various .PHONY of questionable
usefulness, referring to no longer existing procedure devised for svn, removed
2019-03-01 17:42:56 +01:00
Pietro Bonfa 8cef325bd1 Added wrapper for cuda_memcpy and replaced assignment operator in cegterg_gpu 2019-02-27 16:43:21 +01:00
Pietro Bonfa cdcf2699a7 Merge branch 'develop' into gpu-develop 2019-02-05 15:41:15 +01:00
Paolo Giannozzi cc985e701b Problem with parallel make (once again) 2019-02-05 09:15:04 +01:00
Paolo Giannozzi f725126d3a More minor cleanup: use module "parallel include" in KS_Solvers 2019-02-04 10:07:52 +01:00
Paolo Giannozzi 9a75ac9c8b Maybe irrelevant but this is the way it should be 2019-02-04 09:42:53 +01:00
Paolo Giannozzi 75f98e3c59 Last-minute addition of a comment in the Makefile had unexpected
side effects. Now it should work. List of objects is now explicit
2019-02-04 09:20:09 +01:00
Paolo Giannozzi 8e0ac0a7bf Small change to the Makefile of KS_Solvers should prevent annoying
re-linking of executables due to a dependency of many executables
upon KS_Solvers/libks_solvers.a that in turn was rebuilt every time
2019-02-03 22:00:40 +01:00
Pietro Bonfa 5ef40d68ea Merge branch 'develop' of gitlab.com:QEF/q-e into gpu-develop 2019-01-17 17:44:42 +01:00
Lorenzo Paulatto (naquite) 4f0da5d0b4 More syntax that xlf90 does not like 2019-01-16 16:20:28 +01:00
Pietro Bonfa d25955957c Improved buffers 2019-01-03 17:57:14 +01:00
Pietro Bonfa a590fef748 Fixed CPU build. Same function call for GPU version of ddot (should be moved elsewhere soon). 2018-11-13 14:49:27 +01:00
Pietro Bonfa a0470a9e67 Aligned cegterg_gpu to CPU version. 2018-11-13 13:32:25 +01:00
Pietro Bonfa 31972e5d95 Reverted avoided communication in cegter (now in UtilXLib), minor changes to FFT. 2018-11-07 16:16:02 +01:00
Pietro Bonfa e383f51542 Added check on flag PGI_POWER_WORKAROUND in Davidson makefile. 2018-10-25 17:38:34 +02:00
Pietro Bonfa 7e2b2c462a Restoring CPU compilation 2018-10-17 12:23:57 +02:00
Pietro Bonfa eebf0236df Initial (naive) implementation of CG diagonalization algorithm. 2018-10-17 11:18:08 +02:00
Pietro Bonfa fa262106f0 Initialization (partially) ported to GPUs 2018-10-09 16:54:13 +02:00
Pietro Bonfa 36e9f3b1d4 Fix compilation dependencies 2018-10-09 12:02:07 +02:00
Pietro Bonfa 74b597ae70 GScratch is now a real library 2018-09-11 12:26:43 +02:00
Pietro Bonfa 5ba063967f Switched to single global buffer for all QE project. Should be made optional in LAXlib and KS_solvers (easy task) 2018-08-17 17:48:19 +02:00
Pietro Bonfa e32a34f3d3 Merge branch 'develop' of gitlab.com:QEF/q-e into gpu-develop 2018-08-17 14:13:32 +02:00
Pietro Bonfa 85e37de069 Restored original cegterg code since aligned version performs much worse (probably a workload balance problem, but more careful analysis is needed) 2018-08-17 11:00:12 +02:00
Pietro Bonfa 354a86b841 Aligned cegterg_gpu and cegterg. Should lead to some performance improvements. Explicit bounds in accelerated parallel solver subroutine. 2018-08-16 13:52:19 +02:00
Stefano De Gironcoli 18bfc19c86 a single libks_solvers.a library is created.
Makefiles of the children codes are updated to use it.
2018-08-14 01:41:44 +02:00
Stefano De Gironcoli 3ac492bb6e wrong indexing of threaded_backassignement corrected in ppcg_gamma 2018-08-08 06:53:49 +02:00
Stefano de Gironcoli 64cca07a92 more threaded_backassignement (including optionally summing another vector) 2018-08-07 14:15:39 +02:00
Stefano de Gironcoli 819ab53cc5 more threaded (back) assignments 2018-08-07 12:09:05 +02:00
Stefano de Gironcoli e85384bd98 more omp assignments 2018-08-06 07:17:35 +02:00
Stefano de Gironcoli 53b0e84e6c more chunked omp parallel do loops 2018-08-06 03:34:51 +02:00
Stefano de Gironcoli a241241d27 updated dependencies 2018-08-05 16:52:11 +02:00
Stefano de Gironcoli d936f16226 export_gstart_2_* and set_mpi_comm_4_* moved to LAXLIB
their call corrected in init_run and mp_global
a recently added bug in ppcg_k when npol=1 corrected
2018-08-05 16:52:11 +02:00
Paolo Giannozzi db9228d819 make.depend updated 2018-08-05 11:08:40 +02:00
Paolo Giannozzi cd22b7fc54 Some compilers flag the presence of a comma as in WRITE( ), list-of-variables
as obsolete syntax
2018-08-05 11:05:47 +02:00
Stefano de Gironcoli ac8b63bd4c update of previous merge PPCG 2018-08-05 08:25:56 +00:00
Stefano de Gironcoli 90dafe5d29 timing of PPCG routines updated. 2018-08-03 10:20:51 +02:00
Stefano De Gironcoli b8f879e0d7 timing using start_clock/stop_clock 2018-08-03 09:27:57 +02:00
Stefano de Gironcoli 2c6d20ed77 updated versions of ppcg_gamma/k solvers. the generic-k
version  works also in the case npol=2 (at least on my laptop with
mpirun -np 4 ...)
2018-08-03 04:15:56 +02:00
Stefano de Gironcoli 57ec56ed6b further changes to make npol=2 case work 2018-08-03 04:15:56 +02:00
Stefano de Gironcoli b013e79275 first attempt to generalize to non-collinear case. tests CRASH. 2018-08-03 04:15:56 +02:00
Stefano de Gironcoli 203126fd44 avg number of iteration in ppcg computed properly 2018-08-03 04:15:56 +02:00
Stefano de Gironcoli e177dce7da fixed (hopefully) the dependence for the stand-alone cp compilation 2018-08-03 04:15:56 +02:00
Stefano de Gironcoli 5ad3ee115a let's change something so that the server recompiles 2018-08-03 04:15:56 +02:00
Stefano de Gironcoli d55e74a4e4 more minor changes to deal with ppcg option.
PW/examples/example01 script modified to include ppcg; corresponding references added
2018-08-03 04:15:56 +02:00
Stefano De Gironcoli 854fe693e0 PPCG: renaming of a few files originating from the CG case and makefile update 2018-08-03 04:15:56 +02:00
Stefano de Gironcoli 82fc9fa868 adding PPCG to KS_Solvers directory. makedeps script updated 2018-08-03 04:15:56 +02:00
Pietro Bonfa a5230da8f7 Merged utilXlib and upstream develop 2018-06-18 13:57:51 +02:00
Pietro Bonfa 5999a3c939 Fixed bug in regterg_gpu 2018-06-15 15:28:48 +02:00
Ye Luo aa13725349 Need to clean up the garbage npw to npwx. 2018-06-14 19:58:00 -05:00
Ye Luo 6ac7f8c32a Merge branch 'bugfix-ndiag' into opt-threading-all-parts 2018-06-14 19:05:31 -05:00
Ye Luo 94a9c8ca6b Bugfix Need to protect the array range properly. 2018-06-14 18:21:54 -05:00
Ye Luo f91ec7499e Chunked innermost loop in collapse. 2018-06-03 09:24:05 -05:00
Ye Luo 8812c4085f Reverted to the old algorithm in hpsi_dot_v. 2018-06-02 16:24:36 -05:00
Ye Luo f0b9584bf8 Minor change 2018-06-02 13:19:17 -05:00
Ye Luo 9a94d4d047 Setting the chunk size as a constant 2018-06-02 12:30:11 -05:00
Ye Luo fa21b8d52a Add functions to do threaded memcpy and memset
threaded_memXXX contains a parallel do region
threaded_barrier_memXXX contains a do region without parallel
threaded_nowait_memXXX contains a do region without parallel and a nowait at the end do
2018-06-02 12:22:42 -05:00
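
The chunked copy idea behind these routines can be sketched as follows. This is only a Python analogue, not the Fortran/OpenMP code from the commit: `chunk_bounds` and `threaded_copy` are hypothetical names, the thread pool stands in for an OpenMP parallel do region, and the barrier/nowait variants listed above have no direct counterpart here.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_bounds(n, chunk):
    """Split range(n) into half-open (start, stop) chunks of at most `chunk` items."""
    return [(start, min(start + chunk, n)) for start in range(0, n, chunk)]


def threaded_copy(dst, src, chunk=4, workers=2):
    """Copy src into the front of dst, one disjoint chunk per task."""
    if len(dst) < len(src):
        raise ValueError("destination is shorter than source")

    def copy_chunk(bounds):
        start, stop = bounds
        # Chunks never overlap, so each task writes its own slice of dst.
        dst[start:stop] = src[start:stop]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() forces completion and re-raises any worker exception.
        list(pool.map(copy_chunk, chunk_bounds(len(src), chunk)))
    return dst
```

A fixed chunk size, as in the later "Setting the chunk size as a constant" commit, keeps each task's slice contiguous in memory.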
Ye Luo 9c16309006 Chunked computing in cegterg. 2018-06-02 10:23:01 -05:00
Ye Luo c54ca024c6 Thread more in cegterg. 2018-06-01 00:31:26 -05:00
Pietro Bonfa fbc368ad32 beta|psi now performed on GPU. No need to have target attribute in modules 2018-05-29 17:30:04 +02:00
Ye Luo 14fef459bb Clean up threaded fill. 2018-05-28 19:36:01 -05:00
Ye Luo 14508b0810 Optimize hpsi_dot_v 2018-05-28 19:13:39 -05:00
Ye Luo 85f6e070d9 Add threaded copy. 2018-05-28 15:29:25 -05:00
Ye Luo 8b628c3f0a Clean up garbage when npw < npwx. 2018-05-28 15:14:05 -05:00
Ye Luo af2fac5ef9 Replace allgather with gather. 2018-05-28 08:34:42 -05:00
Ye Luo 2c6c859896 Remove all unnecessary mem ops in cegterg. 2018-05-27 21:54:46 -05:00
Ye Luo 0f340dd372 A bit comments. 2018-05-24 19:29:51 -05:00
Pietro Bonfa 0c90e6b212 Updated buffer class, removed contiguous pointers 2018-04-23 14:48:29 +02:00
Ye Luo 8d563908a8 Merge remote-tracking branch 'gitlab/develop' into opt-threading-all-parts 2018-04-18 18:32:01 -05:00
Pietro Bonfa 5f6d231bdd Added GPU version for h_psi, s_psi, g_psi and vloc. psic and psic_nc module variables should be standardized. Many data copies between CPU and GPU should be replaced by data in modules. All tests passing except for pw_vc-relax/vc-relax3.in, which has too loose convergence (noticed by Josh Romero), and for dsygvdx_gpu (real problem) occasionally failing 2018-04-18 13:28:38 +02:00
Pietro Bonfa 45a59c3ae7 Added serial and parallel Davidson solvers for data on the GPU. Only serial version is actually accelerated. 2018-04-10 14:14:59 +02:00
Ye Luo fceb56cf0c Avoid filling ptmp zero in hpsi_dot_v of regterg. 2018-03-19 11:01:16 -05:00
Ye Luo 8925a803aa Better threading cegterg. 2018-03-18 18:45:45 -05:00
Ye Luo e421345814 Avoid filling ptmp zero in hpsi_dot_v of cegterg. 2018-03-18 16:09:45 -05:00
Paolo Giannozzi a06d380cf4 Replicated routine "set_bgrp_index" replaced by "divide" 2017-12-23 22:00:32 +01:00
degironc ae1805bb72 redundant duplicate module constants.f90 removed from KS_Solvers/CG
Mathematical constant PI defined as a local parameter when needed



git-svn-id: http://qeforge.qe-forge.org/svn/q-e/trunk/espresso@14010 c92efa57-630b-4861-b058-cf58834340f0
2017-11-25 21:07:59 +00:00
degironc aba852b428 order of input arguments in KS_Solver routines changed
bringing overlap logical flag close to the s_psi function it affects



git-svn-id: http://qeforge.qe-forge.org/svn/q-e/trunk/espresso@13800 c92efa57-630b-4861-b058-cf58834340f0
2017-08-29 08:09:06 +00:00
degironc 0d2d3d5721 minor aesthetic change
git-svn-id: http://qeforge.qe-forge.org/svn/q-e/trunk/espresso@13730 c92efa57-630b-4861-b058-cf58834340f0
2017-08-19 13:30:16 +00:00
degironc a8340b4d40 Duplicate routines cdiaghg and rdiaghg moved from KS_Solvers/XX to LAXlib.
Duplicate module mp_bands.f90 moved from KS_Solvers/XX to UtilXlib/mp_bands_util.f90
Makefiles and makedepend.sh updated
 
that should take care of the duplicate symbols




git-svn-id: http://qeforge.qe-forge.org/svn/q-e/trunk/espresso@13712 c92efa57-630b-4861-b058-cf58834340f0
2017-08-08 21:44:44 +00:00
degironc 3e6b4f8e76 MAJOR restructuring of the FFTXlib library
In real space processors are organized in a 2D pattern.

Each processor owns data from a sub-set of Z-planes and a sub-set of Y-planes.
In reciprocal space each processor owns Z-columns that belong to a sub set of
X-values. This allows to split the processors in two sets for communication
in the YZ and XY planes.
Alternatively, if the situation allows for it, a task group parallelization is used
(with ntg=nyfft) where complete XY planes of ntg wavefunctions are collected and Fourier
transformed in G space by different task-groups. This is preferable to the Z-proc + Y-proc
parallelization if task groups can be used because a smaller number of larger amounts of
data are transferred. Hence three types of fft are implemented:
 
  !
  !! ... isgn = +-1 : parallel 3d fft for rho and for the potential
  !
  !! ... isgn = +-2 : parallel 3d fft for wavefunctions
  !
  !! ... isgn = +-3 : parallel 3d fft for wavefunctions with task group
  !
  !! ... isgn = +   : G-space to R-space, output = \sum_G f(G)exp(+iG*R)
  !! ...              fft along z using pencils        (cft_1z)
  !! ...              transpose across nodes           (fft_scatter_yz)
  !! ...              fft along y using pencils        (cft_1y)
  !! ...              transpose across nodes           (fft_scatter_xy)
  !! ...              fft along x using pencils        (cft_1x)
  !
  !! ... isgn = -   : R-space to G-space, output = \int_R f(R)exp(-iG*R)/Omega
  !! ...              fft along x using pencils        (cft_1x)
  !! ...              transpose across nodes           (fft_scatter_xy)
  !! ...              fft along y using pencils        (cft_1y)
  !! ...              transpose across nodes           (fft_scatter_yz)
  !! ...              fft along z using pencils        (cft_1z)
  !
  ! If task_group_fft_is_active the FFT acts on a number of wfcs equal to 
  ! dfft%nproc2, the number of Y-sections in which a plane is divided. 
  ! Data are reshuffled by the fft_scatter_tg routine so that each of the 
  ! dfft%nproc2 subgroups (made by dfft%nproc3 procs) deals with whole planes 
  ! of a single wavefunction.
  !

fft_type module heavily modified, a number of variables renamed with more intuitive names 
(at least to me), a number of more variables introduced for the Y-proc parallelization.

Task_group module made void. task_group management is now reduced to the logical component
 fft_desc%have_task_groups of fft_type_descriptor type variable fft_desc.

In terms of interfaces, the 'easy' calling sequences are

SUBROUTINE invfft/fwfft( grid_type, f, dfft, howmany )

  !! where:
  !! 
  !! **grid_type = 'Dense'** : 
  !!   inverse/direct fourier transform of potentials and charge density f
  !!   on the dense grid (dfftp). On output, f is overwritten
  !! 
  !! **grid_type = 'Smooth'** :
  !!   inverse/direct fourier transform of  potentials and charge density f
  !!   on the smooth grid (dffts). On output, f is overwritten
  !! 
  !! **grid_type = 'Wave'** :
  !!   inverse/direct fourier transform of  wave functions f
  !!   on the smooth grid (dffts). On output, f is overwritten
  !!
  !! **grid_type = 'tgWave'** :
  !!   inverse/direct fourier transform of  wave functions f with task group
  !!   on the smooth grid (dffts). On output, f is overwritten
  !!
  !! **grid_type = 'Custom'** : 
  !!   inverse/direct fourier transform of potentials and charge density f
  !!   on a custom grid (dfft_exx). On output, f is overwritten
  !! 
  !! **grid_type = 'CustomWave'** :
  !!   inverse/direct fourier transform of  wave functions f
  !!   on a custom grid (dfft_exx). On output, f is overwritten
  !! 
  !! **dfft = FFT descriptor**, IMPORTANT NOTICE: grid is specified only by dfft.
  !!   No check is performed on the correspondence between dfft and grid_type.
  !!   grid_type is now used only to distinguish cases 'Wave' / 'CustomWave' 
  !!   from all other cases
                                                                                                 

Many more files modified.




git-svn-id: http://qeforge.qe-forge.org/svn/q-e/trunk/espresso@13676 c92efa57-630b-4861-b058-cf58834340f0
2017-08-01 20:31:02 +00:00
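
The pass ordering described in the commit above (z, then y, then x for isgn > 0, and the reverse for isgn < 0) can be sketched on a single process. This is an illustration under stated assumptions, not FFTXlib code: `dft_1d`, `transform_axis` and `fft3d` are hypothetical names, a naive DFT replaces the cft_1z/cft_1y/cft_1x pencil transforms, the fft_scatter_yz/fft_scatter_xy transposes across nodes are not modeled, and the 1/Omega factor is represented by division by the number of grid points.

```python
import cmath


def dft_1d(values, sign):
    """Naive unnormalized 1D DFT with exponent sign +1 or -1."""
    n = len(values)
    return [
        sum(values[k] * cmath.exp(sign * 2j * cmath.pi * m * k / n) for k in range(n))
        for m in range(n)
    ]


def transform_axis(f, axis, sign):
    """Apply dft_1d along one axis of f[ix][iy][iz] (0 = x, 1 = y, 2 = z)."""
    dims = (len(f), len(f[0]), len(f[0][0]))
    out = [[[0j] * dims[2] for _ in range(dims[1])] for _ in range(dims[0])]
    others = [a for a in range(3) if a != axis]
    for i in range(dims[others[0]]):
        for j in range(dims[others[1]]):
            idx = [0, 0, 0]
            idx[others[0]], idx[others[1]] = i, j
            # Gather one pencil along `axis`, transform it, scatter it back.
            pencil = []
            for k in range(dims[axis]):
                idx[axis] = k
                pencil.append(f[idx[0]][idx[1]][idx[2]])
            result = dft_1d(pencil, sign)
            for k in range(dims[axis]):
                idx[axis] = k
                out[idx[0]][idx[1]][idx[2]] = result[k]
    return out


def fft3d(f, isgn):
    """isgn > 0: G-space to R-space (z, y, x); isgn < 0: R to G (x, y, z), scaled."""
    order, sign = ((2, 1, 0), +1) if isgn > 0 else ((0, 1, 2), -1)
    for axis in order:
        f = transform_axis(f, axis, sign)
    if isgn < 0:
        npoints = len(f) * len(f[0]) * len(f[0][0])
        f = [[[v / npoints for v in row] for row in plane] for plane in f]
    return f
```

Applying `fft3d(..., +1)` followed by `fft3d(..., -1)` returns the original array up to rounding, which is a quick way to check the sign and normalization conventions against each other.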