quantum-espresso

Commit Graph

Author	SHA1	Message	Date
Ivan Carnimeo	df92fa13ac	minor fixes	2020-10-06 13:45:24 +02:00
Ivan Carnimeo	48207bc6cf	fix: remove eigenvalues on cpu	2020-10-06 13:45:24 +02:00
Ivan Carnimeo	9923cd2f41	minor fixes	2020-10-06 13:45:24 +02:00
Ivan Carnimeo	2a317e1cec	ParO fully implemented for K-POINTS	2020-10-06 13:45:24 +02:00
Ivan Carnimeo	e51934525e	bpcg_k_gpu done (needs some refinements)	2020-10-06 13:45:23 +02:00
Ivan Carnimeo	96d6fa496f	some refinements to rotate_HSpsi_k_gpu.f90	2020-10-06 13:45:23 +02:00
Ivan Carnimeo	fe0f307ccb	rotate_HSpsi_k_gpu.f90 almost done (works but needs some refinements)	2020-10-06 13:45:23 +02:00
Ivan Carnimeo	637aa4c8b3	paro_k_new_gpu.f90 created TODO: rotate_HSpsi_k and bpcg_k	2020-10-06 13:45:23 +02:00
Ivan Carnimeo	89dd2b8140	Paro fully implemented for Gamma and __MPI = false TODO: 1) __MPI = true (Scalapack + GPU needed) 2) K points (work in progress)	2020-10-06 13:45:23 +02:00
Ivan Carnimeo	029b4a401c	Gamma case almost finished. TODO: fix c_band call to paro_gamma_new_gpu	2020-10-06 13:45:23 +02:00
Ivan Carnimeo	35d591253a	bpcg 100% -- some cleanup still needed	2020-10-06 13:45:23 +02:00
Ivan Carnimeo	5ebcd59f6f	77% bpcg	2020-10-06 13:45:22 +02:00
Ivan Carnimeo	48be64f469	some more on bpcg_gamma	2020-10-06 13:41:48 +02:00
Ivan Carnimeo	18b816dc08	some work on bpcg and rotate_Hpsi	2020-10-06 13:41:47 +02:00
Ivan Carnimeo	84a6644246	some cleanup	2020-10-06 13:41:47 +02:00
Ivan Carnimeo	aa96512c87	unused host variables removed from rotate_HSpsi_gamma_gpu	2020-10-06 13:41:47 +02:00
Ivan Carnimeo	191542dadc	rotate_HSpsi_gamma_gpu done TO DO: protate_HSpsi_gamma_gpu	2020-10-06 13:41:47 +02:00
Ivan Carnimeo	0f2e27c66e	rotate_HSpsi_gamma and bpcg_gamma needed	2020-10-06 13:41:47 +02:00
Ivan Carnimeo	37514ecfd7	minor changes	2020-10-06 13:41:47 +02:00
Ivan Carnimeo	ea178f9401	some work done	2020-10-06 13:41:47 +02:00
Ivan Carnimeo	e40913b2da	paro_gamma_new_gpu added and Makefiles updated	2020-10-06 13:41:46 +02:00
Pietro Bonfa	1a4df64ffe	Merge branch 'develop' into syncqe	2020-10-04 16:33:16 +02:00
Federico Ficarelli	921853902e	Update build system to latest changes from upstream	2020-09-29 18:11:33 +02:00
Daniele Cesarini	2643568f60	Added cmake for scalapack	2020-09-29 18:11:32 +02:00
Daniele Cesarini	e736e1c01c	Fixed missing dependencies to OpenMP	2020-09-29 18:11:32 +02:00
Daniele Cesarini	fc09ef40e4	Removed cmake function preprocessing and replaced with _qe_add_global_target	2020-09-29 18:11:32 +02:00
Daniele Cesarini	90840d6caf	Fix preprocessor flags for Fortran files	2020-09-29 18:11:32 +02:00
Federico Ficarelli	cf894bd132	Add separate target for shared module 'davidson_param'	2020-09-29 18:11:32 +02:00
Federico Ficarelli	9f89c3c2a6	Fix david_param.mod shared between Davidson/Davidson_RCI	2020-09-29 18:11:31 +02:00
Daniele Cesarini	9246f191ac	Restricted dependency visibility for cmake targets	2020-09-29 18:11:31 +02:00
Daniele Cesarini	d912e3905c	Added missing QE packagies to cmake	2020-09-29 18:11:31 +02:00
Federico Ficarelli	2adf2e3f44	Make qe_install_targets variadic	2020-09-29 18:11:30 +02:00
Federico Ficarelli	1b43e7ad64	Add QE::Solvers	2020-09-29 18:11:30 +02:00
Federico Ficarelli	ce7c15c3b0	Make qe_install_targets variadic	2020-09-29 18:11:29 +02:00
Federico Ficarelli	fc99bec2b6	Add QE::Solvers	2020-09-29 18:11:29 +02:00
Ivan Carnimeo	a433f40f67	bug fix line 419	2020-09-08 16:28:12 +02:00
Pietro Bonfa	f462e309ea	Fixing some problems with old compilers	2020-07-14 08:56:06 +02:00
Ivan Carnimeo	b1776be9b0	some useless host-device alignments removed	2020-07-07 11:51:21 +02:00
Ivan Carnimeo	41a2fef372	some truncation errors fixed	2020-07-03 16:37:51 +02:00
Ivan Carnimeo	d7dc8b1541	indeces change: ii,jj for cuf, i,j for cpu loops	2020-07-03 16:20:06 +02:00
Ivan Carnimeo	7c55ac66e6	it should now compile on CI	2020-07-03 15:55:21 +02:00
Ivan Carnimeo	acd5a80ce3	generic_cublas should now compile on CI	2020-07-03 15:31:40 +02:00
Ivan Carnimeo	1a4a6e3038	arrays passed to ppcg* from c_bands are now passed as device arrays	2020-07-03 15:17:09 +02:00
Ivan Carnimeo	d9fe63b601	PPCG on GPU, Gamma and K-points (some minor fixes required) Committer: Ivan Carnimeo <icarnime@r033c01s04.galileo.cineca.it> modified: KS_Solvers/Makefile modified: KS_Solvers/PPCG/Makefile new file: KS_Solvers/PPCG/generic_cublas.f90 modified: KS_Solvers/PPCG/make.depend new file: KS_Solvers/PPCG/ppcg_gamma_gpu.f90 new file: KS_Solvers/PPCG/ppcg_k_gpu.f90 modified: PW/src/c_bands.f90	2020-06-19 11:15:03 +02:00
Pietro Bonfa	36915a4a6d	A few more checks for memory allocation failures	2020-05-12 15:19:21 +02:00
Pietro Bonfa	05cc3dac4d	Merge branch 'develop' into gpu-develop (first step)	2020-02-23 20:47:07 +01:00
Paolo Giannozzi	d28b9cf06a	Leftover test messages	2020-02-21 15:05:44 +00:00
Stefano de Gironcoli	88c1164d06	changes needed to update KS_Solver -splitting rotate_wfc_* and adding rotate_Hpsi_* into a DENSE diagonalization dir -removing cg_param, david_param, ... in favour of util_param -implementation of ParO -update of PW, UtilXlib, FFTXlib and install needed for compatibility	2020-02-17 12:19:53 +01:00
Pietro Bonfa	ed83176255	Merge branch 'develop' into gpu-develop	2020-02-07 19:53:18 +01:00
Paolo Giannozzi	6dfebb7db6	Two indices for Davidson arrays The general Davidson routine cegterg used internally wavefunction-like arrays that have three indices: plane waves, polarization, bands. This has no real motivation (historical maybe?) and differs from the rest of QE where wavefunctions with two indices (plane waves+polarization, bands) are used. In my opinion, the "gap" between the two sets of plane waves/polarizations should also be removed (that is: the 2*npw plane waves/polarizations should be consecutive, not with a "gap" in the middle as it is now) but this is a much more serious change, affecting many different parts of the code.	2020-02-04 15:00:05 +00:00
Paolo Giannozzi	b89ca39069	Allocations moved inside desc_init	2020-02-03 22:06:52 +00:00
Paolo Giannozzi	e265446d5d	More desc_init harmonization: second version of desc_init moved to laxlib as well. Not sure what is the difference between the two versions, though.	2020-02-03 20:21:20 +00:00
Paolo Giannozzi	dee8f970d2	desc_init moved into LAXlib	2020-02-03 11:02:41 +00:00
Pietro Bonfa	4725c3f548	Merge branch 'develop' into gpu-develop	2020-02-02 15:02:57 +01:00
carcava	38f80cfaa1	- use the new descriptor initi subroutine	2020-02-02 00:56:40 +01:00
Paolo Giannozzi	537aecdcd3	LAXlib-related reorganization ============================= Harmonization of three copies of desc_init (two more are in KS_Solvers/PPCG, plus two slightly different ones in Davidson diagonalization), with some changes for clarity (in my opinion); harmonization of various copies of compute_distmat and of calbec_[dz]distmat. In my opinion all these routines, plus several simolar ones that are either present in multiple copies or that can be easily harmonized, used in parallel diagonalization, should be moved somewhere else, preferably LAXlib/. The problem now is that they are CONTAINed so they use and set variables from the calling subroutine and may use arrays passed as arrays (with :); moving them to a separate routine requires an interface, moving them into a module may lead to undesired dependencies. Ideally one should be able to set up and diagonalize a distributed matrix without filling the code of calls to low-level LAXlib routines and without too much voodoo.	2020-01-29 20:05:02 +00:00
Paolo Giannozzi	ffd53eb4da	Dependencies updated Two routines in KS_Solvers/PPCG aligned to latest LAXlib changes	2020-01-28 14:58:32 +00:00
carcava	45522b457e	Merge branch 'develop' into laxlib Conflicts: CPV/src/cglib.f90 CPV/src/cplib.f90 CPV/src/ldaU.f90 CPV/src/ldaUpen.f90 CPV/src/nl_base.f90 CPV/src/ortho.f90 CPV/src/wave.f90	2020-01-25 11:01:23 +01:00
Paolo Giannozzi	14ca48dd4d	Missing comma in format, some compilers don't like it	2020-01-22 22:16:54 +01:00
Pietro Bonfa	14833ba14d	Compile GPU code on the CPU	2019-12-12 22:45:26 +01:00
Pietro Bonfa	52bbfac655	More devicexlib	2019-12-02 15:15:49 +01:00
Pietro Bonfa	009e90a444	Updated DeviceXlib version	2019-11-27 08:05:06 +01:00
Pietro Bonfa	2c8b38d336	Mergin develop with gpu-develop	2019-08-21 19:14:52 +02:00
Carlo Cavazzoni	ef771b7d41	- forgotten to rename call to laxlib subs	2019-08-18 21:05:56 +02:00
Carlo Cavazzoni	004301add1	- re-factoring of LAXlib now QE do not "use" modules of LAXlib any longer, but it just include interface blocks. In principle they can now be compiled independently. All this beside possible errors. Further clean-ups are now possible, within LAXlib and in QE source codes	2019-08-13 01:16:24 +02:00
Carlo Cavazzoni	27adf6d690	- more disentanglement with LAXlib, quite some change inside LAXlib, still few outside. Next we have to deal with the removal of the use descriptors stuff	2019-08-10 18:49:26 +02:00
Carlo Cavazzoni	5fbc6ecc9c	- LAXlib made independent from other module	2019-08-07 14:27:02 +02:00
Pietro Bonfa	bbc62a53af	Replaced cuda_util with the new device_util library from MaX.	2019-07-29 13:09:37 +02:00
Pietro Bonfa	bd55264319	Merge branch 'develop' into gpu-develop	2019-07-01 10:45:08 +02:00
Paolo Giannozzi	f423ffc216	Fixes for NAG compiler glitches, courtesy Themos Tsikas	2019-06-12 20:55:06 +02:00
Pietro Bonfa	284c1cd23e	Merge branch 'develop' into gpu-develop	2019-04-01 11:23:33 +02:00
Paolo Giannozzi	6834a502ef	[Skip-CI] Obsolete version 'svn' replaced by 'git'; various .PHONY of questionable usefulness, referring to no longer existing procedure devised for svn, removed	2019-03-01 17:42:56 +01:00
Pietro Bonfa	8cef325bd1	Added wrapper for cuda_memcpy and replaced assignement operator in cegterg_gpu	2019-02-27 16:43:21 +01:00
Pietro Bonfa	cdcf2699a7	Merge branch 'develop' into gpu-develop	2019-02-05 15:41:15 +01:00
Paolo Giannozzi	cc985e701b	Problem with parallel make (once again)	2019-02-05 09:15:04 +01:00
Paolo Giannozzi	f725126d3a	More minor cleanup: use module "parallel include" in KS_Solvers	2019-02-04 10:07:52 +01:00
Paolo Giannozzi	9a75ac9c8b	Maybe irrelevant but this is the way it should be	2019-02-04 09:42:53 +01:00
Paolo Giannozzi	75f98e3c59	Last-minute addition of a comment in the Makefile had unexpected side effects. Now it should work. List of objects is now explicit	2019-02-04 09:20:09 +01:00
Paolo Giannozzi	8e0ac0a7bf	Small change to the Makefile of KS_Solvers should prevent annoying re-linking of executables due to a dependency of many executables upon KS_Solvers/libks_solvers.a that in turn was re-build every time	2019-02-03 22:00:40 +01:00
Pietro Bonfa	5ef40d68ea	Merge branch 'develop' of gitlab.com:QEF/q-e into gpu-develop	2019-01-17 17:44:42 +01:00
Lorenzo Paulatto (naquite)	4f0da5d0b4	More syntax that xlf90 does not like	2019-01-16 16:20:28 +01:00
Pietro Bonfa	d25955957c	Improved buffers	2019-01-03 17:57:14 +01:00
Pietro Bonfa	a590fef748	Fixed CPU build. Same function call for GPU version of ddot (should be moved elsewhere soon).	2018-11-13 14:49:27 +01:00
Pietro Bonfa	a0470a9e67	Aligned cegterg_gpu to CPU version.	2018-11-13 13:32:25 +01:00
Pietro Bonfa	31972e5d95	Reverted avoided communication in cegter (now in UtilXLib), minor changes to FFT.	2018-11-07 16:16:02 +01:00
Pietro Bonfa	e383f51542	Added check on flag PGI_POWER_WORKAROUND in Davidson makefile.	2018-10-25 17:38:34 +02:00
Pietro Bonfa	7e2b2c462a	Restoring CPU compilation	2018-10-17 12:23:57 +02:00
Pietro Bonfa	eebf0236df	Initial (naive) implementation of CG diagonalization algorithm.	2018-10-17 11:18:08 +02:00
Pietro Bonfa	fa262106f0	Initialization (partially) ported to GPUs	2018-10-09 16:54:13 +02:00
Pietro Bonfa	36e9f3b1d4	Fix compilation dependencies	2018-10-09 12:02:07 +02:00
Pietro Bonfa	74b597ae70	GScratch is now a real library	2018-09-11 12:26:43 +02:00
Pietro Bonfa	5ba063967f	Switched to single global buffer for all QE project. Should be made optional in LAXlib and KS_solvers (easy task)	2018-08-17 17:48:19 +02:00
Pietro Bonfa	e32a34f3d3	Merge branch 'develop' of gitlab.com:QEF/q-e into gpu-develop	2018-08-17 14:13:32 +02:00
Pietro Bonfa	85e37de069	Restored original cegterg code since aligned version performs much worse (probably a workload balance problem, but more careful analysis is needed)	2018-08-17 11:00:12 +02:00
Pietro Bonfa	354a86b841	Aligned cegterg_gpu and cegterg. Should lead to some performance improvements. Explicit bounds in accelerated parallel solver subroutine.	2018-08-16 13:52:19 +02:00
Stefano De Gironcoli	18bfc19c86	a single libks_solvers.a library is created. Makefiles of the children codes are updated to use it.	2018-08-14 01:41:44 +02:00
Stefano De Gironcoli	3ac492bb6e	wrong indexing of threaded_backassignement corrected in ppcg_gamma	2018-08-08 06:53:49 +02:00
Stefano de Gironcoli	64cca07a92	more threaded_backassignement (including optionally summing another vector)	2018-08-07 14:15:39 +02:00
Stefano de Gironcoli	819ab53cc5	more thrreaded (back) assignments	2018-08-07 12:09:05 +02:00
Stefano de Gironcoli	e85384bd98	more omp assignements	2018-08-06 07:17:35 +02:00
Stefano de Gironcoli	53b0e84e6c	more chuncked omp parallel do loops	2018-08-06 03:34:51 +02:00
Stefano de Gironcoli	a241241d27	updated dependencies	2018-08-05 16:52:11 +02:00
Stefano de Gironcoli	d936f16226	export_gstart_2_* and set_mpi_comm_4_* moved to LAXLIB their call corrected in init_run and mp_global a recently added bug in ppcg_k when npol=1 corrected	2018-08-05 16:52:11 +02:00
Paolo Giannozzi	db9228d819	make.depend updated	2018-08-05 11:08:40 +02:00
Paolo Giannozzi	cd22b7fc54	Some compilers flag the presence of a comma as in WRITE( ), list-of-variables as obsolete syntax	2018-08-05 11:05:47 +02:00
Stefano de Gironcoli	ac8b63bd4c	update of previous merge PPCG	2018-08-05 08:25:56 +00:00
Stefano de Gironcoli	90dafe5d29	timing of PPCG routines updated.	2018-08-03 10:20:51 +02:00
Stefano De Gironcoli	b8f879e0d7	timing using start_clock/stop_clock	2018-08-03 09:27:57 +02:00
Stefano de Gironcoli	2c6d20ed77	updated versions of ppcg_gamma/k solvers. the generic-k version works also in the case npol=2 (at least on my laptop with mpirun -np 4 ...)	2018-08-03 04:15:56 +02:00
Stefano de Gironcoli	57ec56ed6b	further changes to make npol=2 case work	2018-08-03 04:15:56 +02:00
Stefano de Gironcoli	b013e79275	first attempt to generalize to non-collinear case. tests CRASH.	2018-08-03 04:15:56 +02:00
Stefano de Gironcoli	203126fd44	avg number of iteration in ppcg computed properly	2018-08-03 04:15:56 +02:00
Stefano de Gironcoli	e177dce7da	fixed (hopefully) the dependence for the stand-alone cp compilation	2018-08-03 04:15:56 +02:00
Stefano de Gironcoli	5ad3ee115a	let's change something so that the server recompiles	2018-08-03 04:15:56 +02:00
Stefano de Gironcoli	d55e74a4e4	more minor changes to deal with ppcg option. PW/examples/example01 script modified to include ppcg; corresponding references added	2018-08-03 04:15:56 +02:00
Stefano De Gironcoli	854fe693e0	PPCG: renaming of a few files originating form the CG case and makefile update	2018-08-03 04:15:56 +02:00
Stefano de Gironcoli	82fc9fa868	adding PPCG to KS_Solvers directory. makedeps script updated	2018-08-03 04:15:56 +02:00
Pietro Bonfa	a5230da8f7	Merged utilXlib and upstream develop	2018-06-18 13:57:51 +02:00
Pietro Bonfa	5999a3c939	Fixed bug in regterg_gpu	2018-06-15 15:28:48 +02:00
Ye Luo	aa13725349	Need to clean up the garbage npw to npwx.	2018-06-14 19:58:00 -05:00
Ye Luo	6ac7f8c32a	Merge branch 'bugfix-ndiag' into opt-threading-all-parts	2018-06-14 19:05:31 -05:00
Ye Luo	94a9c8ca6b	Bugfix Need to protect the array range properly.	2018-06-14 18:21:54 -05:00
Ye Luo	f91ec7499e	Chuncked innermost loop in collapse.	2018-06-03 09:24:05 -05:00
Ye Luo	8812c4085f	Reverted to the old algorithm in hpsi_dot_v.	2018-06-02 16:24:36 -05:00
Ye Luo	f0b9584bf8	Minor change	2018-06-02 13:19:17 -05:00
Ye Luo	9a94d4d047	Setting the chunk size as a constant	2018-06-02 12:30:11 -05:00
Ye Luo	fa21b8d52a	Add functions to do threaded memcpy and memset threaded_memXXX is contains a parallel do region threaded_barrier_memXXX contains do region without parallel threaded_nowait_memXXX contains do region without parallel and a nowait at the end do	2018-06-02 12:22:42 -05:00
Ye Luo	9c16309006	Chuncked computing in cegterg.	2018-06-02 10:23:01 -05:00
Ye Luo	c54ca024c6	Threade more in cegterg.	2018-06-01 00:31:26 -05:00
Pietro Bonfa	fbc368ad32	beta\|psi now performed on GPU. No need to have target attribute in modules	2018-05-29 17:30:04 +02:00
Ye Luo	14fef459bb	Clean up threaded fill.	2018-05-28 19:36:01 -05:00
Ye Luo	14508b0810	Optimize hpsi_dot_v	2018-05-28 19:13:39 -05:00
Ye Luo	85f6e070d9	Add threaded copy.	2018-05-28 15:29:25 -05:00
Ye Luo	8b628c3f0a	Clean up garbage when npw < npwx.	2018-05-28 15:14:05 -05:00
Ye Luo	af2fac5ef9	Replace allgather with gather.	2018-05-28 08:34:42 -05:00
Ye Luo	2c6c859896	Remove all unnecessary mem ops in cegterg.	2018-05-27 21:54:46 -05:00
Ye Luo	0f340dd372	A bit comments.	2018-05-24 19:29:51 -05:00
Pietro Bonfa	0c90e6b212	Updated buffer class, removed contiguos pointers	2018-04-23 14:48:29 +02:00
Ye Luo	8d563908a8	Merge remote-tracking branch 'gitlab/develop' into opt-threading-all-parts	2018-04-18 18:32:01 -05:00
Pietro Bonfa	5f6d231bdd	Added GPU version for h_psi, s_psi, g_psi and vloc. psic and psic_nc module variables should be standardized. Many data copies between CPU and GPU should be replaced by data in modules. All test passing except for pw_vc-relax/vc-relax3.in which has too loose convergence (noticed by Josh Romero) and for dsygvdx_gpu (real problem) occasionally failing	2018-04-18 13:28:38 +02:00
Pietro Bonfa	45a59c3ae7	Added serial and parallel Davidson solvers for data on the GPU. Only serial version is actually accelerated.	2018-04-10 14:14:59 +02:00
Ye Luo	fceb56cf0c	Avoid filling ptmp zero in hpsi_dot_v of regterg.	2018-03-19 11:01:16 -05:00
Ye Luo	8925a803aa	Better threading cegterg.	2018-03-18 18:45:45 -05:00
Ye Luo	e421345814	Avoid filling ptmp zero in hpsi_dot_v of cegterg.	2018-03-18 16:09:45 -05:00
Paolo Giannozzi	a06d380cf4	Replicated routine "set_bgrp_index" replaced by "divide"	2017-12-23 22:00:32 +01:00
degironc	ae1805bb72	redundant duplicate module constants.f90 removed from KS_Solvers/CG Mathematical constant PI defined as a local parameter when needed git-svn-id: http://qeforge.qe-forge.org/svn/q-e/trunk/espresso@14010 c92efa57-630b-4861-b058-cf58834340f0	2017-11-25 21:07:59 +00:00
degironc	aba852b428	order of input arguments in KS_Solver routines changed bringing overlap logical flag close to the s_psi function it affects git-svn-id: http://qeforge.qe-forge.org/svn/q-e/trunk/espresso@13800 c92efa57-630b-4861-b058-cf58834340f0	2017-08-29 08:09:06 +00:00
degironc	0d2d3d5721	minor estetic change git-svn-id: http://qeforge.qe-forge.org/svn/q-e/trunk/espresso@13730 c92efa57-630b-4861-b058-cf58834340f0	2017-08-19 13:30:16 +00:00
degironc	a8340b4d40	Duplicate routines cdiaghg and rdiaghg moved from KS_Solvers/XX to LAXlib. Duplicate module mp_bands.f90 moved from KS_Solvers/XX to UtilXlib/mp_bands_util.f90 Makefiles and makedepend.sh updated that should take care of the duplicate symbols git-svn-id: http://qeforge.qe-forge.org/svn/q-e/trunk/espresso@13712 c92efa57-630b-4861-b058-cf58834340f0	2017-08-08 21:44:44 +00:00
degironc	3e6b4f8e76	MAJOR restructuring of the FFTXlib library In real space processors are organized in a 2D pattern. Each processor owns data from a sub-set of Z-planes and a sub-set of Y-planes. In reciprocal space each processor owns Z-columns that belong to a sub set of X-values. This allows to split the processors in two sets for communication in the YZ and XY planes. In alternative, if the situation allows for it, a task group paralelization is used (with ntg=nyfft) where complete XY planes of ntg wavefunctions are collected and Fourier trasnformed in G space by different task-groups. This is preferable to the Z-proc + Y-proc paralleization if task group can be used because a smaller number of larger ammounts of data are transferred. Hence three types of fft are implemented: ! !! ... isgn = +-1 : parallel 3d fft for rho and for the potential ! !! ... isgn = +-2 : parallel 3d fft for wavefunctions ! !! ... isgn = +-3 : parallel 3d fft for wavefunctions with task group ! !! ... isgn = + : G-space to R-space, output = \sum_G f(G)exp(+iGR) !! ... fft along z using pencils (cft_1z) !! ... transpose across nodes (fft_scatter_yz) !! ... fft along y using pencils (cft_1y) !! ... transpose across nodes (fft_scatter_xy) !! ... fft along x using pencils (cft_1x) ! !! ... isgn = - : R-space to G-space, output = \int_R f(R)exp(-iGR)/Omega !! ... fft along x using pencils (cft_1x) !! ... transpose across nodes (fft_scatter_xy) !! ... fft along y using pencils (cft_1y) !! ... transpose across nodes (fft_scatter_yz) !! ... fft along z using pencils (cft_1z) ! ! If task_group_fft_is_active the FFT acts on a number of wfcs equal to ! dfft%nproc2, the number of Y-sections in which a plane is divided. ! Data are reshuffled by the fft_scatter_tg routine so that each of the ! dfft%nproc2 subgroups (made by dfft%nproc3 procs) deals with whole planes ! of a single wavefunciton. ! fft_type module heavily modified, a number of variables renamed with more intuitive names (at least to me), a number of more variables introduced for the Y-proc parallelization. Task_group module made void. task_group management is now reduced to the logical component fft_desc%have_task_groups of fft_type_descriptor type variable fft_desc. In term of interfaces, the 'easy' calling sequences are SUBROUTINE invfft/fwfft( grid_type, f, dfft, howmany ) !! where: !! !! grid_type = 'Dense' : !! inverse/direct fourier transform of potentials and charge density f !! on the dense grid (dfftp). On output, f is overwritten !! !! grid_type = 'Smooth' : !! inverse/direct fourier transform of potentials and charge density f !! on the smooth grid (dffts). On output, f is overwritten !! !! grid_type = 'Wave' : !! inverse/direct fourier transform of wave functions f !! on the smooth grid (dffts). On output, f is overwritten !! !! grid_type = 'tgWave' : !! inverse/direct fourier transform of wave functions f with task group !! on the smooth grid (dffts). On output, f is overwritten !! !! grid_type = 'Custom' : !! inverse/direct fourier transform of potentials and charge density f !! on a custom grid (dfft_exx). On output, f is overwritten !! !! grid_type = 'CustomWave' : !! inverse/direct fourier transform of wave functions f !! on a custom grid (dfft_exx). On output, f is overwritten !! !! dfft = FFT descriptor, IMPORTANT NOTICE: grid is specified only by dfft. !! No check is performed on the correspondence between dfft and grid_type. !! grid_type is now used only to distinguish cases 'Wave' / 'CustomWave' !! from all other cases Many more files modified. git-svn-id: http://qeforge.qe-forge.org/svn/q-e/trunk/espresso@13676 c92efa57-630b-4861-b058-cf58834340f0	2017-08-01 20:31:02 +00:00


253 Commits