quantum-espresso

Commit Graph

Author	SHA1	Message	Date
Stefano de Gironcoli	7a76e5b076	more threaded_backassignement (including optionally summing another vector)	2018-08-07 20:45:56 +02:00
Stefano de Gironcoli	a091a2604a	more threaded (back) assignments	2018-08-07 20:45:56 +02:00
Stefano de Gironcoli	bb6889e8b2	more omp assignments	2018-08-07 20:45:56 +02:00
Stefano de Gironcoli	56197ff9ca	more chunked omp parallel do loops	2018-08-07 20:45:56 +02:00
Stefano de Gironcoli	a241241d27	updated dependencies	2018-08-05 16:52:11 +02:00
Stefano de Gironcoli	d936f16226	export_gstart_2_* and set_mpi_comm_4_* moved to LAXLIB; their calls corrected in init_run and mp_global; a recently added bug in ppcg_k when npol=1 corrected	2018-08-05 16:52:11 +02:00
Paolo Giannozzi	db9228d819	make.depend updated	2018-08-05 11:08:40 +02:00
Paolo Giannozzi	cd22b7fc54	Some compilers flag the presence of a comma as in WRITE( ), list-of-variables as obsolete syntax	2018-08-05 11:05:47 +02:00
Stefano de Gironcoli	ac8b63bd4c	update of previous merge PPCG	2018-08-05 08:25:56 +00:00
Stefano de Gironcoli	90dafe5d29	timing of PPCG routines updated.	2018-08-03 10:20:51 +02:00
Stefano De Gironcoli	b8f879e0d7	timing using start_clock/stop_clock	2018-08-03 09:27:57 +02:00
Stefano de Gironcoli	2c6d20ed77	updated versions of ppcg_gamma/k solvers. the generic-k version works also in the case npol=2 (at least on my laptop with mpirun -np 4 ...)	2018-08-03 04:15:56 +02:00
Stefano de Gironcoli	57ec56ed6b	further changes to make npol=2 case work	2018-08-03 04:15:56 +02:00
Stefano de Gironcoli	b013e79275	first attempt to generalize to non-collinear case. tests CRASH.	2018-08-03 04:15:56 +02:00
Stefano de Gironcoli	203126fd44	avg number of iteration in ppcg computed properly	2018-08-03 04:15:56 +02:00
Stefano de Gironcoli	e177dce7da	fixed (hopefully) the dependence for the stand-alone cp compilation	2018-08-03 04:15:56 +02:00
Stefano de Gironcoli	5ad3ee115a	let's change something so that the server recompiles	2018-08-03 04:15:56 +02:00
Stefano de Gironcoli	d55e74a4e4	more minor changes to deal with ppcg option. PW/examples/example01 script modified to include ppcg; corresponding references added	2018-08-03 04:15:56 +02:00
Stefano De Gironcoli	854fe693e0	PPCG: renaming of a few files originating from the CG case and makefile update	2018-08-03 04:15:56 +02:00
Stefano de Gironcoli	82fc9fa868	adding PPCG to KS_Solvers directory. makedeps script updated	2018-08-03 04:15:56 +02:00
Ye Luo	aa13725349	Need to clean up the garbage npw to npwx.	2018-06-14 19:58:00 -05:00
Ye Luo	6ac7f8c32a	Merge branch 'bugfix-ndiag' into opt-threading-all-parts	2018-06-14 19:05:31 -05:00
Ye Luo	94a9c8ca6b	Bugfix: need to protect the array range properly.	2018-06-14 18:21:54 -05:00
Ye Luo	f91ec7499e	Chunked innermost loop in collapse.	2018-06-03 09:24:05 -05:00
Ye Luo	8812c4085f	Reverted to the old algorithm in hpsi_dot_v.	2018-06-02 16:24:36 -05:00
Ye Luo	f0b9584bf8	Minor change	2018-06-02 13:19:17 -05:00
Ye Luo	9a94d4d047	Setting the chunk size as a constant	2018-06-02 12:30:11 -05:00
Ye Luo	fa21b8d52a	Add functions to do threaded memcpy and memset: threaded_memXXX contains a parallel do region; threaded_barrier_memXXX contains a do region without parallel; threaded_nowait_memXXX contains a do region without parallel and a nowait at the end do	2018-06-02 12:22:42 -05:00
Ye Luo	9c16309006	Chunked computing in cegterg.	2018-06-02 10:23:01 -05:00
Ye Luo	c54ca024c6	Thread more in cegterg.	2018-06-01 00:31:26 -05:00
Ye Luo	14fef459bb	Clean up threaded fill.	2018-05-28 19:36:01 -05:00
Ye Luo	14508b0810	Optimize hpsi_dot_v	2018-05-28 19:13:39 -05:00
Ye Luo	85f6e070d9	Add threaded copy.	2018-05-28 15:29:25 -05:00
Ye Luo	8b628c3f0a	Clean up garbage when npw < npwx.	2018-05-28 15:14:05 -05:00
Ye Luo	af2fac5ef9	Replace allgather with gather.	2018-05-28 08:34:42 -05:00
Ye Luo	2c6c859896	Remove all unnecessary mem ops in cegterg.	2018-05-27 21:54:46 -05:00
Ye Luo	0f340dd372	A bit comments.	2018-05-24 19:29:51 -05:00
Ye Luo	8d563908a8	Merge remote-tracking branch 'gitlab/develop' into opt-threading-all-parts	2018-04-18 18:32:01 -05:00
Ye Luo	fceb56cf0c	Avoid filling ptmp zero in hpsi_dot_v of regterg.	2018-03-19 11:01:16 -05:00
Ye Luo	8925a803aa	Better threading cegterg.	2018-03-18 18:45:45 -05:00
Ye Luo	e421345814	Avoid filling ptmp zero in hpsi_dot_v of cegterg.	2018-03-18 16:09:45 -05:00
Paolo Giannozzi	a06d380cf4	Replicated routine "set_bgrp_index" replaced by "divide"	2017-12-23 22:00:32 +01:00
degironc	ae1805bb72	redundant duplicate module constants.f90 removed from KS_Solvers/CG. Mathematical constant PI defined as a local parameter when needed. git-svn-id: http://qeforge.qe-forge.org/svn/q-e/trunk/espresso@14010 c92efa57-630b-4861-b058-cf58834340f0	2017-11-25 21:07:59 +00:00
degironc	aba852b428	order of input arguments in KS_Solver routines changed, bringing the overlap logical flag close to the s_psi function it affects. git-svn-id: http://qeforge.qe-forge.org/svn/q-e/trunk/espresso@13800 c92efa57-630b-4861-b058-cf58834340f0	2017-08-29 08:09:06 +00:00
degironc	0d2d3d5721	minor aesthetic change git-svn-id: http://qeforge.qe-forge.org/svn/q-e/trunk/espresso@13730 c92efa57-630b-4861-b058-cf58834340f0	2017-08-19 13:30:16 +00:00
degironc	a8340b4d40	Duplicate routines cdiaghg and rdiaghg moved from KS_Solvers/XX to LAXlib. Duplicate module mp_bands.f90 moved from KS_Solvers/XX to UtilXlib/mp_bands_util.f90. Makefiles and makedepend.sh updated; that should take care of the duplicate symbols. git-svn-id: http://qeforge.qe-forge.org/svn/q-e/trunk/espresso@13712 c92efa57-630b-4861-b058-cf58834340f0	2017-08-08 21:44:44 +00:00
degironc	3e6b4f8e76	MAJOR restructuring of the FFTXlib library. In real space, processors are organized in a 2D pattern: each processor owns data from a subset of Z-planes and a subset of Y-planes. In reciprocal space, each processor owns Z-columns that belong to a subset of X-values. This allows splitting the processors into two sets for communication in the YZ and XY planes. Alternatively, if the situation allows for it, a task-group parallelization is used (with ntg=nyfft), where complete XY planes of ntg wavefunctions are collected and Fourier transformed in G space by different task groups. This is preferable to the Z-proc + Y-proc parallelization when task groups can be used, because a smaller number of larger amounts of data are transferred. Hence three types of fft are implemented: isgn = +-1: parallel 3d fft for rho and for the potential; isgn = +-2: parallel 3d fft for wavefunctions; isgn = +-3: parallel 3d fft for wavefunctions with task group. isgn = +: G-space to R-space, output = \sum_G f(G)exp(+iGR): fft along z using pencils (cft_1z), transpose across nodes (fft_scatter_yz), fft along y using pencils (cft_1y), transpose across nodes (fft_scatter_xy), fft along x using pencils (cft_1x). isgn = -: R-space to G-space, output = \int_R f(R)exp(-iGR)/Omega: fft along x using pencils (cft_1x), transpose across nodes (fft_scatter_xy), fft along y using pencils (cft_1y), transpose across nodes (fft_scatter_yz), fft along z using pencils (cft_1z). If task_group_fft_is_active, the FFT acts on a number of wfcs equal to dfft%nproc2, the number of Y-sections in which a plane is divided. Data are reshuffled by the fft_scatter_tg routine so that each of the dfft%nproc2 subgroups (made by dfft%nproc3 procs) deals with whole planes of a single wavefunction. fft_type module heavily modified: a number of variables renamed with more intuitive names (at least to me), and a number of additional variables introduced for the Y-proc parallelization. Task_group module made void. task_group management is now reduced to the logical component fft_desc%have_task_groups of the fft_type_descriptor type variable fft_desc. In terms of interfaces, the 'easy' calling sequences are SUBROUTINE invfft/fwfft( grid_type, f, dfft, howmany ), where: grid_type = 'Dense': inverse/direct fourier transform of potentials and charge density f on the dense grid (dfftp); grid_type = 'Smooth': inverse/direct fourier transform of potentials and charge density f on the smooth grid (dffts); grid_type = 'Wave': inverse/direct fourier transform of wave functions f on the smooth grid (dffts); grid_type = 'tgWave': inverse/direct fourier transform of wave functions f with task group on the smooth grid (dffts); grid_type = 'Custom': inverse/direct fourier transform of potentials and charge density f on a custom grid (dfft_exx); grid_type = 'CustomWave': inverse/direct fourier transform of wave functions f on a custom grid (dfft_exx). In every case, f is overwritten on output. dfft = FFT descriptor. IMPORTANT NOTICE: the grid is specified only by dfft. No check is performed on the correspondence between dfft and grid_type. grid_type is now used only to distinguish cases 'Wave' / 'CustomWave' from all other cases. Many more files modified. git-svn-id: http://qeforge.qe-forge.org/svn/q-e/trunk/espresso@13676 c92efa57-630b-4861-b058-cf58834340f0	2017-08-01 20:31:02 +00:00
giannozz	15215e2262	Compiled modules shouldn't be under revision control! git-svn-id: http://qeforge.qe-forge.org/svn/q-e/trunk/espresso@13663 c92efa57-630b-4861-b058-cf58834340f0	2017-07-31 16:43:50 +00:00
degironc	1b33777cbd	remove some timing printing. intra_pool_comm (the parent_comm of intra_bgrp_comm) should be the first argument of set_mpi_comm_4_XX routines. git-svn-id: http://qeforge.qe-forge.org/svn/q-e/trunk/espresso@13645 c92efa57-630b-4861-b058-cf58834340f0	2017-07-29 19:48:15 +00:00
degironc	4636bca635	KS_Solvers directory has been created with three subdirectories: KS_Solvers/CG, KS_Solvers/Davidson, KS_Solvers/Davidson_RCI. Two are currently used by QE; the third one implements the Davidson diagonalization within the Reverse Communication Interface paradigm, courtesy of Micael Oliveira. KS_Solvers routines depend only on lower level libraries, notably UtilXlib, LAXlib, (SCA)LAPACK, and BLAS. The reorganization can be improved. For instance, some duplicated routines like cdiaghg and rdiaghg could/should be moved into LAXlib. This could reduce the need to include KS_Solvers directories in the link step of many codes. Minimal changes to the calling sequence have been made, essentially just adding the h_psi, s_psi, g_psi and h_1psi, s_1psi routine names as arguments (with a specific calling sequence hardcoded inside the routines that agrees with the PWSCF one). This could be avoided by adopting the RCI paradigm. Compiled in serial and parallel; 177/182 pw tests passed (3 that were failing even before on my laptop: pw-berry, pw-langevin, pw-pawatom; + 2 unknown==not tested), 12/17 cp tests passed (some o2-us-para-pbe-X fail, but the same was true for the original version). I assume the modified calling procedure is working and the problem lies somewhere else. Randomly tested some examples in pw, ph, pwcond and it seems to work. Please report any problem. git-svn-id: http://qeforge.qe-forge.org/svn/q-e/trunk/espresso@13644 c92efa57-630b-4861-b058-cf58834340f0	2017-07-29 12:19:19 +00:00

50 Commits