Commit Graph

6620 Commits

Author SHA1 Message Date
giannozz 2249e5e536 Merge branch 'reduce_wait_openacc' into 'develop'
Reduce some overhead in openacc and compiler fix

See merge request QEF/q-e!2422
2024-09-03 05:55:34 +00:00
afonari 80060dba75 Merge branch q-e:develop into afonari-develop-patch-31461 2024-08-26 13:34:11 +00:00
afonari 65185fb5c7 Update pw_init_qexsd_input.f90. Always dump free positions, regardless of the calculation. 2024-08-26 13:31:23 +00:00
Laura Bellentani 31de484af1 opt[overhead,acc]: present and deviceptr moved outside the loop
This way the compiler checks only once if data are on the GPU
Reduces wait status in openacc regions
2024-08-22 13:47:05 +02:00
Paolo Giannozzi d10d24c3a1 Useless BLAS interfaces removed 2024-08-21 09:24:51 +02:00
Ivan Carnimeo f9411237de Fix compilation with _OSCDFT after changes in h_psi_gpu/s_psi_acc 2024-08-14 11:19:41 +02:00
Ivan Carnimeo d862590701 Fix ph and tddfpt with the new h_psi_gpu/s_psi_acc 2024-08-14 11:19:41 +02:00
Ivan Carnimeo cba53beeeb Fix Hubbard after changing h_psi_gpu/s_psi_acc 2024-08-14 11:19:40 +02:00
Ivan Carnimeo ba9ca941f1 Fix ParO (both k and gamma cases) 2024-08-14 11:17:57 +02:00
Ivan Carnimeo bd710c00ad e_d to OpenACC in CG 2024-08-14 11:17:56 +02:00
Ivan Carnimeo f1a5031d53 Fix CG at Gamma (CUF-->OpenACC)
(and cleanup some previous debug prints)
2024-08-14 11:17:56 +02:00
Ivan Carnimeo aa2b133609 Fix CG with k points (CUF-->OpenACC) 2024-08-14 11:17:56 +02:00
Ivan Carnimeo 2af290518c Remove calbec_cuf from h_psi_gpu:
- h_psi_gpu must receive psi as acc variable to remove calbec_cuf
- for consistency, also hpsi_d --> hpsi (acc) in h_psi_gpu
- no need to receive psi and spsi as deviceptr in s_psi_acc because evc is fully OpenACC
- psi_d and spsi_d --> psi, hpsi (acc) also in s_psi_acc
- fix cegterg and rotate_wfc_gpu (gamma/k) to work with the new h_psi_gpu and s_psi_acc:
  pass psi, spsi, hpsi as acc variables instead of cuf
- only cegterg works (all others will be fixed in the next commits)
2024-08-14 11:17:56 +02:00
giannozz a59e791975 Merge branch 'more_cuda_cleanup' into 'develop'
More cuda cleanup

See merge request QEF/q-e!2410
2024-08-13 07:01:00 +00:00
Ye Luo cdf89662a7 Fix some lda+U hang related to device_resident 2024-08-12 18:10:32 -05:00
Paolo Giannozzi 238776494a Variable declared "device_resident" was not a local one: unsafe? 2024-08-12 14:29:48 +02:00
Paolo Giannozzi ed3fcc75ed Heavily simplified force_us 2024-08-12 12:06:16 +02:00
Paolo Giannozzi 6254abcccb Two more CUDA Fortran variables replaced by OpenACC ones 2024-08-12 12:00:53 +02:00
giannozz 7093afc527 Merge branch 'fix-omp-hang' into 'develop'
Fix openacc+omp hang

See merge request QEF/q-e!2409
2024-08-12 06:45:05 +00:00
giannozz 462fc6dde5 Merge branch 'stres_us_again' into 'develop'
Nonlocal stress reshuffling

See merge request QEF/q-e!2404
2024-08-12 06:44:48 +00:00
Ye Luo 331039c1b8 Fix nonlinear hang related to device_resident 2024-08-11 22:14:10 -05:00
Ye Luo 22e1ffcd0c Fix metaGGA hang related to device_resident 2024-08-11 18:44:59 -05:00
Ye Luo a837f6e7bf device_resident seems to cause a hang with OMP threads>1. 2024-08-11 18:24:22 -05:00
Paolo Giannozzi a35d80bfab Variable "ofsbeta" must also be copied to GPU 2024-08-11 11:33:15 +02:00
Paolo Giannozzi 3f5ee4617e Noncollinear case also works 2024-08-10 09:09:50 +02:00
Paolo Giannozzi f5147abcbf Seems to work for spinorbit (not plain noncolin yet) 2024-08-09 22:44:14 +02:00
Paolo Giannozzi c448d65b78 Improved US stress - also k-points, non collinear NOT YET IMPLEMENTED 2024-08-09 19:03:28 +02:00
Ivan Carnimeo 51e763aa1a psi, hpsi, spsi to OpenACC in RMM-DIIS (k-points) 2024-08-09 15:07:42 +02:00
Ivan Carnimeo d0cf28362f psi, hpsi, spsi to OpenACC in RMM-DIIS at gamma 2024-08-08 19:52:51 +02:00
Ivan Carnimeo 43a191b735 Simplify rotate_xpsi and c_bands with OpenACC 2024-08-08 19:16:48 +02:00
Ivan Carnimeo fc18953b26 Simplify RMM-DIIS with OpenACC 2024-08-08 18:49:35 +02:00
Pietro Davide Delugas f029ce3686 OpenACC: add offloading in h_psi_meta.f90
* accepts device pointers as arguments instead of host arrays

* internal arrays mapped to Device and loops offloaded
2024-08-07 16:56:09 +02:00
Paolo Giannozzi bfcd55645a Gamma-only case works with OpenACC as well 2024-07-28 14:59:44 +02:00
Paolo Giannozzi 3979028fff Now working in parallel execution as well 2024-07-28 13:23:03 +02:00
Paolo Giannozzi a509854730 stres_us cleanup - First step: Gamma case, CPU only 2024-07-27 19:19:49 +02:00
Paolo Giannozzi 55bcbd59df Dumb mistake in last commit 2024-07-24 11:58:50 +02:00
Paolo Giannozzi 03462f453b Case many_fft=1 (not used on GPUs) was not correct 2024-07-22 15:30:05 +02:00
Paolo Giannozzi 8fef512db5 Small changes:
- in sum_band, rename get_rho_gpu => get_rho_k
- in vloc_psi_gpu, use "acc declare device_resident"
2024-07-22 11:41:25 +02:00
Paolo Giannozzi 1d48886730 Minor cleanup 2024-07-20 10:54:21 +02:00
Paolo Giannozzi cef4a0ba5f More cleanup, useless allocations removed 2024-07-20 10:29:44 +02:00
Paolo Giannozzi 8665c627c5 Large array allocation removed 2024-07-20 09:29:46 +02:00
Paolo Giannozzi 3dc27ea352 Cleanup: no more device variables 2024-07-18 21:46:04 +02:00
Paolo Giannozzi a0e4e32dee Removed a few vrs_d and scf_mod_gpu leftover 2024-07-17 10:41:11 +02:00
Paolo Giannozzi 457c7d55ef More Vloc*psi cleanup
Variable "vrs" used in the Hamiltonian is now an ACC variable, replaces vrs_d.
vrs is copied to device in set_vrs. Obsolete using_vrs* machinery deleted.
2024-07-17 10:04:16 +02:00
Paolo Giannozzi 7bdae85022 Some vloc_psi cleanup 2024-07-16 08:28:13 +02:00
Paolo Giannozzi 5d61187d27 Meta-GGA case in sum_band brought to GPU, plus some cleanup.
Still too many differences between noncollinear, metaGGA, gamma and k cases
2024-07-12 11:28:17 +02:00
Paolo Giannozzi 24c7d7d7c7 Calculation of becsum moved to compute_becsum for increased (IMHO) readability 2024-07-10 21:15:56 +02:00
Paolo Giannozzi 73c5cc1606 becsum cleanup, better comments 2024-07-10 17:47:38 +02:00
Paolo Giannozzi 1d066441d8 In charge density calculation, separate task-group code (to be deleted sooner
or later) from the rest. Makes the code much more readable. Misc cleanup.
2024-07-10 14:30:01 +02:00
Paolo Giannozzi bb1e876b9b In some cases dynmat.x must be run with mpirun or similar 2024-07-07 19:56:31 +02:00