giannozz
2249e5e536
Merge branch 'reduce_wait_openacc' into 'develop'
...
Reduce some overhead in openacc and compiler fix
See merge request QEF/q-e!2422
2024-09-03 05:55:34 +00:00
afonari
80060dba75
Merge branch q-e:develop into afonari-develop-patch-31461
2024-08-26 13:34:11 +00:00
afonari
65185fb5c7
Update pw_init_qexsd_input.f90. Always dump free positions, regardless of the calculation.
2024-08-26 13:31:23 +00:00
Laura Bellentani
31de484af1
opt[overhead,acc]: present and deviceptr moved outside the loop
...
This way the compiler checks only once if data are on the GPU
Reduces wait status in openacc regions
2024-08-22 13:47:05 +02:00
Paolo Giannozzi
d10d24c3a1
Useless BLAS interfaces removed
2024-08-21 09:24:51 +02:00
Ivan Carnimeo
f9411237de
Fix compilation with _OSCDFT after changes in h_psi_gpu/s_psi_acc
2024-08-14 11:19:41 +02:00
Ivan Carnimeo
d862590701
Fix ph and tddfpt with the new h_psi_gpu/s_psi_acc
2024-08-14 11:19:41 +02:00
Ivan Carnimeo
cba53beeeb
Fix Hubbard after changing h_psi_gpu/s_psi_acc
2024-08-14 11:19:40 +02:00
Ivan Carnimeo
ba9ca941f1
Fix ParO (both k and gamma cases)
2024-08-14 11:17:57 +02:00
Ivan Carnimeo
bd710c00ad
e_d to OpenACC in CG
2024-08-14 11:17:56 +02:00
Ivan Carnimeo
f1a5031d53
Fix CG at Gamma (CUF-->OpenACC)
...
(and cleanup some previous debug prints)
2024-08-14 11:17:56 +02:00
Ivan Carnimeo
aa2b133609
Fix CG with k points (CUF-->OpenACC)
2024-08-14 11:17:56 +02:00
Ivan Carnimeo
2af290518c
Remove calbec_cuf from h_psi_gpu:
...
- h_psi_gpu must receive psi as acc variable to remove calbec_cuf
- for consistency, also hpsi_d --> hpsi (acc) in h_psi_gpu
- no need to receive psi and spsi as deviceptr in s_psi_acc because evc is fully OpenACC
- psi_d and spsi_d --> psi, hpsi (acc) also in s_psi_acc
- fix cegterg and rotate_wfc_gpu (gamma/k) to work with the new h_psi_gpu and s_psi_acc:
pass psi, spsi, hpsi as acc variables instead of cuf
- only cegterg works (all others will be fixed in the next commits)
2024-08-14 11:17:56 +02:00
giannozz
a59e791975
Merge branch 'more_cuda_cleanup' into 'develop'
...
More cuda cleanup
See merge request QEF/q-e!2410
2024-08-13 07:01:00 +00:00
Ye Luo
cdf89662a7
Fix some lda+U hang related to device_resident
2024-08-12 18:10:32 -05:00
Paolo Giannozzi
238776494a
Variable declared "device_resident" was not a local one: unsafe?
2024-08-12 14:29:48 +02:00
Paolo Giannozzi
ed3fcc75ed
Heavily simplified force_us
2024-08-12 12:06:16 +02:00
Paolo Giannozzi
6254abcccb
Two more CUDA Fortran variables replaced by OpenACC ones
2024-08-12 12:00:53 +02:00
giannozz
7093afc527
Merge branch 'fix-omp-hang' into 'develop'
...
Fix openacc+omp hang
See merge request QEF/q-e!2409
2024-08-12 06:45:05 +00:00
giannozz
462fc6dde5
Merge branch 'stres_us_again' into 'develop'
...
Nonlocal stress reshuffling
See merge request QEF/q-e!2404
2024-08-12 06:44:48 +00:00
Ye Luo
331039c1b8
Fix nonlinear hang related to device_resident
2024-08-11 22:14:10 -05:00
Ye Luo
22e1ffcd0c
Fix metaGGA hang related to device_resident
2024-08-11 18:44:59 -05:00
Ye Luo
a837f6e7bf
device_resident seems to cause a hang with OMP threads>1.
2024-08-11 18:24:22 -05:00
Paolo Giannozzi
a35d80bfab
Variable "ofsbeta" must also be copied to GPU
2024-08-11 11:33:15 +02:00
Paolo Giannozzi
3f5ee4617e
Noncollinear case also works
2024-08-10 09:09:50 +02:00
Paolo Giannozzi
f5147abcbf
Seems to work for spinorbit (not plain noncolin yet)
2024-08-09 22:44:14 +02:00
Paolo Giannozzi
c448d65b78
Improved US stress - also k-points, noncollinear NOT YET IMPLEMENTED
2024-08-09 19:03:28 +02:00
Ivan Carnimeo
51e763aa1a
psi, hpsi, spsi to OpenACC in RMM-DIIS (k-points)
2024-08-09 15:07:42 +02:00
Ivan Carnimeo
d0cf28362f
psi, hpsi, spsi to OpenACC in RMM-DIIS at gamma
2024-08-08 19:52:51 +02:00
Ivan Carnimeo
43a191b735
Simplify rotate_xpsi and c_bands with OpenACC
2024-08-08 19:16:48 +02:00
Ivan Carnimeo
fc18953b26
Simplify RMM-DIIS with OpenACC
2024-08-08 18:49:35 +02:00
Pietro Davide Delugas
f029ce3686
openACC: add offloading in h_psi_meta.f90
...
* accepts device pointers as arguments instead of host arrays
* internal arrays mapped to Device and loops offloaded
2024-08-07 16:56:09 +02:00
Paolo Giannozzi
bfcd55645a
Gamma-only case works with OpenACC as well
2024-07-28 14:59:44 +02:00
Paolo Giannozzi
3979028fff
Now working in parallel execution as well
2024-07-28 13:23:03 +02:00
Paolo Giannozzi
a509854730
stres_us cleanup - First step: Gamma case, CPU only
2024-07-27 19:19:49 +02:00
Paolo Giannozzi
55bcbd59df
Dumb mistake in last commit
2024-07-24 11:58:50 +02:00
Paolo Giannozzi
03462f453b
Case many_fft=1 (not used on GPUs) was not correct
2024-07-22 15:30:05 +02:00
Paolo Giannozzi
8fef512db5
Small changes:
...
- in sum_band, rename get_rho_gpu => get_rho_k
- in vloc_psi_gpu, use "acc declare device_resident"
2024-07-22 11:41:25 +02:00
Paolo Giannozzi
1d48886730
Minor cleanup
2024-07-20 10:54:21 +02:00
Paolo Giannozzi
cef4a0ba5f
More cleanup, useless allocations removed
2024-07-20 10:29:44 +02:00
Paolo Giannozzi
8665c627c5
Large array allocation removed
2024-07-20 09:29:46 +02:00
Paolo Giannozzi
3dc27ea352
Cleanup: no more device variables
2024-07-18 21:46:04 +02:00
Paolo Giannozzi
a0e4e32dee
Removed a few vrs_d and scf_mod_gpu leftovers
2024-07-17 10:41:11 +02:00
Paolo Giannozzi
457c7d55ef
More Vloc*psi cleanup
...
Variable "vrs" used in the Hamiltonian is now an ACC variable, replaces vrs_d.
vrs is copied to device in set_vrs. Obsolete using_vrs* machinery deleted.
2024-07-17 10:04:16 +02:00
Paolo Giannozzi
7bdae85022
Some vloc_psi cleanup
2024-07-16 08:28:13 +02:00
Paolo Giannozzi
5d61187d27
Meta-GGA case in sum_band brought to GPU, plus some cleanup.
...
Still too many differences between noncollinear, metaGGA, gamma and k cases
2024-07-12 11:28:17 +02:00
Paolo Giannozzi
24c7d7d7c7
Calculation of becsum moved to compute_becsum for increased (IMHO) readability
2024-07-10 21:15:56 +02:00
Paolo Giannozzi
73c5cc1606
becsum cleanup, better comments
2024-07-10 17:47:38 +02:00
Paolo Giannozzi
1d066441d8
In charge density calculation, separate task-group code (to be deleted sooner
...
or later) from the rest. Makes the code much more readable. Misc cleanup.
2024-07-10 14:30:01 +02:00
Paolo Giannozzi
bb1e876b9b
In some cases dynmat.x must be run with mpirun or similar
2024-07-07 19:56:31 +02:00