mirror of https://github.com/abinit/abinit.git
.Version 8.0.3 of FFTPROF
.(sequential version, prepared for a x86_64_linux_gnu4.6 computer)

.Copyright (C) 1998-2025 ABINIT group .
FFTPROF comes with ABSOLUTELY NO WARRANTY.
It is free software, and you are welcome to redistribute it
under certain conditions (GNU General Public License,
see ~abinit/COPYING or http://www.gnu.org/copyleft/gpl.txt).

ABINIT is a project of the Universite Catholique de Louvain,
Corning Inc. and other collaborators, see ~abinit/doc/developers/contributors.txt .
Please read https://docs.abinit.org/theory/acknowledgments for suggested
acknowledgments of the ABINIT effort.
For more information, see https://www.abinit.org .

.Starting date : Mon 4 Apr 2016.
- ( at 22h45 )

Tool for profiling and testing the FFT libraries used in ABINIT.
Allowed options are:
fourdp --> Test FFT transforms of density and potentials on the full box.
fourwf --> Test FFT transforms of wavefunctions using the zero-pad algorithm.
gw_fft --> Test the FFT transforms used in the GW code.
all --> Test all FFT routines.


==== OpenMP parallelism is ON ====
- Max_threads: 2
- Num_threads: 2
- Num_procs: 4
- Dynamic: F
- Nested: F

Real(R)+Recip(G) space primitive vectors, cartesian coordinates (Bohr,Bohr^-1):
R(1)= 20.0000000  0.0000000  0.0000000  G(1)=  0.0500000  0.0000000  0.0000000
R(2)=  0.0000000 20.0000000  0.0000000  G(2)=  0.0000000  0.0500000  0.0000000
R(3)=  0.0000000  0.0000000 20.0000000  G(3)=  0.0000000  0.0000000  0.0500000
Unit cell volume ucvol= 8.0000000E+03 bohr^3
Unit cell volume ucvol= 8.0000000E+03 bohr^3
Angles (23,13,12)= 9.00000000E+01 9.00000000E+01 9.00000000E+01 degrees
Angles (23,13,12)= 9.00000000E+01 9.00000000E+01 9.00000000E+01 degrees

==== FFT setup for fftalg 110 ====
FFT mesh divisions ........................ 100 100 100
Augmented FFT divisions ................... 101 101 100
FFT algorithm ............................. 110
FFT cache size ............................ 16

==== FFT setup for fftalg 111 ====
FFT mesh divisions ........................ 100 100 100
Augmented FFT divisions ................... 101 101 100
FFT algorithm ............................. 111
FFT cache size ............................ 16

==== FFT setup for fftalg 112 ====
FFT mesh divisions ........................ 100 100 100
Augmented FFT divisions ................... 101 101 100
FFT algorithm ............................. 112
FFT cache size ............................ 16

==== FFT setup for fftalg 410 ====
FFT mesh divisions ........................ 100 100 100
Augmented FFT divisions ................... 101 101 100
FFT algorithm ............................. 410
FFT cache size ............................ 16

==== FFT setup for fftalg 411 ====
FFT mesh divisions ........................ 100 100 100
Augmented FFT divisions ................... 101 101 100
FFT algorithm ............................. 411
FFT cache size ............................ 16

==== FFT setup for fftalg 412 ====
FFT mesh divisions ........................ 100 100 100
Augmented FFT divisions ................... 101 101 100
FFT algorithm ............................. 412
FFT cache size ............................ 16

==== FFT setup for fftalg 312 ====
FFT mesh divisions ........................ 100 100 100
Augmented FFT divisions ................... 101 101 100
FFT algorithm ............................. 312
FFT cache size ............................ 16

==== FFT setup for fftalg 512 ====
FFT mesh divisions ........................ 100 100 100
Augmented FFT divisions ................... 101 101 100
FFT algorithm ............................. 512
FFT cache size ............................ 16

==============================================================
==== fourwf with option 0, cplex 0, ndat 1, istwf_k 1 ====
==============================================================
Library                CPU-time  WALL-time  nthreads  ncalls  Max_|Err|   <|Err|>
- Goedecker (110)        0.0870     0.0870  1 (100%)       2   0.00E+00  0.00E+00
- Goedecker (110)        0.1090     0.0665  2 ( 65%)       2   0.00E+00  0.00E+00
- Goedecker (110)        0.1325     0.0590  3 ( 49%)       2   0.00E+00  0.00E+00
- Goedecker (110)        0.1450     0.0710  4 ( 31%)       2   0.00E+00  0.00E+00
- Goedecker (111)        0.0810     0.0810  1 (100%)       2   0.00E+00  0.00E+00
- Goedecker (111)        0.0925     0.0665  2 ( 61%)       2   0.00E+00  0.00E+00
- Goedecker (111)        0.1115     0.0650  3 ( 42%)       2   0.00E+00  0.00E+00
- Goedecker (111)        0.1295     0.0735  4 ( 28%)       2   0.00E+00  0.00E+00
- Goedecker (112)        0.0385     0.0390  1 (100%)       2   5.86E-14  1.94E-15
- Goedecker (112)        0.0480     0.0260  2 ( 75%)       2   5.86E-14  1.94E-15
- Goedecker (112)        0.0560     0.0220  3 ( 59%)       2   5.86E-14  1.94E-15
- Goedecker (112)        0.1075     0.0430  4 ( 23%)       2   5.86E-14  1.94E-15
- Goedecker2002 (410)    0.0925     0.0925  1 (100%)       2   6.08E-14  1.96E-15
- Goedecker2002 (410)    0.1010     0.1010  2 ( 46%)       2   6.08E-14  1.96E-15
- Goedecker2002 (410)    0.0925     0.0925  3 ( 33%)       2   6.08E-14  1.96E-15
- Goedecker2002 (410)    0.0975     0.0975  4 ( 24%)       2   6.08E-14  1.96E-15
- Goedecker2002 (411)    0.0345     0.0345  1 (100%)       2   6.08E-14  1.96E-15
- Goedecker2002 (411)    0.0340     0.0340  2 ( 51%)       2   6.08E-14  1.96E-15
- Goedecker2002 (411)    0.0335     0.0335  3 ( 34%)       2   6.08E-14  1.96E-15
- Goedecker2002 (411)    0.0305     0.0305  4 ( 28%)       2   6.08E-14  1.96E-15
- Goedecker2002 (412)    0.0310     0.0310  1 (100%)       2   6.08E-14  1.96E-15
- Goedecker2002 (412)    0.0305     0.0305  2 ( 51%)       2   6.08E-14  1.96E-15
- Goedecker2002 (412)    0.0315     0.0315  3 ( 33%)       2   6.08E-14  1.96E-15
- Goedecker2002 (412)    0.0360     0.0360  4 ( 22%)       2   6.08E-14  1.96E-15
- FFTW3 (312)               N/A        N/A  N/A          N/A        N/A       N/A
- FFTW3 (312)               N/A        N/A  N/A          N/A        N/A       N/A
- FFTW3 (312)               N/A        N/A  N/A          N/A        N/A       N/A
- FFTW3 (312)               N/A        N/A  N/A          N/A        N/A       N/A
- DFTI (512)                N/A        N/A  N/A          N/A        N/A       N/A
- DFTI (512)                N/A        N/A  N/A          N/A        N/A       N/A
- DFTI (512)                N/A        N/A  N/A          N/A        N/A       N/A
- DFTI (512)                N/A        N/A  N/A          N/A        N/A       N/A

Consistency check: MAX(Max_|Err|) = 6.08E-14, Max(<|Err|>) = 1.96E-15, reference_lib: Goedecker (110)


==============================================================
==== fourwf with option 1, cplex 1, ndat 1, istwf_k 1 ====
==============================================================
Library                CPU-time  WALL-time  nthreads  ncalls  Max_|Err|   <|Err|>
- Goedecker (110)        0.1045     0.1045  1 (100%)       2   0.00E+00  0.00E+00
- Goedecker (110)        0.1240     0.0705  2 ( 74%)       2   0.00E+00  0.00E+00
- Goedecker (110)        0.1510     0.0670  3 ( 52%)       2   0.00E+00  0.00E+00
- Goedecker (110)        0.1515     0.0645  4 ( 41%)       2   0.00E+00  0.00E+00
- Goedecker (111)        0.0605     0.0605  1 (100%)       2   0.00E+00  0.00E+00
- Goedecker (111)        0.0985     0.0655  2 ( 46%)       2   0.00E+00  0.00E+00
- Goedecker (111)        0.1195     0.0700  3 ( 29%)       2   0.00E+00  0.00E+00
- Goedecker (111)        0.1495     0.0805  4 ( 19%)       2   0.00E+00  0.00E+00
- Goedecker (112)        0.0385     0.0385  1 (100%)       2   2.18E-11  1.42E-14
- Goedecker (112)        0.0385     0.0210  2 ( 92%)       2   2.18E-11  1.42E-14
- Goedecker (112)        0.0575     0.0290  3 ( 44%)       2   2.18E-11  1.42E-14
- Goedecker (112)        0.0970     0.0455  4 ( 21%)       2   2.18E-11  1.42E-14
- Goedecker2002 (410)    0.1065     0.1065  1 (100%)       2   2.18E-11  1.44E-14
- Goedecker2002 (410)    0.1100     0.1100  2 ( 48%)       2   2.18E-11  1.44E-14
- Goedecker2002 (410)    0.0985     0.0990  3 ( 36%)       2   2.18E-11  1.44E-14
- Goedecker2002 (410)    0.0830     0.0830  4 ( 32%)       2   2.18E-11  1.44E-14
- Goedecker2002 (411)    0.0375     0.0370  1 (100%)       2   2.18E-11  1.44E-14
- Goedecker2002 (411)    0.0605     0.0605  2 ( 31%)       2   2.18E-11  1.44E-14
- Goedecker2002 (411)    0.0420     0.0415  3 ( 30%)       2   2.18E-11  1.44E-14
- Goedecker2002 (411)    0.0445     0.0445  4 ( 21%)       2   2.18E-11  1.44E-14
- Goedecker2002 (412)    0.0380     0.0380  1 (100%)       2   2.18E-11  1.44E-14
- Goedecker2002 (412)    0.0320     0.0320  2 ( 59%)       2   2.18E-11  1.44E-14
- Goedecker2002 (412)    0.0365     0.0370  3 ( 34%)       2   2.18E-11  1.44E-14
- Goedecker2002 (412)    0.0400     0.0400  4 ( 24%)       2   2.18E-11  1.44E-14
- FFTW3 (312)               N/A        N/A  N/A          N/A        N/A       N/A
- FFTW3 (312)               N/A        N/A  N/A          N/A        N/A       N/A
- FFTW3 (312)               N/A        N/A  N/A          N/A        N/A       N/A
- FFTW3 (312)               N/A        N/A  N/A          N/A        N/A       N/A
- DFTI (512)                N/A        N/A  N/A          N/A        N/A       N/A
- DFTI (512)                N/A        N/A  N/A          N/A        N/A       N/A
- DFTI (512)                N/A        N/A  N/A          N/A        N/A       N/A
- DFTI (512)                N/A        N/A  N/A          N/A        N/A       N/A

Consistency check: MAX(Max_|Err|) = 2.18E-11, Max(<|Err|>) = 1.44E-14, reference_lib: Goedecker (110)


==============================================================
==== fourwf with option 2, cplex 1, ndat 1, istwf_k 1 ====
==============================================================
Library                CPU-time  WALL-time  nthreads  ncalls  Max_|Err|   <|Err|>
- Goedecker (110)        0.1600     0.1600  1 (100%)       2   0.00E+00  0.00E+00
- Goedecker (110)        0.1810     0.0990  2 ( 81%)       2   0.00E+00  0.00E+00
- Goedecker (110)        0.2665     0.1035  3 ( 52%)       2   0.00E+00  0.00E+00
- Goedecker (110)        0.3130     0.1315  4 ( 30%)       2   0.00E+00  0.00E+00
- Goedecker (111)        0.1140     0.1140  1 (100%)       2   2.22E-16  1.86E-19
- Goedecker (111)        0.1550     0.0950  2 ( 60%)       2   2.22E-16  1.86E-19
- Goedecker (111)        0.1935     0.0920  3 ( 41%)       2   2.22E-16  1.86E-19
- Goedecker (111)        0.2335     0.1095  4 ( 26%)       2   2.22E-16  1.86E-19
- Goedecker (112)        0.0505     0.0505  1 (100%)       2   2.23E-16  2.39E-19
- Goedecker (112)        0.0570     0.0285  2 ( 89%)       2   2.23E-16  2.39E-19
- Goedecker (112)        0.0700     0.0235  3 ( 72%)       2   2.23E-16  2.39E-19
- Goedecker (112)        0.1355     0.0520  4 ( 24%)       2   2.23E-16  2.39E-19
- Goedecker2002 (410)    0.1605     0.1610  1 (100%)       2   3.33E-16  2.53E-19
- Goedecker2002 (410)    0.1730     0.1690  2 ( 48%)       2   3.33E-16  2.53E-19
- Goedecker2002 (410)    0.1845     0.1750  3 ( 31%)       2   3.33E-16  2.53E-19
- Goedecker2002 (410)    0.1820     0.1690  4 ( 24%)       2   3.33E-16  2.53E-19
- Goedecker2002 (411)    0.0700     0.0705  1 (100%)       2   3.33E-16  2.53E-19
- Goedecker2002 (411)    0.0585     0.0580  2 ( 61%)       2   3.33E-16  2.53E-19
- Goedecker2002 (411)    0.0575     0.0575  3 ( 41%)       2   3.33E-16  2.53E-19
- Goedecker2002 (411)    0.0725     0.0730  4 ( 24%)       2   3.33E-16  2.53E-19
- Goedecker2002 (412)    0.0570     0.0570  1 (100%)       2   3.33E-16  2.53E-19
- Goedecker2002 (412)    0.0510     0.0510  2 ( 56%)       2   3.33E-16  2.53E-19
- Goedecker2002 (412)    0.0505     0.0510  3 ( 37%)       2   3.33E-16  2.53E-19
- Goedecker2002 (412)    0.0565     0.0565  4 ( 25%)       2   3.33E-16  2.53E-19
- FFTW3 (312)               N/A        N/A  N/A          N/A        N/A       N/A
- FFTW3 (312)               N/A        N/A  N/A          N/A        N/A       N/A
- FFTW3 (312)               N/A        N/A  N/A          N/A        N/A       N/A
- FFTW3 (312)               N/A        N/A  N/A          N/A        N/A       N/A
- DFTI (512)                N/A        N/A  N/A          N/A        N/A       N/A
- DFTI (512)                N/A        N/A  N/A          N/A        N/A       N/A
- DFTI (512)                N/A        N/A  N/A          N/A        N/A       N/A
- DFTI (512)                N/A        N/A  N/A          N/A        N/A       N/A

Consistency check: MAX(Max_|Err|) = 3.33E-16, Max(<|Err|>) = 2.53E-19, reference_lib: Goedecker (110)


==============================================================
==== fourwf with option 3, cplex 0, ndat 1, istwf_k 1 ====
==============================================================
Library                CPU-time  WALL-time  nthreads  ncalls  Max_|Err|   <|Err|>
- Goedecker (110)        0.0645     0.0645  1 (100%)       2   0.00E+00  0.00E+00
- Goedecker (110)        0.0955     0.0480  2 ( 67%)       2   0.00E+00  0.00E+00
- Goedecker (110)        0.1215     0.0565  3 ( 38%)       2   0.00E+00  0.00E+00
- Goedecker (110)        0.1430     0.0710  4 ( 23%)       2   0.00E+00  0.00E+00
- Goedecker (111)        0.0510     0.0515  1 (100%)       2   1.12E-16  6.13E-20
- Goedecker (111)        0.0900     0.0620  2 ( 42%)       2   1.12E-16  6.13E-20
- Goedecker (111)        0.0870     0.0415  3 ( 41%)       2   1.12E-16  6.13E-20
- Goedecker (111)        0.1150     0.0675  4 ( 19%)       2   1.12E-16  6.13E-20
- Goedecker (112)        0.0530     0.0530  1 (100%)       2   1.12E-16  6.13E-20
- Goedecker (112)        0.0650     0.0425  2 ( 62%)       2   1.12E-16  6.13E-20
- Goedecker (112)        0.0900     0.0380  3 ( 46%)       2   1.12E-16  6.13E-20
- Goedecker (112)        0.1115     0.0545  4 ( 24%)       2   1.12E-16  6.13E-20
- Goedecker2002 (410)    0.0500     0.0500  1 (100%)       2   2.22E-16  5.34E-20
- Goedecker2002 (410)    0.0510     0.0495  2 ( 51%)       2   2.22E-16  5.34E-20
- Goedecker2002 (410)    0.0635     0.0595  3 ( 28%)       2   2.22E-16  5.34E-20
- Goedecker2002 (410)    0.0650     0.0600  4 ( 21%)       2   2.22E-16  5.34E-20
- Goedecker2002 (411)    0.0260     0.0260  1 (100%)       2   2.22E-16  5.34E-20
- Goedecker2002 (411)    0.0280     0.0280  2 ( 46%)       2   2.22E-16  5.34E-20
- Goedecker2002 (411)    0.0265     0.0265  3 ( 33%)       2   2.22E-16  5.34E-20
- Goedecker2002 (411)    0.0280     0.0280  4 ( 23%)       2   2.22E-16  5.34E-20
- Goedecker2002 (412)    0.0260     0.0260  1 (100%)       2   2.22E-16  5.34E-20
- Goedecker2002 (412)    0.0280     0.0280  2 ( 46%)       2   2.22E-16  5.34E-20
- Goedecker2002 (412)    0.0260     0.0255  3 ( 34%)       2   2.22E-16  5.34E-20
- Goedecker2002 (412)    0.0235     0.0235  4 ( 28%)       2   2.22E-16  5.34E-20
- FFTW3 (312)               N/A        N/A  N/A          N/A        N/A       N/A
- FFTW3 (312)               N/A        N/A  N/A          N/A        N/A       N/A
- FFTW3 (312)               N/A        N/A  N/A          N/A        N/A       N/A
- FFTW3 (312)               N/A        N/A  N/A          N/A        N/A       N/A
- DFTI (512)                N/A        N/A  N/A          N/A        N/A       N/A
- DFTI (512)                N/A        N/A  N/A          N/A        N/A       N/A
- DFTI (512)                N/A        N/A  N/A          N/A        N/A       N/A
- DFTI (512)                N/A        N/A  N/A          N/A        N/A       N/A

Consistency check: MAX(Max_|Err|) = 2.22E-16, Max(<|Err|>) = 6.13E-20, reference_lib: Goedecker (110)


==============================================================
==== fourwf with option 2, cplex 2, ndat 1, istwf_k 1 ====
==============================================================
Library                CPU-time  WALL-time  nthreads  ncalls  Max_|Err|   <|Err|>
- Goedecker (110)        0.1900     0.1900  1 (100%)       2   0.00E+00  0.00E+00
- Goedecker (110)        0.2580     0.1535  2 ( 62%)       2   0.00E+00  0.00E+00
- Goedecker (110)        0.2340     0.0920  3 ( 69%)       2   0.00E+00  0.00E+00
- Goedecker (110)        0.3080     0.1580  4 ( 30%)       2   0.00E+00  0.00E+00
- Goedecker (111)        0.1755     0.1760  1 (100%)       2   3.33E-16  2.19E-19
- Goedecker (111)        0.2015     0.1480  2 ( 59%)       2   3.33E-16  2.19E-19
- Goedecker (111)        0.1815     0.0890  3 ( 66%)       2   3.33E-16  2.19E-19
- Goedecker (111)        0.2175     0.1150  4 ( 38%)       2   3.33E-16  2.19E-19
- Goedecker (112)        0.0640     0.0640  1 (100%)       2   3.33E-16  3.05E-19
- Goedecker (112)        0.0715     0.0405  2 ( 79%)       2   3.33E-16  3.05E-19
- Goedecker (112)        0.0720     0.0325  3 ( 66%)       2   3.33E-16  3.05E-19
- Goedecker (112)        0.0955     0.0430  4 ( 37%)       2   3.33E-16  3.05E-19
- Goedecker2002 (410)    0.1830     0.1830  1 (100%)       2   3.34E-16  3.19E-19
- Goedecker2002 (410)    0.1480     0.1430  2 ( 64%)       2   3.34E-16  3.19E-19
- Goedecker2002 (410)    0.1730     0.1625  3 ( 38%)       2   3.34E-16  3.19E-19
- Goedecker2002 (410)    0.1735     0.1595  4 ( 29%)       2   3.34E-16  3.19E-19
- Goedecker2002 (411)    0.0715     0.0715  1 (100%)       2   3.34E-16  3.19E-19
- Goedecker2002 (411)    0.0690     0.0685  2 ( 52%)       2   3.34E-16  3.19E-19
- Goedecker2002 (411)    0.0615     0.0615  3 ( 39%)       2   3.34E-16  3.19E-19
- Goedecker2002 (411)    0.0635     0.0630  4 ( 28%)       2   3.34E-16  3.19E-19
- Goedecker2002 (412)    0.0495     0.0495  1 (100%)       2   3.34E-16  3.19E-19
- Goedecker2002 (412)    0.0505     0.0505  2 ( 49%)       2   3.34E-16  3.19E-19
- Goedecker2002 (412)    0.0495     0.0495  3 ( 33%)       2   3.34E-16  3.19E-19
- Goedecker2002 (412)    0.0510     0.0510  4 ( 24%)       2   3.34E-16  3.19E-19
- FFTW3 (312)               N/A        N/A  N/A          N/A        N/A       N/A
- FFTW3 (312)               N/A        N/A  N/A          N/A        N/A       N/A
- FFTW3 (312)               N/A        N/A  N/A          N/A        N/A       N/A
- FFTW3 (312)               N/A        N/A  N/A          N/A        N/A       N/A
- DFTI (512)                N/A        N/A  N/A          N/A        N/A       N/A
- DFTI (512)                N/A        N/A  N/A          N/A        N/A       N/A
- DFTI (512)                N/A        N/A  N/A          N/A        N/A       N/A
- DFTI (512)                N/A        N/A  N/A          N/A        N/A       N/A

Consistency check: MAX(Max_|Err|) = 3.34E-16, Max(<|Err|>) = 3.19E-19, reference_lib: Goedecker (110)


Analysis completed.