README.CRAY-XK7.CSCS_todi
Info by Filippo Spiga, Oct. 2012, valid for any version of QE after 5.

Machine name    : TODI (Cray XK7) at CSCS, Lugano (CH)
Machine spec    : http://user.cscs.ch/hardware/todi_cray_xk7/index.html
Similar systems : TITAN (ORNL, USA)

1. Compile the code

... starting from an SVN checkout ...

   module switch PrgEnv-cray PrgEnv-intel
   module load cudatoolkit/5.0.33.103
   module unload atp hss-llm

   cd espresso
   cd GPU/
   ./configure --enable-openmp --enable-cuda --with-gpu-arch=35 \
       --with-cuda-dir=${CRAY_CUDATOOLKIT_DIR} --disable-magma \
       --disable-profiling --enable-phigemm --enable-parallel \
       --with-scalapack ARCH=crayxt
   cd ..
   make -f Makefile.gpu all-gpu    (or just "make -f Makefile.gpu pw-gpu")

Executables will be located under "./bin".

IMPORTANT NOTE (!): only pw-gpu.x, neb-gpu.x and ph-gpu.x make extensive
use of the GPU in multiple sections of the code. All the other
executables exploit the GPU only through the phiGEMM library (for now).

IMPORTANT NOTE: not all the codes are as large or as computationally
intensive as PWscf. To generate the codes missing from the list above,
run "make -f Makefile.gpu distclean" and follow the instructions in the
file "README.CRAY-XE6.CSCS_rosa".

IMPORTANT NOTE: the CPU-only code supports the PGI, GNU and Intel
compilers. The GPU+CPU code supports *ONLY* the Intel compiler. A bug
report has been filed with PGI and NVIDIA.

2. Good practices

- Each NVIDIA Tesla K20 GPU has 6 GB of memory on the card. It is
  better to limit the number of MPI processes per node (i.e. the number
  of MPI processes sharing the same GPU) to 2.

- If the calculation is not too memory-demanding, it is possible to
  increase the MPI:GPU ratio up to 4. The Hyper-Q technology helps
  multiple processes exploit the GPU at its best (see the variant
  sketched after the example script below).

- In order to share the GPU between multiple MPI processes within a
  node, it is mandatory to export the variable CRAY_CUDA_PROXY
  ("export CRAY_CUDA_PROXY=1").

3. Example scripts

   #SBATCH --job-name="QE-BENCH-SPIGA"
   #SBATCH --nodes=64
   # REMEMBER: --ntasks-per-node * --cpus-per-task <= 16
   #SBATCH --ntasks-per-node=2
   #SBATCH --cpus-per-task=8
   #SBATCH --time=02:00:00
   #SBATCH --output=QE-BENCH.%j.o
   #SBATCH --error=QE-BENCH.%j.e
   #SBATCH --account=<...>

   echo "The current job ID is $SLURM_JOB_ID"
   echo "Running on $SLURM_NNODES nodes"
   echo "Using $SLURM_NTASKS_PER_NODE tasks per node"
   echo "A total of $SLURM_NPROCS tasks is used"

   export OMP_NUM_THREADS=8
   export CRAY_CUDA_PROXY=1
   export MALLOC_MMAP_MAX_=0
   export MALLOC_TRIM_THRESHOLD_=536870912

   #export MPICH_VERSION_DISPLAY=1
   #export MPICH_ENV_DISPLAY=1

   aprun -n $SLURM_NPROCS -N 2 -d 8 ./pw-gpu.x -input <...> | tee out
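As a sketch of the higher MPI:GPU ratio described in section 2 (an
illustrative variant, not part of the original script): keeping
--ntasks-per-node * --cpus-per-task <= 16, one can run 4 MPI processes
per node with 4 OpenMP threads each. Only the lines below change with
respect to the script above; CRAY_CUDA_PROXY remains mandatory because
four MPI ranks now share each GPU.

   #SBATCH --ntasks-per-node=4
   #SBATCH --cpus-per-task=4

   export OMP_NUM_THREADS=4
   # mandatory whenever several MPI ranks share one GPU
   export CRAY_CUDA_PROXY=1

   aprun -n $SLURM_NPROCS -N 4 -d 4 ./pw-gpu.x -input <...> | tee out

Use this only for calculations that fit within the 6 GB of GPU memory;
otherwise stay with 2 MPI processes per node.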
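To submit the job, save the script under some name, e.g. todi-qe.slurm
(a hypothetical name chosen here for illustration), then:

   sbatch todi-qe.slurm
   squeue -u $USER      # check the job status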