quantum-espresso/install
marsamos 33a8250aba change plumed link, change other plugin links to 5.0.2
git-svn-id: http://qeforge.qe-forge.org/svn/q-e/trunk/espresso@9639 c92efa57-630b-4861-b058-cf58834340f0
2012-11-21 14:44:32 +00:00
Make.BGP eliminated Multigrid dep in Make.something in dir install 2011-03-24 15:44:01 +00:00
Make.BGP-openMP eliminated Multigrid dep in Make.something in dir install 2011-03-24 15:44:01 +00:00
Make.BGP-openMP+FFTW Recent make.sys sample for BG, may be useful 2012-06-28 15:44:23 +00:00
Make.altix eliminated Multigrid dep in Make.something in dir install 2011-03-24 15:44:01 +00:00
Makefile_iotk extlibs deleted moved to archive and main install 2012-01-03 11:33:44 +00:00
Makefile_lapack extlibs deleted moved to archive and main install 2012-01-03 11:33:44 +00:00
Makefile_lapack_testing_lin extlibs deleted moved to archive and main install 2012-01-03 11:33:44 +00:00
README.CINECA_fermi UPdated with latest performance improvements 2012-10-11 21:17:02 +00:00
README.CRAY-XE6.CSCS_rosa XE6 and XK7 are the latest CRAY MPP machines, XT4 is quite obsolete. Examples provided are based on Monte Rosa and Todi at CSCS (instructions are quite general since the front-end environment is almost the same for every CRAY machine). 2012-10-28 22:29:32 +00:00
README.CRAY-XK7.CSCS_todi XE6 and XK7 are the latest CRAY MPP machines, XT4 is quite obsolete. Examples provided are based on Monte Rosa and Todi at CSCS (instructions are quite general since the front-end environment is almost the same for every CRAY machine). 2012-10-28 22:29:32 +00:00
clean.sh An error in previous commit. 2012-08-20 15:29:26 +00:00
config.guess More minor tweaking: obsolete or useless variables removed, 2006-09-21 20:07:55 +00:00
config.sub added autoconf-based configure (file "configure.new") and related files 2003-11-13 13:35:10 +00:00
configure CRAY systems support Intel compiler 2012-10-28 22:17:50 +00:00
configure.ac CRAY systems support Intel compiler 2012-10-28 22:17:50 +00:00
configure.msg.in - better support for SCALAPACK library. 2009-08-20 13:24:31 +00:00
extlibs_makefile ELPA v0.2: temporary disabled OpenMP extensions since they were producing crashes on Linux machines, added COPYING/License informations, improved configure (if --without-elpa then the library is not compiled at all). 2012-10-23 11:30:37 +00:00
includedep.sh More miscellanous cleanup from Axel: 2006-12-12 11:02:09 +00:00
install-sh added autoconf-based configure (file "configure.new") and related files 2003-11-13 13:35:10 +00:00
iotk_config.h extlibs deleted moved to archive and main install 2012-01-03 11:33:44 +00:00
make.sys.in ELPA v0.2: temporary disabled OpenMP extensions since they were producing crashes on Linux machines, added COPYING/License informations, improved configure (if --without-elpa then the library is not compiled at all). 2012-10-23 11:30:37 +00:00
make_blas.inc.in extlibs deleted moved to archive and main install 2012-01-03 11:33:44 +00:00
make_lapack.inc.in extlibs deleted moved to archive and main install 2012-01-03 11:33:44 +00:00
make_wannier90.sys.in make_wannier90.sys.in added to dir install 2010-11-23 11:57:58 +00:00
makedeps.sh make.depend and scrpt generating them updated 2012-10-24 14:31:17 +00:00
moduledep.sh More miscellanous cleanup from Axel: 2006-12-12 11:02:09 +00:00
namedep.sh More miscellanous cleanup from Axel: 2006-12-12 11:02:09 +00:00
plugins_list change plumed link, change other plugin links to 5.0.2 2012-11-21 14:44:32 +00:00
plugins_makefile change plumed link, change other plugin links to 5.0.2 2012-11-21 14:44:32 +00:00
update_version IBM machines do not like "diff -q" 2012-07-20 13:37:00 +00:00

README.CRAY-XK7.CSCS_todi

Info by Filippo Spiga, Oct. 2012, valid for any version of QE from 5.0 onward.


Machine name    : TODI (Cray XK7) at CSCS, Lugano (CH)
Machine spec    : http://user.cscs.ch/hardware/todi_cray_xk7/index.html
Similar systems : TITAN (ORNL, USA)


1. Compile the code

... starting from an SVN checkout ...
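
If you do not already have a checkout, one can be obtained from the public
SVN repository (URL taken from the repository header above; note that the
GPU/ add-on directory used below may have to be obtained separately):

svn checkout http://qeforge.qe-forge.org/svn/q-e/trunk/espresso espresso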

module switch PrgEnv-cray PrgEnv-intel
module load cudatoolkit/5.0.33.103
module unload atp hss-llm
cd espresso
cd GPU/
./configure --enable-openmp --enable-cuda --with-gpu-arch=35 \
         --with-cuda-dir=${CRAY_CUDATOOLKIT_DIR} --disable-magma \
         --disable-profiling --enable-phigemm --enable-parallel \
         --with-scalapack ARCH=crayxt
cd ..
make -f Makefile.gpu all-gpu
( or just "make -f Makefile.gpu pw-gpu" )

Executables will be located under "./bin"
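
As a quick sanity check (a suggestion, not part of the original procedure),
verify that the GPU-enabled executables listed in the note below were built:

ls -l bin/pw-gpu.x bin/neb-gpu.x bin/ph-gpu.x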


IMPORTANT NOTE (!): only pw-gpu.x, neb-gpu.x and ph-gpu.x make extensive 
                    use of the GPU card in multiple sections of the code. 
                    All the other executables exploit the GPU only through 
                    the phiGEMM library (for now).
                
IMPORTANT NOTE : not all the codes are as "big" or as "computationally 
                 intensive" as PWscf. To build the other codes missing 
                 from the list above, run "make -f Makefile.gpu distclean" 
                 and follow the instructions in the file 
                 "README.CRAY-XE6.CSCS_rosa" (a sketch follows below).
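
A hedged sketch of that CPU-only rebuild (the configure options below merely
mirror the GPU configure line above with the CUDA-related flags dropped; the
authoritative instructions are those in "README.CRAY-XE6.CSCS_rosa"):

make -f Makefile.gpu distclean
./configure --enable-openmp --enable-parallel --with-scalapack ARCH=crayxt
make all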

IMPORTANT NOTE : the CPU-only code supports the PGI, GNU and Intel 
                 compilers. The GPU+CPU code supports *ONLY* the Intel 
                 compiler. A bug report has been filed with PGI and NVIDIA.


                
2. Good practices

- Each NVIDIA Tesla K20X GPU has 6 GB of memory on the card. It is best to 
  limit the number of MPI processes per node (i.e. the number of MPI 
  processes sharing the same GPU) to 2.

- If the calculation is not too memory-demanding, the MPI:GPU ratio can be 
  increased up to 4. The Hyper-Q technology of this GPU helps multiple MPI 
  processes share the card efficiently.
  
- In order to share the GPU among multiple MPI processes within a node, it is 
  mandatory to export the variable CRAY_CUDA_PROXY ("export CRAY_CUDA_PROXY=1"; 
  see the sketch after this list).
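
A minimal sketch of the 4-MPI-per-node layout described above (the total task
count and the input file name are placeholders, not taken from the original;
with 16 cores per node, 4 tasks per node leave 4 OpenMP threads each):

export OMP_NUM_THREADS=4
export CRAY_CUDA_PROXY=1
aprun -n 64 -N 4 -d 4 ./pw-gpu.x -input pw.in | tee out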



3. Example scripts 

#!/bin/bash
#SBATCH --job-name="QE-BENCH-SPIGA"
#SBATCH --nodes=64
# REMEMBER: --ntasks-per-node * --cpus-per-task <= 16
#SBATCH --ntasks-per-node=2
#SBATCH --cpus-per-task=8
#SBATCH --time=02:00:00
#SBATCH --output=QE-BENCH.%j.o
#SBATCH --error=QE-BENCH.%j.e
#SBATCH --account=<...>

echo "The current job ID is $SLURM_JOB_ID"
echo "Running on $SLURM_NNODES nodes"
echo "Using $SLURM_NTASKS_PER_NODE tasks per node"
echo "A total of $SLURM_NPROCS tasks is used"

export OMP_NUM_THREADS=8
export CRAY_CUDA_PROXY=1

export MALLOC_MMAP_MAX_=0
export MALLOC_TRIM_THRESHOLD_=536870912
#export MPICH_VERSION_DISPLAY=1
#export MPICH_ENV_DISPLAY=1

aprun -n $SLURM_NPROCS -N 2 -d 8 ./pw-gpu.x -input <...> | tee out
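
The script above is submitted in the usual SLURM way; the job file name here
is a placeholder:

sbatch qe-bench.slurm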