mirror of https://github.com/QMCPACK/qmcpack.git
Profiling QMCPACK
LLVM-XRAY based
With CUDA:
LLVM 4.0.1 and NVCC 9.1: this combination allows simultaneous nvprof profiling and XRay instrumentation of the CPU code.
- Build: see build_script
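- Running: with the performance tests set up, run qmcpack under nvprof with XRay enabled, then summarize the resulting log with llvm-xray account: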
[epd@oxygen ]$ export QMCPACK_APP=/path/to/your/instrumented/bin/qmcpack
[epd@oxygen ]$ cd /your_scratch/performance/NiO/sample/dmc-a4-e48-gpu
[epd@oxygen ]$ OMP_NUM_THREADS=12 XRAY_OPTIONS="patch_premain=true xray_mode=xray-basic" nvprof -o simul.nvprof $QMCPACK_APP NiO-fcc-S1-dmc.xml
[epd@oxygen ]$ llvm-xray account xray-log.qmcpack.yrxHG1 -instr_map=$QMCPACK_APP -sort=med -top=10 -sortorder=dsc
Functions with latencies: 514
funcid count [ min, med, 90p, 99p, max] sum function
1 1 [12.220955, 12.220955, 12.220955, 12.220955, 12.220955] 12.220955 <invalid>:0:0: main
55 1 [ 5.188287, 5.188287, 5.188287, 5.188287, 5.188287] 5.188287 <invalid>:0:0: qmcplusplus::QMCMain::execute()
59 1 [ 2.152482, 2.152482, 2.152482, 2.152482, 2.152482] 2.152482 <invalid>:0:0: qmcplusplus::QMCMain::validateXML()
64 1 [ 1.795763, 1.795763, 1.795763, 1.795763, 1.795763] 1.795763 <invalid>:0:0: qmcplusplus::WaveFunctionPool::put(_xmlNode*)
1266 1 [ 1.795672, 1.795672, 1.795672, 1.795672, 1.795672] 1.795672 <invalid>:0:0: qmcplusplus::WaveFunctionFactory::build(_xmlNode*, bool)
1267 1 [ 1.488744, 1.488744, 1.488744, 1.488744, 1.488744] 1.488744 <invalid>:0:0: qmcplusplus::WaveFunctionFactory::addFermionTerm(_xmlNode*)
1767 1 [ 1.488722, 1.488722, 1.488722, 1.488722, 1.488722] 1.488722 <invalid>:0:0: qmcplusplus::SlaterDetBuilder::put(_xmlNode*)
335 1 [ 1.198301, 1.198301, 1.198301, 1.198301, 1.198301] 1.198301 <invalid>:0:0: qmcplusplus::VMCcuda::runWithDrift()
56 3 [ 0.793548, 1.041833, 1.198912, 1.198912, 1.198912] 3.034292 <invalid>:0:0: qmcplusplus::QMCMain::executeQMCSection(_xmlNode*, bool)
58 3 [ 0.793514, 1.041820, 1.198901, 1.198901, 1.198901] 3.034235 <invalid>:0:0: qmcplusplus::QMCMain::runQMC(_xmlNode*)
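The suffix of the XRay log name (yrxHG1 above) differs from run to run, so the file name has to be looked up after each run. A minimal sketch of a helper that picks the newest log, assuming the logs are written to the run directory with the xray-log.qmcpack. prefix shown above; the function name latest_xray_log is hypothetical:

```shell
# Hypothetical helper: print the most recently modified XRay log.
# $1: directory to search (defaults to the current directory).
# Assumes logs are named xray-log.qmcpack.<suffix>, as in the run above.
latest_xray_log() {
    # ls -t lists newest first; errors are silenced when no log exists,
    # in which case nothing is printed.
    ls -t "${1:-.}"/xray-log.qmcpack.* 2>/dev/null | head -n 1
}
```

It could then be used as: llvm-xray account "$(latest_xray_log)" -instr_map=$QMCPACK_APP -sort=med -top=10 -sortorder=dsc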
If you also have a newer version of LLVM installed, you can use its newer XRay tools with the old trace. The most useful of these, llvm-xray convert, converts the log to an event-trace format that can then be processed further for viewing in a graphical tool.
/home/epd/opt/llvm-7/bin/llvm-xray convert -output-format=trace_event -instr_map=$QMCPACK_APP -symbolize -sort -output=dmc-a4-e48-gpu.trace xray-log.qmcpack.yrxHG1
Then convert the trace into a Chrome-viewable HTML document using trace2html from the catapult tools.
~/codes/catapult/tracing/bin/trace2html dmc-a4-e48-gpu.trace -o dmc-a4-e48-gpu.html
example -- only works with chrome
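The two conversion steps can be chained in a small wrapper. This is only a sketch: the function name xray_to_html and the LLVM_XRAY and TRACE2HTML variables are hypothetical, and their defaults assume both tools are on your PATH (otherwise set them to the full paths used above). It also assumes QMCPACK_APP still points at the instrumented binary.

```shell
# Hypothetical wrapper around the two conversion commands shown above.
# Tool locations are assumptions; override them if the tools are not on PATH.
: "${LLVM_XRAY:=llvm-xray}"
: "${TRACE2HTML:=trace2html}"

xray_to_html() {
    # $1: XRay log file, $2: base name for the .trace and .html outputs
    "$LLVM_XRAY" convert -output-format=trace_event -instr_map="$QMCPACK_APP" \
        -symbolize -sort -output="$2.trace" "$1" &&
    "$TRACE2HTML" "$2.trace" -o "$2.html"
}
```

For example: LLVM_XRAY=/home/epd/opt/llvm-7/bin/llvm-xray TRACE2HTML=~/codes/catapult/tracing/bin/trace2html, followed by xray_to_html xray-log.qmcpack.yrxHG1 dmc-a4-e48-gpu. The HTML step runs only if the conversion succeeds.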
You can then download the nvprof trace to your local machine (when copying from a remote machine, prefix the path with its host name):
$ scp /your_scratch/performance/NiO/sample/dmc-a4-e48-gpu/simul.nvprof ./
$ nvvp
You'll need to import simul.nvprof into a new session in nvvp. Then you'll be able to look at the traces from the CPU and GPU points of view at the same time.