TODO (datafile & High throughput & reproducibility specific) - April 2016 1) XML input in AIDA (what do we need to have these two working in harmony?) 2) Datafile recording _all_ input info for reproducibility (some format decisions have to be made here) 4) Run time writing strategies at each scf step (do we really want this? can a new,optional "scf_history" file suffice? like bfgs history?) 5) Datafile writing strategies for (vc)relax/md (data-file.0.xml 1.xml ...? giving the option to have the first and the last only) 6) Finally, writing the output data in the datafile, fixing the redundant or missing pieces, and changing the language a bit so an outsider too can understand the content without too much suffering. 7) or better, writing a documentation for the tags used... TODO LIST 0) Suspected bugs/problems: 0.1 something happened to elf calculation between 4.1 to 4.2 0.2 wf_collect and phonons on the grid + unneeded scf calculations 0.3 phonon restart should not recalculate the entire initialization 0.4 Small energy differences between PW and CP - maybe PP-related, but might also be a problem of Ewald calculation in CP ("raggio"?) 0.5 Numerical instabilities with: BLYP; TPSS; noncollinear GGA; LDA+U 0.6 Strange numbers from phonon code in some specific cases 0.7 XDM does not work with task groups 0.8 Bands with DFT+U should read U from data file, not from input 1.0 "S matrix not positive definite" 1) Organization 1.0 Binary packages 1.0.0 Debian compatibility: - directory name and tarball name should be the same - No Java binaries (ColorCalculator) 1.3 svn/packaging: 1.3.1 automatic download for GWL 1.3.2 run source-normalizer script dev-tools/src-normal 1.4 mailing lists: 1.4.1 Send monthly reminder? possibly with netiquette 1.4.2 move to forums? 1.7 Testing 1.7.3 Many functionalities are still untested and there is not even an example. PP/ is a notable offender: there are a lot of possible calculations, most of which are never tested. Other untested features: contraints, thermostats, ... 2) Documentation 2.1 Better (and shorter) FAQ list. Notably missing: - hardware for QE - segmentation fault (present but not visible enough) 2.2 Documentation on how to generate PP for GIPAW calculation is missing 3) Pseudopotentials 3.3 Implement OEP PPs? 3.7 The intermediate hard NCPP is useless for PAW and should not be done when generating a PAW set 3.8 Finish merge of Meta-GGA (TPSS) code into atomic code 4) Development 4.0 Major highly desirable restructuring: 4.0.1 Add the possibility to run NEB/PH in a dynamical way, with a home-made scheduler that executes tasks as soon as resources are free; might be used for image parallelization as well. 4.0.2 Better-structured relaxation and molecular dynamics (including the variable-cell case), with more extensive integration of PW and CP (langevin code) 4.0.3 Exx_divergence is confusing and contains some suspect values. 4.0.4 Martyna-Tuckerman has to be extended to any system dimensionality and to exx, using coulomb_vcut trick. 4.1 New developments to be added (sooner or later): 4.1.2 XMCD in XSpectra 4.1.3 Genetic Algorithm 4.1.6 Converse NMR 4.2 Small new developments, desirable or to be added: 4.2.0 Fermi energy in insulators (tetrahedron method) should either be at the top of the valence band or in the middle of the gap; now it is sometimes at the bottom of the conduction band (Eduardo) 4.2.1 configure issues: - use environment variables MKL_* if available; guess MKL_ROOT otherwise, use it consistently 4.2.2 constraints should be implemented in all cases; a check should be added if constraints break the symmetry 4.2.3 inversion symmetry should allow real hamiltonian and wavefunctions 4.2.4 nscf calculations are slow. There must be a way to make a better usage of the available information from the scf calculation: wavefunctions are just discarded. Same for phonon calculation: it shouldn't be needed to recalculate everything almost from scratch at each different q-point 4.2.5 Fermi-Dirac: pass T instead of "broadening", make it possible to use it on top of smearing for free-energy calculations 4.2.7 matdyn should write frequencies in a format that is suitable for direct plotting by gnuplot/xmgrace - see also Eyvaz' script for phonon plotting. Also about phonons: - perform calculations now performed by dynmat.x at the end of a phonon run at gamma? - perform calculations now performed by q2r.x: C(q)=>C(R), at the end of a dispersion run - projected phonon DOS with tetrahedra - adapt plotband.x to phonon case; in general, simplify phonon plotting - Bring QHA calculations inside matdyn? see fqha.f90 4.2.8 Interface to RESP calculation - requires adding radii to xml file 4.2.9 elf for USPP/PAW ; delocalization indices 4.1.10 Collection of tools and utilities for data analysis (things like g(r) from MD simulations). Also: for ev.x, write output file with E(V) for direct plotting of EOS 4.2.11 Gamma: same input as for PH 4.2.12 Stop with 'prefix.EXIT' and restart in D3 and Gamma (KK) 4.2.13 Various defaults for CP (proposed from Princeton): - emass(emass=300), dt (dt=7), for preconditioning cutoff (3) - automatic box grid for USPP from radii of augmented charge - Electronic minimization: damp as default, sd discouraged introduce an automatic default schedule, something as: 1st step sd followed by 5 steps with with damp= 0.8, followed by 5 steps with damp=0.5, followed by 10 steps with damp=0.3, followed by 10 steps with damp=0.2, followed by as many steps as necessary to achieve the required convergence with damp=0.1 A max number of steps should be included to ensure program termination. The other option allowed should be conjugate gradients: see NM - it could one day become the default - Ionic minimization: again SD should be discouraged A default scheme for simultaneous damped dynamics should be given (to be tested) for example: zero damp on ions and start with damp=0.5 on electrons to become then 0.1 or perhaps the values should be set given the forces on the ions When moving ions and electrons simultaneously an important parameter is the ratio between electron and ion masses - For minimization it is better to set up all the ion masses equal - A default value for the ion masses (considering the defaults for emass and dt) is perhaps 10 AMU (we should do some test to see if 20 AMU is s a safer value) - Default values for randomization should be given amprp=0.1 is a decent value - amprp=0.01 is too small - Car-Parrinello dynamics: the proper masses for the ions, an optimal value for emass and dt should be set up by the code, based on the smaller atomic mass and the default value used in the minimization e.g. Amass_default=10 AMU. If the minimum physical AMASS is 20 then dt=sqrt(2) dt_default and emass should be increased so to keep emass^2/dt constant - defaults for the Nose thermostat 4.3 Performance enhancements/Parallelization: 4.3.1 make hpsi/spsi/CG faster - remove complex factor i**l from beta fct and q(r) - shift structure factor from beta to psi when computing becp (reduce memory) - use real BLAS routine instead of COMPLEX one in hpsi/spsi (at least 2 times faster). - use only half of the G's when computing real integrals (2 times faster) - seek for CG and DIIS algoritms that only use (H-eS)|psi> and not the two vector separately ... compute it in one single call. In this way S|psi> is inexpensive 4.3.3 PH: use charge mixing instead of potential mixing 4.3.4 D3: verify status of parallelization, clean it up if needed 4.4 Cleanup 4.4.1 Increase modularization by - collecting variables and routines acting on those variables into modules - classifying modules in a hierarchical way - avoiding as much as possible that modules depend on many other modules 4.4.2 Avoid monster routines that do too many things at the same time depending on the value of too many variables. An example: read_file 4.4.3 There is some confusion in the various initialization steps: - default values at startup - reading of the input data and copy into internal variables - reading from data file - initialization of general variables (that presumably will be written to or read from file) - initialization of variables used in a specific calculation (that may not be written to or read from the data file) All these steps are intermixed and/or replicated and it is never clear what is initialized where. Same for variable allocation: see recent GIPAW workaround for an example of allocation confusion (qnorm, cell_factor in allocate_nlpot) 4.4.4 More PW/CP merge: - f_inp, fixed occupancies - lda+U modules - makov-payne - "cellmd" module of PW and "cell_base" of CP - PW "real-space" approach / CP "small boxes" - there should be a single function or routine for periodic boundary conditions (i.e. bringing all atoms inside the unit cell) - spherical harmonics and integration routines - merge of atomic positions! currently CP uses a complex logic that is very hard to follow 4.4.6 adding/removing/modifying input variables is too complex Why are some checks on input variables performed in read_namelist, while others apparently similar are in */input.f90? 4.4.7 Units: all units should be clearly documented and printed on output (and also it should be clearly stated what the printed quantites are) 4.4.8 There should be a function calculating dj_l/dx; j_l with l=-1 should not be needed 4.4.9 too many confusing error messages are still around 4.4.10 Output should be more informative and less confused, better structured, and ready for automatic reading (.e.g by xcrysden) 4.4.12 Replace "use pwcom" with more "use" statements 4.4.13 Move all plots requiring Fourier (or real-space) interpolation into pawplot.x, leaving in pp.x only gaussian cube and 3d xsf files. Plots of sums and differences should be performed using data files ready for plotting (gnuplot, xsf, cube; may require some tools). pp.x should be simplified a lot and intermediate format should disappear. Also: there is no reason to have dos.x together with projwfc.x 4.4.14 All allocated variables should be deallocated at the end: it makes easier to find memory leaks. Currently most variables are deallocated, but a few (mostly in ffts and in pseudopotential reading) aren't 4.4.16 add_efield must be rewritten from scratch: it is a mess beyond control 4.4.17 Some postprocessing codes could be used with command-line options instead of fortran input 4.4.18 What about transforming 'bands'/'nscf' into a postprocessing code? 4.5 Trouble-makers. inconsistencies, etc 4.5.1 Negative Charge problems (see qe-forge, H on graphene) 4.5.3 G-vector shells, especially in the variable-cell case, and the various tricks to reduce cpu by not re-calculating things that depend on |G| only (see e.g. qvan2). Maybe we should move to interpolation of all quantities and get rid of shells and tricks 4.5.4 PP: complete postprocessing in Gamma case (only average missing), and with CP data (in the latter case: when the data file does not contain the charge/potential, issue an error message saying what is missing and why instead of just crashing in iotk) 4.5.5 CP: add error check if dt^2/emass too large does not allow ortho to converge or cause energy to increase as time step evolve 4.5.6 epsilon.x should be extended at least to have the nonlocal contribution included; there should be a pointer in the documentation explaining how to make a better calculation. 4.5.8 Still a few quirks with the atomic coordinate parser, if DOS characters or tabulators are present (Lorenzo) 4.5.9 Spin-polarized cases: input is clumsy, confusing, error-prone; the case occupations='from_input' is implemented as a special case, for no apparent good reason. 4.5.10 k-points in crystal IBZ should be correctly calculated also if input k-points are not in the IBZ of the lattice. It should sufficient to use brute forces instead of group theory: be expand to full BZ, remove symmetry-equivalent k. Or maybe there is a problem with grids not having the symmetry of the crystal? 5) Files and I/O 5.2 Scratch files are a big mess. It should be possible to open files in places other than tmp_dir without resorting to obscure coding. This is especially serious for PH, D3 etc 5.3 There should be a clearer distinction, both in the code and in the input data, between directories to be read (and left unchanged), directories to be (over-)written, temporary files or directories 5.4 Some inconsistencies between PW and CP in the xml file format (and inconsistencies with the documentation). Also: CP should behave like PW and create a directory if not existent 5.6 There should be a lock mechanism that prevents people from overwriting files of running processes. Should be done with care, or else every time a code crashes will make the following one crash as well! 5.7 Add rotation of restart files, similar to what is done in CPMD: "The number of distinct RESTART files generated during CPMD runs is read from the next line. The restart files are written in turn. Default is 1. If you specify e.g. 3, then the files RESTART.1, RESTART.2, RESTART.3 are used in rotation."