OUT_OF_MEMORY issues encountered with flexoelectric calculations

Hi,
I’ve been attempting to run flexoelectric calculations for 2-atom systems, but I’m encountering OUT_OF_MEMORY job cancellations, even when using 96 and 128 cores. Do you have any suggestions on how to resolve this issue?

Hi there,

Can you provide some details on your compilation?
What does the log file of abinit indicate? It might show where the calculation stops when it runs out of memory.
Simply increasing the number of processors is not always the best way to gain access to more memory. A better question to ask is: what is your cluster's configuration, and what are the recommended practices for memory usage on it?

Let us know!
Bogdan

This was the slurm output I obtained: ‘slurmstepd: error: Detected 1 oom_kill event in StepId=22981283.batch. Some of the step tasks have been OOM Killed.’

The log file of the flexoelectric calculation ended like this:
’ One components of 2nd-order total energy (hartree) are
1,2,3: 0th-order hamiltonian combined with 1st-order wavefunctions
kin0= 1.01982813E+03 eigvalue= 7.75501918E+01 local= -1.33385299E+03
4,5,6: 1st-order hamiltonian combined with 1st and 0th-order wfs
loc psp = 0.00000000E+00 Hartree= 0.00000000E+00 xc= 0.00000000E+00
note that “loc psp” includes a xc core correction that could be resolved
7,8,9: eventually, occupation + non-local contributions
edocc= 0.00000000E+00 enl0= 2.57091043E+02 enl1= 0.00000000E+00


Perturbation wavevector (in red.coord.) 0.000000 0.000000 0.000000
Perturbation : 2nd derivative wrt k, idir1 = 3 idir2 = 1
littlegroup_pert: only one element in the set of symmetries for this perturbation:
1 0 0 0 1 0 0 0 1
symatm: atom number 1 is reached starting at atom
1
symatm: atom number 2 is reached starting at atom
2
symkpt : not enough symmetry to change the number of k points.
getmpw: optimal value of mpw= 151448
Memory required for psi0_k, psi0_kq, psi1_kq: 610.1 [Mb] <<< MEM
About to read wavefunctions from: flexoo_DS1_WFK
wfk_read_my_kptbands: , wall: 2.52 [s] , cpu: 1.43 [s] <<< TIME
getmpw: optimal value of mpw= 151448
qpt is Gamma, psi_k+q initialized from psi_k in memory
Initialisation of the first-order wave-functions :
ireadwf= 0

  dfpt_looppert: read the DDK wavefunctions from file: flexoo_DS2_1WF9
  dfpt_looppert: read the DDK wavefunctions from file: flexoo_DS2_1WF7

getcut: wavevector= 0.0000 0.0000 0.0000 ngfft= 135 135 135
ecut(hartree)= 150.000 => boxcut(ratio)= 2.02654

Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
-------------------------------------------------------’

I also tried to enable parallelism for the DFPT part by setting autoparal = 1, but I am still getting the out-of-memory error.

Thanks
Dominic

Hi Dominic,

Nothing looks off from what you’ve shared.
I take it that the large ecut value is absolutely necessary for your case.
The ecut that you are using will generate an enormous real-space FFT grid.
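Just as a very rough back-of-the-envelope check (these are order-of-magnitude estimates based on the numbers in your log, not abinit's exact memory accounting):

  mpw ≈ 151448 plane waves per band at one k-point
  one band, one wavefunction set:  151448 * 16 bytes ≈ 2.4 MB
  one double-complex FFT box:      135^3  * 16 bytes ≈ 39 MB

Multiply the per-band figure by nband and by the number of wavefunction sets each MPI process keeps in memory (your log already reports about 610 MB just for psi0_k, psi0_kq and psi1_kq), and you can see how quickly this exceeds the RAM available per core.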
Can you check if your parallelization is optimal? k-point parallelization? ngfft distribution?
I would suggest looking into the paral_kgb option of abinit:

With paral_kgb you essentially have three layers of parallelization: over k-points, over G-vectors (i.e., the ngfft grid), and over bands (four layers if you count spin). To make sure your 128 processors, and the memory associated with them, are used efficiently, you could try a manual distribution of the workload by setting, alongside paral_kgb = 1, the variables npfft, npkpt/np_spkpt, npband, etc.; see the sketch below.
If you have a lot of k-points, the npkpt/np_spkpt parallelization is normally the most efficient one.
Also have a look at what the autoparal keyword produces in your case, if it is active: autoparal makes a best guess at the optimal distribution among the layers I just described.
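As a purely illustrative starting point for 128 MPI processes, the input could contain something like the lines below; the values are placeholders and have to be adapted to your own number of k-points, bands and FFT grid (with paral_kgb = 1 their product should normally match the number of MPI processes):

  paral_kgb 1
  np_spkpt  4    # k-point parallelization (named npkpt in older abinit versions)
  npband   16    # band parallelization
  npfft     2    # FFT (G-vector) parallelization
  # 4 * 16 * 2 = 128 MPI processes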

Let us know!
Bogdan

Hi Dominic,

Your out-of-memory problem arises in the calculation of the second-order derivatives of the total energy with respect to two wave vectors k, i.e., the calculation of the d2_dkdk response functions. The abnormally large memory use in this part of the calculation is an issue we are aware of and have run into a few times in the past. The corresponding routines have not been optimized yet, although we plan to do so, hopefully soon.

At the moment, my only suggestion is to reduce ecut in your calculation or, if you are working with a slab-like supercell, to reduce the vacuum regions as much as possible; see the rough scaling estimate below.
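To get a feeling for the possible gain (rough scaling only, not an exact figure): the number of plane waves grows roughly as the cell volume times ecut^(3/2), so

  150 Ha -> 100 Ha     :  (150/100)^(3/2) ≈ 1.8x less wavefunction memory
  smaller cell/vacuum  :  memory goes down roughly in proportion to the volume

Whether a lower ecut is still acceptable for the flexoelectric tensor of course has to be verified with your own convergence tests.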

Miquel