SCF convergence deteriorates during structural relaxation resulting in crash

Hi,

I am trying to relax a 64 atom supercell of CdMnTe using ionmov 2 and optcell 2.

When dealing with bulk CdTe, the SCF loop converges and the structure relaxes below my tolerance (10e-4 tolmxf) after a few Broyden iterations - all good!

However, once I introduce Mn into the supercell (e.g. Cd0.5Mn0.5Te), the SCF loop does not converge. Notably, it also starts at a very high density residual (nres2=2.060E+02), which does not improve after 30 SCF iterations (nres2=7.362E+02). After a few Broyden iterations in which the SCF loop fails to converge, the calculation crashes.

Any suggestions as to why this is happening and what steps/parameters I can change to get the SCF convergence to work would be greatly appreciated. I have played around with changing diemac (8, 12, 50) but this has not helped.

I have attached my input file for the CdMnTe supercell calculation and the corresponding output file.
den.in (5.75 KB)
den_oc2im2_d50.out (74.4 KB)
Cheers,
kalkm1

Update:

Okay, I have managed to get the SCF loop to behave by changing diemix from 0.7 to 0.1; it now converges in about 60 steps in the first Broyden iteration (output file attached).

However, I have now encountered a new issue in the SCF loops of the later Broyden structural relaxation iterations. The SCF initially improves, converging in ~30 steps for the first 5 Broyden iterations, but then deteriorates during iterations 6 and 7, failing to converge within 100 SCF steps. The calculation crashes after the 7th Broyden iteration.
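One way to make this deterioration visible is to pull the residual out of every SCF line in the output file and group it by relaxation step. The following is only a minimal sketch: it assumes the SCF lines start with `ETOT` and that the residual is the fourth number after the iteration index, which should be checked against the actual output (the column may be labelled vres2 rather than nres2 depending on the mixing scheme, and spin-polarized runs may append further columns). The function names are hypothetical and not part of Abinit.

```python
import re

# Assumed layout of an SCF line (verify against your own output file):
#  ETOT  1  -8.8604411228263    -8.860E+00 2.050E-03 2.060E+02
ETOT_LINE = re.compile(r"^\s*ETOT\s+(\d+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)")


def scf_residuals(text):
    """Return a list of SCF cycles, each a list of (iteration, residual) pairs."""
    cycles = []
    for line in text.splitlines():
        match = ETOT_LINE.match(line)
        if not match:
            continue
        try:
            iteration = int(match.group(1))
            residual = float(match.group(5))
        except ValueError:
            continue  # skip lines whose fields are not plain numbers
        # An iteration index of 1 marks the start of a new SCF cycle.
        if iteration == 1 or not cycles:
            cycles.append([])
        cycles[-1].append((iteration, residual))
    return cycles


def summarize(cycles):
    """Per SCF cycle: (number of steps, first residual, last residual)."""
    return [(len(c), c[0][1], c[-1][1]) for c in cycles]
```

Calling `summarize(scf_residuals(open("den_oc2im2_d8_O2sm.out").read()))` would then give one tuple per Broyden iteration, so a cycle whose last residual is not smaller than its first stands out immediately.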

I should note that during some identical calculations (64 atoms Cd0.5Mn0.5Te) the SCF convergence does not deteriorate and the calculation completes successfully. Other times it crashes as described above.

When the SCF loop deteriorates, is this a case of the structural relaxation not finding its minimum? If so, why does the calculation crash after the SCF loop has already completed and, according to the log file (last lines attached), while Abinit is trying to create an HDF5 file? The last line in the log file before the calculation terminates is always:

  • Creating HDf5 file with MPI-IO support: tmp/den_o_GSR.nc

I’ve attached a figure showing the total energy as a function of stress on the supercell: this shows the structural relaxation progressing well before the stress suddenly increases and the calculation crashes.

If you have more understanding as to why this is happening - whether it is an issue with the code or a minimization problem - and what can be done to avoid it, please let me know!

den_oc2im2_d8_O2sm.out (263 KB)
den.log (7.74 KB)


Cheers,
kalkm1

The last line in the log file before the calculation terminates is always:

  • Creating HDf5 file with MPI-IO support: tmp/den_o_GSR.nc

The problem is not necessarily due to the output of the GSR file.
In your log file, I find the following section:

- Creating HDf5 file with MPI-IO support: tmp/den_o_GSR.nc
application called MPI_Abort(MPI_COMM_WORLD, 13) - process 27
application called MPI_Abort(MPI_COMM_WORLD, 13) - process 28
application called MPI_Abort(MPI_COMM_WORLD, 13) - process 29
application called MPI_Abort(MPI_COMM_WORLD, 13) - process 31
application called MPI_Abort(MPI_COMM_WORLD, 13) - process 32
application called MPI_Abort(MPI_COMM_WORLD, 13) - process 33
application called MPI_Abort(MPI_COMM_WORLD, 13) - process 46
application called MPI_Abort(MPI_COMM_WORLD, 13) - process 47
application called MPI_Abort(MPI_COMM_WORLD, 13) - process 48
application called MPI_Abort(MPI_COMM_WORLD, 13) - process 49

This indicates that Abinit aborted because it hit a critical condition that the code handles explicitly: the developer calls MPI_ABORT to shut everything down.
In principle, there should be an __ABI_MPIABORTFILE__ file containing the error message produced by the first MPI process that invokes MPI_ABORT. Having that error message would help pinpoint the problem.

Unfortunately, __ABI_MPIABORTFILE__ may turn out to be empty, because the MPI runtime environment can kill all the processes before they have had time to flush their IO buffers to file.
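To check both things quickly after a crash, something like the following could be used. It is a rough sketch, not an Abinit tool: the regular expression is modelled only on the MPI_Abort lines quoted above, the function names are hypothetical, and the path of the abort file has to be supplied by the caller.

```python
import os
import re

# Pattern modelled on the log lines quoted above, e.g.
# "application called MPI_Abort(MPI_COMM_WORLD, 13) - process 27"
ABORT_LINE = re.compile(
    r"MPI_Abort\(MPI_COMM_WORLD,\s*(\d+)\)\s*-\s*process\s+(\d+)"
)


def aborting_ranks(log_text):
    """Return (error_code, rank) pairs in the order they appear in the log."""
    return [(int(code), int(rank)) for code, rank in ABORT_LINE.findall(log_text)]


def read_abort_file(path):
    """Return the abort message, "" if the file is empty, None if it is missing."""
    if not os.path.exists(path):
        return None
    with open(path, errors="replace") as handle:
        return handle.read().strip()
```

An empty string from `read_abort_file` corresponds to the case described above, where the processes were killed before the message was flushed. The list from `aborting_ranks` at least shows which ranks reported the abort and with which error code.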

Thanks for your response!

I do usually get an __ABI_MPIABORTFILE__ file when the runs crash but, as you predicted, it is always empty. I was confused as to why, so your answer clarifies this somewhat. Unfortunately, it also makes the issue trickier to diagnose.

I have since made some more observations which might be of interest. For the 64-atom supercell, as I mentioned in an earlier post, the calculation sometimes crashes with this error in the log file and sometimes completes.

For a 216-atom supercell, the calculation never completes: it always crashes at Broyden iteration ~7/8, with the SCF deteriorating after Broyden iteration ~4/5 (i.e. the SCF no longer converges within 100 steps). If I restart the calculation with the atomic positions and acell from one of the relaxation steps that did converge (e.g. Broyden iteration 4), the calculation again runs for 4/5 Broyden iterations before the SCF deteriorates and the job crashes.

This forum post describes a similar issue: https://forum.abinit.org/viewtopic.php?f=8&t=4131 and suggests compiling Abinit with the -O2 and --enable-avx-safe-mode flags. This solved the issue for the user in the original post, but has not made any difference in my case.

Cheers,
kalkm1