SCF stops before nstep and tolerance with unexpected warning

Hi all,

I am running a static SCF calculation on a relaxed supercell with ABINIT 9.10.3. In my input file I set nstep 200 and toldfe 10^-11.
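
The relevant convergence settings in ab.in are essentially:

nstep  200
toldfe 1.0d-11

(the full input is in the link further down).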

Before reaching energy convergence or the maximum number of steps, the SCF cycle stops and prints the following warning from scprqt:
nstep= 200 was not enough SCF cycles to converge;
maximum energy difference= 3.638E-11 exceeds toldfe= 1.000E-11

This is unexpected, since the last SCF iteration reported is no. 132 out of 200.

Furthermore, there is no output on standard error, so I cannot tell why the SCF cycle exits early and then claims to have reached the maximum number of steps when in reality it has not.

In this particular example I have performed two runs, the second starting from the density of the first. Both times I receive the same warning about hitting the maximum nstep (even though I never actually reach it…).

This behaviour is not exclusive to this specific system; it also happens for other supercell configurations I am running.

Any insight is appreciated, thank you!

Apparently, as a new user I am unable to attach files directly, so for now I provide a Dropbox link:

https://www.dropbox.com/scl/fo/k9nccy7j4g0xerdvnx8qe/AIhmD4tpFNKRV83DKBzZvck?rlkey=lb2vwjflslu5uz0mjuk0lagqf&st=qfv1e3bp&dl=0

Hi Elliot,

I fail to see where your DATASET 1 is in the ab.abo.
Did you delete some portions of the file?
Can you please do a quick test deactivating the Van der Waals flags?
It will help localize the problem. Please attach the result here.
Bogdan

Apologies, I was not completely clear about the naming conventions I am using. I have updated the Dropbox with additional files and give the details and naming conventions below.

There are two separate calculations (i.e. not chained together), so you will not find dataset 1 in ab.abo. Explanation of files:

Input files provided:

  • Mo.psp8
  • S.psp8
  • ab.in

Run 1 output files:

  • superc_DEN (renamed from abo_DS1_DEN) - this density is intended as a starting point for other calculations, so it was given a unique name (which I keep for consistency with the input of run 2)
  • ab_RUN1.abo (regular abinit output for run 1).

To create Run 2 I simply changed jdtset 1 → jdtset 2 in ab.in and added the lines
#DATASET 2: restart scf
getden_filepath "superc_DEN"
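
For clarity, the dataset-related lines at the top of ab.in for run 2 then read roughly as follows (I am assuming ndtset 1 here; the exact file is in the Dropbox):

ndtset 1
jdtset 2
#DATASET 2: restart scf
getden_filepath "superc_DEN"

Everything else in ab.in is unchanged from run 1.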

Run 2 output files:

  • abo_DS2_DEN
  • ab_RUN2.abo (regular abinit output for run 2)
  • log_RUN2 (standard out for run 2)

I am running the test without vdW flags now, and will attach the result when it is done.
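
(For completeness, by "without vdW flags" I simply mean commenting out the dispersion-related lines in ab.in, along the lines of the sketch below; the exact vdw_* variables I use are in the input file in the Dropbox.)

# vdw_xc 6
# vdw_tol 1.0d-10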

So, the test without the vdW correction has finished and shows the same problem. The relevant abo and log files are attached here (also in the Dropbox with the rest of the files).

ab_RUN3.abo (91.1 KB)
log_RUN3.txt (204.1 KB)

Hi Elliot,

I couldn’t reproduce your error with my compilation of abinit on 1 k-point (eliminating the paral_kgb options).

Can you try to see if you obtain the same problem if you use autoparal 1 instead of the paral_kgb options?
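
In ab.in that would mean something like the following (the exact paral_kgb-related lines in your input may differ; this is only a sketch):

# paral_kgb 1
# npband ...  npfft ...   (and any other manual np* distribution variables)
autoparal 1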

ab_autoparal.abo (4.4 KB)

ab_autoparal2.abo (6.3 KB)

Thank you for your helpful suggestions so far.

With autoparal 1 (see ab_autoparal.abo) the output shows a similar truncation, this time before the SCF cycle even begins (again with no error message).

I therefore believe the source of this behaviour is the OpenMPI parallelisation, although not entirely:

For example, running on a single core with autoparal 1 produces a complete standard output (ab_autoparal2.abo).

On the other hand, I can run a full SCF cycle on a primitive MoS2 cell either on a single core or on many cores without issue.

The strange thing is that this error seems to occur only with the supercells, and even then not reliably: I have tested other supercell systems and cannot always reproduce the error when using mpirun with more than one core.

tldr - I think some of the SCF steps are being "lost" in the OpenMPI parallelisation. However, given the size of the supercells I want to work with, I cannot run my calculations without parallelisation.
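
For context, I launch the parallel runs in the standard way, roughly like (the core count and file names here are just examples):

mpirun -np 8 abinit ab.in > log_RUN 2> err_RUN

so both stdout and stderr should in principle be captured.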

Hi Elliot,

My best guess is that it all comes down to your compilation of Abinit.

For further reference, you can run some automated tests in your abinit build directory using $pathtoabinit/tests/runtests.py paral (here paral is a keyword that selects all tests from the abinit test suite labelled with it). If everything passes, your compilation is fine. Otherwise, you might need to reconsider the libraries you link during the compilation.
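
For example, roughly (adjust $pathtoabinit and the build directory to your own setup):

cd $pathtoabinit/build
$pathtoabinit/tests/runtests.py paral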

Let me know!
Bogdan