Efficiency of calculating large supercells

Dear community,

I am new to ABINIT and want to use it to do some positron calculations.

In a first step I am trying to perform an electronic ground-state calculation of a supercell containing 256 aluminium atoms (without a positron), but I found that the computation takes quite long compared with VASP, which I have also used.

I know that different codes use different approaches, which can lead to some difference in efficiency, but a factor of 20 seems too large. This makes me wonder whether I have used settings that markedly slow ABINIT down, or whether I am missing settings that would speed it up.

I copy my input below, omitting only the atomic coordinates.

Any suggestions would be greatly appreciated. Thank you!

# System: Al_256

# Lattice parameters

acell 1 1 1
rprim
30.607516101296 0.000000000000 0.000000000000
0.000000000000 30.607516101296 0.000000000000
0.000000000000 0.000000000000 30.607516101296
chkprim 0
maxnsym 20000

# Atomic species and positions

ntypat 1
znucl 13
pseudos "Al.xml"
natom 256
typat 256*1
xcart
0.000000000000 0.000000000000 0.000000000000
0.000000000000 3.825939512662 3.825939512662
# ... (remaining atomic positions omitted)

ecut 15
pawecutdg 30
nstep 100
toldfe 1.0e-6

# KPOINTS

kptopt 1
ngkpt 4 4 4
nshiftk 1
shiftk 0.5 0.5 0.5

# Define occupation and smearing

occopt 7
tsmear 0.01

# Optimization of lattice parameters

optcell 2
ionmov 22
ntime 50
dilatmx 1.05
ecutsm 0.5
# Other parameters
prtwf 0
autoparal 1

Dear yagniz,

According to all scalability tests, ABINIT can handle 768 electrons well, but before making any comparison with other codes, you could run some preliminary tests on your local installation.

To test your compilation, you can make use of the test suite shipped with ABINIT, part of which is aimed specifically at parallel runs.
Here I assume you have a more or less standard MPI build, so inside the _build/tests directory you can run the Python script called runparal.py or runtests.py that you find in src/tests; an example invocation is given below.
You can then compare your running times with the reference times available in the output reference files.
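For example, running the parallel part of the test suite with 4 MPI processes could look roughly like this (adjust the path to runtests.py to where it actually lives in your tree, and check runtests.py --help first, since the available options may differ between versions):

cd _build/tests
python runtests.py paral -n 4 -j 2   # -n: MPI processes per test, -j: tests run concurrently

Here paral selects the tests in the parallel suite; the other suites are listed in the --help output.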

The next step would be to do a scalability test on your own compilation. Do you get the expected speedup when increasing the number of processors?

Finally, autoparal in your input makes a best guess at how to parallelize internally. It works fine most of the time, but sometimes fine-tuning is required. So it might be useful to have a look at paral_kgb and try to obtain a balanced distribution over the number of bands in your calculation; a sketch is given below.
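For instance, a manual distribution for a 32-core run might look something like the lines below. The numbers are purely illustrative and have to be adapted to your nband and to the number of k points, with the constraint that npkpt*npband*npfft equals the number of MPI processes:

paral_kgb 1   # switch on the explicit k-point/band/FFT (KGB) parallelization
npkpt 4       # processes distributed over k points (at most the number of k points in the run)
npband 8      # processes distributed over bands; should divide nband
npfft 1       # processes distributed over the FFT grid
bandpp 2      # bands treated together by each band process

With such a setup you can check in the output that nband is split evenly over the band processes.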

As a side remark, can you spot the expected symmetry of your system in the ABINIT output file? From the look of it, you have only one degree of freedom in your calculation, namely the lattice constant.

Let us know!
Bogdan

Dear Bogdan,

Thank you very much for your reply. Unfortunately I could not find the original build folder of the current installation, so I decided to install a newer version (10.0.7) on our cluster for the tests.
The new version gave very similar running times to the old version 9.6.2 for the same script. Then I ran the tests in src/tests. Surprisingly (or not?), I did not obtain the expected speeds documented in the reference files: the wall time is at least twice (and up to eight times) the reference for the same test. How should I understand this? None of the libraries or compilers are older than those shown in the build tutorial. So is this a pure hardware issue? The processors are indeed not new, but I would still be surprised if the efficiency were affected that badly.
When increasing the number of processors, I do get some speedup, but it is not very linear. I guess this has something to do with the parallelization strategies chosen in the different cases (I used autoparal just for a quick test).
I have indeed noticed that autoparal does not always give the best case, but after checking I think it did not make a very bad choice (npband is a divisor of nband), and I do not think it explains the big discrepancy between my actual calculation speed and the expectation.
By symmetry of the system, do you mean nsym in the output file? It is 12288, which is 48*256, as I have fcc symmetry and 256 atoms. Since I have a perfect lattice (in the current case), indeed only the lattice constant varies during the calculation. Shouldn't that make the calculation faster rather than slower?
The current calculation is not unbearably long (roughly a day for 256 Al atoms on 16 cores), but the structure is already very close to convergence. Once I introduce defects, which break the symmetry and require more k points, I can imagine the time becoming much longer.
Anyway, if I have done nothing wrong in the settings, such as the choice of algorithm, then that is already a useful takeaway for me. For further speedup, should I simply use more cores and tweak the parallelization?
Thank you
yagniz

Hi Yagniz,

What you are describing is perfectly reasonable.
Regarding the speed-up, I should have been clearer that I meant the speed-up on your system, since it is large enough to explore a wide range of processor counts. In your particular case, one day for 256 atoms on 16 processors does not sound completely unreasonable to me. Indeed, having one degree of freedom allows the relaxation to converge faster (but not necessarily each SCF cycle).

I would recommend looking at how long an SCF step takes in several cases: 16, 32, 64, or 128 processors (that is, do not fully relax in each case, but merely run 6-7 SCF iterations to see the timings; a sketch is given below). Ideally you should see a factor-of-8 decrease in the SCF step time (from 16 to 128 processors); in practice, a speed-up between 6 and 8 is already reasonable. For a proper comparison, I would use paral_kgb and make sure the bands are distributed as similarly as possible in each processor configuration; otherwise you might not get a smooth curve when comparing speed-up versus SCF step time.
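A minimal way to set up such a timing run is to switch off the relaxation and cap the number of SCF steps, keeping the rest of your input unchanged (the values below are illustrative):

optcell 0   # no cell optimization during the timing test
ionmov 0    # no ionic moves either, only the SCF cycle
nstep 7     # at most 7 SCF iterations
prtwf 0     # avoid writing large files so that I/O does not pollute the timings
prtden 0

You can then compare the wall times reported at the end of the output files for the different processor counts.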

(And yes, it is always a good idea to stay as close as possible to the latest release.)

Let me know once you reach reasonable performance with your calculations!
Bogdan

Dear Bogdan,

Thank you again for your suggestions. I will give it a try.

Best,
yagniz