Hi,
I’m trying the parallelization tutorial run with ABINIT 9.6.2 (openmpi capable), Ubuntu 18.04.6 LTS, Intel Xeon 2.3GHz × 32.
This is only half the number of cores the tutorial calls for, so I changed max_ncpus 64 in the tgspw_01.abi file to max_ncpus 32. A test run reported npfft 4 × npband 8 as the process distribution with the highest weight, so I replaced the autoparal and max_ncpus lines in the tgspw_01.abi file with npfft 4 and npband 8, and ran the following command:
mpirun -np 32 tgspw_01.04.08.abi | tee log01.04.08
After running normally for some time, mpirun crashed. The input, output and log files are attached below (I added a .log suffix to the log file so that it could be uploaded; nothing else was changed):
tgspw_01.04.08.abi (10.1 KB)
tgspw_01.04.08.abo (63.1 KB)
log01.04.08.log (114.6 KB)
I suspected that the problem came from using all available cores (something I had run into before with mpirun on this system), so I also tried 30 and 24 cores (with npfft × npband set to 3 × 10 and 3 × 8, respectively), but got a similar crash each time.
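For reference, here is a small Python sketch of how I listed the candidate npfft × npband splits for each core count. The helper name band_fft_splits is my own and is not part of ABINIT. It assumes only npfft and npband are varied, and it only checks that the product equals the number of MPI processes, not any other constraint ABINIT may place on these variables.

```python
def band_fft_splits(ncpus):
    """Return every (npfft, npband) pair whose product equals ncpus.

    Hypothetical helper, not an ABINIT tool: it only enumerates
    factor pairs and ignores any further restrictions ABINIT may
    impose on npfft or npband.
    """
    return [(npfft, ncpus // npfft)
            for npfft in range(1, ncpus + 1)
            if ncpus % npfft == 0]


if __name__ == "__main__":
    # Core counts tried in this post: 32, 30 and 24.
    for ncpus in (32, 30, 24):
        print(ncpus, band_fft_splits(ncpus))
```

The three splits I actually ran, (4, 8), (3, 10) and (3, 8), all appear in these lists, so the process counts passed to mpirun match npfft × npband in each case.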
I have also tried k-point parallelization on some smaller cells, as in the basic tutorial. That runs fine as long as the number of cores does not exceed the effective number of k-points. Are 64 cores a mandatory requirement for this more advanced tutorial run, or did something else go wrong?