KGB Parallelization Tutorial Run Crashes

Hi,

I’m trying to run the parallelization tutorial with ABINIT 9.6.2 (built with OpenMPI support) on Ubuntu 18.04.6 LTS, Intel Xeon 2.3GHz × 32.

This is only half the number of cores the tutorial calls for, so I changed max_ncpus from 64 to 32 in the tgspw_01.abi file. A test run gave the highest weight to the processor distribution npfft 4 × npband 8, so I removed autoparal and max_ncpus from the tgspw_01.abi file, set npfft 4 and npband 8 instead, and ran the following command:
mpirun -np 32 abinit tgspw_01.04.08.abi | tee log01.04.08
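
For reference, the arithmetic behind that choice can be sketched as below. This is only a rough illustration, assuming the number of MPI processes must equal npkpt × npband × npfft with npkpt = 1. The helper name `kgb_grids` is hypothetical, and the sketch ignores the weights that autoparal assigns as well as any other constraints ABINIT places on npband and npfft.

```python
def kgb_grids(ncpus, npkpt=1):
    """List (npband, npfft) pairs with npkpt * npband * npfft == ncpus.

    Hypothetical helper: it only enumerates factor pairs and does not
    reproduce ABINIT's autoparal weighting or its other constraints.
    """
    if ncpus % npkpt != 0:
        return []
    per_kpt = ncpus // npkpt
    return [
        (per_kpt // npfft, npfft)
        for npfft in range(1, per_kpt + 1)
        if per_kpt % npfft == 0
    ]


if __name__ == "__main__":
    # Core counts mentioned in this thread.
    for ncpus in (32, 30, 24):
        print(ncpus, kgb_grids(ncpus))
```

The pair (8, 4) for 32 processes corresponds to the npband 8, npfft 4 setting used here.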

After some time of normal computation, mpirun crashes. The input, output and log files are attached below (I added a suffix to the log file so that it could be uploaded; there are no other changes):
tgspw_01.04.08.abi (10.1 KB)
tgspw_01.04.08.abo (63.1 KB)
log01.04.08.log (114.6 KB)

I suspected that the problem came from using all available cores (something I had run into before with mpirun on this system), so I also tried 30 and 24 cores (npfft × npband of 3×10 and 3×8 respectively), but encountered a similar crash.

I have tried parallel computation over k points on some smaller cells, as in the basic tutorial. It runs fine as long as the number of cores does not exceed the effective number of k points. Are 64 cores a mandatory requirement for this more advanced tutorial run? Or did something else go wrong?

Hi,

I managed to reproduce your problem… :wink:

You ran the tests on:
ubuntu 18.04 / OpenMPI 2.1.1 / GNU 7.5 / Netlib

I was able to run this test, for example, under:

  • ubuntu 18.04 / MPICH 3.3 / GNU 7.5 / Netlib
  • ubuntu 18.04 / OpenMPI 3.x or 4.x / GNU 10.2 / OpenBlas

I think OpenMPI 2 is too old :roll_eyes:

jmb

Hi,

Thanks for your help! OpenMPI 4.1.4 did solve this problem. apt somehow reported 2.1.1 as the newest version, so I didn’t notice it was that old :roll_eyes:
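
In case it helps anyone else, here is a minimal sketch of how the installed version could be checked before a run. It assumes that `mpirun --version` prints a first line of the form `mpirun (Open MPI) 4.1.4`; the function names are hypothetical, and the 3.0.0 threshold only reflects what worked in this thread, not an official ABINIT requirement.

```python
import re


def parse_open_mpi_version(text):
    """Extract (major, minor, patch) from text such as 'mpirun (Open MPI) 4.1.4'.

    Returns None when no Open MPI version string is found.
    """
    match = re.search(r"Open MPI\)?\s+(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        return None
    # A missing patch number is treated as 0.
    return tuple(int(part) if part else 0 for part in match.groups())


def is_older_than(version, minimum=(3, 0, 0)):
    """True if version is below minimum (threshold taken from this thread only)."""
    return version < minimum


if __name__ == "__main__":
    for line in ("mpirun (Open MPI) 2.1.1", "mpirun (Open MPI) 4.1.4"):
        version = parse_open_mpi_version(line)
        print(line, "->", version, "older than 3.0.0:", is_older_than(version))
```

The text passed to `parse_open_mpi_version` would be whatever `mpirun --version` prints on the machine in question.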