I’m currently attempting to do some parallel computing. In this previous post, I found the computation to be suspiciously slow. So I tried the #2 case in this tutorial as a test.
According to the tutorial, the calculation for the second input file, which contains 5 SCF steps, should finish in about a minute with 64 processes in parallel. In practice, however, a single SCF step took me more than 2 minutes. Something must be very wrong, but I have no idea how to locate the problem. The log, input, and output files are attached below. I manually killed the job after SCF step 3, but the timing information is still in the log file: tparal_bandpw_02.log, tparal_bandpw_02.abi, tparal_bandpw_02.abo
How do I fix this problem? Or is there any other information I can provide to help pinpoint the problem? Thank you very much.
I think I may have found where the problem is: when I used a single node, the job finished quite fast. If one "node" on my system has fewer than 64 CPUs, so I have to use two nodes for this run, but my ABINIT executable was compiled without OpenMP support, could that cause a problem like this?
Regarding the netlib library, did you compile from source or install a netlib package?
Linear algebra libraries (OpenBLAS, MKL, etc.) are generally compiled with OpenMP support.
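Because an OpenMP-enabled linear algebra library can start its own threads inside each MPI process, it may be worth checking which thread-control variables are actually set in the job environment. The sketch below is only an illustration: the helper names `thread_settings` and `unset_thread_vars` are hypothetical, and it assumes the usual variables `OMP_NUM_THREADS`, `OPENBLAS_NUM_THREADS`, and `MKL_NUM_THREADS` are the ones relevant to this build.

```python
import os

# Environment variables commonly used to cap threading in OpenMP-enabled
# builds and in the OpenBLAS / MKL linear algebra libraries (assumed set).
THREAD_VARS = ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS")


def thread_settings(env=None):
    """Return {variable: value or None} for each thread-control variable."""
    env = os.environ if env is None else env
    return {var: env.get(var) for var in THREAD_VARS}


def unset_thread_vars(env=None):
    """List the variables that are not set. When they are unset, an
    OpenMP-enabled BLAS may choose its own thread count in every MPI rank."""
    return [var for var, val in thread_settings(env).items() if val is None]


if __name__ == "__main__":
    # Print the current settings, e.g. from inside the batch script.
    for var, val in thread_settings().items():
        print(f"{var} = {val if val is not None else '(unset)'}")
```

Running it inside the batch job (rather than on the login node) shows what the ABINIT processes would actually see.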
Our cluster has only 56 CPUs per node. I found that only the 48-core, 1-thread case (48c1t: one node, OMP_NUM_THREADS=1, and 48 < 56 cores) gives a Proc time of about 200 seconds. The 64c1t and 32c2t cases have much longer Proc times.
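The core counts in the three cases can be checked with a quick calculation. This is a sketch assuming each MPI process × OpenMP thread occupies one CPU and nodes are not oversubscribed; `layout` is a hypothetical helper, and the 56 CPUs per node comes from the description above.

```python
import math

CORES_PER_NODE = 56  # CPUs per node on this cluster, as stated above


def layout(mpi_procs, omp_threads, cores_per_node=CORES_PER_NODE):
    """Return (total cores, nodes needed), assuming one CPU per thread."""
    total = mpi_procs * omp_threads
    return total, math.ceil(total / cores_per_node)


if __name__ == "__main__":
    for procs, threads in [(48, 1), (64, 1), (32, 2)]:
        total, nodes = layout(procs, threads)
        print(f"{procs}c{threads}t: {total} cores -> {nodes} node(s)")
```

Under these assumptions, only 48c1t fits on a single node, while 64c1t and 32c2t both need 64 cores and therefore two nodes.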
In all three cases, however, the overall time is still much longer than the timing reported in that tutorial.