Positron Doppler calculation fails

Hello all,

I’m trying to do a positron Doppler spectrum calculation using abinit 9.6.2, and the job was aborted after the SCF steps were done, with error 41 “insufficient virtual memory”. I’ve noticed that my situation seems similar to this post, but unfortunately I cannot fix my problem with the command ulimit -s unlimited, which is the solution of that post.

I’m using 3 nodes running CentOS 7; each node has 24 CPUs and 64 GB of memory. The structure I’m computing is a 4H-SiC supercell with 215 atoms and 1 Si vacancy. The memory requirement analysis in the log file said this job should need less than 369.899 Mbytes of memory, so I’m not sure what went wrong.

The .abi and .log files mentioned above are attached below:
sic-v-si-h.abi (10.5 KB)
sic-v-si-h.log (1.2 MB)

There’s also another anomaly that might be related to this. While trying to fix the problem myself, I ran the tutorial to test my installation. Specifically, I ran tpositron_5.abi with 1, 2, 8, and 24 CPU(s). It seems that using more CPUs than the number of k-points (in this case, 24 CPUs) causes a similar crash with the error message “allocatable array is already allocated”, even when I manually set the KGB parallelism parameters. When using no more than 8 CPUs, everything works fine in this tutorial.

Is this a problem with MPI, or my abinit installation? Thanks in advance for your attention.

Hi Iavas,
Welcome to the virtual positron annihilators!
Of late, I have not experimented with the Doppler failure issue.

You are correct about the k-point parallelization.
In this case, since you are using 8 k-points, 8 CPUs are ideal.
Have you tried an 8-CPU Doppler run for the SiC supercell?
The Doppler part of the calculation may take a long time to complete.
Let me try your input file with 8 CPUs and get back to you.
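
As a rough illustration of the rule of thumb above (taken from this thread, not from the ABINIT documentation), a small helper can pick the largest process count that does not exceed the number of k-points and divides it evenly. The function name `pick_nproc` is hypothetical.

```python
def pick_nproc(nkpt: int, available: int) -> int:
    """Largest process count <= min(nkpt, available) that divides nkpt evenly."""
    if nkpt < 1 or available < 1:
        raise ValueError("nkpt and available must be positive")
    for n in range(min(nkpt, available), 0, -1):
        if nkpt % n == 0:
            return n
    return 1  # not reached: n = 1 always divides nkpt


print(pick_nproc(8, 24))  # 8 k-points, 24 CPUs available -> 8
print(pick_nproc(8, 6))   # 8 k-points, 6 CPUs available  -> 4
```

With 8 k-points and a 24-CPU node this suggests 8 processes, which matches the tutorial runs that worked.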

PS: My understanding is that, unlike lifetime calculations, the Doppler calculation requires the electronic data to be held in memory, which puts a heavy demand on memory.

You can see that the failure occurred when the run moved on to the Doppler calculation:

" (*) IPM=Independent particle Model
forrtl: severe (41): insufficient virtual memory"

-rajaraman

I think a better strategy is to decouple the relaxation with positron-induced forces from the Doppler calculation.

run-1: relax the defect configuration with the inclusion of positron-induced forces (as in tutorial tpositron_4.abi)
run-2: take the relaxed coordinates from run-1 and do the Doppler calculation

One should also look at creating a potential with more valence electrons (as in tutorial tpositron_7.abi).

-rajaraman

I am able to run the Doppler part alone with 8 CPUs.
See the results:
doppler_out.txt (82.0 KB)

Here is the input file I used.
I have limited the SCF steps to 2 to focus on the memory issues.
sic-v-si-h.abi (10.5 KB)

The attached images show the memory usage during the SCF and Doppler calculations.
There is not much memory demand during the Doppler calculation?!

[Image: mem-usage_8cpu_electronic_scf]
[Image: mem-usage_8cpu_doppler_calc]

However, when I went to 16 CPUs, the job failed when it started the Doppler calculation after completing those two SCF steps.

The question to the developers is why this failure occurs even though the actual memory usage during the Doppler calculation is reasonable.

It appears to me that the steps happening between the end of the SCF and the start of the Doppler calculation may be leading to the failure.

Right now the best option is:

  1. Relax the defect structure including positron-induced forces. (This can easily run with a higher number of CPUs, which reduces the runtime.)

  2. Take the final configuration and run the Doppler calculation with Ncpu = nkpoints and a sufficient number of electron-positron SCF steps (posnstep).

Increasing the number of valence electrons for a better Doppler description may again cause memory issues (we are facing such a problem for defects in W).

regards
-rajaraman

Sorry for the late reply; too many other things have come up in recent months.

I recently restarted the project and seem to have roughly pinpointed the problem: the value of bandpp cannot be greater than 1 for KGB parallelism in a Doppler calculation; otherwise, even a very small system (one SiC cell) can cause the program to crash. I compiled abinit-9.6.2 on Ubuntu and CentOS with ATLAS and MKL respectively, and both showed the same behavior.

Even with bandpp 1, I still can’t use too many CPUs. For a 128-atom system, the Doppler calculation completes with 24 CPUs, while with 96 CPUs it still fails for lack of virtual memory.

I had the program print out the wfk file, and it turns out that it is not small (about 3.3 GB for the 128-atom system). My guess is that if, under some special conditions, multiple CPUs each load their own copy of the wave function file into memory, this could indeed lead to memory overload.
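
To put a number on that guess, here is a back-of-the-envelope sketch. It assumes, purely as a hypothesis, that every MPI rank holds a full copy of the wfk data, that there is one rank per CPU, and that the nodes are the 64 GB, 24-CPU nodes from the first post. The helper name `per_node_wfk_gb` is made up for illustration.

```python
def per_node_wfk_gb(wfk_gb: float, ranks_per_node: int) -> float:
    """Memory per node if every MPI rank on it holds a full copy of the wfk data."""
    return wfk_gb * ranks_per_node


WFK_GB = 3.3         # wfk size reported above for the 128-atom system
NODE_MEM_GB = 64     # memory per node, from the first post
RANKS_PER_NODE = 24  # assumption: one MPI rank per CPU, node fully populated

needed = per_node_wfk_gb(WFK_GB, RANKS_PER_NODE)
print(f"{needed:.1f} GB needed per node vs {NODE_MEM_GB} GB available")
```

Under these assumptions a fully populated node would need more than its 64 GB. This is only an estimate, not a measurement, and the same simple model would also predict trouble for 24 ranks packed onto a single node, so it is at best part of the picture.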

Anyway, my temporary solution for now is to decouple the Doppler calculation from the rest, as you suggested, always use no more than 24 CPUs for the Doppler part, and manually limit bandpp to 1.
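
For anyone who wants to script this workaround, here is a minimal sketch that builds the parallelism lines for the Doppler input. The helper `doppler_parallel_block` and the way it splits processes (k-points first, the remainder over bands, no FFT parallelism) are my own assumptions; the 24-process cap is simply what worked for my 128-atom system, not a documented limit.

```python
def doppler_parallel_block(nproc: int, nkpt: int, max_procs: int = 24) -> str:
    """Return ABINIT input lines for a Doppler run with bandpp fixed to 1.

    Assumption: processes are spread over k-points first (npkpt), the
    remainder over bands (npband), with no FFT parallelism (npfft 1).
    """
    if nproc < 1 or nkpt < 1:
        raise ValueError("nproc and nkpt must be positive")
    if nproc > max_procs:
        raise ValueError(f"use at most {max_procs} processes for the Doppler step")
    # Largest k-point group count that divides both nproc and nkpt
    npkpt = max(n for n in range(1, min(nproc, nkpt) + 1)
                if nproc % n == 0 and nkpt % n == 0)
    npband = nproc // npkpt
    return "\n".join([
        "paral_kgb 1",
        f"npkpt {npkpt}",
        f"npband {npband}",
        "npfft 1",
        "bandpp 1",  # values greater than 1 crashed the Doppler step for me
    ])


print(doppler_parallel_block(24, 8))
```

For 24 processes and 8 k-points this gives npkpt 8 and npband 3. ABINIT has further constraints between nband and npband, so please check the resulting values against the documentation before using them.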

Thanks again for your advice!
