I am trying to follow this tutorial to calculate the positron annihilation momentum distribution spectrum. Preliminary attempts show that the memory requirements for this calculation (specifically the Doppler part) are extremely high and directly related to the number of CPUs used. Even using only the Gamma point, the Doppler calculation starts with a 3-4 GB memory footprint per CPU for a system of ~100 atoms. This makes large-scale parallelism almost impossible (a parallel calculation on about 30 CPUs would require more than 100 GB of memory on a node), so only a small number of CPUs can be used.

What I would like to know is:

Is such a memory requirement normal for positron Doppler calculations?

If this is normal, is it possible to start the Doppler calculation directly by reading the two wave function files (electron and positron) obtained from the previous GS calculation?

Regarding the second point, there is no mention of such usage in the documentation of positron and posdoppler, and it seems that one must perform at least two TC-DFT steps to obtain the electron and positron wavefunctions before the Doppler calculation can start. However, with a severely limited number of available CPUs, even these two steps take significant time. I think a lot of time could be saved if I could first perform a large-scale parallel TC-DFT calculation to obtain the two wavefunction files, and then use these files to start the Doppler calculation on a small number of CPUs that does not run out of memory.
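For context, the relevant part of my input is essentially what the tutorial suggests (sketched from memory, so treat the exact values as illustrative rather than a verbatim copy of my file):

```
# Sketch of the TC-DFT + Doppler part of the input (illustrative values):
positron   -10   # automatic two-component DFT: alternate electron/positron steps
posdoppler  1    # compute the Doppler broadening at the end of the run
```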

I will try to cover your second question with a more general answer, although I have no experience with this side of the code.
In ABINIT, whenever you have a chain of datasets, the calculations can definitely be split into independent runs, provided that you print in the earlier dataset the data required for the next one.
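As a generic sketch of what I mean (not Doppler-specific; these are the standard multi-dataset variables, but double-check them against the documentation for your case):

```
ndtset 2         # two chained datasets in one input file

# Dataset 1: first calculation, write its wavefunctions to disk
prtwf1  1

# Dataset 2: restart by reading the wavefunctions produced by dataset 1
getwfk2 1
```

If you instead split the chain into two completely separate runs, irdwfk in the second input plays the role of getwfk, reading the wavefunction file left by the first run.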

As for the memory: the requirements will not scale with the number of nodes but with the amount of data you are dealing with (at least as long as you are not using far more processors than necessary and running in a saturated parallel regime).
For this, I would look into two things:

What options for ABINIT parallelism are you using: autoparal or paral_kgb? I would suggest the latter, with an emphasis on using npkpt and fully parallelizing over your number of k-points.

How do the calculations scale (SCF step time) with the number of CPUs when using either autoparal or paral_kgb?
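As a rough sketch of the paral_kgb setup I have in mind (the np* values below are only an example: they must multiply to your total CPU count and npkpt should match your number of k-points):

```
paral_kgb 1      # enable explicit k-point/band/FFT parallelism
npkpt  10        # distribute over k-points first (assuming 10 k-points here)
npband  3        # remaining processors distributed over bands
npfft   1        # npkpt * npband * npfft must equal the total CPU count (30 here)
```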
I hope I understood your question correctly.
Let me know if this helps!
Bogdan

I don’t remember exactly how the memory is distributed in the Doppler calculation, but I would not say that it is impossible for larger-scale calculations. I used it in supercells with 100-300 atoms and it was difficult to get it to work, but not impossible. It is normal that the memory requirements are high: Doppler calculations operate on wavefunctions (similar to hybrid functional calculations), but the memory is also distributed when you use parallelism. What architecture are you using? 100 GB per node does not seem that large.

It is not possible to start a Doppler calculation by reading the wavefunction files: the files you normally get simply contain the wavefunctions on the FFT grid, and they do not hold everything needed for PAW (all the projectors, etc.), while Doppler needs the full PAW wavefunctions. Doppler calculations are also very time-consuming, so the TC-DFT calculation that precedes them is just a fraction of the time you would spend anyway, and the overall gain from reading wavefunctions would be small.

By the way, is the memory footprint the actual usage or some estimate that ABINIT printed? I am not sure whether it is more accurate now, but I remember that the memory estimates written by ABINIT were not very representative of what was actually used (it has been a few years since I ran these calculations, so I might be misremembering).