PAW Parallelization for positron calculation

I am running two-component DFT (TCDFT) calculations of positron lifetimes in a fairly complex material (YBa2Cu3O7) and would like to obtain both a bulk and a vacancy lifetime.

I’ve iteratively scanned ecut, pawecutdg and the k-point grid, and determined that ecut = pawecutdg = 9 Ha with a 6x6x6 k-point grid gives stable values of the total energy and the positron lifetime. However, as soon as I attempt any system larger than a 1x1x1 primitive cell, the simulation time becomes very long. A single electron/positron SCF loop takes several hours, so I estimate a full run at several days to a couple of weeks.

I am running on a cluster where I have access to 40+ nodes with 32 processors per node. I think I’m allocating considerable resources: I compiled abinit with MPI support (with_mpi), set autoparal=1, and request 2 nodes with 32 processes per node in the Slurm file. Doubling the number of cores doesn’t seem to do much, though. Are these simulations just extremely expensive, or is it possible to do better?
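To make the resource arithmetic concrete, here is a minimal Python sketch of one possible explanation for the poor scaling. It assumes that only k-points (times spin polarizations) are being distributed across MPI processes, which I have not verified for my run. The irreducible k-point count of 40 is a made-up placeholder, not a number taken from my output, and the function names are purely illustrative.

```python
def kpoint_parallel_limit(nkpt_irreducible, nsppol=1):
    """Upper bound on useful MPI ranks if ONLY k-points (x spin) are distributed.

    Assumption: no band or FFT parallelization is active.
    """
    return nkpt_irreducible * nsppol


def rank_usage(n_nodes, ranks_per_node, nkpt_irreducible, nsppol=1):
    """Return (total ranks, ranks that can receive work, idle ranks)."""
    total = n_nodes * ranks_per_node
    usable = min(total, kpoint_parallel_limit(nkpt_irreducible, nsppol))
    return total, usable, total - usable


# Hypothetical irreducible k-point count; the real value depends on the
# symmetry of the cell and is not taken from an actual calculation.
NKPT_PLACEHOLDER = 40

# Current request: 2 nodes x 32 processes per node.
print(rank_usage(2, 32, NKPT_PLACEHOLDER))
# Doubled request: 4 nodes x 32 processes per node.
print(rank_usage(4, 32, NKPT_PLACEHOLDER))
```

Under these assumptions, adding nodes beyond the k-point count only adds idle processes, which would match what I observe. If that is indeed the cause, I suppose band/FFT parallelization is what I am missing, but I would appreciate confirmation.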

I’m looking for guidance on choosing the correct abinit input variables for my application, and for any suggestions on basic settings I could have overlooked. I read the tutorial on parallelization, but it didn’t help me; perhaps I’m just too inexperienced.

Thanks in advance!