The kgb parallelization for RF calculations

Dear ABINIT developers/users,

Is it possible to use kgb parallelization for RF calculations in ABINIT?
I am trying to calculate the response functions for an electric-field perturbation, starting from a previous SCF result (WFK file).
The trouble I am running into is the parallelization of the RF calculation.
The calculation does not parallelize well and takes 3 to 4 times longer than the SCF cycle.


I found the following messages in the log file:

nproc =  144   -> not optimal: autoparal keyword recommended in input file

and,

npfft, npband, npspinor and npkpt:     1    1    1  144

It seems that the calculation is parallelized only over k-points.
My calculation uses many bands, so band parallelization should be needed for good performance.
For reference, the same parameters in the SCF calculation were:

npfft, npband, npspinor and npkpt:     3    3    1   16

I also found:

--- !WARNING
src_file: mpi_setup.F90
src_line: 672
message: |
  Your number of spins*k-points (=16) and bands (=1200) will not distribute correctly
  with the current number of processors (=144).
  You will leave some empty.
  YOU ARE STRONGLY ADVICED TO ACTIVATE AUTOMATIC PARALLELIZATION!
  PUT "AUTOPARAL=1" IN THE INPUT FILE.

The message says to put autoparal=1 in the input file, but I had actually already done that.
It seems that the autoparal keyword does not work for RF calculations.

So I tried running without autoparal, i.e., setting the parallelization parameters (npfft, npband, npspinor, and npkpt) manually together with paral_kgb.
But I got the following message:

paral_kgb != 0 is not available in optdriver 1. Setting paral_kgb to 0

and,

For non ground state calculation, set bandpp, npfft, npband, npspinor npkpt and nphf to 1

RF calculations use optdriver=1, so paral_kgb cannot be used either, and the calculation cannot be parallelized over kgb.


I found a similar post about kgb parallelization.

According to that post, it seems that parallelization was only implemented over k-points and spins at the time.
Is this the same problem as mine?


Attached is the input file I used for the calculation.

Sincerely yours,

au_cond.in (1.8 KB)

Hello Hiroki,

  1. The autoparal keyword is ignored in DFPT runs. The message you sent probably comes from your dataset 1, i.e. the GS. paral_kgb is not implemented for DFPT (especially the plane-wave G-vector distribution).
  2. The good news is that DFPT parallelizes automatically over k-points and bands, but you should ideally use a rectangular distribution, nproc = nkpt * npband, where npband is the number of band pools and a divisor of nband (avoid prime numbers there).
  3. Which version of ABINIT are you running? In the latest version (2021) I distributed the memory, which may also help your calculation scale. It should also print more information about the distribution to the log. Since you may run several perturbations with different reduced symmetries, nkpt may change from one perturbation to the next, which makes the redistribution more complex as you loop through the perturbations.
  4. The post you found is about finite electric fields, which is very different from DFPT.
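
To make the rectangular rule in point 2 concrete, here is a minimal sketch that checks whether a given nproc satisfies nproc = nkpt * npband with npband dividing nband, and lists processor counts that do. It assumes the rule is exactly as stated above, with nkpt meaning spins*k-points (16 in the warning quoted earlier). The function names are hypothetical helpers, not part of ABINIT.

```python
from typing import List, Optional


def rectangular_npband(nproc: int, nkpt: int, nband: int) -> Optional[int]:
    """Return npband if nproc = nkpt * npband and npband divides nband, else None."""
    if nproc % nkpt != 0:
        return None
    npband = nproc // nkpt
    return npband if nband % npband == 0 else None


def suggest_nproc(nkpt: int, nband: int, max_nproc: int) -> List[int]:
    """List processor counts up to max_nproc that give a rectangular k x band grid."""
    return [nkpt * d for d in range(1, max_nproc // nkpt + 1) if nband % d == 0]


if __name__ == "__main__":
    # Values from this thread: 16 spins*k-points, 1200 bands, 144 MPI processes.
    print(rectangular_npband(144, 16, 1200))  # 144 / 16 = 9 does not divide 1200 -> None
    print(rectangular_npband(144, 16, 1197))  # 1197 = 9 * 133 -> 9
    print(suggest_nproc(16, 1200, 200))       # [16, 32, 48, 64, 80, 96, 128, 160, 192]
```

With nband = 1200 and 16 k-points, 144 processes do not fit the rule, while 128 (npband = 8) or 160 (npband = 10) would.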

From your input, it seems you are doing a simple ddk run. If you just want the momentum matrix elements, there is a faster way with wfk_task = "wfk_ddk":
https://docs.abinit.org/variables/gstate/#wfk_task

Best,

MV

Dear Verstra,

Thank you for your reply and suggestions!

#1 and #2:
You say that neither autoparal nor paral_kgb is implemented for DFPT runs, so how does DFPT parallelize automatically over k-points and bands? Is no other keyword required for automatic parallelization?

I ran it again, changing only nband from 1200 to 1197 so that nproc (144) = nkpt (16) * npband (9) with npband dividing nband (1197 = 9 * 133), but it still seems to be parallelized only over k-points.
For reference, I attach the log file (I saw the message at line 306).

log.txt (874.2 KB)
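
The nband adjustment above amounts to rounding nband to a multiple of the intended number of band pools. A minimal sketch, assuming npband = nproc / nkpt = 9 as in this run; the helper name is hypothetical. Whether rounding down (dropping a few of the highest bands) or up (adding a few) is appropriate depends on how many empty bands the response calculation actually needs.

```python
from typing import Tuple


def nband_multiples(nband: int, npband: int) -> Tuple[int, int]:
    """Closest band counts at or below and at or above nband that npband divides evenly."""
    lower = (nband // npband) * npband
    upper = lower if lower == nband else lower + npband
    return lower, upper


if __name__ == "__main__":
    print(nband_multiples(1200, 9))  # (1197, 1206)
```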

#3:
The version is 8.8.4 (the latest is 9.6.2, so it must be too old). Is automatic parallelization in DFPT not implemented in this version?
I know I should update, but for now I would rather not change anything if possible (I will update once the calculations I am running and planning are done). Also, the wfk_task keyword is only available from version 9.0.0, so I cannot use it for now.

Sincerely,

Hiroki

There is no input variable: the code does the distribution on its own, first over k-points, then over bands if it can. In principle the code should be robust and deal with a few excess processors, but it sometimes crashes if nproc is not strictly equal to nkpt*npband (with npband a divisor of nband). This also depends on the MPI build you use, which can trigger a timeout in the idle processors.
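
The sketch below is only a toy model of the "first over k, then over bands if it can" strategy described above, not ABINIT's actual algorithm; the function names are hypothetical. It assumes the leftover processes per k-point are grouped into the largest band-pool count that divides nband, which shows how an incompatible nband can leave processes idle.

```python
from typing import Tuple


def largest_divisor_at_most(n: int, limit: int) -> int:
    """Largest divisor of n that does not exceed limit (falls back to 1)."""
    for d in range(min(n, limit), 0, -1):
        if n % d == 0:
            return d
    return 1


def toy_distribution(nproc: int, nkpt: int, nband: int) -> Tuple[int, int, int]:
    """Toy model: distribute over k first, then split bands among leftover processes.

    Returns (npkpt, npband, idle). Hypothetical, not ABINIT's real scheme.
    """
    if nproc <= nkpt:
        # Fewer processes than k-points: each process takes one or more k-points.
        return nproc, 1, 0
    # Band pools per k-point: largest divisor of nband fitting in nproc // nkpt.
    npband = largest_divisor_at_most(nband, nproc // nkpt)
    return nkpt, npband, nproc - nkpt * npband


if __name__ == "__main__":
    print(toy_distribution(144, 16, 1200))  # (16, 8, 16): 16 processes left idle
    print(toy_distribution(144, 16, 1197))  # (16, 9, 0): rectangular, none idle
```

In this model, idle processes only appear when nproc is not nkpt times a divisor of nband, which is exactly the case the rectangular rule avoids.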

Things were already parallelized over bands in 8.8.4, but memory was not distributed. Please do try a recent version! In the end you will save time, even if you have to redo the ground state and so on.