Crash when calculating the dkdk wavefunctions with higher ecut and denser k-point grids

luowei · August 19, 2024, 5:46am

Dear Abinit Developers,

I have recently been working on calculating natural optical activity using Abinit. However, I’ve encountered a recurring issue when attempting to calculate the response of the dkdk wavefunction. The code consistently crashes when using a larger ecut (30 Hartree) or denser k-grids. My unit cell contains 40 atoms. Interestingly, the calculation works fine with a lower ecut (less than 20 Hartree) and sparse k-grids.

Do you have any experience or advice on how to resolve this issue?

Best regards,
Wei

mverstra · August 19, 2024, 10:27am

Hi Wei,

Please follow the netiquette on the main page, and post your Abinit version, hardware details, input and log files, etc.

This sounds like a banal memory issue. Higher-order DFPT has fewer symmetries, which may explain things, but it should distribute easily over k-points and spin. Distribution over bands is probably done for the CPU load but not for the memory.

If you have a massive supercell with no k-points, then you have to bribe me with beer and chocolate to continue the parallel memory distribution into second-order DFPT.
In the meantime, increase the memory per node and turn on OpenMP (e.g. 4 cores per task, and 4 times fewer tasks per node).
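
As a rough illustration of that last suggestion, the Python sketch below works out how the per-task memory budget changes when cores are regrouped into OpenMP threads. The node size (32 cores, 128 GB) and the helper name `hybrid_layout` are hypothetical and not taken from this thread; the code only does the arithmetic and does not launch anything.

```python
def hybrid_layout(cores_per_node, mem_per_node_gb, threads_per_task):
    """Split one node into MPI tasks that each own `threads_per_task` cores."""
    if cores_per_node % threads_per_task != 0:
        raise ValueError("threads_per_task must divide cores_per_node")
    tasks_per_node = cores_per_node // threads_per_task
    return {
        "tasks_per_node": tasks_per_node,
        "threads_per_task": threads_per_task,
        # Memory each MPI task can use if the node is shared evenly.
        "mem_per_task_gb": mem_per_node_gb / tasks_per_node,
    }

# Hypothetical node: 32 cores and 128 GB of RAM (assumed values).
pure_mpi = hybrid_layout(32, 128.0, threads_per_task=1)
hybrid = hybrid_layout(32, 128.0, threads_per_task=4)

for name, layout in (("pure MPI", pure_mpi), ("4 threads/task", hybrid)):
    print(name, layout)
```

With these assumed numbers, pure MPI gives 32 tasks with 4 GB each, while 4 threads per task gives 8 tasks with 16 GB each. On a Slurm cluster this typically corresponds to lowering `--ntasks-per-node`, setting `--cpus-per-task=4`, and exporting `OMP_NUM_THREADS=4`, assuming Abinit was built with OpenMP support.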

M.

luowei · August 19, 2024, 11:20am

Hi, Mverstra

Thanks very much for the quick reply!
The version is abinit/9.10.3. The input file (with the POSCAR) and the log file are shown below (the forum says I cannot upload files since I am a new user). I use Rocky Linux, version 8.9.

The crash happens in step 3.

# Linear response function of natural optical activity for SiO2 quartz
# (M. Royo and A. Zabalo 24.03.2023) 

ndtset 1

##Set 1 : ground state self-consistency
##*************************************
getwfk1   0  # do not read wf from previous
kptopt1   1  # no need to change for GS calculations; k-points are generated automatically from the grid parameters.
tolvrs1   1.0d-18  #  usually 1.0d-12 is ok
#   ## may need tolwfr2   1.0d-20  due to scf and for electric field later
#
##Set 2: Response function calculation of d/dk wave function
##**********************************************************
kptopt2   2  #  Take into account only the time-reversal symmetry: k points will be generated in half the Brillouin zone, with the appropriate weights. (This is the usual mode when preparing or executing a RF calculation at q=(0 0 0) without non-collinear magnetism.)
iscf2    -3  # negative values correspond to non-self-consistent calculations.
rfelfd2   2  # only the derivative of ground-state wavefunctions with respect to k
tolwfr2   1.0d-20  # Note that the preparatory GS calculations before a RF calculations must be highly converged. Typical values for these preparatory runs are tolwfr between 1.0d-16 and 1.0d-22.

#Set 3: Response function calculation of d2/dkdk wave function
#*************************************************************
kptopt   2
getddk   2 ## read ddk wavef from step 2
iscf    -3 # nscf cal
rf2_dkdk 3 ## The total, symmetric plus antisymmetric, response is activated with rf2_dkdk == 3.
tolwfr   1.0d-16

##Set 4 : Response function calculation of electric field
##*******************************************************
getddk4   2  # ## read ddk wavef from step 2
kptopt4   2 # Take into account only the time-reversal symmetry: 
rfelfd4   3  # only the generation of the first-order response to the electric field, assuming that the data on derivative of ground-state wavefunction with respect to k is available on disk.
tolvrs4   1.0d-8 
prepalw4  4 # LongWave calculation. Activates the calculation of perturbations required to build spatial-dispersion tensors which combine two electric field perturbations. It is therefore the option to choose if one intends to run subsequent longwave calculations with lw_natopt = 1.
#
##Set 5 : Natural optical activity calculation
##********************************************
optdriver5 10  # 10--> longwave response functions (LONGWAVE), routine longwave. 
kptopt5   2 #Take into account only the time-reversal symmetry: k points will be generated in half the Brillouin zone, with the appropriate weights
get1wf5   4 #GET the first-order wavefunctions from _1WF file
get1den5  4 # GET the first-order density from _1DEN file
getddk5   2 #GET the DDK wavefunctions from _1WF file
getdkdk5  3 #GET the 2nd derivative of wavefunctions with respect to K, from _1WF file, 
##If getdkdk is positive, its value gives the index of the dataset for which the output wavefunction file appended with _1WF must be used.
lw_natopt5 1  ### 1 -->  Natural optical activity tensor is calculated.

#######################################################################

#Common input variables
#**********************
getwfk 1 #GET the wavefunctions from _WFK file , 1 means for the first step calculation
useylm 1  #USE YLM (the spherical harmonics), 1, yes

#Definition of the unit cell
#***************************
structure "poscar:t04_POSCAR"

#acell  1.0 1.0 1.0 
#rprim 7.0371999741000000    0.0000000000000000    0.000000000000000
#      0.0000000000000000    7.0629000664000001    0.0000000000000000
#      0.0000000000000000    0.0000000000000000    9.9919004440000005
#
#
##Definition of the atom types and positions
##******************************************
#natom   20
#ntypat  3
#znucl   56 40 16
#typat   1 1 1 1 4*2 12*3
#xred  0.7426700000000001  0.0375900270000000  0.7503899930000000
#      0.7573299999999999  0.9624099730000000  0.2503899930000000
#      0.2573299999999999  0.5375900270000000  0.7496100070000000
#      0.2426699999999999  0.4624099730000000  0.2496100070000000
#      0.7494000199999999  0.5000000000000000  0.5001000170000000
#      0.7505999800000001  0.5000000000000000  0.0001000169999998
#      0.2505999800000001  0.0000000000000000  0.9998999830000000
#      0.2494000199999999  0.0000000000000000  0.4998999830000000
#      0.4702999890000000  0.2824999990000000  0.5292999740000000
#      0.0297000110000000  0.7175000010000000  0.0292999740000000
#      0.5297000110000001  0.7824999990000000  0.9707000260000000
#      0.9702999889999999  0.2175000010000000  0.4707000260000000
#      0.6909000280000001  0.5042999980000000  0.2502000030000000
#      0.8090999719999999  0.4957000020000000  0.7502000030000000
#      0.3090999719999999  0.0042999980000000  0.2497999970000000
#      0.1909000280000002  0.9957000020000000  0.7497999970000000
#      0.5429000260000000  0.7899000050000000  0.5321999790000000
#      0.9570999740000000  0.2100999950000000  0.0321999790000000
#      0.4570999740000000  0.2899000050000000  0.9678000210000000
#      0.0429000259999999  0.7100999950000000  0.4678000210000000

#Gives the number of bands, explicitly (do not take the default)
#***************************************************************
nband 120

#Definition of the planewave basis set and k-point grid
#******************************************************
ecut 30  ### the PRL paper uses 50 Ha
ngkpt 4 4 8
nshiftk 1
shiftk 0.0 0.0 0.5

#Definition of the SCF procedure
#*******************************
nstep 1
diemac 9.0 #model DIElectric MACroscopic constant: 1.0d6 for metals, 2.0-4.0 for wider-gap insulators, 12 for silicon
autoparal 1

pp_dirpath "/home/weil/work/NOA-stengle/pseudo/"
pseudos "Ba_without_nlcc.psp8, Ti_without_nlcc.psp8, S_without_nlcc.psp8"

##############################################################
# This section is used only for regression testing of ABINIT #
##############################################################
## After modifying the following section, one might need to regenerate the pickle database with runtests.py -r
#%%<BEGIN TEST_INFO>
#%% [setup]
#%% executable = abinit
#%% test_chain =  tlw_8.abi
#%% [files]
#%% files_to_test = 
#%%   tlw_8.abo, tolnlines= 12, tolabs=  3.e-04, tolrel=  5.00e-04, fld_options=-medium
#%% [paral_info]
#%% max_nprocs = 4
#%% [extra_info]
#%% authors = M. Royo and A. Zabalo
#%% keywords = DFPT, LONGWAVE
#%% description = Natural optical activity of quartz
#%% topics = longwave
#%%<END TEST_INFO>

- dfpt_looppert: read the DDK wavefunctions from file: ino_DS2_1WF93
 
 getcut: wavevector=  0.0000  0.0000  0.0000  ngfft= 120 120  60
         ecut(hartree)=     30.000   => boxcut(ratio)=   2.20673
 
 getcut : COMMENT -
  Note that boxcut > 2.2 ; recall that boxcut=Gcut(box)/Gcut(sphere) = 2
  is sufficient for exact treatment of convolution.
  Such a large boxcut is a waste : you could raise ecut 
  e.g. ecut=   36.522324 Hartrees makes boxcut=2
 
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
srun: got SIGCONT
forrtl: error (78): process killed (SIGTERM)
Image              PC                Routine            Line        Source     
libpthread-2.17.s  00002AED42904630  Unknown               Unknown  Unknown
libirc.so          00002AED479FC5D0  __intel_avx_rep_m     Unknown  Unknown
abinit             0000000000C5B0FC  Unknown               Unknown  Unknown
abinit             0000000000C145AF  Unknown               Unknown  Unknown
abinit             000000000062C19A  Unknown               Unknown  Unknown
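
The ecut suggested in the getcut comment above can be reproduced with a short calculation. The Python sketch below assumes the FFT box is kept fixed, so that boxcut scales as 1/sqrt(ecut); the function name `ecut_for_boxcut` is hypothetical. It also counts the points of the 120 x 120 x 60 FFT grid reported in the log, assuming 16 bytes per complex double-precision value.

```python
def ecut_for_boxcut(ecut_ha, boxcut, target_boxcut=2.0):
    """ecut (Hartree) that would give `target_boxcut` on the same FFT box."""
    # Gcut(sphere) = sqrt(2*ecut), so for a fixed box boxcut ~ 1/sqrt(ecut).
    return ecut_ha * (boxcut / target_boxcut) ** 2

# Values copied from the log: ecut = 30 Ha, boxcut = 2.20673.
suggested_ecut = ecut_for_boxcut(30.0, 2.20673)

# FFT grid from the log line "ngfft= 120 120  60".
ngfft = (120, 120, 60)
fft_points = ngfft[0] * ngfft[1] * ngfft[2]
fft_array_mb = fft_points * 16 / 1e6  # one complex array, assumed 16 bytes/value

print(f"suggested ecut: {suggested_ecut:.4f} Ha")
print(f"FFT points: {fft_points}, one complex array: {fft_array_mb:.1f} MB")
```

The result is about 36.52 Ha, which agrees with the value printed by getcut up to the rounding of boxcut in the log.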

This is the atomic configuration (POSCAR) file:
A B C
   1.0
    11.6709995269834721    0.0000000000000000    0.0000000000000000
    -5.8354997634917360   10.1073820779238535    0.0000000000000000
     0.0000000000000000    0.0000000000000000    5.8330001831000002
A B C
   6    6   18
Direct
  0.0000000000000000  0.3342300060000001  0.7430999879999999 
  0.3342300060000001  0.0000000000000000  0.7430999879999999 
  0.6657699939999999  0.6657699939999999  0.7430999879999999 
  0.0000000000000000  0.6657699939999999  0.2430999880000000 
  0.6657699939999999  0.0000000000000000  0.2430999880000000 
  0.3342300060000001  0.3342300060000001  0.2430999880000000 
  0.6666666666666666  0.3333333333333333  0.9332000019999999 
  0.3333333333333333  0.6666666666666666  0.4332000020000000 
  0.6666666666666666  0.3333333333333333  0.4332000020000000 
  0.3333333333333333  0.6666666666666666  0.9332000019999999 
  0.0000000000000000  0.0000000000000000  0.0713000299999999 
  0.0000000000000000  0.0000000000000000  0.5713000300000000 
  0.0000000000000000  0.1658400300000001  0.2925000189999999 
  0.1658400300000001  0.0000000000000000  0.2925000189999999 
  0.8341599699999999  0.8341599699999999  0.2925000189999999 
  0.0000000000000000  0.8341599699999999  0.7925000189999999 
  0.8341599699999999  0.0000000000000000  0.7925000189999999 
  0.1658400300000001  0.1658400300000001  0.7925000189999999 
  0.6678099930000001  0.1679300070000002  0.7080999910000000 
  0.5001200140000001  0.3321900069999999  0.7080999910000000 
  0.8320699929999998  0.4998799859999999  0.7080999910000000 
  0.3321900069999999  0.8320699929999998  0.2080999910000001 
  0.4998799859999999  0.6678099930000001  0.2080999910000001 
  0.1679300070000002  0.5001200140000001  0.2080999910000001 
  0.8320699929999998  0.3321900069999999  0.2080999910000001 
  0.6678099930000001  0.4998799859999999  0.2080999910000001 
  0.5001200140000001  0.1679300070000002  0.2080999910000001 
  0.1679300070000002  0.6678099930000001  0.7080999910000000 
  0.3321900069999999  0.5001200140000001  0.7080999910000000 
  0.4998799859999999  0.8320699929999998  0.7080999910000000
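
For a sense of scale, the following back-of-the-envelope Python sketch estimates the size of one set of wavefunctions for the input and cell above. It is not Abinit's actual allocation logic. It assumes the POSCAR lattice vectors are in angstrom, the usual cutoff-sphere count N_pw ≈ Ω (2 ecut)^(3/2) / (6π²) in Hartree atomic units, 16 bytes per complex coefficient, and that time-reversal symmetry (kptopt 2) halves the 4x4x8 grid. All function names are hypothetical.

```python
import math

BOHR_IN_ANGSTROM = 0.529177210903  # Bohr radius in angstrom

def cell_volume_bohr3(a1, a2, a3):
    """Cell volume in Bohr^3 from lattice vectors given in angstrom."""
    cross = (
        a2[1] * a3[2] - a2[2] * a3[1],
        a2[2] * a3[0] - a2[0] * a3[2],
        a2[0] * a3[1] - a2[1] * a3[0],
    )
    volume_ang3 = abs(sum(x * y for x, y in zip(a1, cross)))
    return volume_ang3 / BOHR_IN_ANGSTROM**3

def estimate_npw(volume_bohr3, ecut_ha):
    """Approximate plane waves per k-point inside the cutoff sphere."""
    return volume_bohr3 * (2.0 * ecut_ha) ** 1.5 / (6.0 * math.pi**2)

def wavefunction_bytes(npw, nband, nkpt, bytes_per_coeff=16):
    """Size of one full set of wavefunction coefficients, in bytes."""
    return npw * nband * nkpt * bytes_per_coeff

# Lattice vectors copied from the POSCAR above (assumed to be in angstrom).
volume = cell_volume_bohr3(
    (11.6709995269834721, 0.0, 0.0),
    (-5.8354997634917360, 10.1073820779238535, 0.0),
    (0.0, 0.0, 5.8330001831000002),
)

npw_20 = estimate_npw(volume, 20.0)
npw_30 = estimate_npw(volume, 30.0)

nband = 120            # from the input file
nkpt = 4 * 4 * 8 // 2  # ngkpt 4 4 8, assumed halved by time reversal
total_bytes = wavefunction_bytes(npw_30, nband, nkpt)

print(f"npw at ecut 20: {npw_20:.0f}, at ecut 30: {npw_30:.0f}")
print(f"one wavefunction set at ecut 30: {total_bytes / 1e9:.1f} GB")
```

Under these assumptions the estimate is roughly 3.6e4 plane waves per k-point and a few GB for a single set of wavefunctions summed over all k-points, and the plane-wave count grows by a factor of 1.5^1.5 ≈ 1.84 when going from ecut 20 to 30. A response calculation needs more than one such set (for example the ground-state and ddk wavefunctions read from disk), so the real footprint is larger than this single-set figure, and how it is split across MPI tasks depends on the parallel distribution.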

luowei · August 21, 2024, 7:57am

Hi, Mverstra

Indeed, when I change the job script to use 4 cores per task, the error no longer occurs.
However, I still do not understand the reason: I checked the memory and found that the memory usage is small (see the picture below).

Best regards,
Wei