Crash while calculating non-linear optical response

I am, for the most part, an experienced VASP user, and I am giving abinit a whirl because it has capabilities (such as non-linear optical coefficients) that VASP does not. I have gone through the basic tutorials as well as the optic and DFPT tutorials. Based on the DFPT tutorial, I have put together an input file for calculating the NLO response of a defect in a supercell. As in the tutorial, the calculation is broken into 7 parts. It proceeds without error to the third dataset and then fails without a clear error message.

In particular, the job fails right after these lines:

 getcut: wavevector=  0.0000  0.0000  0.0000  ngfft=  80  80  80
         ecut(hartree)=     40.000   => boxcut(ratio)=   2.08859
Abort(13) on node 7 (rank 7 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 13) - process 7

Since I am a new user and cannot upload attachments, I have pasted the input file below. Does anyone have any suggestions as to what I might be doing wrong?

ndtset 7
############################################################################################
#### Global Variables.
############################################################################################
 ngkpt 4 4 4
 nshiftk 1
 shiftk    0.5    0.5    0.5
 ecut 16
 pawecutdg 40
 nstep 100
 nsppol 1
 ecutsm 0.5
 pawxcdev 0
 nband 160
 indata_prefix "indata/in"
 tmpdata_prefix "tmpdata/tmp"
 outdata_prefix "outdata/out"
 pp_dirpath "/data/abinit/pseudos"
 pseudos 
    "C.xml,
    N.xml"
############################################################################################
####                                         STRUCTURE                                         
############################################################################################
 natom 63
 ntypat 2
 typat
 1 1 1
 1 1 1
 1 1 1
 1 1 1
 1 1 1
 1 1 1
 1 1 1
 1 1 1
 1 1 1
 1 1 1
 1 1 1
 1 1 1
 1 1 1
 1 1 1
 1 1 1
 1 1 1
 1 1 1
 1 1 1
 1 1 1
 1 1 1
 1 1 2
 znucl 6 7
 xred
  -8.1727793557d-05    0.9991255085    1.0000817278
  -1.8435613594d-05   5.2501921200d-04    0.5001060909
    0.0016378679    0.4991954111    0.9983621321
    0.9978086865    0.5008868602    0.4973330238
    0.4998939091   5.2501921200d-04    1.0000184356
    0.4998171893   7.1321590155d-04    0.5001828107
    0.5026669762    0.5008868602    0.0021913135
    0.4900396379    0.4951612619    0.5099603621
    0.3777663362    0.3785896418    0.3750992211
    0.3761215446    0.3754360679    0.8759010340
    0.3752321248    0.8747678752    0.3755613916
    0.3749928802    0.8750071198    0.8755930508
    0.8749416146    0.3750583854    0.3687393417
    0.8753010584    0.3746989416    0.8735954097
    0.8714103582    0.8722336638    0.3750992211
    0.8745639321    0.8738784554    0.8759010340
    0.3761215446    0.1259010340    0.1254360679
    0.3749928802    0.1255930508    0.6250071198
    0.3777663362    0.6250992211    0.1285896418
    0.3752321248    0.6255613916    0.6247678752
    0.8753010584    0.1235954097    0.1246989416
    0.8745639321    0.1259010340    0.6238784554
    0.8749416146    0.6187393417    0.1250583854
    0.8714103582    0.6250992211    0.6222336638
    0.1264045903    0.3746989416    0.1246989416
    0.1240989660    0.3754360679    0.6238784554
    0.1240989660    0.8738784554    0.1254360679
    0.1244069492    0.8750071198    0.6250071198
    0.6312606583    0.3750583854    0.1250583854
    0.6249007789    0.3785896418    0.6222336638
    0.6249007789    0.8722336638    0.1285896418
    0.6244386084    0.8747678752    0.6247678752
    0.1250994533    0.1249005467    0.3746491815
    0.1251492983    0.1248507017    0.8748507017
    0.1247303250    0.6250693536    0.3750693536
    0.1250994533    0.6246491815    0.8749005467
    0.6249306464    0.1252696750    0.3750693536
    0.6253508185    0.1249005467    0.8749005467
    0.6249306464    0.6250693536    0.8752696750
    0.0016378679    0.2483621321    0.2491954111
    0.9999182722    0.2500817278    0.7491255085
    0.9978086865    0.7473330238    0.2508868602
    0.9999815644    0.7501060909    0.7505250192
    0.5026669762    0.2521913135    0.2508868602
    0.4998939091    0.2500184356    0.7505250192
    0.4900396379    0.7599603621    0.2451612619
    0.4998171893    0.7501828107    0.7507132159
    0.2508744915    0.2500817278   8.1727793557d-05
    0.2494749808    0.2500184356    0.5001060909
    0.2494749808    0.7501060909   1.8435613594d-05
    0.2492867841    0.7501828107    0.5001828107
    0.7508045889    0.2483621321    0.9983621321
    0.7491131398    0.2521913135    0.4973330238
    0.7491131398    0.7473330238    0.0021913135
    0.7548387381    0.7599603621    0.5099603621
    0.2489344108    0.0010655892    0.2503311004
    0.2498738311    1.0001261689    0.7501261689
    0.2506584627    0.5004522646    0.2504522646
    0.2489344108    0.5003311004    0.7510655892
    0.7495477354  -6.5846272032d-04    0.2504522646
    0.7496688996    0.0010655892    0.7510655892
    0.7495477354    0.5004522646    0.7493415373
    0.7608648352    0.4891351648    0.2391351648
 acell    1.0    1.0    1.0
 rprim
   13.4537196110   2.4577523146d-06  -8.7490491474d-07
  -9.7139789812d-07   13.4537196110  -1.2410041145d-05
   2.3612581455d-06   1.0923686090d-05   13.4537196110

#####################
##### DATASET 1 #####
#####################
##############################################
####                SECTION: basic               
##############################################
 kptopt1 2
 occopt1 7
 toldfe1 1e-12
##############################################
####                SECTION: files               
##############################################
 prtden1 1
 prtwf1 1
##############################################
####                SECTION: paral               
##############################################
 autoparal1 1


#####################
##### DATASET 2 #####
#####################
##############################################
####                SECTION: basic               
##############################################
 kptopt2 2
 occopt2 7
 iscf2 -2
 tolwfr2 1e-22
##############################################
####                SECTION: files               
##############################################
 getden2 1
 getwfk2 1
 prtwf2 1
##############################################
####                SECTION: paral               
##############################################
 autoparal2 1


#####################
##### DATASET 3 #####
#####################
##############################################
####                SECTION: basic               
##############################################
 kptopt3 2
 tolwfr3 1e-22
##############################################
####                SECTION: dfpt                
##############################################
 rfelfd3 2
##############################################
####                SECTION: files               
##############################################
 getwfk3 2
 prtwf3 1


#####################
##### DATASET 4 #####
#####################
##############################################
####                SECTION: basic               
##############################################
 kptopt4 2
 tolvrs4 1e-12
##############################################
####                SECTION: dfpt                
##############################################
 rfphon4 1
 rfelfd4 3
 rfstrs4 3
 prepanl4 1
##############################################
####                SECTION: files               
##############################################
 getwfk4 2
 getddk4 3
 prtwf4 1
 prtden4 1


#####################
##### DATASET 5 #####
#####################
##############################################
####                SECTION: basic               
##############################################
 kptopt5 2
 tolwfr5 1e-22
##############################################
####                SECTION: dfpt                
##############################################
 rf2_dkde5 1
 prepanl5 1
##############################################
####                SECTION: files               
##############################################
 getwfk5 2
 getddk5 3
 get1den5 4
 getdelfd5 4
 getdkdk5 5
 prtwf5 1


#####################
##### DATASET 6 #####
#####################
##############################################
####                SECTION: basic               
##############################################
 kptopt6 2
 tolvrs6 1e-15
##############################################
####                SECTION: dfpt                
##############################################
 usepead6 0
 d3e_pert1_elfd6 1
 d3e_pert1_phon6 1
 d3e_pert2_elfd6 1
 d3e_pert3_elfd6 1
##############################################
####                SECTION: files               
##############################################
 getwfk6 2
 getddk6 3
 get1den6 4
 get1wf6 4
 getdkde6 6
##############################################
####               SECTION: gstate               
##############################################
 optdriver6 5


#####################
##### DATASET 7 #####
#####################
##############################################
####                SECTION: basic               
##############################################
 kptopt7 1
 tolvrs7 1e-15

Hi pfons, welcome to the abiverse!

Looks like it’s in the parallelization and distribution, so it’s harder to debug. Send along your configuration, abinit version, machine, etc., plus any _MPI_< stuff > files, slurm outputs, etc.

The first thing which might be problematic is the occupations: they should be the same throughout the datasets (occopt 7 etc.). There are a few run modes which only accept valence bands (no smearing or unoccupied states), but then you should set this from the beginning, in the GS. With your defect, is the system metallic? How many electrons do you have, out of the 160 bands?
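The electron count asked about here can be worked out from typat and znucl. The sketch below assumes the JTH PAW datasets carry 4 valence electrons for C (2s2 2p2) and 5 for N (2s2 2p3), and that the cell is neutral (the input sets no charge). Under those assumptions the count comes out odd, so with nsppol 1 the highest occupied band cannot be doubly filled.

```python
import math

# Assumptions: 4 valence electrons for C, 5 for N (JTH PAW), neutral cell.
typat = [1] * 62 + [2]      # 20 rows of "1 1 1" plus one row "1 1 2"
zval = {1: 4, 2: 5}         # type 1 = C (znucl 6), type 2 = N (znucl 7)
nband = 160

nelect = sum(zval[t] for t in typat)
# With nsppol 1 each band holds at most 2 electrons.
n_occupied = math.ceil(nelect / 2)
n_empty = nband - n_occupied

print(f"{nelect} electrons -> {n_occupied} (partly) occupied bands, {n_empty} empty bands")
print("odd electron count" if nelect % 2 else "even electron count")
```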

I would set kptopt to 3 everywhere. In the DFPT part, abinit will reduce the k-points using the perturbation-preserving symops anyway.

DS3 is a ddk calculation, correct? This usually benefits from smearing and empty bands, which improve convergence, but it should not crash. Did you check the estimated memory? It could also be something simple like that.

I have done normal DFPT on larger systems, but the full nonlinear spectroscopy will be tough: there are lots of cross terms to evaluate in the D3E steps.

Matthieu


Dear Matthieu,
Thank you very much for your quick reply. I must admit I am still learning abinit, hence the need for advice. You are correct in assuming that the system is insulating, although there are a few isolated states in the gap. This input file describes an ion-relaxed diamond NV center with a total of 63 atoms, so I knew the calculation would be… I have changed occopt to 1 everywhere and, following your suggestion, set kptopt to 3 everywhere as well. To answer your question about the computational cluster: I am using three nodes, each with 32 cores, connected by InfiniBand. The OS is CentOS 7 (which I plan to upgrade to Rocky Linux later this year). Each node has between 180 and 400 GB of memory, and the memory estimates by abinit are much lower. A grep of the memory estimates for each dataset from the chi2.abo file is included at the bottom of this message…

I also have another quick question, if you would be so kind. I have become accustomed to using PAW pseudopotentials with VASP, and I have used PAW pseudopotentials in my attempt to calculate chi2 as well. The output contains a warning about overlapping PAW spheres. The system here is essentially diamond, with a bond length of approximately 1.5 Å. Is the extent of PAW overlap something to worry about? Would it be fair to trust the chi2 values (once I can hopefully calculate them), or should I switch to norm-conserving (NC) pseudopotentials?

    PAW SPHERES ARE OVERLAPPING!
       There are   121 pairs of overlapping atoms.
       The maximum overlap percentage is obtained for the atoms  32 and  55.
        | Distance between atoms  32 and  55 is  :   2.80166
        | PAW radius of the sphere around atom  32 is:   1.50737
        | PAW radius of the sphere around atom  55 is:   1.50737
        | This leads to a (voluminal) overlap ratio of  0.7316 %
    THIS IS DANGEROUS, as PAW formalism assumes non-overlapping PAW spheres.

P This job should need less than                     138.123 Mbytes of memory.
P This job should need less than                     111.083 Mbytes of memory.
P This job should need less than                     148.049 Mbytes of memory.
P This job should need less than                     787.121 Mbytes of memory.
P This job should need less than                     148.049 Mbytes of memory.
P This job should need less than                     247.464 Mbytes of memory.
P This job should need less than                     171.855 Mbytes of memory.
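A quick back-of-the-envelope check of these numbers against the node memory quoted above. The sketch assumes that each "P This job should need less than" value is a per-MPI-process estimate and that 32 ranks run on every node; both are assumptions, not something the output states.

```python
# Assumptions: estimates are per MPI rank, 32 ranks per node, smallest node = 180 GB.
estimates_mb = [138.123, 111.083, 148.049, 787.121, 148.049, 247.464, 171.855]
ranks_per_node = 32
node_mem_gb = 180

worst_mb = max(estimates_mb)
per_node_gb = worst_mb * ranks_per_node / 1024
share = 100 * per_node_gb / node_mem_gb

print(f"worst dataset: {worst_mb} MB/rank -> {per_node_gb:.1f} GB/node "
      f"({share:.0f}% of {node_mem_gb} GB)")
```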

I forgot to include the abinit version in my reply. It is ABINIT 9.6.2.

I changed kptopt to 3 for all datasets and set occopt to 1, but the job still crashed. Do you have any insight as to what to try next? I have pasted the input file below for reference. I will paste the output at the point of failure in a separate message, as I am unable to upload files due to my new-user status.

ndtset 7
############################################################################################
#### Global Variables.
############################################################################################
 ngkpt 4 4 4
 nshiftk 1
 shiftk    0.5    0.5    0.5
 ecut 16
 occopt 1
 pawecutdg 40
 nstep 300
 nsppol 1
 ecutsm 0.5
 pawxcdev 0
 nband 160
 indata_prefix "indata/in"
 tmpdata_prefix "tmpdata/tmp"
 outdata_prefix "outdata/out"
 pp_dirpath "/data/abinit/pseudos"
 pseudos 
    "C.xml,
    N.xml"
############################################################################################
####                                         STRUCTURE                                         
############################################################################################
 natom 63
 ntypat 2
 typat
 1 1 1
 1 1 1
 1 1 1
 1 1 1
 1 1 1
 1 1 1
 1 1 1
 1 1 1
 1 1 1
 1 1 1
 1 1 1
 1 1 1
 1 1 1
 1 1 1
 1 1 1
 1 1 1
 1 1 1
 1 1 1
 1 1 1
 1 1 1
 1 1 2
 znucl 6 7
 xred
  -8.1727793557d-05    0.9991255085    1.0000817278
  -1.8435613594d-05   5.2501921200d-04    0.5001060909
    0.0016378679    0.4991954111    0.9983621321
    0.9978086865    0.5008868602    0.4973330238
    0.4998939091   5.2501921200d-04    1.0000184356
    0.4998171893   7.1321590155d-04    0.5001828107
    0.5026669762    0.5008868602    0.0021913135
    0.4900396379    0.4951612619    0.5099603621
    0.3777663362    0.3785896418    0.3750992211
    0.3761215446    0.3754360679    0.8759010340
    0.3752321248    0.8747678752    0.3755613916
    0.3749928802    0.8750071198    0.8755930508
    0.8749416146    0.3750583854    0.3687393417
    0.8753010584    0.3746989416    0.8735954097
    0.8714103582    0.8722336638    0.3750992211
    0.8745639321    0.8738784554    0.8759010340
    0.3761215446    0.1259010340    0.1254360679
    0.3749928802    0.1255930508    0.6250071198
    0.3777663362    0.6250992211    0.1285896418
    0.3752321248    0.6255613916    0.6247678752
    0.8753010584    0.1235954097    0.1246989416
    0.8745639321    0.1259010340    0.6238784554
    0.8749416146    0.6187393417    0.1250583854
    0.8714103582    0.6250992211    0.6222336638
    0.1264045903    0.3746989416    0.1246989416
    0.1240989660    0.3754360679    0.6238784554
    0.1240989660    0.8738784554    0.1254360679
    0.1244069492    0.8750071198    0.6250071198
    0.6312606583    0.3750583854    0.1250583854
    0.6249007789    0.3785896418    0.6222336638
    0.6249007789    0.8722336638    0.1285896418
    0.6244386084    0.8747678752    0.6247678752
    0.1250994533    0.1249005467    0.3746491815
    0.1251492983    0.1248507017    0.8748507017
    0.1247303250    0.6250693536    0.3750693536
    0.1250994533    0.6246491815    0.8749005467
    0.6249306464    0.1252696750    0.3750693536
    0.6253508185    0.1249005467    0.8749005467
    0.6249306464    0.6250693536    0.8752696750
    0.0016378679    0.2483621321    0.2491954111
    0.9999182722    0.2500817278    0.7491255085
    0.9978086865    0.7473330238    0.2508868602
    0.9999815644    0.7501060909    0.7505250192
    0.5026669762    0.2521913135    0.2508868602
    0.4998939091    0.2500184356    0.7505250192
    0.4900396379    0.7599603621    0.2451612619
    0.4998171893    0.7501828107    0.7507132159
    0.2508744915    0.2500817278   8.1727793557d-05
    0.2494749808    0.2500184356    0.5001060909
    0.2494749808    0.7501060909   1.8435613594d-05
    0.2492867841    0.7501828107    0.5001828107
    0.7508045889    0.2483621321    0.9983621321
    0.7491131398    0.2521913135    0.4973330238
    0.7491131398    0.7473330238    0.0021913135
    0.7548387381    0.7599603621    0.5099603621
    0.2489344108    0.0010655892    0.2503311004
    0.2498738311    1.0001261689    0.7501261689
    0.2506584627    0.5004522646    0.2504522646
    0.2489344108    0.5003311004    0.7510655892
    0.7495477354  -6.5846272032d-04    0.2504522646
    0.7496688996    0.0010655892    0.7510655892
    0.7495477354    0.5004522646    0.7493415373
    0.7608648352    0.4891351648    0.2391351648
 acell    1.0    1.0    1.0
 rprim
   13.4537196110   2.4577523146d-06  -8.7490491474d-07
  -9.7139789812d-07   13.4537196110  -1.2410041145d-05
   2.3612581455d-06   1.0923686090d-05   13.4537196110

#####################
##### DATASET 1 #####
#####################
##############################################
####                SECTION: basic               
##############################################
 kptopt1 3
 toldfe1 1e-12
##############################################
####                SECTION: files               
##############################################
 prtden1 1
 prtwf1 1
##############################################
####                SECTION: paral               
##############################################
 autoparal1 1


#####################
##### DATASET 2 #####
#####################
##############################################
####                SECTION: basic               
##############################################
 kptopt2 3
 iscf2 -2
 tolwfr2 1e-22
##############################################
####                SECTION: files               
##############################################
 getden2 1
 getwfk2 1
 prtwf2 1
##############################################
####                SECTION: paral               
##############################################
 autoparal2 1


#####################
##### DATASET 3 #####
#####################
##############################################
####                SECTION: basic               
##############################################
 kptopt3 3
 tolwfr3 1e-22
##############################################
####                SECTION: dfpt                
##############################################
 rfelfd3 2
##############################################
####                SECTION: files               
##############################################
 getwfk3 2
 prtwf3 1


#####################
##### DATASET 4 #####
#####################
##############################################
####                SECTION: basic               
##############################################
 kptopt4 3
 tolvrs4 1e-12
##############################################
####                SECTION: dfpt                
##############################################
 rfphon4 1
 rfelfd4 3
 rfstrs4 3
 prepanl4 1
##############################################
####                SECTION: files               
##############################################
 getwfk4 2
 getddk4 3
 prtwf4 1
 prtden4 1


#####################
##### DATASET 5 #####
#####################
##############################################
####                SECTION: basic               
##############################################
 kptopt5 3
 tolwfr5 1e-22
##############################################
####                SECTION: dfpt                
##############################################
 rf2_dkde5 1
 prepanl5 1
##############################################
####                SECTION: files               
##############################################
 getwfk5 2
 getddk5 3
 get1den5 4
 getdelfd5 4
 getdkdk5 5
 prtwf5 1


#####################
##### DATASET 6 #####
#####################
##############################################
####                SECTION: basic               
##############################################
 kptopt6 3
 tolvrs6 1e-15
##############################################
####                SECTION: dfpt                
##############################################
 usepead6 0
 d3e_pert1_elfd6 1
 d3e_pert1_phon6 1
 d3e_pert2_elfd6 1
 d3e_pert3_elfd6 1
##############################################
####                SECTION: files               
##############################################
 getwfk6 2
 getddk6 3
 get1den6 4
 get1wf6 4
 getdkde6 6
##############################################
####               SECTION: gstate               
##############################################
 optdriver6 5


#####################
##### DATASET 7 #####
#####################
##############################################
####                SECTION: basic               
##############################################
 kptopt7 1
 tolvrs7 1e-15

The job crashed at dataset 3 without any useful error messages. Is there something I can change to let the job progress?

================================================================================
== DATASET  3 ==================================================================
-   mpi_nproc: 96, omp_nthreads: 1 (-1 if OMP is not activated)
 
 
--- !COMMENT
src_file: m_xgScalapack.F90
src_line: 236
message: |
    xgScalapack in auto mode
...
 
 mkfilename : getwfk/=0, take file _WFK from output of DATASET   2.
 
 
 getdim_nloc : deduce lmnmax  =   8, lnmax  =   4,
                      lmnmaxso=   8, lnmaxso=   4.
 Perdew, Burke & Ernzerhof SOL
 J. P. Perdew, A. Ruzsinszky, G. I. Csonka, O. A. Vydrov, G. E. Scuseria, L. A. Constantin, X. Zhou, and K. Burke, Phys. Rev. Lett. 100, 136406 (2008)
 Perdew, Burke & Ernzerhof SOL
 J. P. Perdew, A. Ruzsinszky, G. I. Csonka, O. A. Vydrov, G. E. Scuseria, L. A. Constantin, X. Zhou, and K. Burke, Phys. Rev. Lett. 100, 136406 (2008)
 
 Unit cell volume ucvol=  2.4351578E+03 bohr^3
 Angles (23,13,12)=  9.00000063E+01  8.99999937E+01  8.99999937E+01 degrees
 
 Coarse grid specifications (used for wave-functions):
 
 getcut: wavevector=  0.0000  0.0000  0.0000  ngfft=  50  50  50
         ecut(hartree)=     16.000   => boxcut(ratio)=   2.06397
 
 Fine grid specifications (used for densities):
 
 getcut: wavevector=  0.0000  0.0000  0.0000  ngfft=  80  80  80
         ecut(hartree)=     40.000   => boxcut(ratio)=   2.08859
 
 respfn : eigen0 array
 
 FFT (fine) grid used in SCF cycle:
 
 getcut: wavevector=  0.0000  0.0000  0.0000  ngfft=  80  80  80
         ecut(hartree)=     40.000   => boxcut(ratio)=   2.08859
 
--- !WARNING
src_file: m_paw_tools.F90
src_line: 245
message: |
    PAW SPHERES ARE OVERLAPPING!
       There are   121 pairs of overlapping atoms.
       The maximum overlap percentage is obtained for the atoms  32 and  55.
        | Distance between atoms  32 and  55 is  :   2.80166
        | PAW radius of the sphere around atom  32 is:   1.50737
        | PAW radius of the sphere around atom  55 is:   1.50737
        | This leads to a (voluminal) overlap ratio of  0.7316 %
    THIS IS DANGEROUS, as PAW formalism assumes non-overlapping PAW spheres.
...
 
       Overlap ratio seems to be acceptable (less than value
       of "pawovlp" input parameter): execution will continue.
       But be aware that results might be approximate,
       and even inaccurate (depending on your physical system) !
 
 mkrho: echo density (plane-wave part only)
 Total charge density [el/Bohr^3]
      Maximum=    5.4654E-01  at reduced coord.    0.7400    0.5200    0.2600
      Minimum=    1.0956E-02  at reduced coord.    0.5200    0.5200    0.2600
   Integrated=    2.4523E+02
 
 ****** Psp strength Dij in Ha (atom      1) *****
   0.40954  -4.17142  -0.00000  -0.00000   0.00000   0.00000   0.00002  -0.00002
  -4.17142  40.48225   0.00002   0.00004  -0.00004  -0.00004  -0.00019   0.00019
  -0.00000   0.00002  -0.11152  -0.00001   0.00001   0.37807   0.00005  -0.00005
  -0.00000   0.00004  -0.00001  -0.11152  -0.00000   0.00005   0.37807   0.00001
   0.00000  -0.00004   0.00001  -0.00000  -0.11152  -0.00005   0.00001   0.37807
   0.00000  -0.00004   0.37807   0.00005  -0.00005   1.77136  -0.00032   0.00032
   0.00002  -0.00019   0.00005   0.37807   0.00001  -0.00032   1.77136  -0.00010
  -0.00002   0.00019  -0.00005   0.00001   0.37807   0.00032  -0.00010   1.77136
 
 
 ==>  initialize data related to q vector <== 
 
 respfn : the norm of the phonon wavelength (as input) was small (<1.d-7).
  q has been set exactly to (0 0 0)
 The list of irreducible perturbations for this q vector is:
    1)    idir= 1    ipert=  64
    2)    idir= 2    ipert=  64
    3)    idir= 3    ipert=  64
 
================================================================================
 Real(R)+Recip(G) space primitive vectors, cartesian coordinates (Bohr,Bohr^-1):
 R(1)= 13.4537196  0.0000025 -0.0000009  G(1)=  0.0743289  0.0000000 -0.0000000
 R(2)= -0.0000010 13.4537196 -0.0000124  G(2)= -0.0000000  0.0743289 -0.0000001
 R(3)=  0.0000024  0.0000109 13.4537196  G(3)=  0.0000000  0.0000001  0.0743289
 Unit cell volume ucvol=  2.4351578E+03 bohr^3
 Unit cell volume ucvol=  2.4351578E+03 bohr^3
 Angles (23,13,12)=  9.00000063E+01  8.99999937E+01  8.99999937E+01 degrees
 Angles (23,13,12)=  9.00000063E+01  8.99999937E+01  8.99999937E+01 degrees
 
 FFT (fine) grid used for densities/potentials:
 
 getcut: wavevector=  0.0000  0.0000  0.0000  ngfft=  80  80  80
         ecut(hartree)=     40.000   => boxcut(ratio)=   2.08859
 
--------------------------------------------------------------------------------
 Perturbation wavevector (in red.coord.)   0.000000  0.000000  0.000000
 Perturbation : derivative vs k along direction   1
 
 dfpt_looppert : COMMENT -
  In a d/dk calculation, iscf is set to -3 automatically.
 littlegroup_pert: only one element in the set of symmetries for this perturbation:
   1   0   0   0   1   0   0   0   1
 symkpt : not enough symmetry to change the number of k points.
 getmpw: optimal value of mpw= 7455
 Memory required for psi0_k, psi0_kq psi1_kq: 54.6 [Mb] <<< MEM
 getmpw: optimal value of mpw= 7455
 qpt is Gamma, psi_k+q initialized from psi_k in memory
 Initialisation of the first-order wave-functions :
  ireadwf=   0
 
 getcut: wavevector=  0.0000  0.0000  0.0000  ngfft=  80  80  80
         ecut(hartree)=     40.000   => boxcut(ratio)=   2.08859
Abort(13) on node 3 (rank 3 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 13) - process 3
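When a run dies with only an MPI_Abort line, it can help to pull the last dataset header, the last perturbation and the abort line out of the log automatically. The sketch below is a hypothetical helper (last_context is not an ABINIT utility) that only relies on the line formats visible in the log above.

```python
import re


def last_context(log_text):
    """Return (last dataset, last perturbation, abort line) seen in an ABINIT log."""
    dataset = perturbation = abort = None
    for line in log_text.splitlines():
        m = re.search(r"== DATASET\s+(\d+)", line)
        if m:
            dataset = int(m.group(1))
        elif "Perturbation :" in line:
            perturbation = line.split("Perturbation :", 1)[1].strip()
        elif line.lstrip().startswith("Abort("):
            abort = line.strip()
    return dataset, perturbation, abort


sample = """== DATASET  3 ==================================================================
 Perturbation : derivative vs k along direction   1
 getcut: wavevector=  0.0000  0.0000  0.0000  ngfft=  80  80  80
Abort(13) on node 3 (rank 3 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 13) - process 3
"""

print(last_context(sample))
```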

Hi again, quick answers to the easier questions: a volume overlap is definitely not a problem up to (roughly) 10%, and your memory needs are indeed very modest, so that is not a problem either. You can set kptopt, occopt, etc. globally rather than per dataset.
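To put the 0.7316 % from the warning in perspective against that rough 10% threshold: for two equal spheres, the shared lens volume divided by the volume of one sphere reproduces the logged value. The sketch assumes that this is how ABINIT defines its "voluminal overlap ratio"; the match with the log supports that, but it is an assumption.

```python
import math


def lens_overlap_ratio(r, d):
    """Lens volume shared by two equal spheres (radius r, centres d apart)
    divided by the volume of one sphere; 0 if the spheres do not touch."""
    if d >= 2 * r:
        return 0.0
    lens = math.pi * (4 * r + d) * (2 * r - d) ** 2 / 12
    sphere = 4 / 3 * math.pi * r ** 3
    return lens / sphere


# Values from the warning: atoms 32 and 55, PAW radius 1.50737 bohr, distance 2.80166 bohr.
ratio_pct = 100 * lens_overlap_ratio(1.50737, 2.80166)
print(f"overlap ratio: {ratio_pct:.4f} % (rough tolerance quoted above: ~10 %)")
```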

Next suggestion: your distribution of cores may be inefficient. 96 / 64 = 1.5, so each core will get 1 k-point, but you don’t have enough cores to distribute bands or anything else: in DS3, 96 - 64 = 32 cores will probably be idle. If your MPI is compiled to complain or time out when cores are idle, this may make it crash (not under abinit’s control), but it should produce an error message. Try with 64 or 128 cores.
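The arithmetic behind this suggestion, as a sketch: with kptopt 3 the 4×4×4 grid shifted by 0.5 is kept whole (64 k-points, consistent with "symkpt : not enough symmetry" in the log), and ranks are then shared out evenly per k-point. The even-split rule in split_ranks is a simplified model for illustration, not ABINIT's actual distribution algorithm.

```python
from itertools import product


def mp_grid(ngkpt, shift):
    """Unreduced Monkhorst-Pack grid in reduced coordinates (what kptopt 3 keeps)."""
    axes = [[(i + s) / n for i in range(n)] for n, s in zip(ngkpt, shift)]
    return list(product(*axes))


def split_ranks(nkpt, nproc):
    """Simplified model: every k-point gets the same number of ranks, leftovers sit idle."""
    if nproc < nkpt:
        return 0, 0  # each rank would hold several k-points, none idle
    per_k = nproc // nkpt
    idle = nproc - per_k * nkpt
    return per_k, idle


nkpt = len(mp_grid((4, 4, 4), (0.5, 0.5, 0.5)))  # ngkpt 4 4 4, shiftk 0.5 0.5 0.5
for nproc in (64, 96, 128):
    per_k, idle = split_ranks(nkpt, nproc)
    print(f"{nproc:4d} ranks: {per_k} per k-point, {idle} idle")
```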

Finally, this may be an issue with using hybrids in DFPT runs (ddk, dde or atpol perturbations), which might not be implemented. This is in the works, but I don’t know what the status is (@torrent @gonze @gmatteo ?).

Can you try the same input with GGA all the way through? In passing, meta-GGA DFPT is also in the works and more advanced (something was working a few months ago), but it is unpublished, so I think it’s not in your 9.6 version.

Hi from Tokyo,
I just restarted the job with 64 cores. I used the PAW JTH 1.1 datasets from PseudoDojo, and the exchange-correlation functional was specified as PBE. From what I understand, abinit defaults to the exchange-correlation functional specified in the pseudopotential file, which implies I should be using PBE. Is this correct? I have pasted the information from the abo file regarding the pseudopotentials below.

--- Pseudopotential description ------------------------------------------------
- pspini: atom type   1  psp file is /data/abinit/pseudos/C.xml
- pspatm: opening atomic psp file    /data/abinit/pseudos/C.xml
- pspatm : Reading pseudopotential header in XML form from /data/abinit/pseudos/C.xml
 Pseudopotential format is: paw10
 basis_size (lnmax)=  4 (lmn_size=  8), orbitals=   0   0   1   1
 Spheres core radius: rc_sph= 1.50736703
 1 radial meshes are used:
  - mesh 1: r(i)=AA*[exp(BB*(i-1))-1], size=2001 , AA= 0.94549E-03 BB= 0.56729E-02
 Shapefunction is SIN type: shapef(r)=[sin(pi*r/rshp)/(pi*r/rshp)]**2
 Radius for shape functions =  1.30052589
 mmax= 2001
 Radial grid used for partial waves is grid 1
 Radial grid used for projectors is grid 1
 Radial grid used for (t)core density is grid 1
 Radial grid used for Vloc is grid 1
 Radial grid used for pseudo valence density is grid 1
 Mesh size for Vloc has been set to 1756 to avoid numerical noise.
 Compensation charge density is not taken into account in XC energy/potential
 pspatm: atomic psp has been read  and splines computed
 
- pspini: atom type   2  psp file is /data/abinit/pseudos/N.xml
- pspatm: opening atomic psp file    /data/abinit/pseudos/N.xml
- pspatm : Reading pseudopotential header in XML form from /data/abinit/pseudos/N.xml
 Pseudopotential format is: paw10
 basis_size (lnmax)=  4 (lmn_size=  8), orbitals=   0   0   1   1
 Spheres core radius: rc_sph= 1.20000000
 1 radial meshes are used:
  - mesh 1: r(i)=AA*[exp(BB*(i-1))-1], size= 787 , AA= 0.19344E-02 BB= 0.13541E-01
 Shapefunction is SIN type: shapef(r)=[sin(pi*r/rshp)/(pi*r/rshp)]**2
 Radius for shape functions =  1.00599851
 mmax=  787
 Radial grid used for partial waves is grid 1
 Radial grid used for projectors is grid 1
 Radial grid used for (t)core density is grid 1
 Radial grid used for Vloc is grid 1
 Radial grid used for pseudo valence density is grid 1
 Mesh size for Vloc has been set to  683 to avoid numerical noise.
 Compensation charge density is not taken into account in XC energy/potential
 pspatm: atomic psp has been read  and splines computed
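Since the DATASET 3 log cites the PBEsol paper ("Perdew, Burke & Ernzerhof SOL"), it may be worth confirming what the XML headers actually declare. PAW XML datasets carry an xc_functional element with type and name attributes; the sketch below reads it with the standard library. xc_from_paw_xml is a hypothetical helper, and the inline sample stands in for a real file such as /data/abinit/pseudos/C.xml, whose exact attribute values I have not checked.

```python
import io
import xml.etree.ElementTree as ET


def xc_from_paw_xml(source):
    """Return (type, name) of the <xc_functional> element in a PAW XML dataset, or None."""
    root = ET.parse(source).getroot()
    for elem in root.iter():
        # Compare the local tag name so an XML namespace, if present, does not matter.
        if elem.tag.rsplit("}", 1)[-1] == "xc_functional":
            return elem.get("type"), elem.get("name")
    return None


# Illustrative stand-in for a real dataset header; pass a file path in practice.
SAMPLE = """<paw_setup version="0.6">
  <atom symbol="C" Z="6" core="2" valence="4"/>
  <xc_functional type="GGA" name="PBE"/>
</paw_setup>"""

print(xc_from_paw_xml(io.StringIO(SAMPLE)))
```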

Hi Matthieu,

I have now tried running the same calculation (kptopt 3, occopt 1) with 64 cores, and the crash in dataset 3 occurs in the same place. Any ideas as to what I might try next (or how to debug the cause)?

================================================================================
== DATASET  3 ==================================================================
-   mpi_nproc: 64, omp_nthreads: 1 (-1 if OMP is not activated)
 
 
--- !COMMENT
src_file: m_xgScalapack.F90
src_line: 236
message: |
    xgScalapack in auto mode
...
 
 mkfilename : getwfk/=0, take file _WFK from output of DATASET   2.
 
 
 getdim_nloc : deduce lmnmax  =   8, lnmax  =   4,
                      lmnmaxso=   8, lnmaxso=   4.
 Perdew, Burke & Ernzerhof SOL
 J. P. Perdew, A. Ruzsinszky, G. I. Csonka, O. A. Vydrov, G. E. Scuseria, L. A. Constantin, X. Zhou, and K. Burke, Phys. Rev. Lett. 100, 136406 (2008)
 Perdew, Burke & Ernzerhof SOL
 J. P. Perdew, A. Ruzsinszky, G. I. Csonka, O. A. Vydrov, G. E. Scuseria, L. A. Constantin, X. Zhou, and K. Burke, Phys. Rev. Lett. 100, 136406 (2008)
 
 Unit cell volume ucvol=  2.4351578E+03 bohr^3
 Angles (23,13,12)=  9.00000063E+01  8.99999937E+01  8.99999937E+01 degrees
 
 Coarse grid specifications (used for wave-functions):
 
 getcut: wavevector=  0.0000  0.0000  0.0000  ngfft=  50  50  50
         ecut(hartree)=     16.000   => boxcut(ratio)=   2.06397
 
 Fine grid specifications (used for densities):
 
 getcut: wavevector=  0.0000  0.0000  0.0000  ngfft=  80  80  80
         ecut(hartree)=     40.000   => boxcut(ratio)=   2.08859
 
 respfn : eigen0 array
 
 FFT (fine) grid used in SCF cycle:
 
 getcut: wavevector=  0.0000  0.0000  0.0000  ngfft=  80  80  80
         ecut(hartree)=     40.000   => boxcut(ratio)=   2.08859
 
--- !WARNING
src_file: m_paw_tools.F90
src_line: 245
message: |
    PAW SPHERES ARE OVERLAPPING!
       There are   121 pairs of overlapping atoms.
       The maximum overlap percentage is obtained for the atoms  32 and  55.
        | Distance between atoms  32 and  55 is  :   2.80166
        | PAW radius of the sphere around atom  32 is:   1.50737
        | PAW radius of the sphere around atom  55 is:   1.50737
        | This leads to a (voluminal) overlap ratio of  0.7316 %
    THIS IS DANGEROUS, as PAW formalism assumes non-overlapping PAW spheres.
...
 
       Overlap ratio seems to be acceptable (less than value
       of "pawovlp" input parameter): execution will continue.
       But be aware that results might be approximate,
       and even inaccurate (depending on your physical system) !
 
 mkrho: echo density (plane-wave part only)
 Total charge density [el/Bohr^3]
      Maximum=    5.4654E-01  at reduced coord.    0.7400    0.5200    0.2600
      Minimum=    1.0952E-02  at reduced coord.    0.5200    0.5200    0.2600
   Integrated=    2.4523E+02
 
 ****** Psp strength Dij in Ha (atom      1) *****
   0.40954  -4.17142  -0.00000  -0.00000   0.00000   0.00000   0.00002  -0.00002
  -4.17142  40.48232   0.00002   0.00004  -0.00004  -0.00004  -0.00019   0.00019
  -0.00000   0.00002  -0.11152  -0.00001   0.00001   0.37807   0.00005  -0.00005
  -0.00000   0.00004  -0.00001  -0.11152  -0.00000   0.00005   0.37808   0.00001
   0.00000  -0.00004   0.00001  -0.00000  -0.11152  -0.00005   0.00001   0.37808
   0.00000  -0.00004   0.37807   0.00005  -0.00005   1.77133  -0.00032   0.00032
   0.00002  -0.00019   0.00005   0.37808   0.00001  -0.00032   1.77133  -0.00010
  -0.00002   0.00019  -0.00005   0.00001   0.37808   0.00032  -0.00010   1.77133
 
 
 ==>  initialize data related to q vector <== 
 
 respfn : the norm of the phonon wavelength (as input) was small (<1.d-7).
  q has been set exactly to (0 0 0)
 The list of irreducible perturbations for this q vector is:
    1)    idir= 1    ipert=  64
    2)    idir= 2    ipert=  64
    3)    idir= 3    ipert=  64
 
================================================================================
 Real(R)+Recip(G) space primitive vectors, cartesian coordinates (Bohr,Bohr^-1):
 R(1)= 13.4537196  0.0000025 -0.0000009  G(1)=  0.0743289  0.0000000 -0.0000000
 R(2)= -0.0000010 13.4537196 -0.0000124  G(2)= -0.0000000  0.0743289 -0.0000001
 R(3)=  0.0000024  0.0000109 13.4537196  G(3)=  0.0000000  0.0000001  0.0743289
 Unit cell volume ucvol=  2.4351578E+03 bohr^3
 Unit cell volume ucvol=  2.4351578E+03 bohr^3
 Angles (23,13,12)=  9.00000063E+01  8.99999937E+01  8.99999937E+01 degrees
 Angles (23,13,12)=  9.00000063E+01  8.99999937E+01  8.99999937E+01 degrees
 
 FFT (fine) grid used for densities/potentials:
 
 getcut: wavevector=  0.0000  0.0000  0.0000  ngfft=  80  80  80
         ecut(hartree)=     40.000   => boxcut(ratio)=   2.08859
 
--------------------------------------------------------------------------------
 Perturbation wavevector (in red.coord.)   0.000000  0.000000  0.000000
 Perturbation : derivative vs k along direction   1
 
 dfpt_looppert : COMMENT -
  In a d/dk calculation, iscf is set to -3 automatically.
 littlegroup_pert: only one element in the set of symmetries for this perturbation:
   1   0   0   0   1   0   0   0   1
 symkpt : not enough symmetry to change the number of k points.
 getmpw: optimal value of mpw= 7455
 Memory required for psi0_k, psi0_kq psi1_kq: 54.6 [Mb] <<< MEM
 getmpw: optimal value of mpw= 7455
 qpt is Gamma, psi_k+q initialized from psi_k in memory
 Initialisation of the first-order wave-functions :
  ireadwf=   0
 
 getcut: wavevector=  0.0000  0.0000  0.0000  ngfft=  80  80  80
         ecut(hartree)=     40.000   => boxcut(ratio)=   2.08859
Abort(13) on node 41 (rank 41 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 13) - process 41
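The `getcut` lines printed just before the abort appear to be routine grid reports rather than error messages. As a sanity check, the Python sketch below recomputes two quantities from the log. The first is the boxcut ratio. It assumes a cubic cell of side |R(1)| and takes the largest |G| inside the FFT box as (2π/a)(ngfft/2). The second is the PAW overlap ratio. It assumes Abinit's "voluminal overlap ratio" is the lens-shaped intersection volume divided by the volume of the smaller sphere. Under these assumptions, both reproduce the printed values to within rounding.

```python
import math


def boxcut_ratio(a_bohr, ngfft, ecut_ha):
    """Largest |G| inside the FFT box divided by sqrt(2*ecut).

    Assumes a cubic cell of side a_bohr with ngfft points per direction.
    """
    gmax = (2.0 * math.pi / a_bohr) * (ngfft / 2)  # bohr^-1
    return gmax / math.sqrt(2.0 * ecut_ha)


def lens_volume(r1, r2, d):
    """Volume of the intersection of two spheres (radii r1, r2, distance d)."""
    if d >= r1 + r2:
        return 0.0
    if d <= abs(r1 - r2):
        return 4.0 / 3.0 * math.pi * min(r1, r2) ** 3
    return (math.pi * (r1 + r2 - d) ** 2
            * (d * d + 2 * d * (r1 + r2) - 3 * (r1 - r2) ** 2) / (12 * d))


def overlap_ratio(r1, r2, d):
    """Lens volume over the smaller sphere's volume (assumed definition)."""
    v_small = 4.0 / 3.0 * math.pi * min(r1, r2) ** 3
    return lens_volume(r1, r2, d) / v_small


if __name__ == "__main__":
    a = 13.4537196  # bohr, |R(1)| from the log
    print(f"coarse boxcut: {boxcut_ratio(a, 50, 16.0):.5f}")  # log reports 2.06397
    print(f"fine boxcut:   {boxcut_ratio(a, 80, 40.0):.5f}")  # log reports 2.08859
    pct = 100 * overlap_ratio(1.50737, 1.50737, 2.80166)
    print(f"overlap: {pct:.4f} %")                            # log reports 0.7316 %
```

Both boxcut values are above 2, and the overlap is far below a few percent, so neither number by itself points to the cause of the abort.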

Hi Matthieu,

Thank you for your help so far. I retried the last calculation on 64 cores and it crashed again, but unlike before, an error message was printed this time: the NetCDF library returned “Unknown file format” when opening one of the output files. I am at a loss as to what to try next. My plan is to delete all of the files in the outdata folder and try again. Do you have any other suggestions?

--- !ERROR
src_file: m_nctk.F90
src_line: 704
mpi_rank: 0
message: |
    opening file: outdata/outnc_DS3_1WF193 - NetCDF library returned: `NetCDF: Unknown file format`
...
 
Abort(13) on node 11 (rank 11 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 13) - process 11
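One hedged way to check whether the file named in the error is a readable netCDF file at all is to inspect its first bytes. The sketch below compares them with the signatures of the classic, 64-bit-offset and CDF-5 netCDF formats (`CDF` followed by version byte 1, 2 or 5) and of netCDF-4/HDF5. It assumes the file has no HDF5 user block, and the script name in the usage comment is hypothetical. A file left empty or truncated by an earlier aborted run would plausibly show up as empty or unrecognised.

```python
import sys

# Leading bytes of the netCDF on-disk formats (assumes no HDF5 user block).
_SIGNATURES = {
    b"CDF\x01": "netCDF classic",
    b"CDF\x02": "netCDF 64-bit offset",
    b"CDF\x05": "netCDF CDF-5",
    b"\x89HDF\r\n\x1a\n": "netCDF-4 / HDF5",
}


def sniff_netcdf(path):
    """Describe the file format of `path` from its first 8 bytes."""
    with open(path, "rb") as fh:
        head = fh.read(8)
    if not head:
        return "empty file (possibly an interrupted write)"
    for magic, name in _SIGNATURES.items():
        if head.startswith(magic):
            return name
    return f"unrecognised header {head!r}"


if __name__ == "__main__":
    # Hypothetical usage: python sniff_netcdf.py outdata/outnc_DS3_1WF193
    for p in sys.argv[1:]:
        print(p, "->", sniff_netcdf(p))
```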