A problem with a DFPT calculation for the optical utility

Dear all

I encountered a strange error when calculating the DDK perturbation in DFPT for the optical utility. I get everything ok for a 41x41x19 K-mesh, but I get the error message shown in the attachment when I use a 45x45x19 K-mesh. It looks like the calculation itself finishes, but something goes wrong with the IO_parallelism. Could you give me some suggestions?
[Attached image: error message]

Best regards,

Dear @Yunfan, thanks for your post. This netcdf error is quite generic, I think, and can occur for many reasons. Here are a few suggestions:

  • Is abinit correctly compiled with netcdf? I guess that if the ground-state calculation worked, it is. You can run abinit’s test suite to check that DFPT calculations work in general.
  • Is the netcdf library you use compiled in parallel?
  • DDK calculations may use a lot of disk space and RAM. Do you have enough disk space to store the DDK wave functions? You can try reducing the calculation to a very coarse k-grid or a small number of bands to check. Although, if I am not mistaken, netcdf returns a different error message when this happens.
  • Is your netcdf library path present in the environment variables (LD_LIBRARY_PATH and so on)?
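
The checks in the list above could be gathered in a small shell helper. This is only a sketch: it assumes that nc-config (and optionally the abinit executable) can be found in PATH, which may not be the case when the fallbacks are installed in a private directory. The function name check_netcdf_env is made up for this example.

```shell
# Sketch of the environment checks suggested above.
# Assumption: nc-config and abinit are in PATH; otherwise this is reported.
check_netcdf_env() {
  if command -v nc-config >/dev/null 2>&1; then
    echo "nc-config found at: $(command -v nc-config)"
    # Prints yes/no depending on how the C library was built.
    echo "parallel I/O: $(nc-config --has-parallel 2>/dev/null)"
  else
    echo "nc-config: not found in PATH"
  fi

  # Is the library directory visible to the dynamic loader?
  echo "LD_LIBRARY_PATH=${LD_LIBRARY_PATH:-<unset>}"

  # Which netcdf library does the executable actually pick up?
  if command -v abinit >/dev/null 2>&1; then
    ldd "$(command -v abinit)" | grep -i netcdf \
      || echo "no dynamically linked netcdf library found"
  fi

  # Free space in the directory where the output files are written.
  df -h .
}

check_netcdf_env
```

Run it from the directory where the calculation writes its files, so that df reports the relevant file system.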

If everything’s ok perhaps @beuken can help you out!

Cheers

I get everything ok for a 41x41x19 K-mesh, but I get the error message shown in the attachment when I use a 45x45x19 K-mesh

With a 41x41x19 K-mesh, the array of velocity matrix elements to be written to disk is ~1.7 GB, whereas it is ~2.1 GB with a 45x45x19 K-mesh.
The most likely explanation, if we exclude a possible disk quota error, is that your netcdf library is using the 32-bit offset format.
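
The size argument above can be checked with a rough back-of-the-envelope calculation. The sketch below assumes that the array size scales linearly with the number of k-points in the full mesh; this is an assumption made for illustration, not the formula ABINIT uses. The helper name estimate_size is hypothetical.

```shell
# Rough estimate only: assumes the array size is proportional to the
# number of k-points in the full mesh (not ABINIT's actual formula).
estimate_size() {
  # $1 = reference size, $2 = reference k-point count, $3 = new k-point count
  awk -v s="$1" -v k0="$2" -v k1="$3" 'BEGIN { printf "%.2f\n", s * k1 / k0 }'
}

nk_small=$((41 * 41 * 19))   # k-points in the mesh that works
nk_large=$((45 * 45 * 19))   # k-points in the mesh that fails

echo "k-points: $nk_small -> $nk_large"
echo "scaled size: $(estimate_size 1.7 "$nk_small" "$nk_large")"

# A signed 32-bit byte offset cannot address more than 2^31 - 1 bytes.
echo "2^31 bytes = $((1 << 31))"
```

Scaling the smaller array this way gives roughly 2 in the same units, consistent with the figure quoted above. The two meshes therefore sit on either side of the 2 GiB mark, which is where a signed 32-bit offset overflows.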

Can you post the output of:

ulimit -a 

and

ncdump -k  GaZn_xo_DS4_EVK.nc

Thank you very much for your suggestions.
The output of 'ulimit -a' is

core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 380195
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 16384
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) unlimited
cpu time               (seconds, -t) unlimited
max user processes              (-u) 300
virtual memory          (kbytes, -v) 8388608
file locks                      (-x) unlimited

The output of ‘ncdump -k GaZn_xo_DS4_EVK.nc’ is

netCDF-4

Can you try the following patch:

diff --git a/shared/common/src/27_toolbox_oop/m_nctk.F90 b/shared/common/src/27_toolbox_oop/m_nctk.F90
index a5ebe8f17..487fb5b04 100644
--- a/shared/common/src/27_toolbox_oop/m_nctk.F90
+++ b/shared/common/src/27_toolbox_oop/m_nctk.F90
@@ -79,7 +79,8 @@ MODULE m_nctk

 #ifdef HAVE_NETCDF
  ! netcdf4-hdf5 is the default
- integer,save,private :: def_cmode_for_seq_create = ior(ior(nf90_clobber, nf90_netcdf4), nf90_write)
+ !integer,save,private :: def_cmode_for_seq_create = ior(ior(nf90_clobber, nf90_netcdf4), nf90_write)
+ integer,save,private :: def_cmode_for_seq_create = ior(ior(nf90_clobber, nf90_cdf5), nf90_write)
  ! netcdf4 classic
  !integer,save,private :: def_cmode_for_seq_create = ior(nf90_clobber, nf90_write)
 #endif

so that Abinit creates netcdf files in CDF-5 format, which supports variables larger than 2 GB.
Note that you need to rerun make to recompile the module and build new executables (make clean is not needed in this case).

PS: Can you also post the output of

uname -m
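
After rebuilding with the patch, ncdump -k on a freshly written file shows whether the new format was picked up. When ncdump is not at hand, the first four bytes of the file give the same information. The sketch below relies on the documented magic numbers ("CDF" followed by a version byte for the classic family, the HDF5 signature for netCDF-4) and assumes that the signature sits at offset 0, which is the usual case. The function name nc_kind is made up for this example.

```shell
# Identify the netCDF flavour of a file from its first four bytes.
# Assumption: the signature is at offset 0 (no HDF5 user block).
nc_kind() {
  magic=$(head -c 4 "$1" | od -An -tx1 | tr -d ' \n')
  case "$magic" in
    43444601) echo "classic" ;;          # "CDF" + 0x01
    43444602) echo "64-bit offset" ;;    # "CDF" + 0x02
    43444605) echo "cdf5" ;;             # "CDF" + 0x05
    89484446) echo "netCDF-4 (HDF5)" ;;  # 0x89 "HDF"
    *)        echo "unknown" ;;
  esac
}

# Usage (file name taken from this thread):
#   nc_kind GaZn_xo_DS4_EVK.nc
```

If the patched executable is really in use, a new output file should be reported as cdf5 rather than netCDF-4.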

Thank you for the patch.
The output of “uname -m” is x86_64. The test job with the patch is still running. I’ll let you know the result as soon as the calculation is finished.

I just tested the code with the patch, but I get the same error message.

It looks like the code still writes the netcdf file in netCDF-4 format even though I have changed the m_nctk.F90 file.

I replaced the line “integer,save,private :: def_cmode_for_seq_create = ior(ior(nf90_clobber, nf90_netcdf4), nf90_write)”
by “integer,save,private :: def_cmode_for_seq_create = ior(ior(nf90_clobber, nf90_cdf5), nf90_write)”. Am I using the patch correctly?

Am I using the patch correctly?

Sorry, the patch was incomplete.
One should also add:

diff --git a/shared/common/src/27_toolbox_oop/m_nctk.F90 b/shared/common/src/27_toolbox_oop/m_nctk.F90
index a5ebe8f17..ca537f1d9 100644
--- a/shared/common/src/27_toolbox_oop/m_nctk.F90
+++ b/shared/common/src/27_toolbox_oop/m_nctk.F90
@@ -770,7 +770,9 @@ integer function nctk_open_create(ncid, path, comm) result(ncerr)
 #ifdef HAVE_NETCDF_MPI
    call wrtout(std_out, sjoin("- Creating HDf5 file with MPI-IO support:", path))
    ! Believe it or not, I have to use xmpi_comm_self even in sequential to avoid weird SIGSEV in the MPI layer!
-   ncerr = nf90_create(path, cmode=ior(ior(nf90_netcdf4, nf90_mpiio), nf90_write), ncid=ncid, &
+   !cmode = ior(ior(nf90_netcdf4, nf90_mpiio), nf90_write)
+   cmode = ior(ior(nf90_cdf5, nf90_mpiio), nf90_write)
+   ncerr = nf90_create(path, cmode=cmode, ncid=ncid, &
      comm=comm, info=xmpio_info)
 #endif
  else

I have added the new lines to ‘m_nctk.F90’. I now get a new error message: “No msg from caller - NetCDF library returned: NetCDF: Attempt to use feature that was not turned on when netCDF was built.” It looks like my netcdf doesn’t support the new format. Should I also change the build settings? My current configuration file is

# installation location
prefix=/home1/07709/liangy12/software/abinit/

# Reduce AVX optimizations in sensitive subprograms (default is no)
#
enable_avx_safe_mode="no"

# Forced Fortran linker libraries
# Note: will override build-system configuration - USE AT YOUR OWN RISKS!
#
FC_LIBS="-lstdc++ -ldl"

# Determine whether to build parallel code (default is auto)
#
# Permitted values:
#
#   * no       : disable MPI support
#   * yes      : enable MPI support, assuming the compiler is MPI-aware
#   * <prefix> : look for MPI in the <prefix> directory
#
# If left unset, the build system will take all appropriate decisions by
# itself, and MPI will be enabled only if the build environment supports
# it. If set to "yes", the configure script will stop if it does not find
# a working MPI environment.
#
# Note:
#
#   * the build system expects to find subdirectories named bin/, lib/,
#     include/ under the prefix.
#
with_mpi="yes"

#ith-mpi="/home1/07709/liangy12/bin/lib/openmpi-4.1.1/openmpi/bin"

#with-mpi-flavor="openmpi"
# Activate parallel I/O (default is auto)
#
# Permitted values:
#
#   * auto : let the configure script auto-detect MPI I/O support
#   * no   : disable MPI I/O support
#   * yes  : enable MPI I/O support
#
# If left unset, the build system will take all appropriate decisions by
# itself, and MPI I/O will be enabled only if the build environment supports
# it. If set to "yes", the configure script will stop if it does not find
# a working MPI I/O environment.
#
enable_mpi_io="yes"

# Flavor of linear algebra libraries to use (default is netlib)
#
with_linalg_flavor="mkl"

# C preprocessing flags for linear algebra (default is unset)
#
LINALG_CPPFLAGS="-I$/opt/intel/compilers_and_libraries_2018.2.199/linux/mkl/include"

# Fortran flags for linear algebra (default is unset)
#
LINALG_FCFLAGS="-I$/opt/intel/compilers_and_libraries_2018.2.199/linux/mkl/include"

# Library flags for linear algebra (default is unset)
#
LINALG_LIBS="-L$/opt/intel/compilers_and_libraries_2018.2.199/linux/mkl/lib/intel64 -Wl,--start-group  -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -Wl,--end-group"

# Flavor of FFT framework to support (default is auto)
#
with_fft_flavor="dfti"

# Fortran flags for the FFT framework (default is unset)
#
FFT_FCFLAGS="-I$/opt/intel/compilers_and_libraries_2018.2.199/linux/mkl/include"

enable_gw_dpc="yes"

# mandatory libraries
with_hdf5="yes"

with_netcdf="yes"

with_netcdf_fortran="yes"

with_libxc="yes"

with_libpsml="yes"

with_libxc=/home1/07709/liangy12/software/abinit-9.4.2/build/fallbacks/install_fb/intel/18.0/libxc/4.3.4

with_hdf5=/home1/07709/liangy12/software/abinit-9.4.2/build/fallbacks/install_fb/intel/18.0/hdf5/1.10.6

with_netcdf=/home1/07709/liangy12/software/abinit-9.4.2/build/fallbacks/install_fb/intel/18.0/netcdf4/4.6.3

with_netcdf_fortran=/home1/07709/liangy12/software/abinit-9.4.2/build/fallbacks/install_fb/intel/18.0/netcdf4_fortran/4.5.2

with_xmlf90=/home1/07709/liangy12/software/abinit-9.4.2/build/fallbacks/install_fb/intel/18.0/xmlf90/1.5.3.1

with_libpsml=/home1/07709/liangy12/software/abinit-9.4.2/build/fallbacks/install_fb/intel/18.0/libpsml/1.1.7

Hi,

could you run these commands and send the output?

cd /home1/07709/liangy12/software/abinit-9.4.2/build/fallbacks/install_fb/intel/18.0/netcdf4/4.6.3/bin
./nc-config --all
cd /home1/07709/liangy12/software/abinit-9.4.2/build/fallbacks/install_fb/intel/18.0/netcdf4_fortran/4.5.2/bin
./nf-config --all

Best,

Sorry for the late reply.

The nc-config output for netcdf4 is


This netCDF 4.6.3 has been built with the following features:

  --cc            -> mpicc
  --cflags        -> -I/home1/07709/liangy12/software/abinit-9.4.2/build/fallbacks/install_fb/intel/18.0/netcdf4/4.6.3/include  -I/home1/07709/liangy12/software/abinit-9.4.2/build/fallbacks/install_fb/intel/18.0/hdf5/1.10.6/include
  --libs          -> -L/home1/07709/liangy12/software/abinit-9.4.2/build/fallbacks/install_fb/intel/18.0/netcdf4/4.6.3/lib   -lnetcdf -lm -lz -L/home1/07709/liangy12/software/abinit-9.4.2/build/fallbacks/install_fb/intel/18.0/hdf5/1.10.6/lib -lhdf5_hl -lhdf5  -L/home1/07709/liangy12/software/abinit-9.4.2/build/fallbacks/install_fb/intel/18.0/hdf5/1.10.6/lib -lhdf5_hl -lhdf5

  --has-c++       -> no
  --cxx           ->

  --has-c++4      -> no
  --cxx4          ->

  --has-fortran   -> no
  --has-dap       -> no
  --has-dap2      -> no
  --has-dap4      -> no
  --has-nc2       -> no
  --has-nc4       -> yes
  --has-hdf5      -> yes
  --has-hdf4      -> no
  --has-logging   -> no
  --has-pnetcdf   -> no
  --has-szlib     -> no
  --has-cdf5      -> yes
  --has-parallel4 -> yes
  --has-parallel  -> yes

  --prefix        -> /home1/07709/liangy12/software/abinit-9.4.2/build/fallbacks/install_fb/intel/18.0/netcdf4/4.6.3
  --includedir    -> /home1/07709/liangy12/software/abinit-9.4.2/build/fallbacks/install_fb/intel/18.0/netcdf4/4.6.3/include
  --libdir        -> /home1/07709/liangy12/software/abinit-9.4.2/build/fallbacks/install_fb/intel/18.0/netcdf4/4.6.3/lib
  --version       -> netCDF 4.6.3

The nf-config output for netcdf_fortran is


This netCDF-Fortran 4.5.2 has been built with the following features:

  --cc        -> mpicc
  --cflags    ->  -I/home1/07709/liangy12/software/abinit-9.4.2/build/fallbacks/install_fb/intel/18.0/netcdf4_fortran/4.5.2/include  -I/home1/07709/liangy12/software/abinit-9.4.2/build/fallbacks/install_fb/intel/18.0/hdf5/1.10.6/include -I/home1/07709/liangy12/software/abinit-9.4.2/build/fallbacks/install_fb/intel/18.0/netcdf4/4.6.3/include

  --fc        -> mpif90
  --fflags    -> -I/home1/07709/liangy12/software/abinit-9.4.2/build/fallbacks/install_fb/intel/18.0/netcdf4_fortran/4.5.2/include
  --flibs     -> -L/home1/07709/liangy12/software/abinit-9.4.2/build/fallbacks/install_fb/intel/18.0/netcdf4_fortran/4.5.2/lib -lnetcdff   -lnetcdf -lm -L/home1/07709/liangy12/software/abinit-9.4.2/build/fallbacks/install_fb/intel/18.0/netcdf4/4.6.3/lib -lnetcdf -L/home1/07709/liangy12/software/abinit-9.4.2/build/fallbacks/install_fb/intel/18.0/hdf5/1.10.6/lib -lhdf5_hl -lhdf5 -ldl -lm -lz -L/home1/07709/liangy12/software/abinit-9.4.2/build/fallbacks/install_fb/intel/18.0/hdf5/1.10.6/lib -lhdf5_hl -lhdf5 -L/home1/07709/liangy12/software/abinit-9.4.2/build/fallbacks/install_fb/intel/18.0/netcdf4/4.6.3/lib -lnetcdf
  --has-f90   ->
  --has-f03   -> yes

  --has-nc2   -> no
  --has-nc4   -> yes

  --prefix    -> /home1/07709/liangy12/software/abinit-9.4.2/build/fallbacks/install_fb/intel/18.0/netcdf4_fortran/4.5.2
  --includedir-> /home1/07709/liangy12/software/abinit-9.4.2/build/fallbacks/install_fb/intel/18.0/netcdf4_fortran/4.5.2/include
  --version   -> netCDF-Fortran 4.5.2

Dear all

The problem is not solved yet. It looks like this netcdf build should support the format used in the patch, so I am wondering which feature is not turned on in netCDF. Could you give me some suggestions?

Best regards,
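
To see at a glance which features are switched off in a given build, the nc-config output can be filtered for the lines that end in "no". The sketch below reads text in the format shown above from standard input; the function name disabled_features is made up for this example.

```shell
# List the features that "nc-config --all" reports as disabled.
# Reads lines of the form "  --has-NAME   -> no" from standard input.
disabled_features() {
  awk '$1 ~ /^--has-/ && $2 == "->" && $3 == "no" {
         sub(/^--has-/, "", $1); print $1
       }'
}

# Usage, assuming nc-config is reachable:
#   nc-config --all | disabled_features
```

For the build posted above, the list would include pnetcdf. This is one candidate worth checking, although it is not confirmed in this thread: as far as I know, netCDF-C performs parallel I/O on classic-family files (CDF-5 included) through PnetCDF, and a library built without it can return the "feature that was not turned on" error when a CDF-5 file is created with MPI-IO.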