Issue with parallelized Abinit and Wannier90: calculation hangs

Hello all,

I have successfully compiled Abinit 10.0.9 from scratch following the instructions on the ABINIT_build page. I also included Wannier90 when compiling, and that appears to have worked. The output of ./runtests.py shows that all tests either succeeded or passed.

My customized config.ac9 file: my_config_file_new.ac9 (353 Bytes)
My config.log: config.log (177.2 KB)

Both abinit --version and mpirun -n 1 abinit --version return 10.0.9. When I test the parallelized build with input files from tests/tutoparal, the jobs run to completion without errors.

However, when working through the examples in tests/tutoplugs (following the tutorials referenced on the Abinit wannier90 page), I see different behavior with and without parallelization. Specifically, running Abinit under mpirun causes the calculation to hang once the Wannier90 subroutine is called.

Running abinit tw90_1.abi produces the output files tw90_1.abo (31.6 KB) and wannier90.wout.txt (172.0 KB).

However, if I run the same file with mpirun -n 4 abinit tw90_1.abi > tw90_1.log, the calculation hangs at this point:

 ** mlwfovlp :   call wannier90 library subroutine wannier_run
    Calculation is running
 -  see wannier90.wout for details.

tw90_1_withmpi.log (49.2 KB)
wannier90_withmpi.wout.txt (54.5 KB)

The non-parallelized run finishes in a few seconds, but the parallelized one still hangs even after waiting up to 10 minutes. I suspect this is an issue with how I compiled Abinit.

Additional information: I am working on a remote computer cluster that has Anaconda installed. I had some initial issues compiling Abinit, so I deactivated my Anaconda environment for the time being. I made sure to run source set_abienv from my Abinit install directory before attempting any of the tutorials.

My environment is set up so that

echo $PATH; echo $LD_LIBRARY_PATH
/home/ccardot3/local/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/usr/local/cuda/bin:/opt/pbs/bin
/home/ccardot3/local/lib:

and

ls $HOME/local/bin/
h5clear   h5diff            h5jam    h5perf         h5repart  hydra_nameserver  mpicc         mpiexec.hydra  mpirun     ncdump     parkill
h5copy    h5dump            h5ls     h5perf_serial  h5stat    hydra_persist     mpichversion  mpif77         mpivars    ncgen      ph5diff
h5debug   h5format_convert  h5mkgrp  h5redeploy     h5unjam   hydra_pmi_proxy   mpicxx        mpif90         nc-config  ncgen3     postw90.x
h5delete  h5import          h5pcc    h5repack       h5watch   mpic++            mpiexec       mpifort        nccopy     nf-config  wannier90.x
ls $HOME/local/include/
H5ACpublic.h   H5FDfamily.h    H5FDsubfiling.h  H5overflow.h   H5VLconnector.h           mpi.mod                         netcdf_mem.h
H5api_adpt.h   H5FDhdfs.h      H5FDwindows.h    H5PLextern.h   H5VLconnector_passthru.h  mpiof.h                         netcdf_meta.h
H5Apublic.h    H5FDioc.h       H5Fpublic.h      H5PLpublic.h   H5VLnative.h              mpio.h                          netcdf.mod
H5Cpublic.h    H5FDlog.h       H5Gpublic.h      H5Ppublic.h    H5VLpassthru.h            mpi_sizeofs.mod                 netcdf_nc_data.mod
H5DOpublic.h   H5FDmirror.h    H5Idevelop.h     H5PTpublic.h   H5VLpublic.h              netcdf4_f03.mod                 netcdf_nc_interfaces.mod
H5Dpublic.h    H5FDmpi.h       H5IMpublic.h     H5pubconf.h    H5Zdevelop.h              netcdf4_nc_interfaces.mod       netcdf_nf_data.mod
H5DSpublic.h   H5FDmpio.h      H5Ipublic.h      H5public.h     H5Zpublic.h               netcdf4_nf_interfaces.mod       netcdf_nf_interfaces.mod
H5Epubgen.h    H5FDmulti.h     H5Ldevelop.h     H5Rpublic.h    hdf5.h                    netcdf_aux.h                    netcdf_par.h
H5Epublic.h    H5FDonion.h     H5LDpublic.h     H5Spublic.h    hdf5_hl.h                 netcdf_dispatch.h               typesizes.mod
H5ESdevelop.h  H5FDpublic.h    H5Lpublic.h      H5TBpublic.h   mpi_base.mod              netcdf_f03.mod
H5ESpublic.h   H5FDros3.h      H5LTpublic.h     H5Tdevelop.h   mpi_constants.mod         netcdf_filter.h
H5FDcore.h     H5FDsec2.h      H5MMpublic.h     H5Tpublic.h    mpicxx.h                  netcdf_fortv2_c_interfaces.mod
H5FDdevelop.h  H5FDsplitter.h  H5Mpublic.h      H5TSdevelop.h  mpif.h                    netcdf.h
H5FDdirect.h   H5FDstdio.h     H5Opublic.h      H5version.h    mpi.h                     netcdf.inc
ls $HOME/local/lib/
libfmpich.so   libhdf5_hl.so.310      libmpi.a        libmpicxx.so.12       libmpi.la         libnetcdff.la        libnetcdf.so         pkgconfig
libh5bzip2.la  libhdf5_hl.so.310.0.0  libmpichcxx.so  libmpicxx.so.12.1.8   libmpi.so         libnetcdff.settings  libnetcdf.so.15
libh5bzip2.so  libhdf5.la             libmpichf90.so  libmpifort.a          libmpi.so.12      libnetcdff.so        libnetcdf.so.15.2.1
libhdf5.a      libhdf5.settings       libmpich.so     libmpifort.la         libmpi.so.12.1.8  libnetcdff.so.7      libnetcdf.so.18
libhdf5_hl.a   libhdf5.so             libmpicxx.a     libmpifort.so         libmpl.so         libnetcdff.so.7.0.0  libnetcdf.so.18.0.0
libhdf5_hl.la  libhdf5.so.310         libmpicxx.la    libmpifort.so.12      libnetcdf.a       libnetcdf.la         libopa.so
libhdf5_hl.so  libhdf5.so.310.0.0     libmpicxx.so    libmpifort.so.12.1.8  libnetcdff.a      libnetcdf.settings   libwannier.a

and finally

which mpirun; which mpif90
~/local/bin/mpirun
~/local/bin/mpif90

Hi ccardot,

First of all, I've never used Wannier90 before, so take what I say with a grain of salt. Nevertheless, tw90_1.abi works well for me with mpirun -n 4 abinit tw90_1.abi (assuming the Wannier90 input file is copied first with cp wannier90.win tw90_1o_DS2_w90.win).

  1. Wannier90 compilation

To compile Wannier90, I used the gzipped tarball of version 3.1.0 and ran make all. My make.inc contained these lines, since my linear algebra library is OpenBLAS:
F90 = mpifort
FCOPTS = -O2
LDOPTS = -O2
LIBDIR = /opt/OpenBLAS/lib
LIBS = -L$(LIBDIR) -lopenblas

  2. Abinit ac9

I have Abinit 10.0.3 rather than your 10.0.9, although I don't expect a minor version difference to cause an error like this.
The Abinit ac9 file doesn't allow setting both
with_wannier90="$HOME/local" and
WANNIER90_LIBS="$HOME/local/lib/libwannier.a",
so the with_wannier90 line is ignored. The message is not very explicit, but this line in your config.log indicates when that happens:
configure:35503: WARNING: conflicting option settings for Wannier90
I recommend using only WANNIER90_LIBS="$HOME/local/lib/libwannier.a". I did not add anything other than this line to get Wannier90 running on my laptop.
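For reference, here is a minimal sketch of the Wannier90 part of an ac9 file following that recommendation. The rest of my_config_file_new.ac9 is not shown in this thread and is assumed unchanged; the path assumes libwannier.a sits in $HOME/local/lib, as in the ls listing in the first post.

```sh
# Wannier90 entry of the ac9 file (sketch; all other options omitted).
# Point configure at the static library directly and do NOT also set
# with_wannier90, because combining the two is what produces the
# "conflicting option settings for Wannier90" warning.
WANNIER90_LIBS="$HOME/local/lib/libwannier.a"
```

After rerunning configure with this file, the conflicting-options warning should no longer appear in config.log.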

  3. Question

If I understand correctly, the issue appears once Wannier90 has finished running, as if it didn't exit properly and hand control back to Abinit to continue the simulation?
Do other calculations without Wannier90 work well in parallel on your machine?
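One way to answer the first question is to check whether the .wout file written under mpirun reaches Wannier90's normal closing line. Below is a hypothetical bash helper (w90_finished is illustrative, not part of Abinit or Wannier90). It assumes Wannier90 3.x ends a completed run with the line "All done: wannier90 exiting", which is worth confirming against the tail of the .wout from the working serial run.

```sh
# Hypothetical helper: report whether a Wannier90 .wout file reached its
# normal end. ASSUMPTION: a completed Wannier90 3.x run writes the footer
# "All done: wannier90 exiting"; verify this against a known-good .wout.
# Return codes: 0 = finished, 1 = footer missing, 2 = file not found.
w90_finished() {
    local wout="$1"
    if [ ! -f "$wout" ]; then
        echo "missing: $wout"
        return 2
    fi
    if grep -q "All done: wannier90 exiting" "$wout"; then
        echo "finished: $wout"
        return 0
    fi
    echo "not finished: $wout"
    return 1
}
```

For example, w90_finished tw90_1o_DS2_w90.wout, assuming the seedname matches the .win file copied above. If the footer is present while Abinit is still hanging, that would point to a problem after wannier_run returns; if it is missing, Wannier90 itself never finished.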

Let me know if this information helps with your problem.
Olivier

Using your example make.inc, I recompiled Wannier90 and Abinit, and the parallelized version now appears to work on the tw90_1 example (and on the other Wannier90 interface examples). I realized that in my Wannier90 make.inc I had set F90 = gfortran; switching it to F90 = mpifort seems to have fixed the issue.
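A quick way to catch this mismatch before rebuilding is to check which compiler a Wannier90 make.inc actually uses. The sketch below is a hypothetical bash helper (check_w90_f90 is illustrative, not part of either package); it assumes the MPI wrappers are named mpif90 or mpifort, as in $HOME/local/bin here. With MPICH, mpifort -show additionally prints the underlying compiler command, which can be compared with the Fortran compiler recorded in Abinit's config.log.

```sh
# Hypothetical helper: print the F90 value set in a Wannier90 make.inc and
# warn when it is not an MPI compiler wrapper. ASSUMPTION: the wrapper is
# named mpif90 or mpifort; adjust the case pattern for other MPI stacks.
# Return codes: 0 = MPI wrapper, 1 = bare compiler, 2 = no F90 line.
check_w90_f90() {
    local makeinc="$1" f90
    # Take the last uncommented "F90 = ..." assignment (first word only).
    f90=$(sed -n 's/^[[:space:]]*F90[[:space:]]*=[[:space:]]*\([^[:space:]#]*\).*/\1/p' "$makeinc" | tail -n 1)
    if [ -z "$f90" ]; then
        echo "no F90 setting in $makeinc"
        return 2
    fi
    case "$(basename "$f90")" in
        mpif90|mpifort) echo "ok: F90 = $f90"; return 0 ;;
        *) echo "warning: F90 = $f90 is not an MPI wrapper"; return 1 ;;
    esac
}
```

Running check_w90_f90 make.inc in the Wannier90 source directory should print "ok: F90 = mpifort" for the working setup and a warning for F90 = gfortran.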

To answer your question: yes, other parallel calculations that didn't involve Wannier90 still ran without issue. I guess it came down to building Wannier90 with the same Fortran compiler (the mpifort wrapper) that I used for Abinit?

Thank you very much for the help, Olivier!