I am excited to hear that DFPT calculations can now run on GPUs. I am very interested in this and would like to try it out.
I looked through the official documentation, but it doesn’t seem to mention how much of an improvement to expect over traditional CPUs. Are there any real-world examples or simple benchmark tests that could provide more details?
I have access to 4 V100 16GB GPUs, which are a bit dated but seem to meet the minimum requirement for OpenMP GPU offloading (compute capability 7.0). I’d like to give it a try as well (though the system size may not be too large).
Edit: I tried the examples in tests/gpu_omp/Input. It seems that ONCV pseudopotentials are not supported yet, only PAW etc. Will PAW support EPH calculations?
There is still no official publication about the GPU+DFPT capabilities of Abinit. Of course we have run tests, but these benchmarks have not been published.
For DFPT, you cannot expect the same GPU/CPU speed-up as for ground-state calculations, because a DFPT run is made of a lot of small calculations and because the MPI parallelism is already efficient for it. It also depends strongly on the system you are looking at and on the perturbation type.
For example, on a 32-atom cell of gold we obtain a speed-up of 3 or 4 between CPU and GPU for a phonon computation (note that the metric is one compute node on a supercomputer).
Note also that you need a “double precision” GPU device (i.e. the kind found in a supercomputer, not in a laptop).
The larger the system, the higher the speed-up.
I’m surprised to learn that it doesn’t work with norm-conserving pseudopotentials. Which test case did you run?
As for your last question: EPH is still not available with PAW.
Yes, the performance improvement from using GPUs for the ground state is quite obvious. I tried my material with ONCV pseudopotentials, and the speed improved by about 5 times compared with 4 CPU nodes. The hardware is Xeon Silver 4316 CPUs and V100 GPUs. However, this only holds when we use fftalg 401 and wfoptalg 114. For DFPT with PAW, the improvement is not as obvious.
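For reference, the two settings mentioned above would appear in the input file roughly as follows. This is only a fragment: the rest of the input is omitted, including whatever enables GPU offloading in a given build, and the values are simply the ones reported in this post rather than a general recommendation.

```
# Ground-state settings reported above for the GPU run with ONCV pseudopotentials
# (fragment only; structure, k-points, cutoffs and GPU activation are not shown)
fftalg   401
wfoptalg 114
```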
The system I am studying is a semiconductor containing transition-metal and heavy-metal elements. Since it is still under investigation, I cannot publicly share the calculation files. However, I am looking for a similar example so that I can repeat the test and see whether DFPT still fails with ONCV pseudopotentials. I will upload the files once I have them.