Dr Chulwoo Jung (Brookhaven National Laboratory)
The growing imbalance between the computing capabilities of individual nodes and internode communication makes it highly desirable for any Lattice QCD algorithm to minimize off-node communication. One relatively new method for this is 'split-grid' or 'split-domain', in which data is rearranged within a single running binary so that routines which require significant off-node communication, such as Dirac operators, run in parallel on multiple smaller partitions with a better surface-to-volume ratio, while other routines run on one large partition. While it is relatively straightforward to use split-grid for inverters, the typical Lanczos algorithm, which has a single starting vector, does not lend itself naturally to the split-grid approach. Here we report on our investigation of the Block Lanczos algorithm, which allows multiple starting vectors to be processed concurrently. The Block Lanczos algorithm has been implemented in Grid, a data parallel C++ mathematical object library, and is shown, for a moderate number of starting vectors, to achieve convergence comparable to the normal Lanczos algorithm on a DWF/Mobius ensemble with physical quark masses.
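The surface-to-volume argument behind split-grid can be illustrated with a small calculation. The sketch below is not taken from Grid. The lattice size, node counts, and the function names `local_dims` and `surface_to_volume` are all hypothetical, and it assumes a four-dimensional nearest-neighbour stencil while ignoring any fifth dimension and the cost of rearranging the data.

```python
from math import prod


def local_dims(global_dims, node_grid):
    """Per-node sub-lattice extents when global_dims is divided over node_grid."""
    for g, n in zip(global_dims, node_grid):
        if g % n != 0:
            raise ValueError("node grid must divide the lattice evenly")
    return tuple(g // n for g, n in zip(global_dims, node_grid))


def surface_to_volume(local, node_grid):
    """Boundary sites exchanged off-node per local site (nearest-neighbour stencil)."""
    volume = prod(local)
    # Every direction that is split over more than one node contributes two faces.
    surface = sum(2 * volume // l for l, n in zip(local, node_grid) if n > 1)
    return surface / volume


# Hypothetical example: a 48^3 x 96 lattice on 512 nodes.
lattice = (48, 48, 48, 96)

# One large partition: all 512 nodes share every vector.
big_grid = (4, 4, 4, 8)
big_ratio = surface_to_volume(local_dims(lattice, big_grid), big_grid)

# Split-grid: 16 partitions of 32 nodes, each holding one full-lattice vector.
small_grid = (2, 2, 2, 4)
small_ratio = surface_to_volume(local_dims(lattice, small_grid), small_grid)

print(f"one partition of {prod(big_grid)} nodes: {big_ratio:.3f}")
print(f"{prod(big_grid) // prod(small_grid)} partitions of {prod(small_grid)} nodes: {small_ratio:.3f}")
```

With these assumed numbers the local volume grows from 12^4 to 24^4 sites per node, so the ratio drops from 8/12 to 8/24. Each node holds the same total amount of data in both layouts (16 vectors of 12^4 sites versus one vector of 24^4 sites), but a split-grid Dirac operator application exchanges half as many boundary sites per local site.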