Description
The scientific method is underpinned by reproducibility; however, parallel computing often violates it because floating-point addition is not associative, so the result of a parallel sum depends on the order in which partial sums are combined. For Lattice QCD calculations this can have several undesirable effects, such as dramatic variations in solver iteration count and the fundamental inability to exactly reproduce a given Monte Carlo generated ensemble. The problem is accentuated on GPUs, where the additional thread hierarchy creates more opportunities to violate associativity, and when comparing results across different architectures. We solve this problem using the reproducible summation algorithm of Ahrens et al. In particular, we adapt the algorithm to run efficiently on clusters of GPUs, as deployed in the QUDA framework, and achieve both exact reproducibility and higher accuracy at no additional cost compared to a naive parallel tree summation.
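The order dependence, and the general principle used to remove it, can be illustrated in a few lines. The sketch below is a simplified, hypothetical Python illustration of pre-rounding, not the binned algorithm of Ahrens et al. and not the QUDA implementation. Each value is rounded onto a common grid whose spacing is fixed by the largest magnitude in the data and the number of terms, so the rounded parts add exactly (and therefore identically) in any order; the leftover residuals are then summed the same way in a further fold. The names `naive_sum`, `reproducible_sum` and the `folds` parameter are illustrative, and the sketch assumes finite IEEE double-precision inputs with round-to-nearest arithmetic.

```python
import math


def naive_sum(values):
    """Left-to-right float summation: the result depends on the order."""
    total = 0.0
    for v in values:
        total += v
    return total


def reproducible_sum(values, folds=2):
    """Order-independent sum via pre-rounding to a data-dependent grid.

    Hypothetical helper illustrating the pre-rounding idea only; it is NOT
    the binned algorithm of Ahrens et al. nor QUDA's implementation.
    Assumes finite IEEE doubles, round-to-nearest, len(values) < 2**50 and
    that sigma below does not overflow.
    """
    residuals = [float(v) for v in values]
    n = len(residuals)
    total = 0.0
    if n == 0:
        return total
    for _ in range(folds):
        # max() is associative, so m does not depend on the order of values.
        m = max(abs(r) for r in residuals)
        if m == 0.0:
            break
        # m < 2**e; choose sigma = 2**s so that n * m < sigma / 2.
        e = math.frexp(m)[1]
        sigma = math.ldexp(1.0, e + n.bit_length() + 1)
        fold_sum = 0.0
        for i, r in enumerate(residuals):
            # q is r rounded to a multiple of sigma * 2**-53; it depends
            # only on r and sigma, never on the summation order.
            q = (sigma + r) - sigma
            # Exact: the low-order bits of r that q dropped.
            residuals[i] = r - q
            # All q share the quantum sigma * 2**-53 and every partial sum
            # stays below sigma, so each addition is exact in any order.
            fold_sum += q
        # Folds are always combined in the same fixed order.
        total += fold_sum
    return total
```

For example, `naive_sum([1e16, 1.0, -1e16, 1.0])` returns `1.0` while `naive_sum([1.0, 1.0, 1e16, -1e16])` returns `2.0`; `reproducible_sum` returns `2.0` for both orderings. The only error left is the residuals of the final fold, which are far smaller than the largest input. Note that this sketch needs a separate max reduction before summing; as we understand it, the binned scheme of Ahrens et al. avoids this by using fixed, exponent-aligned bin boundaries, so partial results from different threads or nodes can be merged directly.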
Topical area: Software Development and Machines