Speaker
Description
In the rapidly changing hardware landscape of high performance computing (HPC), committing developer effort to optimizing simulation software for a single architecture becomes a sustainability issue.
In this work I explored the feasibility of using performance-portable parallel code for a staggered fermion kernel. Combining the Kokkos C++ Performance Portability EcoSystem with MPI makes it possible to scale on massively parallel machines while still targeting a wide range of architectures with the same simple code.
Benchmarking on a range of currently deployed and recently introduced systems, including AMD EPYC 7742, AMD MI250, Fujitsu A64FX, Nvidia A100 and Nvidia H100 components, produced mostly encouraging results.
| Topical area | Software Development and Machines |
|---|---|