Speaker
Dr
Issaku Kanamori
(Hiroshima University)
Description
We port the Domain-Decomposed-alpha-AMG solver to the K computer.
Each node has 8 cores and 16 GB of memory, with a theoretical
peak of 128 GFlops (82,944 nodes in total). Its distinctive features, as many as 256
registers per core and a byte/Flop ratio as large as 0.5, require
tuning different from that on other machines.
To use more registers, we change some of the data structures
and rewrite the matrix-vector operations with intrinsics.
The performance improves by more than a factor of two for twelve
solves including the setup. The efficiency is still about 5% after
the optimization, lower than the 22% of a previously tuned mixed-precision
solver for the K computer. The throughput, however, is
almost three times higher for a physical-point configuration.
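
The hardware figures quoted in the description imply a few derived numbers that help put the efficiency values in context. The following is a minimal arithmetic sketch, not part of the authors' work. It assumes that "efficiency" means the fraction of the theoretical peak per node that is sustained, and that the byte/Flop ratio is memory bandwidth divided by peak floating-point rate. All function and constant names are illustrative.

```python
# Figures taken directly from the description.
PEAK_GFLOPS_PER_NODE = 128.0
CORES_PER_NODE = 8
TOTAL_NODES = 82_944
BYTES_PER_FLOP = 0.5


def system_peak_pflops() -> float:
    """Aggregate theoretical peak of all nodes, in PFlops."""
    return PEAK_GFLOPS_PER_NODE * TOTAL_NODES / 1.0e6


def node_bandwidth_gb_per_s() -> float:
    """Memory bandwidth per node implied by the byte/Flop ratio.

    Assumption: byte/Flop = bandwidth / peak floating-point rate.
    """
    return BYTES_PER_FLOP * PEAK_GFLOPS_PER_NODE


def sustained_gflops_per_node(efficiency: float) -> float:
    """Sustained rate per node for a given efficiency.

    Assumption: efficiency is the fraction of theoretical peak achieved.
    """
    return efficiency * PEAK_GFLOPS_PER_NODE


if __name__ == "__main__":
    print(f"system peak:      {system_peak_pflops():.2f} PFlops")
    print(f"node bandwidth:   {node_bandwidth_gb_per_s():.0f} GB/s")
    print(f"peak per core:    {PEAK_GFLOPS_PER_NODE / CORES_PER_NODE:.0f} GFlops")
    print(f"sustained at 5%:  {sustained_gflops_per_node(0.05):.2f} GFlops/node")
    print(f"sustained at 22%: {sustained_gflops_per_node(0.22):.2f} GFlops/node")
```

Under these assumptions, the lower sustained rate at 5% efficiency is compatible with the higher throughput reported above only if the multigrid solver needs correspondingly fewer floating-point operations per solve than the mixed-precision solver.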
Primary author
Dr
Issaku Kanamori
(Hiroshima University)
Co-author
Prof.
Ken-Ichi Ishikawa
(Hiroshima University)