
31 July 2023 to 4 August 2023
America/Chicago timezone

GPU computation energy-efficiency: from lattice QCD to large language model training

3 Aug 2023, 16:20
20m
Theory (WH3NW)

Speaker

Antonin Portelli (University of Edinburgh)

Description

In the context of the current climate and energy crisis, it is crucial to study and optimise the energy efficiency of scientific software used at large-scale computing facilities. This supports progress towards net-zero computing targets and reduces the negative impact of growing operational costs on the production of scientific data.
The energy efficiency of a computation is generally quantified as the amount of work performed per unit of energy spent. The study presented here was commissioned by the UK national STFC DiRAC facility and performed on the Edinburgh "Tursa" supercomputer, which is based on 724 NVIDIA A100 GPUs. We study how the energy efficiency of various workflows varies as we down-clock the GPUs. From lattice QCD benchmarks (Grid & QUDA) to large language model training (GPT), we observe that clock frequencies lower than the default increase GPU energy efficiency by 20-30%, with an acceptable impact on performance. This study led to a modification of the default GPU frequencies on Tursa in December 2022, resulting in an estimated saving to date of 60 MWh.
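The abstract itself contains no code; as an illustrative sketch only, a frequency sweep of the kind described could be scripted with NVIDIA's NVML library via the pynvml Python bindings. The run_benchmark function and the list of candidate frequencies below are hypothetical placeholders, not the actual Tursa workloads or settings.

```python
# Sketch: lock the GPU core clock to each candidate frequency, run a
# workload, and report energy efficiency (work per joule) via NVML.
# Locking clocks requires administrator privileges.
import pynvml

def run_benchmark():
    """Hypothetical placeholder for a real workload (e.g. a Grid or
    QUDA lattice QCD benchmark, or a GPT training step). Returns the
    amount of work performed, in whatever unit is convenient."""
    raise NotImplementedError

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

# Candidate core clock frequencies in MHz, sweeping down from the
# A100's default maximum boost clock of 1410 MHz.
for freq_mhz in (1410, 1300, 1200, 1100, 1000, 900):
    # Pin both the minimum and maximum core clock to freq_mhz.
    pynvml.nvmlDeviceSetGpuLockedClocks(handle, freq_mhz, freq_mhz)

    # Total energy consumed by the GPU since driver load, in millijoules.
    e_start = pynvml.nvmlDeviceGetTotalEnergyConsumption(handle)
    work = run_benchmark()
    e_end = pynvml.nvmlDeviceGetTotalEnergyConsumption(handle)

    energy_j = (e_end - e_start) / 1000.0
    print(f"{freq_mhz} MHz: {work / energy_j:.3g} work units per joule")

# Restore the default clock behaviour.
pynvml.nvmlDeviceResetGpuLockedClocks(handle)
pynvml.nvmlShutdown()
```

The same clock locking can also be done from the command line with nvidia-smi -lgc <freq> and undone with nvidia-smi -rgc.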

Topical area: Software Development and Machines

Primary author

Antonin Portelli (University of Edinburgh)

Presentation materials