11–14 May 2008
Hyatt Regency Chicago
US/Central timezone

HPC Configuration Management Challenges

11 May 2008, 18:15
Hyatt Regency Chicago

151 East Wacker Drive, Chicago, Illinois, USA 60601

Speaker

Cory Lueninghoener (Argonne National Lab)

Description

Large-scale high-performance computing (HPC) systems pose special problems for system administrators, particularly with respect to configuration management. These systems operate at a larger scale than typical environments, run synchronized workloads, and must be left untouched while jobs are running. Coupled with the need to keep compute systems as uniform as possible, these constraints can put considerable stress on infrastructure and administrators alike. At the same time, HPC systems are ideal candidates for complete configuration management, since they generally exhibit high levels of uniformity and administrator control. With a strong configuration management tool, keeping compute nodes identical, login nodes clean, and management nodes secure becomes much more manageable, and administrators can document and understand their environments better than they could with ad hoc systems. In this talk, we will give an overview of the challenges we face in managing the 500 TF Blue Gene/P system at Argonne National Laboratory's Leadership Computing Facility and its infrastructure. In particular, we will focus on the configuration tradeoffs that we face in this environment and the level of automation we have achieved by using Bcfg2, an open-source configuration management tool that we have developed in Python at Argonne.

Primary author

Cory Lueninghoener (Argonne National Lab)
