HEPiX 2006

Name: HEPiX 2006
Start: 2006-10-09T08:00:00-05:00
End: 2006-10-13T12:00:00-05:00
Location: Jefferson Lab

Description

The HEPiX forum brings together IT system support engineers from the High Energy Physics (HEP) laboratories and institutes, such as BNL, CERN, DESY, FNAL, IN2P3, INFN, JLAB, NIKHEF, RAL, SLAC, TRIUMF and others. The HEPiX meetings have been held regularly since 1991 and are an excellent source of information for IT specialists, which is why they also draw strong participation from non-HEP organizations.

Monday, 9 October
- 08:00
  
  Registration/Continental Breakfast
- Welcome
  - 1
    
    Welcome to JLab
    
    Speaker: Roy Whitney (Jefferson Lab)
- Site Reports
  - 2
    
    BNL
    
    Speaker: Alexander Withers (BNL)
    
    Slides
    
    Video
  - 3
    
    CERN
    
    Speaker: Helge Meinhard (CERN)
    
    Slides
  - 4
    
    Fermilab
    
    Speaker: Lisa Giacchetti (Fermilab)
    
    Slides
  - 10:30
    
    Coffee break
  - 5
    
    GridKa
    
    Speaker: Manfred Alef (GridKa)
    
    Slides
  - 6
    
    JLab
    
    Speaker: Sandy Philpott (JLab)
    
    Slides
  - 7
    
    TRIUMF
    
    Speaker: Corrie Kost (TRIUMF)
    
    Slides
- 12:00
  
  Lunch
- Site Reports II
  - 8
    
    NERSC Site Report
    
    Speaker: Cary Whitney (NERSC)
    
    Slides
  - 9
    
    NIKHEF Site Report
    
    Speaker: Paul Kuipers (NIKHEF)
    
    Slides
  - 10
    
    CCLRC-RAL Site Report
    
    Speaker: Martin Bly (CCLRC-RAL)
    
    Slides
  - 11
    
    INFN Site Report
    
    Speaker: Roberto Gomezel (INFN)
    
    Slides
  - 14:50
    
    Coffee break
  - 12
    
    GSI/Darmstadt Site Report
    
    Speaker: Walter Schoen (GSI/Darmstadt)
  - 13
    
    DAPNIA Site Report
    
    Speaker: Pierrick Micout (CEA DAPNIA Saclay)
    
    Slides
  - 14
    
    SLAC Site Report
    
    Speaker: Chuck Boeheim (SLAC)
    
    Slides
  - 15
    
    INFN-CNAF Site Report
    
    Speaker: Andrea Chierici (INFN-CNAF)
    
    Slides
  - 16
    
    LAL Site Report
    
    Speaker: Mr Michel Jouvin (LAL / IN2P3)
    
    Slides
Tuesday, 10 October
- 08:30
  
  Continental Breakfast
- Core Services and Infrastructure
  - 17
    
    Experiences with SpamCop
    
    SpamCop is a popular tool on the internet for reporting "spammers" to their ISPs. Several HEP sites have signed up for SpamCop as a method of detecting spam. Unfortunately, because of the way Fermilab processes bounced spam email, it can appear to SpamCop that Fermilab is an initiator of spam. This has occurred several times in the past year. To resolve the last incident, we requested that several sites add Fermilab to their servers' whitelists. In addition, we adjusted our mail gateways to eliminate bounced spam messages whenever possible. We are also looking into improving our spam filtering systems to minimize any spam that might get through and subsequently be forwarded to another site. We would like to discuss compiling a list of other HEP sites and their email servers, and sharing this list so that we don't inadvertently block each other's email transmissions.
    
    Speaker: Jim Fromm (Fermilab)
    
    Slides
  - 18
    
    What is TRAC?
    
    This talk will present Trac, a unique open source tool combining a wiki, an issue tracker, a Subversion client and a roadmap manager. More than a tool, Trac is an extensible framework based on plugins. LAL is currently using this tool both for software development and system administration.
    
    Speaker: Michel Jouvin (LAL / IN2P3)
    
    Slides
  - 10:00
    
    Coffee break
  - 19
    
    Scientific Linux Update
    
    In this talk, we will present the status of Scientific Linux, focusing on relevant changes in the past six months. Next, we will also present current projects with SL, focusing on SL 5.x and scientific applications. To conclude, we will talk about future enhancements.
    
    Speaker: Jim Fromm (Fermilab)
    
    Slides
  - 20
    
    TWiki at CERN
    
    The Database and Engineering Services (DES) Group of the IT Department at CERN supports and maintains a CERN TWiki. This presentation will cover the history of TWiki at CERN, facts about the system, the technical setup, problems we face and our plans for finding a solution to them.
    
    Speaker: Hege Hansbakk (CERN)
    
    Slides
  - 21
    
    Service Level Status - A Real-time status Display for IT
    
    IT departments now provide, and people use, a wide variety of increasingly heterogeneous computing services. There is a growing need for a common display that groups these services and reports their status and availability in a uniform way. At CERN, this need led to the launch of the SLS project. Service Level Status Overview (SLS) is a web-based tool that dynamically shows availability, basic information and statistics about various IT services, as well as dependencies between them. The presentation starts with a short description of the project, its goals, architecture, and users. Then the concepts of subservices, metaservices, dependencies, service availability, etc. are introduced, followed by a demonstration of the system and an explanation of how to add a service to SLS. The talk ends with information on how SLS could be used by other HEP institutes.
    
    Speaker: Sebastian Lopienski (CERN)
    
    Slides
  - 12:00
    
    Lunch
  - 22
    
    Managing system history and problem tracking with SVN/Trac
    
    This talk will present LAL's experience in addressing the need to track system configuration changes and link them to an issue tracker, using a combination of Subversion and Trac.
    
    Speaker: Michel Jouvin (LAL / IN2P3)
    
    Slides
  - 23
    
    Using RT to Manage Installation Workflow
    
    We needed to manage the workflow of installation tasks more formally, because so many were happening simultaneously that confusion resulted. We modeled the workflow using RT, the Request Tracking system that we use for user requests. The result is a relatively lightweight and flexible system that gives planners a "dashboard" of the status of all active projects, and the information they need to execute the task.
    
    Speaker: Chuck Boeheim (SLAC)
    
    Slides
  - 24
    
    High Availability Methods at GSI
    
    This presentation gives an overview of the methods used to ensure the high availability of important services such as the database, web service, and central file server, among others. Apart from commercial products for certain systems (Oracle, Exchange), different open source Linux tools (heartbeat, drbd, mon) are combined with monitoring and hardware methods and adapted to our special needs.
    
    Speaker: Karin Miers (GSI/Darmstadt)
    
    Slides
  - 15:00
    
    Coffee break
  - 25
    
    Using Quattor to manage a grid (EGEE) Fabric
    
    Deploying grid services means managing a potentially large number of machines that partially share their configuration. A tool is needed not only to install but also to maintain such a configuration. Quattor, developed as part of EDG, is such a tool. This talk will focus on the LCG/gLite support in Quattor.
    
    Speaker: Michel Jouvin (LAL / IN2P3)
    
    Slides
  - 26
    
    Spam - Statistics and Fighting Methods
    
    Speaker: Walter Schoen (GSI/Darmstadt)
  - 27
    
    Scientific Linux Inventory Project (SLIP)
    
    This talk will discuss the effort to provide an inventory of all Linux machines at Fermilab. We will describe the motivation for the project, the package we selected, and the current state of the project.
    
    Speaker: Jim Fromm (Fermilab)
    
    Slides
- Networking Dinner at Newport News City Center Marriott
  - 28
    
    From a Spark in Vacuum to Sparking the Vacuum
    
    Speaker: Fred Dylla (JLab)
Wednesday, 11 October
- 08:30
  
  Continental Breakfast
- Compute Clusters/Storage
  - 29
    
    RACF's PXE Installation Management System
    
    The BNL RHIC/ATLAS Computing Facility (RACF) Central Analysis/Reconstruction Server (CAS/CRS) Farm is a large-scale computing cluster currently consisting of ~2000 multiprocessor hosts running Scientific Linux. Besides providing for computation, the CAS/CRS systems' local disk drives are used by network distributed data systems such as dCache, ROOTD and XROOTD to store considerable amounts of data (presently ~400 TB). The sheer number of systems in the farm, combined with our distributed storage model, complicates network installation management. This presentation describes the system developed at RACF to fully automate and simplify management of the PXE installation process.
    
    Speaker: Christopher Hollowell (BNL)
    
    Slides
  - 30
    
    Support of Kerberos 5 Authenticated Environment by TORQUE
    
    TORQUE is a successor of the OpenPBS batch queuing system, available as an Open Source product. Despite the widespread use of TORQUE as a Job Management System on computational farms and LHC grid installations, this batch system does not support any advanced authentication mechanisms. We show two ways to redesign the existing source code in order to add Kerberos 5 authentication support for batch jobs. The first uses local server-client RPC connections, while the second makes use of the Authenticated Remote Control tool (ARCv2). The described modifications have been successfully deployed in the local computing infrastructure of the H1 Collaboration at DESY. This provides an identical environment for batch jobs and user desktop processes.
    
    Speaker: Bogdan Lobodzinski (DESY)
    
    Slides
  - 10:00
    
    Coffee break
  - 31
    
    Planning for Hall D: The Hazards of Fast Tape Drives
    
    The upgrade to Jefferson Lab will require a hardware refresh of the mass storage system in order to handle the higher volume of data from new experiments and simulations. The next-generation, higher-capacity tape drives are also significantly faster, a fact that has implications for almost all parts of the mass storage system. This talk examines the performance tuning required to make efficient use of these drives and underscores some of the particular needs of tape-based storage systems used by most experiments.
    
    Speaker: Bryan Hess (Jefferson Lab)
    
    Slides
  - 32
    
    Porting to and Running Applications on 64 Bit Platforms
    
    The author describes his recent experience porting software packages to and running these packages on 64 bit machines with Solaris and Linux. Issues discussed include code modification, compiling, operating system requirements, and performance comparisons with 32 bit machines.
    
    Speaker: Carl Timmer (Jefferson Lab)
    
    Slides
  - 33
    
    NGF NERSC's Global Filesystem and PDSF
    
    I would like to explain a bit about our global filesystem and its use on PDSF, and about how this filesystem can be extended to other sites/labs. Our filesystem is GPFS, but the concept can also be extended to Lustre or other cluster filesystems.
    
    Speaker: Tom Langley (NERSC)
    
    Slides
  - 12:00
    
    Lunch
  - 34
    
    Storage Class: Problematic and Implementation at CCIN2P3
    
    Storage Classes attempt to represent storage use cases for a given experiment. It is considered harmful to match the storage classes to real-life storage systems, especially if the latter rely on the file path to determine the storage configuration of a file. This presentation aims to define the problem of Storage Classes, explain one possible solution, which is implemented at CCIN2P3, and discuss the pros and cons.
    
    Speaker: Jonathan Schaeffer (CC-IN2P3)
    
    Slides
  - 35
    
    Benchmark Updates
    
    This talk will present the current state of the art of benchmarking at CERN. We will explain our benchmarking procedures, review our latest results and talk about where we are going from here. As part of the results review, we will comment on current CPU trends and on the increasingly important issue of power consumption.
    
    Speaker: Helge Meinhard (CERN)
    
    Slides
  - 36
    
    Recent Fabric Management Improvements at CERN
    
    This talk will describe some improvements to the monitoring and management of the storage and CPU services in the following areas: use of SMART for disk monitoring; integration of disk server monitoring and storage system management; and transmission of Grid job memory requirements to the local workload management.
    
    Speaker: Tony Cass (CERN)
    
    Slides
- Cyber Security/Authentication
  - 15:00
    
    Coffee break
  - 37
    
    The Stakkato Intrusions
    
    Over 15 months, from late 2003 until early 2005, hundreds of supercomputing sites, universities and companies worldwide were hit by a series of intrusions, with the perpetrator leapfrogging from site to site using stolen ssh passwords. These are collectively known as the Stakkato intrusions, and include the TeraGrid Incident and the Cisco IOS source code theft, both of which received widespread attention from the media. This talk will cover case studies of the intrusions, an analysis of why Stakkato could be so successful, and the story of how the suspect was finally tracked down and caught.
    
    Speaker: Leif Nixon
    
    Slides
  - 38
    
    Network Security Monitoring with Sguil
    
    Most mid- or large-sized organizations conduct some sort of network monitoring for security purposes. Traditional Intrusion Detection Systems (IDS) tell only part of the story, leaving analysts to perform complex and time-consuming data-mining operations from multiple sources just to answer simple questions about IDS alerts. This talk presents a more efficient model that uses the open source Sguil software to optimize the process for analyst time and efficiency.
    
    Speaker: David Bianco (JLab)
    
    Slides
Thursday, 12 October
- 08:00
  
  Continental Breakfast
- Grid Projects
  - 39
    
    GridX1: A Canadian Computational grid for HEP Applications
    
    GridX1 is a Canadian computational grid which combines the shared resources of several Canadian research institutes for the primary purpose of executing HEP applications. With more than two years of production experience, GridX1 has demonstrated the successful application of Globus Toolkit (GT) v.2 cluster gatekeepers managed by a Condor-G resource brokering system. A novel feature of the project was a resource brokering interface to the LHC Computing Grid, which was used during Data Challenge 2 to route ATLAS jobs to the Canadian resources without having dedicated Compute Elements at each cluster. Further, independent Condor-G resource brokers have been implemented to manage the Canadian ATLAS and BaBar MC production systems. Finally, our recent efforts have been directed toward building a service-oriented grid using GT4, including a WS-MDS registry service and WS-GRAM enabled metaschedulers built upon Condor and GridWay.
    
    Speaker: I. Gable (University of Victoria/HEPnet Canada)
    
    Slides
  - 40
    
    GridPP
    
    GridPP is a UK e-Science project which started in 2001 with the aim of developing and operating a production Grid for UK particle physicists. It is aligned with the EGEE infrastructure and the WLCG Project but also works with currently running experiments and theorists. GridPP aims to provide an environment in which all UK particle physicists can do their analysis, share data, etc., and in which the UK can also contribute to the worldwide collaboration and activities of their experiments.
    
    Speaker: John Gordon (CCLRC-RAL)
    
    Slides
  - 10:00
    
    Coffee break
  - 41
    
    The EGEE Grid Infrastructure
    
    The EGEE grid infrastructure is in constant production use with significant workloads, not only for High Energy Physics but for many other scientific applications. An overview of the EGEE project, the infrastructure itself, and how it is being used will be given. Several applications rely on a long term infrastructure being in place; the current ideas of how this may be achieved will be discussed.
    
    Speaker: Ian Bird (CERN)
    
    Slides
  - 42
    
    Virtual Machines in a Distributed Environment
    
    Speaker: Mauricio Tsugawa (University of Florida)
    
    Slides
  - 12:00
    
    Lunch
  - 43
    
    Issues and problems around Grid site management
    
    The problems of grid site reliability and availability are becoming the biggest outstanding issue in building a reliable grid service. This is particularly important for WLCG where specific reliability targets are set. This talk will outline the scope of the problems that need to be addressed, and point out potential areas where HEPiX members can contribute, and will seek input on how we can address some of the problems.
    
    Speaker: Dr Ian Bird (CERN)
    
    Slides
  - 44
    
    FermiGrid - Status and Plans
    
    FermiGrid is the Fermilab Campus Grid. This talk will discuss the current state of FermiGrid and plans for the upcoming year.
    
    Speaker: Keith Chadwick (Fermilab)
    
    Slides
  - 15:00
    
    Coffee break
  - 45
    
    Open Science Grid Progress and Vision
    
    This talk will detail recent Open Science Grid progress and outline the vision for the upcoming year.
    
    Speaker: Keith Chadwick (Fermilab)
    
    Slides
  - 46
    
    Grid Security in WLCG and EGEE
    
    This talk will present the current status, plans and issues for Grid Security in WLCG and EGEE. This will include Authentication, Authorization, Policy and Operational Security.
    
    Speaker: David Kelsey (CCLRC/RAL)
    
    Slides
Friday, 13 October
- 09:00
  
  Continental Breakfast
- Grid Projects II
  - 47
    
    Testing the UK Tier 2 Data Transfer and Storage Infrastructure
    
    When the LHC experiments start taking data next year, the Tier 2 sites in the UK (and elsewhere) will need to be able to receive and transmit data at unprecedented rates and reliabilities. We present the efforts in the UK to test the disk-to-disk transfer rates between Tier 2 sites, along with some of the lessons learnt and results obtained.
    
    Speaker: Chris Brew (CCLRC - RAL)
    
    Slides
- 48
  
  IHEPCCC
  
  Speaker: Randy Sobie
  
  Slides
- Closing Comments