HEPiX 2006

US/Central
Jefferson Lab

Description
The HEPiX forum brings together IT system support engineers from the High Energy Physics (HEP) laboratories and institutes, such as BNL, CERN, DESY, FNAL, IN2P3, INFN, JLAB, NIKHEF, RAL, SLAC, TRIUMF and others. HEPiX meetings have been held regularly since 1991 and are an excellent source of information for IT specialists, which is why they also attract strong participation from non-HEP organizations.
    • 08:00 09:00
      Registration/Continental Breakfast 1h
    • 09:00 09:30
      Welcome
      • 09:00
        Welcome to JLab 20m
        Speaker: Roy Whitney (Jefferson Lab)
    • 09:30 12:00
      Site Reports
      • 09:30
        BNL 20m
        Speaker: Alexander Withers (BNL)
        Slides
        Video
      • 09:50
        CERN 20m
        Speaker: Helge Meinhard (CERN)
        Slides
      • 10:10
        Fermilab 20m
        Speaker: Lisa Giacchetti (Fermilab)
        Slides
      • 10:30
        Coffee break 30m
      • 11:00
        GridKa 20m
        Speaker: Manfred Alef (GridKa)
        Slides
      • 11:20
        JLab 20m
        Speaker: Sandy Philpott (JLab)
        Slides
      • 11:40
        TRIUMF 20m
        Speaker: Corrie Kost (TRIUMF)
        Slides
    • 12:00 13:20
      Lunch 1h 20m
    • 13:30 17:00
      Site Reports II
      • 13:30
        NERSC Site Report 20m
        Speaker: Cary Whitney (NERSC)
        Slides
      • 13:50
        NIKHEF Site Report 20m
        Speaker: Paul Kuipers (NIKHEF)
        Slides
      • 14:10
        CCLRC-RAL Site Report 20m
        Speaker: Martin Bly (CCLRC-RAL)
        Slides
      • 14:30
        INFN Site Report 20m
        Speaker: Roberto Gomezel (INFN)
        Slides
      • 14:50
        Coffee break 30m
      • 15:20
        GSI/Darmstadt Site Report 20m
        Speaker: Walter Schoen (GSI/Darmstadt)
      • 15:40
        DAPNIA Site Report 20m
        Speaker: Pierrick Micout (CEA DAPNIA Saclay)
        Slides
      • 16:00
        SLAC Site Report 20m
        Speaker: Chuck Boeheim (SLAC)
        Slides
      • 16:20
        INFN-CNAF Site Report 20m
        Speaker: Andrea Chierici (INFN-CNAF)
        Slides
      • 16:40
        LAL Site Report 20m
        Speaker: Mr Michel Jouvin (LAL / IN2P3)
        Slides
    • 08:30 09:00
      Continental Breakfast 30m
    • 09:00 18:00
      Core Services and Infrastructure
      • 09:00
        Experiences with SpamCop 30m
        SpamCop is a popular Internet tool for reporting spammers to their ISPs. Several HEP sites have signed up with SpamCop as a method of detecting spam. Unfortunately, because of the way Fermilab processes bounced spam email, it can appear to SpamCop that Fermilab is an initiator of spam. This has occurred several times in the past year. To resolve the last incident, we asked several sites to add Fermilab to their servers' whitelists. In addition, we adjusted our mail gateways to eliminate bounced spam messages whenever possible. We are also looking into improving our spam filtering systems to minimize any spam that might get through and subsequently be forwarded to another site. We would like to discuss assembling and sharing a list of HEP sites and their email servers so that we do not inadvertently block each other's email (a sketch of such a shared whitelist follows this entry).
        Speaker: Jim Fromm (Fermilab)
        Slides
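        A minimal, hypothetical sketch of the shared whitelist idea mentioned above (addresses and names are placeholders, not Fermilab's production setup):

        import ipaddress

        # In practice this list would be maintained jointly by the participating
        # HEP sites; the entries below are documentation addresses, not real relays.
        HEP_RELAYS = [
            ipaddress.ip_network("192.0.2.0/24"),     # example site A
            ipaddress.ip_network("198.51.100.0/24"),  # example site B
        ]

        def is_hep_relay(sender_ip):
            """True if the connecting host belongs to a whitelisted HEP relay."""
            addr = ipaddress.ip_address(sender_ip)
            return any(addr in net for net in HEP_RELAYS)

        def handle_bounce(sender_ip):
            """Decide whether a bounced message may be reported as spam."""
            if is_hep_relay(sender_ip):
                return "deliver normally, never report"
            return "run spam filters and report if spam"

        if __name__ == "__main__":
            for ip in ("192.0.2.17", "203.0.113.5"):
                print(ip, "->", handle_bounce(ip))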
      • 09:30
        What is Trac? 30m
        This talk will present Trac, a unique open source tool combining a wiki, an issue tracker, a Subversion client and a roadmap manager. More than a tool, Trac is an extensible framework based on plugins. LAL is currently using this tool both for software development and system administration.
        Speaker: Michel Jouvin (LAL / IN2P3)
        Slides
      • 10:00
        Coffee break 30m
      • 10:30
        Scientific Linux Update 30m
        In this talk, we will present the status of Scientific Linux, focusing on relevant changes in the past six months. Next, we will also present current projects with SL, focusing on SL 5.x and scientific applications. To conclude, we will talk about future enhancements.
        Speaker: Jim Fromm (Fermilab)
        Slides
      • 11:00
        TWiki at CERN 30m
        The Database and Engineering Services (DES) Group of the IT Department at CERN supports and maintains a CERN TWiki. This presentation will cover the history of TWiki at CERN, facts about the system, the technical setup, problems we face and our plans for finding a solution to them.
        Speaker: Hege Hansbakk (CERN)
        Slides
      • 11:30
        Service Level Status - A Real-time Status Display for IT 30m
        Nowadays IT departments provide, and people use, many computing services of an increasingly heterogeneous nature, and there is a growing need for a common display that groups these services and reports their status and availability in a uniform way. At CERN, this need led to the launch of the SLS project. Service Level Status Overview (SLS) is a web-based tool that dynamically shows the availability, basic information and statistics of various IT services, as well as the dependencies between them. The presentation starts with a short description of the project, its goals, architecture and users. The concepts of subservices, metaservices, dependencies and service availability are then introduced (see the sketch after this entry), followed by a demonstration of the system and an explanation of how to add a service to SLS. The talk ends with information on how SLS could be used by other HEP institutes.
        Speaker: Sebastian Lopienski (CERN)
        Slides
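        The exact SLS data model is not described above; the following hypothetical Python sketch only illustrates the idea of deriving a metaservice's availability from its subservices:

        # Availability is a percentage (0-100); the service names are invented examples.
        subservices = {
            "web-frontend": 100,
            "database": 80,
            "batch-queue": 95,
        }

        # A metaservice lists the subservices it depends on.
        metaservices = {
            "analysis-service": ["web-frontend", "database", "batch-queue"],
        }

        def availability(name):
            """A metaservice is only as available as its weakest dependency."""
            if name in subservices:
                return subservices[name]
            return min(availability(dep) for dep in metaservices[name])

        for meta in metaservices:
            print(f"{meta}: {availability(meta)}% available")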
      • 12:00
        Lunch 1h 30m
      • 13:30
        Managing system history and problem tracking with SVN/Trac 30m
        This talk will present LAL's experience in addressing the need to track system configuration changes and to link those changes to an issue tracker, using a combination of Subversion and Trac (a minimal sketch follows this entry).
        Speaker: Michel Jouvin (LAL / IN2P3)
        Slides
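        A minimal sketch of the approach, assuming the configuration directory is already a Subversion working copy and that Trac's usual "#NNN" ticket syntax is used in commit messages (the paths and ticket number are placeholders, not LAL's actual tooling):

        import subprocess

        CONFIG_DIR = "/etc"   # assumed to be an SVN working copy
        TICKET = 123          # placeholder Trac ticket number

        def svn(*args):
            """Run an svn subcommand inside the tracked configuration directory."""
            result = subprocess.run(["svn", *args], cwd=CONFIG_DIR,
                                    check=True, capture_output=True, text=True)
            return result.stdout

        if __name__ == "__main__":
            changes = svn("status")
            if changes.strip():
                # "refs #123" is the conventional Trac syntax for linking a ticket
                svn("commit", "-m", f"Record configuration change (refs #{TICKET})")
                print("Committed:\n" + changes)
            else:
                print("No configuration changes to record.")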
      • 14:00
        Using RT to Manage Installation Workflow 30m
        We needed to manage the workflow of installation tasks more formally, because so many were happening simultaneously that confusion was resulting. We modeled the workflow using RT, the request-tracking system that we use for user requests. The result is a relatively lightweight and flexible system that gives planners a "dashboard" of the status of all active projects, together with the information they need to execute each task.
        Speaker: Chuck Boeheim (SLAC)
        Slides
      • 14:30
        High Availability Methods at GSI 30m
        This presentation gives an overview of the methods used to ensure the high availability of important services such as databases, web services and the central file server, among others. Apart from commercial products for certain systems (Oracle, Exchange), different open source Linux tools (heartbeat, drbd, mon) are combined with monitoring and hardware measures and adapted to our particular needs.
        Speaker: Karin Miers (GSI/Darmstadt)
        Slides
      • 15:00
        Coffee break 30m
      • 15:30
        Using Quattor to manage a grid (EGEE) Fabric 30m
        Deploying grid services means managing a potentially large number of machines that partially share their configuration. A tool is needed not only to install but also to maintain such a configuration. Quattor, developed as part of EDG, is such a tool. This talk will focus on LCG/gLite support in Quattor.
        Speaker: Michel Jouvin (LAL / IN2P3)
        Slides
      • 16:00
        Spam - Statistics and Fighting Methods 30m
        Speaker: Walter Schoen (GSI/Darmstadt)
      • 16:30
        Scientific Linux Inventory Project (SLIP) 30m
        This talk will discuss the effort to provide an inventory of all Linux machines at Fermilab. We will describe the motivation for the project, the package we selected, and the current state of the project.
        Speaker: Jim Fromm (Fermilab)
        Slides
    • 18:00 20:00
      Networking Dinner at Newport News City Center Marriott
      • 18:00
        From a Spark in Vacuum to Sparking the Vacuum 30m
        Speaker: Fred Dylla (JLab)
    • 08:30 09:00
      Continental Breakfast 30m
    • 09:00 15:00
      Compute Clusters/Storage
      • 09:00
        RACF's PXE Installation Management System 30m
        The BNL RHIC/ATLAS Computing Facility (RACF) Central Analysis/Reconstruction Server (CAS/CRS) farm is a large-scale computing cluster currently consisting of ~2000 multiprocessor hosts running Scientific Linux. Besides providing computation, the CAS/CRS systems' local disk drives are used by network-distributed data systems such as dCache, ROOTD and XROOTD to store considerable amounts of data (presently ~400 TB). The sheer number of systems in the farm, combined with our distributed storage model, complicates network installation management. This presentation describes the system developed at RACF to fully automate and simplify management of the PXE installation process (an illustrative sketch of per-host PXE configuration follows this entry).
        Speaker: Christopher Hollowell (BNL)
        Slides
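        An illustrative sketch of per-host PXE configuration (not the RACF system itself; the TFTP layout, kernel paths and kickstart URL are assumptions). pxelinux looks up a configuration file named after the host's MAC address, prefixed with "01-" and with colons replaced by dashes:

        import os

        TFTP_CFG_DIR = "/tftpboot/pxelinux.cfg"   # assumed TFTP layout

        TEMPLATE = """default install
        label install
          kernel sl/vmlinuz
          append initrd=sl/initrd.img ks={ks_url} ksdevice=eth0
        """

        def pxe_filename(mac):
            """Convert a MAC address to the file name pxelinux searches for."""
            return "01-" + mac.lower().replace(":", "-")

        def schedule_install(mac, ks_url):
            """Write the PXE entry that triggers an automated install at next boot."""
            path = os.path.join(TFTP_CFG_DIR, pxe_filename(mac))
            with open(path, "w") as f:
                f.write(TEMPLATE.format(ks_url=ks_url))
            return path

        if __name__ == "__main__":
            # placeholder MAC address and kickstart URL
            print(schedule_install("00:16:3e:00:00:01",
                                   "http://install.example.org/ks/cas.cfg"))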
      • 09:30
        Support of Kerberos 5 Authenticated Environment by TORQUE 30m
        TORQUE is a successor of the OpenPBS batch queuing system, available as an open source product. Despite the widespread use of TORQUE as a job management system on computational farms and LHC grid installations, this batch system does not support any advanced authentication mechanisms. We show two ways in which the existing source code can be redesigned to add Kerberos 5 authentication support for batch jobs. The first uses local server-client RPC connections, while the second makes use of the Authenticated Remote Control tool (ARCv2). The described modifications have been successfully deployed in the local computing infrastructure of the H1 Collaboration at DESY, providing an identical environment for batch jobs and user desktop processes.
        Speaker: Bogdan Lobodzinski (DESY)
        Slides
      • 10:00
        Coffee break 30m
      • 10:30
        Planning for Hall D: The Hazards of Fast Tape Drives 30m
        The upgrade to Jefferson Lab will require a hardware refresh of the mass storage system in order to handle the higher volume of data from new experiments and simulations. The next generation of higher-capacity tape drives is also significantly faster, a fact that has implications for almost all parts of the mass storage system. This talk examines the performance tuning required to make efficient use of these drives and underscores some of the particular needs of the tape-based storage systems used by most experiments (a back-of-the-envelope illustration follows this entry).
        Speaker: Bryan Hess (Jefferson Lab)
        Slides
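        A back-of-the-envelope illustration of the tuning problem (the numbers are assumptions for the sake of the example, not figures from the talk):

        # A tape drive only performs well when it can stream; the buffer feeding it
        # must sustain the drive's native rate and absorb upstream stalls.
        drive_rate_mb_s = 120      # assumed native rate of a newer-generation drive
        network_rate_mb_s = 80     # assumed sustained rate from the disk pool
        stall_seconds = 60         # longest upstream stall we want to ride out

        # If the feed is slower than the drive, the drive spends part of its time
        # repositioning ("shoe-shining") unless writes are buffered.
        duty_cycle = min(1.0, network_rate_mb_s / drive_rate_mb_s)
        print(f"Unbuffered drive utilisation: {duty_cycle:.0%}")

        # Buffer needed to keep the drive streaming through a stall of that length.
        buffer_mb = drive_rate_mb_s * stall_seconds
        print(f"Buffer to cover a {stall_seconds}s stall: {buffer_mb:.0f} MB")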
      • 11:00
        Porting to and Running Applications on 64 Bit Platforms 30m
        The author describes his recent experience porting software packages to and running these packages on 64 bit machines with Solaris and Linux. Issues discussed include code modification, compiling, operating system requirements, and performance comparisons with 32 bit machines.
        Speaker: Carl Timmer (Jefferson Lab)
        Slides
      • 11:30
        NGF NERSC's Global Filesystem and PDSF 30m
        I would like to explain a bit about our global filesystem and its use on PDSF, and about how this filesystem can be extended to other sites and labs. Our filesystem is GPFS, but the concept can also be extended to Lustre or other cluster filesystems.
        Speaker: Tom Langley (NERSC)
        Slides
      • 12:00
        Lunch 1h 30m
      • 13:30
        Storage Classes: Issues and Implementation at CCIN2P3 30m
        Storage classes attempt to represent the storage use cases of a given experiment. Mapping storage classes directly onto a real-life storage system is considered harmful, especially if the latter relies on a file's path to determine its storage configuration. This presentation aims to define the problem posed by storage classes, explain one possible solution implemented at CCIN2P3, and discuss the pros and cons.
        Speaker: Jonathan Schaeffer (CC-IN2P3)
        Slides
      • 14:00
        Benchmark Updates 30m
        This talk will present the current state of the art of benchmarking at CERN. We will explain our benchmarking procedures, review our latest results and discuss where we are going from here. As part of the results review, we will comment on current CPU trends and on the increasingly important issue of power consumption.
        Speaker: Helge Meinhard (CERN)
        Slides
      • 14:30
        Recent Fabric Management Improvements at CERN 30m
        This talk will describe some improvements to the monitoring and management of the storage and CPU services in the following areas: the use of SMART for disk monitoring; the integration of disk server monitoring and storage system management; and the transmission of Grid job memory requirements to the local workload management system (a minimal SMART polling sketch follows this entry).
        Speaker: Tony Cass (CERN)
        Slides
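        A minimal sketch of SMART-based disk health polling (not CERN's actual monitoring integration; the device names are placeholders and smartmontools is assumed to be installed):

        import subprocess

        DEVICES = ["/dev/sda", "/dev/sdb"]   # placeholder device list

        def smart_healthy(device):
            """Return True if 'smartctl -H' reports that the health test passed."""
            result = subprocess.run(["smartctl", "-H", device],
                                    capture_output=True, text=True)
            return "PASSED" in result.stdout

        if __name__ == "__main__":
            for dev in DEVICES:
                print(dev, "->", "OK" if smart_healthy(dev) else "needs attention")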
    • 15:00 17:15
      Cyber Security/Authentication
      • 15:00
        Coffee break 30m
      • 15:30
        The Stakkato Intrusions 45m
        For 15 months, from late 2003 until early 2005, hundreds of supercomputing sites, universities and companies worldwide were hit by a series of intrusions, with the perpetrator leapfrogging from site to site using stolen ssh passwords. These are collectively known as the Stakkato intrusions, and include the TeraGrid incident and the Cisco IOS source code theft, both of which received widespread attention from the media. This talk will cover case studies of the intrusions, an analysis of why Stakkato could be so successful, and the story of how the suspect was finally tracked down and caught.
        Speaker: Leif Nixon
        Slides
      • 16:15
        Network Security Monitoring with Sguil 30m
        Most mid- or large-sized organizations conduct some sort of network monitoring for security purposes. Traditional Intrusion Detection Systems (IDS) tell only part of the story, leaving analysts to perform complex and time-consuming data-mining operations from multiple sources just to answer simple questions about IDS alerts. This talk presents a more efficient model that uses the open source Sguil software to optimize the process for analyst time and efficiency.
        Speaker: David Bianco (JLab)
        Slides
    • 08:00 08:30
      Continental Breakfast 30m
    • 09:00 16:30
      Grid Projects
      • 09:00
        GridX1: A Canadian Computational grid for HEP Applications 30m
        GridX1 is a Canadian computational grid which combines the shared resources of several Canadian research institutes for the primary purpose of executing HEP applications. With more than two years of production experience, GridX1 has demonstrated the successful application of Globus Toolkit (GT) v.2 cluster gatekeepers managed by a Condor-G resource brokering system. A novel feature of the project was a resource brokering interface to the LHC Compute Grid, which was used during Data Challenge 2 to route ATLAS jobs to the Canadian resources without having dedicated Compute Elements at each cluster. Further, independent Condor-G resource brokers have been implemented to manage the Canadian ATLAS and BaBar MC production systems. Finally, our recent efforts have been directed toward building a service-oriented grid using GT4, including a WS-MDS registry service and WS-GRAM enabled metaschedulers built upon Condor and GridWay.
        Speaker: I. Gable (University of Victoria/HEPnet Canada)
        Slides
      • 09:30
        GridPP 30m
        GridPP is a UK e-Science project which started in 2001 with the aim of developing and operating a production Grid for UK particle physicists. It is aligned with the EGEE infrastructure and the WLCG project, but also works with currently running experiments and theorists. GridPP aims to provide an environment in which all UK particle physicists can do their analysis, share data, and so on, and in which the UK can contribute to the worldwide collaborations and activities of their experiments.
        Speaker: John Gordon (CCLRC-RAL)
        Slides
      • 10:00
        Coffee break 30m
      • 10:30
        The EGEE Grid Infrastructure 30m
        The EGEE grid infrastructure is in constant production use with significant workloads, not only for High Energy Physics but for many other scientific applications. An overview of the EGEE project, the infrastructure itself, and how it is being used will be given. Several applications rely on a long term infrastructure being in place; the current ideas of how this may be achieved will be discussed.
        Speaker: Ian Bird (CERN)
        Slides
      • 11:00
        Virtual Machines in a Distributed Environment 1h
        Speaker: Mauricio Tsugawa (University of Florida)
        Slides
      • 12:00
        Lunch 1h 30m
      • 13:30
        Issues and problems around Grid site management 1h
        The problems of grid site reliability and availability are becoming the biggest outstanding issues in building a reliable grid service. This is particularly important for WLCG, where specific reliability targets are set. This talk will outline the scope of the problems that need to be addressed, point out potential areas where HEPiX members can contribute, and seek input on how we can address some of the problems.
        Speaker: Dr Ian Bird (CERN)
        Slides
      • 14:30
        FermiGrid - Status and Plans 30m
        FermiGrid is the Fermilab Campus Grid. This talk will discuss the current state of FermiGrid and plans for the upcoming year.
        Speaker: Keith Chadwick (Fermilab)
        Slides
      • 15:00
        Coffee break 30m
      • 15:30
        Open Science Grid Progress and Vision 30m
        This talk will detail recent Open Science Grid progress and outline the vision for the upcoming year.
        Speaker: Keith Chadwick (Fermilab)
        Slides
      • 16:00
        Grid Security in WLCG and EGEE 30m
        This talk will present the current status, plans and issues for Grid Security in WLCG and EGEE. This will include Authentication, Authorization, Policy and Operational Security.
        Speaker: David Kelsey (CCLRC/RAL)
        Slides
    • 09:00 09:30
      Continental Breakfast 30m
    • 09:30 10:00
      Grid Projects II
      • 09:30
        Testing the UK Tier 2 Data Transfer and Storage Infrastructure 30m
        When the LHC experiments start taking data next year, the Tier 2 sites in the UK (and elsewhere) will need to be able to receive and transmit data at unprecedented rates and reliabilities. We present the efforts in the UK to test disk-to-disk transfer rates between Tier 2 sites, along with some of the lessons learnt and results obtained.
        Speaker: Chris Brew (CCLRC - RAL)
        Slides
    • 10:00 10:30
      IHEPCCC 30m
      Speaker: Randy Sobie
      Slides
    • 10:30 11:00
      Closing Comments