HEPiX 2006

from Monday, October 9, 2006 to Friday, October 13, 2006 (US/Central)
at Jefferson Lab
The HEPiX forum brings together IT system support engineers from the High Energy Physics (HEP) laboratories and institutes, such as BNL, CERN, DESY, FNAL, IN2P3, INFN, JLAB, NIKHEF, RAL, SLAC, TRIUMF and others. The HEPiX meetings have been held regularly since 1991 and are an excellent source of information for IT specialists, which is why they also attract many participants from non-HEP organizations.
  • Monday, October 9, 2006
    • 08:00 - 09:00 Registration/Continental Breakfast
    • 09:00 - 09:30 Welcome
      • 09:00 Welcome to JLab 20'
        Speaker: Roy Whitney (Jefferson Lab)
    • 09:30 - 12:00 Site Reports
      • 09:30 BNL 20'
        Speaker: Alexander Withers (BNL)
        Material: Slides powerpoint file Video link
      • 09:50 CERN 20'
        Speaker: Helge Meinhard (CERN)
        Material: Slides powerpoint file
      • 10:10 Fermilab 20'
        Speaker: Lisa Giacchetti (Fermilab)
        Material: Slides powerpoint file
      • 10:30 Coffee break 30'
      • 11:00 GridKa 20'
        Speaker: Manfred Alef (GridKa)
        Material: Slides pdf file
      • 11:20 Jlab 20'
        Speaker: Sandy Philpott (Jlab)
        Material: Slides powerpoint file
      • 11:40 TRIUMF 20'
        Speaker: Corrie Kost (Triumf)
        Material: Slides powerpoint file
    • 12:00 - 13:20 Lunch
    • 13:30 - 17:00 Site Reports II
      • 13:30 NERSC Site Report 20'
        Speaker: Cary Whitney (NERSC)
        Material: Slides powerpoint file
      • 13:50 NIKHEF Site Report 20'
        Speaker: Paul Kuipers (NIKHEF)
        Material: Slides powerpoint file
      • 14:10 CCLRC-RAL Site Report 20'
        Speaker: Martin Bly (CCLRC-RAL)
        Material: Slides powerpoint file
      • 14:30 INFN Site Report 20'
        Speaker: Roberto Gomezel (INFN)
        Material: Slides powerpoint file
      • 14:50 Coffee break 30'
      • 15:20 GSI/Darmstadt Site Report 20'
        Speaker: Walter Schoen (GSI/Darmstadt)
      • 15:40 DAPNIA Site Report 20'
        Speaker: Pierrick Micout (CEA DAPNIA Saclay)
        Material: Slides powerpoint file
      • 16:00 SLAC Site Report 20'
        Speaker: Chuck Boeheim (SLAC)
        Material: Slides pdf file
      • 16:20 INFN-CNAF Site Report 20'
        Speaker: Andrea Chierici (INFN-CNAF)
        Material: Slides powerpoint file
      • 16:40 LAL Site Report 20'
        Speaker: Mr. Michel Jouvin (LAL / IN2P3)
        Material: Slides powerpoint file
  • Tuesday, October 10, 2006
    • 08:30 - 09:00 Continental Breakfast
    • 09:00 - 18:00 Core Services and Infrastructure
      • 09:00 Experiences with SpamCop 30'
        SpamCop is a popular tool on the internet for reporting
        "spammers" to their ISPs. Several HEP sites have signed up
        for SpamCop as a method of detecting spam. Unfortunately,
        because of the way Fermilab processes bounced spam email, it
        can appear to SpamCop that Fermilab is an initiator of spam.
        This has occurred several times in the past year. To resolve
        the last incident, we requested that several sites add
        Fermilab to their servers' whitelists. In addition, we
        adjusted our mail gateways to eliminate bounced spam
        messages whenever possible. We are also looking into
        improving our spam filtering systems to minimize any spam
        that might get through and subsequently be forwarded to
        another site. We would like to discuss compiling a list of
        other HEP sites and their email servers, and sharing this
        list so that we don't inadvertently block each other's email
        transmissions.
        Speaker: Jim Fromm (Fermilab)
        Material: Slides powerpoint file
      • 09:30 What is TRAC? 30'
        This talk will present Trac, a unique open source tool
        combining a wiki, an issue tracker, a Subversion client and
        a roadmap manager. More than a tool, Trac is an extensible
        framework based on plugins. LAL is currently using this tool
        both for software development and system administration.
        Speaker: Michel Jouvin (LAL / IN2P3)
        Material: Slides powerpoint file
      • 10:00 Coffee break 30'
      • 10:30 Scientific Linux Update 30'
        In this talk, we will present the status of Scientific
        Linux, focusing on relevant changes in the past six months.
        Next, we will also present current projects with SL,
        focusing on SL 5.x and scientific applications. To conclude,
        we will talk about future enhancements.
        Speaker: Jim Fromm (Fermilab)
        Material: Slides powerpoint file
      • 11:00 TWiki at CERN 30'
        The Database and Engineering Services (DES) Group of the IT
        Department at CERN supports and maintains a CERN TWiki.
        This presentation will cover the history of TWiki at CERN,
        facts about the system, the technical setup, problems we
        face and our plans for finding a solution to them.
        Speaker: Hege Hansbakk (CERN)
        Material: Slides powerpoint file
      • 11:30 Service Level Status - A Real-time status Display for IT 30'
        IT departments now provide, and people use, a wide variety
        of increasingly heterogeneous computing services. There is
        a growing need for a common display that groups these
        different services and reports their status and
        availability in a uniform way. At CERN, this led to the
        launch of the SLS project.
        Service Level Status Overview (SLS) is a web-based tool that
        dynamically shows availability, basic information and
        statistics about various IT services, as well as
        dependencies between them.
        The presentation starts with a short description of the
        project, its goals, architecture, and users. Then, the
        concepts of subservices, metaservices, dependencies, service
        availability etc. are introduced, followed by a
        demonstration of the system and an explanation of how to add
        a service to SLS. The talk ends with information on how
        SLS could be used by other HEP institutes.
        Speaker: Sebastian Lopienski (CERN)
        Material: Slides powerpoint file
      • 12:00 Lunch 1h30'
      • 13:30 Managing system history and problem tracking with SVN/Trac 30'
        This talk will present LAL's experience in tracking system
        configuration changes and linking them to an issue
        tracker, using a combination of Subversion and Trac.
        Speaker: Michel Jouvin (LAL / IN2P3)
        Material: Slides powerpoint file
      • 14:00 Using RT to Manage Installation Workflow 30'
        We needed to manage the workflow of installation tasks more
        formally, because so many were happening simultaneously
        that confusion was resulting.  We
        modeled the workflow using RT, the Request Tracking system
        that we use for user requests.  The result is a relatively
        lightweight and flexible system that gives planners a
        "dashboard" of the status of all active projects, and the
        information they need to execute the task.
        Speaker: Chuck Boeheim (SLAC)
        Material: Slides pdf file
      • 14:30 High Availability Methods at GSI 30'
        This presentation gives an overview of the methods used
        to ensure the high availability of important services such
        as databases, web services, central file servers, and
        others. Apart from commercial products for certain systems
        (Oracle, Exchange), different open source Linux tools
        (heartbeat, drbd, mon) are combined with monitoring and
        hardware methods and adapted to our special needs.
        Speaker: Karin Miers (GSI/Darmstadt)
        Material: Slides pdf file
      • 15:00 Coffee break 30'
      • 15:30 Using Quattor to manage a grid (EGEE) Fabric 30'
        Deploying grid services means managing a potentially large
        number of machines that partially share their configuration.
        A tool is needed not only to install but to maintain such a
        configuration. Quattor, developed as part of EDG, is such a
        tool. This talk will focus on the LCG/gLite support in Quattor.
        Speaker: Michel Jouvin (LAL / IN2P3)
        Material: Slides powerpoint file
      • 16:00 Spam - Statistics and Fighting Methods 30'
        Speaker: Walter Schoen (GSI/Darmstadt)
      • 16:30 Scientific Linux Inventory Project (SLIP) 30'
        This talk will discuss the effort to provide an inventory of
        all Linux machines at Fermilab.  We will describe the
        motivation for the project, the package we selected, and the
        current state of the project.
        Speaker: Jim Fromm (Fermilab)
        Material: Slides powerpoint file
    • 18:00 - 20:00 Networking Dinner at Newport News City Center Marriott
      • 18:00 From a Spark in Vacuum to Sparking the Vacuum 30'
        Speaker: Fred Dylla (Jlab)
  • Wednesday, October 11, 2006
    • 08:30 - 09:00 Continental Breakfast
    • 09:00 - 15:00 Compute Clusters/Storage
      • 09:00 RACF's PXE Installation Management System 30'
        The BNL RHIC/ATLAS Computing Facility (RACF)  Central
        Analysis/Reconstruction Server (CAS/CRS) Farm is a large
        scale computing cluster currently consisting of ~2000
        hosts running Scientific Linux.  Besides providing for
        computation, the CAS/CRS systems' local disk drives are used
        by network distributed data systems such as dCache, ROOTD
        and XROOTD to store considerable amounts of data (presently
        ~400 TB).  The sheer number of systems in the farm, combined
        with our distributed storage model, complicates network
        installation.
        This presentation describes the system developed at RACF to
        fully automate and simplify management of the PXE
        installation process.
        Speaker: Christopher Hollowell (BNL)
        Material: Slides pdf file
      • 09:30 Support of Kerberos 5 Authenticated Environment by TORQUE 30'
        TORQUE is a successor of the OpenPBS batch queuing system,
        available as an open source product. Despite the widespread
        use of TORQUE as a job management system on computational
        farms and LHC grid installations, this batch system does not
        support any advanced authentication mechanisms.
        We show two ways to redesign the existing source code in
        order to add Kerberos 5 authentication support for batch
        jobs. The first uses local server-client RPC connections,
        while the second makes use of the Authenticated Remote
        Control tool (ARCv2).
        The described modifications have been successfully deployed
        in the local computing infrastructure of the H1
        Collaboration at DESY. This provides an identical
        environment for batch jobs and user desktop processes.
        Speaker: Bogdan Lobodzinski (DESY)
        Material: Slides pdf file
      • 10:00 Coffee break 30'
      • 10:30 Planning for Hall D: The Hazards of Fast Tape Drives 30'
        The upgrade to Jefferson Lab will require a hardware refresh
        of the mass storage system in order to handle the higher
        volume of data from new experiments and simulations. The
        next-generation, higher-capacity tape drives are also
        significantly faster, a fact that has implications for
        almost all parts of the mass storage system. This talk
        examines the performance tuning required to make efficient
        use of these drives and underscores some of the particular
        needs of tape-based storage systems used by most experiments.
        Speaker: Bryan Hess (Jefferson Lab)
        Material: Slides powerpoint file
      • 11:00 Porting to and Running Applications on 64 Bit Platforms 30'
        The author describes his recent experience porting software
        packages to and running these packages on 64 bit machines
        with Solaris and Linux. Issues discussed include code
        modification, compiling, operating system requirements, and
        performance comparisons with 32 bit machines.
        Speaker: Carl Timmer (Jefferson Lab)
        Material: Slides powerpoint file
      • 11:30 NGF NERSC's Global Filesystem and PDSF 30'
        I would like to explain a bit about our global filesystem
        and its use on PDSF, and about how this filesystem can be
        extended to other sites/labs.  Our filesystem is GPFS, but
        the concept can also be extended to Lustre or other cluster
        Speaker: Tom Langley (NERSC)
        Material: Slides powerpoint file
      • 12:00 Lunch 1h30'
      • 13:30 Storage Class : Problematic and Implementation at CCIN2P3 30'
        Storage Classes attempt to represent storage use cases for a
        given experiment. It is considered harmful to match the
        storage classes to a real-life storage system, especially if
        the latter relies on the path to get the storage
        configuration of a file.
        This presentation aims to define the problem of Storage
        Classes, explain one possible solution which is implemented
        at CCIN2P3, and discuss the pros and cons.
        Speaker: Jonathan Schaeffer (CC-IN2P3)
        Material: Slides pdf file
      • 14:00 Benchmark Updates 30'
        This talk will present the current state of the art of
        benchmarking at CERN. We will explain our benchmarking
        procedures, review our latest results and talk about where
        we are going from here. As part of the results review, we
        will comment on the current CPU trends and on the
        increasingly important issue of power consumption.
        Speaker: Helge Meinhard (CERN)
        Material: Slides powerpoint file
      • 14:30 Recent Fabric Management Improvements at CERN 30'
        This talk will describe some improvements to the monitoring
        and management of the storage and CPU services in the
        following areas:
        - use of SMART for disk monitoring
        - integration of disk server monitoring and storage system
        - transmission of Grid job memory requirements to the local
        workload management
        Speaker: Tony Cass (CERN)
        Material: Slides powerpoint file
    • 15:00 - 17:15 Cyber Security/Authentication
      • 15:00 Coffee break 30'
      • 15:30 The Stakkato Intrusions 45'
        Over 15 months, from late 2003 until early 2005,
        hundreds of supercomputing sites, universities and
        companies worldwide were hit by a series of intrusions,
        with the perpetrator leapfrogging from site to site using
        stolen ssh passwords. These are collectively known as
        the Stakkato intrusions, and include the TeraGrid
        Incident and the Cisco IOS source code theft, both of
        which received widespread attention from the media.
        This talk will cover case studies of the intrusions,
        an analysis of why Stakkato could be so successful, and
        the story of how the suspect was finally tracked down
        and caught.
        Speaker: Leif Nixon
        Material: Slides pdf file
      • 16:15 Network Security Monitoring with Sguil 30'
        Most mid- or large-sized organizations conduct some sort of
        network monitoring for security purposes.  Traditional
        Intrusion Detection Systems (IDS) tell only part of the
        story, leaving analysts to perform complex and
        time-consuming data-mining operations from multiple sources
        just to answer simple questions about IDS alerts.  This talk
        presents a more efficient model that uses the open source
        Sguil software to optimize the process for analyst time and
        Speaker: David Bianco (Jlab)
        Material: Slides pdf file
  • Thursday, October 12, 2006
    • 08:00 - 08:30 Continental Breakfast
    • 09:00 - 16:30 Grid Projects
      • 09:00 GridX1: A Canadian Computational grid for HEP Applications 30'
        GridX1 is a Canadian computational grid which combines the
        shared resources of several Canadian research institutes for
        the primary purpose of executing HEP applications. With more
        than two years of production experience, GridX1 has
        demonstrated the successful application of Globus Toolkit
        (GT) v.2 cluster gatekeepers managed by a Condor-G resource
        brokering system. A novel feature of the project was a
        resource brokering interface to the LHC Compute Grid, which
        was used during Data Challenge 2 to route ATLAS jobs to the
        Canadian resources without having dedicated Compute Elements
        at each cluster. Further, independent Condor-G resource
        brokers have been implemented to manage the Canadian ATLAS
        and BaBar MC production systems. Finally, our recent efforts
        have been directed toward building a service-oriented grid
        using GT4, including a WS-MDS registry service and WS-GRAM
        enabled metaschedulers built upon Condor and GridWay.
        Speaker: I. Gable (University of Victoria/HEPnet Canada)
        Material: Slides powerpoint file
      • 09:30 GridPP 30'
        GridPP is a UK e-Science project which started in 2001 with
        the aim of developing and operating a production Grid for UK
        Particle Physicists. It is aligned with the EGEE
        infrastructure and the WLCG Project but also works with
        currently running experiments and theorists. GridPP aims to
        provide an environment in which all UK particle physicists
        can do their analysis, share data, etc., and in which the UK
        can also contribute to the worldwide collaboration and
        activities of their experiments.
        Speaker: John Gordon (CCLRC-RAL)
        Material: Slides powerpoint file
      • 10:00 Coffee break 30'
      • 10:30 The EGEE Grid Infrastructure 30'
        The EGEE grid infrastructure is in constant production use
        with significant workloads, not only for High Energy Physics
        but for many other scientific applications.  An overview of
        the EGEE project, the infrastructure itself, and how it is
        being used will be given.  Several applications rely on a
        long term infrastructure being in place; the current ideas
        of how this may be achieved will be discussed.
        Speaker: Ian Bird (CERN)
        Material: Slides powerpoint file
      • 11:00 Virtual Machines in a Distributed Environment 1h0'
        Speaker: Mauricio Tsugawa (University of Florida)
        Material: Slides pdf file
      • 12:00 Lunch 1h30'
      • 13:30 Issues and problems around Grid site management 1h0'
        The problems of grid site reliability and availability are
        becoming the biggest outstanding issue in building a
        reliable grid service.  This is particularly important for
        WLCG where specific reliability targets are set.  This talk
        will outline the scope of the problems that need to be
        addressed, and point out potential areas where HEPiX members
        can contribute, and will seek input on how we can address
        some of the problems.
        Speaker: Dr. Ian Bird (CERN)
        Material: Slides powerpoint file
      • 14:30 FermiGrid - Status and Plans 30'
        FermiGrid is the Fermilab Campus Grid.  This talk will
        discuss the current state of FermiGrid and plans for the
        upcoming year.
        Speaker: Keith Chadwick (Fermilab)
        Material: Slides powerpoint file
      • 15:00 Coffee break 30'
      • 15:30 Open Science Grid Progress and Vision 30'
        This talk will detail recent Open Science Grid progress and
        outline the vision for the upcoming year.
        Speaker: Keith Chadwick (Fermilab)
        Material: Slides powerpoint file
      • 16:00 Grid Security in WLCG and EGEE 30'
        This talk will present the current status, plans and issues
        for Grid Security in WLCG and EGEE. This will include
        Authentication, Authorization, Policy and Operational Security.
        Speaker: David Kelsey (CCLRC/RAL)
        Material: Slides powerpoint file
  • Friday, October 13, 2006
    • 09:00 - 09:30 Continental Breakfast
    • 09:30 - 10:00 Grid Projects II
      • 09:30 Testing the UK Tier 2 Data Transfer and Storage Infrastructure 30'
        When the LHC experiments start taking data next year, the
        Tier 2 sites in the UK (and elsewhere) will need to be able
        to receive and transmit data at unprecedented rates and
        reliabilities. We present the efforts in the UK to test the
        disk-to-disk transfer rates between Tier 2 sites, along with
        some of the lessons learnt and results obtained.
        Speaker: Chris Brew (CCLRC - RAL)
        Material: Slides powerpoint file
    • 10:00 - 10:30 IHEPCCC 30'
      Speaker: Randy Sobie
      Material: Slides pdf file
    • 10:30 - 11:00 Closing Comments