Weekly CCE-IOS tele-conference

US/Central
Peter van Gemmeren (ANL), Rob Ross (ANL)
Description
BlueJeans Link: https://bluejeans.com/102100194

Attending (as of 2:02): RobR, Chris Jones, Doug Benjamin, Ken Herner, Lisa Goodenough, Patrick Gartung, Peter VG, Saba Sehrish, Shane Snyder, Suren Byna, Tammy Walton, Torre Wenaus, Kyle Knoepfel, Salman Habib

Management News

Salman: Nothing much, keep going. Will follow up with institutional leads on funding.

FNAL New Folks

Q: What should they be working on?

FNAL: HDF

Ken Herner: DUNE representative

Kyle, Marc Paterno, Lisa, and Tammy met this morning and discussed HDF work.

Kyle: Muon g-2 is interested in using HDF5.

  Currently use ROOT. Lisa is interested in analysis support.

  Tammy is interested in framework/workflow aspects.

  Distinct use cases, both with HDF5.

PeterVG: Mostly been looking at HDF from the framework perspective.

  Single HDF file from multiple clients, for example.

  That said, the code Saba has been involved with could be useful in a variety of contexts, but it still relies on ROOT for serialization.

Kyle: Marc and Saba have been looking at HDF in the NOvA context.

  Boundary between reconstruction and analysis is a little more blurry than in some other cases.

PeterVG: Are we broadening beyond the use of HDF as _intermediate_ storage?

Kyle: Interested in understanding how analysis is different when HDF is a part of things.

PeterVG: Current work is rather generic, lots of possible directions.

FNAL: Darshan

Patrick: Got accounts settled, prepared to start doing some data gathering.

  CMS workflow with Darshan and various backend file systems.

Presumably these activities will all end up merging with the other groups working within the IOS effort.

Ken: Got it built; playing around with DUNE and LArSoft as they relate to Darshan.

  Running at NERSC.

  Fighting with LaTeX to get plots.

  In discussion with Shane also.

Chris: Constraints on I/O from HEP Data Processing (see slides)

Thoughts on how HEP data processing frameworks (especially CMS's) want to interact with the file system: constraints and opportunities.

Multi-core

- CMS uses threads

- ATLAS uses multiple processes but is adding multi-threading for Run 3

- Way to address CPU memory limits

- Issues with running out of memory, and with having to be overly conservative about memory availability.

- Lots of immutable stuff that can be shared across events.

- Also some mutable data (e.g., memory buffers for I/O) shared via synchronization.

  PeterVG: This is the bigger issue in the ATLAS multi-process (MP) case, as each process has its own I/O buffers.

Doug: How many threads are usually used?

Chris: 8 threads is common. Have run up to 32 threads on some other systems.

- Also applying multiple threads to a single event, but haven't yet figured out how to make use of more than ~1.5 threads per event.

Intervals of Validity (IoVs)

- Calibration data, etc. (the immutable stuff) is only valid for particular spans of time (events).

- Want to bound the number of IoVs that are "open" at any time.

- This constrains which events can be worked on at the same time.

IoV Example

- "C1" is an IoV

- Time moves left-to-right in the figure.

Rob: How many IoVs can you keep in memory at a time?

Chris: Typically hundreds of these things, but in practice they work on maybe two of these groups at a time. Thousands of events fall within a single group (i.e., between IoV boundaries).

- They don't actually change that frequently. Good to be aware of this but not a huge deal at this time.
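A deliberately simple sketch of how a cap on open IoVs constrains which events can be worked on together, assuming events arrive in time order and each is tagged with the ID of its IoV (like "C1" in the example). The function `group_by_iov_limit` and its `max_open` parameter are hypothetical illustrations, not part of any framework; a real scheduler would release IoVs in a sliding fashion rather than in fixed batches.

```python
from typing import Iterable, List, Tuple

# Hypothetical event record: (event_id, iov_id), assumed sorted in time.
Event = Tuple[int, str]


def group_by_iov_limit(events: Iterable[Event], max_open: int) -> List[List[int]]:
    """Split a time-ordered event stream into groups that each touch at most
    max_open distinct IoVs, i.e. at most max_open sets of calibration data
    would need to be in memory while a group is being processed."""
    if max_open < 1:
        raise ValueError("max_open must be at least 1")
    groups: List[List[int]] = []
    current: List[int] = []
    open_iovs: List[str] = []  # distinct IoVs used by the current group
    for event_id, iov in events:
        if iov not in open_iovs:
            if len(open_iovs) == max_open:
                # Opening one more IoV would exceed the cap: close the group.
                groups.append(current)
                current, open_iovs = [], []
            open_iovs.append(iov)
        current.append(event_id)
    if current:
        groups.append(current)
    return groups
```

With `max_open=2`, mirroring Chris's remark that about two groups are worked on at once in practice, the events of three consecutive IoVs are split into two groups: the first two IoVs together, then the third.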

Structure of Event Data

- Blue blob is an event

- Columns are data products

- Framework accesses data products individually

- Up to two orders of magnitude in size difference across data products

- Sizes also vary between events

Rob: How big is an event?

Chris: A few megabytes in memory. On disk it can be smaller due to compression (2-10x).

- 100-200 MB per event to process, including other information.

Suren: How are data products structured?

Chris: They're complex C++ data structures.

Data Requests per Event

- Frameworks schedule algorithms when data is available

- Some event data (i.e., data products) isn't needed by every job (e.g., debugging data)

- Not all data products are needed at the same time (i.e., by the same algorithm)

- Algorithms applied to the same event are allowed to run concurrently
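The data-driven scheduling described above can be sketched as follows, assuming each algorithm declares the data products it reads and the one it produces. The name `run_event` and the `(inputs, output, function)` tuple layout are hypothetical. This version proceeds in waves with a barrier between them, whereas production frameworks typically launch each algorithm the moment its inputs appear.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

# Hypothetical algorithm description: (input product names,
# output product name, function computing the output from the inputs).
Algorithm = Tuple[List[str], str, Callable[..., object]]


def run_event(algorithms: Dict[str, Algorithm], source: Dict[str, object],
              max_workers: int = 4) -> Dict[str, object]:
    """Run every algorithm once its input data products exist; algorithms
    whose inputs become ready at the same time run concurrently."""
    products = dict(source)     # data products available for this event
    pending = dict(algorithms)  # algorithms not yet run
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while pending:
            ready = [name for name, (inputs, _, _) in pending.items()
                     if all(i in products for i in inputs)]
            if not ready:
                raise RuntimeError(f"unsatisfiable inputs for: {sorted(pending)}")
            # Submit all ready algorithms so they execute concurrently.
            futures = {name: pool.submit(pending[name][2],
                                         *(products[i] for i in pending[name][0]))
                       for name in ready}
            for name, fut in futures.items():
                products[pending[name][1]] = fut.result()
                del pending[name]
    return products
```

For example, two algorithms that both read only `hits` start together, while one that reads their output `tracks` waits for the next wave.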

Concurrent Event Processing

- Framework processes multiple events concurrently

- Algorithms might process events in different orders

- Events are processed at different rates (for any given algorithm:event pair)

  - So results become available in different orders

- Forcing event 1 to be fully processed before starting event 2 is a current bottleneck.

Storage Opportunities

- Write events out of order

- Read events out of order

- Write data products out of order

- Read data products out of order

- Do concurrent read/write of events and data products
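One way to realize the out-of-order opportunities listed above is an append-only store plus an index keyed by (event, data product), so each product is written whenever it finishes and can be read back in any order. This is a minimal sketch under that assumption: the class `OutOfOrderStore` is hypothetical, an in-memory `io.BytesIO` stands in for a file, and a single lock serializes access rather than allowing truly parallel I/O.

```python
import io
import threading
from typing import Dict, Tuple

Key = Tuple[int, str]  # (event number, data product name)


class OutOfOrderStore:
    """Append-only store: each (event, product) blob is written whenever it
    is ready, and an index records where it landed so it can be read back
    in any order. A lock makes concurrent readers and writers safe."""

    def __init__(self) -> None:
        self._buf = io.BytesIO()                      # stands in for a file
        self._index: Dict[Key, Tuple[int, int]] = {}  # key -> (offset, length)
        self._lock = threading.Lock()

    def write(self, event: int, product: str, data: bytes) -> None:
        with self._lock:
            offset = self._buf.seek(0, io.SEEK_END)  # append at the end
            self._buf.write(data)
            self._index[(event, product)] = (offset, len(data))

    def read(self, event: int, product: str) -> bytes:
        with self._lock:
            offset, length = self._index[(event, product)]
            self._buf.seek(offset)
            return self._buf.read(length)
```

Writes can arrive as event 2's tracks, then event 1's hits, then event 1's tracks, and reads still find each product by key. A persistent version would also need to store the index itself.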

Storage Opportunities 2

- Compress/decompress concurrently

- Serialize/deserialize concurrently

- Read/decompress/deserialize as separate steps

- Serialize/compress/write as separate steps
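The staged write and read paths above can be sketched with a thread pool. Each step is a separate task, and a later step starts on a data product as soon as the earlier step for that product completes. Here `pickle` and `zlib` stand in for a framework's C++ serializer and ROOT's compression, and the function names are hypothetical. The write step is kept sequential and appends in completion order.

```python
import pickle
import zlib
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Dict, List, Tuple


def write_products(products: Dict[str, object],
                   max_workers: int = 4) -> List[Tuple[str, bytes]]:
    """serialize -> compress -> write, each as a separate step."""
    out: List[Tuple[str, bytes]] = []  # stands in for the output file
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Step 1: serialize every data product concurrently.
        ser = {pool.submit(pickle.dumps, p): n for n, p in products.items()}
        # Step 2: compress each blob as soon as its serialization finishes.
        comp = {pool.submit(zlib.compress, f.result(), 6): ser[f]
                for f in as_completed(ser)}
        # Step 3: write sequentially, in whatever order compression completes.
        for f in as_completed(comp):
            out.append((comp[f], f.result()))
    return out


def read_products(records: List[Tuple[str, bytes]],
                  max_workers: int = 4) -> Dict[str, object]:
    """decompress -> deserialize; `records` stands in for what was read."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Step 1: decompress every record concurrently.
        dec = {pool.submit(zlib.decompress, blob): n for n, blob in records}
        # Step 2: deserialize each record as soon as it is decompressed.
        des = {pool.submit(pickle.loads, f.result()): dec[f]
               for f in as_completed(dec)}
        return {des[f]: f.result() for f in des}
```

Because records are written in completion order, the reader looks products up by name rather than by position.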

ROOT Storage

- ROOT uses the term "branches" for what the framework calls data products

- A data product's values for multiple events are stored together and compressed together

- Events can be grouped into baskets

- All data products (branches) must store data for events in the same order
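A toy model of the ROOT layout described above, assuming every event carries the same set of data products. Each branch collects its values from consecutive events into baskets of `basket_size` entries and compresses each basket as a unit, so entry i of every branch is event i. This is not ROOT's on-disk format; `pickle`/`zlib` and the function names are hypothetical stand-ins.

```python
import pickle
import zlib
from typing import Dict, List


def build_baskets(events: List[Dict[str, object]],
                  basket_size: int) -> Dict[str, List[bytes]]:
    """Column-wise (branch-wise) layout with per-basket compression."""
    branches: Dict[str, List[bytes]] = {}
    names = list(events[0]) if events else []
    for name in names:
        # One column per data product, always in the same event order.
        column = [ev[name] for ev in events]
        # Consecutive entries of one branch are compressed together.
        branches[name] = [
            zlib.compress(pickle.dumps(column[i:i + basket_size]))
            for i in range(0, len(column), basket_size)
        ]
    return branches


def read_entry(branches: Dict[str, List[bytes]], name: str,
               entry: int, basket_size: int) -> object:
    """Fetch one event's value for one branch; decompresses the whole basket."""
    basket = pickle.loads(zlib.decompress(branches[name][entry // basket_size]))
    return basket[entry % basket_size]
```

One consequence is visible in `read_entry`: fetching a single event's data product means decompressing its whole basket, which is part of why out-of-order, event-level access interacts with basket size.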

    • 14:00 14:05
      Management News 5m
      Speakers: Paolo Calafiura (LBNL), Dr Salman Habib (Argonne National Laboratory)
    • 14:05 14:10
      Introduction 5m
      Speakers: Dr Peter van Gemmeren (ANL), Rob Ross (ANL)
    • 14:10 14:15
      Update: Darshan for ROOT I/O in HEP workflows on HPC 5m
      Speakers: Christopher Jones (Fermilab), Doug Benjamin (ANL), Kenneth Herner (Fermilab), Patrick Gartung (Fermilab), Shane Snyder (Argonne National Laboratory)
    • 14:15 14:20
      Update: Investigate HDF5 as intermediate event storage for HPC processing 5m
      Speakers: Kyle Knoepfel (Fermilab), Lisa Goodenough, Dr Peter van Gemmeren (ANL), Saba Sehrish (Fermilab), Suren Byna (LBNL), Tammy Walton (Fermilab)
    • 14:20 14:35
      Discussion on new activities 15m
      Speaker: All
    • 14:35 14:55
      Constraints on I/O from HEP Data Processing 20m
      Speaker: Christopher Jones (Fermilab)