Weekly CCE-IOS tele-conference

US/Central
Peter van Gemmeren (ANL), Rob Ross (ANL)
Description
BlueJeans Link: https://bluejeans.com/102100194

Attendees: Salman, Paolo, Peter, RobR, Philippe, ChrisJ, Doug, RobL, Jakob, Shane, Saba, Liz, Suren

** No significant management news

** Darshan/ATLAS:

- Shane: Ongoing work in the background related to fork().

  Likely done with initial prototype today.

  Caveat: Will double-count any activity that was recorded before the fork().

  Looking at adding some additional logic to reset counters.

  Will deliver something to Doug soon, and then we can work on more elaborate changes as needed.

- Doug: Trying to get the latest incarnation of the ATLAS event service running outside of NERSC.

  Doing this at Brookhaven in a container. 

  Then can add Darshan into that container.
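
The double-counting caveat Shane describes can be illustrated with a minimal sketch. This is not Darshan code: the IOCounters class and the counts_after_fork() helper are hypothetical names made up for illustration, and the sketch assumes a POSIX system where os.fork() is available. The child inherits a copy of the parent's counters, so anything recorded before the fork() is reported by both processes unless the child resets its copy.

```python
import os


class IOCounters:
    """Hypothetical per-process I/O counters (not Darshan's data structures)."""

    def __init__(self):
        self.reads = 0

    def record_read(self):
        self.reads += 1

    def reset(self):
        self.reads = 0


def counts_after_fork(reset_in_child):
    """Record 3 reads, fork, record 2 reads in the child.

    Returns (parent_count, child_count) as each process would report them.
    """
    counters = IOCounters()
    for _ in range(3):
        counters.record_read()

    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: starts with a copy of the parent's counters.
        os.close(read_fd)
        if reset_in_child:
            counters.reset()  # drop what was inherited from the parent
        for _ in range(2):
            counters.record_read()
        os.write(write_fd, str(counters.reads).encode())
        os.close(write_fd)
        os._exit(0)

    # Parent: collect the child's report and reap the child.
    os.close(write_fd)
    child_count = int(os.read(read_fd, 64).decode())
    os.close(read_fd)
    os.waitpid(pid, 0)
    return counters.reads, child_count
```

Without the reset, summing the two reports counts the three pre-fork reads twice; with the reset, the child reports only its own activity.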

** No updates on HDF5 at this time.

 

** Philippe re: multi-threaded ROOT I/O **

Parallelism in ROOT

Explicit parallelism: TThreadExecutor and TProcessExecutor

Implicit parallelism: RDataFrame

- Declarative parallel analysis

- TTreeProcessorMT - process tree events in parallel

- TTree::GetEntry ....

TFile, TTree, and parallelism (p.7)

- Underlying I/O is thread safe

- Objects are not thread safe

- Using a file from one thread at a time is fine (the file is not bound to a particular thread)

- Some operations can be run in parallel for a given TTree

  - prefetching raw bytes

  - unzipping baskets (done with TBB)

  - processing branch content (done with TBB)

- Asynchronous prefetching can also be used.

  No known issues with this.

  Main use case is for remote I/O.

  ChrisJ: CMS does a lot of remote access, but not sure if this is turned on.

  Doug: Changes in WAN infrastructure have an influence on use case, such as local caches.

Writing Bottleneck (p.8)

- Some discussion of concurrent writes.

- CMS uses TTree fill (rather than TBranch fill); this helps.

- Discussion of writing into "memory files" and then merging at the end.

- PVG: ATLAS is doing similar things, working in this direction.

- ChrisJ: CMS has used memory files but did not get the performance they expected. Memory files may work better in an environment with larger thread counts than CMS has today.

CMSSW Reading Bottleneck (p.9)

- Prefetch cache is part of each TTree and TFile's state

- CMSSW needs two prefetching caches

  - one with few branches, one with all branches

- explicit synchronization needed

CMSSW Reading Bottleneck (p.10)

- Input module reading a branch independently

- No concurrent reading; reads are mostly serialized (if I understand correctly)

CMSSW requests (p.11)

- GetEntry option for multiple branches? Could do these in parallel instead

- interface for interrogating the cache to see what's there

- Thread-safe async. interface for branch decompression

  - async. decompression exists but might not be the right API

- Work on the introspection piece is in progress.

TBufferMerger (p.16)

- ATLAS is working towards being able to use this

- worker threads putting data buffers in a data queue
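
The pattern noted above (worker threads putting data buffers in a queue, with a single consumer merging them into one output) can be sketched generically. This is not the TBufferMerger API: merge_worker_buffers() and the buffer contents are hypothetical, and a Python list stands in for the output file.

```python
import queue
import threading


def merge_worker_buffers(n_workers, buffers_per_worker):
    """Workers enqueue buffers; one merger thread appends them to the output."""
    buffer_queue = queue.Queue()
    output = []  # stands in for the single output file

    def worker(worker_id):
        # Each worker fills its own buffers and hands them off via the queue.
        for i in range(buffers_per_worker):
            buffer_queue.put(f"w{worker_id}-b{i}".encode())

    def merger():
        # Only this thread touches the output, so no lock is needed around it.
        while True:
            buf = buffer_queue.get()
            if buf is None:  # sentinel: all workers have finished
                break
            output.append(buf)

    merger_thread = threading.Thread(target=merger)
    merger_thread.start()

    workers = [threading.Thread(target=worker, args=(w,)) for w in range(n_workers)]
    for t in workers:
        t.start()
    for t in workers:
        t.join()

    buffer_queue.put(None)
    merger_thread.join()
    return output
```

Buffers from different workers may interleave in any order in the output, but each worker's own buffers stay in the order they were queued.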

** Jakob portion **

RNTuple Format (p.19)

- schema of format on this slide

- similar to TTree.

- Page is not bound to an "entry boundary"

  - large vector can span pages

  - pages are approx. the same size

  - helps with compression/decompression

- When pages are decompressed, the contents are little-endian

  - more vectorization friendly
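
As a rough illustration of pages that are not bound to entry boundaries, the sketch below flattens per-entry vectors into fixed-size pages of little-endian 32-bit floats and keeps a separate list of entry offsets. It is a simplified model, not the RNTuple on-disk format: PAGE_SIZE_ELEMENTS, pack_pages() and read_entry() are hypothetical names, and the page size is tiny for readability.

```python
import struct

PAGE_SIZE_ELEMENTS = 4  # hypothetical; real pages are far larger


def pack_pages(entries):
    """Flatten per-entry float vectors into equal-sized little-endian pages.

    Returns (pages, offsets), where offsets[i] is the end position
    (exclusive) of entry i in the flattened element stream.
    """
    flat = []
    offsets = []
    for vec in entries:
        flat.extend(vec)
        offsets.append(len(flat))

    pages = []
    for start in range(0, len(flat), PAGE_SIZE_ELEMENTS):
        chunk = flat[start:start + PAGE_SIZE_ELEMENTS]
        # "<" selects little-endian; "f" is a 4-byte float.
        pages.append(struct.pack(f"<{len(chunk)}f", *chunk))
    return pages, offsets


def read_entry(pages, offsets, index):
    """Read one entry back; its elements may live in more than one page."""
    start = offsets[index - 1] if index > 0 else 0
    end = offsets[index]
    values = []
    for pos in range(start, end):
        page, slot = divmod(pos, PAGE_SIZE_ELEMENTS)
        values.append(struct.unpack_from("<f", pages[page], slot * 4)[0])
    return values
```

With entries of lengths 2, 5 and 1, the second entry starts in the first page and ends in the second, while both pages hold the same number of elements.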

RNTuple Concurrency (p.20)

- Thread friendly (no global state)

- access to an object must be serialized, but the object can be used from multiple threads

- multiple readers on a single file is ok

- async. preloading is on by default, 1 I/O thread per RNTuple reader (idle CPU thread)

- "preconditions for vectorizing loops, unconfirmed"

- parallel page decompression delegated to TBB

- MT access to single RNTuple reader -- possible -- should save some memory

  - active clusters must be fixed

- parallel writing: one cluster per thread, append-only merging possible

 

RBR: Is append-only seen as acceptable by the science teams?

ChrisJ: The order in which we read has weak semantics. Some jumbling is OK, although we like to keep some boundaries (luminosity blocks).

RBR: No overwriting that would make this a show-stopper?

Philippe: TTree has always been write-once. To change the data, you have to copy it to a new file.

ChrisJ: There's a 2018 CHEP talk about CMS's use of this stuff (uploaded to agenda).

Doug: What's the timescale for the RNTuple code?

Jakob: Good question. Want to have something for end of year. Fully hardened by Run 4.

ChrisJ: CMS will give it a try this year.

Some discussion of RNTuple on DAOS. Someone in Europe looking at this?

Monthly call with DAOS engineers. Looking at mapping of RNTuple to DAOS. CERN fellow joining ROOT team in Sept. charged with first integration of RNTuple and DAOS.

    • 11:00 - 11:05
      Management News (5m)
      Speakers: Paolo Calafiura (LBNL), Dr Salman Habib (Argonne National Laboratory)
    • 11:05 - 11:10
      Introduction (5m)
      Speakers: Dr Peter van Gemmeren (ANL), Rob Ross (ANL)
    • 11:10 - 11:15
      Update: Darshan for ROOT I/O in HEP workflows on HPC (5m)
      Speakers: Christopher Jones (Fermilab), Doug Benjamin (ANL), Shane Snyder (Argonne National Laboratory)
    • 11:15 - 11:20
      Update: Investigate HDF5 as intermediate event storage for HPC processing (5m)
      Speakers: Dr Peter van Gemmeren (ANL), Saba Sehrish (Fermilab), Suren Byna (LBNL)
    • 11:20 - 11:45
      Multithreaded ROOT I/O (25m)
      Speaker: Philippe Canal (FERMILAB)
    • 11:45 - 12:00
      Discussion (15m)
      Speaker: All