Weekly CCE-IOS tele-conference

Peter van Gemmeren (ANL), Rob Ross (ANL)

Attending: Peter van Gemmeren, Saba Sehrish, Philippe Canal, Torre Wenaus, Tammy Walton, Doug Benjamin, Liz Sexton-Kennedy, Chris Jones, Suren Byna, Ken Herner, Patrick Gartung, Paolo Calafiura, Rob Ross, Shane Snyder

**Management.** A policy for publications is being defined. A draft exists, and an email about it will go out soon.

Q (PVG): Is there going to be a general CCE meeting?

A (Paolo): In discussion. Not sure yet.


**Darshan for ROOT I/O.**

Ken Herner: Put some slides together.


- DUNE uses LArSoft, which is based on the art framework.

- Event generation -> Geant4 -> detector sim/noise -> recorded

- Each stage runs the same "lar" executable with a different config file (a ".fcl" file; FHiCL, pronounced "fickle").

- For this test everything is in the same "job"

- All the data is in CVMFS (


- Installed Darshan v3.2.1 in the DUNE area, in non-MPI mode

- in a shifter container

- simple bash script to run each stage serially

- copy Darshan files to laptop, darshan-merge, then job summary

- very preliminary!

- note: didn't compile with Lustre support

                - so missing some striping info, etc.

                - but that's fine for now.
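
The setup described above (one "lar" invocation per stage, run serially with Darshan in non-MPI mode) could look roughly like the following. This is a minimal sketch, not Ken's actual script: the library path and the .fcl file names are hypothetical placeholders, and the original was a bash script rather than Python. It assumes Darshan is injected with `LD_PRELOAD` and that `DARSHAN_ENABLE_NONMPI` is set, which is how Darshan's non-MPI mode is normally enabled.

```python
import os
import shlex
import subprocess

# Hypothetical placeholders: neither the library path nor the .fcl names
# come from the meeting notes.
DARSHAN_LIB = "/path/to/darshan/lib/libdarshan.so"
STAGES = ["gen.fcl", "g4.fcl", "detsim.fcl"]


def darshan_env(base_env, darshan_lib=DARSHAN_LIB):
    """Return a copy of base_env with Darshan preloaded in non-MPI mode."""
    env = dict(base_env)
    env["LD_PRELOAD"] = darshan_lib
    env["DARSHAN_ENABLE_NONMPI"] = "1"  # instrument programs that never call MPI_Init
    return env


def stage_commands(stages, first_input=None):
    """Build one `lar` command per stage, feeding each output into the next stage."""
    commands = []
    previous_output = first_input
    for fcl in stages:
        output = os.path.splitext(fcl)[0] + ".root"
        cmd = ["lar", "-c", fcl, "-o", output]
        if previous_output is not None:
            cmd += ["-s", previous_output]
        commands.append(cmd)
        previous_output = output
    return commands


def run_serially(stages, launch=subprocess.run):
    """Run every stage in order; each process should produce its own Darshan log."""
    env = darshan_env(os.environ)
    for cmd in stage_commands(stages):
        launch(cmd, env=env, check=True)


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for cmd in stage_commands(STAGES):
        print(shlex.join(cmd))
```

Because each stage is a separate process, the per-stage logs would then be combined (the notes mention darshan-merge) before producing the job summary.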


Summary of the synthetic test run:

- Lots of small reads using STDIO

- POSIX accesses are dominated by 0-10K reads

                - Lots of 8191-byte reads (?)

- Mostly sequential/consecutive operations on the read side of things.

- A "real" production job would show more activity, probably including more small accesses.

- The write total reported for the output stage looks questionable (p. 3 of the summary).

                - Going to share data with Shane and see if he can deduce what might be up.

                - Output should have been tens of MB?
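
To make the access-size and sequential/consecutive terms above concrete, here is a small sketch of how such counters can be computed from a list of reads. The bucket boundaries and the two definitions are modeled on how Darshan's POSIX counters are understood to work (sequential: the offset lies beyond the last byte of the previous read; consecutive: the offset immediately follows it). Treat the exact boundaries as an assumption, and the sample reads as invented for illustration.

```python
from bisect import bisect_left
from collections import Counter

# Inclusive upper bounds in bytes, modeled on Darshan's access-size buckets
# (assumed, not taken from the meeting notes).
BOUNDS = [100, 1024, 10 * 1024, 100 * 1024, 1024**2, 4 * 1024**2,
          10 * 1024**2, 100 * 1024**2, 1024**3]
LABELS = ["0-100", "100-1K", "1K-10K", "10K-100K", "100K-1M", "1M-4M",
          "4M-10M", "10M-100M", "100M-1G", "1G+"]


def size_bucket(nbytes):
    """Return the histogram label for one access of nbytes bytes."""
    return LABELS[bisect_left(BOUNDS, nbytes)]


def classify_reads(accesses):
    """Count sequential and consecutive reads.

    accesses: list of (offset, length) pairs for one file, in issue order.
    The first read has no predecessor and is not counted in either category.
    """
    sequential = consecutive = 0
    last_byte = None
    for offset, length in accesses:
        if last_byte is not None:
            if offset > last_byte:
                sequential += 1
            if offset == last_byte + 1:
                consecutive += 1
        last_byte = offset + length - 1
    return sequential, consecutive


# Invented sample: three back-to-back 8191-byte reads, then a small read after a gap.
reads = [(0, 8191), (8191, 8191), (16382, 8191), (40000, 100)]
histogram = Counter(size_bucket(length) for _, length in reads)
print(dict(histogram))
print(classify_reads(reads))
```

Under these boundaries an 8191-byte read lands in the 1K-10K bucket, which is consistent with the "0-10K" observation in the notes.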

Will run something larger / more real once we have had a closer look at discrepancies in this run.


Doug Benjamin:

- Raythena Scheme on Cori KNL and ANL

- Image of how ATLAS runs the next-generation event service

                - Fine-grained simulation

                - Tested at NERSC and LCRC

                - Something on the edge that gets information from PanDA

                - Pilot does monitoring on each node running the Ray Actor and the computational payload (inside a container via Shifter / Singularity)


- Not seeing the I/O behavior inside the container at this time. Do see the behavior all around it.

- Lots of files being opened (will need to filter).

- Have built a new container with Darshan within it, hopeful that this will work now.
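
The filtering mentioned above could be done by file path once the per-file records are extracted from the logs. The sketch below is one possible approach, not something discussed in the meeting: all include and exclude patterns are hypothetical and would need to be tuned to the actual job.

```python
from fnmatch import fnmatchcase

# Hypothetical patterns: drop system pseudo-files and shared libraries,
# keep ROOT data files (including names with a trailing attempt number).
EXCLUDE = ["/proc/*", "/sys/*", "/dev/*", "*.so", "*.so.*", "*.pyc"]
INCLUDE = ["*.root", "*.root.*"]


def keep(path, include=INCLUDE, exclude=EXCLUDE):
    """Return True if a file record with this path should be kept."""
    if any(fnmatchcase(path, pattern) for pattern in exclude):
        return False
    return any(fnmatchcase(path, pattern) for pattern in include)


def filter_paths(paths):
    """Return only the paths worth looking at, preserving order."""
    return [p for p in paths if keep(p)]


# Invented example paths.
opened = [
    "/proc/self/maps",
    "/cvmfs/example.org/lib/libCore.so",
    "/scratch/job/HITS.pool.root.1",
    "/scratch/job/log.txt",
    "/scratch/job/output.root",
]
print(filter_paths(opened))
```

An exclude-first rule keeps the logic simple: anything that matches an exclude pattern is dropped even if it also matches an include pattern.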


- Shane notes that there's a ton of Darshan data when looking only outside the container, but we're not seeing (or weren't) what was going on inside the container. So this new approach is promising.

- First time we've tried to mix an inside/outside-container model like this.

                - But then Ken succeeded...

                - But it was Shifter and not Singularity, and run interactively


Doug: Python script calls Bash script that starts Singularity. Inside that is AthenaMP.

Patrick: Plan to mess with Darshan but have not, yet.


**HDF for Intermediate Results.** Saba and Suren working on this.


- Some updates, but slides are not uploaded yet; the revisions to them are not complete.

- Trying to write data products. Using HighFive API to write to HDF files. Two datasets per data product currently.

- Writing events has been implemented, working on re-reading and validation of the writing code.

- Looking at H5CPP as an alternative. Have initially discussed with the author. Have been able to use this as a write path for trivial tests as well.

- Parallelism comes later.
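
The notes do not say what the two datasets per data product contain. One plausible layout, shown here purely as an assumption, is a byte dataset holding the serialized products back to back plus an integer dataset of per-event offsets. The sketch uses plain Python containers in place of HDF5 datasets and `pickle` in place of the real serialization, so it only illustrates the write, re-read, and validate cycle; `ProductColumn` is a hypothetical name.

```python
import pickle


class ProductColumn:
    """In-memory stand-in for one data product stored as two 1-D datasets."""

    def __init__(self, name):
        self.name = name
        self.data = bytearray()  # would map to a 1-D byte dataset
        self.offsets = [0]       # would map to a 1-D integer dataset

    def append(self, obj):
        """Serialize one event's product and record where it ends."""
        self.data += pickle.dumps(obj)
        self.offsets.append(len(self.data))

    def __len__(self):
        return len(self.offsets) - 1

    def read(self, event_index):
        """Deserialize the product written for the given event."""
        start = self.offsets[event_index]
        end = self.offsets[event_index + 1]
        return pickle.loads(bytes(self.data[start:end]))


# Invented events: variable-length lists of hit energies.
events = [[1.5, 2.0], [], [0.25, 0.5, 0.75]]
column = ProductColumn("hits")
for product in events:
    column.append(product)

# Validation pass: every event read back must equal what was written.
assert all(column.read(i) == events[i] for i in range(len(column)))
```

The offsets array carries one more entry than there are events, so each product's byte range is the slice between two neighboring offsets and empty products need no special case.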

Some discussion of next steps. Peter interested in some testing.


**Constraints on I/O Discussion.** Ran out of time for discussion that day (Chris's presentation from maybe two weeks ago).

Peter has some slides:

- multi-threading to save memory

- multi-process doesn't do it; it still soaks up 1-2 GB/process more than necessary.

- See slides for additional details.
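
As a rough illustration of why the 1-2 GB/process figure matters, the arithmetic below scales it to a whole node. The worker count is a hypothetical value chosen for the example, and the sketch assumes the overhead applies equally to every process.

```python
def extra_memory_gb(n_processes, overhead_low_gb=1.0, overhead_high_gb=2.0):
    """Return the (low, high) total extra memory in GB for n_processes workers."""
    return n_processes * overhead_low_gb, n_processes * overhead_high_gb


N_WORKERS = 32  # hypothetical number of worker processes on one node
low, high = extra_memory_gb(N_WORKERS)
print(f"{N_WORKERS} processes: {low:.0f}-{high:.0f} GB more than necessary")
```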

Philippe: Some of the issues are being addressed as part of the RNTuple work.

    • 1
      Management News
      Speakers: Paolo Calafiura (LBNL), Dr Salman Habib (Argonne National Laboratory)
    • 2
      Speakers: Dr Peter van Gemmeren (ANL), Rob Ross (ANL)
    • 3
      Update: Darshan for ROOT I/O in HEP workflows on HPC
      Speakers: Christopher Jones (Fermilab), Doug Benjamin (ANL), Kenneth Herner (Fermilab), Patrick Gartung (Fermilab), Shane Snyder (Argonne National Laboratory)
    • 4
      Update: Investigate HDF5 as intermediate event storage for HPC processing
      Speakers: Kyle Knoepfel (Fermilab), Lisa Goodenough, Dr Peter van Gemmeren (ANL), Saba Sehrish (Fermilab), Suren Byna (LBNL), Tammy Walton (Fermilab)
    • 5
      Follow Up: Constraints on I/O from HEP Data Processing
      Speakers: Christopher Jones (Fermilab), Dr Peter van Gemmeren (ANL), Philippe Canal (Fermilab)