Attending: Peter Van Gemmeren, Saba Sehrish, Phillippe Canal, Torre Wenaus, Tammy Walton, Doug Benjamin, Liz Sexton-Kennedy, Chris Jones, Suren Byna, Ken Herner, Patrick Gartung, Paolo, RobR, Shane Snyder
**Management.** Defining a policy for publications. Expect an email soon; a draft exists.
Q (PVG): Is there going to be a general CCE meeting?
A (Paolo): In discussion. Not sure yet.
**Darshan for ROOT I/O.**
Ken Herner: Put some slides together.
Background:
- DUNE uses LArSoft, which is based on Art.
- Event generation -> Geant4 -> detector sim/noise -> recorded
- Each stage runs the same "lar" executable with different config file (a ".fcl" or "fickle" file).
- For this test everything is in the same "job"
- All the data is in CVMFS (https://docs.nersc.gov/services/cvmfs/)
Darshan:
- Installed v3.2.1 in the DUNE area in non-MPI mode
- in a Shifter container
- a simple bash script runs each stage serially
- copy the Darshan log files to a laptop, run darshan-merge, then generate the job summary (see the sketch after this list)
- very preliminary!
- note: didn't compile with Lustre support
- so missing some striping info, etc.
- but that's fine for now.
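A rough sketch of what such a wrapper could look like, assuming Darshan's non-MPI mode is enabled by preloading the library around each "lar" invocation; the library path, log locations, and .fcl names below are placeholders rather than the actual DUNE configuration:

```bash
#!/bin/bash
# Hypothetical paths -- substitute the real Darshan install and FHiCL configs.
DARSHAN_LIB=/path/to/darshan-3.2.1/lib/libdarshan.so
LOGDIR=$PWD/darshan-logs
mkdir -p "$LOGDIR"

# Required so Darshan instruments non-MPI executables like "lar".
export DARSHAN_ENABLE_NONMPI=1

# Run each stage serially with the same "lar" executable, preloading Darshan
# so its STDIO/POSIX counters are captured per stage.
for fcl in gen.fcl g4.fcl detsim.fcl reco.fcl; do
    stage=${fcl%.fcl}
    DARSHAN_LOGFILE="$LOGDIR/${stage}.darshan" \
        LD_PRELOAD="$DARSHAN_LIB" lar -c "$fcl"
done

# Combine the per-stage logs and produce the job summary report.
darshan-merge --output combined.darshan "$LOGDIR"/*.darshan
darshan-job-summary.pl combined.darshan
```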
Summary of a synthetic test run:
- Lots of small reads using STDIO
- POSIX accesses are dominated by 0-10K reads
- Lots of 8191-byte reads (reason unclear)
- Mostly sequential/consecutive operations on the read side.
- In a "real" production job, would expect more activity, probably including more small I/O.
- Some question about the veracity of the output stage write total (p.3 of the summary)
- Going to share the data with Shane to see if he can deduce what might be up (a darshan-parser check is sketched below).
- Output should have been tens of MB?
Will run something larger / more realistic once we have had a closer look at the discrepancies in this run.
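One way to chase the write-total discrepancy (assuming the standard darshan-util command-line tools, and the merged log from the sketch above) is to dump the per-file counters and compare the bytes written per file against the expected output size:

```bash
# Dump the merged log in text form.
darshan-parser combined.darshan > counters.txt

# Per-file bytes written at the POSIX layer (value, then file name);
# compare against the expected tens of MB for the output stage.
grep POSIX_BYTES_WRITTEN counters.txt | awk '{print $5, $6}' | sort -n

# Same check at the STDIO layer, in case the output went through stdio streams.
grep STDIO_BYTES_WRITTEN counters.txt | awk '{print $5, $6}' | sort -n
```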
Doug Benjamin:
- Raythena Scheme on Cori KNL and ANL
- Image of how ATLAS runs the next-generation event service
- Fine-grained simulation
- Tested at NERSC and LCRC
- A component on the edge node that gets information from PanDA
- Pilot does monitoring on each node running the Ray Actor and the computational payload (inside a container via shifter / singularity)
- Not seeing the I/O behavior inside the container at this time. Do see the behavior all around it.
- Lots of files being opened (will need to filter).
- Have built a new container with Darshan within it, hopeful that this will work now.
- Shane notes that there's a ton of Darshan data when looking only outside the container, but we're not seeing (or weren't) what was going on inside the container. So this new approach is promising.
- This is the first time we've tried to mix the inside/outside-container model like this.
- But then Ken succeeded...
- But that was with Shifter rather than Singularity, and run interactively.
Doug: a Python script calls a Bash script that starts Singularity; AthenaMP runs inside that. A rough sketch of this nesting follows.
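The sketch below shows that nesting with Darshan preloaded inside the container, now that it is installed in the image; the image name, bind mounts, library path, and payload command are placeholders rather than the actual Raythena/ATLAS setup:

```bash
#!/bin/bash
# run_payload.sh -- hypothetical wrapper the Python driver would invoke.
IMAGE=/path/to/payload-image-with-darshan.sif

singularity exec --bind /cvmfs,/scratch "$IMAGE" bash -c '
    # Darshan lives inside the image, so the preload sees the same
    # filesystem view as the payload itself.
    export DARSHAN_ENABLE_NONMPI=1
    export LD_PRELOAD=/opt/darshan/lib/libdarshan.so
    mkdir -p /scratch/darshan-logs
    export DARSHAN_LOGFILE=/scratch/darshan-logs/payload-$$.darshan
    # Placeholder for the actual AthenaMP payload command.
    athena.py MyJobOptions.py
'
```

The Python driver would then launch this wrapper (e.g. via subprocess) for each payload it starts.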
Patrick: Plan to mess with Darshan but have not, yet.
**HDF for Intermediate Results.** Saba and Suren working on this.
Saba:
- Some updates; slides aren't uploaded yet because the updates to them aren't complete.
- Trying to write data products using the HighFive API to HDF5 files; currently two datasets per data product (see the sketch after this list).
- Writing events has been implemented, working on re-reading and validation of the writing code.
- Looking at H5CPP as an alternative; have had initial discussions with the author, and have been able to use it as a write path for trivial tests as well.
- Parallelism comes later.
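A minimal sketch of the kind of write path described, using the classic HighFive API; the product name, file name, and the flattened data/offsets layout are illustrative assumptions rather than the actual schema:

```cpp
#include <cstddef>
#include <vector>
#include <highfive/H5File.hpp>

int main() {
  // Hypothetical flattened layout for one data product across events:
  // "data" holds all values back to back, "offsets" marks event boundaries.
  std::vector<double> data = {1.0, 2.0, 3.0, 4.0, 5.0};
  std::vector<std::size_t> offsets = {0, 2, 5};  // two events: [0,2) and [2,5)

  HighFive::File file("products.h5", HighFive::File::Overwrite);

  // Two datasets per data product, as described above.
  HighFive::DataSet d1 = file.createDataSet<double>(
      "/HitCollection/data", HighFive::DataSpace::From(data));
  d1.write(data);

  HighFive::DataSet d2 = file.createDataSet<std::size_t>(
      "/HitCollection/offsets", HighFive::DataSpace::From(offsets));
  d2.write(offsets);

  return 0;
}
```

Re-reading for validation would go through the matching DataSet::read() calls on the same two datasets.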
Some discussion of next steps. Peter interested in some testing.
**Constraints on I/O Discussion.** Ran out of time for this discussion that day (following Chris's presentation from maybe two weeks ago).
Peter has some slides:
- multi-threading to save memory
- multi-processing doesn't achieve this; each process still uses 1-2 GB more memory than necessary.
- See slides for additional details.
Phillippe: Some of the issues are being addressed as part of RNtuple work.