Attendees: Rob Ross, Chris Jones, Liz Sexton-Kennedy, Matthieu Dorier, Peter van Gemmeren, Suren Byna, Shane Snyder, Rob Latham, Philippe Canal, Paolo Calafiura, Saba Sehrish, Doug Benjamin
The last couple of months have been an opportunity to get to know each other's work, tools, etc. Next step is to dig more concretely into activities. Also need to work out staffing levels.
1) Darshan and ROOT (really, Darshan capturing workflow I/O behavior)
Some work is needed to get this going. Doug and Shane have already begun discussing this for the ATLAS use case. Eventually we would want to broaden this to cover more HEP workflows, but this specific use case is our initial driver. Big issue from Doug's perspective at the moment is that I/O might be happening from a fork()ed process (for ATLAS in AthenaMP).
Chris: CMS doesn't do that, so that might be an easier use case (CMS uses multi-threading).
- Some work on ANL/CS side to consider what to do re: fork(), assuming that isn't easy to avoid.
Folks that are interested in this early activity:
- Doug Benjamin
- Shane Snyder
- Chris Jones - at 10%, can help contact CMS people if that's the way we want to go. (Chris may have additional help).
- Philippe Canal - keep him in the loop
- Liz Sexton-Kennedy also in the loop
Shared library loading mechanisms behave differently in a fork() context as compared to a freshly launched process.
TBD: Ensure that Shane is charging to HEP-CCE.
Perhaps we build Darshan into the CMS container in the meantime.
- Doug builds at home (for ATLAS)
- Converts to Singularity
- Moves to Brookhaven
TODO: Create a new mailing list for this? HEP-CCE-IOS-Darshan ?
- Start with the existing list and see.
2) HDF5 for HEP simulation
Focus on intermediate data products.
People interested in this:
- Suren Byna
- Saba Sehrish
- Peter van Gemmeren
- Paolo C. - keep him in the loop
Want to think about how to map HEP data into appropriate HDF datatypes. Would be looking at simulation. Relatively complex objects.
Saba: Have some experience with this. From CMS side, experiences w/ "bacon bits". Columnar data.
Liz: Is it a jagged array?
Saba: Not a jagged array; multiple tables, with connections back to the event table. Loose "database-like" model.
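Illustration of the jagged-array vs. multiple-tables distinction (a minimal sketch with made-up data; the column names `event_id`, `n_tracks`, and `pt` are assumptions, not the actual "bacon bits" schema):

```python
# Hypothetical per-event data: each event has a variable number of track pT values.
jagged = [[10.5, 3.2], [], [7.1, 2.2, 0.9]]


def to_tables(jagged_tracks):
    """Flatten a jagged array into an event table and a track table.

    Each track row carries the event_id of its parent event, which is
    the connection back to the event table.
    """
    events = {"event_id": [], "n_tracks": []}
    tracks = {"event_id": [], "pt": []}
    for event_id, pts in enumerate(jagged_tracks):
        events["event_id"].append(event_id)
        events["n_tracks"].append(len(pts))
        for pt in pts:
            tracks["event_id"].append(event_id)
            tracks["pt"].append(pt)
    return events, tracks


def tracks_for_event(tracks, event_id):
    """Select the pt values whose event_id matches, like a simple join."""
    return [pt for eid, pt in zip(tracks["event_id"], tracks["pt"])
            if eid == event_id]


events, tracks = to_tables(jagged)
```

Every column in both tables has a fixed element type and a flat layout, so each could map to a simple one-dimensional dataset; the variable-length structure lives only in the `event_id` column.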
Saba doesn't have time allocation for this work at this time.
Liz will circle back regarding funds and effort.
Question of whether to take a "blob" strategy or a "table" strategy. Conditions world uses blobs effectively. Taking C++ objects, using ROOT serialization, dump into (whatever). This is a good point to discuss and investigate, will be workflow dependent.
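Illustration of the blob vs. table trade-off (a minimal sketch under stated assumptions: Python's `pickle` stands in for ROOT serialization, plain byte strings stand in for stored datasets, and the hit fields are made up; no HDF5 API is used here):

```python
import pickle
from array import array

# Hypothetical hit objects; field names are illustrative only.
hits = [{"x": 1.0, "y": 2.0, "energy": 0.5},
        {"x": 3.0, "y": 4.0, "energy": 1.5}]

FIELDS = ("x", "y", "energy")


def store_as_blob(objects):
    """Blob strategy: serialize the whole collection into one opaque byte string."""
    return pickle.dumps(objects)


def store_as_table(objects):
    """Table strategy: one typed column (array of doubles) per field."""
    return {name: array("d", (obj[name] for obj in objects)).tobytes()
            for name in FIELDS}


def total_energy_from_blob(blob):
    # The whole blob must be deserialized to reach a single field.
    return sum(obj["energy"] for obj in pickle.loads(blob))


def total_energy_from_table(table):
    # Only the "energy" column is decoded; the other columns stay untouched.
    column = array("d")
    column.frombytes(table["energy"])
    return sum(column)
```

The blob path is simple to write and keeps the object model intact, but a reader needs the original serialization code to see anything inside it. The table path exposes individual columns to generic tools, at the cost of defining a mapping for each object type.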
Liz will find effort on the FNAL side.
Discussion of relationship between HEPnOS and this work.
- easy to do blobs, tables
- have done tracks and hits (using Boost serialization)
- Looking at ICARUS production chain too
Saba: Also doing some NOvA data work, putting data into HDF files under SciDAC
Some discussion of issues related to multi-threaded I/O as compared to multi-process I/O.
ROOT is working on concurrent I/O. Philippe will report next week.