Attendees: Rob Ross, Peter Van Gemmeren, Doug Benjamin, Chris Jones, Paolo Calafiura, Philippe Canal, Matthieu Dorier, Shane Snyder, Suren Byna, Torre Wenaus, Rob Latham, Saba Sehrish
Topics for future calls:
- RobL: HPC I/O, how we think about it
- Chris: CMS production workflows, or "I/O usage in CMS multi-threaded framework"
- Philippe/Jakob: More details on ROOT, including RNTuple
- Shane: Darshan, what it is
- Matthieu and Saba: HEPnOS and Mochi, what they are
- Doug/Torre: ATLAS Simulation w/ and w/out EventService, including (or additionally) Fast-Simulation, Fast-Chain
- ?: ROOT and its use in ATLAS, CMS, and DUNE (?)
- ?: What IRIS-HEP is doing re: alternative data formats?
- ?: DAOS?
Milestones:
First quarter:
- documentation of patterns
- get to know one another
Second quarter:
- performance of HEP experiment benchmarks
- using Cori for ATLAS, maybe, or maybe on the Grid...
- ATLAS Simulation w/out EventService
- ATLAS EventService Simulation (fine-grained (event-wise) processing)
- instrument ROOT I/O patterns
Experiment use cases:
- Because IRIS-HEP is covering analysis, we should focus on "production" workflows.
- simulation
- full simulation (easy), fast simulation (hard), to be presented, discussed.
- reconstruction
- derivation -- when they write the physics products -- maybe?
- nail down 3 (or maybe 4) specific use cases
- not "look at everything"
- Q: what's the appropriate CMS one?
- Chris: reconstruction (maybe): something we want to do well
- something from DUNE?
---
PVG: HEP Experiment and ROOT I/O
files have "compressed baskets" of a tree
1. read compressed baskets
2. decompress baskets -- typically have data from multiple events/entries
3. deserialize into an object; this yields either a "persistent state" or directly a "transient state"
3.a if the result is persistent state, convert it to transient state (T/P conversion)
Most of this is done by ROOT. CMS doesn't do any T/P conversion.
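The read path above can be sketched in a few lines. This is only an illustration, not ROOT code: zlib stands in for the basket compression, a fixed struct layout stands in for the Streamer, and the names EVENT, PersistentEvent, TransientEvent, make_basket, read_basket and persistent_to_transient are all made up for the example.

```python
import struct
import zlib
from dataclasses import dataclass

# Hypothetical on-disk layout for one entry: int32 id + float32 energy in MeV.
EVENT = struct.Struct("<if")


@dataclass
class PersistentEvent:
    """Simple state as stored in the file."""
    event_id: int
    energy_mev: float


@dataclass
class TransientEvent:
    """State the application code works with."""
    event_id: int
    energy_gev: float


def make_basket(events):
    """Write side, only here to produce test data: serialize, then compress."""
    raw = b"".join(EVENT.pack(e.event_id, e.energy_mev) for e in events)
    return zlib.compress(raw)


def read_basket(blob):
    """Steps 2 and 3: decompress one basket, deserialize all entries in it."""
    raw = zlib.decompress(blob)
    return [PersistentEvent(*values) for values in EVENT.iter_unpack(raw)]


def persistent_to_transient(p):
    """Step 3.a: T/P conversion (here just a unit change)."""
    return TransientEvent(p.event_id, p.energy_mev / 1000.0)


# Step 1 would be a file read; here the basket is built in memory.
basket = make_basket([PersistentEvent(1, 1500.0), PersistentEvent(2, 250.0)])
events = [persistent_to_transient(p) for p in read_basket(basket)]
```

Note that one basket holds several entries, so decompression always happens per basket even if only one event is wanted.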
Compression:
- lossless
- some type conversion to reduce fidelity; this is separate from compression and is done in serialization
- sometimes more than this: range-aware bit packing, also done in serialization
- but this is unusual.
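A minimal sketch of what range-aware bit packing could look like, assuming the value range is known in advance. The function names pack_value and unpack_value, and the example range and bit width, are assumptions for illustration and not taken from any experiment's code.

```python
def pack_value(x, lo, hi, bits):
    """Map x in [lo, hi] to an unsigned integer code using `bits` bits."""
    levels = (1 << bits) - 1
    x = min(max(x, lo), hi)  # clamp out-of-range values to the known range
    return round((x - lo) / (hi - lo) * levels)


def unpack_value(code, lo, hi, bits):
    """Inverse mapping; the result is only accurate to one quantization step."""
    levels = (1 << bits) - 1
    return lo + code / levels * (hi - lo)


# Hypothetical example: a quantity known to lie in [-5, 5], stored in 16 bits.
LO, HI, BITS = -5.0, 5.0, 16
code = pack_value(1.2345, LO, HI, BITS)
restored = unpack_value(code, LO, HI, BITS)
step = (HI - LO) / ((1 << BITS) - 1)
```

The loss happens here, in serialization; the lossless compression step then runs on the already reduced data.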
Serialization:
- decomposition is the job of the Streamer
- every class has a Streamer
- ROOT writes class descriptions with data
- StreamerInfo list is used to decode an object
- splitting into TBranches: decides how member data is mapped onto branches
- can put in a single branch or split across many branches
- structs of arrays vs. arrays of structs
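To make the splitting point concrete, here is a small sketch of the two layouts. It is plain Python, not ROOT: Hit, split and unsplit are hypothetical names, and a dict of lists stands in for a set of branches.

```python
from dataclasses import dataclass, fields


@dataclass
class Hit:
    x: float
    y: float
    energy: float


def split(hits):
    """Split: one column per data member (struct of arrays)."""
    return {f.name: [getattr(h, f.name) for h in hits] for f in fields(Hit)}


def unsplit(columns):
    """Rebuild the objects (array of structs) from the per-member columns."""
    names = [f.name for f in fields(Hit)]
    return [Hit(*values) for values in zip(*(columns[n] for n in names))]


hits = [Hit(0.0, 1.0, 10.0), Hit(2.0, 3.0, 20.0)]  # unsplit: array of structs
branches = split(hits)                              # split: struct of arrays
```

With the split layout a reader that only needs `energy` can touch that one column, and similar values sit next to each other, which usually helps compression.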
T/P Conversion -- ATLAS specific
- not ROOT specifically
- some experiments use simpler persistent state objects to capture more complex transient classes
- also helps with schema evolution or custom/domain specific compression
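A sketch of how T/P conversion can help with schema evolution, under the assumption that each persistent version is its own simple class. Track, TrackP1, TrackP2, track_to_transient and track_to_persistent are hypothetical names, not the ATLAS ones.

```python
from dataclasses import dataclass


@dataclass
class Track:
    """Transient class used by the application."""
    pt: float
    eta: float
    charge: int


@dataclass
class TrackP1:
    """Older persistent version: charge was not stored."""
    pt: float
    eta: float


@dataclass
class TrackP2:
    """Current persistent version."""
    pt: float
    eta: float
    charge: int


def track_to_transient(p):
    """Read side: accept every persistent version ever written."""
    if isinstance(p, TrackP2):
        return Track(p.pt, p.eta, p.charge)
    if isinstance(p, TrackP1):
        return Track(p.pt, p.eta, 0)  # assumed default for the missing member
    raise TypeError(f"unknown persistent version: {type(p).__name__}")


def track_to_persistent(t):
    """Write side: always write the newest persistent version."""
    return TrackP2(t.pt, t.eta, t.charge)


old = track_to_transient(TrackP1(25.0, 0.5))
new = track_to_transient(track_to_persistent(Track(30.0, -1.2, -1)))
```

The transient class can change freely as long as the converters are kept up to date; old files stay readable through the older persistent classes.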
CMS started with a policy that the files generated by the framework should be easily readable without a lot of extra stuff, thus no T/P conversion. File format is meant to be directly readable.
Suren: What is the typical number of baskets, and are they all read?
PVG: Varies widely between different products: 10s-1000s of branches. ATLAS and CMS are similar. To be followed up.