Weekly CCE-IOS tele-conference

US/Central
Peter van Gemmeren (ANL), Rob Ross (ANL)
Description
BlueJeans Link: https://bluejeans.com/102100194

Attended: Paolo Calafiura, Salman Habib, Rob Ross, Peter van Gemmeren, Matthieu Dorier, Saba Sehrish, Doug Benjamin, Jakob Blomer, Philippe Canal, Chris Jones, Rob Latham, Liz Sexton-Kennedy, Torre Wenaus

 

Management News

Salman: Setting up a set of slides for HEP-CCE, might want something from us, will follow up outside the call.

Paolo: Deadline for those is 2-3 weeks.

 

HEPnOS

Matthieu and Saba

Slide 2) HEP terms: events, runs, ...

Slide 3) HEP data processing: traditionally file-based.

Slide 4) Trying to avoid writing intermediate data to the file system

Slide 5) Thinking about how to reorganize data and access around the domain science and HPC platforms, rather than around grid computing infrastructure; demonstrating with real HEP applications.

Slide 6) Mochi: components and methodology. HEPnOS is a Mochi product.

- "object store"

- multiple instances of in-memory databases, storing C++ objects (serialized using Boost)

- captures the hierarchical organization implicit in HEP datasets (or at least the ones we're working with)

- tools for conversion and loading of other formats (so far HDF5)

Slide 8) HEPnOS: data organization

Datasets, Runs, Subruns, and Events are types of "Containers"

identified by a number

can contain "products" -- instances of C++ objects, identified by C++ type and a label
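
As a rough illustration of this organization, client-side usage might look like the sketch below. The class and method names (DataStore, createDataSet, createRun, store, load), the header name, the config file name, and the Hit product type are assumptions based on the slides, not verified HEPnOS API.

```cpp
#include <hepnos.hpp>   // assumed HEPnOS client header

// Hypothetical product type; products are identified by C++ type plus a label.
struct Hit { double x, y, z; };

void example() {
    // Connect to the running HEPnOS service (config file name is illustrative).
    hepnos::DataStore datastore = hepnos::DataStore::connect("hepnos.yaml");

    // Containers form a hierarchy: DataSet -> Run -> SubRun -> Event,
    // each identified by a name or a number.
    hepnos::DataSet dataset = datastore.root().createDataSet("icarus");
    hepnos::Run     run     = dataset.createRun(42);
    hepnos::SubRun  subrun  = run.createSubRun(3);
    hepnos::Event   event   = subrun.createEvent(17);

    // Store and load a product under a label; the (type, label) pair identifies it.
    Hit hit{1.0, 2.0, 3.0};
    event.store("reco", hit);

    Hit loaded;
    event.load("reco", loaded);
}
```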

Slides 9-13)

Example (DataSet root, not CERN ROOT)

Makes things look similar to a std::map; the RPC is all hidden.

A templated serialize method is used for serialization -- typical Boost usage.
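
For reference, the typical Boost pattern referred to here is an intrusive, templated serialize method on the product class. The Track type below is a made-up example, not one of the FNAL data products.

```cpp
#include <boost/serialization/access.hpp>
#include <boost/serialization/vector.hpp>
#include <vector>

// Made-up product class; HEPnOS-stored objects are serialized via Boost.Serialization.
class Track {
public:
    std::vector<double> points;
    double energy = 0.0;

private:
    friend class boost::serialization::access;

    // A single templated method handles both saving and loading.
    template <class Archive>
    void serialize(Archive& ar, const unsigned int /*version*/) {
        ar & points;
        ar & energy;
    }
};
```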

Slide 14)

Other features:

- AsyncEngine - I/O in the background, configurable threads

- WriteBatch - batching of products, configurable, works with AsyncEngine (background flush)

- Prefetcher - containers and products prefetched automatically, configurable batch/cache sizes, works with AsyncEngine
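
A very rough sketch of how these pieces might be combined on the client side; the constructor arguments and method signatures below are guesses for illustration (following the names on the slide), not confirmed HEPnOS API, and Hit is the same hypothetical product as in the earlier sketch.

```cpp
#include <hepnos.hpp>   // assumed HEPnOS client header

struct Hit { double x, y, z; };   // hypothetical product, redefined for self-containment

void pipelined_io(hepnos::DataStore& datastore, hepnos::SubRun& subrun) {
    // Background I/O with a configurable number of threads (signature assumed).
    hepnos::AsyncEngine async(datastore, 4);

    // Batch product writes; the AsyncEngine flushes batches in the background.
    hepnos::WriteBatch batch(async, 1024);
    hepnos::Event event = subrun.createEvent(17);
    event.store(batch, "reco", Hit{1.0, 2.0, 3.0});
    batch.flush();   // or rely on flushing when the batch is destroyed

    // Prefetch containers and products ahead of iteration (sizes assumed configurable).
    hepnos::Prefetcher prefetcher(async);
    for (auto& ev : subrun.events(prefetcher)) {
        Hit h;
        ev.load(prefetcher, "reco", h);
    }
}
```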

Slide 15)

HDF2HEPnOS - Python program that takes an HDF5 file and generates C++ classes for reading and Boost serialization, specific to the FNAL data

HEPnOS-Dataloader - compiled against the C++ classes from above; uses a distributed work queue to read the HDF5 data and store it in HEPnOS

Slide 16)

Application view of HEPnOS

- distributed data service for managing HEP data

- makes data products available for multiple phases of workflow

- accelerates access

- global view of data across nodes, removes FS overheads/artifacts

Slide 17)

HEPnOS and Framework

- goal is to have integration with the framework so that HEPnOS is the source/sink over the course of the workflow, without having to modify user code

Slide 18)

LArSoft data products used by DUNE

- used art-related software

  - gallery

- used some of the data product classes from LArSoft; not using art directly, yet

- Docker containers

Slide 19)

ICARUS

- extending to an ICARUS workflow (WIP)

- goal is to have multiple stages working with HEPnOS rather than file-based intermediates

- last step should be an external tool (this has been demonstrated elsewhere)

- data exported to whatever longer-term format is needed at the end of the workflow (e.g., ROOT, HDF5)

- lots of variance in execution times of different steps

Slide 20)

HEPnOS for an analysis application

- NOvA's "4th analysis" of neutrino candidate selection

- took the code and built a DIY-based MPI parallel application

- selection code from NOvA unchanged (NOvA CAFAna code, I think)

- data in HDF5

- NOvA code won't run natively on ALCF/NERSC systems, so it runs in containers; HEPnOS runs natively.

- significant engineering to set all this up.

- are running on Theta, working through configuration issues now. Lots of tuning to do.

 

Discussion:

Chris Jones: How would HEPnOS interact with a multi-threaded application?

Rob: Distributed service, happy to have multiple clients simultaneously requesting data.

 

Doug: HEPnOS meant to run on different nodes?

Matthieu: Yes, but you don't have to do that. We're not limiting memory use at this time, but we could.

 

Doug: How is HEPnOS presenting data off storage?

Rob: We're doing a load phase explicitly at the moment.

Saba: For event selection, we're not persisting the output data in our demonstration. It's a small output file that could be handled simply. Longer term (e.g., ICARUS) we would explore persisting different ways.

 

Paolo: In a few weeks we will hear about the ATLAS event service; you'll see how HEPnOS would fit into that model. The main issue is the Boost serialization: ROOT's persistification is more capable, so they would miss some of the ROOT capabilities.

Q: How big a deal would it be to use ROOT's capabilities instead?

Matthieu: Boost serializes into a buffer, so the backend doesn't know what is in the buffer. Also, we bypass Boost when it's plain data. Transitioning to ROOT doing this serialization wouldn't necessarily be a big deal.
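
For context, the Boost side of this is roughly the following: the object is serialized into an opaque byte buffer that the backend stores as-is, and trivially copyable ("plain") data can skip the archive entirely. The sketch uses standard Boost and C++17 only; the bypass shown is the general idea, not HEPnOS's actual code path.

```cpp
#include <boost/archive/binary_oarchive.hpp>
#include <cstring>
#include <sstream>
#include <string>
#include <type_traits>

// Serialize an object into an opaque buffer; the storage backend only sees bytes.
template <typename T>
std::string to_buffer(const T& obj) {
    if constexpr (std::is_trivially_copyable_v<T>) {
        // "Plain" data can bypass Boost entirely with a raw copy.
        std::string buf(sizeof(T), '\0');
        std::memcpy(buf.data(), &obj, sizeof(T));
        return buf;
    } else {
        // Otherwise, go through a Boost binary archive.
        std::ostringstream oss;
        boost::archive::binary_oarchive ar(oss);
        ar << obj;
        return oss.str();
    }
}
```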

 

Presentation plan:

- Next week is too early for ATLAS Simulation/EventService, tentatively in two weeks.

- Thanks to Shane for (being volunteered to) give an overview of Darshan next week.
