Weekly CCE-IOS tele-conference

US/Central
Peter van Gemmeren (ANL), Rob Ross (ANL)
Description
BlueJeans Link: https://bluejeans.com/102100194

Weekly CCE-IOS tele-conference (1 Apr 2020)
Chaired by: Ross, Rob; Dr. van Gemmeren, Peter

Attendees: Rob Ross, Peter van Gemmeren, Salman Habib, Chris Jones, Doug Benjamin, Liz Sexton-Kennedy, Jakob Blomer, John Wu, Matthieu Dorier, Philippe Canal, Rob Latham, Saba Sehrish, Shane Snyder, Suren Byna, Torre Wenaus

 

Philippe Canal: ROOT I/O

Slide 3) HEP data flow: raw (from detector) -> reco -> analysis formats -> images. Last three are typically stored in ROOT

Slide 4) "cling" is a C++ interpreter built into ROOT; it interprets headers, which lets users write C++ objects via serialization

Slide 6) Parallelization in ROOT refers to thread-level parallelism.

Slide 9) TFile is the file class in ROOT.

- header

- records

- possible compression

- FS-like structure

- self descriptive

Slide 10) plug-in system that allows for remote access and for data stored in SQL

Slide 11) file is a mix of headers plus object data. Deleted objects might still take up space

- file header -- summary information, pointers to free regions, etc.

- logical record header -- information on the objects in the region, etc.

Slide 14) serialization

- lots of features in here that likely make it impossible to use another solution without giving up features (e.g., custom serialization of types, schema evolution)

Slide 18) column format

- represented by TTree, or just "tree"

- a TBranch, or "branch", is a column

Slide 22) "anatomy of a file" slide speaks to this

- "cluster" is a contiguous area of the file holding an integral number of entries for all the branches (things from different columns)

  - so all the data from a set of rows.

- "baskets" are collections of data from a "branch" -- specifically a chunk of data from a single column

  - baskets tend to be single writes, ranging from hundreds of bytes to a few MB

  - clusters are often 10s of MBs (but this is customizable)
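
To make the cluster/basket terminology above concrete, here is a toy model in plain Python. It is only a sketch and does not use ROOT: the function name `layout_clusters`, its parameters, and the branch names `pt` and `n_hits` are hypothetical, and baskets are cut by entry count here for simplicity, whereas the notes describe real basket and cluster sizes in bytes.

```python
from collections import defaultdict


def layout_clusters(entries, entries_per_cluster, entries_per_basket):
    """Group row-wise entries into clusters that hold per-branch baskets.

    Toy model only: a cluster covers a whole number of entries for all
    branches, and each basket holds values from a single branch (column).
    """
    clusters = []
    for start in range(0, len(entries), entries_per_cluster):
        rows = entries[start:start + entries_per_cluster]
        baskets = defaultdict(list)
        # Cut the rows of this cluster into baskets, one list per branch.
        for b_start in range(0, len(rows), entries_per_basket):
            chunk = rows[b_start:b_start + entries_per_basket]
            for branch in chunk[0]:
                baskets[branch].append([row[branch] for row in chunk])
        clusters.append({
            "first_entry": start,
            "n_entries": len(rows),
            "baskets": dict(baskets),
        })
    return clusters


# Five entries (rows) with two branches (columns).
entries = [{"pt": 10 * i, "n_hits": i} for i in range(5)]
clusters = layout_clusters(entries, entries_per_cluster=4, entries_per_basket=2)

for cluster in clusters:
    print(cluster["first_entry"], cluster["n_entries"], cluster["baskets"])
```

Each cluster contains every branch for its range of entries, so reading one cluster gives complete rows, while reading the baskets of a single branch gives only that column.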

Slide 32) Fast Merge -- way of combining data from multiple files into a final ROOT file. This is done with threads. Some testing was done on individual drives (HDD and SSD).

Slide 43) A basic MPI-based mechanism for doing this also exists (just a quick mention)

Slide 45) TFile WriteCache as a way to do aggregation. Could be a location for inserting code to do smarter writes.
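
The aggregation idea behind a write cache can be sketched as follows. This is an illustration in plain Python, not the TFile WriteCache implementation: the class `AggregatingWriter`, its `capacity` parameter, and the `flush_count` counter are hypothetical names. The `flush` method marks the kind of place where smarter write logic could be inserted.

```python
import io


class AggregatingWriter:
    """Buffer small writes and emit them as one larger write (toy model)."""

    def __init__(self, sink, capacity):
        self.sink = sink          # any object with a write(bytes) method
        self.capacity = capacity  # flush threshold in bytes
        self.pending = []
        self.pending_bytes = 0
        self.flush_count = 0      # number of writes actually sent to the sink

    def write(self, payload):
        # Flush first if this payload would push the buffer over capacity.
        if self.pending and self.pending_bytes + len(payload) > self.capacity:
            self.flush()
        self.pending.append(payload)
        self.pending_bytes += len(payload)

    def flush(self):
        if not self.pending:
            return
        # Hook point: reordering or coalescing of buffers could happen here.
        self.sink.write(b"".join(self.pending))
        self.pending.clear()
        self.pending_bytes = 0
        self.flush_count += 1


sink = io.BytesIO()
writer = AggregatingWriter(sink, capacity=8)
for basket in (b"aaa", b"bbb", b"cccc", b"dd"):
    writer.write(basket)
writer.flush()

print(sink.getvalue(), writer.flush_count)
```

In this run, four small writes reach the sink as two larger ones.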

The FastMerge mechanism can be enhanced to collect and reorganize how the baskets are laid out in the file

 

Jakob Blomer: RNTuple -- evolution of TTree I/O

"the future" -- experimental new I/O subsystem in ROOT

looking at simple(r) event models, want to understand if they can get faster performance at the cost of incompatibility

Slide 47) borrowing from Apache Arrow concepts, for example

thinking about object stores

Slide 48) storage layer knows how to get byte ranges from whatever the back end is

  notionally this would be a way for us to store RNTuple data in a Mochi-based service

  looking at DAOS, in touch with Intel folks, should be followed up with HPC experts
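
The "storage layer fetches byte ranges from whatever the back end is" idea can be sketched as a small interface with interchangeable back ends. This is a plain-Python illustration only; it is not the RNTuple, Mochi, or DAOS API, and every name in it (`ByteRangeBackend`, `FileLikeBackend`, `KeyValueBackend`, `read_page`, `block_size`) is hypothetical.

```python
import io
from abc import ABC, abstractmethod


class ByteRangeBackend(ABC):
    """Anything that can return `length` bytes starting at `offset`."""

    @abstractmethod
    def read_range(self, offset, length):
        ...


class FileLikeBackend(ByteRangeBackend):
    """Back end over a seekable file-like object."""

    def __init__(self, fileobj):
        self.fileobj = fileobj

    def read_range(self, offset, length):
        self.fileobj.seek(offset)
        return self.fileobj.read(length)


class KeyValueBackend(ByteRangeBackend):
    """Object-store stand-in: data kept as fixed-size blocks under integer keys."""

    def __init__(self, data, block_size):
        self.block_size = block_size
        self.blocks = {
            i // block_size: data[i:i + block_size]
            for i in range(0, len(data), block_size)
        }

    def read_range(self, offset, length):
        first = offset // self.block_size
        last = (offset + length - 1) // self.block_size
        # Fetch every block the range touches, then trim to the request.
        joined = b"".join(self.blocks.get(k, b"") for k in range(first, last + 1))
        skip = offset - first * self.block_size
        return joined[skip:skip + length]


def read_page(backend, offset, length):
    """Upper layers only see byte ranges, not the kind of back end."""
    return backend.read_range(offset, length)


data = bytes(range(32))
file_backend = FileLikeBackend(io.BytesIO(data))
kv_backend = KeyValueBackend(data, block_size=5)

print(read_page(file_backend, 7, 10) == read_page(kv_backend, 7, 10))
```

Because the reader depends only on `read_range`, a file, an object store, or a remote service could sit behind the same call.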

Slide 49) layout is similar to TTree layout

  basket -> page

  leaf -> column

  cluster -> cluster
