Name: ROOT I/O Workshop, Early Spring 2018
Start: 2018-02-21T16:00:00+01:00
End: 2018-02-21T19:50:00+01:00
Location: No location set


Attendees: Brian, Chris Jones, Peter V.G., Jim P., Philippe C., David M., Guilherme A., Danilo P., Liz S., Maciej Szymanski, Mikolaj Krzewicki, Matevz, Xavi, Axel, Andrei, Mihaela, Marcin, Oksana, Enric, Peter H., + at least one more.

Peter and Brian are pointing out problems with AsyncPrefetching: either deadlocks or corrupted buffers.

Peter: we are very interested in using this AsyncPrefetching and thus helping with the debugging.

Brian: we ought to have a miss cache that then loads all missing baskets.
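A minimal sketch of the miss-cache idea Brian describes, in plain Python rather than ROOT code. The names (MissCache, basket_index, fetch) are hypothetical, and the assumption that one miss triggers a single bulk read of every uncached basket covering the same entry is an illustration, not a description of ROOT's implementation.

```python
class MissCache:
    """Toy model: on a cache miss, load every missing basket for that entry at once."""

    def __init__(self, basket_index, fetch):
        # basket_index: {branch name: [(first_entry, last_entry), ...], one pair per basket}
        # fetch: callable taking a list of (branch, basket_no) keys, returning {key: bytes}
        self.basket_index = basket_index
        self.fetch = fetch
        self.cached = {}      # (branch, basket_no) -> bytes
        self.bulk_reads = 0   # number of calls made to fetch

    def _basket_for(self, branch, entry):
        # Linear scan is enough for a sketch; real code would use a sorted lookup.
        for number, (first, last) in enumerate(self.basket_index[branch]):
            if first <= entry <= last:
                return number
        raise IndexError(f"entry {entry} not in branch {branch}")

    def read(self, branch, entry):
        key = (branch, self._basket_for(branch, entry))
        if key not in self.cached:
            # Miss: gather the baskets of all branches that cover this entry
            # and are not cached yet, then fetch them in one request.
            wanted = [(b, self._basket_for(b, entry)) for b in self.basket_index]
            missing = [k for k in wanted if k not in self.cached]
            self.cached.update(self.fetch(missing))
            self.bulk_reads += 1
        return self.cached[key]
```

With two branches, reading one branch at a given entry also pulls in the other branch's basket for that entry, so the second read is served without another request.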

PR 240 should be mergeable. It needs a rebase and a retest.

Brian: If we extend the default, we should add a way to auto-disable it if the I/O operations are fast enough. Maybe keep an Exponential Moving Average and disable if it is below 1 ms. Maybe decide once per cluster.

Philippe: If there is ‘one’ long I/O for a given file, you may want to keep the TC on even if some (most) operations are faster.
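The two remarks above can be combined into one small policy. This is a sketch only: the class name, the smoothing factor alpha and the 'long I/O' cutoff are assumptions made for illustration. Only the 1 ms threshold and the once-per-cluster decision come from the discussion.

```python
class CacheAutoDisable:
    """Toy policy: turn the cache off when I/O is consistently fast."""

    def __init__(self, threshold_s=1e-3, alpha=0.2, long_io_s=0.1):
        self.threshold_s = threshold_s  # 1 ms, the value floated in the discussion
        self.alpha = alpha              # EMA smoothing factor (assumed)
        self.long_io_s = long_io_s      # what counts as a 'long' I/O (assumed)
        self.ema = None                 # no measurement yet
        self.saw_long_io = False

    def record(self, duration_s):
        """Feed one measured I/O duration, in seconds."""
        if duration_s >= self.long_io_s:
            self.saw_long_io = True
        if self.ema is None:
            self.ema = duration_s
        else:
            self.ema = self.alpha * duration_s + (1 - self.alpha) * self.ema

    def keep_cache_enabled(self):
        """Decision meant to be evaluated once per cluster."""
        if self.ema is None or self.saw_long_io:
            # No data yet, or at least one long I/O was seen for this file:
            # keep the cache on, following Philippe's remark.
            return True
        return self.ema >= self.threshold_s
```

The saw_long_io flag is deliberately sticky: a single slow read keeps the cache on for the rest of the file, even if the average later drops below the threshold.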

Brian: Now that the Prefill exists, should we redo the training for each file?

Philippe: The penalty can be large for a small selection on a low-bandwidth link. If we keep more statistics, we may be able to make an informed decision (e.g. don’t retrain on low-bandwidth links).

Peter: If there was a cache miss, this is a good indication that you should retrain.
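One way the retraining decision discussed above could be written down, as a sketch only. The function name, the bandwidth cutoff and the choice to let the bandwidth check win over the miss count are all assumptions. The notes do not settle how the two criteria should be weighed.

```python
def should_retrain(miss_count, bandwidth_mb_s, min_bandwidth_mb_s=10.0):
    """Decide whether to redo the cache training for the next file.

    miss_count: cache misses observed with the current training.
    bandwidth_mb_s: measured link bandwidth in MB/s.
    min_bandwidth_mb_s: assumed cutoff below which retraining is too costly.
    """
    if bandwidth_mb_s < min_bandwidth_mb_s:
        # Philippe's point: on a low-bandwidth link the penalty can be large.
        return False
    # Peter's point: misses indicate the current training no longer fits.
    return miss_count > 0
```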

Brian: Change “drop-behind” behavior

Peter: Yes, David Clark introduced this feature.

Brian: Should we also optimize for more than one tree per file?

All: this is really a framework-level use case.

David M: Is Oksana already in contact with the ATLAS I/O performance investigators? If not, then we ought to put them in contact. There is a meeting at CERN on this topic in the first week of March.

Jim: Compared to Parquet, ROOT ‘loses’ on the size of booleans when uncompressed (8 bits vs. 1 bit per value). There is also more metadata in ROOT (to allow multiple schemas in the same file).

Jim: Conclusion: Parquet is actually very similar to ROOT; it produces smaller files but is slower.

Chris: Need to avoid the repetitive writing of the partial TTree.

Peter: we need to add asynchronous prefetch to the I/O POW.

Peter: In the TBufferMerger we need to have a way to know which entry number we are at.
