Audio connection will be opened with ReadyTalk:
Conference code: 8867778
Phone numbers: https://www.readytalk.com/account-administration/international-numbers
The discussion concerns the use of structures to represent specific types of data in LArSoft.
Elements of a broader community have expressed interest, which could lead to a wider forum. This discussion keeps that in mind but focuses on LArSoft only; its outcome may be of use in that wider forum.
Three areas were proposed that could yield a specific recommendation. Robert Kutschke suggested adding a fourth.
It was agreed that, while the fourth item is outside the scope of the present recommendation, it needs to be kept in mind.
Merging the first two items was rejected on the grounds that they are distinct enough that accommodating both with a single library would risk unnecessarily degrading one or both areas. The risk is considerable: the attendees could identify only two libraries that explicitly support area 2, both developed by the physics community. This does not preclude a scenario in which the two areas are eventually served by the same library. It is also conceivable to provide the missing features, bridging from a 4D vector in a Cartesian metric to one in a Minkowski metric, through dedicated functions.
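The bridging idea can be illustrated with a minimal sketch. The type alias `Vector4D`, the function names, the choice of component 0 as the time-like one and the (+, -, -, -) signature are all assumptions made for illustration; none of them comes from a candidate library.

```cpp
#include <array>
#include <cstddef>

// Hypothetical plain 4D vector; component 0 is assumed to be the time-like one.
using Vector4D = std::array<double, 4>;

// Euclidean (Cartesian) inner product, as a generic vector library would provide.
inline double dotEuclidean(Vector4D const& a, Vector4D const& b) {
  double sum = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) sum += a[i] * b[i];
  return sum;
}

// Bridging free function: Minkowski inner product with assumed signature (+, -, -, -).
inline double dotMinkowski(Vector4D const& a, Vector4D const& b) {
  return a[0] * b[0] - (a[1] * b[1] + a[2] * b[2] + a[3] * b[3]);
}

// Invariant mass squared of a four-momentum stored as (E, px, py, pz).
inline double invariantMass2(Vector4D const& p) {
  return dotMinkowski(p, p);
}
```

In this sketch the data type stays a plain Cartesian 4D vector, and only the functions that need the Minkowski metric know about it.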
Area 3, linear algebra, should be weighted toward small data structures, as most applications do not go beyond rank 2 (matrices) and dimension 5. Exceptions can still be handled as such, using a dedicated library for a specific case if the benefit is overwhelming.
An open list of candidate libraries was presented as a starting point for discussion. The possibility of an entirely custom, newly developed library was not considered. It was instead suggested that custom interfaces of small, maintainable size could be developed to fill a usability gap, should such a gap appear in an otherwise excellent library.
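As an illustration of what "small data structures" can mean in practice, the following is a minimal sketch of a matrix whose dimensions are fixed at compile time. `SmallMatrix` and `multiply` are hypothetical names and do not represent the interface of any candidate library.

```cpp
#include <array>
#include <cstddef>

// Hypothetical fixed-size matrix: dimensions are template parameters,
// so the elements live inside the object and no heap allocation occurs.
template <std::size_t Rows, std::size_t Cols>
struct SmallMatrix {
  std::array<double, Rows * Cols> data{};  // row-major, zero-initialised

  double& operator()(std::size_t r, std::size_t c) { return data[r * Cols + c]; }
  double operator()(std::size_t r, std::size_t c) const { return data[r * Cols + c]; }
};

// Matrix product; mismatched inner dimensions fail to compile.
template <std::size_t N, std::size_t K, std::size_t M>
SmallMatrix<N, M> multiply(SmallMatrix<N, K> const& a, SmallMatrix<K, M> const& b) {
  SmallMatrix<N, M> result;
  for (std::size_t i = 0; i < N; ++i)
    for (std::size_t j = 0; j < M; ++j)
      for (std::size_t k = 0; k < K; ++k)
        result(i, j) += a(i, k) * b(k, j);
  return result;
}
```

Because the loop bounds are compile-time constants, a compiler is free to unroll or vectorise them, which is one reason fixed-size types tend to suit dimensions up to about 5.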
Most of the items in the list were quickly dismissed. For instance:
Elemental was not known to the attendees, but since it is based on BLAS it has the same limitations.
Overall, a few libraries were selected as viable candidates:
The conclusions of an investigation by the ATLAS experiment, with a purpose similar to this one, were presented at CHEP 2013. That presentation is three years old and contains some outdated information. CLHEP, Intel Math Kernel Library, ROOT and Eigen were compared on representative synthetic benchmarks. Custom implementations of matrix operations, some with explicit vectorisation, were also included. The conclusions are reflected in the choice of ATLAS to replace their CLHEP code with Eigen and their mathematical function library implementation.
An open list of features was proposed from which to pick requirements. The discussion selected the following as requirements:
One specific characterisation of portability is that the binary distribution based on Scientific Linux Fermi 6 should work on all the compatible Linux systems.
Another set of features was considered relevant:
Memory overhead is understood as memory used to store redundant data or metadata. Two examples were given. The ROOT TVector3 class, deriving from TObject, has additional inherited data members that are not necessary to define the content of a 3D vector, including a pointer to the virtual table that enables TObject polymorphism. The C++ standard std::vector allocates its content dynamically, which in common implementations adds three pointers in the object itself plus an allocation header on the heap. A further example, not discussed, is the small-matrix optimisation used by ROOT TMatrix, which always contains 25 elements to avoid dynamic allocation: if the size of the data is known in advance, this approach is always suboptimal.
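The kind of overhead described above can be measured with `sizeof`. The sketch below uses hypothetical stand-in types: `PolymorphicBase` is not TObject and only mimics the presence of a virtual table pointer, so the numbers it yields are illustrative rather than those of ROOT classes.

```cpp
#include <cstddef>
#include <vector>

// Plain aggregate: three coordinates and nothing else.
struct PlainVector3 {
  double x, y, z;
};

// Hypothetical stand-in for a polymorphic base class (this is NOT TObject).
struct PolymorphicBase {
  virtual ~PolymorphicBase() = default;
};

// Same payload, but the object also carries a pointer to the virtual table.
struct PolymorphicVector3 : PolymorphicBase {
  double x, y, z;
};

// Bytes strictly needed to describe a 3D vector in double precision.
constexpr std::size_t kPayloadBytes = 3 * sizeof(double);

// Bytes stored in the object beyond the payload.
template <typename T>
constexpr std::size_t inObjectOverhead() {
  return sizeof(T) - kPayloadBytes;
}

// For std::vector the whole object is bookkeeping: the payload sits on the heap,
// where the allocator may add its own header (not visible through sizeof).
constexpr std::size_t vectorHandleBytes() {
  return sizeof(std::vector<double>);
}
```

On common platforms `inObjectOverhead<PlainVector3>()` is zero, while the polymorphic variant pays at least one pointer per object.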
A point was made that memory overhead is often traded for execution speed. For example, storing only the upper triangular part of a symmetric matrix roughly halves its memory footprint, but might slow down operations on it.
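A minimal sketch of this trade-off follows, assuming row-major packed storage of the upper triangle. The class name `PackedSymmetricMatrix` and its interface are hypothetical. An n x n matrix stores n(n+1)/2 elements instead of n², at the price of an index computation and a possible swap on every access.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical symmetric matrix storing only its upper triangle, row by row.
class PackedSymmetricMatrix {
public:
  explicit PackedSymmetricMatrix(std::size_t n)
    : fDim(n), fData(n * (n + 1) / 2, 0.0) {}

  double& at(std::size_t i, std::size_t j) { return fData[index(i, j)]; }
  double at(std::size_t i, std::size_t j) const { return fData[index(i, j)]; }

  std::size_t storedElements() const { return fData.size(); }

private:
  // Maps (i, j) to the packed position; this extra work on each access
  // is the speed cost paid for the reduced memory.
  std::size_t index(std::size_t i, std::size_t j) const {
    assert(i < fDim && j < fDim);
    if (i > j) std::swap(i, j);  // element (i, j) equals element (j, i)
    return i * fDim - i * (i + 1) / 2 + j;
  }

  std::size_t fDim;
  std::vector<double> fData;
};
```

For a 5 x 5 matrix this stores 15 elements rather than 25.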
Converting data to a different format is often necessary when using libraries that do not support the original format. One proposed example is the Fourier transform from the FFTW library, which expects its data to be stored in a contiguous area of double-precision real numbers. How direct such a conversion is will typically depend on the internal data representation: the more convoluted that representation, the more likely a conversion by copy is needed.
The ability to use ROOT-serialised classes in an environment other than art (ROOT interactive console, Python) is not considered endangered, as techniques to overcome the issues are known and acceptably convenient.
The language of implementation is a moot point given the current selection of candidates, all in C++. Whether the libraries are header only or not is considered irrelevant. Support for sparse data, although not irrelevant to the LArSoft use case, should not affect the decision.
A judgement of how easy it is to write code with the library was also proposed as a criterion. Proper education of the community should resolve the issue. Nevertheless, history shows that this has seldom happened in LArSoft. Resources should be actively devoted to an education effort proportional to the learning difficulty. Moderate usability barriers can be overcome with additional custom interfaces. This too is a balance between performance improvement, maintenance of the interface, and steepening of the learning curve.
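The difference between a direct hand-over and a conversion by copy can be sketched without involving FFTW itself. In the example below `Sample`, `asContiguous`, `copyToContiguous` and `sumContiguous` are hypothetical names; `sumContiguous` merely stands in for any external routine that requires a contiguous array of doubles.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sample type that interleaves metadata with the value.
struct Sample {
  int channel;
  double value;
};

// Representation A: the values are already contiguous doubles,
// so a pointer can be handed over with no copy.
inline double const* asContiguous(std::vector<double> const& waveform) {
  return waveform.data();
}

// Representation B: values are interleaved with other members,
// so they must be copied into a contiguous buffer first.
inline std::vector<double> copyToContiguous(std::vector<Sample> const& samples) {
  std::vector<double> buffer;
  buffer.reserve(samples.size());
  for (Sample const& s : samples) buffer.push_back(s.value);
  return buffer;
}

// Stand-in for an external routine that expects contiguous doubles.
inline double sumContiguous(double const* data, std::size_t n) {
  double sum = 0.0;
  for (std::size_t i = 0; i < n; ++i) sum += data[i];
  return sum;
}
```

With representation A the external routine reads the original memory; with representation B every call pays for an allocation and a pass over the data.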
The choice of a library among the surviving candidates must be informed by tuned performance benchmarks. The suggested path is to:
Representative cases were quickly identified at the meeting. The MicroBooNE detector electronics response and the subsequent reconstruction are good use cases. The detector simulation is potentially dominated by Geant4, over which we have no leverage, and was therefore discarded.
To isolate the relevant components, a profiling procedure was suggested that counts the calls to relevant objects (vectors, matrices, etc.). The call stack can point to the originating code, from which the usage pattern can be read.
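The meeting did not specify a profiling tool. Purely as an illustration of what counting calls to relevant objects means, the sketch below uses manual instrumentation rather than a profiler: a hypothetical `CountedVector3` reports each operation to a hypothetical `CallCounter`. It does not capture the call stack, which a real profiler would provide.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical registry of call counts, keyed by operation name.
class CallCounter {
public:
  void record(std::string const& operation) { ++fCounts[operation]; }

  std::size_t count(std::string const& operation) const {
    auto const it = fCounts.find(operation);
    return (it == fCounts.end()) ? 0 : it->second;
  }

private:
  std::map<std::string, std::size_t> fCounts;
};

// Hypothetical instrumented 3D vector: every operation reports to a shared counter.
struct CountedVector3 {
  double x, y, z;

  static CallCounter& counter() {
    static CallCounter instance;
    return instance;
  }

  double dot(CountedVector3 const& other) const {
    counter().record("dot");
    return x * other.x + y * other.y + z * other.z;
  }

  CountedVector3 operator+(CountedVector3 const& other) const {
    counter().record("add");
    return {x + other.x, y + other.y, z + other.z};
  }
};
```

After a representative job has run, the counts indicate which operations dominate and therefore which ones a benchmark should exercise.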