Notes from the framework discussion session

What are the gains? (slide 2 of framework discussion)

CMS is now concerned with more than just the memory use of the applications. Other important issues are the management of many files, I/O, and the scaling of the batch systems to handle the number of jobs they want to have running and to adequately schedule multiple cores on nodes. Multicore jobs still appear to be causing disruptions in service due to overloading of the I/O systems on the farm nodes. They are still hopeful that multicore processing will yield performance very close to that of the current single-threaded applications, which it cannot yet match. They are also targeting LS1 and LS2.

Some of the factors that are meant by performance of the applications:
1. maximizing throughput - many speakers showed this to indicate performance
2. minimizing response time - this was only mentioned indirectly and not really quantified with regard to utilizing more GRID resources than are utilized now
3. minimizing resource use - memory appears to be the big one here for the LHC experiments
4. maximizing utilization of the computing resources - this was sometimes discussed in other sessions with regard to algorithm development using vectorization

ATLAS pointed out that memory use is going to be an even bigger problem at higher luminosity. They mentioned again that whole-node processing using the current application and a python script is straightforward and performs well. They also brought up initialization as an impediment to good multi-process application performance. They said that this might be one of the reasons for going multi-threaded, and it might be the primary reason.

Where these frameworks fit into the whole picture was discussed. The usefulness of these HEP frameworks was confirmed during the conversation. They still have an important role in scheduling units of work, managing data and provenance, abstracting out the I/O, and providing APIs to assemble processing sequences at the right granularity. As we move into accelerators and co-processors, everyone who spoke up at the meeting thought that these framework features would continue to be useful. There were no strong opinions that these frameworks would be made available to run on the reduced platforms (co-processors and accelerators). Nearly everyone believes that multithreading within the framework is useful and necessary, partly to increase throughput by better I/O scheduling and operation at the event level across the processing of a dataset instead of at the file level, and also to share other resources that consume memory.

Where is more work needed? (slide 3)

ROOT thread safety issues were discussed. The ROOT team claims that the replacement for CINT (Cling) will help with this. ROOT mentioned the possible need to utilize threading resources for managing I/O behind the scenes. Most groups did not particularly like this. CMS suggested having ROOT invoke an experiment-specific function to request tasks to be scheduled by the framework using its own work scheduling mechanism (see the first sketch below). CMS also said that I/O has to be synchronized and that multiple threads need to be able to read from the same event. The intensity frontier experiments said they want the ability to write multiple event processing streams into the same file concurrently. ROOT suggested using a client-server model, with a node to serialize the work down to one running process that writes the data (see the second sketch below). Development is needed for this, but it fits well into the current ROOT model.
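A minimal sketch of the CMS suggestion, using hypothetical names (TaskScheduler, IOLayer, SetScheduler, and Prefetch are illustrative, not an actual ROOT interface): the I/O library is handed an experiment-provided callback and submits its internal work through it, so the framework's own scheduler stays in charge of all threading resources.

    #include <functional>
    #include <iostream>

    // What the framework would expose to the I/O layer: "run this task for me".
    using TaskScheduler = std::function<void(std::function<void()>)>;

    // Stand-in for a ROOT-like I/O component with background work
    // (read-ahead, decompression) it might otherwise run on its own threads.
    class IOLayer {
    public:
        void SetScheduler(TaskScheduler s) { schedule_ = std::move(s); }

        void Prefetch(int entry) {
            // Instead of spawning its own thread, the I/O layer hands the
            // work back to the experiment's framework for scheduling.
            schedule_([entry] {
                std::cout << "decompressing entry " << entry << "\n";
            });
        }

    private:
        TaskScheduler schedule_;
    };

    int main() {
        IOLayer io;
        // The experiment registers its scheduling mechanism. Here it simply
        // runs the task inline; a real framework would enqueue it into its
        // own task scheduler instead.
        io.SetScheduler([](std::function<void()> task) { task(); });
        io.Prefetch(42);
    }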
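And a minimal sketch of the single-writer idea, again with hypothetical names: several concurrent event-processing streams funnel their output through a queue to one owner of the file, which performs all writes serially. In the suggested client-server model the writer would be a separate process; a thread is used here only to keep the example self-contained.

    #include <condition_variable>
    #include <fstream>
    #include <mutex>
    #include <queue>
    #include <string>
    #include <thread>
    #include <vector>

    class SerialWriter {
    public:
        explicit SerialWriter(const std::string& path)
            : out_(path), writer_([this] { Run(); }) {}

        ~SerialWriter() {
            { std::lock_guard<std::mutex> lock(m_); done_ = true; }
            cv_.notify_one();
            writer_.join();
        }

        // Called concurrently by the event-processing streams.
        void Write(std::string record) {
            { std::lock_guard<std::mutex> lock(m_); queue_.push(std::move(record)); }
            cv_.notify_one();
        }

    private:
        void Run() {
            std::unique_lock<std::mutex> lock(m_);
            while (!done_ || !queue_.empty()) {
                cv_.wait(lock, [this] { return done_ || !queue_.empty(); });
                while (!queue_.empty()) {
                    std::string rec = std::move(queue_.front());
                    queue_.pop();
                    lock.unlock();
                    out_ << rec << '\n';   // only this thread touches the file
                    lock.lock();
                }
            }
        }

        std::ofstream out_;        // one file, one writer
        std::mutex m_;
        std::condition_variable cv_;
        std::queue<std::string> queue_;
        bool done_ = false;
        std::thread writer_;       // declared last: started after the rest is ready
    };

    int main() {
        SerialWriter w("events.out");
        std::vector<std::thread> streams;
        for (int i = 0; i < 4; ++i)
            streams.emplace_back([&w, i] {
                for (int ev = 0; ev < 100; ++ev)
                    w.Write("stream " + std::to_string(i) +
                            " event " + std::to_string(ev));
            });
        for (auto& t : streams) t.join();
    }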
Others pointed out that histograms and other operations on collections are affected by the coarse locking that ROOT either employs or is planning to employ. CMS suggested that ROOT needs to document what is const and what is really not const under the covers (mutable), and what the side effects are (a sketch of this distinction appears at the end of these notes). Interacting with the type system should follow the C++ standard and be thread safe. The ROOT team says it is a lot cleaner than it used to be.

The number of cores on “big core” machines has not been increasing as was thought last year. We are not even seeing 64-core “big core” machines in use. In fact, the reasonably priced Intel machines might have fewer than 16 real cores. What this could mean is that the smaller, more specialized computing resources will become much more important. Our frameworks will need to be able to efficiently schedule and manage work carried out by the Xeon Phi, the K20, and the GPUs beyond it.

Performance? (slide 4)

Many are still looking for better tools to evaluate the performance of application code and better means of learning to use these tools. OpenLab is looking for additional use cases for a benchmarking suite. We are in need of standard ways to measure performance, as mentioned earlier, and to characterize gains. We must be careful to retest on the standard “big core” machines after algorithm changes are made, because many times such changes make the applications run faster on all platforms. The best number of tasks to generate and number of threads to allocate is still largely unknown. The frameworks seem to be putting controls in for adjusting these parameters for various workloads and scenarios.

Summary items

The area of performance studies and good measures of performance increases is one we can work together on. Clearly stating the requirements that will be necessary for efficient I/O is another area we can work together on. The frameworks have a much clearer direction than last year. The performance increases from multithreading on big-core processors are still a hope at this point, and more work is necessary to demonstrate good results. The prototype/demonstration systems are starting to move into real projects. One major area that needs to be added to make things more real is I/O, which is expected to be a bottleneck.
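On the const/mutable point under slide 3, a minimal illustration (a hypothetical class, not ROOT code) of the kind of hidden side effect that needs documenting: a method that is const at the interface but mutates cached state under the covers, which is only safe to call concurrently if that mutable state is synchronized.

    #include <mutex>

    class Histogram {
    public:
        // Logically const: callers see no change, but the method mutates
        // hidden cache state. This is exactly what needs documenting.
        double Mean() const {
            std::lock_guard<std::mutex> lock(cacheMutex_);  // makes the hidden mutation thread safe
            if (!cacheValid_) {
                cachedMean_ = sum_ / entries_;
                cacheValid_ = true;
            }
            return cachedMean_;
        }

    private:
        double sum_ = 0.0;
        long entries_ = 1;
        // The "really not const under the covers" part: without the mutex,
        // two concurrent Mean() calls would race on these members.
        mutable std::mutex cacheMutex_;
        mutable double cachedMean_ = 0.0;
        mutable bool cacheValid_ = false;
    };

    int main() {
        Histogram h;
        return h.Mean() == 0.0 ? 0 : 1;
    }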