Notes on Topic 4 Session 1

Topics in the document:
- Visualization
- Scientific and development workflows, including human components
- Regression and validation software/processes
- New computer hardware architectures: short, medium, and long term; multi-threading
- Software frameworks and interfaces
- Organization of common/shared components, including policies

We don't need fully polished requirements after Session 1 on this topic, but a set of materials to get started -- seed the discussion.

How to enable the use of new computing hardware, e.g. DOE HPC facilities.

Different roles: algorithm developer, algorithm tester/validator, and science data analyst.

Important takeaways from this morning's plenary talks: a common language for communicating requirements and performance; sharing of things other than code is important to discuss. The discussion of documentation this morning was different from the discussion of code.

Let's not try to combine everything into a single use case, but rather have multiple use cases, especially if the resulting requirements are orthogonal or pull in different directions. Use cases go in the section before requirements.

Online visualization -- multiple use cases:
- Checking detector performance and whether the DAQ is collecting new events.
- DAQ streams: would we like to select events at the DAQ level to be picked later? Separate data streams for examining detector performance.
- Can the event display rerun reconstruction? Back up and reprocess an event? Interactive iteration of re-reconstruction of the same event -- what happens to the provenance in this case? When tuning reconstruction interactively with changing parameters, what is the meaning of the output file?
- What is the requirement on the use of an event display for non-automated reconstruction? In ICARUS, the very first step is automatic: rejection of empty triggers and non-fiducial interactions.
The next step is visual classification; after the objects have been selected, automatic reconstruction of the hand-selected clusters can happen in batch mode. Visualization is important for tuning the selection and filtering. Initially, when an experiment comes online, the data are surprising. Require the ability to record what the hand-scanner inputs into the process. ICARUS selects hits for clusters this way, and the answers are put in the output root file.

Event displays:
- Animation.
- Color matching between 2D and 3D event displays.
- VR-compatible (can be hard on the users).
- Event displays on the web; Android apps for viewing event displays.

Event picker:
- Is it during automated reconstruction that events of interest are identified and listed?
- Central database of picked events? Is an event picker a separate database of interesting events?
- Can metadata be updated after an event has been reconstructed?
- Picked events have to be aligned with the reconstruction pass -- different events may be selected on different passes.
- Some requirements may be requirements of SAM and not LArSoft.

Documentation:
- Limits on pushing code without documentation. Librarian and release managers; in CMS, only librarians have push privileges to the central repositories.
- Use case: want to be able to read a document instead of code. Don't want to throw barriers in the way of development.
- User guide vs. reference manual. Example: cppreference.com's description of std::map is much easier to read than the header file. The documentation needed to use a class or method is different from what is needed to modify or develop it. Doxygen produces neither a user guide nor a reference manual.
- Citation of documentation is an important incentive for algorithm designers. FERMILAB-TN numbers are citeable and carry the developer's name on the memo, even if it's not possible to send the work to NIM. Use case: mcshower is not really appropriate for NIM, but a FERMILAB-TN is possible. Collaboration public notes.
- Documentation can get out of date.
- LXR: cross-referencing of source code.
- Units: important to say what units numbers are in.
- Erica mentioned INSPIRE. Could we also use arXiv?

HPC:
- Use case: use a leadership-class computing facility. Clarify what the workflow is. Ability to reprocess data in a finite amount of time.
- What HPCs are going to be like in the future is an interesting question.
- What sort of modifications are required of our code to be able to run on an HPC? What is needed to make LArSoft multi-threaded? Thousands of threads? Distributed parallelism instead of multithreading.
- HPCs specify the compiler, libraries, and machines. Libraries are optimized for the HPC, and users are required to use them.
- Optical simulation of many photons.
- Geant4 v10 is multi-thread capable, but each event is given its own thread.