LArSoft Coordination Meeting

US/Central
Zoom-only

Description

To connect via Zoom:  Meeting ID 831-443-820

Password distributed with meeting announcement

(See instructions for setting Zoom default to join a meeting with audio and video off: https://larsoft.org/zoom-info/)

PC, Mac, Linux, iOS, Android:  https://fnal.zoom.us/j/831443820

Phone:

H.323:

162.255.37.11 (US West)
162.255.36.11 (US East)
213.19.144.110 (EMEA)
See https://fnal.zoom.us/ for more information
 

At Fermilab:  no in-person presence at the lab for this meeting

 

Erica: Release and project report

  • Herb noted one of his PRs is missing. May be on the fork? Will investigate.

    • It is a bug fix, so it can go into this release if it can be recovered and approved quickly

 

Erica: 2021 LArSoft Work Plan summary

  • Hans: Photon simulation in G4 is already capable of running on GPU; it is just a matter of a build switch. We should look into that

    • Erica: This would be a hybrid solution, given that existing production platforms are grid-based. Mike has worked on allowing access to GPU from the grid. Hoping to see this operate at production scale

    • Mike noted that his solution is directed at machine learning; it would be more difficult to do what Hans is suggesting.

    • Mike/Hans should talk at some point to better understand what would be needed to make it work

  • Krzysztof: The next version of G4 will support execution on accelerators, so moving toward HPC may not be as difficult with G4 as we might initially believe

    • Erica: This is a direction we believe we need to go, so we will be interested to learn how to do this.

    • We would then want to find an experiment interested in pursuing one or both of these options, and we will collaborate with them on that.

 

Kyle Knoepfel: Concurrent cache support

  • Intro

    • art has supported concurrent events since June 2018

    • Many experiment algorithms not designed with multi-threading / concurrency in mind

    • In pursuing MT upgrades in LArSoft, the need for a concurrent caching system for conditions information became apparent

    • Unlike CMS, art does not have a dedicated conditions system

      • Has led experiments to pursue their own solutions

      • Closest art has is concept of "producing" services

    • This work is intended to provide a different solution

  • Previous idea: Producing services

    • Can insert data products in a serialized context immediately after the principal has been created

      • DB queries can be made in a controlled fashion

    • Access to data products is thread-safe, so users need not be concerned about thread-safety

    • For simple and small conditions info, this is a good approach

  • Downside

    • Potentially memory-expensive, unless a caching mechanism is developed

    • Significant breaking change for configurations

    • Shift in the mental model of what data products are for

  • Can framework adopt a conditions system like CMS?

    • Largely no; it would require significant analysis to determine what implementation, interface, and scheduling adjustments would be necessary

    • The art framework is "feature frozen"

      • Small framework-agnostic features have been implemented, but large-scale dev has been halted

    • Less efficient, framework-agnostic, concurrent caching utility could be developed

  • Assumptions

    • Must support associative list of user-defined key-value pairs

    • Insertion, retrieval, and (perhaps implicit) erasure of entries, plus any locking needed

    • Access shall be const/immutable (so no locks needed after retrieval)

    • Once access to an entry has been granted, no locking should be needed to use it

    • Implementation cannot remove a cache entry if it is being used on any thread

    • Retrieval by key or quantity that can be transformed to at most one key

  • Implementation

    • A class template in hep_concurrency (already in art), based on TBB's concurrent containers

    • Used via the class template hep::concurrency::cache<...>

    • ...Described lookup interface with examples...

  • Cache handles

    • Provides access to a cache entry

      • const access to the key, the value, and the cache entry's sequence number

    • Valid vs invalid

      • convertible to boolean true or false, respectively

      • Dereferencing invalid handle results in exception

    • Valid handles can be copied and moved. (The moved-from handle becomes invalid)

    • Can be compared

      • "==" and "!=", depending on whether they point to same entry, or diff entries.

    • Are reference counted.

      • A cache entry will not be deleted as long as at least one valid handle points to it

  • Cache entries

    • Explicit call needed to drop unused entries

      • Can keep last N most recent entries

      • Recency determined by "sequence number" corresponding to when it was inserted into the cache

    • To avoid unnecessary locking, the cache includes an auxiliary data structure that cannot shrink during concurrent processing

    • If serialized execution can be guaranteed, the shrink_to_fit() function may be called, removing all unused entries from the cache and from the auxiliary structure

    • Inserting

      • Done via emplacement: cache.emplace(...)

      • emplace may be called concurrently, but be mindful of the efficiency and thread-safety issues in creating its arguments

      • Talk to scisoft-team if concerned about this being a problem

  • Next plans

    • Will be released concurrently with the art 3.07 suite

    • Expect this to be most useful to art service authors

    • Please let us know if you have concerns or suggestions
