LArSoft Coordination Meeting

US/Central
Zoom-only

Description

To connect via Zoom:  Meeting ID 831-443-820

Password distributed with meeting announcement

(See instructions for setting Zoom default to join a meeting with audio and video off: https://larsoft.org/zoom-info/)

PC, Mac, Linux, iOS, Android:  https://fnal.zoom.us/j/831443820

Phone:

H.323:

162.255.37.11 (US West)
162.255.36.11 (US East)
213.19.144.110 (EMEA)
See https://fnal.zoom.us/ for more information
 

At Fermilab:  no in-person presence at the lab for this meeting

 

Erica: Release and project report

  • Herb noted one of his PRs is missing. May be on the fork? Will investigate.

    • It is a bug fix, so it can go into this release if it can be recovered and approved quickly

 

Erica: 2021 LArSoft Work Plan summary

  • Hans: Photon simulation in G4 is already capable of running on GPU; it is just a matter of a build switch. We should look into that

    • Erica: This would be a hybrid solution, given that existing production platforms are grid-based. Mike has worked on allowing access to GPU from the grid. Hoping to see this operate at production scale

    • Mike noted that his solution is directed at machine learning; it would be more difficult to do what Hans is suggesting.

    • Mike/Hans should talk at some point to better understand what would be needed to make it work

  • Krzysztof: The next version of G4 will support execution on accelerators, so moving toward HPC may not be as difficult with G4 as we might initially believe

    • Erica: This is a direction we believe we need to go, so we will be interested to learn how to do this.

    • We would then want to find an experiment interested in pursuing one or both of these options, and we will collaborate with them on that.

 

Kyle Knoepfel: Concurrent cache support

  • Intro

    • art has supported concurrent events since June 2018

    • Many experiment algorithms not designed with multi-threading / concurrency in mind

    • In pursuing MT upgrades in LArSoft, the need for a concurrent caching system for conditions information became apparent

    • Unlike CMS, art does not have a dedicated conditions system

      • Has led experiments to pursue their own solutions

      • Closest art has is concept of "producing" services

    • This work is intended to provide a different solution

  • Previous idea: Producing services

    • Can insert data products in a serialized context immediately after the principal has been created

      • DB queries can be made in a controlled fashion

    • Access to data products is thread-safe, so users need not be concerned about thread-safety

    • For simple and small conditions info, this is a good approach

  • Downside

    • Potentially memory-expensive, unless a caching mechanism is developed

    • Significant breaking change for configurations

    • Shift in the mental model of what data products are for

  • Can framework adopt a conditions system like CMS?

    • Largely no; it would require significant analysis to determine what implementation, interface, and scheduling adjustments would be necessary

    • The art framework is "feature frozen"

      • Small framework-agnostic features have been implemented, but large-scale dev has been halted

    • Less efficient, framework-agnostic, concurrent caching utility could be developed

  • Assumptions

    • Must support associative list of user-defined key-value pairs

    • Insertion, retrieval, and (perhaps implicit) erasure of entries, plus any locking needed

    • Access shall be const/immutable (so no locks needed after retrieval)

    • Once access to an entry has been granted, no locking should be needed to use it

    • Implementation cannot remove a cache entry if it is being used on any thread

    • Retrieval by key or quantity that can be transformed to at most one key

  • Implementation

    • A class template in hep_concurrency (already in art), based on TBB's concurrent containers

    • Used via the class template hep::concurrency::cache<...>

    • ...Described lookup interface with examples...

  • Cache handles

    • Provides access to a cache entry

      • const access to the key, the value, and the cache entry's sequence number

    • Valid vs invalid

      • convertible to boolean true or false, respectively

      • Dereferencing invalid handle results in exception

    • Valid handles can be copied and moved. (The moved-from handle becomes invalid)

    • Can be compared

      • "==" and "!=", depending on whether they point to same entry, or diff entries.

    • Are reference counted.

      • A cache entry will not be deleted as long as at least one valid handle points to it

  • Cache entries

    • Explicit call needed to drop unused entries

      • Can keep last N most recent entries

      • Recency determined by "sequence number" corresponding to when it was inserted into the cache

    • To avoid unnecessary locking, the cache includes an auxiliary data structure that cannot shrink during concurrent processing

    • If serialized execution can be guaranteed, the shrink_to_fit() function may be called, removing all unused entries from the cache and from the auxiliary structure

    • Inserting

      • Done via emplacement: cache.emplace(...)

      • emplace may be called concurrently, but be mindful of the efficiency and thread-safety issues in creating its arguments

      • Talk to scisoft-team if concerned about this being a problem

  • Next plans

    • Will be released concurrently with the art 3.07 suite

    • Expect this to be most useful to art service authors

    • Please let us know if you have concerns or suggestions
