To connect via Zoom: Meeting ID 831-443-820
Password distributed with meeting announcement
(See instructions for setting Zoom default to join a meeting with audio and video off: https://larsoft.org/zoom-info/)
PC, Mac, Linux, iOS, Android: https://fnal.zoom.us/j/831443820
Phone:
https://fnal.zoom.us/zoomconference?m=SvP8nd8sBN4intZiUh6nLkW0-N16p5_b
H.323:
162.255.37.11 (US West)
162.255.36.11 (US East)
213.19.144.110 (EMEA)
See https://fnal.zoom.us/ for more information
At Fermilab: no in-person presence at the lab for this meeting
Release and project report
Comments:
Wes Ketchum: New PR from Pandora?
Bug fix for Pandora. Needed by ICARUS.
Andy said it was submitted earlier today
Lynn Garren: Should go in before the Pandora PRs being discussed today. Is that ok? A: yes.
Wes: ICARUS is particularly interested in larsim#44, promote OpFastScintillation methods from protected to public
SciSoft saw no issues (although this speaks to architectural problems, so there may be follow-up), so it is just waiting for L2 approval
Andy Chappell: Pandora deep learning pull requests
Active pull requests for larpandoracontent and larpandora add LibTorch support to Pandora
Relevant libraries can be built/linked with the cmake/mrb option -DPANDORA_LIBTORCH=ON/OFF for larpandoracontent / larpandora respectively; the build also checks for the presence of LibTorch and falls back to the standard build if it is missing
Default is ...
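As a minimal sketch of the configure step described above (assuming a standalone cmake invocation; the source path is illustrative, and in an mrb development area the same -D option would be passed through to cmake):

```shell
# Configure larpandoracontent with LibTorch support enabled
# (illustrative source path, not taken from an actual build recipe)
cmake -DPANDORA_LIBTORCH=ON ../larpandoracontent

# Or disable it explicitly; if LibTorch is not found on the system,
# the build falls back to the standard (non-DL) configuration anyway
cmake -DPANDORA_LIBTORCH=OFF ../larpandoracontent
```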
DL library in Pandora
Diagram
Key additions are LArDLContent and DLMasterAlgorithm
DLMasterAlgorithm is a subclass of MasterAlgorithm, so it registers the DL algorithms alongside the standard algorithms
Using the LArDLContent library
If LibTorch is not needed, Pandora can be used entirely unchanged; to use the LibTorch functionality, an alternative master algorithm is used instead: LArDLMaster
If this is the only XML change, everything will run as if the LArMaster algorithm were still in use
Useful for validating the build while leaving all outputs unchanged
When built with -DPANDORA_LIBTORCH=ON, larpandora registers the DL algorithms, which can then be included via the top-level XML
To use a LibTorch-based algorithm, the algorithm should live in the larpandoradlcontent branch of the larpandoracontent repository (in the lar_dl_content namespace) and then be added to the appropriate worker XML
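A sketch of the XML swap described above; the element and attribute style is modeled on Pandora's settings files, but the exact contents here are illustrative, not copied from the real configuration:

```xml
<!-- Standard configuration: the usual master algorithm -->
<algorithm type = "LArMaster">
    <!-- existing worker settings unchanged -->
</algorithm>

<!-- DL configuration: swap in the LibTorch-aware master algorithm.
     If this is the only change, output should match the standard run. -->
<algorithm type = "LArDLMaster">
    <!-- existing worker settings unchanged -->
</algorithm>
```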
LibTorch performance implications
Investigation of the recently observed performance issues when running in the LArSoft context is ongoing, so it is currently not desirable to run the DL algorithms.
As such, although the DL functionality is included by default in these PRs, no performance hit is expected while the LibTorch performance issue is investigated
Main aim was just to make this functionality available
In the near term, track/shower discrimination and vertexing networks can be tested within the existing reco chain
Longer term: investigating the use of networks across different areas of the reconstruction, from clustering through to algorithm selection
Also eager to use GPUaaS
Summary
PRs provide LibTorch DL support
Goal is to allow functionality for development and testing
No performance hit expected from current LibTorch issues
Patrick Gartung: Update on LibTorch investigation
LibTorch will pick the best algorithm based on the CPU it is being run on. The "ups" build disables the inclusion of the mkldnn library out of concern that it might cause an illegal instruction on an AMD CPU (e.g., somewhere on the grid). PG tried a LibTorch build with mkldnn enabled on an AMD CPU system, and it fell back to using libopenblas, which has an algorithm that works with the AMD instruction set. mkldnn only provides a performance boost on Intel processors with AVX512 registers.
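A quick way to see which case a given worker node falls into is to check the CPU flags; this is a Linux-specific sketch (on systems without /proc/cpuinfo it simply reports the fallback case):

```shell
# Probe whether the host CPU advertises AVX-512, the fast path mkldnn
# needs; otherwise expect the libopenblas fallback described above
flags=$(grep -m1 '^flags' /proc/cpuinfo 2>/dev/null || true)
case "$flags" in
    *avx512*) echo "AVX-512 available: mkldnn fast path usable" ;;
    *)        echo "AVX-512 not available: expect libopenblas fallback" ;;
esac
```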
So the speedup seems to come where it uses just-in-time compilation
The good news: Can build it the way Andy built it, and when it lands on the right kind of machine, it will be performant.
How often will it be the right machine?
Depends. And hard to say for the general case
Alex Himmel: Can we put something in the ClassAds to require certain instruction sets? Talk to the FIFE people