Scaling ML meeting

US/Central

Present: Paolo, Walter, Rafa, Aishik, Alina, Atif, Rui, Verena, Doug

NSBI Paper: https://cds.cern.ch/record/2869862/files/ATL-SOFT-PROC-2023-023.pdf

Original NSBI paper (MadMiner): Brehmer, Cranmer, Louppe, and Pavez (2018)

NNs substitute for histograms as probabilistic models of the data for inference. Unlike Brehmer et al. (a single NN), here we use ensembles of networks plus extra NNs that correct for systematics. Essentially, the ratio of normalized bin contents is replaced by an ensemble of networks. The reference sample defines the support you can study, so the first step is to train a NN to determine whether an event lies within the support region. Safely inside the support, use NSBI; outside the support, fall back to a binned log-likelihood.

Key insight from MadMiner: we don’t need to estimate probability densities, only density ratios. The denominator is any “reference sample” that populates the region of phase space we are interested in.
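To make the density-ratio trick concrete, a minimal sketch in TensorFlow (the framework mentioned below), assuming toy data and a toy-size classifier; all names, shapes, and hyperparameters here are illustrative, not the actual analysis setup:

    import numpy as np
    import tensorflow as tf

    # Toy stand-ins for the two samples (purely illustrative):
    # x_target is the process of interest, x_ref the reference sample.
    rng = np.random.default_rng(0)
    x_target = rng.normal(0.0, 1.0, size=(10_000, 4)).astype("float32")
    x_ref = rng.normal(0.5, 1.2, size=(10_000, 4)).astype("float32")

    x = np.concatenate([x_target, x_ref])
    y = np.concatenate([np.ones(len(x_target)), np.zeros(len(x_ref))]).astype("float32")

    # A deliberately small MLP; the analysis uses nets 5-6 layers deep and
    # thousands of neurons wide, in ensembles of hundreds.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")
    model.fit(x, y, epochs=5, batch_size=256, verbose=0)

    # For a calibrated classifier trained on equal-size samples,
    # s(x) approximates p(target | x), so the density ratio is
    #   r(x) = p_target(x) / p_ref(x) = s(x) / (1 - s(x)).
    s = model.predict(x_ref, verbose=0).ravel()
    ratio = s / (1.0 - s)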

As for OmniFold, once you know the density ratios you can estimate any statistic (mass, BR, …) in your support region.
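Continuing the sketch above (the observable is invented for illustration): with ratios in hand, any expectation under the target density is a weighted average over reference-sample events.

    # E_target[f] ≈ Σ_i r(x_i) f(x_i) / Σ_i r(x_i), using the ratios above.
    mass = x_ref[:, 0]  # pretend the first feature is an invariant mass
    mean_mass_target = np.average(mass, weights=ratio)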

Unbinned analyses are sensitive to biases and spurious effects coming from background shapes. NNs trained to have low bias tend to have large variance. Large NNs, 5-6 layers deep and thousands of neurons wide, reduce the bias, and ensembles of hundreds of NNs reduce the variance. For systematics one NN is usually enough, but ensembles are used when precision is needed.
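A sketch of the ensembling step, assuming `models` holds independently trained classifiers; whether to average scores, ratios, or log-ratios is an implementation choice the minutes do not specify (this sketch averages ratios):

    def ensemble_ratio(models, x):
        # Independent initialisations and trainings leave the per-network
        # errors partially uncorrelated, so the ensemble mean has lower
        # variance than any single member.
        ratios = []
        for m in models:
            s = m.predict(x, verbose=0).ravel()
            ratios.append(s / (1.0 - s))
        return np.mean(ratios, axis=0)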

Total of O(10**4) NNs per process, O(10) processes per analysis. Roughly half a day to train each MLP; NNs coded in TensorFlow. ~500 GPUs at SMU, more or less constantly used.

Trivial to parallelize, since each training is independent. Developed metrics for HPO (reweighting, calibration, etc.).
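Because every (process, systematic variation, ensemble member) training is independent, the job list is a plain cross product; a hypothetical enumeration (all names invented for illustration):

    from itertools import product

    processes = ["ggF", "VBF"]          # illustrative process names
    variations = ["nominal", "sys_up"]  # illustrative systematic variations
    n_ensemble = 100                    # hundreds of members per ensemble

    jobs = [
        {"process": p, "variation": v, "seed": k}
        for p, v, k in product(processes, variations, range(n_ensemble))
    ]
    # Each entry trains one MLP with its own seed; with no cross-talk
    # between jobs, they can be farmed out one per GPU by any scheduler.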

Memory usage: save 10**4 floats per NN; 500 GB–1 TB of memory needed for fast inference.
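One plausible reading of the fast-inference setup (an assumption, not stated in the minutes): each network is evaluated once and its per-event scores are cached, so later likelihood evaluations are array arithmetic rather than repeated forward passes.

    # Evaluate every ensemble member once on the reference events and cache
    # the scores; subsequent inference only reads the cached array.
    scores = np.stack([m.predict(x_ref, verbose=0).ravel() for m in models])
    np.save("cached_scores.npy", scores)  # hypothetical cache location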

NSBI is applicable broadly to many analyses, with bigger gains for highly non-linear distributions.

Future work: optimize the NSBI chain to reduce resource consumption; transition from MLPs to more advanced architectures that capture symmetries (Lorentz-equivariant networks, GNNs, ...).

All tools for distributed training, testing, and management were developed in-house and are described in the paper.

Playing with the possibility of reinterpretation; the proof of concept is not yet ready for public consumption.

What would you do with 10x more resources?

We don't need more than 500 GPUs, but we would need to keep finding this level of GPU resources for every NSBI analysis.

Aishik and Rafael are interested in a collaboration with SML on workflows for distributed training on HPC; there are also many avenues to optimize the training.

    • 15:00–15:15  Intro (15m)
      Speakers: Paolo Calafiura (LBNL), Walter Hopkins (Argonne National Laboratory)
    • 15:15–15:30  Simulation Based Inference (15m)
      Speakers: Aishik Ghosh (UCI), Jay Sandesara (UMass), Rafael Lopes de Sa (Fermilab)