Scaling ML meeting


Discussion with HEP-CCE portability people

  • Workflow preparation for inference and potentially for hyperparameter optimization (HPO) on the grid

 

Scaling ML application candidates

  • Neural-Based Inference (talk to Aishik and the portability group) and OmniFold
    • Follow up with Aishik:
      • Apparently the code is not public; follow up with Aishik on what prevents it from being released. (Xiangyang & WH)
      • The simulated data used for training is also not public, but surrogate simulations (e.g., Delphes, HepSim, ...) could be used instead. Check with Aishik whether HepSim would work. (WH)
      • Invite Aishik to the next meeting, a dedicated meeting, or a chat on Slack. (Xiangyang)
      • Data can be copied to Perlmutter directly or transferred with Globus (see the transfer sketch after this list).
    • Work with the workflow people to train multiple models in parallel on various computing resources (see the parallel-training sketch after this list). Open questions: how do we determine how many resources one model needs, and will resource allocation be automated in the workflow?
    • Add Ben et al. to SML. (WH)
  • Inference as a Service for tracking: see ATLAS upgrade week.
    • Chicago for Kubernetes
    • Check whether Kubernetes is available on Aurora: it is not available on Polaris, but Aurora might work. Follow up about Aurora. (Rui)
  • Resource-constrained ML?
    • Follow up with Lindsey on whether they have people for this. (WH)
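
For the data-copy item above, here is a minimal sketch of submitting a Globus transfer to Perlmutter with the globus_sdk Python package. The client ID, collection UUIDs, and paths are placeholders, not real values; the actual collection IDs come from app.globus.org and NERSC's documentation.

    import globus_sdk

    CLIENT_ID = "YOUR-NATIVE-APP-CLIENT-ID"  # placeholder: register at developers.globus.org

    # Interactive native-app login to obtain a transfer token.
    auth_client = globus_sdk.NativeAppAuthClient(CLIENT_ID)
    auth_client.oauth2_start_flow(requested_scopes=globus_sdk.scopes.TransferScopes.all)
    print("Log in at:", auth_client.oauth2_get_authorize_url())
    auth_code = input("Paste the authorization code: ").strip()
    tokens = auth_client.oauth2_exchange_code_for_tokens(auth_code)
    transfer_tokens = tokens.by_resource_server["transfer.api.globus.org"]

    tc = globus_sdk.TransferClient(
        authorizer=globus_sdk.AccessTokenAuthorizer(transfer_tokens["access_token"])
    )

    SRC = "SOURCE-COLLECTION-UUID"       # placeholder: where the training data lives
    DST = "PERLMUTTER-COLLECTION-UUID"   # placeholder: NERSC DTN collection

    # Submit an asynchronous, recursive directory transfer; Globus retries
    # and verifies checksums on its own once the task is accepted.
    task = globus_sdk.TransferData(tc, SRC, DST, label="SML training data")
    task.add_item("/path/to/training_data/", "/pscratch/sd/u/user/training_data/", recursive=True)
    result = tc.submit_transfer(task)
    print("Submitted Globus task:", result["task_id"])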
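
For the parallel-training item above, a minimal sketch of training several model variants concurrently on one node, one GPU per model. The train.py entry point and its flags are hypothetical stand-ins for the actual training code; a real workflow system would automate the resource allocation (GPU count per model, concurrency limit) that is hard-coded here.

    import itertools
    import os
    import subprocess
    from concurrent.futures import ThreadPoolExecutor

    N_GPUS = 4  # assumption: one model fits on a single GPU

    # Hypothetical hyperparameter grid: one model per configuration.
    configs = [
        {"lr": lr, "batch_size": bs}
        for lr, bs in itertools.product([1e-3, 1e-4], [256, 512])
    ]

    def train_one(args):
        idx, cfg = args
        # Pin each training subprocess to its own GPU.
        env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(idx % N_GPUS))
        cmd = [
            "python", "train.py",            # hypothetical entry point
            "--lr", str(cfg["lr"]),
            "--batch-size", str(cfg["batch_size"]),
            "--out", f"runs/model_{idx}",
        ]
        return idx, subprocess.run(cmd, env=env).returncode

    # Run at most N_GPUS trainings at once; threads suffice because the
    # heavy work happens in the subprocesses, not in Python itself.
    with ThreadPoolExecutor(max_workers=N_GPUS) as pool:
        for idx, rc in pool.map(train_one, enumerate(configs)):
            print(f"model {idx} finished with exit code {rc}")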

 

FASST RFI: 

Send to HEP-CCE and then submit.

 
