DUNE Global Computing Sites

US/Pacific
https://fnal.zoom.us/j/636941598

Andrew McNab (University of Manchester), Heidi Schellman (Oregon State), Kenneth Herner (Fermilab), Michael Kirby (FNAL), Steven Timm (Fermilab), Stuart Fuess (Fermilab), Peter Clarke (University of Edinburgh)
Description
Weekly meeting for sites doing DUNE computing
    • 07:00–07:20
      General discussion (20m)
      Speakers: Dr Andrew McNab (University of Manchester), Heidi Schellman (Oregon State), Dr Michael Kirby (FNAL), Dr Steven Timm (Fermilab), Prof. Peter Clarke (University of Edinburgh)

      Notes from DUNE Site meeting 4/13/20

       

      2 GGUS tickets

      One at Manchester: not yet responded to.

      One at PIC: resolved. They didn't know about the voms1 change; we need to make sure they are on the list.
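
      As a hedged illustration of the kind of check a site admin can run after a VOMS endpoint change (the hostname below is a generic placeholder, not the actual DUNE configuration), a short Python sketch that scans the usual vomses locations for a given host:

        # Sketch: look for a VOMS server hostname in the standard vomses locations.
        # The hostname is a placeholder; substitute whatever was announced for the voms1 change.
        from pathlib import Path

        EXPECTED_HOST = "voms1.example.org"  # placeholder, not the real DUNE VOMS host
        VOMSES_LOCATIONS = [Path("/etc/vomses"), Path.home() / ".glite" / "vomses"]

        def vomses_lines():
            # Yield (file, line) for every non-comment line in any vomses file or directory that exists.
            for base in VOMSES_LOCATIONS:
                if base.is_file():
                    files = [base]
                elif base.is_dir():
                    files = sorted(p for p in base.iterdir() if p.is_file())
                else:
                    files = []
                for f in files:
                    for line in f.read_text().splitlines():
                        line = line.strip()
                        if line and not line.startswith("#"):
                            yield f, line

        matches = [(f, line) for f, line in vomses_lines() if EXPECTED_HOST in line]
        if matches:
            for f, line in matches:
                print(f"{f}: {line}")
        else:
            print(f"No vomses entry mentions {EXPECTED_HOST}; this site may still point at the old server.")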

       

      NERSC: had 600 simultaneous nodes running; filled up scratch to 167% (34 TB).
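
      For orientation only, and assuming the 34 TB in the note is the usage that corresponds to 167% of quota (one possible reading of the numbers), the implied scratch quota works out to roughly 20 TB:

        # Back-of-the-envelope check of the scratch numbers above.  Assumes 34 TB is the
        # usage corresponding to 167% of quota; the quota itself is not stated in the notes.
        usage_tb = 34.0
        fill_fraction = 1.67
        implied_quota_tb = usage_tb / fill_fraction
        overshoot_tb = usage_tb - implied_quota_tb
        print(f"Implied quota: {implied_quota_tb:.1f} TB, overshoot: {overshoot_tb:.1f} TB")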

      Steve will pull the sim files back to FNAL.

      Problem with the CVMFS library seen only on the worker nodes; cause still unknown, but there is a workaround.

      Also a different failure in reco is still being investigated.

       

      ETF testing: still ongoing; lots of auth failures that need to be investigated.

       

      2 SEs still to onboard: RAL-PP and NIKHEF.
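
      For context, onboarding an SE usually involves at least a basic listing test against its endpoint. A minimal sketch assuming the gfal2 Python bindings and a valid VOMS proxy; the endpoint URL below is purely hypothetical (the real RAL-PP and NIKHEF endpoints are not given in these notes):

        # Sketch of a basic SE listing test.  Assumes the gfal2 Python bindings are installed
        # and a valid proxy is in place; the endpoint URL is a placeholder, not a real SE.
        import gfal2

        ENDPOINT = "root://se.example.ac.uk:1094//pnfs/example/dune/"  # hypothetical

        ctx = gfal2.creat_context()
        try:
            entries = ctx.listdir(ENDPOINT)
            print(f"Listing OK: {len(entries)} entries under {ENDPOINT}")
        except gfal2.GError as err:
            print(f"Listing failed: {err}")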

       

      Brazil

      CBPF test jobs now working 

      Still working on getting a project on the supercomputer

      Other institutions likely to join

      Each one will have to send a technical contact and an institutional contact to Heidi.

      (2 other institutions planning to join)

      In the past, the APS has had money to collaborate with Brazil.

       

       

      Request to the OSG factory to set GLIDEIN_DUNESite on all of our sites.
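
      A hedged sketch of one way to check which glideins in the pool currently advertise GLIDEIN_DUNESite, assuming the HTCondor Python bindings and that the local configuration points at the right collector (only the attribute name comes from the note above; everything else is an assumption):

        # Survey which startd (glidein) ads advertise a GLIDEIN_DUNESite attribute.
        import htcondor
        from collections import Counter

        coll = htcondor.Collector()  # uses the collector from the local HTCondor configuration
        ads = coll.query(htcondor.AdTypes.Startd, projection=["Name", "GLIDEIN_DUNESite"])

        sites = Counter(ad.get("GLIDEIN_DUNESite", "<unset>") for ad in ads)
        for site, count in sites.most_common():
            print(f"{site:24s} {count} slots")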

       

      Change the frontend (FE) at Fermilab to use the DUNE proxy everywhere.
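
      As a hedged sketch of how one might verify that a proxy actually carries DUNE VO attributes before pointing the frontend at it (the proxy path is a placeholder; the real frontend configuration is not shown in these notes):

        # Check a proxy file for dune VO attributes using voms-proxy-info.
        # The proxy path is a placeholder for whatever the frontend configuration points at.
        import subprocess

        PROXY_PATH = "/path/to/dune_frontend_proxy.pem"  # placeholder

        result = subprocess.run(
            ["voms-proxy-info", "-all", "-file", PROXY_PATH],
            capture_output=True, text=True, check=False,
        )
        print(result.stdout)
        if "/dune" in result.stdout:
            print("Proxy carries dune VO attributes.")
        else:
            print("No dune VO attribute found; this may not be the DUNE proxy.")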

       

      Sam4Users: Tom hasn't been able to make it work yet.

       

      Ken: we might want to think about longer pilot lifetimes. It is difficult to get prod3 jobs going right now because data reconstruction needs more than 24 hr, or more memory.

      Single-core slots with < 2.5 GB of memory and a 24 hr run time limit are something we can't use right now.

       

      At least 1/3 of data reco jobs take > 24 hrs.
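
      As a rough illustration of the constraint described above (the 24 hr lifetime and 2.5 GB/core limits come from these notes; the example job profiles are invented), a sketch of the fit check that makes the problem concrete:

        # Sketch: which job profiles fit a pilot with a 24 hr lifetime and < 2.5 GB/core?
        # The pilot limits come from the notes above; the job profiles are made-up examples.
        from dataclasses import dataclass

        @dataclass
        class JobProfile:
            name: str
            runtime_hr: float
            memory_gb: float
            cores: int = 1

        PILOT_MAX_LIFETIME_HR = 24.0
        PILOT_MAX_MEM_PER_CORE_GB = 2.5

        def fits_pilot(job: JobProfile) -> bool:
            # True if the job can finish inside one pilot under the current limits.
            return (job.runtime_hr <= PILOT_MAX_LIFETIME_HR
                    and job.memory_gb / job.cores <= PILOT_MAX_MEM_PER_CORE_GB)

        examples = [
            JobProfile("data reco (long)", runtime_hr=30, memory_gb=2.0),      # hypothetical
            JobProfile("data reco (high-mem)", runtime_hr=20, memory_gb=4.0),  # hypothetical
            JobProfile("typical MC job", runtime_hr=8, memory_gb=2.0),         # hypothetical
        ]
        for job in examples:
            print(f"{job.name:24s} fits current pilots: {fits_pilot(job)}")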

       

      Are all target data files where we need them? Seems to be OK.

       

      After the target 8 runs, we will go to 1 GeV data reco.

       

      First pass of data reco is being done entirely off site.