DUNE DAQ Integration Working Group Meeting

Start: 2021-10-15T08:00:00-05:00
End: 2021-10-15T09:00:00-05:00
Location: Zoom

Friday 15 Oct 2021, 08:00 → 09:00 US/Central

Zoom

https://fnal.zoom.us/j/96158272618?pwd=Q3Rnd2lOc3lTbXhwZFVwMEwxcDhEQT09

Alec Habig (Univ. of Minnesota Duluth), Bonnie King (FNAL)

- 08:00 → 08:30
  General Updates 30m
  
  Minutes
  - NP04 coldbox preparations
  - NP04 server upgrades and issues
    DNS problems
    RAID issues (004 reinstall)
  - mainline kernel testing with FELIX: packaging issues
  - podman/pocket investigations, TRACE from containers
  - HWDB update: Part ID discussion
  PID_DAQ_FD-draft101621.xlsx
  NP04 coldbox preparations (Alessandro)
  
  Single-mode fiber is run to NP02, with a 1G switch there to run the coldbox stuff. The 10G switch was an HP model that only takes HP cartridges, which are hard to find and expensive, so we don't have 10G there as was planned.
  
  A 25-pair fiber trunk will be laid soon. Then we can add an optical NIC to the FELIX server and get all the bandwidth we need back to the rest of the DAQ.
  
  The plan is also for a 100G switch to sit at the end of that trunk. The lead time on this is 7-8 months(!), but that should still be before we have a whole VD detector to read out.
  
  NP04 server upgrades and issues (Alec)
  
  Need a temporary 1G connection for np04-srv-004 to PXE boot from for the upgrade. People who can plug that in will be around next week. We will also need to register the NIC: the MAC address of the first 1G port is a4:bf:01:38:c0:15 (written down here so I can find it).
  
  DNS problems (Alec)
  
  Using /etc/hosts will keep us from exciting the CERN DNS server. To make that easier to manage, Pengfei is cooking up an Ansible script that pulls from DNS, then writes and distributes /etc/hosts for our machines so internal traffic never has to hit the DNS.
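
The "pull from DNS, write /etc/hosts" step could look roughly like the Python sketch below. It is not Pengfei's Ansible script: the function name `render_hosts`, the `resolve` parameter, and the `example.org` host names in the usage note are all hypothetical, and distribution to the machines is left out.

```python
import socket
from typing import Callable, Iterable


def render_hosts(hostnames: Iterable[str],
                 resolve: Callable[[str], str] = socket.gethostbyname) -> str:
    """Return /etc/hosts-style text for the given host names.

    `resolve` maps a host name to an IPv4 address string. It defaults to a
    real DNS lookup but can be swapped out (an assumption made here so the
    sketch can be tried without network access).
    """
    lines = ["127.0.0.1\tlocalhost"]
    for name in sorted(set(hostnames)):
        try:
            address = resolve(name)
        except OSError:
            # socket.gaierror is an OSError: skip names DNS cannot resolve
            # rather than writing a bad entry.
            continue
        short = name.split(".")[0]
        # List the short alias too when the name is fully qualified.
        aliases = name if short == name else f"{name} {short}"
        lines.append(f"{address}\t{aliases}")
    return "\n".join(lines) + "\n"
```

For example, `render_hosts(["np04-srv-004.example.org"])` would do one DNS lookup and return text that can be written out once and copied to every machine, so later lookups of that name are answered locally.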
  
  RAID issues (004 reinstall) (Pengfei)
  
  Pengfei reconfigured the RAID on this machine from RAID5 to the faster (but lower-capacity) RAID10. Performance greatly improved.
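
The storage cost of that trade can be estimated with the standard capacity formulas: RAID5 gives up one disk's worth of space to parity, while RAID10 mirrors everything and so gives up half. The sketch below is only arithmetic; the disk count and size used in the usage note are made-up numbers, not the actual configuration of this machine.

```python
def usable_capacity(level: str, disks: int, disk_tb: float) -> float:
    """Usable capacity in TB for an array of identical disks."""
    if level == "raid5":
        if disks < 3:
            raise ValueError("RAID5 needs at least 3 disks")
        # One disk's worth of space holds parity.
        return (disks - 1) * disk_tb
    if level == "raid10":
        if disks < 4 or disks % 2:
            raise ValueError("RAID10 needs an even number of disks, at least 4")
        # Every block is mirrored, so half the raw space is usable.
        return (disks // 2) * disk_tb
    raise ValueError(f"unsupported RAID level: {level}")
```

With a hypothetical eight 4 TB disks, `usable_capacity("raid5", 8, 4.0)` is 28 TB and `usable_capacity("raid10", 8, 4.0)` is 16 TB.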
  
  Addressing the RAID via UUID rather than /dev/md0 will hopefully make it come back automatically after a reboot, which has been a problem otherwise.
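
One place this change can be made is the mount entry in /etc/fstab. The sketch below assumes that is where the device is referenced (the minutes do not say whether it is fstab, mdadm.conf, or both); the helper name `use_uuid`, the UUID, the mount point, and the filesystem type are all placeholders.

```python
def use_uuid(fstab_text: str, device: str, fs_uuid: str) -> str:
    """Rewrite fstab entries that name `device` to use UUID=<fs_uuid> instead.

    Comment lines and entries for other devices are passed through unchanged.
    """
    out = []
    for line in fstab_text.splitlines():
        fields = line.split()
        is_comment = line.lstrip().startswith("#")
        if fields and not is_comment and fields[0] == device:
            # Only the first field (the device spec) changes.
            fields[0] = f"UUID={fs_uuid}"
            line = "\t".join(fields)
        out.append(line)
    return "\n".join(out) + "\n"
```

A UUID stays with the filesystem even if the kernel assembles the array under a different /dev/md name after a reboot, which is the failure this is meant to avoid.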
  
  mainline kernel testing with FELIX: packaging issues (Bonnie)
  
  We need kernel 5.x for FELIX drivers. CS8 has only 4.18. There is a "mainline" kernel 5.x available from the ELRepo repo, but the devel headers rpm is missing bits FELIX needs. Bonnie will get an rpm we can deploy manually, and look into fixing the ELRepo version, so we can eventually just track that.
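
A quick way to confirm which machines already meet the 5.x requirement is to parse the kernel release string. This is a minimal sketch: the function names are hypothetical, the "major version at least 5" rule is taken from the note above, and the release strings in the usage note are illustrative rather than copied from our servers.

```python
import platform
import re
from typing import Optional


def kernel_major(release: str) -> int:
    """Extract the major version from a release string such as '5.14.0-1.el8'."""
    match = re.match(r"(\d+)\.", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1))


def kernel_ok_for_felix(release: Optional[str] = None,
                        required_major: int = 5) -> bool:
    """True if the given (or currently running) kernel is new enough."""
    if release is None:
        # Same string that `uname -r` prints.
        release = platform.release()
    return kernel_major(release) >= required_major
```

For instance, `kernel_ok_for_felix("4.18.0-305.el8.x86_64")` is False and `kernel_ok_for_felix("5.14.0-1.el8.x86_64")` is True. This only checks the running kernel; it says nothing about whether the matching devel headers are complete.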
  
  podman/pocket investigations, TRACE from containers (Ron)
  
  TRACE can write to a file in the shared data area outside the container, given the right config (on both the container and server side).
  
  The file is owned by the owner of the container.
  
  Multiple containers all tracing could be an issue. Do they all use different files, or put process names/IDs in the trace messages to ID which line is theirs? Either is an option.
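
The two options can be sketched as follows. This does not use the TRACE API; it only illustrates the naming schemes, and it assumes each container has a distinct hostname. The function names, file name pattern, and directory in the usage note are hypothetical.

```python
import os
import socket
from pathlib import Path
from typing import Optional


def trace_file_for(shared_dir: str,
                   container: Optional[str] = None,
                   pid: Optional[int] = None) -> Path:
    """Option 1: one trace file per container and process in the shared area."""
    container = container or socket.gethostname()
    pid = os.getpid() if pid is None else pid
    return Path(shared_dir) / f"trace_{container}_{pid}.log"


def tag_message(message: str,
                container: Optional[str] = None,
                pid: Optional[int] = None) -> str:
    """Option 2: one shared file, with each line tagged by its origin."""
    container = container or socket.gethostname()
    pid = os.getpid() if pid is None else pid
    return f"[{container}:{pid}] {message}"
```

For example, `trace_file_for("/data/trace", "daq-a", 42)` names a file `trace_daq-a_42.log`, while `tag_message("run started", "daq-a", 42)` gives `[daq-a:42] run started`. Separate files avoid interleaved writes; a single tagged file keeps all lines in one place and in order.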
  
  CS8 (Alessandro)
  
  Pengfei has a native CS8 build of the DAQ almost ready for primetime, which will then need to be tested both on the CS8 systems, and in CS8 containers.
  
  HWDB (Alec)
  
  There is a push for each consortium to generate unique Part IDs to serve as HWDB keys. A DAQ scheme to do so is being put together for discussion at Wednesday's Installation meeting. An initial draft is in the materials for this Indico.
  
  The timing system scheme will be drawn from the list of parts in EDMS (thanks David!).
  
  Will expand DAQ computing from this materials list (thanks Alessandro!)
  
  Discussion of cables: we don't want most cables in the HWDB. Long fibers, yes, because each has its own QA and loss information and so is distinct. But no one cares which 1m cat6 cable is where. Note that most long fibers are owned by Installation or Facilities, not DAQ.
- 08:30 → 08:50
  Specification Document 20m
  
  Minutes
  Update old interface documents to reflect the move from CUC to top of cryostat. Alessandro needs them next week.
  
  David reports that this is pretty much what he did for timing, just a matter of slogging through updating stuff.