DAQ Coordination Meeting

America/Chicago
    • 08:00 08:05
      General news 5m
      Speakers: Alessandro Thea (STFC Rutherford Appleton Laboratory), Asher Kaboth (RHUL), Roland Sipos (CERN)
    • 08:05 08:20
      Actions from previous meetings 15m
      Speakers: Alessandro Thea (STFC Rutherford Appleton Laboratory), Asher Kaboth (RHUL), Roland Sipos (CERN), Wesley Ketchum (Fermi National Accelerator Laboratory)
    • 08:20 08:30
      Run Coordination news 10m
      Speaker: Wesley Ketchum (Fermi National Accelerator Laboratory)
    • 08:30 08:50
      Activity coordination round table 20m

      Release Coordination v5
      Iceberg
      ND test setups
      SW coordination

      Speakers: Bonnie King (FNAL), Eric Flumerfelt (Fermilab), John Freeman, Kurt Biery (Fermilab)

      John Freeman, Software Coordination:

      • Have a script on a feature branch of daq-release which shows the differences between currently-used externals versions and Spack's "Preferred Versions" (typically the newest)
      • "Floated" the versions to have Spack try to get those Preferred Versions rather than using pinned versions from a release yaml - have a nightly which uses this and "passes" the integration tests (https://github.com/DUNE-DAQ/daq-release/actions/runs/11998127399). "passes" means that performance is comparable to when we use the traditional externals versions (i.e., nothing broke)
      • Some of the important version bumps in the test were:
        • Boost: 1.77 -> 1.85
        • cppzmq: 4.8.1 -> 4.10.0
        • fmt: 8.1.1 -> 10.2.1
        • hdf5: 1.12.0 -> 1.14.3
        • nlohmann-json: 3.9.0 -> 3.11.2
      • Things to work on include using "spack checksum" to update the package.py files (e.g., folly's package.py still has a 2021 release as the preferred version), and investigating a couple of places where things broke (dpdklibs doesn't build against the newer fmt package, and cmdlib doesn't build against the latest intel-tbb)
      • Hope to have a meeting on Thursday, Dec. 5 if possible, though I understand schedules are in flux. 
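      The comparison between pinned and preferred versions described above can be illustrated with a minimal sketch. This is not the script on the daq-release feature branch: the function names `parse_version` and `version_bumps` are hypothetical, the two dictionaries simply restate the bumps listed in these notes, and real Spack version strings can contain non-numeric parts that this sketch does not handle.

```python
def parse_version(text):
    """Turn '1.14.3' into (1, 14, 3) so versions compare numerically.

    Assumption: versions are purely dot-separated integers.
    """
    return tuple(int(part) for part in text.split("."))


def version_bumps(pinned, preferred):
    """Return {package: (pinned, preferred)} where the preferred version is newer."""
    bumps = {}
    for name, old in pinned.items():
        new = preferred.get(name)
        if new is not None and parse_version(new) > parse_version(old):
            bumps[name] = (old, new)
    return bumps


# Versions as reported in the notes above (hypothetical data layout).
pinned = {
    "boost": "1.77",
    "cppzmq": "4.8.1",
    "fmt": "8.1.1",
    "hdf5": "1.12.0",
    "nlohmann-json": "3.9.0",
}
preferred = {
    "boost": "1.85",
    "cppzmq": "4.10.0",
    "fmt": "10.2.1",
    "hdf5": "1.14.3",
    "nlohmann-json": "3.11.2",
}

for name, (old, new) in version_bumps(pinned, preferred).items():
    print(f"{name}: {old} -> {new}")
```

      Comparing parsed tuples rather than raw strings matters here: as plain strings, "4.10.0" sorts before "4.8.1" and "10.2.1" before "8.1.1", so the cppzmq and fmt bumps would be missed.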
    • 08:50 09:20
      Working groups round table 30m
      Speakers: Adam Barcock (UKRI STFC), Alec Habig (Univ. of Minnesota Duluth), Alexander Tapper (Imperial College London), Artur Sztuc (University College London), Bonnie King (FNAL), David Cussans (University of Bristol), Jonathan Hays (Queen Mary University of London), Joshua Klein, Kurt Biery (Fermilab), Pierre Lasorak (Imperial College London), Roland Sipos (CERN), Stoyan Trilov, Wesley Ketchum (Fermi National Accelerator Laboratory)

      CCM

      • Control
        • Endpoint and detector name in ps (Pawel)
        • Linting for `drunc` complete, with ruff (which was included by the RSE) (Pawel)
        • Process manager separate logs almost complete (Pawel)
        • Splitting repositories for python ERS/Opmon planned (Pawel)
        • Generating tree ID recursively done and merged (Claudia)
        • Fixing an issue where, if one app dies at boot, all of its parent trees die too. (Claudia)
        • Simplified control service merged (patch and develop) (Pierre)
        • FSM schema (drunc side) work starting (Pierre)
      • Configuration
        • Session 2 of the config workshop: 4 Dec, usual CCM slot but extended to 2+ hours: https://indico.fnal.gov/event/66612/
        • Meeting with Wes/Marco/Alessandro/Giovanna on management of the DB for operations (removing objects, etc.).
        • Twiki page started (Marco)
        • VNC up for editor UI tools, send Alec an email if you need access.
        • OKS reader for our offline colleagues (Pierre)
          • Problem: the Session isn't in our run registry.
        • The hunt for the new name of the Session continues (Alessandro, Pierre)
          • Giovanna: in the Detector "entry" of the session, the channel maps will need to be updated. Currently we can only cater for one, but we will need more (PDS, BDE, TDE)
      • Monitoring
        • Nothing this week beyond what was reported under Control: splitting the ERS/Opmon python utilities from their C++ packages.

       

      Trigger:

      • Closed 18 git issues (mostly outdated...)
      • Trigger schema now has descriptions.
      • Various changes & PRs to the RandomTCMaker (to be used for timing triggers):
        • Fixed issue with low-trigger-rate config giving abnormally high trigger rates (int -> uint...)
        • Fixed shutdown procedure when running with SystemClock rather than with TimeSync
        • Using a new feature from IOManager that allows multiple RTCMs/modules/receivers to subscribe to the same network input. We can now have multiple RTCMs listening to the same TimeSync connection.
        • New integration test by Eric
      • Removed CustomTCMaker: historically used for testing. Its functionality can be reproduced by using multiple RTCMs.
      • Removed some old & obsolete v4 code that somehow got into v5.
      • Updated trigger documentation to reflect changes done from v4 to v5.
      • Overhaul of TriggerDataHandlerModel by Deniz, allowing vectors of TPs to be sent.
      • Ongoing work on PDS TPs. A preliminary trigger workflow is on GitHub and being tested (v4 for now).
      • Ongoing work on porting the Replay application.

       

      Core Software:

      A couple of operations and testing notes from Kurt:

      • ~190 unclosed raw data files from PDS calibration runs at EHN1 were recovered and subsequently transferred to offline
      • Something weird is happening on the daq.fnal.gov teststand node
        • often, test systems fail to start during integtests (this started in the last couple of weeks, I believe)
        • when the system does start, there are often trigger inhibits during the first run when TPG is enabled
        • this weirdness is somewhat of a pain, but it has allowed us to debug several undesirable features of the system. I initially thought that the weirdness was correlated with dune-daq software builds happening on the computer at the same time as integtests, but the problems have persisted for a while, independent of whether builds are running.
        • regarding the cause of this odd behavior…
          • an observation: the ConnectivityService process appears to take a non-trivial amount of CPU throughout integtest runs. This does not seem to be the case on np04-srv-003.
            • I seem to recall that there is a script or app somewhere that can be used to test ConnectivityService performance. Is that available somewhere?
        • regarding the issues that this behavior has uncovered...
          • Pierre noticed that drunc is missing a sleep() call when connection information can’t be fetched from the ConnSvc.
            • with the sleep() added, problems are still observed: it seems to take many seconds before an attempt is made to start some of the daq_apps
            • can we get more timestamps in controller logfiles?
            • this is still under investigation
          • the trigger inhibits are caused by delays in establishing network senders in the DFO, TRBs, and FragmentAggregator. I have fixes for these.
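      The missing sleep() mentioned above is the usual difference between a tight polling loop and a paced retry. The following is a minimal sketch of that pattern only, not drunc's actual code: `fetch_with_retry`, its parameters, and the stand-in `fetch` callable are all hypothetical, and the real ConnSvc lookup may signal failure differently (here it is assumed to return None).

```python
import time


def fetch_with_retry(fetch, retries=5, delay_s=0.5, sleep=time.sleep):
    """Call fetch() until it returns something other than None.

    Pauses delay_s seconds between attempts so that an unavailable
    service is not polled in a tight loop. The sleep function is
    injectable so the pacing can be checked without real waiting.
    """
    for attempt in range(1, retries + 1):
        result = fetch()
        if result is not None:
            return result
        if attempt < retries:
            sleep(delay_s)  # the pause that a tight retry loop lacks
    raise TimeoutError(f"no connection information after {retries} attempts")


# Stand-in for a lookup that fails twice and then succeeds (hypothetical data).
responses = iter([None, None, {"uri": "tcp://example:1234"}])
info = fetch_with_retry(lambda: next(responses), delay_s=0.01)
print(info)
```

      Logging a timestamp on each attempt inside such a loop would also give the kind of timing information asked for in the controller logfiles.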

      Readout

      • WP2.0
        • TD quality management system entry has been submitted.
        • Readout schedule information given to PPD project office.
      • WP2.2 and 2.3
        • Latest 40 GbE system test ran for ~3 days without issues. Observed the correct data rate.
        • Intermittent fault with 10 GbE emulator control plane resolved.
        • Potential explanation for 10 GbE system sync issues. Believe it is an issue with the bend radii of the MTP trunk fibres.
        • AE creating build with IBERT IP to test sync link.
        • Can now generate results via a script.
    • 09:20 09:25
      Actions 5m