Zoom: https://fnal.zoom.us/j/91792149723 (usual password)
TBD
Plus, any thoughts on how to ensure valid configurations generally.
In following up on a question from Artur, I noticed that in a system with multiple TPStreamWriter apps, these apps all have the same SourceID. This is in contrast to the DataWriter apps, which have different SourceIDs. I presume that this is just a bug in the configurations, but I wanted to check. And, if it is a configuration bug, how do we find and fix all of the configurations that have it? And how do we prevent it from happening in the future?
Background information:
[biery@np04-srv-005 test]$ pwd
/data3/test
[biery@np04-srv-005 test]$ dir *32644* | head
-rw-r--r-- 1 aoranday np-comp 4275140200 Nov 4 06:09 swtest_tp_run032644_0000_tp-stream-writer-apa1_tpw_4_20241104T050713.hdf5
-rw-r--r-- 1 aoranday np-comp 4255322960 Nov 4 06:08 swtest_tp_run032644_0000_tp-stream-writer-apa2_tpw_4_20241104T050713.hdf5
-rw-r--r-- 1 aoranday np-comp 4247215528 Nov 4 06:08 swtest_tp_run032644_0000_tp-stream-writer-apa3_tpw_4_20241104T050713.hdf5
-rw-r--r-- 1 aoranday np-comp 4279945968 Nov 4 06:08 swtest_tp_run032644_0000_tp-stream-writer-apa4_tpw_4_20241104T050713.hdf5
-rw-r--r-- 1 aoranday np-comp 4291641792 Nov 4 06:11 swtest_tp_run032644_0001_tp-stream-writer-apa1_tpw_4_20241104T050940.hdf5
-rw-r--r-- 1 aoranday np-comp 4289043592 Nov 4 06:09 swtest_tp_run032644_0001_tp-stream-writer-apa2_tpw_4_20241104T050825.hdf5
-rw-r--r-- 1 aoranday np-comp 4239198200 Nov 4 06:09 swtest_tp_run032644_0001_tp-stream-writer-apa3_tpw_4_20241104T050824.hdf5
-rw-r--r-- 1 aoranday np-comp 4271226696 Nov 4 06:09 swtest_tp_run032644_0001_tp-stream-writer-apa4_tpw_4_20241104T050827.hdf5
-rw-r--r-- 1 aoranday np-comp 4273074968 Nov 4 06:14 swtest_tp_run032644_0002_tp-stream-writer-apa1_tpw_4_20241104T051159.hdf5
-rw-r--r-- 1 aoranday np-comp 4263665560 Nov 4 06:10 swtest_tp_run032644_0002_tp-stream-writer-apa2_tpw_4_20241104T050946.hdf5
[biery@mac-135043 run32644]$ grep -A 7 'df-0' tmpkswqliw0.data.xml | egrep 'DFApplication|SourceIDConf'
<obj class="DFApplication" id="df-01">
<rel name="source_id" class="SourceIDConf" id="srcid-df-01"/>
<obj class="DFApplication" id="df-02">
<rel name="source_id" class="SourceIDConf" id="srcid-df-02"/>
<obj class="DFApplication" id="df-03">
<rel name="source_id" class="SourceIDConf" id="srcid-df-03"/>
<ref class="DFApplication" id="df-01"/>
<ref class="DFApplication" id="df-02"/>
<ref class="DFApplication" id="df-03"/>
<obj class="SourceIDConf" id="srcid-df-01">
<obj class="SourceIDConf" id="srcid-df-02">
<obj class="SourceIDConf" id="srcid-df-03">
<obj class="SourceIDConf" id="srcid-tp-stream-writer">
[biery@mac-135043 run32644]$
[biery@mac-135043 run32644]$
[biery@mac-135043 run32644]$ grep -A 1 srcid-df-0 tmpkswqliw0.data.xml | grep -A 1 obj
<obj class="SourceIDConf" id="srcid-df-01">
<attr name="sid" type="u32" val="1"/>
--
<obj class="SourceIDConf" id="srcid-df-02">
<attr name="sid" type="u32" val="2"/>
--
<obj class="SourceIDConf" id="srcid-df-03">
<attr name="sid" type="u32" val="3"/>
[biery@mac-135043 run32644]$ grep -A 7 'tp-stream-writer-apa' tmpkswqliw0.data.xml | grep -A 7 TPStreamWriterApplication | egrep 'SourceIDConf|TPStreamWriterApplication'
<ref class="TPStreamWriterApplication" id="tp-stream-writer-apa1"/>
<ref class="TPStreamWriterApplication" id="tp-stream-writer-apa2"/>
<ref class="TPStreamWriterApplication" id="tp-stream-writer-apa3"/>
<ref class="TPStreamWriterApplication" id="tp-stream-writer-apa4"/>
<obj class="TPStreamWriterApplication" id="tp-stream-writer-apa1">
<rel name="source_id" class="SourceIDConf" id="srcid-tp-stream-writer"/>
<obj class="TPStreamWriterApplication" id="tp-stream-writer-apa2">
<rel name="source_id" class="SourceIDConf" id="srcid-tp-stream-writer"/>
<obj class="TPStreamWriterApplication" id="tp-stream-writer-apa3">
<rel name="source_id" class="SourceIDConf" id="srcid-tp-stream-writer"/>
<obj class="TPStreamWriterApplication" id="tp-stream-writer-apa4">
<rel name="source_id" class="SourceIDConf" id="srcid-tp-stream-writer"/>
[biery@mac-135043 run32644]$
[biery@mac-135043 run32644]$
[biery@mac-135043 run32644]$ grep -A 1 srcid-tp-stream-writer tmpkswqliw0.data.xml | grep -A 1 obj
<obj class="SourceIDConf" id="srcid-tp-stream-writer">
<attr name="sid" type="u32" val="4"/>
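One way to look for this pattern across many configuration files is a small script that performs the same check as the grep commands above. The following is only a sketch: it assumes that the single-valued `source_id` relationship always appears as a `rel` element directly inside the application's `obj` element (as in the output above), the function name is made up for illustration, and the sample XML is a cut-down fragment with an assumed wrapper element, not a real configuration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def find_shared_source_ids(xml_text):
    """Return {SourceIDConf id: [(app class, app id), ...]} for every
    SourceIDConf object referenced by more than one application."""
    root = ET.fromstring(xml_text)
    users = defaultdict(list)
    for obj in root.iter("obj"):
        for rel in obj.findall("rel"):
            if rel.get("name") == "source_id" and rel.get("class") == "SourceIDConf":
                users[rel.get("id")].append((obj.get("class"), obj.get("id")))
    return {sid: apps for sid, apps in users.items() if len(apps) > 1}


# Cut-down sample based on the grep output above (wrapper element assumed).
SAMPLE = """<oks-data>
 <obj class="DFApplication" id="df-01">
  <rel name="source_id" class="SourceIDConf" id="srcid-df-01"/>
 </obj>
 <obj class="DFApplication" id="df-02">
  <rel name="source_id" class="SourceIDConf" id="srcid-df-02"/>
 </obj>
 <obj class="TPStreamWriterApplication" id="tp-stream-writer-apa1">
  <rel name="source_id" class="SourceIDConf" id="srcid-tp-stream-writer"/>
 </obj>
 <obj class="TPStreamWriterApplication" id="tp-stream-writer-apa2">
  <rel name="source_id" class="SourceIDConf" id="srcid-tp-stream-writer"/>
 </obj>
</oks-data>"""

shared = find_shared_source_ids(SAMPLE)
for sid, apps in shared.items():
    print(sid, "is shared by", [app_id for _, app_id in apps])
```

A check like this only catches applications that point at the same SourceIDConf object. Two distinct SourceIDConf objects that carry the same `sid` value would need a second pass over the `sid` attributes.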
The way that we name TPStream data files is currently not consistent with the way that we name raw data files. Should it be?
Raw data HDF5 files have filenames that include the dataflow application name, a constant string ("_dw_"), and the index of the DataWriter that created the raw data file. (Recall that there can be multiple DataWriter modules per DF application.)
The relevant place to look in the code is here.
In contrast, TPStream data files have filenames that include the application name, a constant string, and the source ID of the tp-stream-writer application.
The relevant place to look in the code is here.
I would argue that these two naming schemes should be consistent and, therefore, that tp-stream-writer files should have substrings like tp-stream-writer-apa3_tpw_0 in their names (assuming just one TPStreamWriter module per tp-stream-writer application).
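To make the proposal concrete, here is a sketch of how such a name would be assembled. This is not the actual file-naming code; the function name and arguments are hypothetical, and the field widths (six digits for the run number, four for the file index) are inferred from the file listing shown earlier.

```python
def tpstream_file_name(prefix, run, file_index, app_name, writer_index, timestamp):
    """Build a TPStream file name that uses the index of the writer module
    within the application (as raw data files do) rather than the SourceID."""
    return (
        f"{prefix}_run{run:06d}_{file_index:04d}_"
        f"{app_name}_tpw_{writer_index}_{timestamp}.hdf5"
    )


# With one TPStreamWriter module per application, the index is always 0.
name = tpstream_file_name(
    "swtest_tp", 32644, 1, "tp-stream-writer-apa3", 0, "20241104T050824"
)
print(name)
```

With this scheme the application name alone distinguishes the files from different tp-stream-writer applications, so the name no longer depends on the SourceID configuration being correct.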
Objections?
Currently, there is basically no enforced synchronization. It may happen in some cases, but that should not be expected. Do we want to take some steps to change this situation?
There have been some questions about how the data is organized in TPStream HDF5 data files...
Some reminders:
Is there a desire/need to add some synchronization between different TPStream streams?
Reminders of warnings and errors seen in daqsystemtest regression tests recently:
Let's start with (2)...
Based on this, I have implemented record-ordinal-based fragment count checking. I also went ahead and implemented record-ordinal-based fragment size checking, as well as TC-type-based fragment count and fragment size checking, which will be useful in other situations.
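The idea behind record-ordinal-based checking can be sketched as follows. This is an illustration only, not the code in the regression-test utilities: the function names, the layout of the expectations dictionary, and the numbers are all hypothetical.

```python
def expected_for_record(expectations, record_ordinal):
    """Pick the expected count for a 1-based record ordinal,
    falling back to the 'default' entry when none is given."""
    return expectations.get(record_ordinal, expectations["default"])


def check_fragment_counts(observed_counts, expectations):
    """observed_counts: fragment count of each record, in file order.
    Returns (ordinal, observed, expected) for every record that mismatches."""
    failures = []
    for ordinal, observed in enumerate(observed_counts, start=1):
        expected = expected_for_record(expectations, ordinal)
        if observed != expected:
            failures.append((ordinal, observed, expected))
    return failures


# Hypothetical case: the first record is allowed to have fewer fragments.
expectations = {1: 3, "default": 4}
failures = check_fragment_counts([3, 4, 4, 5], expectations)
print(failures)
```

The same lookup works for size checks or for checks keyed by TC type; only the dictionary key and the compared quantity change.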
For (1) [TA fragment size of 360]...
Trigger_Activity fragment with SourceID Trigger_0x0000044c from subdetector DAQ has size = 360 -----
Readout window before = 0, after = 32
Number of TAs in this fragment=2, overall number of referenced TPs=2, size of TA data=88
First TA type = 1, TA algorithm = 2, number of TPs = 1
First TA start time=108365367111959330, end time=108365367111959362, and activity time=0
Second TA type = 1, TA algorithm = 2, number of TPs = 1
Second TA start time=108365367111959330, end time=108365367111959362, and activity time=0
For (3), I found that the upward fluctuation in the size of the TC fragment was because of multiple TCs being included (a kPrescale one and a kRandom one). After poking around a bit, I noticed that the v5 version of the tpstream_writing_test was merging overlapping TCs (whereas the v4 version did not). To keep the spirit of this test consistent with what we had before, I added back the configuration parameter that disabled TC merging (link).
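As a reminder of why merging changes the fragment size, here is a generic interval-merging sketch. It is not the trigger code: the tuple layout is hypothetical, the window values are made up, and I have assumed that windows that touch or overlap are merged.

```python
def merge_overlapping(tcs):
    """tcs: list of (start, end, tc_type) readout windows.
    Returns merged windows as (start, end, [tc_types])."""
    merged = []
    for start, end, tc_type in sorted(tcs):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous window: extend it and record the extra TC.
            prev_start, prev_end, types = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end), types + [tc_type])
        else:
            merged.append((start, end, [tc_type]))
    return merged


windows = [(100, 200, "kPrescale"), (150, 250, "kRandom"), (400, 500, "kRandom")]
merged = merge_overlapping(windows)
print(merged)
```

A record built from a merged window carries more than one TC, so its TC fragment is larger than one built from a single TC.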
And, lastly, I'll say that I took the opportunity to tighten up some of the fragment-size and fragment-count checking, based on the new ability to do those checks based on the record ordinal number and the trigger type of the record.
Has there been any discussion of a tentative code-complete date for fddaq-v5.3.0?
I would appreciate reviews of the following PRs:
What is happening when we see error messages in daq_application log files like the following, for example here?