[Kurt] A very low-priority, and probably tangential, question: is the read/write performance of the data disks on np04-srv-005 sufficient so that we don't need to look into faster ways to calculate the full-file checksum?
09:20 → 09:40
Improving the stability of TPSet creation and handling (20m)
Speakers: Alex Oranday, Kurt Biery (Fermilab), Roland Sipos (CERN), Wesley Ketchum (Fermi National Accelerator Laboratory)
[Kurt] A couple of notes, to be discussed (if needed) after other speakers have finished:
It would be great to have monitoring that provides positive confirmation that no TPs are being lost when the system is running well, and alerts us when TPs are being lost.
And, it would be great if that monitoring could give a reasonably accurate picture of where in the chain TPs are being dropped if/when that occurs.
One example in this vein: if we can demonstrate via metrics that the TPStreamWriter is writing out all of the TPs that it receives, and that its write rate is not maxed out, then we would know that any problem is elsewhere in the chain.
Independent of that, the TPStreamWriter could be a nice place to validate the completeness of the stream(s).
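A minimal sketch of the kind of accounting this monitoring could do, assuming we can count TPs at the receive and write points per SourceID (the class and method names here are illustrative, not actual dunedaq APIs):

```python
from collections import defaultdict

class TPAccountant:
    """Hypothetical sketch: tallies TPs received vs. written per SourceID,
    so a nonzero difference points to where TPs are being dropped."""

    def __init__(self):
        self.received = defaultdict(int)
        self.written = defaultdict(int)

    def on_receive(self, source_id, n_tps):
        # Called when a TPSet with n_tps TPs arrives from source_id.
        self.received[source_id] += n_tps

    def on_write(self, source_id, n_tps):
        # Called when n_tps TPs from source_id are written to file.
        self.written[source_id] += n_tps

    def dropped(self):
        """Return per-source counts of TPs received but never written."""
        return {sid: self.received[sid] - self.written[sid]
                for sid in self.received
                if self.received[sid] != self.written[sid]}

acct = TPAccountant()
acct.on_receive("tpc0", 100)
acct.on_write("tpc0", 100)
acct.on_receive("tpc1", 100)
acct.on_write("tpc1", 97)
print(acct.dropped())  # -> {'tpc1': 3}
```

When the system is healthy, `dropped()` returning an empty dict is the "positive confirmation" mentioned above; a persistent nonzero entry localizes the loss to a specific source.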
It would be helpful to have a consensus understanding of what should happen in various situations.
For example, if a TPSet arrives "too late" to be included in the original TimeSlice for the relevant time window and SourceID, should it be
silently dropped?
stored in a new TimeSlice with a slightly different name?
added to the existing TimeSlice (if the appropriate file is still open)?
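The three options above could be captured as an explicit, configurable policy rather than implicit behavior. A hypothetical sketch (the real decision needs the consensus discussed here, and none of these names come from the dunedaq code):

```python
from enum import Enum

class LatePolicy(Enum):
    """The three candidate behaviors for a late-arriving TPSet."""
    DROP = "drop silently"
    NEW_SLICE = "store in a new TimeSlice with a modified name"
    APPEND = "append to the existing TimeSlice if still open"

def handle_late_tpset(tpset, window, open_slices, policy, log):
    """Dispatch a late TPSet according to the chosen policy.
    open_slices maps (window, source_id) -> TimeSlice dicts still open."""
    key = (window, tpset["source_id"])
    if policy is LatePolicy.DROP:
        log.append(f"dropped late TPSet for {key}")
        return None
    if policy is LatePolicy.APPEND and key in open_slices:
        open_slices[key]["tpsets"].append(tpset)
        return open_slices[key]
    # NEW_SLICE, or APPEND requested but the file is already closed:
    # fall back to a separate, distinctly named TimeSlice.
    return {"name": f"timeslice_{window}_late", "tpsets": [tpset]}
```

Making the policy explicit would also let the monitoring count each outcome separately, so "silently dropped" becomes "counted and dropped".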
I have draft changes to SourceEmulatorModel that allow us to emulate various TP-related exceptional conditions, such as two sources sending TPSets that are significantly out of sync.
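The out-of-sync condition could be emulated along these lines: generate each source's TPSet stream with a configurable timestamp offset, so two emulated sources can be driven arbitrarily far apart. This is only an illustrative sketch, not the actual SourceEmulatorModel changes:

```python
def emulate_tpset_stream(source_id, n_sets, start_time, period, skew=0):
    """Hypothetical sketch: produce n_sets TPSet stand-ins for source_id,
    one per `period` ticks, with all timestamps shifted by `skew` ticks
    to emulate a source that is out of sync with its peers."""
    return [{"source_id": source_id,
             "start_time": start_time + i * period + skew}
            for i in range(n_sets)]

# Two sources covering the same nominal window, one skewed far ahead.
in_sync = emulate_tpset_stream("src0", 3, 0, 1000)
out_of_sync = emulate_tpset_stream("src1", 3, 0, 1000, skew=250_000)
```

Feeding streams like these into the TP-handling chain would exercise exactly the "too late for the original TimeSlice" cases listed above, and the monitoring should then report where those TPSets end up.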