This workshop will provide Fermilab staff and users with an overview of the current state of Scientific Computing Monitoring, the types of data that are collected, and how to interact with that data. There will also be advanced tutorials on how to instrument applications and services. We will discuss current directions and trends in monitoring, and gather feedback to help shape the future roadmap.
The morning session is expected to be of general interest to users and CSAID staff alike, while the afternoon session will be geared more towards service providers and staff. Users interested in developing custom monitoring for their collaboration are welcome in the afternoon as well.
Sessions will be hybrid: onsite at Fermilab (see session details; note that the morning and afternoon sessions are in different rooms) and on Zoom (connection information is in the email announcement). The timetable is still being developed and is subject to change.
Sessions will be recorded and posted in Teams/SharePoint, with links added here when available.
Zoom information is available here (FNAL SSO required)
Introduction to Landscape - navigating dashboards, interacting with Grafana, and finding relevant information, with a focus on batch job and related resource monitoring.
Landscape grew out of FIFE batch monitoring, but it also serves CMS users, where everything is similar yet different. What can FIFE and DUNE learn from CMS, and vice versa?
Using Kibana for ad hoc, in-depth data exploration and troubleshooting, and accessing the data directly for nonstandard uses.
Have any questions, concerns, or feedback about the current and future state of scientific computing monitoring for experiments and users? This is an opportunity to bring them up in an informal discussion.
Introduction to the types of data that can be collected, what can be done with them, and which best practices to follow.
Discussion of the state and future of combined monitoring: understanding usage patterns and identifying where resource contention and bottlenecks occur at any given time. Compute resources, network, dCache, tape?
The Elastic Analysis Facility is the next-generation interactive computing facility for users, providing scalable on-demand resources through "cloud native" technologies such as Kubernetes (OKD) and Ceph, with extensive Prometheus metrics and logs for observability.
Bring your laptop for an interactive demo of adding instrumentation to a toy application and sending it to Landscape.