30–31 May 2024
Wigner Datacenter - HUN-REN Wigner Research Centre for Physics
Europe/Budapest timezone

Monitoring ALICE Analysis Facility infrastructure: operation and visibility

31 May 2024, 10:20
25m
Wigner Datacenter - HUN-REN Wigner Research Centre for Physics

HUN-REN Wigner RCP 1121 Budapest, Konkoly-Thege Miklós rd 29-33, Hungary

Speaker

Ádám Pintér

Description

We present a short introduction to the ALICE Analysis Facility and WSCLAB (Wigner Scientific Computing Laboratory) projects in our datacenter and show key operational and visibility aspects of their monitoring. Because the hardware components are aging, monitoring is an important means of keeping the infrastructure healthy and prolonging the cluster's lifetime.
We created server types (worker node, storage) and defined the corresponding entities in our monitoring system. Some monitoring checks are basic, others are more advanced, and a few are complex, so that we know the most important details in almost real time.
For power consumption, we use a visualization solution that shows power usage statistics per rack.
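As an illustration of per-rack power statistics, the following minimal sketch sums per-server readings into rack totals. The rack and host names, the wattage values, and the function name power_per_rack are hypothetical; the abstract does not describe how the readings are actually collected or visualized.

```python
from collections import defaultdict


def power_per_rack(readings):
    """Sum per-server power draw (in watts) into per-rack totals.

    `readings` is an iterable of (rack, host, watts) tuples.
    """
    totals = defaultdict(float)
    for rack, _host, watts in readings:
        totals[rack] += watts
    return dict(totals)


# Hypothetical sample readings: (rack, host, watts)
sample = [
    ("rack-01", "wn-001", 310.0),
    ("rack-01", "wn-002", 295.5),
    ("rack-02", "st-001", 420.0),
]
rack_totals = power_per_rack(sample)
```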
The Ansible automation tool was used to scale up the monitoring system.
Historical data is also very valuable, so we integrated a database solution (InfluxDB) into our monitoring workflow.
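To make the InfluxDB step concrete, the sketch below formats one check result as a line in InfluxDB line protocol, the text format InfluxDB accepts for writes. The measurement name, tags, field names, and timestamp are assumptions chosen for illustration, and the helper to_line_protocol is hypothetical; the abstract does not say how the data is actually shipped to the database.

```python
def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Format one data point in InfluxDB line protocol:

        measurement,tag=value field=value timestamp
    """

    def esc(text, chars):
        # Backslash-escape the characters that are special in this position.
        text = str(text)
        for ch in chars:
            text = text.replace(ch, "\\" + ch)
        return text

    def fmt_field(value):
        # bool must be tested before int because bool is a subclass of int.
        if isinstance(value, bool):
            return "true" if value else "false"
        if isinstance(value, int):
            return f"{value}i"  # integer fields carry an 'i' suffix
        if isinstance(value, float):
            return repr(value)
        escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'  # string fields are double-quoted

    # Tags are sorted by key, which InfluxDB recommends for write performance.
    tag_part = "".join(
        f",{esc(k, ', =')}={esc(v, ', =')}" for k, v in sorted(tags.items())
    )
    field_part = ",".join(
        f"{esc(k, ', =')}={fmt_field(v)}" for k, v in fields.items()
    )
    return f"{esc(measurement, ', ')}{tag_part} {field_part} {timestamp_ns}"


# Hypothetical data point for a worker node.
line = to_line_protocol(
    "cpu_temp",
    {"host": "wn-001", "type": "worker"},
    {"celsius": 61.5, "cores": 32},
    1717143600000000000,
)
```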
Current milestones and roadmap for monitoring: continuous disk tests (S.M.A.R.T.), smart alerting for complex cases, scheduled backup for monitoring data, proper alerting based on pre-defined warning and critical levels, iterative time-based optimization for running checks, HTCondor service monitoring.
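Alerting on pre-defined warning and critical levels can be sketched as a simple threshold classifier. The example assumes a metric where higher values are worse, and it borrows the 0/1/2 status codes from the common Nagios-plugin convention; the abstract does not name the monitoring system, so both the codes and the threshold values are assumptions.

```python
OK, WARNING, CRITICAL = 0, 1, 2  # status codes, Nagios-plugin style (assumed)


def classify(value, warning, critical):
    """Map a measured value to a status, for metrics where higher is worse."""
    if warning > critical:
        raise ValueError("warning level must not exceed critical level")
    if value >= critical:
        return CRITICAL
    if value >= warning:
        return WARNING
    return OK


# Hypothetical disk temperature thresholds in degrees Celsius.
status = classify(52.0, warning=45.0, critical=55.0)
```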
