Story
Resolution: Done
With Zabbix now taking over from Nagios, we need to document it better. I'm happy to deal with that. I think the initial list would be:
SOPs
- Taking a single machine in/out of monitoring for reboots etc.
- Scheduling a maintenance window for a group of hosts
- Updating and applying template variables (e.g. CPU load) for a host or group via Ansible
Larger Howtos
- Overview of Zabbix and how we've implemented it
- Developing a new template for a group of checks on certain hosts
- SAML mappings and which groups have what levels of access
Any others we need?