OpenShift Logging / LOG-7317

[release-6.2] Cannot create AlertingRule for infrastructure logs without specifying the namespace


    • Quality / Stability / Reliability
    • Status: VERIFIED
    • Release Note Text: With this fix, users can create RecordingRules or AlertingRules for the infrastructure and audit tenants without having to specify a namespace label.
    • Release Note Type: Bug Fix
    • Sprint: Log Storage - Sprint 273, Logging - Sprint 274
    • Severity: Important

      Problem statement

      To detect and address storage issues, a customer would like to alert on kernel messages indicating that there are I/O errors, such as the following:

      Mar 02 02:12:33 example-node.example.com kernel: I/O error, dev sde, sector 236672120 op 0x1:(WRITE) flags 0x4a00 phys_seg 16 prio class 2
      Mar 02 02:12:33 example-node.example.com kernel: I/O error, dev sde, sector 236671992 op 0x1:(WRITE) flags 0x4a00 phys_seg 16 prio class 2
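
      For reference, these kernel messages can typically also be inspected directly on the affected node with journalctl in a debug shell (sketch only; "example-node" is a placeholder for the actual node name):

      # Show kernel ring buffer messages on the node and filter for I/O errors
      $ oc debug node/example-node -- chroot /host journalctl -k | grep 'I/O error'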

      These messages are also visible via Loki / Infrastructure Logging:

      {"@timestamp":"2025-03-12T12:11:40.169048Z","_RUNTIME_SCOPE":"system","_SOURCE_MONOTONIC_TIMESTAMP":"1012091349","hostname":"ip-10-0-2-224","kubernetes":{"container_name":"","namespace_name":"","pod_name":""},"level":"warning","log_source":"node","log_type":"infrastructure","message":"I/O error, dev sde, sector 236672120 op 0x1:(WRITE) flags 0x4a00 phys_seg 16 prio class 2","openshift":{"cluster_id":"41a17697-985f-4fc8-afa9-434482937887","sequence":1741781500535134264},"systemd":{"t":{"BOOT_ID":"ac3018b64b784cedbf332ee684d813b4","MACHINE_ID":"ec248febe93c4d59715b5326628d3475","TRANSPORT":"kernel"},"u":{"SYSLOG_FACILITY":"1","SYSLOG_IDENTIFIER":"kernel"}},"time":"2025-03-12T12:11:40+00:00"}

      However, trying to create an AlertingRule that alerts on these messages fails:

      kind: AlertingRule
      apiVersion: loki.grafana.com/v1
      metadata:
        name: kernel-io-errors
        namespace: openshift-logging
        labels:
          openshift.io/log-alerting: 'true'
      spec:
        groups:
          - interval: 1m
            name: KernelErrors
            rules:
              - alert: KernelIOErrors
                annotations:
                  summary: Kernel log is showing I/O errors, this is potentially related to iSCSI connectivity issues or other storage issues
                  description: Kernel log is showing I/O errors
                  message: '{{ $labels.message }}'
                expr: 'count_over_time({ log_type="infrastructure" } |~ `I/O error` | json [60m]) > 0'
                labels:
                  severity: critical
                for: 0m
        tenantID: infrastructure

      This fails with the following error message:

      $ oc apply -f rule.yml
      [..]
      spec.groups[0].rules[0].expr: Invalid value: "count_over_time({ log_type=\"infrastructure\" } |~ `I/O error` | json [60m]) > 0": rule needs to have a matcher for the namespace
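
      As the error indicates, the validation expects the stream selector to contain a namespace matcher. For illustration only, an expression of the following shape is what the check asks for (the namespace label name and value here are assumptions, not a recommended workaround):

      expr: 'count_over_time({ log_type="infrastructure", kubernetes_namespace_name="openshift-logging" } |~ `I/O error` | json [60m]) > 0'

      However, kernel messages are emitted by the node rather than by a pod, so kubernetes_namespace_name is empty in these records (see the JSON document above), and there is no meaningful namespace to match on for infrastructure logs.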

      I believe this to be a Bug, even though this issue has already been discussed in RFE-5656.

      Affected versions

      • OpenShift Container Platform 4.18.1
      • OpenShift Logging 6.2.0
      • Cluster Observability Operator 1.0.0
      • Loki Operator 6.2.0

      Steps to reproduce

      1. Set up the complete logging stack with Loki on OpenShift Container Platform 4.18 as per the quick start guide in the documentation
      2. Use "oc rsh" and "chroot /host" to run commands on a host. Generate a kernel message using "echo 'kernel: test message' > /dev/kmsg"
      3. Try to create an AlertingRule as per the definition above to filter for "test message"
      4. Observe that the alerting rule on kernel messages cannot be created (a sketch of these steps follows below)
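
      A minimal sketch of steps 2 to 4, assuming a node named "example-node" and the AlertingRule definition from above saved as rule.yml with its filter changed to match the test message:

      # Step 2: write a test message to the kernel ring buffer on one node
      $ oc debug node/example-node -- chroot /host sh -c "echo 'kernel: test message' > /dev/kmsg"

      # Step 3: in rule.yml, change the expression to match the test message, e.g.
      #   expr: 'count_over_time({ log_type="infrastructure" } |~ `test message` | json [60m]) > 0'
      $ oc apply -f rule.yml

      # Step 4: the apply is rejected with the same error:
      #   ... rule needs to have a matcher for the namespace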

      Additional information
