Type: Bug
Resolution: Not a Bug
Priority: Blocker
Logging 5.7.0, Logging 5.6.z
ansible operator, AppSRE, auth, auth n/z, Automated Environment Provisioning, Automated Release, autotuning and scalability, Azure, Bring Your Own Host, chargeback, Cloud, Cluster Operator, console, Continuous Release, core-services, CS-SRE, Developer Experience, DevProductivity Platform, DevProductivity Test Platform, DVO, Etcd, Hosted Service Delivery, installer, Logging, Marketplace, Master, Metering, monitoring, MT-SRE, multi-cluster, Network Edge, Node, olm, OpenShift Documentation, OpenStack as Infra, Operator SDK, operators, os, Pod, Quay, Release Automation, RHCOS, Runtimes
NEW
OCPSTRAT-507 - RHCOS based on RHEL 9.2
VERIFIED
Sprint: Log Collection - Sprint 234, Log Collection - Sprint 235
Description of problem:
After deploying logging on OCP 4.13, journal logs are not collected. Both the Vector and Fluentd collectors are affected.
Logs from the Fluentd collector pod:
$ oc logs collector-2t8z5
Defaulted container "collector" out of: collector, logfilesmetricexporter
POD_IPS: 10.131.0.32, PROM_BIND_IP: 0.0.0.0
Setting each total_size_limit for 3 buffers to 6841695232 bytes
Setting queued_chunks_limit_size for each buffer to 815
Setting chunk_limit_size for each buffer to 8388608
/var/lib/fluentd/pos/journal_pos.json exists, checking if yajl parser able to parse this json file without any error.
ruby 2.7.6p219 (2022-04-12 revision c9c2245c0a) [x86_64-linux]
RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR=0.900000 (default value: 2.000000)
checking if /var/lib/fluentd/pos/journal_pos.json a valid json by calling yajl parser
2023-03-22 02:16:06 +0000 [warn]: '@' is the system reserved prefix. It works in the nested configuration for now but it will be rejected: @timestamp
2023-03-22 02:16:06 +0000 [warn]: '@' is the system reserved prefix. It works in the nested configuration for now but it will be rejected: @timestamp
/usr/local/share/gems/gems/fluent-plugin-elasticsearch-5.2.2/lib/fluent/plugin/elasticsearch_compat.rb:8: warning: already initialized constant TRANSPORT_CLASS
/usr/local/share/gems/gems/fluent-plugin-elasticsearch-5.2.2/lib/fluent/plugin/elasticsearch_compat.rb:3: warning: previous definition of TRANSPORT_CLASS was here
/usr/local/share/gems/gems/fluent-plugin-elasticsearch-5.2.2/lib/fluent/plugin/elasticsearch_compat.rb:25: warning: already initialized constant SELECTOR_CLASS
/usr/local/share/gems/gems/fluent-plugin-elasticsearch-5.2.2/lib/fluent/plugin/elasticsearch_compat.rb:20: warning: previous definition of SELECTOR_CLASS was here
2023-03-22 02:16:08 +0000 [warn]: For security reason, setting private_key_passphrase is recommended when cert_path is specified
The Vector collector pod logs many warnings:
$ oc logs -c collector collector-lq9xz
2023-03-22T02:39:38.982492Z WARN vector::config::loading: Transform "route_container_logs._unmatched" has no consumers
Journal file /var/log/journal/ec23f73e7f0fbe1a217e0f0640844695/system.journal uses an unsupported feature, ignoring file.
Use SYSTEMD_LOG_LEVEL=debug journalctl --file=/var/log/journal/ec23f73e7f0fbe1a217e0f0640844695/system.journal to see the details.
Data from the specified boot (+0) is not available: No such boot ID in journal
2023-03-22T02:39:39.048312Z WARN source{component_kind="source" component_id=raw_journal_logs component_type=journald component_name=raw_journal_logs}: vector::sources::journald: Journalctl process stopped.
......
Journal file /var/log/journal/ec23f73e7f0fbe1a217e0f0640844695/system.journal uses an unsupported feature, ignoring file.
Use SYSTEMD_LOG_LEVEL=debug journalctl --file=/var/log/journal/ec23f73e7f0fbe1a217e0f0640844695/system.journal to see the details.
Data from the specified boot (+0) is not available: No such boot ID in journal
2023-03-22T02:42:34.197493Z WARN source{component_kind="source" component_id=raw_journal_logs component_type=journald component_name=raw_journal_logs}: vector::sources::journald: Journalctl process stopped.
$ oc rsh collector-lq9xz
Defaulted container "collector" out of: collector, logfilesmetricexporter
sh-4.4# SYSTEMD_LOG_LEVEL=debug journalctl --file=/var/log/journal/ec23f73e7f0fbe1a217e0f0640844695/system.journal
Journal effective settings seal=no compress=no compress_threshold_bytes=8B
Journal file /var/log/journal/ec23f73e7f0fbe1a217e0f0640844695/system.journal has unknown incompatible flags 0x1c
Failed to open journal file /var/log/journal/ec23f73e7f0fbe1a217e0f0640844695/system.journal: Protocol not supported
mmap cache statistics: 0 hit, 1 miss
Failed to open files: Protocol not supported
sh-4.4# ls -Rl /var/log/journal/
/var/log/journal/:
total 0
drwxr-sr-x+ 2 root systemd-journal 28 Mar 22 00:02 ec23f73e7f0fbe1a217e0f0640844695
/var/log/journal/ec23f73e7f0fbe1a217e0f0640844695:
total 16388
-rw-r-----+ 1 root systemd-journal 16777216 Mar 22 03:12 system.journal
sh-4.4# id
uid=0(root) gid=0(root) groups=0(root)
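The value 0x1c reported by journalctl can be decoded against the header flag bits that systemd defines in `journal-def.h`. The sketch below assumes those bit positions (XZ = bit 0, LZ4 = bit 1, keyed hash = bit 2, zstd = bit 3, compact = bit 4) and the on-disk header layout (8-byte `LPKSHHRH` signature, then little-endian 32-bit compatible and incompatible flag fields, so the incompatible flags sit at byte offset 12). The helper names `decode_journal_incompat_flags` and `read_journal_incompat_flags` are hypothetical. Under these assumptions 0x1c decodes to keyed hash, zstd compression and compact mode, which would be consistent with the older journalctl in the collector image (note the `sh-4.4` prompt versus `sh-5.1` on the node) refusing the file with "Protocol not supported".

```shell
#!/usr/bin/env bash
# Sketch: inspect the incompatible-flags field of a systemd journal file header.
# Assumption: bit names/positions mirror HEADER_INCOMPATIBLE_* in systemd's journal-def.h.

# Print the names of the bits set in an incompatible-flags value (decimal or 0x hex).
decode_journal_incompat_flags() {
  local flags=$(( $1 ))
  local names=(COMPRESSED_XZ COMPRESSED_LZ4 KEYED_HASH COMPRESSED_ZSTD COMPACT)
  local out=() bit
  for bit in "${!names[@]}"; do
    if (( flags & (1 << bit) )); then
      out+=("${names[bit]}")
    fi
  done
  # Bits above the five known ones are reported numerically.
  local unknown=$(( flags & ~0x1f ))
  if (( unknown != 0 )); then
    out+=("$(printf 'UNKNOWN_0x%x' "$unknown")")
  fi
  echo "${out[*]}"
}

# Print the incompatible_flags field as a decimal number. It follows the
# 8-byte "LPKSHHRH" signature and the 4-byte compatible_flags field (offset 12).
read_journal_incompat_flags() {
  local file=$1 sig
  sig=$(head -c 8 "$file") || return 1
  if [[ $sig != "LPKSHHRH" ]]; then
    echo "not a systemd journal file: $file" >&2
    return 1
  fi
  # --endian requires GNU coreutils od (8.23 or later).
  od -An -t u4 --endian=little -j 12 -N 4 "$file" | tr -d ' \n'
}

# Example (inside the collector pod, using the path from this report):
#   f=/var/log/journal/ec23f73e7f0fbe1a217e0f0640844695/system.journal
#   decode_journal_incompat_flags "$(read_journal_incompat_flags "$f")"
```

Reading the raw header avoids depending on the journalctl that fails to open the file in the first place.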
In OCP 4.13, RHCOS is upgraded to RHEL 9:
$ oc debug nodes/qe-daily-413-0322-4sdbs-worker-westus-7lskh
sh-4.4# chroot /host
sh-5.1# cat /etc/redhat-release
CentOS Stream CoreOS release 4.13
sh-5.1# cat /etc/os-release
NAME="CentOS Stream CoreOS"
ID="rhcos"
ID_LIKE="rhel fedora"
VERSION="413.92.202303190222-0"
VERSION_ID="4.13"
VARIANT="CoreOS"
VARIANT_ID=coreos
PLATFORM_ID="platform:el9"
PRETTY_NAME="CentOS Stream CoreOS 413.92.202303190222-0 (Plow)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:9coreos"
HOME_URL="https://centos.org/"
DOCUMENTATION_URL="https://docs.openshift.com/container-platform/4.13/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="OpenShift Container Platform"
REDHAT_BUGZILLA_PRODUCT_VERSION="4.13"
REDHAT_SUPPORT_PRODUCT="OpenShift Container Platform"
REDHAT_SUPPORT_PRODUCT_VERSION="4.13"
OPENSHIFT_VERSION="4.13"
RHEL_VERSION="9"
OSTREE_VERSION="413.92.202303190222-0"
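To compare the node OS with the collector image, one could read `PLATFORM_ID` from `/etc/os-release` on both sides. The collector's value is not shown in this report, so the sketch below only illustrates the lookup. The helper name `os_release_value` is hypothetical, and it assumes simple `KEY=value` or `KEY="value"` lines like those above; it does not handle the escaped quotes that the os-release format technically allows.

```shell
#!/usr/bin/env bash
# Sketch: print one key's value from an os-release style file.
# Assumes simple KEY=value or KEY="value" lines (no escaped quotes).
os_release_value() {
  local key=$1 file=${2:-/etc/os-release} line
  line=$(grep -m1 "^${key}=" "$file") || return 1
  line=${line#*=}   # drop "KEY="
  line=${line#\"}   # drop an opening double quote, if any
  line=${line%\"}   # drop a closing double quote, if any
  printf '%s\n' "$line"
}

# Example: in the "oc debug" pod the node root is mounted at /host;
# inside the collector pod the container's own file is read.
#   node:      os_release_value PLATFORM_ID /host/etc/os-release
#   collector: os_release_value PLATFORM_ID
```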
Version-Release number of selected component (if applicable):
Logging 5.5, 5.6, 5.7
Cluster version: 4.13.0-0.nightly-2023-03-19-052243
How reproducible:
Always
Steps to Reproduce:
- Deploy logging on OCP 4.13.
- Check the data in the log store.
Actual results:
Journal logs are not collected.
Expected results:
Journal logs should be collected.
Additional info:
The issue does not occur when testing on OCP 4.12 or earlier OCP versions.
- is cloned by
LOG-3930 OCP4.13 uses RHCOS based on RHEL9.2, causing logging to fail to collect journal logs.
- Closed
- is related to
COS-1926 Move RHCOS to RHEL 9.2 in OCP 4.13
- Closed
- relates to
OCPBUGS-12203 Keep systemd journal using LZ4 compression (via new env var)
- Closed