OpenShift Logging / LOG-3832

OCP 4.13 uses RHCOS based on RHEL 9.2, which causes logging to fail to collect journal logs.

    • Type: Bug
    • Resolution: Not a Bug
    • Priority: Blocker
    • Logging 5.7.0, Logging 5.6.z
    • Log Collection
    • NEW
    • OCPSTRAT-507 - RHCOS based on RHEL 9.2
    • VERIFIED
    • Log Collection - Sprint 234, Log Collection - Sprint 235

      Description of problem:

      After logging is deployed on OCP 4.13, journal logs are not collected. Both the Vector and Fluentd collectors are affected.

      Logs in fluentd pods:

      $ oc logs collector-2t8z5 
      Defaulted container "collector" out of: collector, logfilesmetricexporter
      POD_IPS: 10.131.0.32, PROM_BIND_IP: 0.0.0.0
      Setting each total_size_limit for 3 buffers to 6841695232 bytes
      Setting queued_chunks_limit_size for each buffer to 815
      Setting chunk_limit_size for each buffer to 8388608
      /var/lib/fluentd/pos/journal_pos.json exists, checking if yajl parser able to parse this json file without any error.
      ruby 2.7.6p219 (2022-04-12 revision c9c2245c0a) [x86_64-linux]
      RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR=0.900000 (default value: 2.000000)
      checking if /var/lib/fluentd/pos/journal_pos.json a valid json by calling yajl parser
      2023-03-22 02:16:06 +0000 [warn]: '@' is the system reserved prefix. It works in the nested configuration for now but it will be rejected: @timestamp
      2023-03-22 02:16:06 +0000 [warn]: '@' is the system reserved prefix. It works in the nested configuration for now but it will be rejected: @timestamp
      /usr/local/share/gems/gems/fluent-plugin-elasticsearch-5.2.2/lib/fluent/plugin/elasticsearch_compat.rb:8: warning: already initialized constant TRANSPORT_CLASS
      /usr/local/share/gems/gems/fluent-plugin-elasticsearch-5.2.2/lib/fluent/plugin/elasticsearch_compat.rb:3: warning: previous definition of TRANSPORT_CLASS was here
      /usr/local/share/gems/gems/fluent-plugin-elasticsearch-5.2.2/lib/fluent/plugin/elasticsearch_compat.rb:25: warning: already initialized constant SELECTOR_CLASS
      /usr/local/share/gems/gems/fluent-plugin-elasticsearch-5.2.2/lib/fluent/plugin/elasticsearch_compat.rb:20: warning: previous definition of SELECTOR_CLASS was here
      2023-03-22 02:16:08 +0000 [warn]: For security reason, setting private_key_passphrase is recommended when cert_path is specified 

      The Vector pod logs many warning messages:

      $ oc logs -c collector collector-lq9xz 
      2023-03-22T02:39:38.982492Z  WARN vector::config::loading: Transform "route_container_logs._unmatched" has no consumers
      Journal file /var/log/journal/ec23f73e7f0fbe1a217e0f0640844695/system.journal uses an unsupported feature, ignoring file.
      Use SYSTEMD_LOG_LEVEL=debug journalctl --file=/var/log/journal/ec23f73e7f0fbe1a217e0f0640844695/system.journal to see the details.
      Data from the specified boot (+0) is not available: No such boot ID in journal
      2023-03-22T02:39:39.048312Z  WARN source{component_kind="source" component_id=raw_journal_logs component_type=journald component_name=raw_journal_logs}: vector::sources::journald: Journalctl process stopped.
      ......
      Journal file /var/log/journal/ec23f73e7f0fbe1a217e0f0640844695/system.journal uses an unsupported feature, ignoring file.
      Use SYSTEMD_LOG_LEVEL=debug journalctl --file=/var/log/journal/ec23f73e7f0fbe1a217e0f0640844695/system.journal to see the details.
      Data from the specified boot (+0) is not available: No such boot ID in journal
      2023-03-22T02:42:34.197493Z  WARN source{component_kind="source" component_id=raw_journal_logs component_type=journald component_name=raw_journal_logs}: vector::sources::journald: Journalctl process stopped.
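The repeated warning above can be counted per collector pod to gauge how often journalctl rejects the journal file. The following is a minimal sketch, not part of the original report: the function name `count_unsupported_journal_warnings` is hypothetical, and the match string is copied from the Vector output above.

```shell
# Count journalctl "unsupported feature" warnings in collector output read from stdin.
count_unsupported_journal_warnings() {
    # grep -c exits non-zero when the count is 0, so its status is ignored.
    grep -c 'uses an unsupported feature, ignoring file' || true
}

# Example (requires cluster access; pod name taken from this report):
#   oc logs -c collector collector-lq9xz | count_unsupported_journal_warnings
```

A non-zero count indicates that the collector's journalctl is skipping the node's journal file, which matches the symptom reported here.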

       

      $ oc rsh collector-lq9xz
      Defaulted container "collector" out of: collector, logfilesmetricexporter
      sh-4.4# SYSTEMD_LOG_LEVEL=debug journalctl --file=/var/log/journal/ec23f73e7f0fbe1a217e0f0640844695/system.journal
      Journal effective settings seal=no compress=no compress_threshold_bytes=8B
      Journal file /var/log/journal/ec23f73e7f0fbe1a217e0f0640844695/system.journal has unknown incompatible flags 0x1c
      Failed to open journal file /var/log/journal/ec23f73e7f0fbe1a217e0f0640844695/system.journal: Protocol not supported
      mmap cache statistics: 0 hit, 1 miss
      Failed to open files: Protocol not supported
      sh-4.4# ls -Rl /var/log/journal/
      /var/log/journal/:
      total 0
      drwxr-sr-x+ 2 root systemd-journal 28 Mar 22 00:02 ec23f73e7f0fbe1a217e0f0640844695
      
      
      /var/log/journal/ec23f73e7f0fbe1a217e0f0640844695:
      total 16388
      -rw-r-----+ 1 root systemd-journal 16777216 Mar 22 03:12 system.journal
      sh-4.4# id
      uid=0(root) gid=0(root) groups=0(root) 
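The value `0x1c` in the debug output above is a bitmask. The sketch below decodes it, assuming the incompatible-flag bit assignments given in systemd's journal file format documentation (bit 0 XZ compression, bit 1 LZ4 compression, bit 2 keyed hash, bit 3 ZSTD compression, bit 4 compact mode). The function name `decode_incompatible_flags` is hypothetical and used only for illustration.

```shell
# Decode a journal header "incompatible flags" bitmask into flag names.
# Bit order is an assumption taken from systemd's journal file format docs.
decode_incompatible_flags() {
    flags=$(( $1 ))   # accepts decimal or hex, e.g. 0x1c
    bit=0
    out=""
    for name in COMPRESSED_XZ COMPRESSED_LZ4 KEYED_HASH COMPRESSED_ZSTD COMPACT; do
        if [ $(( flags & (1 << bit) )) -ne 0 ]; then
            out="${out:+$out }$name"
        fi
        bit=$(( bit + 1 ))
    done
    printf '%s\n' "$out"
}

decode_incompatible_flags 0x1c   # prints: KEYED_HASH COMPRESSED_ZSTD COMPACT
```

Under that assumption, the journalctl binary in the collector container does not recognize the keyed-hash, ZSTD, and compact features that the node's journald used when it wrote the file.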

      In OCP 4.13, RHCOS is based on RHEL 9:

      $ oc debug nodes/qe-daily-413-0322-4sdbs-worker-westus-7lskh
      sh-4.4# chroot /host
      sh-5.1# cat /etc/redhat-release 
      CentOS Stream CoreOS release 4.13
      sh-5.1# cat /etc/os-release     
      NAME="CentOS Stream CoreOS"
      ID="rhcos"
      ID_LIKE="rhel fedora"
      VERSION="413.92.202303190222-0"
      VERSION_ID="4.13"
      VARIANT="CoreOS"
      VARIANT_ID=coreos
      PLATFORM_ID="platform:el9"
      PRETTY_NAME="CentOS Stream CoreOS 413.92.202303190222-0 (Plow)"
      ANSI_COLOR="0;31"
      CPE_NAME="cpe:/o:centos:centos:9coreos"
      HOME_URL="https://centos.org/"
      DOCUMENTATION_URL="https://docs.openshift.com/container-platform/4.13/"
      BUG_REPORT_URL="https://bugzilla.redhat.com/"
      REDHAT_BUGZILLA_PRODUCT="OpenShift Container Platform"
      REDHAT_BUGZILLA_PRODUCT_VERSION="4.13"
      REDHAT_SUPPORT_PRODUCT="OpenShift Container Platform"
      REDHAT_SUPPORT_PRODUCT_VERSION="4.13"
      OPENSHIFT_VERSION="4.13"
      RHEL_VERSION="9"
      OSTREE_VERSION="413.92.202303190222-0"
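To check several nodes for the RHEL 9 based OS without reading the file by eye, the relevant fields can be extracted from `os-release`. This is a minimal sketch; the helper names `os_release_field` and `node_is_el9` are hypothetical, and the `platform:el9` value is taken from the output above.

```shell
# Print the value of one key from an os-release style file.
# $1 = path to the file, $2 = key name (for example PLATFORM_ID).
os_release_field() {
    sed -n "s/^$2=//p" "$1" | head -n 1 | tr -d '"'
}

# Succeed when the file reports an el9 platform.
node_is_el9() {
    [ "$(os_release_field "$1" PLATFORM_ID)" = "platform:el9" ]
}

# Example on a node (after `chroot /host`):
#   node_is_el9 /etc/os-release && echo "RHEL 9 based RHCOS"
```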

      Version-Release number of selected component (if applicable):

      Logging 5.5, 5.6, 5.7

      Cluster version: 4.13.0-0.nightly-2023-03-19-052243

      How reproducible:

      Always

      Steps to Reproduce:

      1. Deploy logging on OCP 4.13.
      2. Check the data in the log store.

      Actual results:

      Journal logs are not collected.

      Expected results:

      Journal logs should be collected.

      Additional info:

      The issue does not occur on OCP 4.12 or earlier OCP versions.

        1. collector-4tcv6.log (158 kB)
        2. screenshot-1.png (106 kB)
        3. screenshot-2.png (113 kB)

              jcantril@redhat.com Jeffrey Cantrill
              qitang@redhat.com Qiaoling Tang
              Qiaoling Tang Qiaoling Tang