OpenShift Logging / LOG-4326

Restart of kube-api-server crashes vector collector pods


    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Major
    • Affects Version/s: Logging 5.7.2, Logging 5.7.3
    • Component/s: Log Collection

      Steps to Reproduce:

      1. Vector collector pods and kube-api-server pods are working fine.
      2. Test that logs from kube-api-server are being forwarded:
      $ oc rsh kube-apiserver-xxxxxxx-master-0 bash -c "echo $(date -Ins) WARN BLATEST blatest bla |tee -a /proc/1/fd/1"
      3. Restart the kube-api-server pod and repeat step 2.
    • Sprint: Log Collection - Sprint 239, Log Collection - Sprint 240

      Description of problem:

      When the vector collector pods and kube-api-server pods are working fine and a kube-api-server pod gets restarted (for any reason, or manually), logs from that kube-api-server are no longer forwarded.
      The following logs are seen in the collector pod after the kube-api-server pod is restarted:

      {code}
2023-06-28T10:32:50.171854Z  WARN vector::internal_events::file::source: Currently ignoring file too small to fingerprint. file=/var/log/pods/openshift-dns_node-resolver-xxxx_xxxx-xxx-xxxx-xxxx-xxxxxxxxxx/dns-node-resolver/1.log
2023-06-28T10:35:01.275308Z ERROR source{component_kind="source" component_id=raw_container_logs component_type=kubernetes_logs component_name=raw_container_logs}: vector::internal_events::kubernetes_logs: Failed to annotate event with pod metadata. event=Log(LogEvent { fields: Object({"file": Bytes(b"/var/log/pods/openshift-kube-apiserver_kube-apiserver-xxxxxx-xxxx-master-0_xxxxxxxxx/kube-apiserver-cert-syncer/0.log"), "message": Bytes(b"2023-06-28T10:34:57.917666754+00:00 stderr F I0628 10:34:57.917576       1 certsync_controller.go:66] Syncing configmaps: [{aggregator-client-ca false} {client-ca false} {trusted-ca-bundle true} {control-plane-node-kubeconfig false} {check-endpoints-kubeconfig false}]"), "source_type": Bytes(b"kubernetes_logs"), "timestamp": Timestamp(2023-06-28T10:35:01.275153322Z)}), metadata: EventMetadata { datadog_api_key: None, splunk_hec_token: None, finalizers: EventFinalizers([]), schema_definition: Definition { collection: Collection { known: {}, unknown: None }, meaning: {}, optional: {} } } }) error_code="annotation_failed" error_type="reader_failed" stage="processing"
2023-06-28T10:35:01.275607Z ERROR source{component_kind="source" component_id=raw_container_logs component_type=kubernetes_logs component_name=raw_container_logs}: vector::internal_events::kubernetes_logs: Failed to annotate event with pod metadata. event=Log(LogEvent { fields: Object({"file": Bytes(b"/var/log/pods/openshift-kube-apiserver_kube-apiserver-xxxxx-xxxxx-master-0_xxxxxxxxxxxxxxxx/kube-apiserver-cert-syncer/0.log"), "message": Bytes(b"2023-06-28T10:34:57.918288336+00:00 stderr F I0628 10:34:57.918238       1 certsync_controller.go:170] Syncing secrets: [{aggregator-client false} {localhost-serving-cert-certkey false} {service-network-serving-certkey false} {external-loadbalancer-serving-certkey false} {internal-loadbalancer-serving-certkey false} {bound-service-account-signing-key false} {control-plane-node-admin-client-cert-key false} {check-endpoints-client-cert-key false} {kubelet-client false} {node-kubeconfigs false} {user-serving-cert true} {user-serving-cert-000 true} {user-serving-cert-001 true} {user-serving-cert-002 true} {user-serving-cert-003 true} {user-serving-cert-004 true} {user-serving-cert-005 true} {user-serving-cert-006 true} {user-serving-cert-007 true} {user-serving-cert-008 true} {user-serving-cert-009 true}]"), "source_type": Bytes(b"kubernetes_logs"), "timestamp": Timestamp(2023-06-28T10:35:01.275588752Z)}), metadata: EventMetadata { datadog_api_key: None, splunk_hec_token: None, finalizers: EventFinalizers([]), schema_definition: Definition { collection: Collection { known: {}, unknown: None }, meaning: {}, optional: {} } } }) error_code="annotation_failed" error_type="reader_failed" stage="processing"
2023-06-28T10:35:01.277228Z ERROR transform{component_kind="transform" component_id=route_container_logs component_type=route component_name=route_container_logs}: vector::internal_events::conditions: VRL condition execution failed. error=function call error for "starts_with" at (3:51): expected string, got null internal_log_rate_secs=120 error_type="script_failed" stage="processing"
2023-06-28T10:35:01.277279Z ERROR transform{component_kind="transform" component_id=route_container_logs component_type=route component_name=route_container_logs}: vector::internal_events::conditions: Internal log [VRL condition execution failed.] is being rate limited.
      {code}
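
      For reference, a minimal sketch of how a collector pod can be checked for these annotation failures. The openshift-logging namespace and the collector container name are assumptions based on a default OpenShift Logging install and may differ; <collector-pod> is a hypothetical placeholder:
      {code:bash}
# Grep the collector pod running on the affected master node for the annotation failure
# (namespace and container name are assumptions; adjust to your install)
$ oc logs <collector-pod> -n openshift-logging -c collector \
    | grep "Failed to annotate event with pod metadata"
      {code}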
      
      Whenever a kube-api-server pod gets restarted, its logs stop being forwarded until the corresponding vector collector pod is restarted.
      This happens randomly with any one of the 3 kube-api-server pods.
      h4. Version-Release number of selected component (if applicable):
      h4. How reproducible:
      
      It is reproducible.
      h4. Steps to Reproduce:
       #  Vector collector pods and kube-api-server pods are working fine.
       #  Test that logs from kube-api-server are being forwarded:
      
      {code:bash}
      $ oc rsh kube-apiserver-xxxxxxx-master-0 bash -c "echo $(date -Ins) WARN BLATEST blatest bla |tee -a /proc/1/fd/1"
      {code}
       #  Restart the kube-api-server pod (one possible way is sketched below) and repeat step 2.
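
      The report does not state how the kube-api-server restart was triggered. One documented way to force a kube-apiserver rollout for the restart step is the operator's forceRedeploymentReason field; this is shown only as a hedged example, not necessarily how the reporter restarted the pod:
      {code:bash}
# Ask the kube-apiserver operator to roll out new kube-apiserver pods
$ oc patch kubeapiserver cluster --type merge \
    -p '{"spec":{"forceRedeploymentReason":"log-forwarding-test-'"$(date +%s)"'"}}'

# Watch the pods in openshift-kube-apiserver restart, then repeat step 2
$ oc get pods -n openshift-kube-apiserver -w
      {code}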

            

      Actual results:

      Logs from the restarted kube-api-server pod are not forwarded until the corresponding collector pod is restarted.

      Expected results:

      Logs from kube-api-server pods should continue to be forwarded after a kube-api-server pod restart.

      Additional info:

      • This issue happens randomly with any one of the 3 kube-api-server pods.
      • This issue is independent of where the logs are being forwarded.
      • The below scenarios have been tested:
        • OCP version 4.13.4, logging version 5.7.3, logs forwarded to the Loki stack.
        • OCP version 4.12.13, logging version 5.7.2, logs forwarded to Kibana.
        • OCP version 4.12.21, logging version 5.7.2, logs forwarded to Splunk.
      • The workaround is to restart the collector pod on the node whose kube-api-server pod caused the issue (a sketch follows below).
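
      A minimal sketch of that workaround, assuming the collector daemonset runs in the openshift-logging namespace with the label component=collector (names may differ per install); <collector-pod> is a hypothetical placeholder for the collector pod on the affected node:
      {code:bash}
# Note the NODE of the restarted kube-apiserver pod
$ oc get pods -n openshift-kube-apiserver -o wide

# Find the collector pod scheduled on that node (namespace/label are assumptions)
$ oc get pods -n openshift-logging -l component=collector -o wide

# Delete it; the daemonset recreates it and log forwarding resumes
$ oc delete pod <collector-pod> -n openshift-logging
      {code}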

              Assignee: Vitalii Parfonov (vparfono)
              Reporter: Aman Dev Verma (rhn-support-amanverm)
              Votes: 1
              Watchers: 3
