-
Bug
-
Resolution: Cannot Reproduce
-
Major
-
None
-
Logging 5.7.2, Logging 5.7.3
-
False
-
None
-
False
-
NEW
-
NEW
-
Bug Fix
-
-
-
-
Log Collection - Sprint 239, Log Collection - Sprint 240
Description of problem:
When the Vector collector pods and the kube-api-server pods are working fine and a kube-api-server pod gets restarted, for any reason or manually, logs from the kube-api-server stop being forwarded.
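A quick way to check whether the collector on the affected node is hitting this is to grep its logs for the errors shown below (a sketch, not from the report; the openshift-logging namespace and the collector container name are assumptions based on a default Logging 5.7 install):
{code:java}
# Pod name is a placeholder; pick the collector pod on the affected master node.
$ oc -n openshift-logging logs collector-xxxxx -c collector \
    | grep -E 'annotation_failed|VRL condition execution failed'
{code}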
The following logs are seen in the collector pod after the kube-api-server pod is restarted:
2023-06-28T10:32:50.171854Z WARN vector::internal_events::file::source: Currently ignoring file too small to fingerprint. file=/var/log/pods/openshift-dns_node-resolver-xxxx_xxxx-xxx-xxxx-xxxx-xxxxxxxxxx/dns-node-resolver/1.log
2023-06-28T10:35:01.275308Z ERROR source{component_kind="source" component_id=raw_container_logs component_type=kubernetes_logs component_name=raw_container_logs}: vector::internal_events::kubernetes_logs: Failed to annotate event with pod metadata. event=Log(LogEvent { fields: Object({"file": Bytes(b"/var/log/pods/openshift-kube-apiserver_kube-apiserver-xxxxxx-xxxx-master-0_xxxxxxxxx/kube-apiserver-cert-syncer/0.log"), "message": Bytes(b"2023-06-28T10:34:57.917666754+00:00 stderr F I0628 10:34:57.917576 1 certsync_controller.go:66] Syncing configmaps: [{aggregator-client-ca false} {client-ca false} {trusted-ca-bundle true} {control-plane-node-kubeconfig false} {check-endpoints-kubeconfig false}]"), "source_type": Bytes(b"kubernetes_logs"), "timestamp": Timestamp(2023-06-28T10:35:01.275153322Z)}), metadata: EventMetadata { datadog_api_key: None, splunk_hec_token: None, finalizers: EventFinalizers([]), schema_definition: Definition { collection: Collection { known: {}, unknown: None }, meaning: {}, optional: {} } } }) error_code="annotation_failed" error_type="reader_failed" stage="processing"
2023-06-28T10:35:01.275607Z ERROR source{component_kind="source" component_id=raw_container_logs component_type=kubernetes_logs component_name=raw_container_logs}: vector::internal_events::kubernetes_logs: Failed to annotate event with pod metadata. event=Log(LogEvent { fields: Object({"file": Bytes(b"/var/log/pods/openshift-kube-apiserver_kube-apiserver-xxxxx-xxxxx-master-0_xxxxxxxxxxxxxxxx/kube-apiserver-cert-syncer/0.log"), "message": Bytes(b"2023-06-28T10:34:57.918288336+00:00 stderr F I0628 10:34:57.918238 1 certsync_controller.go:170] Syncing secrets: [{aggregator-client false} {localhost-serving-cert-certkey false} {service-network-serving-certkey false} {external-loadbalancer-serving-certkey false} {internal-loadbalancer-serving-certkey false} {bound-service-account-signing-key false} {control-plane-node-admin-client-cert-key false} {check-endpoints-client-cert-key false} {kubelet-client false} {node-kubeconfigs false} {user-serving-cert true} {user-serving-cert-000 true} {user-serving-cert-001 true} {user-serving-cert-002 true} {user-serving-cert-003 true} {user-serving-cert-004 true} {user-serving-cert-005 true} {user-serving-cert-006 true} {user-serving-cert-007 true} {user-serving-cert-008 true} {user-serving-cert-009 true}]"), "source_type": Bytes(b"kubernetes_logs"), "timestamp": Timestamp(2023-06-28T10:35:01.275588752Z)}), metadata: EventMetadata { datadog_api_key: None, splunk_hec_token: None, finalizers: EventFinalizers([]), schema_definition: Definition { collection: Collection { known: {}, unknown: None }, meaning: {}, optional: {} } } }) error_code="annotation_failed" error_type="reader_failed" stage="processing"
2023-06-28T10:35:01.277228Z ERROR transform{component_kind="transform" component_id=route_container_logs component_type=route component_name=route_container_logs}: vector::internal_events::conditions: VRL condition execution failed. error=function call error for "starts_with" at (3:51): expected string, got null internal_log_rate_secs=120 error_type="script_failed" stage="processing"
2023-06-28T10:35:01.277279Z ERROR transform{component_kind="transform" component_id=route_container_logs component_type=route component_name=route_container_logs}: vector::internal_events::conditions: Internal log [VRL condition execution failed.] is being rate limited.

Whenever a kube-api-server pod gets restarted, its logs stop being forwarded until we restart the Vector collector pod. This happens randomly with any one of the 3 kube-api-server pods.

Version-Release number of selected component (if applicable):

How reproducible:
It is reproducible.

Steps to Reproduce:
1. Vector collector pods and kube-api-server pods are working fine.
2. Test that logs from the kube-api-server are being forwarded:
{code:java}
$ oc rsh kube-apiserver-xxxxxxx-master-0 bash -c "echo $(date -Ins) WARN BLATEST blatest bla |tee -a /proc/1/fd/1"
{code}
3. Restart the kube-api-server pod and repeat step 2.
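For step 3, one way to force the restart (not taken from the report; a sketch that assumes the operator-managed kube-apiserver on a current OCP release) is to bump the operator's forceRedeploymentReason so a new revision is rolled out:
{code:java}
# The reason string must differ from its previous value each time to trigger a new rollout.
$ oc patch kubeapiserver cluster --type merge \
    -p '{"spec":{"forceRedeploymentReason":"log-forwarding-test-1"}}'

# Then repeat step 2 and watch the collector pod on the same master node
# for "Failed to annotate event with pod metadata" errors.
{code}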
Actual results:
Logs from the kube-api-server pods are not getting forwarded.
Expected results:
Logs from kube-api-server should get forwarded.
Additional info:
- This issue happens randomly with any one of the 3 kube-api-server pods.
- This issue is independent of where the logs are being forwarded.
- The below scenarios have been tested:
  - OCP version 4.13.4, logging version 5.7.3, and logs being forwarded to the Loki stack.
  - OCP version 4.12.13, logging version 5.7.2, and logs being forwarded to Kibana.
  - OCP version 4.12.21, logging version 5.7.2, and logs being forwarded to Splunk.
- The workaround for this issue is to restart the collector pod running on the node whose kube-api-server pod triggered the issue (see the sketch below).
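A minimal sketch of that workaround, assuming the default openshift-logging namespace and that the collector pods carry the component=collector label (both are assumptions; verify against the actual deployment):
{code:java}
# Hypothetical node name; use the master whose kube-api-server pod was restarted.
$ NODE=master-0.example.com

# Find the collector pod scheduled on that node (label selector is an assumption).
$ oc -n openshift-logging get pods -l component=collector \
    --field-selector spec.nodeName="$NODE"

# Delete it; the DaemonSet recreates it and log forwarding resumes.
$ oc -n openshift-logging delete pod collector-xxxxx
{code}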