[LOG-4884] x509 certificate verification failure on collector while forwarding to Loki on IPv6 cluster

    • False
    • None
    • False
    • NEW
    • VERIFIED
    • Without this update, the collector could fail with an x509 certificate verification failure when forwarding logs to Loki on an IPv6 cluster. This update explicitly sets the environment variable KUBERNETES_SERVICE_HOST to 'kubernetes.default.svc' to resolve this issue.
    • Release Note Not Required

      Description:

      An x509 certificate verification failure is seen on the collector pods (Vector) when logs are forwarded to the default logStore (Loki) on a single-stack IPv6 cluster.

      No such error is seen with Logging 5.8.1 on an IPv6 cluster.

      CLO Image: registry.redhat.io/openshift-logging/cluster-logging-rhel9-operator@sha256:91964d8a9c1395fec7b120a7214f5b9b0361d1eb3f09f39ccededda8d78144c6

      Logs:

      2023-12-13T10:28:07.381012Z ERROR kube_client::client::builder: failed with error error trying to connect: error:0A000086:SSL routines:tls_post_process_server_certificate:certificate verify failed:ssl/statem/statem_clnt.c:1889:: hostname mismatch
      2023-12-13T10:28:07.381043Z  WARN vector::kubernetes::reflector: Watcher Stream received an error. Retrying. error=InitialListFailed(HyperError(hyper::Error(Connect, ConnectError { error: Error { code: ErrorCode(1), cause: Some(Ssl(ErrorStack([Error { code: 167772294, library: "SSL routines", function: "tls_post_process_server_certificate", reason: "certificate verify failed", file: "ssl/statem/statem_clnt.c", line: 1889 }]))) }, verify_result: X509VerifyResult { code: 62, error: "hostname mismatch" } })))
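
When checking whether a collector pod is affected, it can help to search its log for the signature shown in the excerpt above. The following is a minimal sketch, not part of the original report: the helper name `count_hostname_mismatches` is hypothetical, and the pattern is taken only from the ERROR line quoted above, so other Vector or OpenSSL versions may word the message differently.

```python
import re

# Pattern copied from the ERROR line in this report; treat it as an
# assumption for any other Vector/OpenSSL version.
MISMATCH = re.compile(r"certificate verify failed.*hostname mismatch")


def count_hostname_mismatches(log_text: str) -> int:
    """Count log lines that report an x509 hostname mismatch."""
    return sum(1 for line in log_text.splitlines() if MISMATCH.search(line))


if __name__ == "__main__":
    sample = (
        "2023-12-13T10:28:07.381012Z ERROR kube_client::client::builder: "
        "failed with error error trying to connect: error:0A000086:SSL routines:"
        "tls_post_process_server_certificate:certificate verify failed:"
        "ssl/statem/statem_clnt.c:1889:: hostname mismatch"
    )
    print(count_hostname_mismatches(sample))  # prints 1
```

The function takes the log as plain text, so it can be fed the saved output of a collector pod's log.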
      

      How reproducible: Always

      Steps to reproduce:
      1) Deploy the Cluster Logging Operator (CLO) and Loki Operator (LO) v5.9 on a single-stack IPv6 cluster
      2) Forward logs to Loki using the Vector collector
      3) Observe the collector pod logs

      Actual Result: An x509 certificate verification failure is seen in the collector pod logs

      Expected Result: Logs should be forwarded to Loki without errors

      Additional Info:

      $ oc get csv
      NAME                     DISPLAY                     VERSION   REPLACES   PHASE
      cluster-logging.v5.9.0   Red Hat OpenShift Logging   5.9.0                Succeeded
      loki-operator.v5.9.0     Loki Operator               5.9.0                Succeeded
      

      LokiStack:

      apiVersion: loki.grafana.com/v1
      kind: LokiStack
      metadata:
        name: lokistack-sample
      spec:
        hashRing:
          memberlist:
            enableIPv6: true
          type: memberlist
        managementState: Managed
        size: 1x.demo
        storage:
          secret:
            name: s3-secret
            type: s3
        storageClassName: nfs
        tenants:
          mode: openshift-logging
        rules:
          enabled: true
          namespaceSelector:
            matchLabels:
              openshift.io/cluster-monitoring: "true"
          selector:
            matchLabels:
              openshift.io/cluster-monitoring: "true"

      CLF:

      apiVersion: logging.openshift.io/v1
      kind: ClusterLogForwarder
      metadata:
        name: instance
        namespace: openshift-logging
      spec:
        pipelines:
          - inputRefs:
              - application
            name: all-logs-to-lokistack
            outputRefs:
              - default
      status:
        conditions:
          - lastTransitionTime: '2023-12-13T10:33:48Z'
            status: 'True'
            type: Ready

      vector.toml attached


            Vitalii Parfonov added a comment - rhn-support-kbharti I added release notes, but I am not sure we need them, because there is no error on 5.8. cc: jcantril@redhat.com

            Kabir Bharti added a comment - vparfono Could you please update the release notes if applicable?

            GitLab CEE Bot added a comment - CPaaS Service Account mentioned this issue in a merge request of openshift-logging / Log Collection Midstream on branch openshift-logging-5.9-rhel-9_upstream_80d8650ce9aa2671f7a4d4d299282f19:

            Updated US source to: a9916c6 LOG-4884: explicitly set env variable KUBERNETES_SERVICE_HOST=kubernetes.default.svc

            Jeffrey Cantrill added a comment - This issue requires Release Notes Text. Please modify the Release Note Text or set the Release Note Type to "None".

            Jeffrey Cantrill added a comment - Looks like it, and per that issue it is related to OpenSSL and Rust. The issue suggests this resolution:

            Fixed it by explicitly adding env variable KUBERNETES_SERVICE_HOST to values.yaml:

            env:
              - name: KUBERNETES_SERVICE_HOST
                value: "kubernetes.default.svc"

            or waiting for it to be fixed upstream. I cannot say whether we will be able to consume the fix until we bump Vector.
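
To illustrate why the value of KUBERNETES_SERVICE_HOST matters, here is a simplified sketch of how an in-cluster client might derive the API server URL from the environment variables Kubernetes injects into pods. It is not Vector's or kube-rs's actual code: the function name `api_server_url` and the address `fd02::1` are hypothetical, chosen only for the example. With an IPv6 address, the client connects to (and verifies the certificate against) an IP literal. With the workaround, it uses the DNS name `kubernetes.default.svc` instead.

```python
import ipaddress


def api_server_url(env: dict) -> str:
    """Build the API server URL from in-cluster environment variables."""
    host = env["KUBERNETES_SERVICE_HOST"]
    port = env["KUBERNETES_SERVICE_PORT"]
    try:
        is_ipv6 = ipaddress.ip_address(host).version == 6
    except ValueError:
        is_ipv6 = False  # not an IP literal, so treat it as a DNS name
    if is_ipv6:
        host = f"[{host}]"  # IPv6 literals must be bracketed in URLs
    return f"https://{host}:{port}"


if __name__ == "__main__":
    # Hypothetical value on a single-stack IPv6 cluster.
    injected = {"KUBERNETES_SERVICE_HOST": "fd02::1", "KUBERNETES_SERVICE_PORT": "443"}
    # Same environment with the workaround from this issue applied.
    overridden = dict(injected, KUBERNETES_SERVICE_HOST="kubernetes.default.svc")
    print(api_server_url(injected))    # prints https://[fd02::1]:443
    print(api_server_url(overridden))  # prints https://kubernetes.default.svc:443
```

In the second case the certificate is checked against a DNS name, which sidesteps the IPv6 address comparison that appears to fail in the reported setup.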

            Vitalii Parfonov added a comment - jcantril@redhat.com rh-ee-calee This may be related to https://github.com/vectordotdev/vector/issues/17679

            Jeffrey Cantrill added a comment (edited) - rh-ee-calee Do you know whether LOG-4852 was ported forward? Could this be the same issue? Could the change have been lost in a 5.9 refactor?

              vparfono Vitalii Parfonov
              rhn-support-kbharti Kabir Bharti
              Kabir Bharti Kabir Bharti