  1. OpenShift Logging
  2. LOG-5399

[release-5.7] Loki falls back to AWS S3 without any warning when the endpoint lacks the https:// prefix


    • Before this update, the Loki Operator did not validate the format of the S3 endpoint URL used in the storage secret. After this update, the S3 endpoint URL goes through a validation step whose result is reflected in the status of the LokiStack.
    • Bug Fix
    • Log Storage - Sprint 252
    • Moderate

      Description of problem:

      When Loki is installed with an `endpoint` that lacks the `https://` prefix, for example:

      rook-ceph-rgw-ocs-storagecluster-cephobjectstore.openshift-storage.svc 

      instead of:

       https://rook-ceph-rgw-ocs-storagecluster-cephobjectstore.openshift-storage.svc

      The Loki pods that access object storage fall back to AWS S3 buckets instead of using the configured object storage, and the Loki pods log no error indicating this fallback to AWS S3.

      Version-Release number of selected component (if applicable):

      $ oc get csv -n openshift-logging
      NAME                     DISPLAY                     VERSION   REPLACES                 PHASE
      cluster-logging.v5.8.3   Red Hat OpenShift Logging   5.8.3     cluster-logging.v5.8.2   Succeeded
      loki-operator.v5.8.3     Loki Operator               5.8.3     loki-operator.v5.8.2     Succeeded
      

      How reproducible:

      Always

      Steps to Reproduce:

      Create a Loki secret similar to the one below, where the endpoint points to the ODF endpoint but omits the leading `https://`:

      $ cat logging-loki-s3-secret.yaml 
      apiVersion: v1
      stringData:
        access_key_id: <access_key_id>
        access_key_secret: <access_key_secret>
        bucketnames: <bucketname>
        endpoint: rook-ceph-rgw-ocs-storagecluster-cephobjectstore.openshift-storage.svc
        region: eu-central-1
      kind: Secret
      metadata:
        name: logging-loki-s3
        namespace: openshift-logging
      type: Opaque
      

      Create a LokiStack CR similar to:

      apiVersion: loki.grafana.com/v1
      kind: LokiStack
      metadata:
        name: logging-loki
        namespace: openshift-logging
      spec:
        size: 1x.demo
        storage:
          schemas:
          - version: v12
            effectiveDate: '2023-12-16'
          secret:
            name: logging-loki-s3
            type: s3
        storageClassName: gp3-csi
        tenants:
          mode: openshift-logging
      

      Actual results:

      The Loki ConfigMap contains the configured object storage endpoint, `rook-ceph-rgw-ocs-storagecluster-cephobjectstore.openshift-storage.svc`:

      $ oc get cm logging-loki-config -o yaml -n openshift-logging |grep s3:
              s3:
                s3: rook-ceph-rgw-ocs-storagecluster-cephobjectstore.openshift-storage.svc
      

      The LokiStack pods that access object storage fail to reach the defined endpoint and, without any warning or fallback message, try to send the objects to AWS S3 instead.

      Error messages in the ingester for chunks that fail to be delivered:

      $ oc logs logging-loki-ingester-0 |grep -i "failed to flush chunks: store put chunk: NoSuchBucket" |head -2
      level=error ts=2024-02-12T19:08:59.134678495Z caller=flush.go:143 org_id=infrastructure msg="failed to flush" err="failed to flush chunks: store put chunk: NoSuchBucket: The specified bucket does not exist\n\tstatus code: 404, request id: J0JK77Q0QD43JGYZ, host id: qPQL8qae1ZN6ukM70TYRCY4/EV2JHEMWZvRkpNt/LBDdNRAbBA2p0kPddfuRVq02SaMxjxRbuMw=, num_chunks: 1, labels: {kubernetes_container_name=\"registry-server\", kubernetes_host=\"ip-10-0-x-x.eu-central-1.compute.internal\", kubernetes_namespace_name=\"openshift-marketplace\", kubernetes_pod_name=\"certified-operators-skt7l\", log_type=\"infrastructure\"}"
      level=error ts=2024-02-12T19:08:59.137049536Z caller=flush.go:143 org_id=infrastructure msg="failed to flush" err="failed to flush chunks: store put chunk: NoSuchBucket: The specified bucket does not exist\n\tstatus code: 404, request id: J0JSGEVNK1FW3EYP, host id: Hwv7CcV9ptGEYYoyxQMANZ4k7Sf0FUKq+b2ZZQ6IhvPCFtx3cJJDrGFMpENwVqft7Vo3zuGRhEc=, num_chunks: 1, labels: {kubernetes_container_name=\"kube-controller-manager-cert-syncer\", kubernetes_host=\"ip-10-0-x-x.eu-central-1.compute.internal\", kubernetes_namespace_name=\"openshift-kube-controller-manager\", kubernetes_pod_name=\"kube-controller-manager-ip-10-0-x-x.eu-central-1.compute.internal\", log_type=\"infrastructure\"}"
      

      A tcpdump shows the pod resolving AWS S3 hostnames and not the defined endpoint:

      /// Get the node where the ingester is running
      $ oc get pods -o wide -n openshift-logging|grep ingester
      logging-loki-ingester-0                        0/1     Running   0          8h    10.129.2.29    ip-10-0-x-x.eu-central-1.compute.internal     <none>           <none>
      
      /// Follow https://access.redhat.com/solutions/4569211 to collect a tcpdump for the Loki ingester pod, and observe it trying to resolve AWS S3 DNS entries in order to send the logs to object storage:
      #  nsenter $nsenter_parameters -- tcpdump -nn -i eth0   |grep -i amazonaws |head -5
      dropped privs to tcpdump
      tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
      listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
      19:09:22.164148 IP 10.129.x.x.47817 > 172.30.0.10.53: 16853+ [1au] AAAA? s3.eu-central-1.amazonaws.com.svc.cluster.local. (76)
      19:09:22.164163 IP 10.129.x.x.33561 > 172.30.0.10.53: 57000+ [1au] A? s3.eu-central-1.amazonaws.com.svc.cluster.local. (76)
      19:09:22.164404 IP 10.129.x.x.33561 > 172.30.0.10.53: 14295+ [1au] AAAA? s3.eu-central-1.amazonaws.com.cluster.local. (72)
      19:09:22.164427 IP 10.129.2.29.38220 > 172.30.0.10.53: 46362+ [1au] A? s3.eu-central-1.amazonaws.com.cluster.local. (72)
      19:09:22.164619 IP 10.129.x.x.33561 > 172.30.0.10.53: 14337+ [1au] A? s3.eu-central-1.amazonaws.com.eu-central-1.compute.internal. (88)
      
      /// The same capture filtered on the defined endpoint returns nothing:
      #  nsenter $nsenter_parameters -- tcpdump -nn -i eth0   |grep -i rook-ceph-rgw-ocs-storagecluster-cephobjectstore.openshift-storage.svc
      

      Expected results:

      The Loki pods that use object storage should log an entry indicating that the endpoint is badly formatted and that, for this reason, they are falling back to AWS S3.

      Also, it would improve the user experience if, when the endpoint is badly formatted and Loki falls back to AWS S3, the LokiStack CR status included an entry describing this behavior.
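
If the validation result is surfaced as a standard condition in the LokiStack status, as the release note for this issue describes, the conditions could be listed with a sketch like the one below. The function name `show_lokistack_conditions` and the `OC` variable are illustrative and not part of any tool. The exact condition type and message the operator reports are not shown in this report, so none are assumed here.

```shell
#!/bin/sh
# Illustrative helper, not part of oc or the Loki Operator.
# OC may be overridden (for example with kubectl) via the environment.
OC="${OC:-oc}"

# Print one line per LokiStack status condition: type, status, message.
# Arguments (both optional): LokiStack name, namespace.
show_lokistack_conditions() {
  "$OC" get lokistack "${1:-logging-loki}" -n "${2:-openshift-logging}" \
    -o jsonpath='{range .status.conditions[*]}{.type}{"\t"}{.status}{"\t"}{.message}{"\n"}{end}'
}
```

Calling `show_lokistack_conditions` with no arguments targets the `logging-loki` LokiStack in `openshift-logging`, the names used in the reproduction steps.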

      Workaround or permanent fix

      Set the endpoint with the `https://` prefix, as documented in "Installing log storage".
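
Until the operator validates the endpoint itself, the value can be checked by hand before the secret is created. The following is a minimal sketch, assuming that an explicit `http://` or `https://` scheme is what distinguishes a usable endpoint. The helper name `endpoint_has_scheme` is hypothetical, and this check is not the operator's actual validation logic.

```shell
#!/bin/sh
# Hypothetical pre-flight helper (not part of oc or the Loki Operator):
# succeeds only when the value starts with http:// or https:// and has
# at least one character after the scheme.
endpoint_has_scheme() {
  case "$1" in
    http://?*|https://?*) return 0 ;;
    *) return 1 ;;
  esac
}

# Endpoint value taken from the secret in this report.
endpoint="rook-ceph-rgw-ocs-storagecluster-cephobjectstore.openshift-storage.svc"

if endpoint_has_scheme "$endpoint"; then
  echo "endpoint has a scheme: $endpoint"
else
  echo "endpoint is missing a scheme; consider: https://$endpoint"
fi
```

On a live cluster, the stored value could be read back with `oc get secret logging-loki-s3 -n openshift-logging -o jsonpath='{.data.endpoint}' | base64 -d` and passed to the same helper.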

              btaani@redhat.com Bayan Taani
              rhn-support-ocasalsa Oscar Casal Sanchez
              Anping Li Anping Li