OpenShift Logging / LOG-5397

[release-5.9] Falling back to AWS S3 without any warning when the endpoint does not start with https


    • Before this update, the Loki Operator did not validate the format of the S3 endpoint URL used in the storage secret. After this update, the S3 endpoint URL goes through a validation step, and the result is reflected in the status of the LokiStack.
    • Bug Fix
    • Log Storage - Sprint 252
    • Moderate

      Description of problem:

      When Loki is installed with a storage secret whose `endpoint` does not have `https://` in front of the destination, for example:

      rook-ceph-rgw-ocs-storagecluster-cephobjectstore.openshift-storage.svc 

      instead of:

       https://rook-ceph-rgw-ocs-storagecluster-cephobjectstore.openshift-storage.svc

      the Loki pods fall back to AWS S3 buckets instead of using the configured object storage, and no error or warning about this fallback is logged by the Loki pods.
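      A minimal sketch of why a scheme-less endpoint is easy to misinterpret (this uses Python's `urllib.parse` purely for illustration; it is not Loki's actual parsing code): without `https://` in front, the host does not land in the URL's netloc at all, so client code keying off the scheme and host can conclude that no custom endpoint was given and revert to its AWS S3 default.

```python
from urllib.parse import urlparse

# Illustration only: how a generic URL parser sees the two endpoint values.
def split_endpoint(endpoint: str):
    parts = urlparse(endpoint)
    return parts.scheme, parts.netloc, parts.path

# With the scheme, the host is parsed into netloc as expected:
print(split_endpoint(
    "https://rook-ceph-rgw-ocs-storagecluster-cephobjectstore.openshift-storage.svc"))
# -> ('https', 'rook-ceph-rgw-ocs-storagecluster-cephobjectstore.openshift-storage.svc', '')

# Without the scheme, the whole value ends up in path; scheme and netloc are empty:
print(split_endpoint(
    "rook-ceph-rgw-ocs-storagecluster-cephobjectstore.openshift-storage.svc"))
# -> ('', '', 'rook-ceph-rgw-ocs-storagecluster-cephobjectstore.openshift-storage.svc')
```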

      Version-Release number of selected component (if applicable):

      $ oc get csv -n openshift-logging
      NAME                     DISPLAY                     VERSION   REPLACES                 PHASE
      cluster-logging.v5.8.3   Red Hat OpenShift Logging   5.8.3     cluster-logging.v5.8.2   Succeeded
      loki-operator.v5.8.3     Loki Operator               5.8.3     loki-operator.v5.8.2     Succeeded
      

      How reproducible:

      Always

      Steps to Reproduce:

      Create a Loki secret similar to the one below, where the `endpoint` pointing to ODF does not start with `https://`:

      $ cat logging-loki-s3-secret.yaml 
      apiVersion: v1
      stringData:
        access_key_id: <access_key_id>
        access_key_secret: <access_key_secret>
        bucketnames: <bucketname>
        endpoint: rook-ceph-rgw-ocs-storagecluster-cephobjectstore.openshift-storage.svc
        region: eu-central-1
      kind: Secret
      metadata:
        name: logging-loki-s3
        namespace: openshift-logging
      type: Opaque
      

      Create a lokistack CR similar to:

      apiVersion: loki.grafana.com/v1
      kind: LokiStack
      metadata:
        name: logging-loki
        namespace: openshift-logging
      spec:
        size: 1x.demo
        storage:
          schemas:
          - version: v12
            effectiveDate: '2023-12-16'
          secret:
            name: logging-loki-s3
            type: s3
        storageClassName: gp3-csi
        tenants:
          mode: openshift-logging
      

      Actual results:

      The Loki ConfigMap contains the object storage endpoint as set: `rook-ceph-rgw-ocs-storagecluster-cephobjectstore.openshift-storage.svc`:

      $ oc get cm logging-loki-config -o yaml -n openshift-logging |grep s3:
              s3:
                s3: rook-ceph-rgw-ocs-storagecluster-cephobjectstore.openshift-storage.svc
      

      The LokiStack pods fail to reach the endpoint defined and instead try to send the objects to AWS S3, without logging any warning or fallback message:

      Messages in the ingester showing chunks failing to be delivered:

      $ oc logs logging-loki-ingester-0 |grep -i "failed to flush chunks: store put chunk: NoSuchBucket" |head -2
      level=error ts=2024-02-12T19:08:59.134678495Z caller=flush.go:143 org_id=infrastructure msg="failed to flush" err="failed to flush chunks: store put chunk: NoSuchBucket: The specified bucket does not exist\n\tstatus code: 404, request id: J0JK77Q0QD43JGYZ, host id: qPQL8qae1ZN6ukM70TYRCY4/EV2JHEMWZvRkpNt/LBDdNRAbBA2p0kPddfuRVq02SaMxjxRbuMw=, num_chunks: 1, labels: {kubernetes_container_name=\"registry-server\", kubernetes_host=\"ip-10-0-x-x.eu-central-1.compute.internal\", kubernetes_namespace_name=\"openshift-marketplace\", kubernetes_pod_name=\"certified-operators-skt7l\", log_type=\"infrastructure\"}"
      level=error ts=2024-02-12T19:08:59.137049536Z caller=flush.go:143 org_id=infrastructure msg="failed to flush" err="failed to flush chunks: store put chunk: NoSuchBucket: The specified bucket does not exist\n\tstatus code: 404, request id: J0JSGEVNK1FW3EYP, host id: Hwv7CcV9ptGEYYoyxQMANZ4k7Sf0FUKq+b2ZZQ6IhvPCFtx3cJJDrGFMpENwVqft7Vo3zuGRhEc=, num_chunks: 1, labels: {kubernetes_container_name=\"kube-controller-manager-cert-syncer\", kubernetes_host=\"ip-10-0-x-x.eu-central-1.compute.internal\", kubernetes_namespace_name=\"openshift-kube-controller-manager\", kubernetes_pod_name=\"kube-controller-manager-ip-10-0-x-x.eu-central-1.compute.internal\", log_type=\"infrastructure\"}"
      

      Review in a tcpdump that the pod resolves AWS S3 and not the endpoint defined:

      /// Get the node where the ingester is running
      $ oc get pods -o wide -n openshift-logging|grep ingester
      logging-loki-ingester-0                        0/1     Running   0          8h    10.129.2.29    ip-10-0-x-x.eu-central-1.compute.internal     <none>           <none>
      
      /// Follow https://access.redhat.com/solutions/4569211 to collect a tcpdump for the Loki ingester pod and observe it trying to resolve AWS S3 DNS entries to send the object storage logs:
      #  nsenter $nsenter_parameters -- tcpdump -nn -i eth0   |grep -i amazonaws |head -5
      dropped privs to tcpdump
      tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
      listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
      19:09:22.164148 IP 10.129.x.x.47817 > 172.30.0.10.53: 16853+ [1au] AAAA? s3.eu-central-1.amazonaws.com.svc.cluster.local. (76)
      19:09:22.164163 IP 10.129.x.x.33561 > 172.30.0.10.53: 57000+ [1au] A? s3.eu-central-1.amazonaws.com.svc.cluster.local. (76)
      19:09:22.164404 IP 10.129.x.x.33561 > 172.30.0.10.53: 14295+ [1au] AAAA? s3.eu-central-1.amazonaws.com.cluster.local. (72)
      19:09:22.164427 IP 10.129.2.29.38220 > 172.30.0.10.53: 46362+ [1au] A? s3.eu-central-1.amazonaws.com.cluster.local. (72)
      19:09:22.164619 IP 10.129.x.x.33561 > 172.30.0.10.53: 14337+ [1au] A? s3.eu-central-1.amazonaws.com.eu-central-1.compute.internal. (88)
      
      /// Filtering for the endpoint defined captures no traffic:
      #  nsenter $nsenter_parameters -- tcpdump -nn -i eth0   |grep -i rook-ceph-rgw-ocs-storagecluster-cephobjectstore.openshift-storage.svc
      

      Expected results:

      See an entry in the logs of the Loki pods using object storage indicating that the endpoint is badly formatted and that for this reason Loki is falling back to AWS S3.

      Also, it would improve the user experience if, when the endpoint is not well formatted and Loki falls back to AWS S3, the LokiStack CR status included an entry indicating that this logic was followed.
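      The validation requested here could look something like the following hypothetical pre-flight check (a sketch only, not the Loki Operator's actual implementation; the function name is invented): reject any endpoint that does not carry an explicit `http(s)://` scheme, so a malformed value fails loudly instead of silently falling back to AWS S3.

```python
from urllib.parse import urlparse

def validate_s3_endpoint(endpoint: str) -> str:
    """Hypothetical check: require an explicit http(s) scheme on the
    S3 endpoint URL so a malformed value is rejected up front instead
    of letting the client fall back to the AWS S3 default."""
    parts = urlparse(endpoint)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(
            f"S3 endpoint {endpoint!r} must start with http:// or https://")
    return endpoint

# A well-formed endpoint passes through unchanged; the scheme-less one
# from this report would raise a clear error instead.
validate_s3_endpoint(
    "https://rook-ceph-rgw-ocs-storagecluster-cephobjectstore.openshift-storage.svc")
```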

      Workaround or permanent fix:

      Set the endpoint with `https://` in front, as documented in "Installing log storage".

            btaani@redhat.com Bayan Taani
            rhn-support-ocasalsa Oscar Casal Sanchez
            Kabir Bharti