-
Bug
-
Resolution: Done-Errata
-
Normal
-
Logging 5.8.6
-
False
-
None
-
False
-
NEW
-
NEW
-
Before this update, the Loki Operator did not validate the format of the S3 endpoint URL used in the storage secret. With this update, the S3 endpoint URL goes through a validation step, and the result is reflected in the status of the LokiStack.
-
Bug Fix
-
Log Storage - Sprint 252
-
Moderate
Description of problem:
When Loki is installed with a storage secret whose `endpoint` does not have `https://` in front of the destination, for example:
rook-ceph-rgw-ocs-storagecluster-cephobjectstore.openshift-storage.svc
instead of:
https://rook-ceph-rgw-ocs-storagecluster-cephobjectstore.openshift-storage.svc
The Loki pods that access the object storage do not use the configured object storage and fall back to AWS S3 buckets instead. No error in the Loki pods indicates this fallback to AWS S3.
Version-Release number of selected component (if applicable):
$ oc get csv -n openshift-logging
NAME                     DISPLAY                     VERSION   REPLACES                 PHASE
cluster-logging.v5.8.3   Red Hat OpenShift Logging   5.8.3     cluster-logging.v5.8.2   Succeeded
loki-operator.v5.8.3     Loki Operator               5.8.3     loki-operator.v5.8.2     Succeeded
How reproducible:
Always
Steps to Reproduce:
Create a Loki secret similar to the one below, where the `endpoint` pointing to the ODF endpoint does not have `https://` in front:
$ cat logging-loki-s3-secret.yaml
apiVersion: v1
stringData:
  access_key_id: <access_key_id>
  access_key_secret: <access_key_secret>
  bucketnames: <bucketname>
  endpoint: rook-ceph-rgw-ocs-storagecluster-cephobjectstore.openshift-storage.svc
  region: eu-central-1
kind: Secret
metadata:
  name: logging-loki-s3
  namespace: openshift-logging
type: Opaque
Create a lokistack CR similar to:
kind: LokiStack
metadata:
  name: logging-loki
  namespace: openshift-logging
spec:
  size: 1x.demo
  storage:
    schemas:
    - version: v12
      effectiveDate: '2023-12-16'
    secret:
      name: logging-loki-s3
      type: s3
  storageClassName: gp3-csi
  tenants:
    mode: openshift-logging
Actual results:
The Loki configmap contains the configured object storage endpoint, `rook-ceph-rgw-ocs-storagecluster-cephobjectstore.openshift-storage.svc`:
$ oc get cm logging-loki-config -o yaml -n openshift-logging |grep s3:
s3:
  s3: rook-ceph-rgw-ocs-storagecluster-cephobjectstore.openshift-storage.svc
The LokiStack pods that access the object storage fail to reach the defined endpoint and, without any warning or failover message, try to send the objects to AWS S3.
Error messages in the ingester for chunks that fail to be delivered:
$ oc logs logging-loki-ingester-0 |grep -i "failed to flush chunks: store put chunk: NoSuchBucket" |head -2
level=error ts=2024-02-12T19:08:59.134678495Z caller=flush.go:143 org_id=infrastructure msg="failed to flush" err="failed to flush chunks: store put chunk: NoSuchBucket: The specified bucket does not exist\n\tstatus code: 404, request id: J0JK77Q0QD43JGYZ, host id: qPQL8qae1ZN6ukM70TYRCY4/EV2JHEMWZvRkpNt/LBDdNRAbBA2p0kPddfuRVq02SaMxjxRbuMw=, num_chunks: 1, labels: {kubernetes_container_name=\"registry-server\", kubernetes_host=\"ip-10-0-x-x.eu-central-1.compute.internal\", kubernetes_namespace_name=\"openshift-marketplace\", kubernetes_pod_name=\"certified-operators-skt7l\", log_type=\"infrastructure\"}"
level=error ts=2024-02-12T19:08:59.137049536Z caller=flush.go:143 org_id=infrastructure msg="failed to flush" err="failed to flush chunks: store put chunk: NoSuchBucket: The specified bucket does not exist\n\tstatus code: 404, request id: J0JSGEVNK1FW3EYP, host id: Hwv7CcV9ptGEYYoyxQMANZ4k7Sf0FUKq+b2ZZQ6IhvPCFtx3cJJDrGFMpENwVqft7Vo3zuGRhEc=, num_chunks: 1, labels: {kubernetes_container_name=\"kube-controller-manager-cert-syncer\", kubernetes_host=\"ip-10-0-x-x.eu-central-1.compute.internal\", kubernetes_namespace_name=\"openshift-kube-controller-manager\", kubernetes_pod_name=\"kube-controller-manager-ip-10-0-x-x.eu-central-1.compute.internal\", log_type=\"infrastructure\"}"
A tcpdump shows the pod resolving AWS S3 hostnames and not the defined endpoint:
/// Get the node where the ingester is running
$ oc get pods -o wide -n openshift-logging|grep ingester
logging-loki-ingester-0 0/1 Running 0 8h 10.129.2.29 ip-10-0-x-x.eu-central-1.compute.internal <none> <none>
/// Follow https://access.redhat.com/solutions/4569211 to collect a tcpdump for the Loki ingester pod and observe it trying to resolve AWS S3 DNS entries to send the object storage logs:
# nsenter $nsenter_parameters -- tcpdump -nn -i eth0 |grep -i amazonaws |head -5
dropped privs to tcpdump
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
19:09:22.164148 IP 10.129.x.x.47817 > 172.30.0.10.53: 16853+ [1au] AAAA? s3.eu-central-1.amazonaws.com.svc.cluster.local. (76)
19:09:22.164163 IP 10.129.x.x.33561 > 172.30.0.10.53: 57000+ [1au] A? s3.eu-central-1.amazonaws.com.svc.cluster.local. (76)
19:09:22.164404 IP 10.129.x.x.33561 > 172.30.0.10.53: 14295+ [1au] AAAA? s3.eu-central-1.amazonaws.com.cluster.local. (72)
19:09:22.164427 IP 10.129.2.29.38220 > 172.30.0.10.53: 46362+ [1au] A? s3.eu-central-1.amazonaws.com.cluster.local. (72)
19:09:22.164619 IP 10.129.x.x.33561 > 172.30.0.10.53: 14337+ [1au] A? s3.eu-central-1.amazonaws.com.eu-central-1.compute.internal. (88)
# nsenter $nsenter_parameters -- tcpdump -nn -i eth0 |grep -i rook-ceph-rgw-ocs-storagecluster-cephobjectstore.openshift-storage.svc
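To make the tcpdump review repeatable, the captured DNS query lines can be summarized by hostname. The following is only a rough sketch: the helper names `queried_names` and `summarize` are hypothetical, and the regular expression assumes the line layout shown in the capture above (`A?` or `AAAA?` followed by the queried name and a byte count in parentheses).

```python
import re

# Matches the query type and name in tcpdump DNS lines such as
# "... 16853+ [1au] AAAA? s3.eu-central-1.amazonaws.com.svc.cluster.local. (76)"
DNS_QUERY = re.compile(r"\b(A|AAAA)\? (\S+?)\.? \(\d+\)")


def queried_names(lines):
    """Collect the distinct hostnames queried in tcpdump DNS output lines."""
    names = set()
    for line in lines:
        match = DNS_QUERY.search(line)
        if match:
            names.add(match.group(2))
    return names


def summarize(lines, configured_host):
    """Split queried names into AWS S3 lookups and lookups of the configured host."""
    names = queried_names(lines)
    return {
        "aws": sorted(n for n in names if "amazonaws.com" in n),
        "configured": sorted(n for n in names if n.startswith(configured_host)),
    }
```

With the capture from this report, the `aws` list would be non-empty and the `configured` list empty, which matches the observation that the ingester never looks up the ODF endpoint.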
Expected results:
An entry in the logs of the Loki pods that use object storage, indicating that the endpoint is badly formatted and that, for this reason, Loki is falling back to AWS S3.
Also, it would improve the user experience if, when the endpoint is not well formatted and Loki falls back to AWS S3, the LokiStack CR status included an entry indicating this behavior.
Workaround or permanent fix
Set the endpoint with `https://` as documented in "Installing log storage"
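Until a validating operator version is installed, the endpoint can be checked before the secret is created. The sketch below is not the Loki Operator's validation logic; it is a minimal illustration, with the hypothetical function name `check_s3_endpoint`, that assumes only `http://` and `https://` are acceptable schemes and that a host must be present.

```python
from urllib.parse import urlparse


def check_s3_endpoint(endpoint):
    """Return a list of problems found in an S3 endpoint string (empty if none)."""
    problems = []
    parsed = urlparse(endpoint.strip())
    if parsed.scheme not in ("http", "https"):
        # A bare hostname such as "example.svc" parses with no http(s) scheme.
        problems.append("endpoint must start with http:// or https://")
    elif not parsed.netloc:
        problems.append("endpoint has no host")
    return problems


if __name__ == "__main__":
    for candidate in (
        "rook-ceph-rgw-ocs-storagecluster-cephobjectstore.openshift-storage.svc",
        "https://rook-ceph-rgw-ocs-storagecluster-cephobjectstore.openshift-storage.svc",
    ):
        print(candidate, "->", check_s3_endpoint(candidate) or "ok")
```

Running the check on the value of `endpoint` before applying `logging-loki-s3-secret.yaml` would flag the scheme-less form used in the reproduction steps.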
- clones
-
LOG-5070 Falling back to use AWS S3 without any warning when the endpoint is not with https in front
- Closed
- links to
-
RHSA-2024:129898 Logging for Red Hat OpenShift - 5.7.13