Type: Bug
Resolution: Not a Bug
Priority: Major
Affects Version: Logging 5.9.3
Status: NEW
Hi Team,
We have recently released the new Logging 5.9, which is compatible with AWS STS for ROSA.
However, after much testing, it is not able to push logs to S3. The ingester pods are continuously reporting the errors below. The problem is the same in my test cluster and in the customer environment.
Errors:
level=error ts=2024-07-02T07:33:12.26966227Z caller=flush.go:143 org_id=infrastructure msg="failed to flush" err="failed to flush chunks: store put chunk: WebIdentityErr: failed to retrieve credentials\ncaused by: ValidationError: Request ARN is invalid\n\tstatus code: 400, request id: ffbf992f-a989-4381-a480-25042885ee36, num_chunks: 1, labels: {kubernetes_container_name=\"osd-delete-backplane-serviceaccounts\", kubernetes_host=\"ip-10-0-137-68.ec2.internal\", kubernetes_namespace_name=\"openshift-backplane\", kubernetes_pod_name=\"osd-delete-backplane-serviceaccounts-28664850-82c2p\", log_type=\"infrastructure\"}"
level=info ts=2024-07-02T07:33:12.269686607Z caller=flush.go:167 msg="flushing stream" user=infrastructure fp=6cdf0925a52305a9 immediate=false num_chunks=1 labels="{kubernetes_container_name=\"kube-rbac-proxy\", kubernetes_host=\"ip-10-0-134-70.ec2.internal\", kubernetes_namespace_name=\"openshift-multus\", kubernetes_pod_name=\"network-metrics-daemon-l42zn\", log_type=\"infrastructure\"}"
level=error ts=2024-07-02T07:33:12.283183709Z caller=flush.go:143 org_id=infrastructure msg="failed to flush" err="failed to flush chunks: store put chunk: WebIdentityErr: failed to retrieve credentials\ncaused by: ValidationError: Request ARN is invalid\n\tstatus code: 400, request id: 0ac40531-5fe3-4093-b33f-d22e7abe3e9e, num_chunks: 1, labels: {kubernetes_container_name=\"kube-multus\", kubernetes_host=\"ip-10-0-152-179.ec2.internal\", kubernetes_namespace_name=\"openshift-multus\", kubernetes_pod_name=\"multus-7stsj\", log_type=\"infrastructure\"}"
level=info ts=2024-07-02T07:33:12.283207051Z caller=flush.go:167 msg="flushing stream" user=infrastructure fp=6fa88b105cfa8728 immediate=false num_chunks=1 labels="{kubernetes_container_name=\"kube-rbac-proxy-crio\", kubernetes_host=\"ip-10-0-134-70.ec2.internal\", kubernetes_namespace_name=\"openshift-machine-config-operator\", kubernetes_pod_name=\"kube-rbac-proxy-crio-ip-10-0-134-70.ec2.internal\", log_type=\"infrastructure\"}"
level=error ts=2024-07-02T07:33:12.300822691Z caller=flush.go:143 org_id=infrastructure msg="failed to flush" err="failed to flush chunks: store put chunk: WebIdentityErr: failed to retrieve credentials\ncaused by: ValidationError: Request ARN is invalid\n\tstatus code: 400, request id: 61cca9b5-faad-4abb-9f27-c55d4666e291, num_chunks: 1, labels: {kubernetes_container_name=\"kube-rbac-proxy\", kubernetes_host=\"ip-10-0-134-70.ec2.internal\", kubernetes_namespace_name=\"openshift-multus\", kubernetes_pod_name=\"network-metrics-daemon-l42zn\", log_type=\"infrastructure\"}"
level=error ts=2024-07-02T07:33:12.317369873Z caller=flush.go:143 org_id=infrastructure msg="failed to flush" err="failed to flush chunks: store put chunk: WebIdentityErr: failed to retrieve credentials\ncaused by: ValidationError: Request ARN is invalid\n\tstatus code: 400, request id: 1940811c-5f44-4056-bffe-9e6e66bcd5d1, num_chunks: 1, labels: {kubernetes_container_name=\"kube-rbac-proxy-crio\", kubernetes_host=\"ip-10-0-134-70.ec2.internal\", kubernetes_namespace_name=\"openshift-machine-config-operator\", kubernetes_pod_name=\"kube-rbac-proxy-crio-ip-10-0-134-70.ec2.internal\", log_type=\"infrastructure\"}"
level=info ts=2024-07-02T07:33:20.719915416Z caller=table_manager.go:136 index-store=tsdb-2022-06-01 msg="uploading tables"
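To isolate the failure, the same WebIdentity exchange can be attempted by hand. This is only a sketch: it assumes a recent oc and aws CLI on the workstation, and it reuses the service account and secret names from the configuration shown below.
# Mint a short-lived token for the Loki service account (audience must match the secret)
TOKEN=$(oc -n openshift-logging create token logging-loki --audience=sts.amazonaws.com)
# Use the exact role ARN the ingesters receive from the secret
ROLE_ARN=$(oc -n openshift-logging get secret logging-loki-s3 -o jsonpath='{.data.role_arn}' | base64 -d)
# If this also returns "ValidationError: Request ARN is invalid", the STS exchange itself is failing, not the S3 put
aws sts assume-role-with-web-identity --role-arn "$ROLE_ARN" --role-session-name loki-sts-test --web-identity-token "$TOKEN"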
The LokiStack YAML:
apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
  name: logging-loki
  namespace: openshift-logging
spec:
  size: 1x.extra-small
  storage:
    schemas:
    - version: v12
      effectiveDate: '2022-06-01'
    secret:
      name: logging-loki-s3
      type: s3
      credentialMode: token
  storageClassName: gp3-csi
  tenants:
    mode: openshift-logging
  rules:
    enabled: true
    selector:
      matchLabels:
        openshift.io/cluster-monitoring: "true"
    namespaceSelector:
      matchLabels:
        openshift.io/cluster-monitoring: "true"
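For reference, the operator's view of this configuration can be read back from the LokiStack status (assuming the CR exposes the usual status conditions):
oc -n openshift-logging get lokistack logging-loki \
  -o jsonpath='{range .status.conditions[*]}{.type}={.status}: {.message}{"\n"}{end}'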
Trust Policy
{ "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Principal": { "Federated": "arn:aws:iam::account_name:oidc-provider/oidc.op1.openshiftapps.com/oidc" }, "Action": "sts:AssumeRoleWithWebIdentity", "Condition": { "StringEquals": { "oidc.op1.openshiftapps.com/oidc:sub": [ "system:serviceaccount:openshift-logging:logging-loki", "system:serviceaccount:openshift-logging:logging-loki-ruler" ] } } } ] }
Inline policy: AWS S3 full access
AWS S3 Secret in openshift-logging namespace
(base) [njajodia@fedora ~]$ oc get secret/logging-loki-s3 -oyaml
apiVersion: v1
data:
  audience: c3RzLmFtYXpvbmF3cy5jb20=
  bucketnames:
  endpoint: =
  region: dXMtZWFzdC0x
  role_arn: YXJuOmF3czppYW06OjYxMDM2NTgzMTU1Nzpyb2xlL2xva2lyb2xlCg==
kind: Secret
metadata:
  creationTimestamp: "2024-07-02T02:08:54Z"
  name: logging-loki-s3
  namespace: openshift-logging
  resourceVersion: "50774"
  uid: a483aa29-f4f8-4252-9b98-c46bc745f098
type: Opaque
(base) [njajodia@fedora ~]$
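Because STS reports the ARN itself as invalid, it may be worth confirming the exact bytes the ingesters receive from this secret, for example that the decoded role_arn carries no stray whitespace or trailing newline (values piped through echo before base64 often pick one up). A minimal check:
# cat -A makes line endings and control characters visible
oc -n openshift-logging get secret logging-loki-s3 -o jsonpath='{.data.role_arn}' | base64 -d | cat -A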
Secret for the operator in openshift-operators-redhat namespace
(base) [njajodia@fedora ~]$ oc get secret/logging-loki-managed-credentials -n openshift-operators-redhat -oyaml
apiVersion: v1
data:
  bucketnames:
  region:
  role_arn: YXJuOmF3czppYW06OjYxMDM2NTgzMTU1Nzpyb2xlL2xva2lyb2xlCg==
  web_identity_token_file: L3Zhci9ydW4vc2VjcmV0cy9vcGVuc2hpZnQvc2VydmljZWFjY291bnQvdG9rZW4K
kind: Secret
metadata:
  creationTimestamp: "2024-07-02T02:13:48Z"
  name: logging-loki-managed-credentials
  namespace: openshift-operators-redhat
  resourceVersion: "64656"
  uid: e414de26-62af-4aca-9451-90e1909f67df
type: Opaque
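The same decode check can be applied to the operator-managed secret, for example to confirm the token path handed to the operator:
oc -n openshift-operators-redhat get secret/logging-loki-managed-credentials \
  -o jsonpath='{.data.web_identity_token_file}' | base64 -d | cat -A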
All the Loki pods are up and running:
(base) [njajodia@fedora ~]$ oc get pods
NAME READY STATUS RESTARTS AGE
cluster-logging-operator-5c57c86c9c-p6mf4 1/1 Running 0 4h35m
collector-5vhcr 1/1 Running 0 4h58m
collector-7c8t4 1/1 Running 0 4h58m
collector-k4mwx 1/1 Running 0 4h58m
collector-kkfz6 1/1 Running 0 4h58m
collector-kkr7f 1/1 Running 0 4h5m
collector-ml4wv 1/1 Running 0 4h58m
collector-smt66 1/1 Running 0 4h5m
collector-zkg7g 1/1 Running 0 73m
logging-loki-compactor-0 1/1 Running 0 77m
logging-loki-distributor-696887b4b4-bx4vg 1/1 Running 0 77m
logging-loki-distributor-696887b4b4-cj6x6 1/1 Running 0 77m
logging-loki-gateway-55bf9b9bc5-24kzk 2/2 Running 0 5h23m
logging-loki-gateway-55bf9b9bc5-fwzj9 2/2 Running 0 5h23m
logging-loki-index-gateway-0 1/1 Running 0 76m
logging-loki-index-gateway-1 1/1 Running 0 77m
logging-loki-ingester-0 1/1 Running 0 71m
logging-loki-ingester-1 1/1 Running 0 77m
logging-loki-querier-6546cf4f45-7hcsq 1/1 Running 0 77m
logging-loki-querier-6546cf4f45-ssqp9 1/1 Running 0 76m
logging-loki-query-frontend-59899fbfbc-jq9sg 1/1 Running 0 77m
logging-loki-query-frontend-59899fbfbc-r4rsm 1/1 Running 0 77m
logging-loki-ruler-0 1/1 Running 0 76m
logging-loki-ruler-1 1/1 Running 0 77m
logging-view-plugin-bd5978d6d-lnkn2 1/1 Running 0 3h51m
(base) [njajodia@fedora ~]$
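The AWS SDK in the ingester resolves the web-identity credentials from the standard AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE environment variables. Assuming the operator injects them into the ingester containers, they can be listed to confirm they match the secret values above:
oc -n openshift-logging get pod logging-loki-ingester-0 -o yaml \
  | grep -A1 -E 'AWS_ROLE_ARN|AWS_WEB_IDENTITY_TOKEN_FILE'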
Please let me know if you need any additional details to resolve this issue. The ingester pods are not able to push logs to S3.
I followed the documentation below:
https://loki-operator.dev/docs/short_lived_tokens_authentication.md/