OpenShift Logging / LOG-5744

Loki is not working with ROSA AWS STS.


    • Type: Bug
    • Resolution: Not a Bug
    • Priority: Major
    • Affects Version: Logging 5.9.3
    • Component: Log Storage
    • Status: NEW

      Hi Team,

      We recently released the new Logging 5.9, which is documented as compatible with AWS STS on ROSA.

      However, after much testing, Loki is not able to push logs to S3. The ingester pods continuously report the errors below. The problem is the same in my test cluster and in the customer environment.
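
      For reference, the errors below come straight from the ingester pod logs; a minimal way to pull them (pod names as listed later in this report):

      # Tail the flush errors from one of the ingester pods
      oc logs logging-loki-ingester-0 -n openshift-logging --tail=100 | grep "failed to flush"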

      Errors:

      level=error ts=2024-07-02T07:33:12.26966227Z caller=flush.go:143 org_id=infrastructure msg="failed to flush" err="failed to flush chunks: store put chunk: WebIdentityErr: failed to retrieve credentials\ncaused by: ValidationError: Request ARN is invalid\n\tstatus code: 400, request id: ffbf992f-a989-4381-a480-25042885ee36, num_chunks: 1, labels: {kubernetes_container_name=\"osd-delete-backplane-serviceaccounts\", kubernetes_host=\"ip-10-0-137-68.ec2.internal\", kubernetes_namespace_name=\"openshift-backplane\", kubernetes_pod_name=\"osd-delete-backplane-serviceaccounts-28664850-82c2p\", log_type=\"infrastructure\"}"
      level=info ts=2024-07-02T07:33:12.269686607Z caller=flush.go:167 msg="flushing stream" user=infrastructure fp=6cdf0925a52305a9 immediate=false num_chunks=1 labels="{kubernetes_container_name=\"kube-rbac-proxy\", kubernetes_host=\"ip-10-0-134-70.ec2.internal\", kubernetes_namespace_name=\"openshift-multus\", kubernetes_pod_name=\"network-metrics-daemon-l42zn\", log_type=\"infrastructure\"}"
      level=error ts=2024-07-02T07:33:12.283183709Z caller=flush.go:143 org_id=infrastructure msg="failed to flush" err="failed to flush chunks: store put chunk: WebIdentityErr: failed to retrieve credentials\ncaused by: ValidationError: Request ARN is invalid\n\tstatus code: 400, request id: 0ac40531-5fe3-4093-b33f-d22e7abe3e9e, num_chunks: 1, labels: {kubernetes_container_name=\"kube-multus\", kubernetes_host=\"ip-10-0-152-179.ec2.internal\", kubernetes_namespace_name=\"openshift-multus\", kubernetes_pod_name=\"multus-7stsj\", log_type=\"infrastructure\"}"
      level=info ts=2024-07-02T07:33:12.283207051Z caller=flush.go:167 msg="flushing stream" user=infrastructure fp=6fa88b105cfa8728 immediate=false num_chunks=1 labels="{kubernetes_container_name=\"kube-rbac-proxy-crio\", kubernetes_host=\"ip-10-0-134-70.ec2.internal\", kubernetes_namespace_name=\"openshift-machine-config-operator\", kubernetes_pod_name=\"kube-rbac-proxy-crio-ip-10-0-134-70.ec2.internal\", log_type=\"infrastructure\"}"
      level=error ts=2024-07-02T07:33:12.300822691Z caller=flush.go:143 org_id=infrastructure msg="failed to flush" err="failed to flush chunks: store put chunk: WebIdentityErr: failed to retrieve credentials\ncaused by: ValidationError: Request ARN is invalid\n\tstatus code: 400, request id: 61cca9b5-faad-4abb-9f27-c55d4666e291, num_chunks: 1, labels: {kubernetes_container_name=\"kube-rbac-proxy\", kubernetes_host=\"ip-10-0-134-70.ec2.internal\", kubernetes_namespace_name=\"openshift-multus\", kubernetes_pod_name=\"network-metrics-daemon-l42zn\", log_type=\"infrastructure\"}"
      level=error ts=2024-07-02T07:33:12.317369873Z caller=flush.go:143 org_id=infrastructure msg="failed to flush" err="failed to flush chunks: store put chunk: WebIdentityErr: failed to retrieve credentials\ncaused by: ValidationError: Request ARN is invalid\n\tstatus code: 400, request id: 1940811c-5f44-4056-bffe-9e6e66bcd5d1, num_chunks: 1, labels: {kubernetes_container_name=\"kube-rbac-proxy-crio\", kubernetes_host=\"ip-10-0-134-70.ec2.internal\", kubernetes_namespace_name=\"openshift-machine-config-operator\", kubernetes_pod_name=\"kube-rbac-proxy-crio-ip-10-0-134-70.ec2.internal\", log_type=\"infrastructure\"}"
      level=info ts=2024-07-02T07:33:20.719915416Z caller=table_manager.go:136 index-store=tsdb-2022-06-01 msg="uploading tables"
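
      The failing call is AssumeRoleWithWebIdentity. To narrow it down, the same request can be issued by hand from an ingester pod; a sketch, assuming the AWS CLI is available in the pod and using the role ARN decoded from the secret shown later:

      # Mirrors the STS call Loki makes on flush; a ValidationError
      # "Request ARN is invalid" here confirms the ARN string is malformed.
      TOKEN=$(cat /var/run/secrets/openshift/serviceaccount/token)
      aws sts assume-role-with-web-identity \
        --role-arn "arn:aws:iam::610365831557:role/lokirole" \
        --role-session-name manual-check \
        --web-identity-token "$TOKEN"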

      The LokiStack YAML:

      apiVersion: loki.grafana.com/v1
      kind: LokiStack
      metadata:
        name: logging-loki
        namespace: openshift-logging
      spec:
        size: 1x.extra-small
        storage:
          schemas:
          - version: v12
            effectiveDate: '2022-06-01'
          secret:
            name: logging-loki-s3
            type: s3
            credentialMode: token
        storageClassName: gp3-csi
        tenants:
          mode: openshift-logging
        rules:
          enabled: true
          selector:
            matchLabels:
              openshift.io/cluster-monitoring: "true"
          namespaceSelector:
            matchLabels:
              openshift.io/cluster-monitoring: "true"
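
      A quick way to confirm the operator accepted this spec is to read back the LokiStack status conditions; a sketch:

      # Expect a Ready condition once the stack reconciles
      oc get lokistack logging-loki -n openshift-logging \
        -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}'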

      Trust Policy

      {
          "Version": "2012-10-17",
          "Statement": [
              {
                  "Effect": "Allow",
                  "Principal": {
                      "Federated": "arn:aws:iam::account_name:oidc-provider/oidc.op1.openshiftapps.com/oidc"
                  },
                  "Action": "sts:AssumeRoleWithWebIdentity",
                  "Condition": {
                      "StringEquals": {
                          "oidc.op1.openshiftapps.com/oidc:sub": [
                              "system:serviceaccount:openshift-logging:logging-loki",
                              "system:serviceaccount:openshift-logging:logging-loki-ruler"
                          ]
                      }
                  }
              }
          ]
      } 
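
      To rule out drift between this document and what is actually attached to the role, the live trust policy can be read back with the AWS CLI; a sketch, with the role name taken from the decoded role_arn in the secret below:

      # Dump the live trust policy for comparison with the above
      aws iam get-role --role-name lokirole \
        --query 'Role.AssumeRolePolicyDocument' --output json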

      Inline policy: AWS S3 full access

      AWS S3 secret in the openshift-logging namespace:

      (base) [njajodia@fedora ~]$ oc get secret/logging-loki-s3 -oyaml
      apiVersion: v1
      data:
        audience: c3RzLmFtYXpvbmF3cy5jb20=
        bucketnames: 
        endpoint: =
        region: dXMtZWFzdC0x
        role_arn: YXJuOmF3czppYW06OjYxMDM2NTgzMTU1Nzpyb2xlL2xva2lyb2xlCg==
      kind: Secret
      metadata:
        creationTimestamp: "2024-07-02T02:08:54Z"
        name: logging-loki-s3
        namespace: openshift-logging
        resourceVersion: "50774"
        uid: a483aa29-f4f8-4252-9b98-c46bc745f098
      type: Opaque
      (base) [njajodia@fedora ~]$  
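
      One detail worth checking in the output above: the base64 role_arn value ends in Cg==, which decodes to a trailing newline, and an ARN carrying a stray newline is exactly the kind of value STS rejects as "Request ARN is invalid". A minimal check, plus a way to re-encode the value without the newline:

      # Show the trailing bytes of the decoded ARN; a final \n is the problem
      oc get secret logging-loki-s3 -n openshift-logging \
        -o jsonpath='{.data.role_arn}' | base64 -d | od -c | tail -2
      # Re-encode without a newline (printf, not echo)
      printf 'arn:aws:iam::610365831557:role/lokirole' | base64 -w0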

      Secret for the operator in the openshift-operators-redhat namespace:

      (base) [njajodia@fedora ~]$ oc get secret/logging-loki-managed-credentials -n openshift-operators-redhat -oyaml
      apiVersion: v1
      data:
        bucketnames: 
        region: 
        role_arn: YXJuOmF3czppYW06OjYxMDM2NTgzMTU1Nzpyb2xlL2xva2lyb2xlCg==
        web_identity_token_file: L3Zhci9ydW4vc2VjcmV0cy9vcGVuc2hpZnQvc2VydmljZWFjY291bnQvdG9rZW4K
      kind: Secret
      metadata:
        creationTimestamp: "2024-07-02T02:13:48Z"
        name: logging-loki-managed-credentials
        namespace: openshift-operators-redhat
        resourceVersion: "64656"
        uid: e414de26-62af-4aca-9451-90e1909f67df
      type: Opaque 
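
      The web_identity_token_file value decodes to /var/run/secrets/openshift/serviceaccount/token (also with a trailing newline); it is worth confirming the projected token is actually mounted at that path in the ingester pods, e.g.:

      # List the projected service-account token mount
      oc exec -n openshift-logging logging-loki-ingester-0 -- \
        ls -l /var/run/secrets/openshift/serviceaccount/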

      All the Loki pods are up and running:

      (base) [njajodia@fedora ~]$ oc get pods
      NAME                                           READY   STATUS    RESTARTS   AGE
      cluster-logging-operator-5c57c86c9c-p6mf4      1/1     Running   0          4h35m
      collector-5vhcr                                1/1     Running   0          4h58m
      collector-7c8t4                                1/1     Running   0          4h58m
      collector-k4mwx                                1/1     Running   0          4h58m
      collector-kkfz6                                1/1     Running   0          4h58m
      collector-kkr7f                                1/1     Running   0          4h5m
      collector-ml4wv                                1/1     Running   0          4h58m
      collector-smt66                                1/1     Running   0          4h5m
      collector-zkg7g                                1/1     Running   0          73m
      logging-loki-compactor-0                       1/1     Running   0          77m
      logging-loki-distributor-696887b4b4-bx4vg      1/1     Running   0          77m
      logging-loki-distributor-696887b4b4-cj6x6      1/1     Running   0          77m
      logging-loki-gateway-55bf9b9bc5-24kzk          2/2     Running   0          5h23m
      logging-loki-gateway-55bf9b9bc5-fwzj9          2/2     Running   0          5h23m
      logging-loki-index-gateway-0                   1/1     Running   0          76m
      logging-loki-index-gateway-1                   1/1     Running   0          77m
      logging-loki-ingester-0                        1/1     Running   0          71m
      logging-loki-ingester-1                        1/1     Running   0          77m
      logging-loki-querier-6546cf4f45-7hcsq          1/1     Running   0          77m
      logging-loki-querier-6546cf4f45-ssqp9          1/1     Running   0          76m
      logging-loki-query-frontend-59899fbfbc-jq9sg   1/1     Running   0          77m
      logging-loki-query-frontend-59899fbfbc-r4rsm   1/1     Running   0          77m
      logging-loki-ruler-0                           1/1     Running   0          76m
      logging-loki-ruler-1                           1/1     Running   0          77m
      logging-view-plugin-bd5978d6d-lnkn2            1/1     Running   0          3h51m
      (base) [njajodia@fedora ~]$ 
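
      To check whether anything is reaching the bucket at all, the bucket can be listed directly; a sketch (the bucket name is redacted above, so substitute your own):

      # Any chunk or index objects written by Loki would show up here
      aws s3 ls "s3://<bucketname>/" --recursive | head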
       

      The ingester pods are not able to push logs to S3. Please let me know if you need any additional details to resolve this issue.

      I followed the documentation below:

      https://loki-operator.dev/docs/short_lived_tokens_authentication.md/

      https://docs.openshift.com/container-platform/4.15/observability/logging/log_storage/installing-log-storage.html#logging-loki-storage-aws_installing-log-storage

              Assignee: Unassigned
              Reporter: Nirjhar Jajodia (njajodia)