  1. OpenShift Logging
  2. LOG-8466

LokiStackWriteRequestErrors for expired certificates


    • Bug
    • Resolution: Unresolved
    • Logging 6.1.2
    • Log Storage
    • Incidents & Support
    • NEW
    • Bug Fix
    • Important

      Description of problem:

      The Loki Operator rotated the Loki certificates:

      $ pod=$(oc get pod -l app.kubernetes.io/name=loki-operator -n openshift-operators-redhat -o name)
      
      $ oc logs $pod -c manager -n openshift-operators-redhat|grep "Certificate expired" |tail -1
      2025-12-15T13:55:29.775477226+07:00 {"_ts":"2025-12-15T06:55:29.775463615Z","_level":"0","_component":"loki-operator_controllers_certrotation","_message":"Certificate expired","msg":"certificates expired for reasons: logging-loki-index-gateway-http: past its latest possible time 2025-12-15 06:55:28.8 +0000 UTC"}
      

      And the Loki certificates were renewed:

      $ for secret in $(oc get secrets -n openshift-logging -o name | grep logging-loki | grep -Ev "token|logging-loki-s3|logging-loki$"); do echo "--- Secret: $secret ---"; oc get $secret -n openshift-logging -o jsonpath='{.data.tls\.crt}' | base64 -d | openssl x509 -noout -startdate -enddate; done
      --- Secret: secret/logging-loki-compactor-grpc ---
      notBefore=Dec 15 06:55:27 2025 GMT
      notAfter=Mar 15 06:55:28 2026 GMT
      --- Secret: secret/logging-loki-compactor-http ---
      notBefore=Dec 15 06:55:28 2025 GMT
      notAfter=Mar 15 06:55:29 2026 GMT
      --- Secret: secret/logging-loki-distributor-grpc ---
      notBefore=Dec 15 06:55:27 2025 GMT
      notAfter=Mar 15 06:55:28 2026 GMT
      --- Secret: secret/logging-loki-distributor-http ---
      notBefore=Dec 15 06:55:27 2025 GMT
      notAfter=Mar 15 06:55:28 2026 GMT
      --- Secret: secret/logging-loki-gateway ---
      Could not find certificate from <stdin>
      --- Secret: secret/logging-loki-gateway-client-http ---
      notBefore=Dec 15 06:55:28 2025 GMT
      notAfter=Mar 15 06:55:29 2026 GMT
      --- Secret: secret/logging-loki-gateway-http ---
      notBefore=Aug 30 08:22:55 2025 GMT
      notAfter=Aug 30 08:22:56 2027 GMT
      --- Secret: secret/logging-loki-index-gateway-grpc ---
      notBefore=Dec 15 06:55:27 2025 GMT
      notAfter=Mar 15 06:55:28 2026 GMT
      --- Secret: secret/logging-loki-index-gateway-http ---
      notBefore=Dec 15 06:55:28 2025 GMT
      notAfter=Mar 15 06:55:29 2026 GMT
      --- Secret: secret/logging-loki-ingester-grpc ---
      notBefore=Dec 15 06:55:28 2025 GMT
      notAfter=Mar 15 06:55:29 2026 GMT
      --- Secret: secret/logging-loki-ingester-http ---
      notBefore=Dec 15 06:55:26 2025 GMT
      notAfter=Mar 15 06:55:27 2026 GMT
      --- Secret: secret/logging-loki-querier-grpc ---
      notBefore=Dec 15 06:55:27 2025 GMT
      notAfter=Mar 15 06:55:28 2026 GMT
      --- Secret: secret/logging-loki-querier-http ---
      notBefore=Dec 15 06:55:27 2025 GMT
      notAfter=Mar 15 06:55:28 2026 GMT
      --- Secret: secret/logging-loki-query-frontend-grpc ---
      notBefore=Dec 15 06:55:27 2025 GMT
      notAfter=Mar 15 06:55:28 2026 GMT
      --- Secret: secret/logging-loki-query-frontend-http ---
      notBefore=Dec 15 06:55:26 2025 GMT
      notAfter=Mar 15 06:55:27 2026 GMT
      --- Secret: secret/logging-loki-ruler-grpc ---
      notBefore=Dec 15 06:55:27 2025 GMT
      notAfter=Mar 15 06:55:28 2026 GMT
      --- Secret: secret/logging-loki-ruler-http ---
      notBefore=Dec 15 06:55:26 2025 GMT
      notAfter=Mar 15 06:55:27 2026 GMT
      --- Secret: secret/logging-loki-signing-ca ---
      notBefore=Dec 20 04:21:04 2024 GMT
      notAfter=Dec 20 10:21:05 2029 GMT
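
The listing above is easier to scan as one line per secret, which makes the entries that were not renewed on Dec 15 (here logging-loki-gateway-http and logging-loki-signing-ca) stand out. A minimal sketch, assuming the loop output has exactly the format shown above; summarize_certs is a hypothetical helper name, not part of oc or the Loki Operator:

```shell
# Hypothetical helper: condense "--- Secret: ... ---" / notBefore / notAfter
# blocks read from stdin into one tab-separated line per secret.
# Secrets without a parsable certificate are printed with "-" placeholders.
summarize_certs() {
  awk '
    function emit() {
      if (name != "") {
        printf "%s\t%s\t%s\n", name, (nb == "" ? "-" : nb), (na == "" ? "-" : na)
      }
    }
    /^--- Secret: / { emit(); name = $3; nb = ""; na = ""; next }
    /^notBefore=/   { nb = substr($0, 11); next }
    /^notAfter=/    { na = substr($0, 10); next }
    END             { emit() }
  '
}

# Usage (assumption: pipe the output of the secret loop shown above into it):
#   for secret in ...; do ...; done | summarize_certs
```

The helper only reformats text, so it can also be run against a saved copy of the output.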
      

      The Loki pods were then restarted immediately:

      $ for pod in $(oc get pods -l app.kubernetes.io/name=lokistack -o name -n openshift-logging); do echo "--- pod: $pod --- " ; oc get $pod -o yaml -n openshift-logging |grep -i creationTime; done
      --- pod: pod/logging-loki-compactor-0 --- 
        creationTimestamp: "2025-12-15T06:56:00Z"
      --- pod: pod/logging-loki-distributor-546997b99c-4q85p --- 
        creationTimestamp: "2025-12-15T06:55:49Z"
      --- pod: pod/logging-loki-distributor-546997b99c-nlxss --- 
        creationTimestamp: "2025-12-15T06:55:29Z"
      --- pod: pod/logging-loki-gateway-6d45f4cfff-8j7fd --- 
        creationTimestamp: "2025-12-15T06:55:29Z"
      --- pod: pod/logging-loki-gateway-6d45f4cfff-q7wqg --- 
        creationTimestamp: "2025-12-15T06:55:31Z"
      --- pod: pod/logging-loki-index-gateway-0 --- 
        creationTimestamp: "2025-12-15T06:56:50Z"
      --- pod: pod/logging-loki-index-gateway-1 --- 
        creationTimestamp: "2025-12-15T06:55:59Z"
      --- pod: pod/logging-loki-ingester-0 --- 
        creationTimestamp: "2025-12-15T06:59:21Z"
      --- pod: pod/logging-loki-ingester-1 --- 
        creationTimestamp: "2025-12-15T06:57:45Z"
      --- pod: pod/logging-loki-ingester-2 --- 
        creationTimestamp: "2025-12-15T06:56:00Z"
      --- pod: pod/logging-loki-querier-7b76d6dc77-42vdr --- 
        creationTimestamp: "2025-12-15T06:55:29Z"
      --- pod: pod/logging-loki-querier-7b76d6dc77-dzzdm --- 
        creationTimestamp: "2025-12-15T06:56:10Z"
      --- pod: pod/logging-loki-querier-7b76d6dc77-ljb74 --- 
        creationTimestamp: "2025-12-15T06:55:49Z"
      --- pod: pod/logging-loki-query-frontend-768b784c84-b5p9s --- 
        creationTimestamp: "2025-12-15T06:55:49Z"
      --- pod: pod/logging-loki-query-frontend-768b784c84-ff9rn --- 
        creationTimestamp: "2025-12-15T06:55:29Z"
      

      A few days later, TLS connections between the Loki Gateway and the Loki Distributors start failing because an expired certificate (the old one) is still in use.

      /// Only one of the Loki Gateway pods logs the error
      $ for pod in $(oc get pod -l app.kubernetes.io/component=lokistack-gateway -o name -n openshift-logging); do echo "--- pod: $pod ---"; oc logs $pod -c gateway -n openshift-logging |grep -c "tls: expired certificate" ; done
      --- pod: pod/logging-loki-gateway-6d45f4cfff-8j7fd ---
      25196
      --- pod: pod/logging-loki-gateway-6d45f4cfff-q7wqg ---
      0
      
      $ for pod in $(oc get pod -l app.kubernetes.io/component=lokistack-gateway -o name -n openshift-logging); do echo "--- pod: $pod ---"; oc logs $pod -c gateway -n openshift-logging |grep "tls: expired certificate" |tail -1; done
      --- pod: pod/logging-loki-gateway-6d45f4cfff-8j7fd ---
      2026-01-08T10:00:13.076005189+07:00 level=warn name=lokistack-gateway ts=2026-01-08T03:00:13.075990362Z caller=stdlib.go:105 caller=reverseproxy.go:661 msg="http: proxy error: remote error: tls: expired certificate"
      --- pod: pod/logging-loki-gateway-6d45f4cfff-q7wqg ---
      
      /// Both Loki Distributor pods log the error
      $ for pod in $(oc get pod -l app.kubernetes.io/component=distributor -o name -n openshift-logging); do echo "--- pod: $pod ---"; oc logs $pod  -n openshift-logging |grep -c "x509: certificate has expired or is not yet valid" ; done
      --- pod: pod/logging-loki-distributor-546997b99c-4q85p ---
      19453
      --- pod: pod/logging-loki-distributor-546997b99c-nlxss ---
      16867
      
      $ for pod in $(oc get pod -l app.kubernetes.io/component=distributor -o name -n openshift-logging); do echo "--- pod: $pod ---"; oc logs $pod  -n openshift-logging |grep  "x509: certificate has expired or is not yet valid" |tail -1; done
      --- pod: pod/logging-loki-distributor-546997b99c-4q85p ---
      2026-01-08T10:00:11.953588738+07:00 2026-01-08 03:00:11.953552 I | http: TLS handshake error from 10.249.2.161:54192: tls: failed to verify certificate: x509: certificate has expired or is not yet valid: current time 2026-01-08T03:00:11Z is after 2026-01-02T06:55:29Z
      --- pod: pod/logging-loki-distributor-546997b99c-nlxss ---
      2026-01-08T10:00:12.272188774+07:00 2026-01-08 03:00:12.272153 I | http: TLS handshake error from 10.249.2.161:54206: tls: failed to verify certificate: x509: certificate has expired or is not yet valid: current time 2026-01-08T03:00:12Z is after 2026-01-02T06:55:29Z
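
The distributor log lines above carry both the client IP and the expiry time of the rejected certificate. A sketch that groups them, assuming the log format shown above and IPv4 client addresses; summarize_handshake_errors is a hypothetical helper name:

```shell
# Hypothetical helper: read distributor logs on stdin and print one line per
# (client IP, certificate expiry time) pair, followed by the number of
# "certificate has expired" handshake errors seen for that pair.
summarize_handshake_errors() {
  sed -n -E 's/.*TLS handshake error from ([0-9.]+):[0-9]+: .*certificate has expired or is not yet valid: current time [^ ]+ is after ([^ ]+).*/\1 \2/p' \
    | sort | uniq -c | awk '{ print $2, $3, $1 }'
}

# Usage (assumption: $pod holds one of the distributor pod names listed above):
#   oc logs $pod -n openshift-logging | summarize_handshake_errors
```

For the two sample lines above this yields a single group for 10.249.2.161 with expiry 2026-01-02T06:55:29Z.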
      

      The IP address "10.249.2.161" seen in the Loki Distributor logs belongs to the Loki Gateway pod "logging-loki-gateway-6d45f4cfff-8j7fd":

      $ oc get pods -o wide -n openshift-logging|grep 10.249.2.161
      logging-loki-gateway-6d45f4cfff-8j7fd          2/2     Running   0          23d    10.249.2.161   node.example.com   <none>           <none>
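
A plain grep on the IP treats the dots as wildcards and also matches longer addresses that contain the same digits. A sketch of an exact-match lookup, assuming the `oc get pods -o wide` column layout shown above; pod_for_ip is a hypothetical helper name:

```shell
# Hypothetical helper: read `oc get pods -o wide --no-headers` output on stdin
# and print the name (first column) of every pod that has a field exactly
# equal to the given IP address.
pod_for_ip() {
  awk -v ip="$1" '{ for (i = 1; i <= NF; i++) if ($i == ip) { print $1; next } }'
}

# Usage:
#   oc get pods -o wide --no-headers -n openshift-logging | pod_for_ip 10.249.2.161
```

Comparing whole fields rather than a fixed column keeps the lookup working when the RESTARTS column contains extra words.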
      

      *Note*: the only failing Loki Gateway pod is the one that was restarted immediately (within one second) after the certificates were renewed:

      --- pod: pod/logging-loki-gateway-6d45f4cfff-8j7fd --- 
        creationTimestamp: "2025-12-15T06:55:29Z"
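
The gap between a certificate's notBefore and a pod's creationTimestamp can be computed directly. A sketch, assuming GNU date (for the -d option) and assuming that logging-loki-gateway-client-http is the certificate involved; notBefore is only a proxy for when the secret was actually updated, and seconds_between is a hypothetical helper name:

```shell
# Hypothetical helper: print the number of seconds from timestamp $1 to
# timestamp $2. Requires GNU date; returns non-zero if a timestamp cannot
# be parsed.
seconds_between() {
  sb_start=$(date -u -d "$1" +%s) || return 1
  sb_end=$(date -u -d "$2" +%s) || return 1
  echo $(( sb_end - sb_start ))
}

# notBefore of logging-loki-gateway-client-http vs. creation of the failing pod
seconds_between 2025-12-15T06:55:28Z 2025-12-15T06:55:29Z   # prints 1
# same certificate vs. creation of the healthy gateway pod (q7wqg)
seconds_between 2025-12-15T06:55:28Z 2025-12-15T06:55:31Z   # prints 3
```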
      

      Version-Release number of selected component (if applicable):

      Loki Operator v6.1

      How reproducible:

      Not reproducible on demand, as it requires the Loki certificates to expire.

      Steps to Reproduce:

      N/A

      Actual results:

      Some Loki pods do not pick up the renewed certificates and keep using the old ones, which expire a few days later.

      Expected results:

      The Loki Operator renews the Loki certificates, and all of the restarted Loki pods use the renewed certificates.

      Workaround:

      $ oc delete pod -l app.kubernetes.io/component=distributor -n openshift-logging
      $ oc delete pod -l app.kubernetes.io/component=lokistack-gateway -n openshift-logging
      

              Unassigned Unassigned
              rhn-support-ocasalsa Oscar Casal Sanchez