-
Bug
-
Resolution: Done
-
Minor
-
Logging 5.7.0
-
False
-
None
-
False
-
NEW
-
VERIFIED
-
Before this update, the compactor would report TLS certificate errors from communication with the querier when retention was active. With this update, the compactor and querier no longer communicate erroneously over HTTP.
-
-
-
Log Storage - Sprint 231
Description of problem:
Deploy LokiStack as the log store and enable a retention policy. After querying logs in the Console, the compactor pod raises many TLS handshake errors:
2023/01/12 09:10:34 http: TLS handshake error from 10.129.2.35:52996: remote error: tls: bad certificate
2023/01/12 09:10:34 http: TLS handshake error from 10.129.2.35:52998: remote error: tls: bad certificate
2023/01/12 09:10:34 http: TLS handshake error from 10.129.2.35:53002: remote error: tls: bad certificate
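For reference, the compactor errors above can be pulled with commands like the following; the pod name is illustrative (the actual suffix differs per deployment), while the openshift-logging namespace matches this cluster:
$ oc get pods -n openshift-logging | grep compactor
$ oc logs loki-53817-compactor-<pod-suffix> -n openshift-logging | grep "TLS handshake error"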
Checking the pod IP shows that the errors come from the querier pod:
$ oc get pod -owide | grep 10.129.2.35
loki-53817-querier-6dbf8f6dcb-r7hhs   1/1   Running   0   3h21m   10.129.2.35   ip-10-0-68-14.us-east-2.compute.internal   <none>   <none>
In the querier pod, there are many `x509: certificate signed by unknown authority` errors:
level=error ts=2023-01-12T09:10:34.645422331Z caller=http.go:97 msg="error getting delete requests from the store" err="Get \"https://loki-53817-compactor-http.openshift-logging.svc.cluster.local:3100/loki/api/v1/delete\": x509: certificate signed by unknown authority"
ts=2023-01-12T09:10:34.645455061Z caller=spanlogger.go:80 user=infrastructure level=error msg="failed loading deletes for user" err="Get \"https://loki-53817-compactor-http.openshift-logging.svc.cluster.local:3100/loki/api/v1/delete\": x509: certificate signed by unknown authority"
level=info ts=2023-01-12T09:10:34.678401576Z caller=metrics.go:143 component=querier org_id=infrastructure latency=fast query="{log_type=~\".+\"} | json" query_type=limited range_type=range length=10m21.99s start_delta=10m34.678395857s end_delta=12.688396021s step=14s duration=42.458403ms status=200 limit=100 returned_lines=100 throughput=26MB total_bytes=1.1MB total_entries=100 queue_time=0s subqueries=1 cache_chunk_req=5 cache_chunk_hit=5 cache_chunk_bytes_stored=0 cache_chunk_bytes_fetched=526932 cache_index_req=0 cache_index_hit=0 cache_result_req=0 cache_result_hit=0
level=error ts=2023-01-12T09:10:34.700818314Z caller=http.go:131 msg="error getting cache gen numbers from the store" err="Get \"https://loki-53817-compactor-http.openshift-logging.svc.cluster.local:3100/loki/api/v1/cache/generation_numbers\": x509: certificate signed by unknown authority"
level=error ts=2023-01-12T09:10:34.700848345Z caller=gennumber_loader.go:136 msg="error loading cache generation numbers" err="Get \"https://loki-53817-compactor-http.openshift-logging.svc.cluster.local:3100/loki/api/v1/cache/generation_numbers\": x509: certificate signed by unknown authority"
level=info ts=2023-01-12T09:10:34.700917678Z caller=engine.go:199 component=querier org_id=infrastructure msg="executing query" type=range query="{log_type=~\".+\"} | json" length=10m21.99s step=14s
level=error ts=2023-01-12T09:10:34.710862063Z caller=http.go:97 msg="error getting delete requests from the store" err="Get \"https://loki-53817-compactor-http.openshift-logging.svc.cluster.local:3100/loki/api/v1/delete\": x509: certificate signed by unknown authority"
ts=2023-01-12T09:10:34.710892209Z caller=spanlogger.go:80 user=infrastructure level=error msg="failed loading deletes for user" err="Get \"https://loki-53817-compactor-http.openshift-logging.svc.cluster.local:3100/loki/api/v1/delete\": x509: certificate signed by unknown authority"
level=info ts=2023-01-12T09:10:34.729929408Z caller=metrics.go:143 component=querier org_id=infrastructure latency=fast query="{log_type=~\".+\"} | json" query_type=limited range_type=range length=10m21.99s start_delta=10m34.729924296s end_delta=12.73992446s step=14s duration=28.923928ms status=200 limit=100 returned_lines=100 throughput=20MB total_bytes=592kB total_entries=100 queue_time=0s subqueries=1 cache_chunk_req=2 cache_chunk_hit=2 cache_chunk_bytes_stored=0 cache_chunk_bytes_fetched=287170 cache_index_req=0 cache_index_hit=0 cache_result_req=0 cache_result_hit=0
level=error ts=2023-01-12T09:10:34.747831846Z caller=http.go:131 msg="error getting cache gen numbers from the store" err="Get \"https://loki-53817-compactor-http.openshift-logging.svc.cluster.local:3100/loki/api/v1/cache/generation_numbers\": x509: certificate signed by unknown authority"
level=error ts=2023-01-12T09:10:34.747860449Z caller=gennumber_loader.go:136 msg="error loading cache generation numbers" err="Get \"https://loki-53817-compactor-http.openshift-logging.svc.cluster.local:3100/loki/api/v1/cache/generation_numbers\": x509: certificate signed by unknown authority"
level=info ts=2023-01-12T09:10:34.747930036Z caller=engine.go:199 component=querier org_id=infrastructure msg="executing query" type=range query="{log_type=~\".+\"} | json" length=10m21.99s step=14s
level=error ts=2023-01-12T09:10:34.75757163Z caller=http.go:97 msg="error getting delete requests from the store" err="Get \"https://loki-53817-compactor-http.openshift-logging.svc.cluster.local:3100/loki/api/v1/delete\": x509: certificate signed by unknown authority"
ts=2023-01-12T09:10:34.757605681Z caller=spanlogger.go:80 user=infrastructure level=error msg="failed loading deletes for user" err="Get \"https://loki-53817-compactor-http.openshift-logging.svc.cluster.local:3100/loki/api/v1/delete\": x509: certificate signed by unknown authority"
level=info ts=2023-01-12T09:10:34.775212079Z caller=metrics.go:143 component=querier org_id=infrastructure latency=fast query="{log_type=~\".+\"} | json" query_type=limited range_type=range length=10m21.99s start_delta=10m34.775206155s end_delta=12.785206303s step=14s duration=27.188399ms status=200 limit=100 returned_lines=100 throughput=20MB total_bytes=538kB total_entries=100 queue_time=0s subqueries=1 cache_chunk_req=0 cache_chunk_hit=0 cache_chunk_bytes_stored=0 cache_chunk_bytes_fetched=0 cache_index_req=0 cache_index_hit=0 cache_result_req=0 cache_result_hit=0
Version-Release number of selected component (if applicable):
loki-operator.v5.6.0
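As a rough sketch, the installed operator version can be cross-checked from the ClusterServiceVersion list; the operator namespace varies by install, so a cluster-wide listing is used here:
$ oc get csv -A | grep loki-operator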
How reproducible:
Always
Steps to Reproduce:
1. Deploy logging, use LokiStack as the log store, and enable retention in the LokiStack custom resource (a command-line sketch for steps 1 and 2 follows this list):
apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
  name: loki-53817
  namespace: openshift-logging
spec:
  limits:
    global:
      retention:
        days: 2
        streams:
        - days: 4
          priority: 1
          selector: '{kubernetes_namespace_name=~"test.+"}'
        - days: 1
          priority: 1
          selector: '{log_type="application"}'
        - days: 15
          priority: 1
          selector: '{log_type="audit"}'
  managementState: Managed
  size: 1x.extra-small
  storage:
    schemas:
    - effectiveDate: "2020-10-11"
      version: v11
    secret:
      name: storage-secret
      type: s3
  storageClassName: gp3-csi
  tenants:
    mode: openshift-logging
2. Log in to the Console and enable the console plugin, then go to Observe --> Logs to check logs stored in LokiStack, or query logs with logcli.
3. Check the logs in the compactor pod and the querier pod.
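The following is only a sketch of steps 1 and 2 from the command line. The filename lokistack.yaml, the gateway route placeholder, and the application tenant are assumptions for illustration; the logcli --addr and --bearer-token flags are assumed to be available in the logcli release in use:
$ oc apply -f lokistack.yaml
$ oc get lokistack loki-53817 -n openshift-logging -o yaml
$ oc get pods -n openshift-logging | grep loki-53817
$ logcli query '{log_type="application"}' \
    --addr="https://<lokistack-gateway-route>/api/logs/v1/application" \
    --bearer-token="$(oc whoami -t)"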
Actual results:
The compactor pod raises many TLS handshake errors, and the querier pod logs `x509: certificate signed by unknown authority` errors.
Expected results:
No TLS or certificate errors in the compactor and querier pods.
Additional info:
When querying logs, we still get output.
It is not yet clear whether retention works correctly; testing is still in progress.
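One rough way to probe the retention/delete path is to port-forward to the compactor service named in the querier errors above and list delete requests. The X-Scope-OrgID tenant header, the -k flag (the compactor serves a service-CA-signed certificate), and the assumption that this endpoint accepts plain HTTPS without a client certificate are all unverified; given the handshake errors above, the request may well be rejected:
$ oc port-forward -n openshift-logging svc/loki-53817-compactor-http 3100:3100 &
$ curl -k -H "X-Scope-OrgID: infrastructure" https://localhost:3100/loki/api/v1/delete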
- clones
-
LOG-3494 [release-5.6] After querying logs in Loki, the compactor pod raises many TLS handshake errors if a retention policy is enabled.
- Closed