Bug
Resolution: Unresolved
Summary
Kiali CR never reaches Successful condition; env snapshot and performance stages skipped
Components
- Kiali Operator
- Kiali (CR / reconciliation)
- kiali-perf-tests (Jenkins pipeline)
- OpenShift / Istio (Service Mesh QE)
Affected Builds / Environment
| Report | Pipeline Build | Trigger | Cluster (API) | Kiali Repo Commit | Date (from CR) |
|---|---|---|---|---|---|
| #888 | kiali-perf-tests | upstream-istio-pipeline 3432 | ci-rhos-01-04.servicemesh.rhqeaws.com | ff38dbc99ca11113b625992dadfe8d5263312dd9 | 2026-02-27 |
| #889 | kiali-perf-tests | upstream-istio-pipeline 3434 | ci-rhos-d-03.servicemesh.rhqeaws.com | 63521c4e5d83c6e99bac615b643c74e832f20350 | 2026-02-28 |
| #890 | kiali-perf-tests | upstream-istio-pipeline 3436 | ci-rhos-d-02.servicemesh.rhqeaws.com | 63521c4e5d83c6e99bac615b643c74e832f20350 | 2026-03-01 |
- Jenkins job: kiali/test-jobs/kiali-perf-tests
- Upstream: kiali/main-pipelines/upstream-istio-pipeline (timer-triggered)
- Jenkinsfile repo: gitlab.cee.redhat.com/istio/servicemesh-qe/jenkins-csb-declaration @ bce7d0a55879aa1c1c4b336c2f889b152be6e176 (Hotfix2 for opentelemetrycollector)
- Istio: 1.29.0
- OpenShift: 4.21.1 (#888), 4.21.2 (#889, #890)
- Kubernetes: 1.34.2
- Kiali Operator: v2.23.0-SNAPSHOT
Description
Across three consecutive performance test runs (#888, #889, #890) on different OpenShift clusters, the pipeline fails at the same point:
- Invalid patch request
When the pipeline patches the Kiali CR to remove spec.api.namespaces, the server rejects the request:
oc patch kiali kiali -n kiali-operator --type=json '-p=[{"op": "remove", "path": "/spec/api/namespaces"}]'
The request is invalid: the server rejected our request due to an error in our request
- Kiali CR never reaches Successful condition
The pipeline then runs:
oc wait --for=condition=Successful kiali/kiali --timeout=120s -n kiali-operator
error: timed out waiting for the condition on kialis/kiali
The Kiali CR therefore does not reach condition=Successful within 120s, and oc describe still shows a Type: Failure condition.
- Downstream stages skipped
Because of the above failure:
- Stage "Get env snapshot" is skipped.
- Stage "Run Performance" is skipped.
- Pipeline result
All three runs end with:
ERROR: script returned exit code 1
Finished: FAILURE
The root cause is not yet confirmed. The failure stems from one or both of the following:
- The rejected oc patch (remove spec.api.namespaces).
- Kiali CR reconciliation not reaching Successful (whether due to that patch, operator behavior, or cluster state).
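One way to test the first hypothesis is to check whether the field exists before removing it. A JSON Patch "remove" operation requires the target path to exist (RFC 6902), so a CR that never had spec.api.namespaces set would reject the patch. The logs do not confirm this is what happened; the sketch below only illustrates the guard, and remove_if_present is a hypothetical helper name, not something taken from the Jenkinsfile.

```shell
# Sketch: remove a field from the Kiali CR only when it is currently set.
# Assumption (not confirmed by the logs): the patch was rejected because the
# target path was absent; a JSON Patch "remove" requires the path to exist.

# remove_if_present <jsonpath> <json-pointer>   (hypothetical helper)
remove_if_present() {
  local jsonpath="$1" pointer="$2" current
  # With -o jsonpath, a missing key prints an empty string by default.
  current="$(oc get kiali kiali -n kiali-operator -o "jsonpath={${jsonpath}}")" || return 1
  if [ -z "$current" ]; then
    echo "skip: ${pointer} is not set"
    return 0
  fi
  oc patch kiali kiali -n kiali-operator --type=json \
    -p "[{\"op\": \"remove\", \"path\": \"${pointer}\"}]"
}

# Usage:
#   remove_if_present .spec.api.namespaces /spec/api/namespaces
```

If the helper prints "skip" on an affected cluster, the rejected patch is explained by the missing field rather than by CRD validation.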
Steps to Reproduce
- Trigger the upstream-istio pipeline (or kiali-perf-tests) on a cluster with Kiali Operator and a Kiali CR.
- Let the pipeline run through "Install Test Namespaces" and reach the step that patches the Kiali CR (remove spec.deployment.accessible_namespaces, then remove spec.api.namespaces, then add spec.deployment.cluster_wide_access).
- Observe the patch for spec.api.namespaces failing with "The request is invalid".
- Observe oc wait --for=condition=Successful kiali/kiali --timeout=120s -n kiali-operator timing out.
- Observe "Get env snapshot" and "Run Performance" skipped and pipeline finishing with FAILURE.
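For reproducing outside Jenkins, the patch sequence from the steps above can be sketched as a small script that runs every step and reports which ones fail, mirroring the way the pipeline continued to oc wait after the rejected patch. The function names and the value true for cluster_wide_access are assumptions for illustration; the actual Jenkinsfile step may differ.

```shell
# Sketch of the patch sequence described in the reproduction steps.
# Assumptions: helper names are hypothetical, and "true" is an assumed value
# for spec.deployment.cluster_wide_access (the report does not state it).
KIALI_NS="kiali-operator"

patch_kiali() {
  oc patch kiali kiali -n "$KIALI_NS" --type=json -p "$1"
}

# Run one step, print its outcome, and return its status.
run_step() {
  local label="$1"
  shift
  if "$@"; then
    echo "ok: ${label}"
  else
    echo "FAILED: ${label}"
    return 1
  fi
}

reproduce_patch_sequence() {
  local rc=0
  run_step "remove spec.deployment.accessible_namespaces" \
    patch_kiali '[{"op": "remove", "path": "/spec/deployment/accessible_namespaces"}]' || rc=1
  run_step "remove spec.api.namespaces" \
    patch_kiali '[{"op": "remove", "path": "/spec/api/namespaces"}]' || rc=1
  run_step "add spec.deployment.cluster_wide_access" \
    patch_kiali '[{"op": "add", "path": "/spec/deployment/cluster_wide_access", "value": true}]' || rc=1
  run_step "wait for condition=Successful" \
    oc wait --for=condition=Successful kiali/kiali --timeout=120s -n "$KIALI_NS" || rc=1
  return "$rc"
}

# Usage:
#   reproduce_patch_sequence
```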
Additional Observations (all three reports)
- #888: No GitHub Istio version retries; first cluster (ci-rhos-01-04).
- #889, #890: Multiple "Failed to get the latest Istio version from GitHub" retries (4 and 5 attempts respectively) before success.
- Recurring (non-fatal) messages in all runs:
- Error from server (AlreadyExists): routes.route.openshift.io "istio-ingressgateway" already exists
- Error from server (NotFound): services "bookinfo-gateway-istio" not found
- Error from server (NotFound): routes.route.openshift.io "bookinfo-gateway-istio" not found
- Error from server (NotFound): namespaces "sleep" not found (then namespace is created)
- Kiali CR status in all three reports shows a Type: Failure condition (with empty Message/Reason), a Type: Successful condition with "Last reconciliation succeeded", and a Type: Running condition. The CR therefore does not consistently present the clean Successful state that oc wait expects.
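Because oc describe lists all three condition types at once, it helps to look at each condition's status value rather than its mere presence. The sketch below prints the conditions as Type=Status pairs and checks whether Successful is True, which is what oc wait --for=condition=Successful checks by default. The function names are hypothetical, and the status values on the affected clusters are not known from the report.

```shell
# List every condition on the Kiali CR as "Type=Status", one per line.
kiali_conditions() {
  oc get kiali kiali -n kiali-operator \
    -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}'
}

# Succeeds only if the Successful condition currently has status True.
kiali_is_successful() {
  kiali_conditions | grep -qx 'Successful=True'
}

# Usage:
#   kiali_conditions
#   kiali_is_successful && echo "CR is Successful" || echo "CR is not Successful"
```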
Expected Result
- The patch that removes spec.api.namespaces either succeeds or is not required; the Kiali CR reaches condition=Successful within the wait timeout.
- Stages "Get env snapshot" and "Run Performance" run.
- Pipeline completes with SUCCESS (or fails only on real performance assertions).
Actual Result
- Patch for spec.api.namespaces is rejected.
- oc wait for condition=Successful times out.
- "Get env snapshot" and "Run Performance" are skipped.
- Pipeline finishes with FAILURE.
Labels / Fix Versions (suggested)
- kiali-operator
- kiali-perf-tests
- upstream-istio
- servicemesh-qe
Attachments / References
- Full pipeline logs: #888.txt, #889.txt, #890.txt
- Relevant pipeline step (from #888, similar in #889/#890):
- Patch steps and oc wait around lines 1162-1174 (#888), 1166-1178 (#889), 1167-1179 (#890)
- Skipped stages around 3406-3412 (#888), 3410-3416 (#889), 3411-3417 (#890)
- Final failure: end of each file (e.g. "Finished: FAILURE")
Suggested Next Steps
- Operator/API: Confirm whether spec.api.namespaces is optional and why the remove patch is rejected (validation, defaulting, or CRD schema).
- Pipeline: Consider making the "remove spec.api.namespaces" patch conditional or handling the error so the CR can still reach Successful (if cluster_wide_access is the real intent).
- Operator: Investigate why the Kiali CR does not report condition=Successful within 120s after these patches (reconciliation logic, condition updates).
- Re-run kiali-perf-tests after a fix and confirm "Get env snapshot" and "Run Performance" execute and pipeline can pass or fail on actual perf results.
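To make the next failing run easier to triage, the wait step could capture the CR state when it times out. This is a minimal sketch under stated assumptions: wait_for_kiali is a hypothetical helper, the default timeout simply matches the 120s used by the pipeline, and the function is not taken from the existing Jenkinsfile.

```shell
# wait_for_kiali [timeout]   (hypothetical helper; default matches the pipeline's 120s)
# Waits for condition=Successful and, on timeout, dumps the CR for diagnosis.
wait_for_kiali() {
  local timeout="${1:-120s}"
  if oc wait --for=condition=Successful kiali/kiali --timeout="$timeout" -n kiali-operator; then
    return 0
  fi
  echo "Kiali CR did not reach Successful within ${timeout}; describing CR:" >&2
  oc describe kiali kiali -n kiali-operator >&2
  return 1
}

# Usage:
#   wait_for_kiali 300s
```

Keeping the non-zero return preserves the current behavior of failing the stage while leaving the conditions in the log.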