OpenShift Service Mesh / OSSM-12713

Kiali Performance Tests — Repeated Failures (Builds #888, #889, #890)

    • Bug
    • Resolution: Unresolved
    • Kiali

      Summary

      Kiali CR never reaches Successful condition; env snapshot and performance stages skipped

      Components

      • Kiali Operator
      • Kiali (CR / reconciliation)
      • kiali-perf-tests (Jenkins pipeline)
      • OpenShift / Istio (Service Mesh QE)

      Affected Builds / Environment

      Report | Pipeline         | Build Trigger                | Cluster (API)                         | Kiali Repo Commit                        | Date (from CR)
      #888   | kiali-perf-tests | upstream-istio-pipeline 3432 | ci-rhos-01-04.servicemesh.rhqeaws.com | ff38dbc99ca11113b625992dadfe8d5263312dd9 | 2026-02-27
      #889   | kiali-perf-tests | upstream-istio-pipeline 3434 | ci-rhos-d-03.servicemesh.rhqeaws.com  | 63521c4e5d83c6e99bac615b643c74e832f20350 | 2026-02-28
      #890   | kiali-perf-tests | upstream-istio-pipeline 3436 | ci-rhos-d-02.servicemesh.rhqeaws.com  | 63521c4e5d83c6e99bac615b643c74e832f20350 | 2026-03-01

      • Jenkins job: kiali/test-jobs/kiali-perf-tests
      • Upstream: kiali/main-pipelines/upstream-istio-pipeline (timer-triggered)
      • Jenkinsfile repo: gitlab.cee.redhat.com/istio/servicemesh-qe/jenkins-csb-declaration @ bce7d0a55879aa1c1c4b336c2f889b152be6e176 (Hotfix2 for opentelemetrycollector)
      • Istio: 1.29.0
      • OpenShift: 4.21.1 (#888), 4.21.2 (#889, #890)
      • Kubernetes: 1.34.2
      • Kiali Operator: v2.23.0-SNAPSHOT

      Description

      Across three consecutive performance test runs (#888, #889, #890) on different OpenShift clusters, the pipeline fails at the same point:

      1. Invalid patch request
        When the pipeline patches the Kiali CR to remove spec.api.namespaces, the server rejects the request:
        oc patch kiali kiali -n kiali-operator --type=json '-p=[{"op": "remove", "path": "/spec/api/namespaces"}]'
        The request is invalid: the server rejected our request due to an error in our request
        
      2. Kiali CR never reaches Successful condition
        The pipeline then runs:
        oc wait --for=condition=Successful kiali/kiali --timeout=120s -n kiali-operator
        error: timed out waiting for the condition on kialis/kiali
        

        The Kiali CR therefore does not reach condition=Successful within 120s, and oc describe still shows a condition of Type: Failure.

      3. Downstream stages skipped
        Because of the failure above:
        • Stage "Get env snapshot" is skipped.
        • Stage "Run Performance" is skipped.
      4. Pipeline result
        All three runs end with:
        ERROR: script returned exit code 1
        Finished: FAILURE
        

      The test failure therefore traces back to one or both of the following:

      • The rejected oc patch (remove spec.api.namespaces), and/or
      • Kiali CR reconciliation not reaching Successful (whether due to that patch, operator behavior, or cluster state).
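
      One way to test the first hypothesis is to make the removal conditional. JSON Patch (RFC 6902) specifies that a remove operation fails when its target path does not exist, so a CR that does not carry spec.api.namespaces would reject the patch. The sketch below is an assumption about how the step could be guarded, not the pipeline's actual code: the function name remove_api_namespaces_if_present and the KIALI_NS / KIALI_NAME variables are hypothetical, and only the oc patch invocation mirrors the command quoted above.

```shell
# Hypothetical guard: remove spec.api.namespaces only if the field is set.
# KIALI_NS and KIALI_NAME default to the values used in this report.
KIALI_NS="${KIALI_NS:-kiali-operator}"
KIALI_NAME="${KIALI_NAME:-kiali}"

remove_api_namespaces_if_present() {
  # With -o jsonpath, oc prints an empty string when the field is absent.
  current=$(oc get kiali "$KIALI_NAME" -n "$KIALI_NS" \
    -o jsonpath='{.spec.api.namespaces}') || return 1
  if [ -z "$current" ]; then
    echo "spec.api.namespaces not set; skipping remove"
    return 0
  fi
  # Same JSON patch as in the failing pipeline step.
  oc patch kiali "$KIALI_NAME" -n "$KIALI_NS" --type=json \
    -p='[{"op": "remove", "path": "/spec/api/namespaces"}]'
}
```

      If the guarded call skips the remove on the affected clusters, the rejected patch is most likely a missing path rather than a CRD validation problem.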

      Steps to Reproduce

      1. Trigger the upstream-istio pipeline (or kiali-perf-tests) on a cluster with Kiali Operator and a Kiali CR.
      2. Let the pipeline run through "Install Test Namespaces" until it reaches the step that patches the Kiali CR. That step removes spec.deployment.accessible_namespaces, then removes spec.api.namespaces, then adds spec.deployment.cluster_wide_access.
      3. Observe the patch for spec.api.namespaces failing with "The request is invalid".
      4. Observe oc wait --for=condition=Successful kiali/kiali --timeout=120s -n kiali-operator timing out.
      5. Observe "Get env snapshot" and "Run Performance" skipped and pipeline finishing with FAILURE.
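
      Steps 2 and 3 can be approximated outside Jenkins with the sketch below. It is not taken from the Jenkinsfile: the function names are hypothetical, and the value true for spec.deployment.cluster_wide_access is an assumption, since this report names the field but not the value. Unlike the pipeline, the function keeps going after a failed patch and reports which ones were rejected.

```shell
# Hypothetical reproduction of the patch sequence described in step 2.
KIALI_NS="${KIALI_NS:-kiali-operator}"

# Apply one JSON patch to the Kiali CR and print OK or FAILED for it.
# stderr is left untouched so the server's error message stays visible.
try_patch() {
  if oc patch kiali kiali -n "$KIALI_NS" --type=json -p="$1" >/dev/null; then
    echo "OK      $1"
  else
    echo "FAILED  $1"
    return 1
  fi
}

reproduce_patch_sequence() {
  failures=0
  try_patch '[{"op": "remove", "path": "/spec/deployment/accessible_namespaces"}]' \
    || failures=$((failures + 1))
  try_patch '[{"op": "remove", "path": "/spec/api/namespaces"}]' \
    || failures=$((failures + 1))
  # Assumption: the value "true" is not stated in the report.
  try_patch '[{"op": "add", "path": "/spec/deployment/cluster_wide_access", "value": true}]' \
    || failures=$((failures + 1))
  echo "failed patches: $failures"
  [ "$failures" -eq 0 ]
}
```

      Running reproduce_patch_sequence and then the oc wait command from step 4 against a test cluster should show whether the timeout still occurs when only the spec.api.namespaces patch is rejected.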

      Additional Observations (all three reports)

      • #888: No GitHub Istio version retries; first cluster (ci-rhos-01-04).
      • #889, #890: Multiple "Failed to get the latest Istio version from GitHub" retries (4 and 5 attempts respectively) before success.
      • Recurring (non-fatal) messages in all runs:
        • Error from server (AlreadyExists): routes.route.openshift.io "istio-ingressgateway" already exists
        • Error from server (NotFound): services "bookinfo-gateway-istio" not found
        • Error from server (NotFound): routes.route.openshift.io "bookinfo-gateway-istio" not found
        • Error from server (NotFound): namespaces "sleep" not found (then namespace is created)
      • Kiali CR status in all three runs shows three conditions: Type: Failure (with empty Message and Reason), Type: Successful (with "Last reconciliation succeeded"), and Type: Running. Since oc wait --for=condition=Successful returns only when that condition's status is True, the CR is evidently not reporting a clean Successful state during the wait window.
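
      Because the presence of a Successful condition says nothing about its status value, it helps to print each condition's type together with its status instead of reading oc describe output. A minimal sketch, assuming the CR exposes a status.conditions list with type and status fields; the function names are hypothetical.

```shell
KIALI_NS="${KIALI_NS:-kiali-operator}"

# Print one "Type=Status" line per condition on the Kiali CR.
kiali_conditions() {
  oc get kiali kiali -n "$KIALI_NS" \
    -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}'
}

# Succeed only if the Successful condition currently has status True,
# which is the state that oc wait --for=condition=Successful waits for.
kiali_successful_is_true() {
  kiali_conditions | grep -qx 'Successful=True'
}
```

      Capturing the output of kiali_conditions right after the oc wait timeout would record the exact status values that this report currently lacks.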

      Expected Result

      • The patch that removes spec.api.namespaces either succeeds or is not required; the Kiali CR reaches condition=Successful within the wait timeout.
      • Stages "Get env snapshot" and "Run Performance" run.
      • Pipeline completes with SUCCESS (or fails only on real performance assertions).

      Actual Result

      • Patch for spec.api.namespaces is rejected.
      • oc wait for condition=Successful times out.
      • "Get env snapshot" and "Run Performance" are skipped.
      • Pipeline finishes with FAILURE.

      Labels (suggested)

      • kiali-operator
      • kiali-perf-tests
      • upstream-istio
      • servicemesh-qe

      Attachments / References

      • Full pipeline logs: #888.txt, #889.txt, #890.txt
      • Relevant pipeline step (from #888, similar in #889/#890):
        • Patch steps and oc wait around lines 1162-1174 (#888), 1166-1178 (#889), 1167-1179 (#890)
        • Skipped stages around 3406-3412 (#888), 3410-3416 (#889), 3411-3417 (#890)
        • Final failure: end of each file (e.g. "Finished: FAILURE")
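
      To locate these sections in each attached log without relying on the approximate line ranges, the messages quoted in this report can be searched for directly. A small sketch; the function name scan_perf_log is hypothetical, and the patterns are only the strings quoted above.

```shell
# Print "line-number:text" for the failure markers quoted in this report.
scan_perf_log() {
  grep -n \
    -e 'The request is invalid' \
    -e 'timed out waiting for the condition' \
    -e 'Finished: FAILURE' \
    "$1"
}

# Example (file names as listed under Attachments):
#   for f in '#888.txt' '#889.txt' '#890.txt'; do
#     echo "== $f"; scan_perf_log "$f"
#   done
```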

      Suggested Next Steps

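      1. Operator/API: Confirm whether spec.api.namespaces is optional and why the remove patch is rejected (validation, defaulting, or CRD). A JSON Patch remove operation fails when its target path does not exist (RFC 6902), so also check whether the field is present in the CR at the time of the patch.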
      2. Pipeline: Consider making the "remove spec.api.namespaces" patch conditional or handling the error so the CR can still reach Successful (if cluster_wide_access is the real intent).
      3. Operator: Investigate why the Kiali CR does not report condition=Successful within 120s after these patches (reconciliation logic, condition updates).
      4. Re-run kiali-perf-tests after a fix and confirm that "Get env snapshot" and "Run Performance" execute, so that the pipeline passes or fails on actual performance results.

              hhovsepy@redhat.com Hayk Hovsepyan
              rhn-support-pmarek Pavel Marek