
Panic in cluster-version-operator when duplicate entries in .status.history


    • Bug
    • Resolution: Not a Bug
    • 4.16, 4.17, 4.18, 4.19
    • Quality / Stability / Reliability
    • Moderate

      Description of problem:

      When a cluster is rolled back, or forced to reapply a version, this may create a duplicate entry in the clusterversion history, and the cluster-version-operator will then crash-loop with the panic message:

      [quickcluster@upi-0 ~]$ oc logs -n openshift-cluster-version $(oc get pod -n openshift-cluster-version -o name) -p | grep -E "[EFW][0-9]{4}" | grep "tried to update cluster version history to contain duplicate image entries"
      E0908 05:58:12.415100       1 runtime.go:79] Observed a panic: &errors.errorString{s:"tried to update cluster version history to contain duplicate image entries: [\n  {\n    \"state\": \"Partial\",\n    \"startedTime\": \"2025-09-07T18:38:17Z\",\n    \"completionTime\": \"2025-09-07T19:17:39Z\",\n    \"version\": \"4.17.35\",\n    \"image\": \"quay.io/openshift-release-dev/ocp-release@sha256:20bf36ab093f1da58dc9662f6cd132803babe641b7471553d2cd6a929bdfc946\",\n    \"verified\": false\n  },\n  {\n    \"state\": \"Completed\",\n    \"startedTime\": \"2025-09-06T18:38:17Z\",\n    \"completionTime\": \"2025-09-06T19:17:39Z\",\n    \"version\": \"4.17.35\",\n    \"image\": \"quay.io/openshift-release-dev/ocp-release@sha256:20bf36ab093f1da58dc9662f6cd132803babe641b7471553d2cd6a929bdfc946\",\n    \"verified\": false\n  }\n]"} (tried to update cluster version history to contain duplicate image entries: [
      E0908 05:58:12.415166       1 runtime.go:79] Observed a panic: &errors.errorString{s:"tried to update cluster version history to contain duplicate image entries: [\n  {\n    \"state\": \"Partial\",\n    \"startedTime\": \"2025-09-07T18:38:17Z\",\n    \"completionTime\": \"2025-09-07T19:17:39Z\",\n    \"version\": \"4.17.35\",\n    \"image\": \"quay.io/openshift-release-dev/ocp-release@sha256:20bf36ab093f1da58dc9662f6cd132803babe641b7471553d2cd6a929bdfc946\",\n    \"verified\": false\n  },\n  {\n    \"state\": \"Completed\",\n    \"startedTime\": \"2025-09-06T18:38:17Z\",\n    \"completionTime\": \"2025-09-06T19:17:39Z\",\n    \"version\": \"4.17.35\",\n    \"image\": \"quay.io/openshift-release-dev/ocp-release@sha256:20bf36ab093f1da58dc9662f6cd132803babe641b7471553d2cd6a929bdfc946\",\n    \"verified\": false\n  }\n]"} (tried to update cluster version history to contain duplicate image entries: [
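
The history entries in the panic line above are hard to read because they are embedded as an escaped string. A minimal sketch for pulling them back out follows. It assumes the log line keeps the `errorString{s:"..."}` shape shown above and that the escaped payload only uses escapes that are also valid in JSON strings; the function name `history_from_panic_line` is hypothetical and not part of any OpenShift tooling.

```python
import json
import re

# Matches the quoted payload of errors.errorString{s:"..."}, allowing
# backslash escapes such as \n and \" inside the string.
_PAYLOAD = re.compile(r'errorString\{s:"((?:[^"\\]|\\.)*)"\}')


def history_from_panic_line(line: str) -> list:
    """Return the history entries embedded in a panic log line.

    Assumption: the payload is the message text followed by
    "entries: " and a JSON array, as in the log lines above.
    """
    match = _PAYLOAD.search(line)
    if match is None:
        raise ValueError("no errorString payload found in line")
    # Undo the string escaping by treating the payload as a JSON string.
    message = json.loads('"' + match.group(1) + '"')
    _, separator, array_text = message.partition("entries: ")
    if not separator:
        raise ValueError("payload does not contain an 'entries: ' section")
    return json.loads(array_text)
```

Feeding one of the log lines above to this function should yield a list of two dictionaries whose "image" values are identical, which is the condition the message complains about.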
      

      Version-Release number of selected component (if applicable):

      This was initially found in 4.16, but I was able to reproduce it in 4.19.

      How reproducible:

      This is easily reproducible by duplicating one of the .status.history entries.

      • From:
        [quickcluster@upi-0 ~]$ oc get clusterversion version -o json | jq -r '.status.history[]'
        {
          "completionTime": "2025-09-05T02:24:31Z",
          "image": "quay.io/openshift-release-dev/ocp-release@sha256:bd4cd954feebfe3a6b2847c20271e8f3ba21e99ac1e234db6ce4cf2207f8955a",
          "startedTime": "2025-09-05T01:50:54Z",
          "state": "Completed",
          "verified": false,
          "version": "4.19.7"
        }
        
      • To:
        [quickcluster@upi-0 ~]$ jq -r '.status.history' clusterversion.json
        [
          {
            "completionTime": "2025-09-07T02:24:31Z",
            "image": "quay.io/openshift-release-dev/ocp-release@sha256:bd4cd954feebfe3a6b2847c20271e8f3ba21e99ac1e234db6ce4cf2207f8955a",
            "startedTime": "2025-09-07T01:50:54Z",
            "state": "Completed",
            "verified": false,
            "version": "4.19.7"
          },
          {
            "completionTime": "2025-09-05T02:24:31Z",
            "image": "quay.io/openshift-release-dev/ocp-release@sha256:bd4cd954feebfe3a6b2847c20271e8f3ba21e99ac1e234db6ce4cf2207f8955a",
            "startedTime": "2025-09-05T01:50:54Z",
            "state": "Completed",
            "verified": false,
            "version": "4.19.7"
          }
        ]
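
The From/To change above can be produced by hand, or with a small script. The sketch below is one hypothetical way to do it: it copies the newest history entry, shifts its timestamps by two days (matching the example above), and puts the copy first. The helper names `shift_timestamp` and `prepend_duplicate` are illustrative, and the timestamp format is assumed to be the `YYYY-MM-DDTHH:MM:SSZ` form shown in the output.

```python
import copy
from datetime import datetime, timedelta

TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # assumed from the history output above


def shift_timestamp(value: str, days: int) -> str:
    """Move a history timestamp forward by the given number of days."""
    parsed = datetime.strptime(value, TIME_FORMAT)
    return (parsed + timedelta(days=days)).strftime(TIME_FORMAT)


def prepend_duplicate(clusterversion: dict, days: int = 2) -> dict:
    """Return a copy of the object with history[0] duplicated in front.

    The input dictionary is left untouched so the saved original can
    still be used to restore the cluster afterwards.
    """
    result = copy.deepcopy(clusterversion)
    history = result["status"]["history"]
    duplicate = copy.deepcopy(history[0])
    for key in ("startedTime", "completionTime"):
        # completionTime may be absent or null on an in-progress entry.
        if duplicate.get(key):
            duplicate[key] = shift_timestamp(duplicate[key], days)
    history.insert(0, duplicate)
    return result
```

A typical use would be to load clusterversion.json with `json.load`, pass the result through `prepend_duplicate`, and write it back with `json.dump` before applying it to a disposable test cluster.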
        

      Steps to Reproduce:

      1. Save the current clusterversion config:

      $ oc get clusterversion version -o json > clusterversion.json
      

      2. Update the file with a duplicate entry in .status.history.
      3. Apply the file to the status subresource:

      [quickcluster@upi-0 ~]$ oc replace --subresource status -f clusterversion.json
      clusterversion.config.openshift.io/version replaced
      [quickcluster@upi-0 ~]$ oc get clusterversion version -o json | jq -r '.status.history'
      [
        {
          "completionTime": "2025-09-07T02:24:31Z",
          "image": "quay.io/openshift-release-dev/ocp-release@sha256:bd4cd954feebfe3a6b2847c20271e8f3ba21e99ac1e234db6ce4cf2207f8955a",
          "startedTime": "2025-09-07T01:50:54Z",
          "state": "Partial",
          "verified": false,
          "version": "4.19.7"
        },
        {
          "completionTime": "2025-09-05T02:24:31Z",
          "image": "quay.io/openshift-release-dev/ocp-release@sha256:bd4cd954feebfe3a6b2847c20271e8f3ba21e99ac1e234db6ce4cf2207f8955a",
          "startedTime": "2025-09-05T01:50:54Z",
          "state": "Completed",
          "verified": false,
          "version": "4.19.7"
        }
      ]
      

      4. Wait for the pod to crash-loop:

      [quickcluster@upi-0 ~]$ oc get pod -n openshift-cluster-version -w
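
Before or after running the steps above, it can help to check whether a saved clusterversion.json already contains the duplicated image. The following is a minimal sketch of such a check, not the operator's own logic: it assumes that any image appearing more than once in .status.history is worth flagging, while the actual condition in the cluster-version-operator may be narrower. The function name `duplicate_images` is hypothetical.

```python
from collections import defaultdict


def duplicate_images(history: list) -> dict:
    """Map each repeated image to the list of history indexes using it.

    Assumption: entries are dictionaries with an "image" key, as in the
    `oc get clusterversion version -o json` output shown above.
    """
    positions = defaultdict(list)
    for index, entry in enumerate(history):
        positions[entry.get("image")].append(index)
    # Keep only images that occur in more than one entry.
    return {image: indexes for image, indexes in positions.items() if len(indexes) > 1}
```

An empty result means no image is repeated; a non-empty result lists which entries share an image and would need to be reviewed.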
      

      Actual results:

      The pod crashes with a panic.

      Expected results:

      The pod stops with a clear status message rather than a panic.

      Additional info:

      • The behavior depends on the version in the duplicated entries; it was reproduced with different state values.
      • I have created the KCS solution 7130751 to describe the issue and provide a resolution.
        The KCS solution is currently private, pending your thoughts on whether it should be public, since it also explains to customers how to remove any unsupported update from the clusterversion history.

              Unassigned
              rhn-support-vlours Vincent Lours
              Jia Liu