-
Bug
-
Resolution: Done
-
Undefined
-
None
-
4.13.z
-
Important
-
None
-
False
-
-
-
-
Description of problem:
The CVO is stuck in the upgrading state on the network clusteroperator. Multiple partial upgrades are present in the upgrade history:

=================================
history:
- completionTime: null
  image: fr2.icr.io/armada-master/ocp-release:4.13.43-x86_64
  startedTime: "2024-07-04T06:21:36Z"
  state: Partial
  verified: false
  version: 4.13.43
- completionTime: "2024-07-04T06:21:36Z"
  image: fr2.icr.io/armada-master/ocp-release:4.12.58-x86_64
  startedTime: "2024-06-26T16:25:53Z"
  state: Partial
  verified: false
  version: 4.12.58
- completionTime: "2024-06-26T16:25:53Z"
  image: fr2.icr.io/armada-master/ocp-release:4.12.56-x86_64
  startedTime: "2024-06-05T17:06:12Z"
  state: Partial
  verified: false
  version: 4.12.56
- completionTime: "2024-06-05T17:06:12Z"
  image: fr2.icr.io/armada-master/ocp-release:4.12.55-x86_64
  startedTime: "2024-05-01T19:41:10Z"
  state: Partial
  verified: false
  version: 4.12.55
- completionTime: "2024-05-01T19:41:10Z"
  image: fr2.icr.io/armada-master/ocp-release:4.12.51-x86_64
  startedTime: "2024-04-03T17:32:01Z"
  state: Partial
  verified: false
  version: 4.12.51
- completionTime: "2024-04-03T17:32:01Z"
  image: fr2.icr.io/armada-master/ocp-release:4.12.49-x86_64
  startedTime: "2024-03-06T19:11:42Z"
  state: Partial
  verified: false
  version: 4.12.49
- completionTime: "2024-03-06T19:11:42Z"
  image: fr2.icr.io/armada-master/ocp-release:4.12.46-x86_64
  startedTime: "2024-02-07T19:11:28Z"
  state: Partial
  verified: false
  version: 4.12.46
- completionTime: "2024-02-07T19:11:28Z"
  image: fr2.icr.io/armada-master/ocp-release:4.12.44-x86_64
  startedTime: "2023-12-13T19:06:13Z"
  state: Partial
  verified: false
  version: 4.12.44
- completionTime: "2023-11-16T14:33:09Z"
  image: fr2.icr.io/armada-master/ocp-release:4.12.40-x86_64
  startedTime: "2023-11-16T13:57:39Z"
  state: Completed
  verified: false
  version: 4.12.40
=================================

All cluster operators are reporting Available (even network and dns).
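The history above comes from the ClusterVersion status; it can be dumped as a sketch like the following (assuming the default resource name "version"):

$ oc get clusterversion version -o yaml
$ oc get clusterversion version -o jsonpath='{range .status.history[*]}{.version}{"\t"}{.state}{"\t"}{.startedTime}{"\n"}{end}'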
Version-Release number of selected component (if applicable):
ROKS 4.13.43
How reproducible:
Could not reproduce on RHOCP. The upgrade was triggered from RHOCP 4.12.40 to 4.12.44, and before the CVO got to upgrading the network operator, an upgrade to the next version was triggered. This step was repeated until there were 7 partial upgrades in the history. After the final upgrade to a 4.13.z version was triggered, the upgrade completed successfully; the network, dns, and machine-config operators were upgraded as well.
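A rough sketch of the retrigger loop used in the reproduction attempt (version numbers follow the history above as an example; the exact flags needed to retrigger while the previous update is still Partial may differ per oc version, check "oc adm upgrade --help"):

$ oc adm upgrade --to=4.12.44
$ oc adm upgrade                       # watch status; act before the CVO reaches the network operator
$ oc adm upgrade --to=4.12.46 --allow-upgrade-with-warnings
...repeat until ~7 Partial history entries exist, then:
$ oc adm upgrade --to=4.13.43 --allow-upgrade-with-warnings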
Steps to Reproduce:
1. NA
Actual results:
The upgrade is not proceeding past the network operator. The image used by the network operator is already from the 4.13.43 release, which means the operator is running the new image, but the CVO is not showing the updated version in "oc get co":

====================
NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
console                                    4.13.43   True        False         False      55d
csi-snapshot-controller                    4.13.43   True        False         False      238d
dns                                        4.12.40   True        False         False      238d
image-registry                             4.13.43   True        False         False      133d
ingress                                    4.13.43   True        False         False      7d3h
insights                                   4.13.43   True        False         False      98d
kube-apiserver                             4.13.43   True        False         False      238d
kube-controller-manager                    4.13.43   True        False         False      238d
kube-scheduler                             4.13.43   True        False         False      238d
kube-storage-version-migrator              4.13.43   True        False         False      7d5h
marketplace                                4.13.43   True        False         False      238d
monitoring                                 4.13.43   True        False         False      158d
network                                    4.12.40   True        False         False      238d
node-tuning                                4.13.43   True        False         False      7d4h
openshift-apiserver                        4.13.43   True        False         False      238d
openshift-controller-manager               4.13.43   True        False         False      238d
openshift-samples                          4.13.43   True        False         False      7d7h
operator-lifecycle-manager                 4.13.43   True        False         False      238d
operator-lifecycle-manager-catalog         4.13.43   True        False         False      238d
operator-lifecycle-manager-packageserver   4.13.43   True        False         False      7d4h
service-ca                                 4.13.43   True        False         False      238d
storage                                    4.13.43   True        False         False      238d
====================
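One way to compare the image the network operator is actually running against what it has reported back to the CVO (namespace and deployment names are the standard OCP ones; sketch only):

$ oc -n openshift-network-operator get deployment network-operator -o jsonpath='{.spec.template.spec.containers[0].image}'
$ oc get clusteroperator network -o jsonpath='{.status.versions}'

The first command shows the image the operator deployment is running; the second shows the versions the operator has reported, which is what appears in the "oc get co" VERSION column.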
Expected results:
The upgrade should complete successfully, with all cluster operators reporting the target version.
Additional info:
Captured goroutine stacks using the commands below.

In Terminal 1, run the following and leave it running, do not exit:
$ oc logs <network-operator-pod> -f

In Terminal 2, run:
$ oc exec -it <network-operator-pod> -- bash
$ kill -s QUIT 1    # sends SIGQUIT to PID 1; the Go runtime dumps all goroutine stacks, then the container exits and the exec session ends automatically
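If the "oc logs -f" session in Terminal 1 is interrupted before the dump is captured, the goroutine stacks should still be retrievable from the previous container instance after it restarts (pod name placeholder as above):

$ oc logs <network-operator-pod> --previous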