- Bug
- Resolution: Done-Errata
- Major
- None
- 4.17.0, 4.16.z
Description of problem:
The NROP operator upgrade gets stuck: the new CSV stays in the Pending phase and never replaces the old one.
Version-Release number of selected component (if applicable):
Reproducible on 4.16 and 4.17
How reproducible:
Every time
Steps to Reproduce:
1. Apply a kubeletconfig and wait for MCP update
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: cnf-worker-tuning
spec:
  machineConfigPoolSelector:
    matchLabels:
      machineconfiguration.openshift.io/role: worker-cnf
  kubeletConfig:
    cpuManagerPolicy: "static"
    cpuManagerReconcilePeriod: "5s"
    reservedSystemCPUs: "0,1"
    memoryManagerPolicy: "Static"
    evictionHard:
      memory.available: "100Mi"
    kubeReserved:
      memory: "512Mi"
    reservedMemory:
      - numaNode: 0
        limits:
          memory: "1124Mi"
    systemReserved:
      memory: "512Mi"
    topologyManagerPolicy: "single-numa-node"
    topologyManagerScope: "container"
2. Install NROP operator from production
a. Expect to see the controller pod + csv
NAME                                                    READY   STATUS    RESTARTS   AGE
pod/numaresources-controller-manager-6fbd7776b5-fls6s   1/1     Running   0          22m

NAME                                                                        DISPLAY                  VERSION   REPLACES                         PHASE
clusterserviceversion.operators.coreos.com/numaresources-operator.v4.16.2   numaresources-operator   4.16.2    numaresources-operator.v4.16.1   Succeeded
3. Apply the NROP CR and expect to see RTE pods
NAME                                                    READY   STATUS    RESTARTS   AGE
pod/numaresources-controller-manager-6fbd7776b5-fls6s   1/1     Running   0          22m
pod/numaresourcesoperator-worker-cnf-59szj              2/2     Running   0          12m
pod/numaresourcesoperator-worker-cnf-dgqv9              2/2     Running   0          12m
4. Upgrade the operator by creating a new CatalogSource that points to the image containing the desired upgrade
Catalogsource.yaml
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: nrop-iib-operator-catalog
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: registry-proxy.engineering.redhat.com/rh-osbs/iib:823225
  displayName: nrop iib-operator-catalog Catalog
  publisher: grpc
5. Edit the current Subscription to use the new CatalogSource
oc edit sub/<sub_name> -n openshift-numaresources
Under spec, change the channel to the new channel if needed, and set the source to the new CatalogSource.
6. Wait for the CSV to flip from the older version to the new version, like this:
NAME                                                                        DISPLAY                  VERSION   REPLACES                         PHASE
clusterserviceversion.operators.coreos.com/numaresources-operator.v4.16.3   numaresources-operator   4.16.3    numaresources-operator.v4.16.2   Succeeded
- In this example, the upgrade was from 4.16.2-8 to 4.16.3-3.
7. Make sure the controller pod is using the new build and the RTE pods use the new image
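Steps 5 to 7 can also be done non-interactively. The following is a minimal sketch, not part of the original report: the function names (build_sub_patch, wait_for_csv, list_pod_images) are hypothetical, the retry count and delay are arbitrary defaults, and it assumes the Subscription uses the standard spec.channel, spec.source and spec.sourceNamespace fields. The <sub_name> and <channel> placeholders are left unfilled, as in the steps above.

```shell
# Hypothetical helpers for steps 5-7; names and defaults are illustrative only.

# Step 5: print a JSON merge patch that repoints a Subscription.
# $1 = channel, $2 = CatalogSource name, $3 = CatalogSource namespace
build_sub_patch() {
  printf '{"spec":{"channel":"%s","source":"%s","sourceNamespace":"%s"}}' "$1" "$2" "$3"
}

# Step 6: poll the CSV until it reports the Succeeded phase.
# $1 = CSV name, $2 = namespace,
# $3 = max checks (default 60), $4 = seconds between checks (default 10)
wait_for_csv() {
  csv="$1"; ns="$2"; tries="${3:-60}"; delay="${4:-10}"
  i=0
  phase=""
  while [ "$i" -lt "$tries" ]; do
    # A failed lookup (for example, CSV not created yet) counts as "no phase".
    phase="$(oc get csv "$csv" -n "$ns" -o jsonpath='{.status.phase}' 2>/dev/null)" || phase=""
    if [ "$phase" = "Succeeded" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "CSV $csv is still in phase '${phase:-unknown}' after $tries checks" >&2
  return 1
}

# Step 7: list each pod in a namespace with the images its containers run.
# $1 = namespace
list_pod_images() {
  oc get pods -n "$1" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].image}{"\n"}{end}'
}

# Example usage against a live cluster:
#   oc patch sub/<sub_name> -n openshift-numaresources --type merge \
#     -p "$(build_sub_patch <channel> nrop-iib-operator-catalog openshift-marketplace)"
#   wait_for_csv numaresources-operator.v4.16.3 openshift-numaresources
#   list_pod_images openshift-numaresources
```

With this bug present, wait_for_csv would be expected to time out and return non-zero, because the new CSV never leaves Pending.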
Actual results:
NAME                                                                        DISPLAY                  VERSION   REPLACES                         PHASE
clusterserviceversion.operators.coreos.com/numaresources-operator.v4.16.2   numaresources-operator   4.16.2    numaresources-operator.v4.16.1   Replacing
clusterserviceversion.operators.coreos.com/numaresources-operator.v4.16.3   numaresources-operator   4.16.3    numaresources-operator.v4.16.2   Pending
Expected results:
NAME                                                                        DISPLAY                  VERSION   REPLACES                         PHASE
clusterserviceversion.operators.coreos.com/numaresources-operator.v4.16.3   numaresources-operator   4.16.3    numaresources-operator.v4.16.2   Succeeded
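The difference between the actual and expected results can be checked mechanically by filtering a CSV listing for rows whose PHASE column is not Succeeded. This is a rough sketch, not something from the original report: stuck_csvs is a hypothetical helper name, and it assumes PHASE is the last whitespace-separated column, as in the listings above.

```shell
# Hypothetical helper: read an `oc get csv` style listing on stdin and print
# "<name> <phase>" for every row whose last column (PHASE) is not Succeeded.
# The header row (first field "NAME") and blank lines are skipped.
stuck_csvs() {
  awk '$1 != "NAME" && NF >= 2 && $NF != "Succeeded" { print $1 " " $NF }'
}

# Example usage against a live cluster:
#   oc get csv -n openshift-numaresources | stuck_csvs
```

For the actual results in this report, the filter would print both the 4.16.2 CSV (Replacing) and the 4.16.3 CSV (Pending); for the expected results it would print nothing.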
Additional info:
- is cloned by: OCPBUGS-43870 [4.16] NROP operator upgrade gets stuck on pending when trying to upgrade (Closed)
- links to: RHBA-2024:140204 OpenShift Container Platform 4.17.3 low-latency extras update