OpenShift Bugs / OCPBUGS-10025

[alibabacloud] IPI on alibabacloud with realtime kernel failed due to cluster operator "machine-config" degraded

Details

    • Critical
    • No
    • Rejected
    • False

    Description

      Description of problem:

      IPI installation on Alibaba Cloud with the realtime kernel enabled failed.

      Version-Release number of selected component (if applicable):

      4.13.0-0.nightly-2023-03-11-033820

      How reproducible:

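      Always

      Steps to Reproduce:

      1. Run "openshift-install create install-config", then insert "credentialsMode: Manual" into install-config.yaml.
      2. Run "openshift-install create manifests", then create the manifest files to enable the RT kernel.
      3. Create the required credentials manually.
      4. Run "openshift-install create cluster".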
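
      The report does not include the manifest files used in step 2. As a minimal sketch, assuming they were MachineConfig objects that set `kernelType: realtime` for both the master and worker roles, they could be generated as follows. The `INSTALL_DIR` variable and the `99-<role>-realtime` names are hypothetical and are not taken from the failing job.

```shell
# Hypothetical installation directory; use the one passed to openshift-install --dir.
INSTALL_DIR="${INSTALL_DIR:-./install-dir}"
mkdir -p "${INSTALL_DIR}/openshift"

# Write one MachineConfig per role that requests the realtime kernel.
# Assumption: the reporter enabled the RT kernel for both pools, since both
# the master and worker pools report the kernel-rt error.
for role in master worker; do
  cat > "${INSTALL_DIR}/openshift/99-${role}-realtime.yaml" <<EOF
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: ${role}
  name: 99-${role}-realtime
spec:
  kernelType: realtime
EOF
done
```

      Run this between "create manifests" and "create cluster" so the installer picks the files up from the openshift/ subdirectory.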

      Actual results:

      The installation failed with the cluster operator "machine-config" degraded.

      Expected results:

      The installation should succeed.

      Additional info:

      FYI the QE flexy-install job: https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/ocp-common/job/Flexy-install/185177/
      
      $ oc get clusterversion
      NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
      version             False       False         22m     Error while reconciling 4.13.0-0.nightly-2023-03-11-033820: the cluster operator machine-config is degraded
      $ oc get nodes
      NAME                                        STATUS   ROLES                  AGE   VERSION
      jiwei-0313a-zcmqc-master-0                  Ready    control-plane,master   63m   v1.26.2+bc894ae
      jiwei-0313a-zcmqc-master-1                  Ready    control-plane,master   63m   v1.26.2+bc894ae
      jiwei-0313a-zcmqc-master-2                  Ready    control-plane,master   63m   v1.26.2+bc894ae
      jiwei-0313a-zcmqc-worker-us-east-1a-95gvm   Ready    worker                 37m   v1.26.2+bc894ae
      jiwei-0313a-zcmqc-worker-us-east-1a-hsb9s   Ready    worker                 36m   v1.26.2+bc894ae
      jiwei-0313a-zcmqc-worker-us-east-1b-tkgc2   Ready    worker                 35m   v1.26.2+bc894ae
      $ oc get co machine-config
      NAME             VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
      machine-config   4.13.0-0.nightly-2023-03-11-033820   True        False         True       50m     Failed to resync 4.13.0-0.nightly-2023-03-11-033820 because: error during syncRequiredMachineConfigPools: [timed out waiting for the condition, error pool master is not ready, retrying. Status: (pool degraded: true total: 3, ready 0, updated: 0, unavailable: 0)]
      $ oc get mcp
      NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
      master   rendered-master-376fbf7e7ee4581bf68bc0a2686538ed   False     True       True       3              0                   0                     1                      59m
      worker   rendered-worker-67c5d5a1689043d5419056c2ec3a83b3   False     True       True       3              0                   0                     1                      59m
      $ oc describe co machine-config
      Name:         machine-config
      Namespace:
      Labels:       <none>
      Annotations:  exclude.release.openshift.io/internal-openshift-hosted: true
                    include.release.openshift.io/self-managed-high-availability: true
                    include.release.openshift.io/single-node-developer: true
      API Version:  config.openshift.io/v1
      Kind:         ClusterOperator
      ...output omitted...
      Status:
        Conditions:
          Last Transition Time:  2023-03-13T09:10:36Z
          Message:               Cluster version is 4.13.0-0.nightly-2023-03-11-033820
          Status:                False
          Type:                  Progressing
          Last Transition Time:  2023-03-13T09:27:33Z
          Message:               Failed to resync 4.13.0-0.nightly-2023-03-11-033820 because: error during syncRequiredMachineConfigPools: [timed out waiting for the condition, error pool master is not ready, retrying. Status: (pool degraded: true total: 3, ready 0, updated: 0, unavailable: 0)]
          Reason:                RequiredPoolsFailed
          Status:                True 
          Type:                  Degraded
          Last Transition Time:  2023-03-13T09:10:35Z
          Message:               Cluster has deployed [{operator 4.13.0-0.nightly-2023-03-11-033820}]
          Reason:                AsExpected
          Status:                True 
          Type:                  Available
          Last Transition Time:  2023-03-13T09:17:38Z
          Message:               One or more machine config pools are degraded, please see `oc get mcp` for further details and resolve before upgrading
          Reason:                DegradedPool
          Status:                False
          Type:                  Upgradeable
        Extension:
          Master:  pool is degraded because nodes fail with "1 nodes are reporting degraded status on sync": "Node jiwei-0313a-zcmqc-master-0 is reporting: \"error running rpm-ostree override remove kernel kernel-core kernel-modules kernel-modules-extra --install kernel-rt-core --install kernel-rt-modules --install kernel-rt-modules-extra --install kernel-rt-kvm: \\x1b[0m\\x1b[31merror: \\x1b[0mPackage/capability 'kernel-rt-core' is already requested\\n: exit status 1\""
          Worker:  pool is degraded because nodes fail with "1 nodes are reporting degraded status on sync": "Node jiwei-0313a-zcmqc-worker-us-east-1a-95gvm is reporting: \"error running rpm-ostree override remove kernel kernel-core kernel-modules kernel-modules-extra --install kernel-rt-core --install kernel-rt-modules --install kernel-rt-modules-extra --install kernel-rt-kvm: \\x1b[0m\\x1b[31merror: \\x1b[0mPackage/capability 'kernel-rt-core' is already requested\\n: exit status 1\""
      ...output omitted...
      $ 
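
      For triage, the degraded pools can be pulled out of the `oc get mcp` table shown above with a small filter. This is only a sketch: the `degraded_pools` helper is a hypothetical name, and it assumes the default column order in which DEGRADED is the fifth column.

```shell
# Print the names of machine config pools whose DEGRADED column is True.
# Reads the table output of `oc get mcp` on stdin and skips the header row.
# Assumption: default column layout (NAME CONFIG UPDATED UPDATING DEGRADED ...).
degraded_pools() {
  awk 'NR > 1 && $5 == "True" { print $1 }'
}

# Example with live cluster access (not run here):
#   oc get mcp | degraded_pools
```

      Against the output in this report, the filter would list both master and worker. The nodes named in the Extension messages can then be inspected with `oc debug node/<node> -- chroot /host rpm-ostree status` to see which kernel packages are already layered or requested.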
      

          People

            bteng@redhat.com Bo Teng
            rhn-support-jiwei Jianli Wei
            Jianli Wei Jianli Wei