Type: Bug
Resolution: Cannot Reproduce
Priority: Critical
Fix Version/s: None
Affects Version/s: 4.13.0
Component/s: Installer / Alibaba Cloud
Labels:
- pre-merge

Severity: Critical
Regression: No
Release Blocker: Rejected
Blocked: False
Blocked Reason: None
SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Description of problem:

Installation fails when the realtime kernel is enabled.

Version-Release number of selected component (if applicable):

4.13.0-0.nightly-2023-03-11-033820

How reproducible:

Always

Steps to Reproduce:

1. Run "create install-config", then add "credentialsMode: Manual" to the generated install-config.
2. Run "create manifests", then create the manifest files that enable the RT kernel.
3. Create the required credentials manually.
4. Run "create cluster".
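
The report does not include the manifests used in step 2. As a rough sketch of what they might look like, the script below writes one MachineConfig per role with "kernelType: realtime". The installation directory, the file names, the object names, and the choice to cover both the master and worker roles are assumptions for illustration, not taken from the reporter's setup.

```shell
#!/bin/sh
set -eu

# Hypothetical installation directory: point this at the directory used for
# "create install-config" and "create manifests".
INSTALL_DIR="${INSTALL_DIR:-./install-dir}"
mkdir -p "${INSTALL_DIR}/openshift"

# Write one MachineConfig per role. File and object names are illustrative.
for role in master worker; do
  cat > "${INSTALL_DIR}/openshift/99-${role}-realtime.yaml" <<EOF
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: ${role}
  name: 99-${role}-realtime
spec:
  kernelType: realtime
EOF
done
```

Running it after "create manifests" and before "create cluster" would place the files alongside the generated manifests, assuming the installer picks up extra files from the openshift/ subdirectory.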

Actual results:

The installation failed with the cluster operator "machine-config" degraded.

Expected results:

The installation should succeed.

Additional info:

For reference, the QE Flexy-install job: https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/ocp-common/job/Flexy-install/185177/

$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version             False       False         22m     Error while reconciling 4.13.0-0.nightly-2023-03-11-033820: the cluster operator machine-config is degraded
$ oc get nodes
NAME                                        STATUS   ROLES                  AGE   VERSION
jiwei-0313a-zcmqc-master-0                  Ready    control-plane,master   63m   v1.26.2+bc894ae
jiwei-0313a-zcmqc-master-1                  Ready    control-plane,master   63m   v1.26.2+bc894ae
jiwei-0313a-zcmqc-master-2                  Ready    control-plane,master   63m   v1.26.2+bc894ae
jiwei-0313a-zcmqc-worker-us-east-1a-95gvm   Ready    worker                 37m   v1.26.2+bc894ae
jiwei-0313a-zcmqc-worker-us-east-1a-hsb9s   Ready    worker                 36m   v1.26.2+bc894ae
jiwei-0313a-zcmqc-worker-us-east-1b-tkgc2   Ready    worker                 35m   v1.26.2+bc894ae
$ oc get co machine-config
NAME             VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
machine-config   4.13.0-0.nightly-2023-03-11-033820   True        False         True       50m     Failed to resync 4.13.0-0.nightly-2023-03-11-033820 because: error during syncRequiredMachineConfigPools: [timed out waiting for the condition, error pool master is not ready, retrying. Status: (pool degraded: true total: 3, ready 0, updated: 0, unavailable: 0)]
$ oc get mcp
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-376fbf7e7ee4581bf68bc0a2686538ed   False     True       True       3              0                   0                     1                      59m
worker   rendered-worker-67c5d5a1689043d5419056c2ec3a83b3   False     True       True       3              0                   0                     1                      59m
$ oc describe co machine-config
Name:         machine-config
Namespace:
Labels:       <none>
Annotations:  exclude.release.openshift.io/internal-openshift-hosted: true
              include.release.openshift.io/self-managed-high-availability: true
              include.release.openshift.io/single-node-developer: true
API Version:  config.openshift.io/v1
Kind:         ClusterOperator
...output omitted...
Status:
  Conditions:
    Last Transition Time:  2023-03-13T09:10:36Z
    Message:               Cluster version is 4.13.0-0.nightly-2023-03-11-033820
    Status:                False
    Type:                  Progressing
    Last Transition Time:  2023-03-13T09:27:33Z
    Message:               Failed to resync 4.13.0-0.nightly-2023-03-11-033820 because: error during syncRequiredMachineConfigPools: [timed out waiting for the condition, error pool master is not ready, retrying. Status: (pool degraded: true total: 3, ready 0, updated: 0, unavailable: 0)]
    Reason:                RequiredPoolsFailed
    Status:                True 
    Type:                  Degraded
    Last Transition Time:  2023-03-13T09:10:35Z
    Message:               Cluster has deployed [{operator 4.13.0-0.nightly-2023-03-11-033820}]
    Reason:                AsExpected
    Status:                True 
    Type:                  Available
    Last Transition Time:  2023-03-13T09:17:38Z
    Message:               One or more machine config pools are degraded, please see `oc get mcp` for further details and resolve before upgrading
    Reason:                DegradedPool
    Status:                False
    Type:                  Upgradeable
  Extension:
    Master:  pool is degraded because nodes fail with "1 nodes are reporting degraded status on sync": "Node jiwei-0313a-zcmqc-master-0 is reporting: \"error running rpm-ostree override remove kernel kernel-core kernel-modules kernel-modules-extra --install kernel-rt-core --install kernel-rt-modules --install kernel-rt-modules-extra --install kernel-rt-kvm: \\x1b[0m\\x1b[31merror: \\x1b[0mPackage/capability 'kernel-rt-core' is already requested\\n: exit status 1\""
    Worker:  pool is degraded because nodes fail with "1 nodes are reporting degraded status on sync": "Node jiwei-0313a-zcmqc-worker-us-east-1a-95gvm is reporting: \"error running rpm-ostree override remove kernel kernel-core kernel-modules kernel-modules-extra --install kernel-rt-core --install kernel-rt-modules --install kernel-rt-modules-extra --install kernel-rt-kvm: \\x1b[0m\\x1b[31merror: \\x1b[0mPackage/capability 'kernel-rt-core' is already requested\\n: exit status 1\""
...output omitted...
$
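
When triaging output like the above, it can help to pull out just the degraded pools. The helper below is a minimal sketch that assumes the default table layout of "oc get mcp" shown in this report, with DEGRADED as the fifth column. The function name degraded_pools is made up for this example.

```shell
# Print the names of MachineConfigPools whose DEGRADED column is True.
# Reads the default table output of "oc get mcp" on stdin and skips the header.
degraded_pools() {
  awk 'NR > 1 && $5 == "True" { print $1 }'
}

# Usage against a live cluster (requires a logged-in oc client):
#   oc get mcp | degraded_pools
```

For the pool listing in this report, the helper would print both master and worker, since each shows True in the DEGRADED column.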

Assignee: Bo Teng (Inactive)

Reporter: Jianli Wei

QA Contact: Jianli Wei

Votes: 0

Watchers: 4

Created: 2023/03/13 10:56 AM

Updated: 2023/09/14 10:01 AM

Resolved: 2023/04/07 6:29 AM
