OpenShift Bugs / OCPBUGS-8667

[gcp] with Confidential Computing enabled, IPI installation failed with "monitoring" degraded due to NoPodReady


Details

    • Bug
    • Resolution: Duplicate
    • Critical
    • 4.13.0
    • Important

    Description

      Description of problem:

      With Confidential Computing enabled, IPI installation fails with the "monitoring" cluster operator degraded due to NoPodReady.

      Version-Release number of selected component (if applicable):

      4.13.0-0.nightly-2023-03-07-131556

      How reproducible:

      Always

      Steps to Reproduce:

      1. Run "create install-config".
      2. Edit "install-config.yaml" to insert the Confidential Computing settings, for example:
      
      $ yq-3.3.0 r test3/install-config.yaml platform
      gcp:
        projectID: openshift-qe
        region: us-central1
        defaultMachinePlatform:
          confidentialCompute: Enabled
          onHostMaintenance: Terminate
          type: n2d-standard-4
      $ 
      
      3. Run "create cluster".
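Step 2 can be sketched as a small script (the `test3` directory and file paths are illustrative; the platform keys mirror the yq output above):

```shell
#!/bin/sh
# Sketch of step 2: write a GCP platform section for install-config.yaml
# with Confidential Computing enabled. The "test3" path is illustrative.
set -e
mkdir -p test3
cat > test3/install-config.yaml <<'EOF'
platform:
  gcp:
    projectID: openshift-qe
    region: us-central1
    defaultMachinePlatform:
      confidentialCompute: Enabled
      onHostMaintenance: Terminate
      type: n2d-standard-4
EOF
# Confidential VMs on GCP do not support live migration, so
# onHostMaintenance must be Terminate; check both keys landed
# before running "create cluster":
grep -q 'confidentialCompute: Enabled' test3/install-config.yaml
grep -q 'onHostMaintenance: Terminate' test3/install-config.yaml
echo "install-config.yaml patched"
```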

      Actual results:

      The installation fails: the cluster operator "monitoring" is degraded (NoPodReady) and never becomes available.

      Expected results:

      The installation should succeed.

      Additional info:

      The Prow CI job: https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_release/37012/rehearse-37012-periodic-ci-openshift-verification-tests-master-installer-rehearse-4.13-installer-rehearse-gcp/1633286920605798400
      
      $ ./oc get clusterversion
      NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
      version             False       True          119m    Unable to apply 4.13.0-0.nightly-2023-03-07-131556: the cluster operator monitoring is not available
      $ ./oc get nodes
      NAME                                                          STATUS   ROLES                  AGE    VERSION
      ci-op-knzzvf4r-47ef3-r9xz4-master-0.c.openshift-qe.internal   Ready    control-plane,master   115m   v1.26.2+bc894ae
      ci-op-knzzvf4r-47ef3-r9xz4-master-1.c.openshift-qe.internal   Ready    control-plane,master   115m   v1.26.2+bc894ae
      ci-op-knzzvf4r-47ef3-r9xz4-master-2.c.openshift-qe.internal   Ready    control-plane,master   115m   v1.26.2+bc894ae
      ci-op-knzzvf4r-47ef3-r9xz4-worker-a-jtv4j                     Ready    worker                 101m   v1.26.2+bc894ae
      ci-op-knzzvf4r-47ef3-r9xz4-worker-b-6k98z                     Ready    worker                 101m   v1.26.2+bc894ae
      ci-op-knzzvf4r-47ef3-r9xz4-worker-c-h76sp                     Ready    worker                 101m   v1.26.2+bc894ae
      $ ./oc get co
      NAME                                       VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
      authentication                             4.13.0-0.nightly-2023-03-07-131556   True        False         False      93m
      baremetal                                  4.13.0-0.nightly-2023-03-07-131556   True        False         False      110m
      cloud-controller-manager                   4.13.0-0.nightly-2023-03-07-131556   True        False         False      111m
      cloud-credential                           4.13.0-0.nightly-2023-03-07-131556   True        False         False      118m
      cluster-autoscaler                         4.13.0-0.nightly-2023-03-07-131556   True        False         False      109m
      config-operator                            4.13.0-0.nightly-2023-03-07-131556   True        False         False      111m    
      console                                    4.13.0-0.nightly-2023-03-07-131556   True        False         False      97m     
      control-plane-machine-set                  4.13.0-0.nightly-2023-03-07-131556   True        False         False      108m    
      csi-snapshot-controller                    4.13.0-0.nightly-2023-03-07-131556   True        False         False      110m    
      dns                                        4.13.0-0.nightly-2023-03-07-131556   True        False         False      109m    
      etcd                                       4.13.0-0.nightly-2023-03-07-131556   True        False         False      109m    
      image-registry                             4.13.0-0.nightly-2023-03-07-131556   True        False         False      100m    
      ingress                                    4.13.0-0.nightly-2023-03-07-131556   True        False         False      98m     
      insights                                   4.13.0-0.nightly-2023-03-07-131556   True        False         False      104m    
      kube-apiserver                             4.13.0-0.nightly-2023-03-07-131556   True        False         False      99m     
      kube-controller-manager                    4.13.0-0.nightly-2023-03-07-131556   True        False         False      107m    
      kube-scheduler                             4.13.0-0.nightly-2023-03-07-131556   True        False         False      106m
      kube-storage-version-migrator              4.13.0-0.nightly-2023-03-07-131556   True        False         False      111m
      machine-api                                4.13.0-0.nightly-2023-03-07-131556   True        False         False      101m
      machine-approver                           4.13.0-0.nightly-2023-03-07-131556   True        False         False      110m
      machine-config                             4.13.0-0.nightly-2023-03-07-131556   True        False         False      92m
      marketplace                                4.13.0-0.nightly-2023-03-07-131556   True        False         False      109m
      monitoring                                                                      False       True          True       93m     NoPodReady: shard 0: pod prometheus-k8s-0: containers with incomplete status: [init-config-reloader]...
      network                                    4.13.0-0.nightly-2023-03-07-131556   True        False         False      113m
      node-tuning                                4.13.0-0.nightly-2023-03-07-131556   True        False         False      109m
      openshift-apiserver                        4.13.0-0.nightly-2023-03-07-131556   True        False         False      102m
      openshift-controller-manager               4.13.0-0.nightly-2023-03-07-131556   True        False         False      106m
      openshift-samples                          4.13.0-0.nightly-2023-03-07-131556   True        False         False      103m
      operator-lifecycle-manager                 4.13.0-0.nightly-2023-03-07-131556   True        False         False      110m
      operator-lifecycle-manager-catalog         4.13.0-0.nightly-2023-03-07-131556   True        False         False      110m
      operator-lifecycle-manager-packageserver   4.13.0-0.nightly-2023-03-07-131556   True        False         False      104m
      service-ca                                 4.13.0-0.nightly-2023-03-07-131556   True        False         False      111m
      storage                                    4.13.0-0.nightly-2023-03-07-131556   True        False         False      110m
      $ ./oc describe co monitoring
      Name:         monitoring
      Namespace:
      Labels:       <none>
      Annotations:  include.release.openshift.io/ibm-cloud-managed: true
                    include.release.openshift.io/self-managed-high-availability: true
                    include.release.openshift.io/single-node-developer: true
      API Version:  config.openshift.io/v1
      Kind:         ClusterOperator
      Metadata:
        Creation Timestamp:  2023-03-08T02:13:25Z
        Generation:          1
        Managed Fields:
          API Version:  config.openshift.io/v1
          Fields Type:  FieldsV1
          fieldsV1:
            f:metadata:
              f:annotations:
                .:
                f:include.release.openshift.io/ibm-cloud-managed:
                f:include.release.openshift.io/self-managed-high-availability:
                f:include.release.openshift.io/single-node-developer:
              f:ownerReferences:
                .:
                k:{"uid":"cbb4bb0c-d5ec-4d29-ab4b-292d512073c1"}:
            f:spec:
          Manager:      cluster-version-operator
          Operation:    Update
          Time:         2023-03-08T02:13:25Z
          API Version:  config.openshift.io/v1
          Fields Type:  FieldsV1
          fieldsV1:
            f:status:
              .:
              f:extension:
              f:relatedObjects:
          Manager:      cluster-version-operator
          Operation:    Update
          Subresource:  status
          Time:         2023-03-08T02:13:26Z
          API Version:  config.openshift.io/v1
          Fields Type:  FieldsV1
          fieldsV1:
            f:status:
              f:conditions:
          Manager:      operator
          Operation:    Update
          Subresource:  status
          Time:         2023-03-08T02:39:15Z
        Owner References:
          API Version:     config.openshift.io/v1
          Controller:      true
          Kind:            ClusterVersion
          Name:            version
          UID:             cbb4bb0c-d5ec-4d29-ab4b-292d512073c1
        Resource Version:  31168
        UID:               0c45f5ca-5672-420f-8402-e8914ae08f7a
      Spec:
      Status:
        Conditions:
          Last Transition Time:  2023-03-08T02:39:15Z
          Message:               NoPodReady: shard 0: pod prometheus-k8s-0: containers with incomplete status: [init-config-reloader]
      shard 0: pod prometheus-k8s-1: containers with incomplete status: [init-config-reloader]
          Reason:                UpdatingPrometheusK8SFailed
          Status:                False
          Type:                  Available
          Last Transition Time:  2023-03-08T02:39:15Z
          Message:               NoPodReady: shard 0: pod prometheus-k8s-0: containers with incomplete status: [init-config-reloader]
      shard 0: pod prometheus-k8s-1: containers with incomplete status: [init-config-reloader]
          Reason:                UpdatingPrometheusK8SFailed
          Status:                True 
          Type:                  Degraded
          Last Transition Time:  2023-03-08T02:22:47Z
          Message:               Rolling out the stack.
          Reason:                RollOutInProgress
          Status:                True 
          Type:                  Progressing
          Last Transition Time:  2023-03-08T02:22:47Z
          Status:                Unknown
          Type:                  Upgradeable
        Extension:               <nil>
        Related Objects:
          Group:
          Name:      openshift-monitoring
          Resource:  namespaces
          Group:
          Name:      openshift-user-workload-monitoring
          Resource:  namespaces
          Group:     monitoring.coreos.com
          Name:
          Resource:  servicemonitors  
          Group:     monitoring.coreos.com
          Name:
          Resource:  podmonitors
          Group:     monitoring.coreos.com
          Name:
          Resource:  prometheusrules  
          Group:     monitoring.coreos.com
          Name:
          Resource:  alertmanagers
          Group:     monitoring.coreos.com
          Name:
          Resource:  prometheuses
          Group:     monitoring.coreos.com
          Name:
          Resource:  thanosrulers
          Group:     monitoring.coreos.com
          Name:
          Resource:  alertmanagerconfigs
      Events:        <none>
      $ ./oc get pods -n openshift-monitoring
      NAME                                                    READY   STATUS     RESTARTS      AGE
      alertmanager-main-0                                     6/6     Running    1 (96m ago)   96m
      alertmanager-main-1                                     6/6     Running    1 (97m ago)   97m
      cluster-monitoring-operator-6fc7495c9f-v26jg            1/1     Running    0             117m
      kube-state-metrics-5844868575-88hr4                     3/3     Running    0             103m
      node-exporter-52zf5                                     2/2     Running    0             103m
      node-exporter-7rfb6                                     2/2     Running    0             103m
      node-exporter-8xt24                                     2/2     Running    0             103m
      node-exporter-9w4sl                                     2/2     Running    0             103m
      node-exporter-bwvj8                                     2/2     Running    0             103m
      node-exporter-k9tpb                                     2/2     Running    0             103m
      openshift-state-metrics-8666f44bb-vxhwn                 3/3     Running    0             103m
      prometheus-adapter-5c57687ddb-fr5rk                     1/1     Running    0             102m
      prometheus-adapter-5c57687ddb-kkwjq                     1/1     Running    0             102m
      prometheus-k8s-0                                        0/6     Init:0/1   0             102m
      prometheus-k8s-1                                        0/6     Init:0/1   0             96m
      prometheus-operator-58b5f659fd-hptbn                    2/2     Running    0             104m
      prometheus-operator-admission-webhook-6c7b57b57-v4cxm   1/1     Running    0             114m
      prometheus-operator-admission-webhook-6c7b57b57-wf6rs   1/1     Running    0             114m
      telemeter-client-6bc4594c4-6g6l4                        3/3     Running    0             102m
      thanos-querier-6d47f89d57-69925                         6/6     Running    0             102m
      thanos-querier-6d47f89d57-dcbgp                         6/6     Running    0             102m
      $ ./oc logs prometheus-k8s-0 -n openshift-monitoring
      Error from server (BadRequest): container "prometheus" in pod "prometheus-k8s-0" is waiting to start: PodInitializing
      $ ./oc logs prometheus-k8s-1 -n openshift-monitoring
      Error from server (BadRequest): container "prometheus" in pod "prometheus-k8s-1" is waiting to start: PodInitializing
      $ 
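Since both prometheus-k8s pods are stuck at Init:0/1, the main "prometheus" container has no logs yet, which is why the requests above fail with PodInitializing. The init container's own logs and the pod events are the next things to collect (a hypothetical follow-up; it requires access to the failing cluster):

```shell
#!/bin/sh
# Follow-up triage sketch (hypothetical; needs access to the failing cluster).
# Guarded so the script is a harmless no-op where no oc client is available.
if command -v oc >/dev/null 2>&1; then
  # Ask for the init container explicitly instead of the main container:
  oc logs prometheus-k8s-0 -n openshift-monitoring -c init-config-reloader
  # Pod events usually say why an init container cannot start or complete:
  oc describe pod prometheus-k8s-0 -n openshift-monitoring
fi
```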
      

            People

              Assignee: Unassigned
              rhn-support-jiwei (Jianli Wei)
              Gaoyun Pei