OpenShift Bugs / OCPBUGS-8667

[gcp] with Confidential Computing enabled, IPI installation failed with "monitoring" degraded due to NoPodReady


Details

    • Bug
    • Resolution: Duplicate
    • Critical
    • 4.13.0
    • Important

    Description

      Description of problem:

      With Confidential Computing enabled, IPI installation fails with the "monitoring" cluster operator degraded due to NoPodReady.

      Version-Release number of selected component (if applicable):

      4.13.0-0.nightly-2023-03-07-131556

      How reproducible:

      Always

      Steps to Reproduce:

      1. Run "create install-config".
      2. Edit "install-config.yaml" to insert the Confidential Computing settings, for example:
      
      $ yq-3.3.0 r test3/install-config.yaml platform
      gcp:
        projectID: openshift-qe
        region: us-central1
        defaultMachinePlatform:
          confidentialCompute: Enabled
          onHostMaintenance: Terminate
          type: n2d-standard-4
      $ 
      
      3. Run "create cluster".
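Step 2 can be sketched as a small script (the `test3` directory and file paths are illustrative; the platform keys mirror the yq output above):

```shell
#!/bin/sh
# Sketch of step 2: write a GCP platform section for install-config.yaml
# with Confidential Computing enabled. The "test3" path is illustrative.
set -e
mkdir -p test3
cat > test3/install-config.yaml <<'EOF'
platform:
  gcp:
    projectID: openshift-qe
    region: us-central1
    defaultMachinePlatform:
      confidentialCompute: Enabled
      onHostMaintenance: Terminate
      type: n2d-standard-4
EOF
# Confidential VMs on GCP do not support live migration, so
# onHostMaintenance must be Terminate; check both keys landed
# before running "create cluster":
grep -q 'confidentialCompute: Enabled' test3/install-config.yaml
grep -q 'onHostMaintenance: Terminate' test3/install-config.yaml
echo "install-config.yaml patched"
```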

      Actual results:

      The installation fails: the cluster operator "monitoring" is degraded (NoPodReady) and never becomes available.

      Expected results:

      The installation should succeed.

      Additional info:

      The Prow CI job: https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_release/37012/rehearse-37012-periodic-ci-openshift-verification-tests-master-installer-rehearse-4.13-installer-rehearse-gcp/1633286920605798400
      
      $ ./oc get clusterversion
      NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
      version             False       True          119m    Unable to apply 4.13.0-0.nightly-2023-03-07-131556: the cluster operator monitoring is not available
      $ ./oc get nodes
      NAME                                                          STATUS   ROLES                  AGE    VERSION
      ci-op-knzzvf4r-47ef3-r9xz4-master-0.c.openshift-qe.internal   Ready    control-plane,master   115m   v1.26.2+bc894ae
      ci-op-knzzvf4r-47ef3-r9xz4-master-1.c.openshift-qe.internal   Ready    control-plane,master   115m   v1.26.2+bc894ae
      ci-op-knzzvf4r-47ef3-r9xz4-master-2.c.openshift-qe.internal   Ready    control-plane,master   115m   v1.26.2+bc894ae
      ci-op-knzzvf4r-47ef3-r9xz4-worker-a-jtv4j                     Ready    worker                 101m   v1.26.2+bc894ae
      ci-op-knzzvf4r-47ef3-r9xz4-worker-b-6k98z                     Ready    worker                 101m   v1.26.2+bc894ae
      ci-op-knzzvf4r-47ef3-r9xz4-worker-c-h76sp                     Ready    worker                 101m   v1.26.2+bc894ae
      $ ./oc get co
      NAME                                       VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
      authentication                             4.13.0-0.nightly-2023-03-07-131556   True        False         False      93m
      baremetal                                  4.13.0-0.nightly-2023-03-07-131556   True        False         False      110m
      cloud-controller-manager                   4.13.0-0.nightly-2023-03-07-131556   True        False         False      111m
      cloud-credential                           4.13.0-0.nightly-2023-03-07-131556   True        False         False      118m
      cluster-autoscaler                         4.13.0-0.nightly-2023-03-07-131556   True        False         False      109m
      config-operator                            4.13.0-0.nightly-2023-03-07-131556   True        False         False      111m    
      console                                    4.13.0-0.nightly-2023-03-07-131556   True        False         False      97m     
      control-plane-machine-set                  4.13.0-0.nightly-2023-03-07-131556   True        False         False      108m    
      csi-snapshot-controller                    4.13.0-0.nightly-2023-03-07-131556   True        False         False      110m    
      dns                                        4.13.0-0.nightly-2023-03-07-131556   True        False         False      109m    
      etcd                                       4.13.0-0.nightly-2023-03-07-131556   True        False         False      109m    
      image-registry                             4.13.0-0.nightly-2023-03-07-131556   True        False         False      100m    
      ingress                                    4.13.0-0.nightly-2023-03-07-131556   True        False         False      98m     
      insights                                   4.13.0-0.nightly-2023-03-07-131556   True        False         False      104m    
      kube-apiserver                             4.13.0-0.nightly-2023-03-07-131556   True        False         False      99m     
      kube-controller-manager                    4.13.0-0.nightly-2023-03-07-131556   True        False         False      107m    
      kube-scheduler                             4.13.0-0.nightly-2023-03-07-131556   True        False         False      106m
      kube-storage-version-migrator              4.13.0-0.nightly-2023-03-07-131556   True        False         False      111m
      machine-api                                4.13.0-0.nightly-2023-03-07-131556   True        False         False      101m
      machine-approver                           4.13.0-0.nightly-2023-03-07-131556   True        False         False      110m
      machine-config                             4.13.0-0.nightly-2023-03-07-131556   True        False         False      92m
      marketplace                                4.13.0-0.nightly-2023-03-07-131556   True        False         False      109m
      monitoring                                                                      False       True          True       93m     NoPodReady: shard 0: pod prometheus-k8s-0: containers with incomplete status: [init-config-reloader]...
      network                                    4.13.0-0.nightly-2023-03-07-131556   True        False         False      113m
      node-tuning                                4.13.0-0.nightly-2023-03-07-131556   True        False         False      109m
      openshift-apiserver                        4.13.0-0.nightly-2023-03-07-131556   True        False         False      102m
      openshift-controller-manager               4.13.0-0.nightly-2023-03-07-131556   True        False         False      106m
      openshift-samples                          4.13.0-0.nightly-2023-03-07-131556   True        False         False      103m
      operator-lifecycle-manager                 4.13.0-0.nightly-2023-03-07-131556   True        False         False      110m
      operator-lifecycle-manager-catalog         4.13.0-0.nightly-2023-03-07-131556   True        False         False      110m
      operator-lifecycle-manager-packageserver   4.13.0-0.nightly-2023-03-07-131556   True        False         False      104m
      service-ca                                 4.13.0-0.nightly-2023-03-07-131556   True        False         False      111m
      storage                                    4.13.0-0.nightly-2023-03-07-131556   True        False         False      110m
      $ ./oc describe co monitoring
      Name:         monitoring
      Namespace:
      Labels:       <none>
      Annotations:  include.release.openshift.io/ibm-cloud-managed: true
                    include.release.openshift.io/self-managed-high-availability: true
                    include.release.openshift.io/single-node-developer: true
      API Version:  config.openshift.io/v1
      Kind:         ClusterOperator
      Metadata:
        Creation Timestamp:  2023-03-08T02:13:25Z
        Generation:          1
        Managed Fields:
          API Version:  config.openshift.io/v1
          Fields Type:  FieldsV1
          fieldsV1:
            f:metadata:
              f:annotations:
                .:
                f:include.release.openshift.io/ibm-cloud-managed:
                f:include.release.openshift.io/self-managed-high-availability:
                f:include.release.openshift.io/single-node-developer:
              f:ownerReferences:
                .:
                k:{"uid":"cbb4bb0c-d5ec-4d29-ab4b-292d512073c1"}:
            f:spec:
          Manager:      cluster-version-operator
          Operation:    Update
          Time:         2023-03-08T02:13:25Z
          API Version:  config.openshift.io/v1
          Fields Type:  FieldsV1
          fieldsV1:
            f:status:
              .:
              f:extension:
              f:relatedObjects:
          Manager:      cluster-version-operator
          Operation:    Update
          Subresource:  status
          Time:         2023-03-08T02:13:26Z
          API Version:  config.openshift.io/v1
          Fields Type:  FieldsV1
          fieldsV1:
            f:status:
              f:conditions:
          Manager:      operator
          Operation:    Update
          Subresource:  status
          Time:         2023-03-08T02:39:15Z
        Owner References:
          API Version:     config.openshift.io/v1
          Controller:      true
          Kind:            ClusterVersion
          Name:            version
          UID:             cbb4bb0c-d5ec-4d29-ab4b-292d512073c1
        Resource Version:  31168
        UID:               0c45f5ca-5672-420f-8402-e8914ae08f7a
      Spec:
      Status:
        Conditions:
          Last Transition Time:  2023-03-08T02:39:15Z
          Message:               NoPodReady: shard 0: pod prometheus-k8s-0: containers with incomplete status: [init-config-reloader]
      shard 0: pod prometheus-k8s-1: containers with incomplete status: [init-config-reloader]
          Reason:                UpdatingPrometheusK8SFailed
          Status:                False
          Type:                  Available
          Last Transition Time:  2023-03-08T02:39:15Z
          Message:               NoPodReady: shard 0: pod prometheus-k8s-0: containers with incomplete status: [init-config-reloader]
      shard 0: pod prometheus-k8s-1: containers with incomplete status: [init-config-reloader]
          Reason:                UpdatingPrometheusK8SFailed
          Status:                True 
          Type:                  Degraded
          Last Transition Time:  2023-03-08T02:22:47Z
          Message:               Rolling out the stack.
          Reason:                RollOutInProgress
          Status:                True 
          Type:                  Progressing
          Last Transition Time:  2023-03-08T02:22:47Z
          Status:                Unknown
          Type:                  Upgradeable
        Extension:               <nil>
        Related Objects:
          Group:
          Name:      openshift-monitoring
          Resource:  namespaces
          Group:
          Name:      openshift-user-workload-monitoring
          Resource:  namespaces
          Group:     monitoring.coreos.com
          Name:
          Resource:  servicemonitors  
          Group:     monitoring.coreos.com
          Name:
          Resource:  podmonitors
          Group:     monitoring.coreos.com
          Name:
          Resource:  prometheusrules  
          Group:     monitoring.coreos.com
          Name:
          Resource:  alertmanagers
          Group:     monitoring.coreos.com
          Name:
          Resource:  prometheuses
          Group:     monitoring.coreos.com
          Name:
          Resource:  thanosrulers
          Group:     monitoring.coreos.com
          Name:
          Resource:  alertmanagerconfigs
      Events:        <none>
      $ ./oc get pods -n openshift-monitoring
      NAME                                                    READY   STATUS     RESTARTS      AGE
      alertmanager-main-0                                     6/6     Running    1 (96m ago)   96m
      alertmanager-main-1                                     6/6     Running    1 (97m ago)   97m
      cluster-monitoring-operator-6fc7495c9f-v26jg            1/1     Running    0             117m
      kube-state-metrics-5844868575-88hr4                     3/3     Running    0             103m
      node-exporter-52zf5                                     2/2     Running    0             103m
      node-exporter-7rfb6                                     2/2     Running    0             103m
      node-exporter-8xt24                                     2/2     Running    0             103m
      node-exporter-9w4sl                                     2/2     Running    0             103m
      node-exporter-bwvj8                                     2/2     Running    0             103m
      node-exporter-k9tpb                                     2/2     Running    0             103m
      openshift-state-metrics-8666f44bb-vxhwn                 3/3     Running    0             103m
      prometheus-adapter-5c57687ddb-fr5rk                     1/1     Running    0             102m
      prometheus-adapter-5c57687ddb-kkwjq                     1/1     Running    0             102m
      prometheus-k8s-0                                        0/6     Init:0/1   0             102m
      prometheus-k8s-1                                        0/6     Init:0/1   0             96m
      prometheus-operator-58b5f659fd-hptbn                    2/2     Running    0             104m
      prometheus-operator-admission-webhook-6c7b57b57-v4cxm   1/1     Running    0             114m
      prometheus-operator-admission-webhook-6c7b57b57-wf6rs   1/1     Running    0             114m
      telemeter-client-6bc4594c4-6g6l4                        3/3     Running    0             102m
      thanos-querier-6d47f89d57-69925                         6/6     Running    0             102m
      thanos-querier-6d47f89d57-dcbgp                         6/6     Running    0             102m
      $ ./oc logs prometheus-k8s-0 -n openshift-monitoring
      Error from server (BadRequest): container "prometheus" in pod "prometheus-k8s-0" is waiting to start: PodInitializing
      $ ./oc logs prometheus-k8s-1 -n openshift-monitoring
      Error from server (BadRequest): container "prometheus" in pod "prometheus-k8s-1" is waiting to start: PodInitializing
      $ 
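Since both prometheus-k8s pods are stuck at Init:0/1, the main "prometheus" container has no logs yet, which is why the requests above fail with PodInitializing. The init container's own logs and the pod events are the next things to collect (a hypothetical follow-up; it requires access to the failing cluster):

```shell
#!/bin/sh
# Follow-up triage sketch (hypothetical; needs access to the failing cluster).
# Guarded so the script is a harmless no-op where no oc client is available.
if command -v oc >/dev/null 2>&1; then
  # Ask for the init container explicitly instead of the main container:
  oc logs prometheus-k8s-0 -n openshift-monitoring -c init-config-reloader
  # Pod events usually say why an init container cannot start or complete:
  oc describe pod prometheus-k8s-0 -n openshift-monitoring
fi
```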
      

            People

              Assignee: Unassigned
              rhn-support-jiwei (Jianli Wei)
              Gaoyun Pei