OpenShift Bugs / OCPBUGS-19564

GCP cluster installation fails as some operators are unstable


    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Undefined
    • None
    • 4.14
    • None
    • No
    • False

      Description of problem:

      Installation fails as cluster operators are not stable.
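
The report does not say which criteria the installer uses to call an operator "stable". As a rough sketch, assuming the usual ClusterOperator condition types (Available, Progressing, Degraded) and a hypothetical in-memory mapping of operator name to condition statuses, the check could be read as:

```python
from typing import Dict, List, Mapping


def is_stable(conditions: Mapping[str, str]) -> bool:
    """Return True when an operator looks settled.

    `conditions` maps a condition type to its status string, for example
    {"Available": "True", "Progressing": "False", "Degraded": "False"}.
    This definition of "stable" is an assumption for illustration only.
    """
    return (
        conditions.get("Available") == "True"
        and conditions.get("Progressing") == "False"
        and conditions.get("Degraded") == "False"
    )


def unstable_operators(operators: Mapping[str, Mapping[str, str]]) -> List[str]:
    """List the names of operators that fail the is_stable check, sorted."""
    return sorted(name for name, conds in operators.items() if not is_stable(conds))


# Hypothetical condition data, loosely modelled on the state reported in this issue.
EXAMPLE: Dict[str, Dict[str, str]] = {
    "ingress": {"Available": "True", "Progressing": "True", "Degraded": "True"},
    "monitoring": {"Available": "False", "Progressing": "True", "Degraded": "False"},
    "dns": {"Available": "True", "Progressing": "False", "Degraded": "False"},
}
```

Under this reading, a missing condition counts as unstable, which errs on the side of reporting an operator rather than hiding it.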
      
      Reprinting Cluster State:
      When opening a support case, bugzilla, or issue please include the following summary data along with any other requested information:
      ClusterID: 807269fa-0d64-48f9-817b-00d3799f67eb
      ClusterVersion: Installing "4.14.0-0.nightly-2023-09-20-033502" for 3 hours: Unable to apply 4.14.0-0.nightly-2023-09-20-033502: some cluster operators are not available
      ClusterOperators:
          clusteroperator/config-operator is not upgradeable because FeatureGatesUpgradeable: "TechPreviewNoUpgrade" does not allow updates
          clusteroperator/image-registry is degraded because Degraded: Registry deployment has timed out progressing: ReplicaSet "image-registry-7d8667cdf7" has timed out progressing.
          clusteroperator/ingress is degraded because The "default" ingress controller reports Degraded=True: DegradedConditions: One or more other status conditions indicate a degraded state: DeploymentReplicasAllAvailable=False (DeploymentReplicasNotAvailable: 1/2 of replicas are available)
          clusteroperator/kube-apiserver is not upgradeable because FeatureGatesUpgradeable: "TechPreviewNoUpgrade" does not allow updates
          clusteroperator/kube-controller-manager is degraded because GarbageCollectorDegraded: error fetching rules: Get "https://thanos-querier.openshift-monitoring.svc:9091/api/v1/rules": dial tcp: lookup thanos-querier.openshift-monitoring.svc on 172.30.0.10:53: no such host
          clusteroperator/machine-config is not upgradeable because One or more machine config pools are updating, please see `oc get mcp` for further details
          clusteroperator/monitoring is not available (reconciling Prometheus Operator Admission Webhook Deployment failed: updating Deployment object failed: waiting for DeploymentRollout of openshift-monitoring/prometheus-operator-admission-webhook: context deadline exceeded) because reconciling Prometheus Operator Admission Webhook Deployment failed: updating Deployment object failed: waiting for DeploymentRollout of openshift-monitoring/prometheus-operator-admission-webhook: context deadline exceeded
          clusteroperator/storage is not available (SHARESCSIDriverOperatorCRAvailable: SharedResourcesDriverNodeServiceControllerAvailable: Waiting for the DaemonSet to deploy the CSI Node Service) because GCPPDCSIDriverOperatorCRDegraded: All is well
      SHARESCSIDriverOperatorCRDegraded: All is well
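
To triage a summary like the one above, it can help to separate "not upgradeable" entries from operators that are actually degraded or unavailable. Upgradeable=False does not by itself mean an operator is unhealthy, and on this cluster several of those entries cite the "TechPreviewNoUpgrade" feature set. The following is a minimal sketch that assumes the line format shown in this report; the names `OperatorStatus`, `parse_summary`, and `blocking` are hypothetical helpers, not part of any OpenShift tool.

```python
import re
from typing import List, NamedTuple


class OperatorStatus(NamedTuple):
    name: str
    state: str   # "not upgradeable", "not available", or "degraded"
    reason: str  # parenthesised detail if present, else the text after "because"


# Matches lines such as:
#   clusteroperator/ingress is degraded because <reason>
#   clusteroperator/storage is not available (<detail>) because <reason>
_LINE = re.compile(
    r"^\s*clusteroperator/(?P<name>\S+) is "
    r"(?P<state>not upgradeable|not available|degraded)"
    r"(?: \((?P<detail>.*?)\))? because (?P<reason>.*)$"
)


def parse_summary(text: str) -> List[OperatorStatus]:
    """Extract one OperatorStatus per matching line; other lines are ignored."""
    statuses = []
    for line in text.splitlines():
        m = _LINE.match(line)
        if m:
            statuses.append(
                OperatorStatus(m["name"], m["state"], m["detail"] or m["reason"])
            )
    return statuses


def blocking(statuses: List[OperatorStatus]) -> List[OperatorStatus]:
    """Keep only operators reported as degraded or not available."""
    return [s for s in statuses if s.state != "not upgradeable"]


# Four lines copied from the cluster state in this report.
SAMPLE = '''
    clusteroperator/config-operator is not upgradeable because FeatureGatesUpgradeable: "TechPreviewNoUpgrade" does not allow updates
    clusteroperator/image-registry is degraded because Degraded: Registry deployment has timed out progressing: ReplicaSet "image-registry-7d8667cdf7" has timed out progressing.
    clusteroperator/machine-config is not upgradeable because One or more machine config pools are updating, please see `oc get mcp` for further details
    clusteroperator/storage is not available (SHARESCSIDriverOperatorCRAvailable: SharedResourcesDriverNodeServiceControllerAvailable: Waiting for the DaemonSet to deploy the CSI Node Service) because GCPPDCSIDriverOperatorCRDegraded: All is well
'''

if __name__ == "__main__":
    for status in blocking(parse_summary(SAMPLE)):
        print(f"{status.name}: {status.state}: {status.reason}")
```

For "not available" entries the sketch prefers the parenthesised detail, because in this report the text after "because" can be uninformative (for storage it reads "All is well").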

      Version-Release number of selected component (if applicable):

      4.14

      How reproducible:

      Fails in 1 of 2 attempts

      Steps to Reproduce:

      1. Install a GCP cluster with the latest build

      Actual results:

      Cluster install fails

      Expected results:

      Cluster install should succeed every time

      Additional info:

      The cluster is created with feature_set: "TechPreviewNoUpgrade"

            rh-ee-bbarbach Brent Barbachem
            rhn-support-asood Arti Sood
            Jianli Wei
