OCPBUGS-20401

Master MCP is degraded because of MC not found

    • Critical
    • No
    • Approved
    • False
    • Previously, installations on AWS could fail because the installation program did not create the cloud.conf file with the necessary service endpoints in it. This led to the machine config operator creating an empty cloud.conf file that lacked the service endpoints, leading to an error. With this update, the installation program always creates the cloud.conf file so that the installation succeeds. (link:https://issues.redhat.com/browse/OCPBUGS-20401[*OCPBUGS-20401*])
    • Bug Fix
    • Done

      This is a clone of issue OCPBUGS-12707. The following is the description of the original issue:

      Description of problem:

      
      When we deploy a cluster in AWS using this template https://gitlab.cee.redhat.com/aosqe/flexy-templates/-/blob/master/functionality-testing/aos-4_14/ipi-on-aws/versioned-installer-customer_vpc-disconnected_private_cluster-sts-private-s3-custom_endpoints-ci, the master MCP is degraded and reports this error:
      
        - lastTransitionTime: "2023-04-25T07:48:45Z"
          message: 'Node ip-10-0-55-111.us-east-2.compute.internal is reporting: "machineconfig.machineconfiguration.openshift.io
            \"rendered-master-8ef3f9cb45adb7bbe5f819eb831ffd7d\" not found", Node ip-10-0-60-138.us-east-2.compute.internal
            is reporting: "machineconfig.machineconfiguration.openshift.io \"rendered-master-8ef3f9cb45adb7bbe5f819eb831ffd7d\"
            not found", Node ip-10-0-69-137.us-east-2.compute.internal is reporting: "machineconfig.machineconfiguration.openshift.io
            \"rendered-master-8ef3f9cb45adb7bbe5f819eb831ffd7d\" not found"'
          reason: 3 nodes are reporting degraded status on sync
          status: "True"
          type: NodeDegraded
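
When triaging a condition like the one above, it can help to pull out which nodes are complaining and which rendered MachineConfig they cannot find. The following is a minimal sketch, not part of the original report: the function name and the regular expression are assumptions for illustration, and the message text is copied from the condition above with the YAML line folds joined by spaces.

```python
import re

# Message copied from the NodeDegraded condition above (YAML line folds joined).
MESSAGE = (
    r'Node ip-10-0-55-111.us-east-2.compute.internal is reporting: "machineconfig.machineconfiguration.openshift.io '
    r'\"rendered-master-8ef3f9cb45adb7bbe5f819eb831ffd7d\" not found", Node ip-10-0-60-138.us-east-2.compute.internal '
    r'is reporting: "machineconfig.machineconfiguration.openshift.io \"rendered-master-8ef3f9cb45adb7bbe5f819eb831ffd7d\" '
    r'not found", Node ip-10-0-69-137.us-east-2.compute.internal is reporting: "machineconfig.machineconfiguration.openshift.io '
    r'\"rendered-master-8ef3f9cb45adb7bbe5f819eb831ffd7d\" not found"'
)

# Assumed pattern for this specific message shape; the quotes around the
# MachineConfig name may or may not be preceded by a backslash.
PATTERN = re.compile(
    r'Node (\S+) is reporting: '
    r'"machineconfig\.machineconfiguration\.openshift\.io '
    r'\\?"(rendered-[a-z0-9-]+)\\?" not found"'
)


def missing_rendered_configs(message):
    """Map each reporting node to the rendered MachineConfig it cannot find."""
    return {m.group(1): m.group(2) for m in PATTERN.finditer(message)}


if __name__ == "__main__":
    for node, config in missing_rendered_configs(MESSAGE).items():
        print(f"{node}: {config}")
```

For this report, all three master nodes point at the same missing rendered-master config.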
      
      
      

      Version-Release number of selected component (if applicable):

      $ oc get clusterversion
      NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
      version             False       False         3h12m   Error while reconciling 4.14.0-0.nightly-2023-04-19-125337: the cluster operator machine-config is degraded
      
      

      How reproducible:

      2 out of 2.
      
      

      Steps to Reproduce:

      1. Install OCP using this template https://gitlab.cee.redhat.com/aosqe/flexy-templates/-/blob/master/functionality-testing/aos-4_14/ipi-on-aws/versioned-installer-customer_vpc-disconnected_private_cluster-sts-private-s3-custom_endpoints-ci
      
      We can see examples of this installation here:
      https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/ocp-common/job/Flexy-install/198964/
      
      and here:
      https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/ocp-common/job/Flexy-install/199028/
      
      
      Builds have been marked as keep forever, but just in case, the parameters are:
      
      INSTANCE_NAME_PREFIX: Your ID; any short string, just make sure it is unique.
      VARIABLES_LOCATION: private-templates/functionality-testing/aos-4_14/ipi-on-aws/versioned-installer-customer_vpc-disconnected_private_cluster-sts-private-s3-custom_endpoints-ci
      LAUNCHER_VARS: <leave empty>
      BUSHSLICER_CONFIG: <leave empty>
      
      

      Actual results:

      
      The installation failed, reporting a degraded master MCP:
      
      $ oc get clusterversion
      NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
      version             False       False         3h12m   Error while reconciling 4.14.0-0.nightly-2023-04-19-125337: the cluster operator machine-config is degraded
      
      $ oc get mcp
      NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
      master                                                      False     True       True       3              0                   0                     3                      4h21m
      worker   rendered-worker-166729d2617b1b63cf5d9bb818dd9cf8   True      False      False      3              3                   3                     0                      4h21m
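
The same check can be scripted against the JSON form of this output (`oc get mcp -o json`). The sketch below is an illustration only: the function name is an assumption, and the sample document is a hypothetical, heavily trimmed stand-in modeled on the table above rather than output captured from the affected cluster.

```python
import json

# Hypothetical, trimmed sample shaped like `oc get mcp -o json` output.
# Only the fields used below are included; values mirror the table above.
SAMPLE = json.loads("""
{
  "items": [
    {
      "metadata": {"name": "master"},
      "status": {
        "degradedMachineCount": 3,
        "conditions": [
          {"type": "NodeDegraded", "status": "True"},
          {"type": "Degraded", "status": "True"}
        ]
      }
    },
    {
      "metadata": {"name": "worker"},
      "status": {
        "degradedMachineCount": 0,
        "conditions": [
          {"type": "Degraded", "status": "False"}
        ]
      }
    }
  ]
}
""")


def degraded_pools(mcp_list):
    """Return (pool name, degraded machine count) for each degraded pool."""
    result = []
    for pool in mcp_list.get("items", []):
        status = pool.get("status", {})
        is_degraded = any(
            cond.get("type") == "Degraded" and cond.get("status") == "True"
            for cond in status.get("conditions", [])
        )
        if is_degraded:
            result.append(
                (pool["metadata"]["name"], status.get("degradedMachineCount", 0))
            )
    return result


if __name__ == "__main__":
    for name, count in degraded_pools(SAMPLE):
        print(f"{name}: {count} degraded machine(s)")
```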
      
      
      

      Expected results:

      The installation should finish without problems, and no MCP should be degraded.
      

      Additional info:

      The must-gather is linked in the first comment.
      

            padillon Patrick Dillon
            openshift-crt-jira-prow OpenShift Prow Bot
            Yunfei Jiang Yunfei Jiang