OpenShift Bugs / OCPBUGS-34158

Infra machine going to Failed status unexpectedly

    • +
    • Low
    • No
    • False
    • None
    • Previously, a transient failure to fetch bootstrap data during machine creation, such as a transient failure to connect to the API server, caused the machine to enter a terminal failed state. Now, failure to fetch bootstrap data during machine creation is retried indefinitely until it eventually succeeds. (link:https://issues.redhat.com/browse/OCPBUGS-34158[*OCPBUGS-34158*])
    • Bug Fix
    • Done

      This is a clone of issue OCPBUGS-33954. The following is the description of the original issue:

      Description of problem:

      An infra machine is going into the Failed phase:

      2024-05-18 07:26:49.815 | NAMESPACE               NAME                          PHASE     TYPE     REGION      ZONE   AGE
      2024-05-18 07:26:49.822 | openshift-machine-api   ostest-wgdc2-infra-0-4sqdh    Running   master   regionOne   nova   31m
      2024-05-18 07:26:49.826 | openshift-machine-api   ostest-wgdc2-infra-0-ssx8j    Failed                                31m
      2024-05-18 07:26:49.831 | openshift-machine-api   ostest-wgdc2-infra-0-tfkf5    Running   master   regionOne   nova   31m
      2024-05-18 07:26:49.841 | openshift-machine-api   ostest-wgdc2-master-0         Running   master   regionOne   nova   38m
      2024-05-18 07:26:49.847 | openshift-machine-api   ostest-wgdc2-master-1         Running   master   regionOne   nova   38m
      2024-05-18 07:26:49.852 | openshift-machine-api   ostest-wgdc2-master-2         Running   master   regionOne   nova   38m
      2024-05-18 07:26:49.858 | openshift-machine-api   ostest-wgdc2-worker-0-d5cdp   Running   worker   regionOne   nova   31m
      2024-05-18 07:26:49.868 | openshift-machine-api   ostest-wgdc2-worker-0-jcxml   Running   worker   regionOne   nova   31m
      2024-05-18 07:26:49.873 | openshift-machine-api   ostest-wgdc2-worker-0-t29fz   Running   worker   regionOne   nova   31m 

      Logs from the machine-controller show the following error:

      2024-05-18T06:59:11.159013162Z I0518 06:59:11.158938       1 controller.go:156] ostest-wgdc2-infra-0-ssx8j: reconciling Machine
      2024-05-18T06:59:11.159589148Z I0518 06:59:11.159529       1 recorder.go:104] events "msg"="Reconciled machine ostest-wgdc2-worker-0-jcxml" "object"={"kind":"Machine","namespace":"openshift-machine-api","name":"ostest-wgdc2-worker-0-jcxml","uid":"245bac8e-c110-4bef-ac11-3d3751a93353","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"18617"} "reason"="Reconciled" "type"="Normal"
      2024-05-18T06:59:12.749966746Z I0518 06:59:12.749845       1 controller.go:349] ostest-wgdc2-infra-0-ssx8j: reconciling machine triggers idempotent create
      2024-05-18T07:00:00.487702632Z E0518 07:00:00.486365       1 leaderelection.go:332] error retrieving resource lock openshift-machine-api/cluster-api-provider-openstack-leader: Get "https://172.30.0.1:443/apis/coordination.k8s.io/v1/namespaces/openshift-machine-api/leases/cluster-api-provider-openstack-leader": http2: client connection lost
      2024-05-18T07:00:00.487702632Z W0518 07:00:00.486497       1 controller.go:351] ostest-wgdc2-infra-0-ssx8j: failed to create machine: error creating bootstrap for ostest-wgdc2-infra-0-ssx8j: Get "https://172.30.0.1:443/api/v1/namespaces/openshift-machine-api/secrets/worker-user-data": http2: client connection lost
      2024-05-18T07:00:00.487702632Z I0518 07:00:00.486534       1 controller.go:391] Actuator returned invalid configuration error: error creating bootstrap for ostest-wgdc2-infra-0-ssx8j: Get "https://172.30.0.1:443/api/v1/namespaces/openshift-machine-api/secrets/worker-user-data": http2: client connection lost
      2024-05-18T07:00:00.487702632Z I0518 07:00:00.486548       1 controller.go:404] ostest-wgdc2-infra-0-ssx8j: going into phase "Failed"   

      The OpenStack VM for ostest-wgdc2-infra-0-ssx8j is never created; it does not appear in the server list:

      2024-05-18 07:26:50.911 | +--------------------------------------+-----------------------------+--------+---------------------------------------------------------------------------------------------------------------------+--------------------+--------+
      2024-05-18 07:26:50.917 | | ID                                   | Name                        | Status | Networks                                                                                                            | Image              | Flavor |
      2024-05-18 07:26:50.924 | +--------------------------------------+-----------------------------+--------+---------------------------------------------------------------------------------------------------------------------+--------------------+--------+
      2024-05-18 07:26:50.929 | | 3a1b9af6-d284-4da5-8ebe-434d3aa95131 | ostest-wgdc2-worker-0-jcxml | ACTIVE | StorageNFS=172.17.5.187; network-dualstack=192.168.192.185, fd2e:6f44:5dd8:c956:f816:3eff:fe3e:4e7c                 | ostest-wgdc2-rhcos | worker |
      2024-05-18 07:26:50.935 | | 5c34b78a-d876-49fb-a307-874d3c197c44 | ostest-wgdc2-infra-0-tfkf5  | ACTIVE | network-dualstack=192.168.192.133, fd2e:6f44:5dd8:c956:f816:3eff:fee6:4410, fd2e:6f44:5dd8:c956:f816:3eff:fef2:930a | ostest-wgdc2-rhcos | master |
      2024-05-18 07:26:50.941 | | d2025444-8e11-409d-8a87-3f1082814af1 | ostest-wgdc2-infra-0-4sqdh  | ACTIVE | network-dualstack=192.168.192.156, fd2e:6f44:5dd8:c956:f816:3eff:fe82:ae56, fd2e:6f44:5dd8:c956:f816:3eff:fe86:b6d1 | ostest-wgdc2-rhcos | master |
      2024-05-18 07:26:50.947 | | dcbde9ac-da5a-44c8-b64f-049f10b6b50c | ostest-wgdc2-worker-0-t29fz | ACTIVE | StorageNFS=172.17.5.233; network-dualstack=192.168.192.13, fd2e:6f44:5dd8:c956:f816:3eff:fe94:a2d2                  | ostest-wgdc2-rhcos | worker |
      2024-05-18 07:26:50.951 | | 8ad98adf-147c-4268-920f-9eb5c43ab611 | ostest-wgdc2-worker-0-d5cdp | ACTIVE | StorageNFS=172.17.5.217; network-dualstack=192.168.192.173, fd2e:6f44:5dd8:c956:f816:3eff:fe22:5cff                 | ostest-wgdc2-rhcos | worker |
      2024-05-18 07:26:50.957 | | f01d6740-2954-485d-865f-402b88789354 | ostest-wgdc2-master-2       | ACTIVE | StorageNFS=172.17.5.177; network-dualstack=192.168.192.198, fd2e:6f44:5dd8:c956:f816:3eff:fe1f:3c64                 | ostest-wgdc2-rhcos | master |
      2024-05-18 07:26:50.963 | | d215a70f-760d-41fb-8e30-9f3106dbaabe | ostest-wgdc2-master-1       | ACTIVE | StorageNFS=172.17.5.163; network-dualstack=192.168.192.152, fd2e:6f44:5dd8:c956:f816:3eff:fe4e:67b6                 | ostest-wgdc2-rhcos | master |
      2024-05-18 07:26:50.968 | | 53fe495b-f617-412d-9608-47cd355bc2e5 | ostest-wgdc2-master-0       | ACTIVE | StorageNFS=172.17.5.170; network-dualstack=192.168.192.193, fd2e:6f44:5dd8:c956:f816:3eff:febd:a836                 | ostest-wgdc2-rhcos | master |
      2024-05-18 07:26:50.975 | +--------------------------------------+-----------------------------+--------+---------------------------------------------------------------------------------------------------------------------+--------------------+--------+ 

      Version-Release number of selected component (if applicable):

      RHOS-17.1-RHEL-9-20240123.n.1
      4.15.0-0.nightly-2024-05-16-091947

      Additional info:

         Must-gather link provided in a private comment.

            Matthew Booth
            OpenShift Prow Bot
            Itay Matza
            Votes: 0
            Watchers: 7