  1. Multiple Architecture Enablement
  2. MULTIARCH-2013

Unable to connect to etcd service using OCP 4.10.0-0.nightly-s390x-2021-12-08-123942 with disconnected installation

Details

    • Bug
    • Resolution: Done
    • 4.10.0
    • 4.10
    • Multi-Arch

    Description

      Description of problem:
      When performing a disconnected installation with any of the latest OCP 4.10 nightly builds, bootkube.service on the bootstrap node hits the following error and loops:

      Dec 08 17:40:38 bootstrap-0.pok-239-dec-qemu.ocptest.pok.stglabs.ibm.com bootkube.sh[2345]: {"level":"warn","ts":1638985238.1713336,"logger":"client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc00043c380/#initially=[https://localhost:2379]","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = \"transport: Error while dialing dial tcp [::1]:2379: connect: connection refused\""}

      Dec 08 17:40:38 bootstrap-0.pok-239-dec-qemu.ocptest.pok.stglabs.ibm.com bootkube.sh[2345]: https://localhost:2379 is unhealthy: failed to commit proposal: context deadline exceeded
      Dec 08 17:40:38 bootstrap-0.pok-239-dec-qemu.ocptest.pok.stglabs.ibm.com bootkube.sh[2345]: Error: unhealthy cluster
      Dec 08 17:40:38 bootstrap-0.pok-239-dec-qemu.ocptest.pok.stglabs.ibm.com bootkube.sh[2345]: etcdctl failed. Retrying in 5 seconds...
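
For triage, the JSON warning in the excerpt above can be unpacked to compare the endpoint the client was configured with against the address it actually dialed. The following is a minimal sketch, not part of the original report: the function name `summarize` and the regular expressions are illustrative assumptions, and the payload is copied from the log line above. It only restates what the log already contains and does not establish a root cause.

```python
import json
import re

# JSON payload copied verbatim from the bootkube.sh log line above.
RAW = r'''{"level":"warn","ts":1638985238.1713336,"logger":"client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc00043c380/#initially=[https://localhost:2379]","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = \"transport: Error while dialing dial tcp [::1]:2379: connect: connection refused\""}'''


def summarize(raw):
    """Extract the configured endpoint, dialed address, and gRPC code.

    Hypothetical helper: the patterns below are fitted to this one
    log line and may not match other etcd client messages.
    """
    entry = json.loads(raw)
    configured = re.search(r"initially=\[([^\]]+)\]", entry["target"])
    dialed = re.search(r"dial tcp (\S+): connect", entry["error"])
    code = re.search(r"code = (\w+)", entry["error"])
    return {
        "configured": configured.group(1) if configured else None,
        "dialed": dialed.group(1) if dialed else None,
        "grpc_code": code.group(1) if code else None,
    }


if __name__ == "__main__":
    print(summarize(RAW))
```

For this line the sketch reports the configured endpoint `https://localhost:2379` and the dialed address `[::1]:2379`, that is, `localhost` was dialed over the IPv6 loopback and the connection was refused.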

      The problem seems to only happen when performing disconnected installs on KVM. We do not see this issue on zVM.

      Version-Release number of selected component (if applicable):
      4.10.0-0.nightly-s390x-2021-12-08-123942

      How reproducible:
      Consistently reproducible.

      Steps to Reproduce:
      1. Use OCP build 4.10.0-0.nightly-s390x-2021-12-08-123942 to perform a disconnected install.
      2. Start the bootstrap node.
      3. Monitor bootkube.service on the bootstrap node for the etcd error. A copy of the bootkube.service log is attached.
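
The monitoring in step 3 can be partly automated once the journal has been saved as text (for example from `journalctl -u bootkube.service` on the bootstrap node). The sketch below is an illustration only: the helper name `count_etcd_failures` and the `MARKERS` table are assumptions, with the marker strings taken from the log excerpt in this report.

```python
from collections import Counter

# Substrings taken from the bootkube.sh lines quoted in this report.
MARKERS = {
    "unhealthy_endpoint": "is unhealthy: failed to commit proposal",
    "unhealthy_cluster": "Error: unhealthy cluster",
    "retry": "etcdctl failed. Retrying in",
}


def count_etcd_failures(lines):
    """Count etcd failure markers in bootkube.sh journal lines.

    Hypothetical helper: lines not emitted by bootkube.sh are skipped.
    """
    counts = Counter()
    for line in lines:
        if "bootkube.sh[" not in line:
            continue
        for name, marker in MARKERS.items():
            if marker in line:
                counts[name] += 1
    return counts


# Sample input: the three plain-text lines from the excerpt in this report.
SAMPLE = [
    "Dec 08 17:40:38 bootstrap-0.pok-239-dec-qemu.ocptest.pok.stglabs.ibm.com bootkube.sh[2345]: https://localhost:2379 is unhealthy: failed to commit proposal: context deadline exceeded",
    "Dec 08 17:40:38 bootstrap-0.pok-239-dec-qemu.ocptest.pok.stglabs.ibm.com bootkube.sh[2345]: Error: unhealthy cluster",
    "Dec 08 17:40:38 bootstrap-0.pok-239-dec-qemu.ocptest.pok.stglabs.ibm.com bootkube.sh[2345]: etcdctl failed. Retrying in 5 seconds...",
]

if __name__ == "__main__":
    print(dict(count_etcd_failures(SAMPLE)))
```

A steadily growing `retry` count across successive snapshots of the log would be consistent with the loop described above.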

      Actual results:
      The etcd server is unavailable.

      Expected results:
      A healthy etcd server that accepts connections.

      Additional info:
      The etcd issue was first reported in Bugzilla 2029289. On KVM we hit two issues. The first, common to KVM and zVM, was resolved by consolidating the pullSecret stanza. The second is the etcd failure detailed above, which occurs only on KVM.

            People

              chanphil Philip Chan (Inactive)