OpenShift Bugs / OCPBUGS-11533

SNO installation fails when more than 1 IP address is present on node


    • Bug
    • Resolution: Won't Do
    • Normal
    • None
    • 4.10.z
    • Etcd
    • None
    • Moderate
    • No
    • False

      Description of problem:

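      When a customer attempts a single-node OpenShift (SNO) installation on a node that has more than one IP address, the API server fails to reach etcd because it connects to the wrong address (16.1.15.2 instead of the etcd pod IP, 10.240.92.13). The etcd serving certificate is not valid for that address, so the TLS handshake is rejected and the installation eventually fails.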

      Version-Release number of selected component (if applicable):

       

      How reproducible:

      Always

      Steps to Reproduce:

      1. Try to install SNO on a node with two IP addresses.
      

      Actual results:

      API server excerpt:
      $ oc logs -n openshift-apiserver apiserver-559d986669-ps2sx openshift-apiserver
      2023-03-31T12:32:56.028893956Z Copying system trust bundle
      2023-03-31T12:32:56.087920219Z I0331 12:32:56.087824       1 dynamic_serving_content.go:112] "Loaded a new cert/key pair" name="serving-cert::/var/run/secrets/serving-cert/tls.crt::/var/run/secrets/serving-cert/tls.key"
      2023-03-31T12:32:56.451517004Z I0331 12:32:56.451485       1 requestheader_controller.go:244] Loaded a new request header values for RequestHeaderAuthRequestController
      2023-03-31T12:32:56.452644985Z I0331 12:32:56.452624       1 audit.go:353] Using audit backend: ignoreErrors<log>
      2023-03-31T12:32:56.453520829Z I0331 12:32:56.453491       1 plugins.go:84] "Registered admission plugin" plugin="NamespaceLifecycle"
      2023-03-31T12:32:56.453529906Z I0331 12:32:56.453523       1 plugins.go:84] "Registered admission plugin" plugin="ValidatingAdmissionWebhook"
      2023-03-31T12:32:56.453536253Z I0331 12:32:56.453530       1 plugins.go:84] "Registered admission plugin" plugin="MutatingAdmissionWebhook"
      2023-03-31T12:32:56.453902013Z I0331 12:32:56.453890       1 admission.go:48] Admission plugin "project.openshift.io/ProjectRequestLimit" is not configured so it will be disabled.
      2023-03-31T12:32:56.454432959Z I0331 12:32:56.454421       1 plugins.go:158] Loaded 5 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,build.openshift.io/BuildConfigSecretInjector,image.openshift.io/ImageLimitRange,image.openshift.io/ImagePolicy,MutatingAdmissionWebhook.
      2023-03-31T12:32:56.454439216Z I0331 12:32:56.454431       1 plugins.go:161] Loaded 9 validating admission controller(s) successfully in the following order: OwnerReferencesPermissionEnforcement,build.openshift.io/BuildConfigSecretInjector,build.openshift.io/BuildByStrategy,image.openshift.io/ImageLimitRange,image.openshift.io/ImagePolicy,quota.openshift.io/ClusterResourceQuota,route.openshift.io/RequiredRouteAnnotations,ValidatingAdmissionWebhook,ResourceQuota.
      2023-03-31T12:32:56.460960855Z W0331 12:32:56.460948       1 clientconn.go:1331] [core] grpc: addrConn.createTransport failed to connect to {16.1.15.2:2379 16.1.15.2 <nil> 0 <nil>}. Err: connection error: desc = "transport: authentication handshake failed: x509: certificate is valid for ::1, 10.240.92.13, 127.0.0.1, ::1, not 16.1.15.2". Reconnecting...
      2023-03-31T12:32:57.465246823Z W0331 12:32:57.465139       1 clientconn.go:1331] [core] grpc: addrConn.createTransport failed to connect to {16.1.15.2:2379 16.1.15.2 <nil> 0 <nil>}. Err: connection error: desc = "transport: authentication handshake failed: x509: certificate is valid for ::1, 10.240.92.13, 127.0.0.1, ::1, not 16.1.15.2". Reconnecting...
      2023-03-31T12:32:58.783470542Z W0331 12:32:58.783435       1 clientconn.go:1331] [core] grpc: addrConn.createTransport failed to connect to {16.1.15.2:2379 16.1.15.2 <nil> 0 <nil>}. Err: connection error: desc = "transport: authentication handshake failed: x509: certificate is valid for ::1, 10.240.92.13, 127.0.0.1, ::1, not 16.1.15.2". Reconnecting...
      2023-03-31T12:33:01.325191617Z W0331 12:33:01.325091       1 clientconn.go:1331] [core] grpc: addrConn.createTransport failed to connect to {16.1.15.2:2379 16.1.15.2 <nil> 0 <nil>}. Err: connection error: desc = "transport: authentication handshake failed: x509: certificate is valid for ::1, 10.240.92.13, 127.0.0.1, ::1, not 16.1.15.2". Reconnecting...
      2023-03-31T12:33:05.609634050Z W0331 12:33:05.609512       1 clientconn.go:1331] [core] grpc: addrConn.createTransport failed to connect to {16.1.15.2:2379 16.1.15.2 <nil> 0 <nil>}. Err: connection error: desc = "transport: authentication handshake failed: x509: certificate is valid for ::1, 10.240.92.13, 127.0.0.1, ::1, not 16.1.15.2". Reconnecting...
      2023-03-31T12:33:12.663589220Z W0331 12:33:12.663484       1 clientconn.go:1331] [core] grpc: addrConn.createTransport failed to connect to {16.1.15.2:2379 16.1.15.2 <nil> 0 <nil>}. Err: connection error: desc = "transport: authentication handshake failed: x509: certificate is valid for ::1, 10.240.92.13, 127.0.0.1, ::1, not 16.1.15.2". Reconnecting...
      2023-03-31T12:33:16.457605396Z F0331 12:33:16.457507       1 openshift_apiserver.go:379] context deadline exceeded
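
The repeated warnings above each carry two facts: the address the client dialed and the addresses the etcd certificate covers. A minimal sketch of pulling both out of one line follows; the function name `parse_handshake_failure` and the regular expression are illustrative assumptions written against the message format shown in this excerpt, not part of any OpenShift tooling.

```python
import re

# One warning line copied from the excerpt above (timestamp prefix omitted).
LINE = ('W0331 12:32:56.460948       1 clientconn.go:1331] [core] grpc: '
        'addrConn.createTransport failed to connect to '
        '{16.1.15.2:2379 16.1.15.2 <nil> 0 <nil>}. Err: connection error: '
        'desc = "transport: authentication handshake failed: x509: '
        'certificate is valid for ::1, 10.240.92.13, 127.0.0.1, ::1, '
        'not 16.1.15.2". Reconnecting...')

# Hypothetical pattern, written only against the message format seen here.
PATTERN = re.compile(
    r'failed to connect to \{(?P<host>\S+):(?P<port>\d+) '
    r'.*?certificate is valid for (?P<sans>.+?), not (?P<rejected>[^"]+)"'
)


def parse_handshake_failure(line):
    """Return the dialed address and the certificate SANs, or None."""
    match = PATTERN.search(line)
    if match is None:
        return None
    # The SAN list in the log repeats "::1", so de-duplicate it.
    sans = {entry.strip() for entry in match.group("sans").split(",")}
    return {
        "dialed": match.group("host"),
        "port": int(match.group("port")),
        "sans": sans,
        "rejected": match.group("rejected"),
    }


info = parse_handshake_failure(LINE)
print(info["dialed"], "covered by certificate:", info["dialed"] in info["sans"])
```

Applied to the line above, the dialed address is not among the certificate's addresses, which matches the handshake failure reported in the log.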
      
      Node IPs:
      $ oc get node -o wide
      NAME   STATUS   ROLES           AGE   VERSION            INTERNAL-IP    EXTERNAL-IP   OS-IMAGE                                                        KERNEL-VERSION                 CONTAINER-RUNTIME
      vdu    Ready    master,worker   14h   v1.23.12+8a6bfe4   10.240.92.13   <none>        Red Hat Enterprise Linux CoreOS 410.84.202212061416-0 (Ootpa)   4.18.0-305.72.1.el8_4.x86_64   cri-o://1.23.3-20.rhaos4.10.git89344de.el8
      
      $ oc get node -o wide -o yaml | grep host-addresses
            k8s.ovn.org/host-addresses: '["10.240.92.13","16.1.15.2"]'
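
To relate the node annotation to the certificate error, the sketch below checks which of the node's host addresses are missing from the certificate's address list. The helper name `uncovered_addresses` is hypothetical, and the SAN list is copied by hand from the x509 message in the API server excerpt rather than read from the certificate itself.

```python
import ipaddress
import json


def uncovered_addresses(annotation_value, cert_sans):
    """List host addresses that do not appear among the certificate SANs."""
    hosts = [ipaddress.ip_address(a) for a in json.loads(annotation_value)]
    # Parsing through ipaddress normalizes spelling and drops duplicates.
    covered = {ipaddress.ip_address(s) for s in cert_sans}
    return [str(h) for h in hosts if h not in covered]


# Value of k8s.ovn.org/host-addresses as shown above.
annotation = '["10.240.92.13","16.1.15.2"]'
# Addresses named in the x509 error message (assumed complete).
sans = ["::1", "10.240.92.13", "127.0.0.1", "::1"]

print(uncovered_addresses(annotation, sans))
```

With the values from this report, the only uncovered host address is the second one, 16.1.15.2.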
      
      KubeAPIServer:
      $ oc get KubeAPIServer cluster -o yaml | grep -i etcd-servers -A2
            etcd-servers:
            - https://16.1.15.2:2379
            - https://localhost:2379
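
The mismatch is visible by comparing each `etcd-servers` entry with the etcd pod IP listed in the ETCD section of this report. A small sketch of that comparison follows; `classify_etcd_servers` and its three labels are hypothetical names chosen for illustration.

```python
from urllib.parse import urlsplit


def classify_etcd_servers(urls, etcd_pod_ip):
    """Label each etcd server URL relative to the etcd pod IP."""
    labels = {}
    for url in urls:
        host = urlsplit(url).hostname
        if host == "localhost":
            labels[url] = "loopback"
        elif host == etcd_pod_ip:
            labels[url] = "pod-ip"
        else:
            labels[url] = "mismatch"
    return labels


# etcd-servers from the KubeAPIServer resource and the etcd-vdu pod IP.
servers = ["https://16.1.15.2:2379", "https://localhost:2379"]
for url, label in classify_etcd_servers(servers, "10.240.92.13").items():
    print(url, label)
```

For this cluster, neither entry points at the pod IP: one is the loopback name and the other is the node's second address.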
      
      ETCD:
      $ oc get po -n openshift-etcd -o wide
      NAME              READY   STATUS      RESTARTS   AGE   IP             NODE
      etcd-vdu          4/5     Running     166        14h   10.240.92.13   vdu
      installer-2-vdu   0/1     Completed   0          14h   10.128.0.39    vdu
      installer-3-vdu   0/1     Completed   0          14h   10.128.0.51    vdu
      
      $ oc get ep -n openshift-etcd etcd -o yaml | yq .subsets
      [
        {
          "notReadyAddresses": [
            {
              "ip": "10.240.92.13",
              "nodeName": "vdu",
              "targetRef": {
                "kind": "Pod",
                "name": "etcd-vdu",
                "namespace": "openshift-etcd",
                "resourceVersion": "581549",
                "uid": "a63c5723-6b94-4b15-90fd-bbcfcdf88e81"
              }
            }
          ],
          "ports": [
            {
              "name": "etcd",
              "port": 2379,
              "protocol": "TCP"
            },
            {
              "name": "etcd-metrics",
              "port": 9979,
              "protocol": "TCP"
            }
          ]
        }
      ]
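
The endpoint output above lists the etcd pod only under `notReadyAddresses`, consistent with the 4/5 ready containers. As a sketch, the ready and not-ready addresses can be separated from the `subsets` structure as follows; `endpoint_readiness` is a hypothetical helper and the input is a trimmed copy of the output above.

```python
import json

# Trimmed copy of the `.subsets` output shown above.
SUBSETS = json.loads("""
[
  {
    "notReadyAddresses": [
      {"ip": "10.240.92.13", "nodeName": "vdu"}
    ],
    "ports": [
      {"name": "etcd", "port": 2379, "protocol": "TCP"},
      {"name": "etcd-metrics", "port": 9979, "protocol": "TCP"}
    ]
  }
]
""")


def endpoint_readiness(subsets):
    """Return (ready_ips, not_ready_ips) collected across all subsets."""
    ready, not_ready = [], []
    for subset in subsets:
        # Either key may be absent from a subset, so default to empty lists.
        ready += [a["ip"] for a in subset.get("addresses", [])]
        not_ready += [a["ip"] for a in subset.get("notReadyAddresses", [])]
    return ready, not_ready


ready_ips, not_ready_ips = endpoint_readiness(SUBSETS)
print("ready:", ready_ips, "not ready:", not_ready_ips)
```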
      
         

      Expected results:

      The API server should contact etcd on its pod IP address (10.240.92.13), and the installation should succeed.

      Additional info:

       

            dwest@redhat.com Dean West
            fcristin1@redhat.com Francesco Cristini
            ge liu ge liu