  1. OpenShift Bugs
  2. OCPBUGS-6075

nodelink controller silently fails to update node labels


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • 4.12, 4.11
    • Quality / Stability / Reliability

      Description of problem:

      When the user-defined labels in a MachineSet's .spec.template.spec.metadata.labels field are malformed (that is, they are not valid labels as defined by Kubernetes[0]), they cannot be copied to the related Node object. In these cases the nodelink-controller does not show any error in its logs.
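
      As an illustration of what "malformed" means here, the following is a minimal sketch of a label value check, assuming the usual Kubernetes rules (at most 63 characters; empty, or alphanumeric at both ends with `-`, `_`, `.` and alphanumerics in between). The helper names `isValidLabelValue` and `invalidLabels` are hypothetical and are not part of the nodelink-controller; real code would more likely call `validation.IsValidLabelValue` from k8s.io/apimachinery. The two labels in `main` are the ones that appear in the logs of this report.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
)

// maxLabelValueLength is the length limit Kubernetes places on label values.
const maxLabelValueLength = 63

// labelValueRE approximates the Kubernetes label value syntax: empty, or
// alphanumeric at both ends with '-', '_', '.' and alphanumerics in between.
var labelValueRE = regexp.MustCompile(`^(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?$`)

// isValidLabelValue is a hypothetical helper that reports whether value
// satisfies the rules above.
func isValidLabelValue(value string) bool {
	return len(value) <= maxLabelValueLength && labelValueRE.MatchString(value)
}

// invalidLabels is a hypothetical helper that returns, in sorted order, the
// keys whose values fail isValidLabelValue. Keys themselves are not checked.
func invalidLabels(labels map[string]string) []string {
	var bad []string
	for key, value := range labels {
		if !isValidLabelValue(value) {
			bad = append(bad, key)
		}
	}
	sort.Strings(bad)
	return bad
}

func main() {
	labels := map[string]string{
		"foo": "bar/baz",
		"machine.openshift.io/interruptible-instance": "",
	}
	for _, key := range invalidLabels(labels) {
		fmt.Printf("label %q has invalid value %q\n", key, labels[key])
	}
}
```

      Under these assumptions only `foo` is reported, because its value contains a `/`; the empty value of the second label is allowed.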

      Version-Release number of selected component (if applicable):

      Confirmed on 4.12 and 4.11; earlier versions may also be affected.

      How reproducible:

      always

      Steps to Reproduce:

      1. Add the label `foo: bar/baz` to the .spec.metadata.labels field of a Machine object.
      2. Check the related Node object to see whether the label has been added to its .metadata.labels field.
      3. Check the nodelink-controller logs for signs of failure.
      

      Actual results:

      When testing a default deployment of the nodelink-controller with the verbosity flag set to `--v=3`, the log output looks like this:
      
      I0119 22:26:15.468903       1 nodelink_controller.go:412] Finding machine from node "ip-10-0-250-158.us-east-2.compute.internal"
      I0119 22:26:15.468910       1 nodelink_controller.go:429] Finding machine from node "ip-10-0-250-158.us-east-2.compute.internal" by ProviderID
      I0119 22:26:15.469011       1 nodelink_controller.go:445] Found machine "ci-ln-lhbdlf2-76ef8-7wtws-worker-us-east-2c-qjrfr" for node "ip-10-0-250-158.us-east-2.compute.internal" with providerID "aws:///us-east-2c/i-0feac3583b8e4d7c9"
      I0119 22:26:15.469063       1 nodelink_controller.go:250] Node "ip-10-0-250-158.us-east-2.compute.internal" has changed, updating
      
      If the verbosity is increased to `--v=4`, the logs change to this:
      
      I0119 22:30:43.687109       1 nodelink_controller.go:412] Finding machine from node "ip-10-0-250-158.us-east-2.compute.internal"
      I0119 22:30:43.687115       1 nodelink_controller.go:429] Finding machine from node "ip-10-0-250-158.us-east-2.compute.internal" by ProviderID
      I0119 22:30:43.687150       1 nodelink_controller.go:445] Found machine "ci-ln-lhbdlf2-76ef8-7wtws-worker-us-east-2c-qjrfr" for node "ip-10-0-250-158.us-east-2.compute.internal" with providerID "aws:///us-east-2c/i-0feac3583b8e4d7c9"
      I0119 22:30:43.687198       1 nodelink_controller.go:243] Copying label foo = bar/baz
      I0119 22:30:43.687206       1 nodelink_controller.go:243] Copying label machine.openshift.io/interruptible-instance = 
      I0119 22:30:43.687221       1 nodelink_controller.go:250] Node "ip-10-0-250-158.us-east-2.compute.internal" has changed, updating
      
      From these messages it appears that the labels are being copied and the Node updated; nothing indicates a failure.
      
      If we then increase the log verbosity to `--v=11`, the true error emerges:
      
      I0119 22:38:03.592617       1 round_trippers.go:466] curl -v -XPUT  -H "Accept: application/vnd.kubernetes.protobuf, */*" -H "Content-Type: application/vnd.kubernetes.protobuf" -H "User-Agent: nodelink-controller/v0.0.0 (linux/amd64) kubernetes/$Format" -H "Authorization: Bearer <masked>" 'https://172.30.0.1:443/api/v1/nodes/ip-10-0-250-158.us-east-2.compute.internal'
      I0119 22:38:03.602130       1 round_trippers.go:553] PUT https://172.30.0.1:443/api/v1/nodes/ip-10-0-250-158.us-east-2.compute.internal 422 Unprocessable Entity in 9 milliseconds
      I0119 22:38:03.602151       1 round_trippers.go:570] HTTP Statistics: GetConnection 0 ms ServerProcessing 9 ms Duration 9 ms
      I0119 22:38:03.602159       1 round_trippers.go:577] Response Headers:
      I0119 22:38:03.602166       1 round_trippers.go:580]     Cache-Control: no-cache, private
      I0119 22:38:03.602173       1 round_trippers.go:580]     Content-Type: application/vnd.kubernetes.protobuf
      I0119 22:38:03.602179       1 round_trippers.go:580]     Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
      I0119 22:38:03.602186       1 round_trippers.go:580]     X-Kubernetes-Pf-Flowschema-Uid: a11a6e32-ed56-4b16-b80c-cadffc291b7a
      I0119 22:38:03.602192       1 round_trippers.go:580]     X-Kubernetes-Pf-Prioritylevel-Uid: 5a06dd69-d0d1-43d3-948d-83c6fb7fb1cc
      I0119 22:38:03.602198       1 round_trippers.go:580]     Content-Length: 1838
      I0119 22:38:03.602205       1 round_trippers.go:580]     Date: Thu, 19 Jan 2023 22:38:03 GMT
      I0119 22:38:03.602212       1 round_trippers.go:580]     Audit-Id: dcbb1da4-a0dd-452b-8211-78078efb506f
      
      The API server returns a 422 Unprocessable Entity response because the submitted content is invalid. Diving into the error response body, we can see that label values are not allowed to contain a `/`. I am not reproducing the error output here (it is too verbose), but will attach the file.
      
      Looking at the code in the nodelink controller, it seems like we should be producing an error:
      
      nodelink_controller.go@249
      
          if !reflect.DeepEqual(node, modNode) {
              klog.V(3).Infof("Node %q has changed, updating", modNode.GetName())
              if err := r.client.Update(context.Background(), modNode); err != nil {
                  return reconcile.Result{}, fmt.Errorf("error updating node: %v", err)
              }
          }
      
      
      
      

      Expected results:

      The nodelink controller logs should show an error about the label being incorrect.

      Additional info:

       

              rmanak@redhat.com Radek Manak
              mimccune@redhat.com Michael McCune
              Huali Liu Huali Liu