OpenShift Bugs / OCPBUGS-76331

Two-Node Fencing (TNF) Setup Job Failure due to Hostname Length

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Normal
    • Version: 4.21
    • Component: Two Node Fencing
    • Sprint: OCPEDGE Sprint 284, OCPEDGE Sprint 285

      Description:
      The cluster-etcd-operator fails to create the tnf-after-setup-job when the cluster nodes have long hostnames (FQDNs). The operator builds a label value for the Job's pod template from the prefix tnf-after-setup-job- followed by the full node name; for long node names this value exceeds the Kubernetes limit of 63 characters for label values, and the Job is rejected.

      During the installation of Two-Node Fencing (TNF) on OpenShift (observed in version 4.21, deployed via ZTP), the etcd ClusterOperator remains in the Progressing: True state indefinitely.

      The output of oc describe co etcd shows: Message: tnf-after-setup-job-<hostname> Progressing: Job is running

      However, the Job and its Pods never appear in the openshift-etcd namespace. The cluster-etcd-operator logs contain a JobCreateFailed warning event: the generated label value exceeds the maximum allowed length of 63 characters, so the API server rejects the Job.

      The operator logs show the following validation error:

      I0205 13:20:14.344362 1 event.go:377] Event(...): type: 'Warning' reason: 'JobCreateFailed' 
      Failed to create Job.batch/tnf-after-setup-job-worker-01.cnf77.se-lab.eng.rdu2.dc.redhat.com -n openshift-etcd: 
      Job.batch "tnf-after-setup-job-worker-01.cnf77.se-lab.eng.rdu2.dc.redhat.com" is invalid: 
      spec.template.labels: Invalid value: "tnf-after-setup-job-worker-01.cnf77.se-lab.eng.rdu2.dc.redhat.com": 
      must be no more than 63 characters
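      The failing check can be reproduced without a cluster. The Go sketch below is a stdlib-only approximation of the Kubernetes label-value rule (upstream, this check is IsValidLabelValue in k8s.io/apimachinery/pkg/util/validation); validateLabelValue is a hypothetical name used only for illustration. Applied to the value from the log above, which is 65 characters long, it reports the same length error.

```go
package main

import (
	"fmt"
	"regexp"
)

// Kubernetes label values: at most 63 characters, and either empty or
// starting and ending with an alphanumeric character, with '-', '_' and '.'
// allowed in between.
const labelValueMaxLength = 63

var labelValueRegexp = regexp.MustCompile(`^(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?$`)

// validateLabelValue (hypothetical) returns the problems the API server
// would report for v as a label value; an empty slice means v is valid.
func validateLabelValue(v string) []string {
	var errs []string
	if len(v) > labelValueMaxLength {
		errs = append(errs, fmt.Sprintf("must be no more than %d characters", labelValueMaxLength))
	}
	if !labelValueRegexp.MatchString(v) {
		errs = append(errs, "must consist of alphanumeric characters, '-', '_' or '.', and must start and end with an alphanumeric character")
	}
	return errs
}

func main() {
	// Value taken from the JobCreateFailed event in the operator log.
	name := "tnf-after-setup-job-" + "worker-01.cnf77.se-lab.eng.rdu2.dc.redhat.com"
	fmt.Println(len(name), validateLabelValue(name))
}
```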

      Expected Results:
      The operator should handle long hostnames by one of the following:

      • Truncating the hostname used in labels.
      • Using a hash of the hostname for the label value.
      • Ensuring the label value conforms to the DNS_LABEL (RFC 1123 label) rules described in the Kubernetes documentation.
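      The three options above trade readability against robustness in different ways. The stdlib-only Go sketch below implements each one naively and applies it to the Job name from the log; the function names and the 16-character digest length are illustrative assumptions, not the operator's code.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

const maxLabelValueLen = 63 // Kubernetes limit for label values

// Option 1: keep the first 63 bytes (hostnames are ASCII). Readable, but two
// values sharing their first 63 characters collide, and the cut may land on
// '.', '-' or '_', which a label value must not end with.
func truncateValue(v string) string {
	if len(v) <= maxLabelValueLen {
		return v
	}
	return v[:maxLabelValueLen]
}

// Option 2: replace the value with a short hex digest of it. Always short
// and syntactically valid, but no longer human-readable.
func hashValue(v string) string {
	sum := sha256.Sum256([]byte(v))
	return hex.EncodeToString(sum[:])[:16]
}

// Option 3: coerce the value into an RFC 1123 DNS label: lowercase
// alphanumerics and '-', at most 63 characters, alphanumeric at both ends.
func dnsLabelValue(v string) string {
	var b strings.Builder
	for _, r := range strings.ToLower(v) {
		if (r >= 'a' && r <= 'z') || (r >= '0' && r <= '9') {
			b.WriteRune(r)
		} else {
			b.WriteByte('-') // '.', '_' and anything else become '-'
		}
	}
	return strings.Trim(truncateValue(b.String()), "-")
}

func main() {
	name := "tnf-after-setup-job-worker-01.cnf77.se-lab.eng.rdu2.dc.redhat.com"
	fmt.Println(truncateValue(name))
	fmt.Println(hashValue(name))
	fmt.Println(dnsLabelValue(name))
}
```

      Plain truncation keeps the name recognizable but is not collision-safe on its own; hashing is always valid but opaque; the DNS-label form is the strictest of the three, since ordinary label values also allow uppercase letters, '_' and '.'.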

      Steps to Reproduce:

      1. Deploy an OpenShift cluster (version 4.21) with Two-Node Fencing enabled.
      2. Use node hostnames (FQDNs) longer than 43 characters, so that the prefix tnf-after-setup-job- (20 characters) plus the hostname exceeds 63 characters.
      3. Monitor the cluster-etcd-operator logs in the openshift-etcd-operator namespace and the etcd ClusterOperator status.
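      The length threshold can be checked before a deployment. A minimal Go sketch, assuming the tnf-after-setup-job- prefix seen in the log; affected and maxHostnameLen are hypothetical names, and master-0.example.com is a made-up short hostname included for contrast.

```go
package main

import "fmt"

const (
	jobPrefix        = "tnf-after-setup-job-" // prefix seen in the operator log
	maxLabelValueLen = 63                     // Kubernetes limit for label values

	// Longest node name that still fits after the prefix: 63 - 20 = 43.
	maxHostnameLen = maxLabelValueLen - len(jobPrefix)
)

// affected reports whether prefix+hostname would exceed the label limit.
func affected(hostname string) bool {
	return len(jobPrefix)+len(hostname) > maxLabelValueLen
}

func main() {
	fmt.Println("longest safe hostname:", maxHostnameLen) // 43
	for _, h := range []string{
		"worker-01.cnf77.se-lab.eng.rdu2.dc.redhat.com", // from the bug report, 45 characters
		"master-0.example.com",                          // hypothetical short name
	} {
		fmt.Printf("%-50s len=%d affected=%v\n", h, len(h), affected(h))
	}
}
```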

      Suggested Fix:
      The logic in the cluster-etcd-operator that generates the TNF Job manifest should use a helper function that sanitizes and truncates the label values in spec.template.labels so that they never exceed 63 characters.
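      A minimal sketch of such a helper in Go, the language the operator is written in. safeLabelValue, hashLen and the truncate-then-append-hash scheme are assumptions for illustration, not the operator's actual code. Because the rejected label value in the log is identical to the Job name, the Job name would likely have to be shortened the same way; that is an inference from the log, not something the report confirms.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"regexp"
	"strings"
)

const (
	maxLabelValueLen = 63 // Kubernetes limit for label values
	hashLen          = 8  // hex characters of the SHA-256 digest kept as a suffix (assumption)
)

// Kubernetes label-value syntax: empty, or alphanumeric at both ends with
// '-', '_' and '.' allowed in between.
var labelValueRe = regexp.MustCompile(`^(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?$`)

// safeLabelValue (hypothetical) returns prefix+name unchanged when it is
// already a valid label value. Otherwise it replaces disallowed characters,
// truncates to leave room for "-<hash>", trims separators at both ends and
// appends a short hash of the original value, so long names that share the
// same truncated head still get distinct values (barring a hash collision).
func safeLabelValue(prefix, name string) string {
	full := prefix + name
	if len(full) <= maxLabelValueLen && labelValueRe.MatchString(full) {
		return full
	}

	sum := sha256.Sum256([]byte(full))
	suffix := hex.EncodeToString(sum[:])[:hashLen]

	// Map every character a label value cannot contain to '-'.
	head := strings.Map(func(r rune) rune {
		switch {
		case r >= 'a' && r <= 'z', r >= 'A' && r <= 'Z', r >= '0' && r <= '9',
			r == '-', r == '_', r == '.':
			return r
		default:
			return '-'
		}
	}, full)

	// Reserve room for "-" plus the suffix; head is ASCII now, so byte
	// slicing cannot split a character.
	if limit := maxLabelValueLen - 1 - hashLen; len(head) > limit {
		head = head[:limit]
	}
	// A label value must start and end with an alphanumeric character.
	head = strings.Trim(head, "-_.")
	if head == "" {
		return suffix
	}
	return head + "-" + suffix
}

func main() {
	host := "worker-01.cnf77.se-lab.eng.rdu2.dc.redhat.com" // from the bug report
	v := safeLabelValue("tnf-after-setup-job-", host)
	fmt.Println(v, len(v), labelValueRe.MatchString(v))
}
```

      With the hostname from the log, the readable head runs up to ...rdu2.dc and the result is exactly 63 characters. Short names pass through unchanged, so existing clusters would keep their current Job names; the suffix length is a trade-off between readability and collision risk.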

      People: Francesco Cappa (rh-ee-fcappa), Alberto Losada (alosadag@redhat.com), Douglas Hensel