Uploaded image for project: 'OpenShift Bugs'
  1. OpenShift Bugs
  2. OCPBUGS-4552

Backport 4.10: Guard Pod Hostnames Too Long and Truncated Down Into Collisions With Other Masters

XMLWordPrintable

    • Important
    • None
    • Build + Jenkins Sprint 231, Build + Jenkins Sprint 232, Build + Jenkins Sprint 233
    • 3
    • Rejected
    • False
    • Hide

      None

      Show
      None
    • N/A
    • Bug Fix
    • Done

      Discovered in the must gather kubelet_service.log from https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.12-upgrade-from-stable-4.11-e2e-gcp-sdn-upgrade/1586093220087992320

      It appears the guard pod names are too long, and being truncated down to where they will collide with those from the other masters.

      From kubelet logs in this run:

      ❯ grep openshift-kube-scheduler-guard-ci-op-3hj6pnwf-4f6ab-lv57z-maste kubelet_service.log
      Oct 28 23:58:55.693391 ci-op-3hj6pnwf-4f6ab-lv57z-master-1 kubenswrapper[1657]: E1028 23:58:55.693346    1657 kubelet_pods.go:413] "Hostname for pod was too long, truncated it" podName="openshift-kube-scheduler-guard-ci-op-3hj6pnwf-4f6ab-lv57z-master-1" hostnameMaxLen=63 truncatedHostname="openshift-kube-scheduler-guard-ci-op-3hj6pnwf-4f6ab-lv57z-maste"
      Oct 28 23:59:03.735726 ci-op-3hj6pnwf-4f6ab-lv57z-master-0 kubenswrapper[1670]: E1028 23:59:03.735671    1670 kubelet_pods.go:413] "Hostname for pod was too long, truncated it" podName="openshift-kube-scheduler-guard-ci-op-3hj6pnwf-4f6ab-lv57z-master-0" hostnameMaxLen=63 truncatedHostname="openshift-kube-scheduler-guard-ci-op-3hj6pnwf-4f6ab-lv57z-maste"
      Oct 28 23:59:11.168082 ci-op-3hj6pnwf-4f6ab-lv57z-master-2 kubenswrapper[1667]: E1028 23:59:11.168041    1667 kubelet_pods.go:413] "Hostname for pod was too long, truncated it" podName="openshift-kube-scheduler-guard-ci-op-3hj6pnwf-4f6ab-lv57z-master-2" hostnameMaxLen=63 truncatedHostname="openshift-kube-scheduler-guard-ci-op-3hj6pnwf-4f6ab-lv57z-maste"
      

      This also looks to be happening for openshift-kube-scheduler-guard, kube-controller-manager-guard, possibly others.

      Looks like they should be truncated further to make room for random suffixes in https://github.com/openshift/library-go/blame/bd9b0e19121022561dcd1d9823407cd58b2265d0/pkg/operator/staticpod/controller/guard/guard_controller.go#L97-L98

      Unsure of the implications here, it looks a little scary.

            rh-ee-lseveroa Lucas Severo Alves
            rhn-engineering-dgoodwin Devan Goodwin
            Corey Daley Corey Daley
            Votes:
            0 Vote for this issue
            Watchers:
            10 Start watching this issue

              Created:
              Updated:
              Resolved: