OCPBUGS-5789: KCM container is not restarted on failure

Details

    • Type: Bug
    • Resolution: Can't Do
    • Priority: Normal
    • Affects Version/s: 4.13.0
    • Component/s: Node / Kubelet
    • Sprint: OCPNODE Sprint 233 (Blue), OCPNODE Sprint 234 (Blue)

    Description

      Description of problem:

      Discovered in KCM Bugzilla https://bugzilla.redhat.com/show_bug.cgi?id=2100171.

      During a time-chaos test, the kube-controller-manager cluster operator becomes degraded.
      The clock skew causes the kube-controller-manager (KCM) container to crash, but the container is not restarted by the kubelet afterwards.
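
      The symptom shows up in the pod's containerStatuses: the kube-controller-manager container sits in a waiting state with reason RunContainerError (see Additional info). A minimal check, assuming the usual static pod name kube-controller-manager-<node-name>:

      # Show the container's state and restart count for the KCM static pod
      oc get pod -n openshift-kube-controller-manager \
        kube-controller-manager-<node-name> -o json \
        | jq '.status.containerStatuses[] | {name, restartCount, state}'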

      Version-Release number of selected component (if applicable):

      4.13.0-0.ci-2023-01-11-123219
      4.11.0-0.nightly-2022-06-22-015220

      How reproducible:

      100%

      Steps to Reproduce:

      1. Skew the clock inside the openshift-apiserver pods:

      for pod in $(oc get pods -o name -n openshift-apiserver); do
        oc exec "$pod" -n openshift-apiserver -- date -s 01:01:01 &
      done

      2. Wait for some time until KCM gets degraded.
      3. Watch: oc get co kube-controller-manager
      4. Examine KCM pods: oc get pods -n openshift-kube-controller-manager

      Or use kraken as mentioned in the original BZ. A convenience watch helper for steps 2-4 is sketched below.
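
      A sketch of such a helper (it just polls the Degraded condition of the kube-controller-manager ClusterOperator and lists the KCM pods every 30s; adjust to taste):

      # Poll until the operator reports Degraded=True, then inspect the pods
      while true; do
        oc get co kube-controller-manager \
          -o jsonpath='{.status.conditions[?(@.type=="Degraded")].status}{"\n"}'
        oc get pods -n openshift-kube-controller-manager
        sleep 30
      done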

      Actual results:

      The kube-controller-manager container exits but is not restarted by the kubelet; it stays in a waiting state with reason RunContainerError (see Additional info).

      Expected results:

      kube-controller-manager container gets restarted and is running again
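
      Until that happens automatically, one possible manual workaround (untested here, purely an assumption) is to remove the stale container on the node so the kubelet can recreate it; restarting the kubelet or crio on the affected node might achieve the same:

      # Stop and remove the stuck container reported by the kubelet (ID from Additional info)
      oc debug node/ip-10-0-183-145.eu-north-1.compute.internal -- chroot /host \
        crictl stop e396b5021f210b78e470a0161cd45eb34c650b7742538655e46403428e931903
      oc debug node/ip-10-0-183-145.eu-north-1.compute.internal -- chroot /host \
        crictl rm e396b5021f210b78e470a0161cd45eb34c650b7742538655e46403428e931903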

      Additional info:

      kubelet logs show:
      
      Jan 12 15:20:13.985008 ip-10-0-183-145 kubenswrapper[1485]: E0112 15:20:13.984813    1485 pod_workers.go:965] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"kube-controller-manager\" with RunContainerError: \"container e396b5021f210b78e470a0161cd45eb34c650b7742538655e46403428e931903 is not in created state: running\"" pod="openshift-kube-controller-manager/kube-controller-manager-ip-10-0-183-145.eu-north-1.compute.internal" podUID=40bae0f0fa982f6f44ead6c84dd248aa
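
      The same error can be pulled from the node without an ssh session (sketch; node name taken from the log line above):

      oc adm node-logs ip-10-0-183-145.eu-north-1.compute.internal -u kubelet | grep RunContainerError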
      
      
      kube-controller-manager container status after it fails:
      
        - containerID: cri-o://a1cc365f308c77e48ff6e523d95e301c1e62587144c4a2c5c7d613dd16830971
          image: registry.ci.openshift.org/ocp/4.13-2023-01-11-123219@sha256:f9d04e98d280566ac4c4dea26d6d7085c0e0640df9c2a3e462b4aef67bfe5ef0
          imageID: registry.ci.openshift.org/ocp/4.13-2023-01-11-123219@sha256:f9d04e98d280566ac4c4dea26d6d7085c0e0640df9c2a3e462b4aef67bfe5ef0
          lastState:
            terminated:
              containerID: cri-o://a1cc365f308c77e48ff6e523d95e301c1e62587144c4a2c5c7d613dd16830971
              exitCode: 1
              finishedAt: "2023-01-12T01:01:14Z"
              message: |
                atch *v1.PartialObjectMetadata: failed to list *v1.PartialObjectMetadata: Unauthorized
                E0112 01:01:12.271715       1 reflector.go:140] vendor/k8s.io/client-go/metadata/metadatainformer/informer.go:90: Failed to watch *v1.PartialObjectMetadata: the server has asked for the client to provide credentials
                W0112 01:01:12.760779       1 reflector.go:424] vendor/k8s.io/client-go/informers/factory.go:134: failed to list *v1.ResourceQuota: Unauthorized
                E0112 01:01:12.760838       1 reflector.go:140] vendor/k8s.io/client-go/informers/factory.go:134: Failed to watch *v1.ResourceQuota: failed to list *v1.ResourceQuota: Unauthorized
                W0112 01:01:13.341535       1 reflector.go:424] vendor/k8s.io/client-go/metadata/metadatainformer/informer.go:90: failed to list *v1.PartialObjectMetadata: Unauthorized
                E0112 01:01:13.341567       1 reflector.go:140] vendor/k8s.io/client-go/metadata/metadatainformer/informer.go:90: Failed to watch *v1.PartialObjectMetadata: failed to list *v1.PartialObjectMetadata: Unauthorized
                W0112 01:01:13.524521       1 reflector.go:424] vendor/k8s.io/client-go/metadata/metadatainformer/informer.go:90: failed to list *v1.PartialObjectMetadata: Unauthorized
                E0112 01:01:13.524546       1 reflector.go:140] vendor/k8s.io/client-go/metadata/metadatainformer/informer.go:90: Failed to watch *v1.PartialObjectMetadata: failed to list *v1.PartialObjectMetadata: Unauthorized
                E0112 01:01:14.629356       1 reflector.go:140] vendor/k8s.io/client-go/metadata/metadatainformer/informer.go:90: Failed to watch *v1.PartialObjectMetadata: the server has asked for the client to provide credentials
                E0112 01:01:14.667114       1 reflector.go:140] vendor/k8s.io/client-go/metadata/metadatainformer/informer.go:90: Failed to watch *v1.PartialObjectMetadata: the server has asked for the client to provide credentials
                I0112 01:01:14.693929       1 leaderelection.go:283] failed to renew lease kube-system/kube-controller-manager: timed out waiting for the condition
                E0112 01:01:14.694384       1 controllermanager.go:318] "leaderelection lost"
              reason: Error
              startedAt: "2023-01-12T14:37:02Z"
          name: kube-controller-manager
          ready: false
          restartCount: 2
          started: false
          state:
            waiting:
              message: 'container e396b5021f210b78e470a0161cd45eb34c650b7742538655e46403428e931903
                is not in created state: running'
              reason: RunContainerError
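
      To see the CRI-O side of the mismatch (the runtime apparently still considers the container from the waiting message running, while the kubelet refuses to start it), that container can be inspected on the node; a sketch, assuming oc debug access:

      oc debug node/ip-10-0-183-145.eu-north-1.compute.internal -- chroot /host \
        crictl inspect e396b5021f210b78e470a0161cd45eb34c650b7742538655e46403428e931903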

          People

            Assignee: Harshal Patil (harpatil@redhat.com)
            Reporter: Filip Krepinsky (fkrepins@redhat.com)
            QA Contact: Sunil Choudhary
            Votes: 0
            Watchers: 4
