-
Bug
-
Resolution: Not a Bug
-
Critical
-
None
-
4.12.z
-
Important
-
No
-
Sprint 244
-
1
-
False
-
Description of problem:
While doing an agent-based install (ABI) of 4.12.23, the cluster takes too long to install (approximately 20 hours). Here are our observations:

a) The OS is installed on all 3 servers, but only the master-0 (rendezvous host) node is accessible through SSH. The other nodes respond to ping.

b) While the issue was occurring, the API VIP and Ingress VIP addresses were not configured on the main interface of the master-0 (rendezvous host) node.

c) In the boot journal of master-0 (rendezvous host), the bootstrap-kube-controller-manager pod failed to start with a CrashLoopBackOff error, as shown below:

Oct 30 23:46:35 openshift-master-0 kubelet.sh[11122]: E1030 23:46:35.281509 11122 pod_workers.go:965] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"kube-controller-manager\" with CrashLoopBackOff: \"back-off 10s restarting failed container=kube-controller-manager pod=bootstrap-kube-controller-manager-openshift-master-0_kube-system(59bfa1a805b5bc1e621b485bee8944a6)\"" pod="kube-system/bootstrap-kube-controller-manager-openshift-master-0" podUID=59bfa1a805b5bc1e621b485bee8944a6
Oct 30 23:46:36 openshift-master-0 kubelet.sh[11122]: E1030 23:46:36.283747 11122 pod_workers.go:965] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"kube-controller-manager\" with CrashLoopBackOff: \"back-off 10s restarting failed container=kube-controller-manager pod=bootstrap-kube-controller-manager-openshift-master-0_kube-system(59bfa1a805b5bc1e621b485bee8944a6)\"" pod="kube-system/bootstrap-kube-controller-manager-openshift-master-0" podUID=59bfa1a805b5bc1e621b485bee8944a6
Oct 30 23:46:37 openshift-master-0 kubelet.sh[11122]: E1030 23:46:37.284910 11122 pod_workers.go:965] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"kube-controller-manager\" with CrashLoopBackOff: \"back-off 10s restarting failed container=kube-controller-manager pod=bootstrap-kube-controller-manager-openshift-master-0_kube-system(59bfa1a805b5bc1e621b485bee8944a6)\"" pod="kube-system/bootstrap-kube-controller-manager-openshift-master-0" podUID=59bfa1a805b5bc1e621b485bee8944a6

d) The following pods also failed to start, repeatedly, with CrashLoopBackOff errors on master-0 (rendezvous host):

Oct 31 01:42:32 openshift-master-0 kubelet.sh[11122]: E1031 01:42:32.210639 11122 pod_workers.go:965] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"cluster-version-operator\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=cluster-version-operator pod=bootstrap-cluster-version-operator-openshift-master-0_openshift-cluster-version(1fe02b9e38781cc99a4ffe0e9086726b)\"" pod="openshift-cluster-version/bootstrap-cluster-version-operator-openshift-master-0" podUID=1fe02b9e38781cc99a4ffe0e9086726b
Oct 31 01:42:33 openshift-master-0 kubelet.sh[11122]: E1031 01:42:33.211270 11122 pod_workers.go:965] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"kube-apiserver\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=kube-apiserver pod=bootstrap-kube-apiserver-openshift-master-0_openshift-kube-apiserver(f43d4c9183dea070316db1eeec8ee359)\"" pod="openshift-kube-apiserver/bootstrap-kube-apiserver-openshift-master-0" podUID=f43d4c9183dea070316db1eeec8ee359
Oct 31 01:42:44 openshift-master-0 kubelet.sh[11122]: E1031 01:42:44.211319 11122 pod_workers.go:965] "Error syncing pod, skipping" err="[failed to \"StartContainer\" for \"kube-controller-manager\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=kube-controller-manager pod=bootstrap-kube-controller-manager-openshift-master-0_kube-system(59bfa1a805b5bc1e621b485bee8944a6)\", failed to \"StartContainer\" for \"cluster-policy-controller\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=cluster-policy-controller pod=bootstrap-kube-controller-manager-openshift-master-0_kube-system(59bfa1a805b5bc1e621b485bee8944a6)\"]" pod="kube-system/bootstrap-kube-controller-manager-openshift-master-0" podUID=59bfa1a805b5bc1e621b485bee8944a6
Oct 31 01:42:45 openshift-master-0 kubelet.sh[11122]: E1031 01:42:45.211215 11122 pod_workers.go:965] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"cluster-version-operator\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=cluster-version-operator pod=bootstrap-cluster-version-operator-openshift-master-0_openshift-cluster-version(1fe02b9e38781cc99a4ffe0e9086726b)\"" pod="openshift-cluster-version/bootstrap-cluster-version-operator-openshift-master-0" podUID=1fe02b9e38781cc99a4ffe0e9086726b
Oct 31 01:42:56 openshift-master-0 kubelet.sh[11122]: E1031 01:42:56.211955 11122 pod_workers.go:965] "Error syncing pod, skipping" err="[failed to \"StartContainer\" for \"kube-controller-manager\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=kube-controller-manager pod=bootstrap-kube-controller-manager-openshift-master-0_kube-system(59bfa1a805b5bc1e621b485bee8944a6)\", failed to \"StartContainer\" for \"cluster-policy-controller\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=cluster-policy-controller pod=bootstrap-kube-controller-manager-openshift-master-0_kube-system(59bfa1a805b5bc1e621b485bee8944a6)\"]" pod="kube-system/bootstrap-kube-controller-manager-openshift-master-0" podUID=59bfa1a805b5bc1e621b485bee8944a6

e) We also observed previously that after approximately 16+ hours the cluster recovers on its own.
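To see at a glance which bootstrap containers are crash-looping and how far their back-off has grown, the CrashLoopBackOff entries quoted in observations c) and d) can be tallied with a short script. The following is only a rough sketch: the function name `summarize_crashloops` is hypothetical, and the regular expression assumes the journal lines keep the escaped-quote format shown in this report.

```python
import re
from collections import Counter

# Matches the escaped form seen in the kubelet journal lines of this report:
#   failed to \"StartContainer\" for \"<container>\" with CrashLoopBackOff: \"back-off <delay> restarting ...
# One journal line may carry several such errors (bracketed list), so we scan with finditer.
PATTERN = re.compile(
    r'failed to \\"StartContainer\\" for \\"(?P<container>[^"\\]+)\\" '
    r'with CrashLoopBackOff: \\"back-off (?P<delay>\S+) restarting'
)


def summarize_crashloops(lines):
    """Return (counts, last_delay) keyed by container name.

    counts     -- how many CrashLoopBackOff errors were logged per container
    last_delay -- the most recent back-off delay string seen per container
    """
    counts = Counter()
    last_delay = {}
    for line in lines:
        for match in PATTERN.finditer(line):
            name = match.group("container")
            counts[name] += 1
            last_delay[name] = match.group("delay")
    return counts, last_delay
```

The function accepts any iterable of strings, so it can be given an open file containing a saved copy of the boot journal from the rendezvous host. A back-off that has reached `5m0s` indicates the container has been failing repeatedly for some time rather than on a single restart.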
Version-Release number of selected component (if applicable):
How reproducible:
Always, in the customer's environment
Steps to Reproduce:
1. Create the agent-based ISO.
2. Boot the system with the ISO.
3. Observe the installation, which eventually fails.
Actual results:
The installation takes 16+ hours to recover and finish installing the cluster, and even then the cluster is unstable.
Expected results:
The installation should not fail and should not take this long.
Additional info: