Details
-
Bug
-
Resolution: Obsolete
-
Normal
-
None
-
4.11
-
Moderate
-
Sprint 234, Sprint 235, Sprint 236, Sprint 237, Sprint 238, Sprint 239, Sprint 240
-
7
-
Rejected
-
Unspecified
Description
Description of problem:
In OSP clusters with both SDN and OVN, the preempt delay is not working as expected.
The master that is rebooted does not become master again after the preempt delay expires.
Version-Release number of selected component (if applicable):
OCP 4.11 (SDN and OVN)
How reproducible:
Steps to Reproduce:
1. Create a ServiceAccount for ipfailover and add SCC permissions
$ oc create sa ipfailover
$ oc adm policy add-scc-to-user privileged -z ipfailover
$ oc adm policy add-scc-to-user hostnetwork -z ipfailover
2. Create ipfailover through a deployment
$ oc create -f https://github.com/jechen0648/ipfailover/blob/main/deploy-ipfailover.yaml
melvinjoseph@mjoseph-mac openshift-tests-private % oc get all
NAME READY STATUS RESTARTS AGE
pod/ipf-41030-898cf9c58-rhnbh 1/1 Running 0 12m
pod/ipf-41030-898cf9c58-tkpqm 1/1 Running 0 12m
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/ipf-41030 2/2 2 2 13m
NAME DESIRED CURRENT READY AGE
replicaset.apps/ipf-41030-5c7bdb694c 2 2 2 12m
3. Find the master and backup nodes.
melvinjoseph@mjoseph-mac openshift-tests-private % oc logs ipf-41030-898cf9c58-rhnbh | grep Entering
Wed Aug 10 08:28:11 2022: (ipfailover_VIP_1) Entering BACKUP STATE
Wed Aug 10 08:28:34 2022: (ipfailover_VIP_1) Entering MASTER STATE
melvinjoseph@mjoseph-mac openshift-tests-private % oc logs ipf-41030-898cf9c58-tkpqm | grep Entering
Wed Aug 10 08:28:34 2022: (ipfailover_VIP_1) Entering BACKUP STATE
Pod ipf-41030-898cf9c58-rhnbh is master
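The check above reads the whole transition history and picks the master by eye. A minimal sketch of the same check as a helper that prints only the most recent state is shown here; the function name `last_vrrp_state` is hypothetical and not part of ipfailover or `oc`, and the sample input is copied from the log lines above.

```shell
#!/bin/sh
# Hypothetical helper: print the most recent VRRP state (for example
# MASTER or BACKUP) found in keepalived log lines read from stdin.
last_vrrp_state() {
  grep -o 'Entering [A-Z]* STATE' | tail -n 1 | awk '{ print $2 }'
}

# Sample input copied from the log excerpt in this report.
last_vrrp_state <<'EOF'
Wed Aug 10 08:28:11 2022: (ipfailover_VIP_1) Entering BACKUP STATE
Wed Aug 10 08:28:34 2022: (ipfailover_VIP_1) Entering MASTER STATE
EOF

# Against a live cluster (not run here):
#   oc logs ipf-41030-898cf9c58-rhnbh | last_vrrp_state
```

Taking only the last "Entering" line avoids misreading a pod that has flipped state several times, which happens later in this report.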
4. Configure preempt delay
melvinjoseph@mjoseph-mac openshift-tests-private % oc set env deployment.apps/ipf-41030 'OPENSHIFT_HA_PREEMPTION=preempt_delay 90'
deployment.apps/ipf-41030 updated
melvinjoseph@mjoseph-mac openshift-tests-private % oc get po
NAME READY STATUS RESTARTS AGE
ipf-41030-747fcf4c95-9smr5 1/1 Running 0 5s
ipf-41030-747fcf4c95-w9bxs 1/1 Running 0 5s
melvinjoseph@mjoseph-mac openshift-tests-private % oc logs ipf-41030-747fcf4c95-9smr5 | grep Entering
Wed Aug 10 08:50:30 2022: (ipfailover_VIP_1) Entering BACKUP STATE
Wed Aug 10 08:50:34 2022: (ipfailover_VIP_1) Entering MASTER STATE
melvinjoseph@mjoseph-mac openshift-tests-private % oc logs ipf-41030-747fcf4c95-w9bxs | grep Entering
Wed Aug 10 08:50:30 2022: (ipfailover_VIP_1) Entering BACKUP STATE
Pod ipf-41030-747fcf4c95-9smr5 is master
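Before testing failover it may be worth confirming that the delay value actually landed on the deployment. The sketch below assumes the environment is listed as NAME=value lines (as `oc set env ... --list` is expected to print them); the function name `preempt_delay_seconds` is hypothetical, and the sample input reuses values from this report.

```shell
#!/bin/sh
# Hypothetical helper: read NAME=value lines on stdin and print the
# preempt delay in seconds if OPENSHIFT_HA_PREEMPTION is set to
# "preempt_delay <N>". Prints nothing for any other value.
preempt_delay_seconds() {
  sed -n 's/^OPENSHIFT_HA_PREEMPTION=preempt_delay \([0-9][0-9]*\)$/\1/p'
}

# Sample input using the values from this report.
printf '%s\n' \
  'OPENSHIFT_HA_CHECK_INTERVAL=5' \
  'OPENSHIFT_HA_PREEMPTION=preempt_delay 90' | preempt_delay_seconds

# Against a live cluster (not run here):
#   oc set env deployment.apps/ipf-41030 --list | preempt_delay_seconds
```

An empty result would suggest the variable is missing or holds a different value, in which case the failover timing below would not be expected to follow the 90 second delay.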
5. Reboot the master (delete the master pod)
melvinjoseph@mjoseph-mac openshift-tests-private % oc delete pod ipf-41030-747fcf4c95-9smr5
pod "ipf-41030-747fcf4c95-9smr5" deleted
melvinjoseph@mjoseph-mac openshift-tests-private % oc get po
NAME READY STATUS RESTARTS AGE
ipf-41030-747fcf4c95-5zk57 1/1 Running 0 6s
ipf-41030-747fcf4c95-w9bxs 1/1 Running 0 2m47s
melvinjoseph@mjoseph-mac openshift-tests-private % oc logs ipf-41030-747fcf4c95-w9bxs | grep Entering
Wed Aug 10 08:50:30 2022: (ipfailover_VIP_1) Entering BACKUP STATE
Wed Aug 10 08:53:11 2022: (ipfailover_VIP_1) Entering MASTER STATE
melvinjoseph@mjoseph-mac openshift-tests-private % oc logs ipf-41030-747fcf4c95-5zk57 | grep Entering
Wed Aug 10 08:53:11 2022: (ipfailover_VIP_1) Entering BACKUP STATE
Pod ipf-41030-747fcf4c95-w9bxs is now master
6. Wait 90 seconds, then check the status again
melvinjoseph@mjoseph-mac openshift-tests-private % oc get po
NAME READY STATUS RESTARTS AGE
ipf-41030-747fcf4c95-5zk57 1/1 Running 0 3m59s
ipf-41030-747fcf4c95-w9bxs 1/1 Running 0 6m40s
melvinjoseph@mjoseph-mac openshift-tests-private % oc logs ipf-41030-747fcf4c95-5zk57 | grep Entering
Wed Aug 10 08:53:11 2022: (ipfailover_VIP_1) Entering BACKUP STATE
Wed Aug 10 08:54:45 2022: (ipfailover_VIP_1) Entering MASTER STATE
melvinjoseph@mjoseph-mac openshift-tests-private % oc logs ipf-41030-747fcf4c95-w9bxs | grep Entering
Wed Aug 10 08:50:30 2022: (ipfailover_VIP_1) Entering BACKUP STATE
Wed Aug 10 08:53:11 2022: (ipfailover_VIP_1) Entering MASTER STATE
Wed Aug 10 08:54:45 2022: (ipfailover_VIP_1) Entering BACKUP STATE
melvinjoseph@mjoseph-mac openshift-tests-private % oc delete pod ipf-41030-747fcf4c95-5zk57
pod "ipf-41030-747fcf4c95-5zk57" deleted
melvinjoseph@mjoseph-mac openshift-tests-private % oc get po
NAME READY STATUS RESTARTS AGE
ipf-41030-747fcf4c95-c9czx 1/1 Running 0 5s
ipf-41030-747fcf4c95-w9bxs 1/1 Running 0 8m1s
melvinjoseph@mjoseph-mac openshift-tests-private % oc logs ipf-41030-747fcf4c95-w9bxs | grep Entering
Wed Aug 10 08:50:30 2022: (ipfailover_VIP_1) Entering BACKUP STATE
Wed Aug 10 08:53:11 2022: (ipfailover_VIP_1) Entering MASTER STATE
Wed Aug 10 08:54:45 2022: (ipfailover_VIP_1) Entering BACKUP STATE
Wed Aug 10 08:58:26 2022: (ipfailover_VIP_1) Entering MASTER STATE
Wed Aug 10 08:58:29 2022: (ipfailover_VIP_1) Entering BACKUP STATE
melvinjoseph@mjoseph-mac openshift-tests-private % oc logs ipf-41030-747fcf4c95-c9czx | grep Entering
Wed Aug 10 08:58:26 2022: (ipfailover_VIP_1) Entering BACKUP STATE
Wed Aug 10 08:58:29 2022: (ipfailover_VIP_1) Entering MASTER STATE
melvinjoseph@mjoseph-mac openshift-tests-private % oc logs ipf-41030-747fcf4c95-w9bxs | grep Entering
Wed Aug 10 08:50:30 2022: (ipfailover_VIP_1) Entering BACKUP STATE
Wed Aug 10 08:53:11 2022: (ipfailover_VIP_1) Entering MASTER STATE
Wed Aug 10 08:54:45 2022: (ipfailover_VIP_1) Entering BACKUP STATE
Wed Aug 10 08:58:26 2022: (ipfailover_VIP_1) Entering MASTER STATE
Wed Aug 10 08:58:29 2022: (ipfailover_VIP_1) Entering BACKUP STATE
melvinjoseph@mjoseph-mac openshift-tests-private % oc get po
NAME READY STATUS RESTARTS AGE
ipf-41030-747fcf4c95-c9czx 1/1 Running 0 101s
ipf-41030-747fcf4c95-w9bxs 1/1 Running 0 9m37s
melvinjoseph@mjoseph-mac openshift-tests-private % oc logs ipf-41030-747fcf4c95-c9czx | grep Entering
Wed Aug 10 08:58:26 2022: (ipfailover_VIP_1) Entering BACKUP STATE
Wed Aug 10 08:58:29 2022: (ipfailover_VIP_1) Entering MASTER STATE
melvinjoseph@mjoseph-mac openshift-tests-private % oc logs ipf-41030-747fcf4c95-w9bxs | grep Entering
Wed Aug 10 08:50:30 2022: (ipfailover_VIP_1) Entering BACKUP STATE
Wed Aug 10 08:53:11 2022: (ipfailover_VIP_1) Entering MASTER STATE
Wed Aug 10 08:54:45 2022: (ipfailover_VIP_1) Entering BACKUP STATE
Wed Aug 10 08:58:26 2022: (ipfailover_VIP_1) Entering MASTER STATE
Wed Aug 10 08:58:29 2022: (ipfailover_VIP_1) Entering BACKUP STATE
melvinjoseph@mjoseph-mac openshift-tests-private % oc logs ipf-41030-747fcf4c95-w9bxs | grep Entering
Wed Aug 10 08:50:30 2022: (ipfailover_VIP_1) Entering BACKUP STATE
Wed Aug 10 08:53:11 2022: (ipfailover_VIP_1) Entering MASTER STATE
Wed Aug 10 08:54:45 2022: (ipfailover_VIP_1) Entering BACKUP STATE
Wed Aug 10 08:58:26 2022: (ipfailover_VIP_1) Entering MASTER STATE
Wed Aug 10 08:58:29 2022: (ipfailover_VIP_1) Entering BACKUP STATE
melvinjoseph@mjoseph-mac openshift-tests-private % oc get po
NAME READY STATUS RESTARTS AGE
ipf-41030-747fcf4c95-c9czx 1/1 Running 0 28m
ipf-41030-747fcf4c95-w9bxs 1/1 Running 0 36m
melvinjoseph@mjoseph-mac openshift-tests-private % oc logs ipf-41030-747fcf4c95-w9bxs | grep Entering
Wed Aug 10 08:50:30 2022: (ipfailover_VIP_1) Entering BACKUP STATE
Wed Aug 10 08:53:11 2022: (ipfailover_VIP_1) Entering MASTER STATE
Wed Aug 10 08:54:45 2022: (ipfailover_VIP_1) Entering BACKUP STATE
Wed Aug 10 08:58:26 2022: (ipfailover_VIP_1) Entering MASTER STATE
Wed Aug 10 08:58:29 2022: (ipfailover_VIP_1) Entering BACKUP STATE
melvinjoseph@mjoseph-mac openshift-tests-private % oc delete pod ipf-41030-747fcf4c95-c9czx
pod "ipf-41030-747fcf4c95-c9czx" deleted
melvinjoseph@mjoseph-mac openshift-tests-private % oc get po
NAME READY STATUS RESTARTS AGE
ipf-41030-747fcf4c95-svw9z 1/1 Running 0 6s
ipf-41030-747fcf4c95-w9bxs 1/1 Running 0 40m
melvinjoseph@mjoseph-mac openshift-tests-private % oc logs ipf-41030-747fcf4c95-w9bxs | grep Entering
Wed Aug 10 08:50:30 2022: (ipfailover_VIP_1) Entering BACKUP STATE
Wed Aug 10 08:53:11 2022: (ipfailover_VIP_1) Entering MASTER STATE
Wed Aug 10 08:54:45 2022: (ipfailover_VIP_1) Entering BACKUP STATE
Wed Aug 10 08:58:26 2022: (ipfailover_VIP_1) Entering MASTER STATE
Wed Aug 10 08:58:29 2022: (ipfailover_VIP_1) Entering BACKUP STATE
Wed Aug 10 09:30:31 2022: (ipfailover_VIP_1) Entering MASTER STATE
melvinjoseph@mjoseph-mac openshift-tests-private % oc logs ipf-41030-747fcf4c95-svw9z | grep Entering
Wed Aug 10 09:30:31 2022: (ipfailover_VIP_1) Entering BACKUP STATE
melvinjoseph@mjoseph-mac openshift-tests-private % oc get po
NAME READY STATUS RESTARTS AGE
ipf-41030-747fcf4c95-svw9z 1/1 Running 0 51s
ipf-41030-747fcf4c95-w9bxs 1/1 Running 0 40m
melvinjoseph@mjoseph-mac openshift-tests-private % oc logs ipf-41030-747fcf4c95-svw9z | grep Entering
Wed Aug 10 09:30:31 2022: (ipfailover_VIP_1) Entering BACKUP STATE
melvinjoseph@mjoseph-mac openshift-tests-private % oc get po
NAME READY STATUS RESTARTS AGE
ipf-41030-747fcf4c95-svw9z 1/1 Running 0 89s
ipf-41030-747fcf4c95-w9bxs 1/1 Running 0 41m
melvinjoseph@mjoseph-mac openshift-tests-private % oc logs ipf-41030-747fcf4c95-svw9z | grep Entering
Wed Aug 10 09:30:31 2022: (ipfailover_VIP_1) Entering BACKUP STATE
melvinjoseph@mjoseph-mac openshift-tests-private % oc get po
NAME READY STATUS RESTARTS AGE
ipf-41030-747fcf4c95-svw9z 1/1 Running 0 96s
ipf-41030-747fcf4c95-w9bxs 1/1 Running 0 41m
melvinjoseph@mjoseph-mac openshift-tests-private % oc logs ipf-41030-747fcf4c95-svw9z | grep Entering
Wed Aug 10 09:30:31 2022: (ipfailover_VIP_1) Entering BACKUP STATE
melvinjoseph@mjoseph-mac openshift-tests-private % oc logs ipf-41030-747fcf4c95-w9bxs | grep Entering
Wed Aug 10 08:50:30 2022: (ipfailover_VIP_1) Entering BACKUP STATE
Wed Aug 10 08:53:11 2022: (ipfailover_VIP_1) Entering MASTER STATE
Wed Aug 10 08:54:45 2022: (ipfailover_VIP_1) Entering BACKUP STATE
Wed Aug 10 08:58:26 2022: (ipfailover_VIP_1) Entering MASTER STATE
Wed Aug 10 08:58:29 2022: (ipfailover_VIP_1) Entering BACKUP STATE
Wed Aug 10 09:30:31 2022: (ipfailover_VIP_1) Entering MASTER STATE
melvinjoseph@mjoseph-mac openshift-tests-private % oc get po
NAME READY STATUS RESTARTS AGE
ipf-41030-747fcf4c95-svw9z 1/1 Running 0 113s
ipf-41030-747fcf4c95-w9bxs 1/1 Running 0 41m
melvinjoseph@mjoseph-mac openshift-tests-private % oc logs ipf-41030-747fcf4c95-svw9z | grep Entering
Wed Aug 10 09:30:31 2022: (ipfailover_VIP_1) Entering BACKUP STATE
melvinjoseph@mjoseph-mac openshift-tests-private % oc logs ipf-41030-747fcf4c95-svw9z | grep Entering
Wed Aug 10 09:30:31 2022: (ipfailover_VIP_1) Entering BACKUP STATE
melvinjoseph@mjoseph-mac openshift-tests-private % oc logs ipf-41030-747fcf4c95-svw9z | grep Entering
Wed Aug 10 09:30:31 2022: (ipfailover_VIP_1) Entering BACKUP STATE
melvinjoseph@mjoseph-mac openshift-tests-private % oc get po
NAME READY STATUS RESTARTS AGE
ipf-41030-747fcf4c95-svw9z 1/1 Running 0 5m22s
ipf-41030-747fcf4c95-w9bxs 1/1 Running 0 45m
Actual results:
The preempt delay is honored only some of the time.
In some runs the failover happens immediately, without waiting for the delay timer to expire.
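The timestamps in the transcript above make the inconsistency measurable. As a rough sketch, the elapsed time between a replacement pod entering BACKUP and later entering MASTER can be computed from the HH:MM:SS fields; the helper names `to_seconds` and `elapsed` are hypothetical, and the calculation assumes both timestamps fall on the same day.

```shell
#!/bin/sh
# Hypothetical helpers for comparing log timestamps from this report.

# Convert HH:MM:SS to seconds since midnight.
to_seconds() {
  echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Print the number of seconds between two HH:MM:SS times on the same day.
elapsed() {
  echo $(( $(to_seconds "$2") - $(to_seconds "$1") ))
}

# Pod ipf-41030-747fcf4c95-5zk57: BACKUP at 08:53:11, MASTER at 08:54:45.
elapsed 08:53:11 08:54:45   # prints 94

# Pod ipf-41030-747fcf4c95-c9czx: BACKUP at 08:58:26, MASTER at 08:58:29.
elapsed 08:58:26 08:58:29   # prints 3
```

With a configured delay of 90 seconds, the 94 second case is consistent with the setting, while the 3 second case is not. The third replacement pod, ipf-41030-747fcf4c95-svw9z, was still in BACKUP state at the last check, more than five minutes after it started.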
Expected results:
Failover should honor the preempt delay set in the OPENSHIFT_HA_PREEMPTION environment variable.
Additional info:
melvinjoseph@mjoseph-mac openshift-tests-private % oc get deployment.apps/ipf-41030 -oyaml
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "4"
  creationTimestamp: "2022-08-10T08:27:51Z"
  generation: 4
  labels:
    ipfailover: hello-openshift
  name: ipf-41030
  namespace: e2e-test-router-ipfailover-d7kqq
  resourceVersion: "181303"
  uid: 7f4431ef-f65f-4ac5-b86f-e242660459f9
spec:
  progressDeadlineSeconds: 600
  replicas: 2
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      ipfailover: hello-openshift
  strategy:
    type: Recreate
  template:
    metadata:
      creationTimestamp: null
      labels:
        ipfailover: hello-openshift
    spec:
      containers:
      - env:
        - name: OPENSHIFT_HA_CONFIG_NAME
          value: ipfailover
        - name: OPENSHIFT_HA_VIRTUAL_IPS
          value: 192.168.1.100
        - name: OPENSHIFT_HA_VIP_GROUPS
          value: "10"
        - name: OPENSHIFT_HA_NETWORK_INTERFACE
          value: br-ex
        - name: OPENSHIFT_HA_MONITOR_PORT
          value: "22"
        - name: OPENSHIFT_HA_VRRP_ID_OFFSET
          value: "0"
        - name: OPENSHIFT_HA_REPLICA_COUNT
          value: "2"
        - name: OPENSHIFT_HA_IPTABLES_CHAIN
          value: INPUT
        - name: OPENSHIFT_HA_PREEMPTION
          value: preempt_delay 90
        - name: OPENSHIFT_HA_CHECK_INTERVAL
          value: "5"
        image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7d95fd565df0f63efc05a53bbaae9f605e6aca924471ee38cd5e266eaaa5fdf2
        imagePullPolicy: IfNotPresent
        livenessProbe:
          exec:
            command:
            - pgrep
            - keepalived
          failureThreshold: 3
          initialDelaySeconds: 10
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        name: openshift-ipfailover
        ports:
        - containerPort: 63000
          hostPort: 63000
          protocol: TCP
        resources: {}
        securityContext:
          privileged: true
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /lib/modules
          name: lib-modules
          readOnly: true
        - mountPath: /host
          mountPropagation: HostToContainer
          name: host-slash
          readOnly: true
        - mountPath: /etc/sysconfig
          name: etc-sysconfig
          readOnly: true
      dnsPolicy: ClusterFirst
      hostNetwork: true
      nodeSelector:
        node-role.kubernetes.io/worker: ""
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: ipfailover
      serviceAccountName: ipfailover
      terminationGracePeriodSeconds: 30
      volumes:
      - hostPath:
          path: /lib/modules
          type: ""
        name: lib-modules
      - hostPath:
          path: /
          type: ""
        name: host-slash
      - hostPath:
          path: /etc/sysconfig
          type: ""
        name: etc-sysconfig
status:
  availableReplicas: 2
  conditions:
  - lastTransitionTime: "2022-08-10T08:27:51Z"
    lastUpdateTime: "2022-08-10T08:50:31Z"
    message: ReplicaSet "ipf-41030-747fcf4c95" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  - lastTransitionTime: "2022-08-10T08:53:12Z"
    lastUpdateTime: "2022-08-10T08:53:12Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  observedGeneration: 4
  readyReplicas: 2
  replicas: 2
  updatedReplicas: 2