OpenShift Bugs / OCPBUGS-9457

In an OSP cluster, the preemption delay timer of IPFailover is not functioning as expected


Details

    • Moderate
    • Sprint 234, Sprint 235, Sprint 236, Sprint 237, Sprint 238, Sprint 239, Sprint 240
    • 7
    • Rejected
    • Unspecified

    Description

      Description of problem:
      In OSP clusters running either SDN or OVN, the preempt delay is not working as expected.
      The master that is rebooted does not become master again after the preempt delay expires.

      Version-Release number of selected component (if applicable):
      ocp 4.11 (SDN and OVN)

      How reproducible:

      Steps to Reproduce:
      1. Create a ServiceAccount for ipfailover and add SCC permissions
      $ oc create sa ipfailover
      $ oc adm policy add-scc-to-user privileged -z ipfailover
      $ oc adm policy add-scc-to-user hostnetwork -z ipfailover

      2. Create ipfailover through a deployment
      $ oc create -f https://github.com/jechen0648/ipfailover/blob/main/deploy-ipfailover.yaml
      melvinjoseph@mjoseph-mac openshift-tests-private % oc get all
      NAME READY STATUS RESTARTS AGE
      pod/ipf-41030-898cf9c58-rhnbh 1/1 Running 0 12m
      pod/ipf-41030-898cf9c58-tkpqm 1/1 Running 0 12m

      NAME READY UP-TO-DATE AVAILABLE AGE
      deployment.apps/ipf-41030 2/2 2 2 13m

      NAME DESIRED CURRENT READY AGE
      replicaset.apps/ipf-41030-5c7bdb694c 2 2 2 12m

      3. Find the master node and the backup node.
      melvinjoseph@mjoseph-mac openshift-tests-private % oc logs ipf-41030-898cf9c58-rhnbh | grep Entering
      Wed Aug 10 08:28:11 2022: (ipfailover_VIP_1) Entering BACKUP STATE
      Wed Aug 10 08:28:34 2022: (ipfailover_VIP_1) Entering MASTER STATE
      melvinjoseph@mjoseph-mac openshift-tests-private % oc logs ipf-41030-898cf9c58-tkpqm | grep Entering
      Wed Aug 10 08:28:34 2022: (ipfailover_VIP_1) Entering BACKUP STATE

      Pod ipf-41030-898cf9c58-rhnbh is master
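The check in step 3 reads the last "Entering ... STATE" line of each pod's log by eye. A minimal sketch of the same check as a shell helper follows; the function name `vrrp_state` is hypothetical and not part of the ipfailover image or of `oc`.

```shell
# Print the most recent VRRP state (MASTER or BACKUP) found in keepalived
# log text read from stdin; print UNKNOWN if no transition was logged.
vrrp_state() {
  awk '/Entering (MASTER|BACKUP) STATE/ { state = $(NF-1) }
       END { print (state == "" ? "UNKNOWN" : state) }'
}
```

Possible usage, assuming the pod names from the output above: `oc logs ipf-41030-898cf9c58-rhnbh | vrrp_state`.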

      4. Configure the preempt delay
      melvinjoseph@mjoseph-mac openshift-tests-private % oc set env deployment.apps/ipf-41030 'OPENSHIFT_HA_PREEMPTION=preempt_delay 90'
      deployment.apps/ipf-41030 updated
      melvinjoseph@mjoseph-mac openshift-tests-private % oc get po
      NAME READY STATUS RESTARTS AGE
      ipf-41030-747fcf4c95-9smr5 1/1 Running 0 5s
      ipf-41030-747fcf4c95-w9bxs 1/1 Running 0 5s
      melvinjoseph@mjoseph-mac openshift-tests-private % oc logs ipf-41030-747fcf4c95-9smr5 | grep Entering
      Wed Aug 10 08:50:30 2022: (ipfailover_VIP_1) Entering BACKUP STATE
      Wed Aug 10 08:50:34 2022: (ipfailover_VIP_1) Entering MASTER STATE
      melvinjoseph@mjoseph-mac openshift-tests-private % oc logs ipf-41030-747fcf4c95-w9bxs | grep Entering
      Wed Aug 10 08:50:30 2022: (ipfailover_VIP_1) Entering BACKUP STATE

      Pod ipf-41030-747fcf4c95-9smr5 is master
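Step 4 sets OPENSHIFT_HA_PREEMPTION to "preempt_delay 90". When scripting the later wait, it can help to pull the number of seconds out of that value instead of hard-coding it. The helper below is a sketch under the assumption that the value has the form `preempt_delay <seconds>`, as in the command above; the name `preempt_delay_seconds` is hypothetical, and any other value (such as a non-delay strategy) is rejected.

```shell
# Extract the delay in seconds from an OPENSHIFT_HA_PREEMPTION value such as
# "preempt_delay 90". Prints the number and returns 0 on success; returns 1
# for any value that is not "preempt_delay " followed by digits only.
preempt_delay_seconds() {
  case "$1" in
    "preempt_delay "*) _secs=${1#"preempt_delay "} ;;
    *) return 1 ;;
  esac
  case "$_secs" in
    ''|*[!0-9]*) return 1 ;;
  esac
  printf '%s\n' "$_secs"
}
```

Possible usage: `sleep "$(preempt_delay_seconds 'preempt_delay 90')"` before re-checking the pod logs.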

      5. Reboot the master (delete the master pod)
      melvinjoseph@mjoseph-mac openshift-tests-private % oc delete pod ipf-41030-747fcf4c95-9smr5
      pod "ipf-41030-747fcf4c95-9smr5" deleted
      melvinjoseph@mjoseph-mac openshift-tests-private % oc get po
      NAME READY STATUS RESTARTS AGE
      ipf-41030-747fcf4c95-5zk57 1/1 Running 0 6s
      ipf-41030-747fcf4c95-w9bxs 1/1 Running 0 2m47s
      melvinjoseph@mjoseph-mac openshift-tests-private % oc logs ipf-41030-747fcf4c95-w9bxs | grep Entering
      Wed Aug 10 08:50:30 2022: (ipfailover_VIP_1) Entering BACKUP STATE
      Wed Aug 10 08:53:11 2022: (ipfailover_VIP_1) Entering MASTER STATE
      melvinjoseph@mjoseph-mac openshift-tests-private % oc logs ipf-41030-747fcf4c95-5zk57 | grep Entering
      Wed Aug 10 08:53:11 2022: (ipfailover_VIP_1) Entering BACKUP STATE

      Pod ipf-41030-747fcf4c95-w9bxs is now master. Wait for 90 seconds.

      6. Check the status again after 90 seconds
      melvinjoseph@mjoseph-mac openshift-tests-private % oc get po
      NAME READY STATUS RESTARTS AGE
      ipf-41030-747fcf4c95-5zk57 1/1 Running 0 3m59s
      ipf-41030-747fcf4c95-w9bxs 1/1 Running 0 6m40s
      melvinjoseph@mjoseph-mac openshift-tests-private % oc logs ipf-41030-747fcf4c95-5zk57 | grep Entering
      Wed Aug 10 08:53:11 2022: (ipfailover_VIP_1) Entering BACKUP STATE
      Wed Aug 10 08:54:45 2022: (ipfailover_VIP_1) Entering MASTER STATE
      melvinjoseph@mjoseph-mac openshift-tests-private % oc logs ipf-41030-747fcf4c95-w9bxs | grep Entering
      Wed Aug 10 08:50:30 2022: (ipfailover_VIP_1) Entering BACKUP STATE
      Wed Aug 10 08:53:11 2022: (ipfailover_VIP_1) Entering MASTER STATE
      Wed Aug 10 08:54:45 2022: (ipfailover_VIP_1) Entering BACKUP STATE
      melvinjoseph@mjoseph-mac openshift-tests-private % oc delete pod ipf-41030-747fcf4c95-5zk57
      pod "ipf-41030-747fcf4c95-5zk57" deleted
      melvinjoseph@mjoseph-mac openshift-tests-private % oc get po
      NAME READY STATUS RESTARTS AGE
      ipf-41030-747fcf4c95-c9czx 1/1 Running 0 5s
      ipf-41030-747fcf4c95-w9bxs 1/1 Running 0 8m1s
      melvinjoseph@mjoseph-mac openshift-tests-private % oc logs ipf-41030-747fcf4c95-w9bxs | grep Entering
      Wed Aug 10 08:50:30 2022: (ipfailover_VIP_1) Entering BACKUP STATE
      Wed Aug 10 08:53:11 2022: (ipfailover_VIP_1) Entering MASTER STATE
      Wed Aug 10 08:54:45 2022: (ipfailover_VIP_1) Entering BACKUP STATE
      Wed Aug 10 08:58:26 2022: (ipfailover_VIP_1) Entering MASTER STATE
      Wed Aug 10 08:58:29 2022: (ipfailover_VIP_1) Entering BACKUP STATE
      melvinjoseph@mjoseph-mac openshift-tests-private % oc logs ipf-41030-747fcf4c95-c9czx | grep Entering
      Wed Aug 10 08:58:26 2022: (ipfailover_VIP_1) Entering BACKUP STATE
      Wed Aug 10 08:58:29 2022: (ipfailover_VIP_1) Entering MASTER STATE
      melvinjoseph@mjoseph-mac openshift-tests-private % oc logs ipf-41030-747fcf4c95-w9bxs | grep Entering
      Wed Aug 10 08:50:30 2022: (ipfailover_VIP_1) Entering BACKUP STATE
      Wed Aug 10 08:53:11 2022: (ipfailover_VIP_1) Entering MASTER STATE
      Wed Aug 10 08:54:45 2022: (ipfailover_VIP_1) Entering BACKUP STATE
      Wed Aug 10 08:58:26 2022: (ipfailover_VIP_1) Entering MASTER STATE
      Wed Aug 10 08:58:29 2022: (ipfailover_VIP_1) Entering BACKUP STATE
      melvinjoseph@mjoseph-mac openshift-tests-private % oc get po
      NAME READY STATUS RESTARTS AGE
      ipf-41030-747fcf4c95-c9czx 1/1 Running 0 101s
      ipf-41030-747fcf4c95-w9bxs 1/1 Running 0 9m37s
      melvinjoseph@mjoseph-mac openshift-tests-private % oc logs ipf-41030-747fcf4c95-c9czx | grep Entering
      Wed Aug 10 08:58:26 2022: (ipfailover_VIP_1) Entering BACKUP STATE
      Wed Aug 10 08:58:29 2022: (ipfailover_VIP_1) Entering MASTER STATE
      melvinjoseph@mjoseph-mac openshift-tests-private % oc logs ipf-41030-747fcf4c95-w9bxs | grep Entering
      Wed Aug 10 08:50:30 2022: (ipfailover_VIP_1) Entering BACKUP STATE
      Wed Aug 10 08:53:11 2022: (ipfailover_VIP_1) Entering MASTER STATE
      Wed Aug 10 08:54:45 2022: (ipfailover_VIP_1) Entering BACKUP STATE
      Wed Aug 10 08:58:26 2022: (ipfailover_VIP_1) Entering MASTER STATE
      Wed Aug 10 08:58:29 2022: (ipfailover_VIP_1) Entering BACKUP STATE
      melvinjoseph@mjoseph-mac openshift-tests-private % oc logs ipf-41030-747fcf4c95-w9bxs | grep Entering
      Wed Aug 10 08:50:30 2022: (ipfailover_VIP_1) Entering BACKUP STATE
      Wed Aug 10 08:53:11 2022: (ipfailover_VIP_1) Entering MASTER STATE
      Wed Aug 10 08:54:45 2022: (ipfailover_VIP_1) Entering BACKUP STATE
      Wed Aug 10 08:58:26 2022: (ipfailover_VIP_1) Entering MASTER STATE
      Wed Aug 10 08:58:29 2022: (ipfailover_VIP_1) Entering BACKUP STATE
      melvinjoseph@mjoseph-mac openshift-tests-private % oc get po
      NAME READY STATUS RESTARTS AGE
      ipf-41030-747fcf4c95-c9czx 1/1 Running 0 28m
      ipf-41030-747fcf4c95-w9bxs 1/1 Running 0 36m
      melvinjoseph@mjoseph-mac openshift-tests-private % oc logs ipf-41030-747fcf4c95-w9bxs | grep Entering
      Wed Aug 10 08:50:30 2022: (ipfailover_VIP_1) Entering BACKUP STATE
      Wed Aug 10 08:53:11 2022: (ipfailover_VIP_1) Entering MASTER STATE
      Wed Aug 10 08:54:45 2022: (ipfailover_VIP_1) Entering BACKUP STATE
      Wed Aug 10 08:58:26 2022: (ipfailover_VIP_1) Entering MASTER STATE
      Wed Aug 10 08:58:29 2022: (ipfailover_VIP_1) Entering BACKUP STATE
      melvinjoseph@mjoseph-mac openshift-tests-private % oc delete pod ipf-41030-747fcf4c95-c9czx
      pod "ipf-41030-747fcf4c95-c9czx" deleted
      melvinjoseph@mjoseph-mac openshift-tests-private % oc get po
      NAME READY STATUS RESTARTS AGE
      ipf-41030-747fcf4c95-svw9z 1/1 Running 0 6s
      ipf-41030-747fcf4c95-w9bxs 1/1 Running 0 40m
      melvinjoseph@mjoseph-mac openshift-tests-private % oc logs ipf-41030-747fcf4c95-w9bxs | grep Entering
      Wed Aug 10 08:50:30 2022: (ipfailover_VIP_1) Entering BACKUP STATE
      Wed Aug 10 08:53:11 2022: (ipfailover_VIP_1) Entering MASTER STATE
      Wed Aug 10 08:54:45 2022: (ipfailover_VIP_1) Entering BACKUP STATE
      Wed Aug 10 08:58:26 2022: (ipfailover_VIP_1) Entering MASTER STATE
      Wed Aug 10 08:58:29 2022: (ipfailover_VIP_1) Entering BACKUP STATE
      Wed Aug 10 09:30:31 2022: (ipfailover_VIP_1) Entering MASTER STATE
      melvinjoseph@mjoseph-mac openshift-tests-private % oc logs ipf-41030-747fcf4c95-ipf-41030-747fcf4c95-svw9z | grep Entering
      Error from server (NotFound): pods "ipf-41030-747fcf4c95-ipf-41030-747fcf4c95-svw9z" not found
      melvinjoseph@mjoseph-mac openshift-tests-private % oc logs ipf-41030-747fcf4c95-svw9z | grep Entering
      Wed Aug 10 09:30:31 2022: (ipfailover_VIP_1) Entering BACKUP STATE
      melvinjoseph@mjoseph-mac openshift-tests-private % oc get po
      NAME READY STATUS RESTARTS AGE
      ipf-41030-747fcf4c95-svw9z 1/1 Running 0 51s
      ipf-41030-747fcf4c95-w9bxs 1/1 Running 0 40m
      melvinjoseph@mjoseph-mac openshift-tests-private % oc logs ipf-41030-747fcf4c95-svw9z | grep Entering
      Wed Aug 10 09:30:31 2022: (ipfailover_VIP_1) Entering BACKUP STATE

      melvinjoseph@mjoseph-mac openshift-tests-private % oc get po
      NAME READY STATUS RESTARTS AGE
      ipf-41030-747fcf4c95-svw9z 1/1 Running 0 89s
      ipf-41030-747fcf4c95-w9bxs 1/1 Running 0 41m
      melvinjoseph@mjoseph-mac openshift-tests-private % oc logs ipf-41030-747fcf4c95-svw9z | grep Entering
      Wed Aug 10 09:30:31 2022: (ipfailover_VIP_1) Entering BACKUP STATE
      melvinjoseph@mjoseph-mac openshift-tests-private % oc get po
      NAME READY STATUS RESTARTS AGE
      ipf-41030-747fcf4c95-svw9z 1/1 Running 0 96s
      ipf-41030-747fcf4c95-w9bxs 1/1 Running 0 41m
      melvinjoseph@mjoseph-mac openshift-tests-private % oc logs ipf-41030-747fcf4c95-svw9z | grep Entering
      Wed Aug 10 09:30:31 2022: (ipfailover_VIP_1) Entering BACKUP STATE
      melvinjoseph@mjoseph-mac openshift-tests-private % oc logs ipf-41030-747fcf4c95-w9bxs | grep Entering
      Wed Aug 10 08:50:30 2022: (ipfailover_VIP_1) Entering BACKUP STATE
      Wed Aug 10 08:53:11 2022: (ipfailover_VIP_1) Entering MASTER STATE
      Wed Aug 10 08:54:45 2022: (ipfailover_VIP_1) Entering BACKUP STATE
      Wed Aug 10 08:58:26 2022: (ipfailover_VIP_1) Entering MASTER STATE
      Wed Aug 10 08:58:29 2022: (ipfailover_VIP_1) Entering BACKUP STATE
      Wed Aug 10 09:30:31 2022: (ipfailover_VIP_1) Entering MASTER STATE
      melvinjoseph@mjoseph-mac openshift-tests-private % oc get po
      NAME READY STATUS RESTARTS AGE
      ipf-41030-747fcf4c95-svw9z 1/1 Running 0 113s
      ipf-41030-747fcf4c95-w9bxs 1/1 Running 0 41m
      melvinjoseph@mjoseph-mac openshift-tests-private % oc logs ipf-41030-747fcf4c95-svw9z | grep Entering
      Wed Aug 10 09:30:31 2022: (ipfailover_VIP_1) Entering BACKUP STATE
      melvinjoseph@mjoseph-mac openshift-tests-private % oc logs ipf-41030-747fcf4c95-svw9z | grep Entering
      Wed Aug 10 09:30:31 2022: (ipfailover_VIP_1) Entering BACKUP STATE

      melvinjoseph@mjoseph-mac openshift-tests-private % oc logs ipf-41030-747fcf4c95-svw9z | grep Entering
      Wed Aug 10 09:30:31 2022: (ipfailover_VIP_1) Entering BACKUP STATE
      melvinjoseph@mjoseph-mac openshift-tests-private % oc get po
      NAME READY STATUS RESTARTS AGE
      ipf-41030-747fcf4c95-svw9z 1/1 Running 0 5m22s
      ipf-41030-747fcf4c95-w9bxs 1/1 Running 0 45m

      Actual results:
      The preempt delay sometimes works and sometimes does not.
      In some cases the failover even happens immediately, without waiting for the delay timer to expire.
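To make the inconsistency concrete, the time each replacement pod took to go from BACKUP to MASTER can be computed from the log lines above. The helper below is a sketch for doing that arithmetic; the name `takeover_delay` is hypothetical, and it assumes both timestamps fall on the same day, since only the HH:MM:SS field is compared.

```shell
# Read one pod's "Entering ... STATE" lines on stdin and print the number of
# seconds between its first BACKUP entry and the first MASTER entry after it.
# Prints "never" if the pod did not become MASTER in the given log.
takeover_delay() {
  awk '
    function secs(t,  p) { split(t, p, ":"); return p[1]*3600 + p[2]*60 + p[3] }
    /Entering BACKUP STATE/ && !seen { start = secs($4); seen = 1; next }
    /Entering MASTER STATE/ && seen  { print secs($4) - start; done = 1; exit }
    END { if (!done) print "never" }
  '
}
```

Applied to the logs in the steps above (for example `oc logs ipf-41030-747fcf4c95-5zk57 | takeover_delay`), pod 5zk57 took over after 94 seconds (08:53:11 to 08:54:45), pod c9czx after 3 seconds (08:58:26 to 08:58:29), and pod svw9z had not taken over by the last check shown, with a configured delay of 90 seconds in all three cases.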

      Expected results:
      Failover should honor the preempt delay set in the OPENSHIFT_HA_PREEMPTION environment variable.

      Additional info:
      melvinjoseph@mjoseph-mac openshift-tests-private % oc get deployment.apps/ipf-41030 -oyaml
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        annotations:
          deployment.kubernetes.io/revision: "4"
        creationTimestamp: "2022-08-10T08:27:51Z"
        generation: 4
        labels:
          ipfailover: hello-openshift
        name: ipf-41030
        namespace: e2e-test-router-ipfailover-d7kqq
        resourceVersion: "181303"
        uid: 7f4431ef-f65f-4ac5-b86f-e242660459f9
      spec:
        progressDeadlineSeconds: 600
        replicas: 2
        revisionHistoryLimit: 10
        selector:
          matchLabels:
            ipfailover: hello-openshift
        strategy:
          type: Recreate
        template:
          metadata:
            creationTimestamp: null
            labels:
              ipfailover: hello-openshift
          spec:
            containers:
            - env:
              - name: OPENSHIFT_HA_CONFIG_NAME
                value: ipfailover
              - name: OPENSHIFT_HA_VIRTUAL_IPS
                value: 192.168.1.100
              - name: OPENSHIFT_HA_VIP_GROUPS
                value: "10"
              - name: OPENSHIFT_HA_NETWORK_INTERFACE
                value: br-ex
              - name: OPENSHIFT_HA_MONITOR_PORT
                value: "22"
              - name: OPENSHIFT_HA_VRRP_ID_OFFSET
                value: "0"
              - name: OPENSHIFT_HA_REPLICA_COUNT
                value: "2"
              - name: OPENSHIFT_HA_IPTABLES_CHAIN
                value: INPUT
              - name: OPENSHIFT_HA_PREEMPTION
                value: preempt_delay 90
              - name: OPENSHIFT_HA_CHECK_INTERVAL
                value: "5"
              image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7d95fd565df0f63efc05a53bbaae9f605e6aca924471ee38cd5e266eaaa5fdf2
              imagePullPolicy: IfNotPresent
              livenessProbe:
                exec:
                  command:
                  - pgrep
                  - keepalived
                failureThreshold: 3
                initialDelaySeconds: 10
                periodSeconds: 10
                successThreshold: 1
                timeoutSeconds: 1
              name: openshift-ipfailover
              ports:
              - containerPort: 63000
                hostPort: 63000
                protocol: TCP
              resources: {}
              securityContext:
                privileged: true
              terminationMessagePath: /dev/termination-log
              terminationMessagePolicy: File
              volumeMounts:
              - mountPath: /lib/modules
                name: lib-modules
                readOnly: true
              - mountPath: /host
                mountPropagation: HostToContainer
                name: host-slash
                readOnly: true
              - mountPath: /etc/sysconfig
                name: etc-sysconfig
                readOnly: true
            dnsPolicy: ClusterFirst
            hostNetwork: true
            nodeSelector:
              node-role.kubernetes.io/worker: ""
            restartPolicy: Always
            schedulerName: default-scheduler
            securityContext: {}
            serviceAccount: ipfailover
            serviceAccountName: ipfailover
            terminationGracePeriodSeconds: 30
            volumes:
            - hostPath:
                path: /lib/modules
                type: ""
              name: lib-modules
            - hostPath:
                path: /
                type: ""
              name: host-slash
            - hostPath:
                path: /etc/sysconfig
                type: ""
              name: etc-sysconfig
      status:
        availableReplicas: 2
        conditions:
        - lastTransitionTime: "2022-08-10T08:27:51Z"
          lastUpdateTime: "2022-08-10T08:50:31Z"
          message: ReplicaSet "ipf-41030-747fcf4c95" has successfully progressed.
          reason: NewReplicaSetAvailable
          status: "True"
          type: Progressing
        - lastTransitionTime: "2022-08-10T08:53:12Z"
          lastUpdateTime: "2022-08-10T08:53:12Z"
          message: Deployment has minimum availability.
          reason: MinimumReplicasAvailable
          status: "True"
          type: Available
        observedGeneration: 4
        readyReplicas: 2
        replicas: 2
        updatedReplicas: 2
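Rather than reading the whole deployment dump, the preemption setting alone can be queried with a JSONPath filter. This is a sketch only: the function name `ha_preemption_of` is hypothetical, and it assumes the ipfailover container is the first container in the pod template, as in the deployment shown here.

```shell
# Print the OPENSHIFT_HA_PREEMPTION value configured on a deployment in the
# current project. $1 is the deployment name. Assumes the ipfailover
# container is containers[0] in the pod template.
ha_preemption_of() {
  oc get deployment "$1" \
    -o jsonpath='{.spec.template.spec.containers[0].env[?(@.name=="OPENSHIFT_HA_PREEMPTION")].value}'
}
```

Possible usage: `ha_preemption_of ipf-41030`, which for the deployment above would be expected to print the configured value, preempt_delay 90.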

          People

            cholman@redhat.com Candace Holman
            rhn-support-mjoseph Melvin Joseph
            Melvin Joseph Melvin Joseph
            Red Hat Employee
              Time Tracking

                Estimated: Not Specified
                Remaining: 0m
                Logged: 1w 3d