OCPBUGS-24397

openshift-gcp-routes.sh exits prematurely, causing critical systemd service restarts

    • Type: Bug
    • Resolution: Done-Errata
    • Priority: Major
    • 4.14.z
    • 4.15.0

      This is a clone of issue OCPBUGS-20499. The following is the description of the original issue:

      This test fails shortly after a node reboot. Of course the node isn't ready: it has just rebooted.

      : [sig-node] nodes should not go unready after being upgraded and go unready only once

      { 1 nodes violated upgrade expectations: Node ci-op-q38yw8yd-8aaeb-lsqxj-master-0 went unready multiple times: 2023-10-11T21:58:45Z, 2023-10-11T22:05:45Z Node ci-op-q38yw8yd-8aaeb-lsqxj-master-0 went ready multiple times: 2023-10-11T21:58:46Z, 2023-10-11T22:07:18Z }

      Both times, master-0 had been rebooted or was being rebooted.

      https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_cluster-network-operator/2060/pull-ci-openshift-cluster-network-operator-master-e2e-gcp-ovn-upgrade/1712203703311667200


            Errata Tool added a comment -

            Since the problem described in this issue should be resolved in a recent advisory, it has been closed.

            For information on the advisory (Important: OpenShift Container Platform 4.14.7 bug fix and security update), and where to find the updated files, follow the link below.

            If the solution does not work for you, open a new bug report.
            https://access.redhat.com/errata/RHSA-2023:7831


            Sergio Regidor de la Rosa added a comment -

            Verified using IPI on GCP OVNKubernetes with version:

            $ oc get clusterversion
            NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
            version   4.14.0-0.nightly-2023-12-08-151207   True        False         38m     Cluster version is 4.14.0-0.nightly-2023-12-08-151207
            
            

            The route script now appends "|| true" to the crictl commands that add and remove policies, so a failure in those commands no longer causes the script to exit:

             # cat /opt/libexec/openshift-gcp-routes.sh| grep "|| true" |grep cric
                        crictl exec -i ${ovnkContainerID} ovn-nbctl lr-policy-del ovn_cluster_router 1010 "inport == \"rtos-${host}\" && ip4.dst == ${route_vip}" || true
                            crictl exec -i ${ovnkContainerID} ovn-nbctl lr-policy-add ovn_cluster_router 1010 "inport == \"rtos-${host}\" && ip4.dst == ${vip}" reroute "${ovnK8sMp0v4}" || true
                crictl exec -i ${ovnkContainerID} ovn-nbctl lr-policy-del ovn_cluster_router 1010 || true
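
The effect of the "|| true" guard shown above can be illustrated with a minimal sketch. It assumes the script runs with errexit ("set -e") enabled, which this ticket does not confirm, and it uses a hypothetical function, flaky_policy_update, as a stand-in for a failing crictl/ovn-nbctl call.

```shell
#!/usr/bin/env bash
# Sketch only: assumes errexit ("set -e") is in effect, as an illustration
# of why an unguarded failing command could end the script prematurely.
set -e

# Hypothetical stand-in for a crictl/ovn-nbctl call that fails,
# for example because the target container is not ready yet.
flaky_policy_update() {
    echo "policy update failed" >&2
    return 1
}

# Without the guard, the non-zero status would terminate the script here.
# With "|| true", the command list as a whole succeeds and execution continues.
flaky_policy_update || true

echo "still running"
```

Under these assumptions the script reaches the final echo even though the policy update failed, which is the behavior the guarded crictl lines are meant to provide.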
            
            

            The service is running normally:

            sh-5.1# systemctl status openshift-gcp-routes.service
            ● openshift-gcp-routes.service - Update GCP routes for forwarded IPs.
                 Loaded: loaded (/etc/systemd/system/openshift-gcp-routes.service; enabled; preset: disabled)
                 Active: active (running) since Mon 2023-12-11 12:53:16 UTC; 1h 2min ago
               Main PID: 882 (bash)
                  Tasks: 2 (limit: 101671)
                 Memory: 41.0M
                    CPU: 3min 33.232s
                 CGroup: /system.slice/openshift-gcp-routes.service
                         ├─   882 /bin/bash /opt/libexec/openshift-gcp-routes.sh start
                         └─172216 sleep 1
            
            Dec 11 13:55:42 sregidor-gcp2-rnrgv-master-0.c.openshift-qe.internal bash[882]: ensuring route for 10.0.0.2 for internal clients
            Dec 11 13:55:42 sregidor-gcp2-rnrgv-master-0.c.openshift-qe.internal bash[882]: OVNK Routes on ovn-cluster-router at 1010 priority:       1010 inport == "rtos-sregidor-gcp2-rnrgv-master>
            Dec 11 13:55:42 sregidor-gcp2-rnrgv-master-0.c.openshift-qe.internal bash[882]: Route exists
            Dec 11 13:55:42 sregidor-gcp2-rnrgv-master-0.c.openshift-qe.internal bash[882]: ensuring route for 34.173.76.144 for internal clients
            Dec 11 13:55:43 sregidor-gcp2-rnrgv-master-0.c.openshift-qe.internal bash[882]: OVNK Routes on ovn-cluster-router at 1010 priority:       1010 inport == "rtos-sregidor-gcp2-rnrgv-master>
            Dec 11 13:55:43 sregidor-gcp2-rnrgv-master-0.c.openshift-qe.internal bash[882]: Route exists
            Dec 11 13:55:43 sregidor-gcp2-rnrgv-master-0.c.openshift-qe.internal bash[882]: ensuring route for 34.16.89.5 for internal clients
            Dec 11 13:55:43 sregidor-gcp2-rnrgv-master-0.c.openshift-qe.internal bash[882]: OVNK Routes on ovn-cluster-router at 1010 priority:       1010 inport == "rtos-sregidor-gcp2-rnrgv-master>
            Dec 11 13:55:43 sregidor-gcp2-rnrgv-master-0.c.openshift-qe.internal bash[882]: Route exists
            Dec 11 13:55:43 sregidor-gcp2-rnrgv-master-0.c.openshift-qe.internal bash[882]: done applying vip rules
            

            Moving the status to Verified.


            OpenShift Jira Bot added a comment -

            Hi team-mco,

            Bugs should not be moved to Verified without first providing a Release Note Type ("Bug Fix" or "No Doc Update"). For type "Bug Fix", the Release Note Text must also be provided. Please populate the necessary fields before moving the Bug to Verified.


              team-mco Team MCO
              openshift-crt-jira-prow OpenShift Prow Bot
              Sergio Regidor de la Rosa