OCPBUGS-19408 (OpenShift Bugs)

Cluster does not return to a healthy state after enabling and then disabling IPsec at runtime


    • Important
    • No
    • Rejected
    • False
    • Known Issue

      Description of problem:

      After IPsec is enabled and then disabled at runtime, the cluster no longer returns to a healthy state. The cluster version reports: Error while reconciling 4.14.0-0.nightly-2023-09-15-233408: an unknown error has occurred: MultipleErrors
      

      Version-Release number of selected component (if applicable):

      4.14.0-0.nightly-2023-09-15-233408

      How reproducible:

      Most times

      Steps to Reproduce:

      1. Install a GCP cluster without IPsec.
      2. Enable IPsec in the cluster:
      oc patch networks.operator.openshift.io cluster --type=merge -p '{"spec":{"defaultNetwork":{"ovnKubernetesConfig":{"ipsecConfig":{ }}}}}'
      3. Create some test pods.
      4. Disable IPsec in the cluster:
      oc patch networks.operator.openshift.io cluster --type=merge -p '{"spec":{"defaultNetwork":{"ovnKubernetesConfig":{"ipsecConfig":null}}}}'
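      The enable and disable steps above could be scripted roughly as follows. This is only a sketch: the helper names `ipsec_patch` and `toggle_ipsec` and the 30m default timeout are assumptions made for illustration, while the patch payloads are copied from steps 2 and 4. Waiting on the `network` cluster operator with `oc wait` is one possible way to pace the steps, not something the report prescribes.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and the timeout are illustrative, not from the report.

# Print the merge patch used in step 2 (enable) or step 4 (disable).
ipsec_patch() {
  case "$1" in
    enable)  printf '%s\n' '{"spec":{"defaultNetwork":{"ovnKubernetesConfig":{"ipsecConfig":{ }}}}}' ;;
    disable) printf '%s\n' '{"spec":{"defaultNetwork":{"ovnKubernetesConfig":{"ipsecConfig":null}}}}' ;;
    *) echo "usage: ipsec_patch enable|disable" >&2; return 2 ;;
  esac
}

# Apply the patch, then wait until the network operator reports Progressing=False.
# $1 is enable|disable, $2 is an optional timeout (assumed default: 30m).
toggle_ipsec() {
  local patch
  patch="$(ipsec_patch "$1")" || return
  oc patch networks.operator.openshift.io cluster --type=merge -p "$patch" || return
  oc wait clusteroperator/network --for=condition=Progressing=False --timeout="${2:-30m}"
}
```

      With the functions loaded, `toggle_ipsec enable` followed later by `toggle_ipsec disable` mirrors steps 2 and 4. Note that `oc wait` may return immediately if the operator has not yet started rolling out the change, so a short pause before relying on its result may be needed.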
      

      Actual results:

      # From 4.14
      [weliang@weliang Test]$ oc get clusterversion
      NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
      version   4.14.0-0.nightly-2023-09-15-233408   True        False         5h8m    Error while reconciling 4.14.0-0.nightly-2023-09-15-233408: an unknown error has occurred: MultipleErrors
      [weliang@weliang Test]$ oc get co   --no-headers | grep -v '.True.*False.*False'  
      authentication                             4.14.0-0.nightly-2023-09-15-233408   True    False   True    12s     OAuthServerServiceEndpointsEndpointAccessibleControllerDegraded: Get "https://10.128.0.59:6443/healthz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
      console                                    4.14.0-0.nightly-2023-09-15-233408   False   False   False   20s     RouteHealthAvailable: console route is not admitted
      monitoring                                 4.14.0-0.nightly-2023-09-15-233408   False   True    True    2m44s   reconciling Thanos Querier Route failed: updating Route object failed: the server is currently unable to handle the request (put routes.route.openshift.io thanos-querier), deleting UserWorkload federate Route failed: the server is currently unable to handle the request (delete routes.route.openshift.io federate), reconciling Prometheus Federate Route failed: updating Route object failed: the server is currently unable to handle the request (put routes.route.openshift.io prometheus-k8s-federate)
      [weliang@weliang Test]$ 
      
      # From 4.13
      [weliang@weliang verification-tests]$ oc get clusterversion
      NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
      version   4.13.13   True        False         5h11m   Error while reconciling 4.13.13: an unknown error has occurred: MultipleErrors
      [weliang@weliang verification-tests]$ oc get co   --no-headers | grep -v '.True.*False.*False'  
      authentication                             4.13.13   True    False   True    29s     OAuthServerServiceEndpointsEndpointAccessibleControllerDegraded: Get "https://10.128.0.40:6443/healthz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
      monitoring                                 4.13.13   False   True    True    25s     reconciling Alertmanager Route failed: updating Route object failed: the server is currently unable to handle the request (put routes.route.openshift.io alertmanager-main), deleting UserWorkload federate Route failed: the server is currently unable to handle the request (delete routes.route.openshift.io federate), reconciling Prometheus Federate Route failed: updating Route object failed: the server is currently unable to handle the request (put routes.route.openshift.io prometheus-k8s-federate)
      [weliang@weliang verification-tests]$ 
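      The `grep -v '.True.*False.*False'` filter used above matches anywhere on the line, so text in the message column could in principle affect the result. A column-based check is one alternative. The sketch below assumes the usual `oc get co --no-headers` column order (NAME, VERSION, AVAILABLE, PROGRESSING, DEGRADED, ...); the function name `unhealthy_operators` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: unhealthy_operators is an illustrative helper, not part of oc.

# Read `oc get co --no-headers` output on stdin and print the name of every
# operator whose AVAILABLE/PROGRESSING/DEGRADED columns are not True/False/False.
unhealthy_operators() {
  awk '!($3 == "True" && $4 == "False" && $5 == "False") { print $1 }'
}
```

      It would be used as `oc get co --no-headers | unhealthy_operators`; an empty result would mean every operator reports Available, not Progressing, and not Degraded.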

      Expected results:

      The cluster should return to a healthy state after IPsec is disabled.

      Additional info:

      The issue occurs on both 4.14 and 4.13.

      must-gather archives:
      https://people.redhat.com/~weliang/must-gather-4.14.tar.gz
      https://people.redhat.com/~weliang/must-gather-4.13.tar.gz
      

              ykashtan Yuval Kashtan
              weliang1@redhat.com Weibin Liang