OCPBUGS-53433

HCP ignition-server silently overloading the host cluster API if not able to get ignition payload


    • Bug
    • Resolution: Done-Errata
    • Normal
    • 4.14.z
    • 4.14
    • HyperShift
    • Quality / Stability / Reliability
    • Important
    • None
    • Done
    • Bug Fix
      * Previously, the ignition-server controller overloaded the Kubernetes API server (KAS) by updating a `condition.Status` with the same message in every reconcile loop. With this release, the controller checks whether the message matches the existing one before updating, so that the KAS is not overloaded. (link:https://issues.redhat.com/browse/OCPBUGS-53433[OCPBUGS-53433])

      This is a clone of issue OCPBUGS-50867. The following is the description of the original issue:

      This is a clone of issue OCPBUGS-50557. The following is the description of the original issue:

      This is a clone of issue OCPBUGS-47533. The following is the description of the original issue:

      This is a clone of issue OCPBUGS-45960. The following is the description of the original issue:

      This is a clone of issue OCPBUGS-42320. The following is the description of the original issue:

      Description of problem:

      Opening this bug to report how difficult it was to identify an API overload caused by the ignition-server in a disconnected HCP. The customer reported an increased level of network traffic in an ACM cluster hosting 3 HCPs. Troubleshooting required inspecting the kube-apiserver audit logs in the host cluster, which showed the `ignition-server` service account generating 382320 requests in 22 hours (about 290 requests per minute). No alerts in the cluster pointed to the ignition-server as the source of the issue, and the ignition-server pods were not even restarting. It was only possible to confirm the ignition-server as the source by disabling it and seeing the network traffic drop in the cluster metrics.
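For anyone repeating this kind of triage, the per-user tally from the audit logs can be sketched roughly as below. This is only an illustration, not the procedure used on the customer cluster: it assumes the audit log is available as JSON lines in the `audit.k8s.io/v1` event shape (`stage`, `user.username`), and the sample events and the namespace `example-hcp-namespace` are made up for the example.

```python
import json
from collections import Counter


def count_requests_by_user(lines, stage="ResponseComplete"):
    """Count audit events per user.username, considering a single stage.

    Filtering on one stage is an assumption made here so that a request
    logged at several stages is not counted more than once.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or non-JSON lines
        if not isinstance(event, dict) or event.get("stage") != stage:
            continue
        user = event.get("user") or {}
        counts[user.get("username", "<unknown>")] += 1
    return counts


def requests_per_minute(total_requests, hours):
    """Average request rate over the observation window."""
    return total_requests / (hours * 60)


# Hypothetical sample events; the namespace is invented for illustration.
IGNITION_SA = "system:serviceaccount:example-hcp-namespace:ignition-server"
sample_lines = [
    json.dumps({"stage": "ResponseComplete", "verb": "update",
                "user": {"username": IGNITION_SA}}),
    json.dumps({"stage": "ResponseComplete", "verb": "update",
                "user": {"username": IGNITION_SA}}),
    json.dumps({"stage": "RequestReceived", "verb": "update",
                "user": {"username": IGNITION_SA}}),
    json.dumps({"stage": "ResponseComplete", "verb": "get",
                "user": {"username": "system:admin"}}),
    "not json at all",
]

if __name__ == "__main__":
    for username, count in count_requests_by_user(sample_lines).most_common():
        print(count, username)
    # Rate for the figures reported in this bug: 382320 requests in 22 hours.
    print(round(requests_per_minute(382320, 22), 1))
```

Sorting by count with `most_common()` puts the noisiest client at the top, which is how a single service account would stand out in a report like this one.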

      Version-Release number of selected component (if applicable):

      OpenShift 4.14, MCE 2.5  

      How reproducible:

      Always at the customer cluster. 

      Steps to Reproduce:

      1. Start a disconnected HCP cluster with incorrect mirror-registry information
      2. Observe the increased load on the host cluster API when the ignition-server pods start

      Actual results:

      The ignition-server fails continuously, overloading the host cluster API, and it is difficult to identify as the cause.

      Expected results:

      An alert should be triggered, or the ignition-server pods should fail to start. At a minimum, the ignition-server should not overload the API.
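The release note on this issue describes the fix as checking whether the condition message is already the existing one before writing it. A minimal sketch of that guard follows. It is not the HyperShift controller code: `Condition`, `FakeAPI`, `set_condition_if_changed`, and the condition values are all hypothetical names used only to show how skipping unchanged writes keeps a failing reconcile loop from generating one API update per iteration.

```python
from dataclasses import dataclass


@dataclass
class Condition:
    # Hypothetical, simplified stand-in for a status condition.
    type: str
    status: str
    reason: str
    message: str


class FakeAPI:
    """Hypothetical in-memory stand-in for the API server that counts writes."""

    def __init__(self):
        self.writes = 0
        self.conditions = {}

    def update_condition(self, condition):
        self.writes += 1
        self.conditions[condition.type] = condition


def set_condition_if_changed(api, desired):
    """Write the condition only when it differs from what is already stored.

    Returns True if a write was issued, False if the update was skipped.
    """
    current = api.conditions.get(desired.type)
    if current == desired:  # dataclass equality compares every field
        return False
    api.update_condition(desired)
    return True


def reconcile_many(api, desired, iterations):
    """Simulate repeated reconcile loops hitting the same failure."""
    return sum(set_condition_if_changed(api, desired) for _ in range(iterations))


if __name__ == "__main__":
    api = FakeAPI()
    failing = Condition("ExampleCondition", "False", "ExampleReason",
                        "unable to get ignition payload")
    reconcile_many(api, failing, 1000)
    print(api.writes)  # a single write despite 1000 loops
```

With the guard in place the number of writes tracks the number of actual state changes rather than the number of reconcile iterations.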

      Additional info:

          

              jparrill@redhat.com Juan Manuel Parrilla Madrid
              openshift-crt-jira-prow OpenShift Prow Bot
              None
              None
              Liangquan Li Liangquan Li
              None