  1. OpenShift Bugs
  2. OCPBUGS-45960

HCP ignition-server silently overloading the host cluster API if not able to get ignition payload

    • Bug
    • Resolution: Done-Errata
    • Normal
    • None
    • 4.14
    • HyperShift
    • Important
    • None
    • Hypershift Sprint 263
    • 1
    • False
    • Previously, the ignition payload could not be generated because of an incorrect setting related to IDMS/ICSP. As a consequence, the ignition server flooded the Kubernetes API with patch requests to continuously modify the `status.Condition` value. With this release, the `status.Condition` value is patched only in certain circumstances. (link:https://issues.redhat.com/browse/OCPBUGS-45960[*OCPBUGS-45960*])
    • Bug Fix
    • Done

      This is a clone of issue OCPBUGS-42320. The following is the description of the original issue:
      —
      Description of problem:

      Opening this bug to report how difficult it was to identify an API overload caused by the ignition-server in a disconnected HCP. The customer reported an increased level of network traffic in an ACM cluster hosting 3 HCPs. Troubleshooting required inspecting the kube-apiserver audit logs in the host cluster, which showed the `ignition-server` service account generating 382320 requests in 22 hours (approximately 290 requests per minute). No alerts in the cluster pointed to the ignition-server as the source of the issue, and the ignition-server pods were not even restarting. The ignition-server could only be confirmed as the source by disabling it and seeing the network traffic drop in the cluster metrics.
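
The audit-log check described above can be sketched roughly as follows. This is a minimal illustration, not the procedure the reporter actually used: it assumes the kube-apiserver audit log is available as JSON lines and relies only on the `user.username` field of each audit event. The namespace `clusters-example` and the function names are hypothetical.

```python
import json
from collections import Counter


def requests_per_user(audit_lines):
    """Count audit events per user.username in JSON-lines audit log entries."""
    counts = Counter()
    for line in audit_lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or non-JSON lines
        user = event.get("user") or {}
        counts[user.get("username", "<unknown>")] += 1
    return counts


def per_minute(total_requests, hours):
    """Average request rate per minute over the observed window."""
    return total_requests / (hours * 60)


# Hypothetical sample; real entries carry many more fields.
IGNITION_SA = "system:serviceaccount:clusters-example:ignition-server"
sample = [
    json.dumps({"verb": "patch", "user": {"username": IGNITION_SA}}),
    json.dumps({"verb": "patch", "user": {"username": IGNITION_SA}}),
    json.dumps({"verb": "get", "user": {"username": "system:admin"}}),
    "not json",
]

counts = requests_per_user(sample)
top_user, top_count = counts.most_common(1)[0]

# Figures reported in this issue: 382320 requests in 22 hours.
reported_rate = per_minute(382320, 22)
```

Sorting the counter with `most_common()` makes a single noisy service account stand out quickly; with the figures reported here the average works out to just under 290 requests per minute.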

      Version-Release number of selected component (if applicable):

      OpenShift 4.14, MCE 2.5  

      How reproducible:

      Always, on the customer cluster.

      Steps to Reproduce:

      1. Start a disconnected HCP cluster with incorrect mirror-registry information
      2. Observe the increased load on the host cluster API once the ignition-server pods start

      Actual results:

      The ignition-server fails continuously and overloads the host cluster API, and it is difficult to identify it as the source.

      Expected results:

      An alert should be triggered, or the ignition-server pods should fail to start. At a minimum, the ignition-server should not overload the API.
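
One general way to avoid flooding the API from a failing reconcile loop is to send a status patch only when the condition actually changes. The sketch below illustrates that idea under stated assumptions; it is not the HyperShift fix itself, whose exact criteria are not described in this issue. The condition type `ExamplePayloadReady`, the reason and message strings, and the `send_patch` callback are all hypothetical.

```python
def find_condition(conditions, cond_type):
    """Return the condition dict with the given type, or None."""
    for cond in conditions:
        if cond.get("type") == cond_type:
            return cond
    return None


def needs_patch(existing, desired):
    """A patch is needed only if the condition is new or a tracked field differs."""
    if existing is None:
        return True
    return any(
        existing.get(field) != desired.get(field)
        for field in ("status", "reason", "message")
    )


def reconcile_condition(conditions, desired, send_patch):
    """Call send_patch(desired) only when the stored condition would change."""
    existing = find_condition(conditions, desired["type"])
    if not needs_patch(existing, desired):
        return False  # nothing changed, so no API request is made
    send_patch(desired)
    if existing is None:
        conditions.append(dict(desired))
    else:
        existing.update(desired)
    return True


# Simulate five consecutive reconciles that hit the same failure.
sent = []        # stands in for patch requests sent to the API server
conditions = []  # stands in for the object's status conditions
failure = {
    "type": "ExamplePayloadReady",      # hypothetical condition type
    "status": "False",
    "reason": "ExampleGenerationFailed",  # hypothetical reason
    "message": "mirror registry unreachable",
}
results = [reconcile_condition(conditions, failure, sent.append) for _ in range(5)]
```

In this simulation only the first reconcile sends a patch; the four identical failures that follow are skipped, so a persistent misconfiguration produces one request instead of a continuous stream.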

      Additional info:

          

              jparrill@redhat.com Juan Manuel Parrilla Madrid
              openshift-crt-jira-prow OpenShift Prow Bot
              Liangquan Li Liangquan Li