OpenShift Hive / HIVE-2923

Scrub AWS RequestID from DNSZone "DNSError" status condition message (thrashing)


    • Type: Story
    • Resolution: Done
    • Priority: Undefined
    • openshift-4.20

      Last night a handful of hivei01ue1 clusters started alerting on high error rates in the ClusterDeployment (CD) and DNSZone controllers. The logs show these two controllers thrashing without backoff. The root cause seems to be that the Message field of the DNSError status condition on the DNSZone CR contains a RequestID, a per-request UUID from the AWS API. Because this value changes on every call, we update the CR every time, which triggers an immediate requeue.

      Upon investigation, we already have a scrubber for this, and we're already using it in the appropriate spot. But its regex assumes a space between "request" and "id", which is absent in the offending message.
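
      A minimal sketch of the kind of fix this implies, not the actual Hive scrubber: the pattern below makes the whitespace between "request" and "id" optional, so both spellings are scrubbed to a stable placeholder. The names requestIDRE and scrubRequestID, the XXXX placeholder, and the sample messages are assumptions made up for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// requestIDRE is a hypothetical pattern, not Hive's real one. It matches
// "request id", "RequestID", etc. (case-insensitive, optional whitespace and
// colon) followed by a UUID, and captures the label so it can be kept.
var requestIDRE = regexp.MustCompile(
	`(?i)(request\s*id:?\s*)[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}`)

// scrubRequestID replaces the per-request UUID with a fixed placeholder so
// that repeated occurrences of the same error yield an identical message.
func scrubRequestID(msg string) string {
	return requestIDRE.ReplaceAllString(msg, "${1}XXXX")
}

func main() {
	// Illustrative messages only; they are not copied from the affected clusters.
	withSpace := "AccessDenied: status code: 403, request id: 1b2c3d4e-5f60-7a8b-9c0d-1e2f3a4b5c6d"
	noSpace := "https response error StatusCode: 403, RequestID: 1b2c3d4e-5f60-7a8b-9c0d-1e2f3a4b5c6d, api error AccessDenied"

	fmt.Println(scrubRequestID(withSpace))
	fmt.Println(scrubRequestID(noSpace))
}
```

      With a scrubbed message, two failures that differ only in their RequestID compare equal, so the condition (and therefore the CR) would not change between attempts and the normal backoff should apply.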

      This is possibly a result of the AWS SDK v2 transition (HIVE-2849), or possibly of a recent AWS API change.

              efried.openshift Eric Fried
              Mingxia Huang Mingxia Huang