OpenShift Bugs: OCPBUGS-42199

Cert-manager failing to hand out certificates with `Time limit exceeded`

    • None
    • CFE Sprint 260
    • 1
    • False

      To my knowledge, the affected version is not relevant, because the operator is not part of the OCP release cycle.

      Description of problem:

       We're using a DNS-01 ClusterIssuer (Let's Encrypt) where the _acme-challenge record sets needed for certificate creation are created in Route 53. cert-manager repeatedly runs into the following error:

      E0518 16:10:22.720047       1 controller.go:167] cert-manager/challenges "msg"="re-queuing item due to error processing" "error"="Time limit exceeded. Last error: " "key"="ocm-production-id/cluster-api-cert-zxkfj-2403688073-3486302731 

      This error comes from here in certmanager-operator. It occurs when the Route 53 change for the _acme-challenge record does not transition to INSYNC within 2 minutes, see here. When that wait times out, a new ChangeResourceRecordSets call is triggered, which returns a new change ID. We do not check whether the record already exists from a previous change; we keep creating new changes without looking back at the old ones, which would eventually become consistent on their own. We should likely call ChangeResourceRecordSets only once and then keep checking that change for a configurable amount of time (currently hardcoded to 2 minutes).

      Version-Release number of selected component (if applicable):

      app.kubernetes.io/name=cert-manager
      app.kubernetes.io/version=v1.11.4

      How reproducible:

          Intermittent; depends on how long AWS takes for changes to become INSYNC.

      Steps to Reproduce:

          1.
          2.
          3.
          

      Actual results:

          Certificate creation sometimes takes more than 30 minutes because we create new changes instead of waiting for the initial change to complete.

      Expected results:

          Certificate creation is delayed only by the time it takes for the initial record change to become INSYNC.

      Additional info:

          

              tgeer@redhat.com Trilok Geer
              cbusse.openshift Claudio Busse
              Yuedong Wu Yuedong Wu