[OCPBUGS-42199] Cert-manager failing to hand out certificates with `Time limit exceeded` - Red Hat Issue Tracker

Type: Bug
Resolution: Unresolved
Priority: Normal
Fix Version/s: None
Affects Version/s: 4.17.z
Component/s: cert-manager
Regression: None
Sprint: CFE Sprint 260, OAPE Sprint 264, OAPE Sprint 265
sprint_count: 3
Blocked: False
Blocked Reason: None

To my knowledge, the affected version is not relevant, because the operator is not part of the OCP release cycle.

Description of problem:

We're using a DNS-01 ClusterIssuer (Let's Encrypt), where the _acme-challenge record sets needed for certificate creation are created in Route 53. cert-manager repeatedly runs into the following error:

E0518 16:10:22.720047       1 controller.go:167] cert-manager/challenges "msg"="re-queuing item due to error processing" "error"="Time limit exceeded. Last error: " "key"="ocm-production-id/cluster-api-cert-zxkfj-2403688073-3486302731

This error is raised by cert-manager-operator when the _acme-challenge record change does not transition to INSYNC within 2 minutes. When that happens, a new ChangeResourceRecordSets request is triggered, with a new change ID. We do not even check whether the record already exists from a previous change; we just create new changes over and over, without looking back at the old ones, which would eventually become consistent. We should likely call ChangeResourceRecordSets only once and then keep checking that change for a configurable amount of time (currently hardcoded to 2 minutes).

Version-Release number of selected component (if applicable):

app.kubernetes.io/name=cert-manager
app.kubernetes.io/version=v1.11.4

How reproducible:

    Intermittent; depends on how long AWS takes for changes to become INSYNC.

Steps to Reproduce:

    1.
    2.
    3.

Actual results:

    Certificate creation sometimes takes more than 30 minutes, because we create new changes instead of waiting for the initial change to complete.

Expected results:

    Certificate creation is delayed only by the time it takes for an initial record creation to become INSYNC.

Additional info:

Assignee: Swarup Ghosh

Reporter: Claudio Busse

QA Contact: Yuedong Wu

Need Info From: Claudio Busse

Votes: 0

Watchers: 9

Created: 2024/09/19 10:27 AM

Updated: 2025/02/14 1:01 PM
