Bug
Resolution: Unresolved
Major
4.21, 4.20.z
None
Description of problem:
[AWS EFS CSI Driver Operator] efs_tags_controller failed to refresh cached credentials on hypershift hosted cluster
Version-Release number of selected component (if applicable):
4.21.0-0-2025-11-19-092506-test-ci-ln-mm7xb0b-latest
How reproducible:
Always
Steps to Reproduce:
1. Install a hypershift hosted cluster on AWS.
2. Install the EFS CSI driver operator and driver on the hosted cluster.
3. Add annotations to the aws-efs-csi-driver-operator service account (sa) so that the efs_tags_controller can access the "web_identity_token_file".
$ och patch sa/aws-efs-csi-driver-operator -n openshift-cluster-csi-drivers --type='merge' -p "{\"metadata\": {\"annotations\": {\"eks.amazonaws.com/role-arn\": \"arn:aws:iam::301721915996:role/hypershift-ci-362507-efs-operator-role\", \"eks.amazonaws.com/audience\": \"sts.amazonaws.com\"}}}"
4. Create a PVC with the EFS storage class and wait for the PVC to become Bound to a PV (the access point is provisioned).
5. Patch the HostedCluster (hc) with extra tags.
$ oc patch -n clusters hostedcluster/hypershift-ci-362507 -p '{"spec":{"platform":{"aws":{"resourceTags":[{"key":"ocp_storage_owned","value":"true"}]}}}}' --type=merge
6. Check that the PV (access point) is synced with the new tags (wait for the efs_tags_controller to sync the tags).
Actual results:
In step 6, the efs_tags_controller fails to sync the tags with the following error:
E1120 19:21:11.678829 1 aws_efs_tags_controller.go:256] Error updating tags for PV fs-01e68e58f4765f3f8::fsap-0e4b293d057c3ae3b: operation error EFS: TagResource, exceeded maximum number of attempts, 3, get identity: get credentials: failed to refresh cached credentials, failed to retrieve credentials, operation error STS: AssumeRoleWithWebIdentity, exceeded maximum number of attempts, 3, https response error StatusCode: 0, RequestID: , request send failed, Post "https://sts.us-east-2.amazonaws.com/": dial tcp 52.95.18.19:443: i/o timeout
Expected results:
In step 6, the efs_tags_controller syncs the tags successfully.
Additional info:
$ och logs aws-efs-csi-driver-operator-7979d7bf9c-kpf69|grep -i 'error'
...
E1120 19:21:11.678829 1 aws_efs_tags_controller.go:256] Error updating tags for PV fs-01e68e58f4765f3f8::fsap-0e4b293d057c3ae3b: operation error EFS: TagResource, exceeded maximum number of attempts, 3, get identity: get credentials: failed to refresh cached credentials, failed to retrieve credentials, operation error STS: AssumeRoleWithWebIdentity, exceeded maximum number of attempts, 3, https response error StatusCode: 0, RequestID: , request send failed, Post "https://sts.us-east-2.amazonaws.com/": dial tcp 52.95.18.19:443: i/o timeout
E1120 19:21:11.678953 1 aws_efs_tags_queue_worker.go:99] Failed to update tags for volume pvc-b489a13a-9fa2-48ea-b3ba-d0994fcde46b: operation error EFS: TagResource, exceeded maximum number of attempts, 3, get identity: get credentials: failed to refresh cached credentials, failed to retrieve credentials, operation error STS: AssumeRoleWithWebIdentity, exceeded maximum number of attempts, 3, https response error StatusCode: 0, RequestID: , request send failed, Post "https://sts.us-east-2.amazonaws.com/": dial tcp 52.95.18.19:443: i/o timeout
I1120 19:21:11.679290 1 event_expansion.go:147] "Request Body" body="{\"count\":2,\"lastTimestamp\":\"2025-11-20T19:21:11Z\",\"message\":\"Failed to update tags for volume pvc-b489a13a-9fa2-48ea-b3ba-d0994fcde46b: operation error EFS: TagResource, exceeded maximum number of attempts, 3, get identity: get credentials: failed to refresh cached credentials, failed to retrieve credentials, operation error STS: AssumeRoleWithWebIdentity, exceeded maximum number of attempts, 3, https response error StatusCode: 0, RequestID: , request send failed, Post \\\"https://sts.us-east-2.amazonaws.com/\\\": dial tcp 52.95.18.19:443: i/o timeout\"}"
I1120 19:21:11.679399 1 event.go:377] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-cluster-csi-drivers", Name:"aws-efs-csi-driver-operator", UID:"8f92a95a-4f78-4e65-9865-fe1f8c49d902", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Warning' reason: 'EFSAccessPointTagsUpdateFailed' Failed to update tags for volume pvc-b489a13a-9fa2-48ea-b3ba-d0994fcde46b: operation error EFS: TagResource, exceeded maximum number of attempts, 3, get identity: get credentials: failed to refresh cached credentials, failed to retrieve credentials, operation error STS: AssumeRoleWithWebIdentity, exceeded maximum number of attempts, 3, https response error StatusCode: 0, RequestID: , request send failed, Post "https://sts.us-east-2.amazonaws.com/": dial tcp 52.95.18.19:443: i/o timeout
I1120 19:21:11.690010 1 event_expansion.go:147] "Response Body" body="{\"kind\":\"Event\",\"apiVersion\":\"v1\",\"metadata\":{\"name\":\"aws-efs-csi-driver-operator.1879cd09d5f061de\",\"namespace\":\"openshift-cluster-csi-drivers\",\"uid\":\"03e0f294-6dc3-474b-a06b-35ab4dfb32a3\",\"resourceVersion\":\"180083\",\"creationTimestamp\":\"2025-11-20T19:16:11Z\",\"managedFields\":[{\"manager\":\"aws-efs-csi-driver-operator\",\"operation\":\"Update\",\"apiVersion\":\"v1\",\"time\":\"2025-11-20T19:21:11Z\",\"fieldsType\":\"FieldsV1\",\"fieldsV1\":{\"f:count\":{},\"f:firstTimestamp\":{},\"f:involvedObject\":{},\"f:lastTimestamp\":{},\"f:message\":{},\"f:reason\":{},\"f:reportingComponent\":{},\"f:source\":{\"f:component\":{}},\"f:type\":{}}}]},\"involvedObject\":{\"kind\":\"Deployment\",\"namespace\":\"openshift-cluster-csi-drivers\",\"name\":\"aws-efs-csi-driver-operator\",\"uid\":\"8f92a95a-4f78-4e65-9865-fe1f8c49d902\",\"apiVersion\":\"apps/v1\"},\"reason\":\"EFSAccessPointTagsUpdateFailed\",\"message\":\"Failed to update tags for volume pvc-b489a13a-9fa2-48ea-b3ba-d0994fcde46b: operation error EFS: TagResource, exceeded maximum number of attempts, 3, get identity: get credentials: fa [truncated 563 chars]"
...
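The error chain in the log lines above is long, and the part that matters is at the very end. A minimal sketch (the variable names are illustrative only, and the message is copied from the log above) that strips the outer wrappers and keeps the innermost cause:

```shell
# Error text copied verbatim from the E1120 log line in this report.
err='operation error EFS: TagResource, exceeded maximum number of attempts, 3, get identity: get credentials: failed to refresh cached credentials, failed to retrieve credentials, operation error STS: AssumeRoleWithWebIdentity, exceeded maximum number of attempts, 3, https response error StatusCode: 0, RequestID: , request send failed, Post "https://sts.us-east-2.amazonaws.com/": dial tcp 52.95.18.19:443: i/o timeout'

# Keep only what follows the last "request send failed, " wrapper.
root_cause=${err##*request send failed, }

# Pull the host name out of the quoted URL in the innermost error.
endpoint=$(printf '%s\n' "$root_cause" | sed -n 's|.*Post "https://\([^/"]*\).*|\1|p')

echo "root cause: $root_cause"
echo "unreachable endpoint: $endpoint"
```

The innermost error is a TCP dial timeout to the regional STS endpoint, which suggests blocked egress from the operator pod rather than a problem with the role or the token itself.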
$ och get deploy/aws-efs-csi-driver-operator -oyaml|grep 'network'
openshift.storage.network-policy.api-server: allow
openshift.storage.network-policy.dns: allow
openshift.storage.network-policy.operator-metrics-range: allow
$ och get networkpolicy
NAME                                      POD-SELECTOR                                                    AGE
allow-all-egress                          openshift.storage.network-policy.all-egress=allow               9h
allow-egress-to-api-server                openshift.storage.network-policy.api-server=allow               9h
allow-ingress-to-metrics-range            openshift.storage.network-policy.metrics-range=allow            9h
allow-ingress-to-operator-metrics-range   openshift.storage.network-policy.operator-metrics-range=allow   9h
allow-to-dns                              openshift.storage.network-policy.dns=allow                      9h
# Compare with ebs csi driver operator
$ oc get deploy aws-ebs-csi-driver-operator -oyaml|grep 'network'
openshift.storage.network-policy.all-egress: allow
openshift.storage.network-policy.api-server: allow
openshift.storage.network-policy.dns: allow
openshift.storage.network-policy.operator-metrics-range: allow
secretName: service-network-admin-kubeconfig
secretName: service-network-admin-kubeconfig
I manually hacked the CSV to add the "openshift.storage.network-policy.all-egress: allow" annotation to the operator deployment, and the connection issue was resolved. We should also add the all-egress: allow network policy (NP) for the EFS CSI driver operator.
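The comparison between the two operators can be sketched as a small offline check. This is only an illustration: the key lists are copied by hand from the grep and networkpolicy outputs in this report with the common "openshift.storage.network-policy." prefix dropped, and the helper name has_key is hypothetical. NetworkPolicy pod selectors match on pod labels, so a policy only applies to a pod that carries the matching key.

```shell
# Keys found on each operator Deployment (from the grep outputs in this report).
efs_keys="api-server dns operator-metrics-range"
ebs_keys="all-egress api-server dns operator-metrics-range"
# Keys used as POD-SELECTOR by the NetworkPolicies in the namespace.
policy_selectors="all-egress api-server metrics-range operator-metrics-range dns"

# has_key "<space separated list>" <key>: succeeds if the key is in the list.
has_key() {
  case " $1 " in
    *" $2 "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Collect selectors that the EBS operator matches but the EFS operator does not.
missing_on_efs=""
for sel in $policy_selectors; do
  if has_key "$ebs_keys" "$sel" && ! has_key "$efs_keys" "$sel"; then
    missing_on_efs="${missing_on_efs:+$missing_on_efs }$sel"
  fi
done

echo "matched by EBS operator but not by EFS operator: $missing_on_efs"
```

With the values from this report, the only difference is all-egress, which is consistent with the EFS operator pod not being selected by the allow-all-egress policy.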
- blocks
OCPBUGS-65895 [AWS EFS CSI Driver Operator] efs_tags_controller failed to refresh cached credentials on hypershift hosted cluster (New)
- is cloned by
OCPBUGS-65895 [AWS EFS CSI Driver Operator] efs_tags_controller failed to refresh cached credentials on hypershift hosted cluster (New)
- links to