-
Bug
-
Resolution: Unresolved
-
Normal
-
None
-
4.18
-
Important
-
None
-
3
-
MCO Sprint 261, MCO Sprint 262
-
2
-
Rejected
-
False
-
Description of problem:
The pinned images functionality is not working
Version-Release number of selected component (if applicable):
IPI on AWS

version:
$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.18.0-0.nightly-2024-10-28-052434   True        False         6h46m   Cluster version is 4.18.0-0.nightly-2024-10-28-052434
How reproducible:
Always
Steps to Reproduce:
1. Enable techpreview
2. Create a pinnedimagesets resource

$ oc create -f - << EOF
apiVersion: machineconfiguration.openshift.io/v1alpha1
kind: PinnedImageSet
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: tc-73623-worker-pinned-images
spec:
  pinnedImages:
  - name: "quay.io/openshifttest/busybox@sha256:0415f56ccc05526f2af5a7ae8654baec97d4a614f24736e8eef41a4591f08019"
  - name: quay.io/openshifttest/alpine@sha256:be92b18a369e989a6e86ac840b7f23ce0052467de551b064796d67280dfa06d5
EOF
Actual results:
The images are not pinned and the pool is degraded.

We can see these logs in the MCDs:

I1028 14:26:32.514096 2341 pinned_image_set.go:304] Reconciling pinned image set: tc-73623-worker-pinned-images: generation: 1
E1028 14:26:32.514183 2341 pinned_image_set.go:240] failed to get image status for "quay.io/openshifttest/busybox@sha256:0415f56ccc05526f2af5a7ae8654baec97d4a614f24736e8eef41a4591f08019": rpc error: code = Unavailable desc = name resolver error: produced zero addresses

And we can see the machineconfignodes resources reporting pinnedimagesets degradation:

  - lastTransitionTime: "2024-10-28T14:27:58Z"
    message: 'failed to get image status for "quay.io/openshifttest/busybox@sha256:0415f56ccc05526f2af5a7ae8654baec97d4a614f24736e8eef41a4591f08019": rpc error: code = Unavailable desc = name resolver error: produced zero addresses'
    reason: PrefetchFailed
    status: "True"
    type: PinnedImageSetsDegraded
Expected results:
The images should be pinned without errors.
Additional info:
Slack conversation: https://redhat-internal.slack.com/archives/C02CZNQHGN8/p1730125766377509

This is Sam's guess (thank you [~sbatschelet] for your quick help, I really appreciate it):

My guess is that it is related to https://github.com/openshift/machine-config-operator/pull/4629, specifically the changes to pkg/daemon/cri/cri.go where we swapped out DialContext for NewClient. Per the docs:

    One subtle difference between NewClient and Dial and DialContext is that the former uses "dns" as the default name resolver, while the latter use "passthrough" for backward compatibility. This distinction should not matter to most users, but could matter to legacy users that specify a custom dialer and expect it to receive the target string directly.
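
To make the resolver difference in Sam's guess concrete, here is a minimal sketch of one common way to keep the old behaviour when moving from DialContext to NewClient: name the "passthrough" scheme explicitly in the target string, so that a custom dialer still receives the endpoint unchanged. This is not the change made in the machine-config-operator. The helper name withPassthrough and the socket path are hypothetical and used only for illustration, and the sketch uses only the standard library to build the target string that would then be handed to NewClient.

```go
package main

import (
	"fmt"
	"strings"
)

// passthroughPrefix selects gRPC's "passthrough" resolver explicitly instead
// of relying on the client's default resolver ("dns" for NewClient).
const passthroughPrefix = "passthrough:///"

// withPassthrough (hypothetical helper) returns a gRPC target string that
// names the passthrough scheme, leaving the endpoint itself untouched.
// Targets that already name the passthrough scheme are returned as-is.
func withPassthrough(endpoint string) string {
	if strings.HasPrefix(endpoint, "passthrough:") {
		return endpoint
	}
	return passthroughPrefix + endpoint
}

func main() {
	// Assumed socket path, for illustration only.
	endpoint := "/var/run/crio/crio.sock"

	// The resulting string is what would be passed as the target argument.
	fmt.Println(withPassthrough(endpoint))
}
```

Whether this is appropriate depends on how the dialer in cri.go interprets its address argument. If the target is a plain Unix socket, using the "unix" scheme in the target may be an alternative worth checking.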