- Bug
- Resolution: Done
- Critical
- 1.34.0
- None
Tracking upstream: https://github.com/knative/serving/issues/15466
On SO 1.34 CI builds, we have a Revision that initially failed to resolve its image digest due to:
Unable to fetch image "image-registry.openshift-image-registry.svc:5000/ocf-qe-images/receiverhttp":
failed to resolve image to digest: GET https://image-registry.openshift-image-registry.svc:5000/openshift/token?scope=repository%3Aocf-qe-images%2Freceiverhttp%3Apull&service=:
unexpected status code 401 Unauthorized
Failure to resolve a digest is a recoverable error, so eventually the digest was resolved and the Deployment was created and became available.
The Revision, however, stayed in this state:
apiVersion: serving.knative.dev/v1
kind: Revision
metadata:
  annotations:
    autoscaling.knative.dev/max-scale: "1"
    autoscaling.knative.dev/min-scale: "1"
    autoscaling.knative.dev/target-burst-capacity: "0"
    serving.knative.dev/creator: system:admin
    serving.knative.dev/routes: receiver30
    serving.knative.dev/routingStateModified: "2024-08-12T22:28:04Z"
  creationTimestamp: "2024-08-12T22:28:04Z"
  generation: 1
  labels:
    qe.ocf.redhat.com/role: receiver
    serving.knative.dev/configuration: receiver30
    serving.knative.dev/configurationGeneration: "1"
    serving.knative.dev/configurationUID: 3a809bb8-8ba7-4a93-9d94-66a565f1612c
    serving.knative.dev/routingState: active
    serving.knative.dev/service: receiver30
    serving.knative.dev/serviceUID: c8827240-822f-489b-b7d6-48848da0593f
  name: receiver30-00001
  namespace: ksnk-dn-tls-0
  ownerReferences:
  - apiVersion: serving.knative.dev/v1
    blockOwnerDeletion: true
    controller: true
    kind: Configuration
    name: receiver30
    uid: 3a809bb8-8ba7-4a93-9d94-66a565f1612c
  resourceVersion: "860881"
  uid: d3f218f8-4fd4-4111-b225-7e084eeb8f3d
spec:
  containerConcurrency: 0
  containers:
  - args:
    - --salt
    - "30"
    - --rejectIndexModulo
    - "0"
    - --rejectEvery
    - "0"
    - --rejectEachIndexNTimes
    - "0"
    - --durationBufferSize
    - "1"
    - --delay
    - 0s
    - --randomDelay
    - 0s
    - --idempotent
    - "true"
    - --code
    - "500"
    command:
    - /receiver
    image: image-registry.openshift-image-registry.svc:5000/ocf-qe-images/receiverhttp
    imagePullPolicy: IfNotPresent
    name: user-container
    readinessProbe:
      httpGet:
        path: /health
        port: 0
      successThreshold: 1
    resources: {}
  enableServiceLinks: false
  timeoutSeconds: 300
status:
  actualReplicas: 1
  conditions:
  - lastTransitionTime: "2024-08-12T22:30:16Z"
    severity: Info
    status: "True"
    type: Active
  - lastTransitionTime: "2024-08-12T22:28:04Z"
    message: 'Unable to fetch image "image-registry.openshift-image-registry.svc:5000/ocf-qe-images/receiverhttp":
      failed to resolve image to digest: GET https://image-registry.openshift-image-registry.svc:5000/openshift/token?scope=repository%3Aocf-qe-images%2Freceiverhttp%3Apull&service=:
      unexpected status code 401 Unauthorized'
    reason: ContainerMissing
    status: "False"
    type: ContainerHealthy
  - lastTransitionTime: "2024-08-12T22:28:04Z"
    message: 'Unable to fetch image "image-registry.openshift-image-registry.svc:5000/ocf-qe-images/receiverhttp":
      failed to resolve image to digest: GET https://image-registry.openshift-image-registry.svc:5000/openshift/token?scope=repository%3Aocf-qe-images%2Freceiverhttp%3Apull&service=:
      unexpected status code 401 Unauthorized'
    reason: ContainerMissing
    status: "False"
    type: Ready
  - lastTransitionTime: "2024-08-12T22:30:12Z"
    status: "True"
    type: ResourcesAvailable
  containerStatuses:
  - imageDigest: image-registry.openshift-image-registry.svc:5000/ocf-qe-images/receiverhttp@sha256:e915478407c5c882346c4fc72078007fd2511d9e1796345db1873facafddf836
    name: user-container
  desiredReplicas: 1
  observedGeneration: 1
Notice that containerStatuses is populated (with the resolved image digest) and the ResourcesAvailable condition is True. The ContainerHealthy condition, however, stays False with reason ContainerMissing, even though the digest resolution eventually succeeded. This keeps the overall Ready condition False, despite the Deployment being available.
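To illustrate why the Revision's top-level Ready condition cannot recover on its own, here is a minimal sketch using knative.dev/pkg/apis, the condition machinery Knative Serving builds on. The fakeStatus type and the literal condition names here are simplifications for this example, not the actual Revision reconciler code:

package main

import (
	"fmt"

	"knative.dev/pkg/apis"
)

// fakeStatus is a stand-in for a Revision status; it only needs to satisfy
// apis.ConditionsAccessor so that a ConditionSet can manage its conditions.
type fakeStatus struct {
	conditions apis.Conditions
}

func (s *fakeStatus) GetConditions() apis.Conditions  { return s.conditions }
func (s *fakeStatus) SetConditions(c apis.Conditions) { s.conditions = c }

func main() {
	// Ready is the "happy" condition and depends on the listed conditions,
	// similar in spirit to how the Revision's condition set is defined.
	set := apis.NewLivingConditionSet("ResourcesAvailable", "ContainerHealthy")

	status := &fakeStatus{}
	manager := set.Manage(status)
	manager.InitializeConditions()

	// The initial digest-resolution failure marks ContainerHealthy=False.
	manager.MarkFalse("ContainerHealthy", "ContainerMissing", "failed to resolve image to digest")

	// Later the Deployment becomes available and ResourcesAvailable turns True ...
	manager.MarkTrue("ResourcesAvailable")

	// ... but nothing resets ContainerHealthy, so the aggregate Ready stays False.
	fmt.Println("ContainerHealthy:", manager.GetCondition("ContainerHealthy").Status) // False
	fmt.Println("Ready:", manager.GetCondition(apis.ConditionReady).Status)           // False
}

Unless some step explicitly marks ContainerHealthy True again after a later successful digest resolution, the aggregate Ready condition has no way to recover, which matches the stuck state above.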
Due to other changes in the resource lifecycle (perhaps via https://github.com/knative/serving/pull/14744/files#diff-831a9383e7db7880978acf31f7dfec777beb08b900b1d0e1c55a5aed42e602cb), this is actually a regression since 1.33 in how this particular issue propagates to the ksvc status.
When the same problem occurred with 1.33, the ksvc itself would still eventually turn Ready. With 1.34, the overall ksvc never becomes Ready.
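For context on how this propagates to the ksvc: the Service's own Ready condition is an aggregate of ConfigurationsReady and RoutesReady, and ConfigurationsReady tracks whether the Configuration's latest Revision has become ready. A similar minimal sketch of that second aggregation step, again with a simplified status type and an illustrative reason string rather than the actual Service reconciler code:

package main

import (
	"fmt"

	"knative.dev/pkg/apis"
)

// svcStatus is a stand-in for a Service status implementing apis.ConditionsAccessor.
type svcStatus struct {
	conditions apis.Conditions
}

func (s *svcStatus) GetConditions() apis.Conditions  { return s.conditions }
func (s *svcStatus) SetConditions(c apis.Conditions) { s.conditions = c }

func main() {
	// The Service's Ready condition aggregates ConfigurationsReady and RoutesReady.
	set := apis.NewLivingConditionSet("ConfigurationsReady", "RoutesReady")

	status := &svcStatus{}
	manager := set.Manage(status)
	manager.InitializeConditions()

	// The Route side is fine ...
	manager.MarkTrue("RoutesReady")

	// ... but the Configuration reports its latest Revision (receiver30-00001,
	// stuck with ContainerHealthy=False) as not ready, so ConfigurationsReady
	// is False and the ksvc never turns Ready.
	manager.MarkFalse("ConfigurationsReady", "RevisionFailed", "revision receiver30-00001 is not ready")

	fmt.Println("ksvc Ready:", manager.GetCondition(apis.ConditionReady).Status) // False
}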