Knative Serving / SRVKS-1262

[regression] Revision ContainerHealthy ContainerMissing .status.condition not cleared after successful digest resolution

    • Type: Bug
    • Resolution: Done
    • Priority: Critical
    • 1.34.0
    • 1.34.0
    • None

      Tracking upstream issue: https://github.com/knative/serving/issues/15466

      On SO 1.34 CI builds, a Revision initially failed to resolve its image digest with the following error:

      Unable to fetch image "image-registry.openshift-image-registry.svc:5000/ocf-qe-images/receiverhttp":
      failed to resolve image to digest: GET https://image-registry.openshift-image-registry.svc:5000/openshift/token?scope=repository%3Aocf-qe-images%2Freceiverhttp%3Apull&service=:
      unexpected status code 401 Unauthorized

      Failure to resolve a digest is a recoverable error, so the digest was eventually resolved, and the Deployment was created and became available.

      The Revision, however, stayed in this state:

      apiVersion: serving.knative.dev/v1
      kind: Revision
      metadata:
        annotations:
          autoscaling.knative.dev/max-scale: "1"
          autoscaling.knative.dev/min-scale: "1"
          autoscaling.knative.dev/target-burst-capacity: "0"
          serving.knative.dev/creator: system:admin
          serving.knative.dev/routes: receiver30
          serving.knative.dev/routingStateModified: "2024-08-12T22:28:04Z"
        creationTimestamp: "2024-08-12T22:28:04Z"
        generation: 1
        labels:
          qe.ocf.redhat.com/role: receiver
          serving.knative.dev/configuration: receiver30
          serving.knative.dev/configurationGeneration: "1"
          serving.knative.dev/configurationUID: 3a809bb8-8ba7-4a93-9d94-66a565f1612c
          serving.knative.dev/routingState: active
          serving.knative.dev/service: receiver30
          serving.knative.dev/serviceUID: c8827240-822f-489b-b7d6-48848da0593f
        name: receiver30-00001
        namespace: ksnk-dn-tls-0
        ownerReferences:
        - apiVersion: serving.knative.dev/v1
          blockOwnerDeletion: true
          controller: true
          kind: Configuration
          name: receiver30
          uid: 3a809bb8-8ba7-4a93-9d94-66a565f1612c
        resourceVersion: "860881"
        uid: d3f218f8-4fd4-4111-b225-7e084eeb8f3d
      spec:
        containerConcurrency: 0
        containers:
        - args:
          - --salt
          - "30"
          - --rejectIndexModulo
          - "0"
          - --rejectEvery
          - "0"
          - --rejectEachIndexNTimes
          - "0"
          - --durationBufferSize
          - "1"
          - --delay
          - 0s
          - --randomDelay
          - 0s
          - --idempotent
          - "true"
          - --code
          - "500"
          command:
          - /receiver
          image: image-registry.openshift-image-registry.svc:5000/ocf-qe-images/receiverhttp
          imagePullPolicy: IfNotPresent
          name: user-container
          readinessProbe:
            httpGet:
              path: /health
              port: 0
            successThreshold: 1
          resources: {}
        enableServiceLinks: false
        timeoutSeconds: 300
      status:
        actualReplicas: 1
        conditions:
        - lastTransitionTime: "2024-08-12T22:30:16Z"
          severity: Info
          status: "True"
          type: Active
        - lastTransitionTime: "2024-08-12T22:28:04Z"
          message: 'Unable to fetch image "image-registry.openshift-image-registry.svc:5000/ocf-qe-images/receiverhttp":
            failed to resolve image to digest: GET https://image-registry.openshift-image-registry.svc:5000/openshift/token?scope=repository%3Aocf-qe-images%2Freceiverhttp%3Apull&service=:
            unexpected status code 401 Unauthorized'
          reason: ContainerMissing
          status: "False"
          type: ContainerHealthy
        - lastTransitionTime: "2024-08-12T22:28:04Z"
          message: 'Unable to fetch image "image-registry.openshift-image-registry.svc:5000/ocf-qe-images/receiverhttp":
            failed to resolve image to digest: GET https://image-registry.openshift-image-registry.svc:5000/openshift/token?scope=repository%3Aocf-qe-images%2Freceiverhttp%3Apull&service=:
            unexpected status code 401 Unauthorized'
          reason: ContainerMissing
          status: "False"
          type: Ready
        - lastTransitionTime: "2024-08-12T22:30:12Z"
          status: "True"
          type: ResourcesAvailable
        containerStatuses:
        - imageDigest: image-registry.openshift-image-registry.svc:5000/ocf-qe-images/receiverhttp@sha256:e915478407c5c882346c4fc72078007fd2511d9e1796345db1873facafddf836
          name: user-container
        desiredReplicas: 1
        observedGeneration: 1
      

      Notice that containerStatuses is populated (with the resolved image digest) and the ResourcesAvailable condition is True. The ContainerHealthy condition, however, stays False with reason ContainerMissing, even though digest resolution succeeded. As a result, the overall Ready condition stays False despite the Deployment being available.
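The effect of the stale condition on Ready can be illustrated with a simplified model. This is only a sketch, not Knative's implementation: it assumes Ready is aggregated from ResourcesAvailable and ContainerHealthy, with Active treated as informational (it has severity Info in the status above). The names `DEPENDENTS` and `aggregate_ready` are hypothetical.

```python
from typing import Dict, Iterable

# Assumption: Ready is the aggregate of these two conditions only.
# Active is informational (severity: Info) and is left out.
DEPENDENTS = ("ResourcesAvailable", "ContainerHealthy")


def aggregate_ready(conditions: Dict[str, str],
                    dependents: Iterable[str] = DEPENDENTS) -> str:
    """Return "False" if any dependent is "False", "True" only if all
    dependents are "True", and "Unknown" otherwise."""
    statuses = [conditions.get(name, "Unknown") for name in dependents]
    if "False" in statuses:
        return "False"
    if all(s == "True" for s in statuses):
        return "True"
    return "Unknown"


# Condition statuses taken from the Revision above: the Deployment is
# available, but ContainerHealthy still carries the old failure.
stuck = {"Active": "True", "ResourcesAvailable": "True", "ContainerHealthy": "False"}

# Hypothetical state if ContainerHealthy were cleared after digest resolution.
cleared = dict(stuck, ContainerHealthy="True")

print(aggregate_ready(stuck))    # False
print(aggregate_ready(cleared))  # True
```

Under this model, a single dependent condition left at False is enough to pin Ready at False, which matches the status shown above.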

       

      Due to other changes in the resource lifecycle (perhaps introduced by https://github.com/knative/serving/pull/14744/files#diff-831a9383e7db7880978acf31f7dfec777beb08b900b1d0e1c55a5aed42e602cb), this is a regression since 1.33 in how this particular issue propagates to the ksvc status.

      When the same problem occurs with 1.33, the ksvc itself still becomes Ready. With 1.34, the ksvc never becomes Ready.
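To check whether a Revision is stuck in the state described in this issue, a small helper along the following lines may be useful. It is a sketch under assumptions: the input is a Revision object parsed into a Python dict (for example, one item from the JSON output of `kubectl get revisions -o json`), and the function names `condition` and `is_stuck_container_missing` are hypothetical. It only encodes the symptom reported here: ContainerHealthy is False with reason ContainerMissing while ResourcesAvailable is True and every container status already has an image digest.

```python
from typing import Any, Dict


def condition(status: Dict[str, Any], cond_type: str) -> Dict[str, Any]:
    """Return the condition of the given type, or an empty dict if absent."""
    for cond in status.get("conditions", []):
        if cond.get("type") == cond_type:
            return cond
    return {}


def is_stuck_container_missing(revision: Dict[str, Any]) -> bool:
    """True if the Revision shows the stale ContainerMissing symptom."""
    status = revision.get("status", {})
    healthy = condition(status, "ContainerHealthy")
    resources = condition(status, "ResourcesAvailable")
    container_statuses = status.get("containerStatuses") or []
    # Digest resolution succeeded if every container reports an imageDigest.
    digests_resolved = bool(container_statuses) and all(
        cs.get("imageDigest") for cs in container_statuses
    )
    return (
        healthy.get("status") == "False"
        and healthy.get("reason") == "ContainerMissing"
        and resources.get("status") == "True"
        and digests_resolved
    )


# Trimmed copy of the status reported in this issue.
example = {
    "status": {
        "conditions": [
            {"type": "Active", "status": "True"},
            {"type": "ContainerHealthy", "status": "False", "reason": "ContainerMissing"},
            {"type": "Ready", "status": "False", "reason": "ContainerMissing"},
            {"type": "ResourcesAvailable", "status": "True"},
        ],
        "containerStatuses": [
            {
                "name": "user-container",
                "imageDigest": "image-registry.openshift-image-registry.svc:5000/"
                "ocf-qe-images/receiverhttp@sha256:"
                "e915478407c5c882346c4fc72078007fd2511d9e1796345db1873facafddf836",
            }
        ],
    }
}

print(is_stuck_container_missing(example))  # True
```

A Revision whose digest has genuinely not been resolved yet (no `containerStatuses`) is not flagged, so the check separates the stale condition from a failure that is still in progress.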

        1. controller-66bb97d6dc-jswfv.controller.log.bz2 (187.47 MB, Marek Schmidt)
        2. must-gather.local.9033937588891806942.tar.bz2 (25.96 MB, Marek Schmidt)
        3. SRVKS-1262-reproducer-multi.sh (1 kB, Marek Schmidt)

              skontopo@redhat.com Stavros Kontopoulos
              maschmid@redhat.com Marek Schmidt