OpenShift Bugs / OCPBUGS-5885

packageserver CSV in unknown state since 3h45m during upgrade to OCP 4.10.35


Details

    • Bug
    • Resolution: Not a Bug
    • Minor
    • None
    • 4.10.z
    • OLM
    • Rejected
    • False

    Description

      Description of problem:

      During OpenShift 4 installation, the operator-lifecycle-manager cluster operator has not yet successfully rolled out:
      
      ~~~
      for i in `omg get co | awk ' $3 == "False" || $4 == "True" || $5 == "True" {print $1}'` ; do echo $i ;omg get co $i -o json | jq '.status.conditions'; done
      operator-lifecycle-manager
      [{ "lastTransitionTime": "2022-12-11T23:14:08Z", "message": "Waiting to see update 0.19.0 succeed", "status": "True", "type": "Progressing" },{ "lastTransitionTime": "2022-12-11T23:14:08Z", "status": "False", "type": "Available" },{ "lastTransitionTime": "2022-12-11T23:14:08Z", "message": "Waiting for updates to take effect", "status": "True", "type": "Degraded" },{ "lastTransitionTime": "2022-12-11T23:14:08Z", "message": "Waiting for updates to take effect", "status": "False", "type": "Upgradeable" }]  
      ~~~

      olm operator logs:
      ~~~
      2022-12-11T23:13:05.257068673Z time="2022-12-11T23:13:05Z" level=info msg="ClusterOperator api not present, skipping update (Get \"https://172.30.0.1:443/api?timeout=32s\": dial tcp 172.30.0.1:443: connect: connection refused)" monitor=clusteroperator
      2022-12-11T23:13:05.257068673Z time="2022-12-11T23:13:05Z" level=info msg="exiting from clusteroperator monitor loop" monitor=clusteroperator
      2022-12-11T23:13:05.257095625Z E1211 23:13:05.257062 1 reflector.go:138] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:255: Failed to watch *v2.OperatorCondition: failed to list *v2.OperatorCondition: Get "https://172.30.0.1:443/apis/operators.coreos.com/v2/operatorconditions?limit=500&resourceVersion=0": dial tcp 172.30.0.1:443: connect: connection refused
      2022-12-23T09:31:05.806427847Z {"level":"error","ts":1671787865.8063347,"logger":"controllers.operatorcondition","msg":"Error ensuring OperatorCondition Deployment EnvVars","request":"openshift-operator-lifecycle-manager/packageserver","error":"Deployment.apps \"packageserver\" not found","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/build/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:311\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/build/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/build/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227"}
      ~~~  
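
      The error about the missing packageserver Deployment above refers to the Deployment owned by the packageserver CSV. As a rough sketch (not taken from this case, which was investigated from a must-gather with omg; object names are assumed to be the defaults in the openshift-operator-lifecycle-manager namespace), the same objects can be inspected on a live cluster with:
      ~~~
      # ClusterOperator status for OLM and the packageserver
      oc get clusteroperator operator-lifecycle-manager operator-lifecycle-manager-packageserver

      # Phase of the packageserver CSV (the object reported in an unknown state)
      oc -n openshift-operator-lifecycle-manager get csv packageserver -o jsonpath='{.status.phase}{"\n"}'

      # The Deployment that the olm-operator reconciler reports as "not found"
      oc -n openshift-operator-lifecycle-manager get deployment packageserver
      ~~~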
      
      
      
      In the same time frame, the kube-apiserver is active and responding:
      ~~~
      oc rsh olm-operator-864479dd5d-bfll4
      sh-4.4$ curl -v -H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" --cert //srv-cert/tls.crt --key srv-cert/tls.key https://172.30.0.1:443/apis/operators.coreos.com/v1alpha1/clusterserviceversions?labelSelector=%21olm.copiedFrom
      [2]+  Done                    limit=500
      sh-4.4$ ls srv
      srv/      srv-cert/
      sh-4.4$ ls srv-cert/
      ..2022_12_11_23_12_26.3582295664/ ..data/                           tls.crt                           tls.key
      sh-4.4$
      sh-4.4$
      sh-4.4$
      sh-4.4$
      sh-4.4$
      sh-4.4$ curl -vk -H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" --cert srv-cert/tls.crt --key srv-cert/tls.key https://172.30.0.1:443/apis/operators.coreos.com/v1alpha1/clusterserviceversions?labelSelector=%21olm.copiedFrom&limit=500&resourceVersion=
      [1] 33
      [2] 34
      sh-4.4$ *   Trying 172.30.0.1...
      * TCP_NODELAY set
      * Connected to 172.30.0.1 (172.30.0.1) port 443 (#0)
      * ALPN, offering h2
      * ALPN, offering http/1.1
      * successfully set certificate verify locations:
      *   CAfile: /etc/pki/tls/certs/ca-bundle.crt
          CApath: none
      * TLSv1.3 (OUT), TLS handshake, Client hello (1):
      * TLSv1.3 (IN), TLS handshake, Server hello (2):
      * TLSv1.3 (IN), TLS handshake, [no content] (0):
      * TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
      * TLSv1.3 (IN), TLS handshake, [no content] (0):
      * TLSv1.3 (IN), TLS handshake, Request CERT (13):
      * TLSv1.3 (IN), TLS handshake, [no content] (0):
      * TLSv1.3 (IN), TLS handshake, Certificate (11):
      * TLSv1.3 (IN), TLS handshake, [no content] (0):
      * TLSv1.3 (IN), TLS handshake, CERT verify (15):
      * TLSv1.3 (IN), TLS handshake, [no content] (0):
      * TLSv1.3 (IN), TLS handshake, Finished (20):
      * TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
      * TLSv1.3 (OUT), TLS handshake, [no content] (0):
      * TLSv1.3 (OUT), TLS handshake, Certificate (11):
      * TLSv1.3 (OUT), TLS handshake, [no content] (0):
      * TLSv1.3 (OUT), TLS handshake, CERT verify (15):
      * TLSv1.3 (OUT), TLS handshake, [no content] (0):
      * TLSv1.3 (OUT), TLS handshake, Finished (20):
      * SSL connection using TLSv1.3 / TLS_AES_128_GCM_SHA256
      * ALPN, server accepted to use h2
      * Server certificate:
      *  subject: CN=172.30.0.1
      *  start date: Jan 10 23:11:19 2023 GMT
      *  expire date: Feb  9 23:11:20 2023 GMT
      *  issuer: OU=openshift; CN=kube-apiserver-service-network-signer
      *  SSL certificate verify result: self signed certificate in certificate chain (19), continuing anyway.
      * Using HTTP2, server supports multi-use
      * Connection state changed (HTTP/2 confirmed)
      * Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
      * TLSv1.3 (OUT), TLS app data, [no content] (0):
      * TLSv1.3 (OUT), TLS app data, [no content] (0):
      * TLSv1.3 (OUT), TLS app data, [no content] (0):
      * Using Stream ID: 1 (easy handle 0x55eb63892d60)
      * TLSv1.3 (OUT), TLS app data, [no content] (0):
      > GET /apis/operators.coreos.com/v1alpha1/clusterserviceversions?labelSelector=%21olm.copiedFrom HTTP/2
      > Host: 172.30.0.1
      > User-Agent: curl/7.61.1
      > Accept: */*
      > Authorization: Bearer .
      .
      ..   <response code is 200>
      ~~~
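
      Note: the unquoted & characters in the last curl URL cause the shell to background the command (hence the "[1] 33" / "[2] 34" job numbers and the earlier "[2]+ Done limit=500"). A quoted form of the same request, as a sketch only, reusing the endpoint and credentials shown above, would be:
      ~~~
      TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
      curl -vk -H "Authorization: Bearer ${TOKEN}" \
        --cert srv-cert/tls.crt --key srv-cert/tls.key \
        "https://172.30.0.1:443/apis/operators.coreos.com/v1alpha1/clusterserviceversions?labelSelector=%21olm.copiedFrom&limit=500&resourceVersion="
      ~~~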

      This looks similar to the issue reported in https://issues.redhat.com/browse/OCPBUGS-5044.

       

      Version-Release number of selected component (if applicable):

      4.10.z

      How reproducible:

       

      Steps to Reproduce:

      1.
      2.
      3.
      

      Actual results:

       

      Expected results:

       

      Additional info:

       

      Attachments

        Activity

          People

            agreene1991 Alexander Greene
            rhn-support-dmule Dhananjay Mule (Inactive)
            Kui Wang Kui Wang