OpenShift Bugs / OCPBUGS-3487

hw-event-proxy deployment not auto-recovered by operator


    • Type: Bug
    • Resolution: Not a Bug
    • Priority: Normal
    • None
    • 4.12
    • BMER Events
    • None
    • CNF RAN Sprint 228
    • 1
    • Proposed
    • True

      Lab Setup:

      BMER is deployed successfully by hw-event-proxy-operator through ZTP. The `HardwareEvent` CR is deployed and the hw-event-proxy pods are running.

       

      Steps to Reproduce:

      Delete the hw-event-proxy deployment:

      oc -n openshift-bare-metal-events delete deployment hw-event-proxy

      The operator does not recreate the deployment. Logs from hw-event-proxy-operator-controller-manager indicate that the redfish secret was not found:

      2022-11-10T09:51:35.104Z    INFO    controllers.HardwareEvent    Reconciling Hardware event proxy    {"Request.Namespace": "openshift-bare-metal-events", "Request.Name": "openshift-bare-metal-events"}
      2022-11-10T09:51:35.104Z    INFO    controllers.HardwareEvent    redfish secret not found, please create a secret to access hardware events     {"Request.Namespace": "openshift-bare-metal-events", "Request.Name": "openshift-bare-metal-events"}
      

      However, the redfish secret is still present:

      [jacding@fedora notes]$ oc get secret | grep redfish
      redfish-basic-auth-gtkh496bhf                                Opaque                                3      5d21h 
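
      The `grep redfish` check above matches any secret whose name contains "redfish", including the suffixed `redfish-basic-auth-gtkh496bhf`. The Deployment spec dumped in the operator log further down references `secretKeyRef` entries with `name:redfish-basic-auth`, which suggests (an inference from this report, not confirmed operator source) that the lookup is by that exact name. A minimal sketch of an exact-name check follows; the helper name `has_exact_secret` is hypothetical:

```shell
# Succeed only if stdin (output of `oc get secret -o name`) contains a secret
# whose name is exactly $1. A suffixed name such as
# redfish-basic-auth-gtkh496bhf does not count as a match.
has_exact_secret() {
  grep -qxF "secret/$1"
}

# Usage against a live cluster (not run here):
# oc -n openshift-bare-metal-events get secret -o name \
#   | has_exact_secret redfish-basic-auth || echo "exact name missing"
```

      With only the suffixed secret present, this check fails even though `grep redfish` prints a match.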

       

      Workaround:

       

      Manually re-create the redfish secret:

      oc -n openshift-bare-metal-events create secret generic redfish-basic-auth --from-literal username=$REDFISH_USERNAME --from-literal password=$REDFISH_PASSWORD --from-literal hostaddr="$REDFISH_HOSTADDR"

      This creates a secret named `redfish-basic-auth` alongside the original redfish secret. The two secrets have different names but the same content.

      $ oc get secret | grep redfish
      redfish-basic-auth                                           Opaque                                3      3s
      redfish-basic-auth-gtkh496bhf                                Opaque                                3      5d21h 
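
      To check that the two secrets really carry the same content, their `.data` maps can be compared. This is a rough sketch: the helper names `secret_data` and `same_secret_data` are hypothetical, and the comparison is a plain string match on the base64-encoded data as printed by `oc`.

```shell
# Print the .data map (base64-encoded values) of a secret in the
# openshift-bare-metal-events namespace.
secret_data() {
  oc -n openshift-bare-metal-events get secret "$1" -o jsonpath='{.data}'
}

# Succeed (exit 0) when both named secrets print identical data.
same_secret_data() {
  [ "$(secret_data "$1")" = "$(secret_data "$2")" ]
}

# Usage with the names from this report (not run here):
# same_secret_data redfish-basic-auth redfish-basic-auth-gtkh496bhf && echo identical
```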

      After this secret is created, the hw-event-proxy pod is deployed successfully. However, the hw-event-proxy-operator-controller-manager logs show a repeating error:

      2022-11-10T14:00:35.480Z    INFO    controllers.HardwareEvent    Reconciling Hardware event proxy    {"Request.Namespace": "openshift-bare-metal-events", "Request.Name": "openshift-bare-metal-events"}
      2022-11-10T14:00:35.480Z    INFO    controllers.HardwareEvent    redfish secret not found, please create a secret to access hardware events     {"Request.Namespace": "openshift-bare-metal-events", "Request.Name": "openshift-bare-metal-events"}
      2022-11-10T14:01:05.481Z    INFO    controllers.HardwareEvent    Reconciling Hardware event proxy    {"Request.Namespace": "openshift-bare-metal-events", "Request.Name": "openshift-bare-metal-events"}
      2022/11/10 14:01:05 reconciling (apps/v1, Kind=Deployment) openshift-bare-metal-events/hw-event-proxy
      2022/11/10 14:01:05 does not exist, creating (apps/v1, Kind=Deployment) openshift-bare-metal-events/hw-event-proxy
      2022/11/10 14:01:05 successfully created (apps/v1, Kind=Deployment) openshift-bare-metal-events/hw-event-proxy
      2022/11/10 14:01:05 reconciling (/v1, Kind=ServiceAccount) openshift-bare-metal-events/hw-event-proxy-sa
      2022-11-10T14:01:05.520Z    ERROR    controllers.HardwareEvent    failed to sync hardware event proxy deployment     {"Request.Namespace": "openshift-bare-metal-events", "Request.Name": "openshift-bare-metal-events", "error": "failed to apply hw-event-proxy-sa object &{map[apiVersion:v1 imagePullSecrets:[map[name:hw-event-proxy-sa-dockercfg-4xk2t]] kind:ServiceAccount metadata:map[annotations:map[kubectl.kubernetes.io/last-applied-configuration:{\"apiVersion\":\"v1\",\"kind\":\"ServiceAccount\",\"metadata\":{\"annotations\":{},\"name\":\"hw-event-proxy-sa\",\"namespace\":\"openshift-bare-metal-events\"}}\n] creationTimestamp:2022-11-04T16:29:28Z labels:map[] name:hw-event-proxy-sa namespace:openshift-bare-metal-events ownerReferences:[map[apiVersion:event.redhat-cne.org/v1alpha1 blockOwnerDeletion:true controller:true kind:HardwareEvent name:openshift-bare-metal-events uid:eb309e4b-a741-40fa-97a6-a53a0b4955af]] resourceVersion:429400 uid:767f6936-78be-4bde-ab9c-0ccca913c53c] secrets:[map[name:hw-event-proxy-sa-dockercfg-4xk2t]]]} with err: could not update object (/v1, Kind=ServiceAccount) openshift-bare-metal-events/hw-event-proxy-sa: serviceaccounts \"hw-event-proxy-sa\" is forbidden: cannot set an ownerRef on a resource you can't delete: , <nil>"}
      sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
          /remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:311
      sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
          /remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266
      sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
          /remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227
      2022-11-10T14:01:05.520Z    INFO    controllers.HardwareEvent    Reconciling Hardware event proxy    {"Request.Namespace": "openshift-bare-metal-events", "Request.Name": "openshift-bare-metal-events"}
      2022/11/10 14:01:05 reconciling (apps/v1, Kind=Deployment) openshift-bare-metal-events/hw-event-proxy
      2022-11-10T14:01:05.529Z    ERROR    controllers.HardwareEvent    failed to sync hardware event proxy deployment     {"Request.Namespace": "openshift-bare-metal-events", "Request.Name": "openshift-bare-metal-events", "error": "failed to apply hw-event-proxy object &{map[apiVersion:apps/v1 kind:Deployment metadata:map[annotations:map[release.openshift.io/version:] creationTimestamp:2022-11-10T14:01:05Z generation:1 labels:map[app:hw-event-proxy] name:hw-event-proxy namespace:openshift-bare-metal-events ownerReferences:[map[apiVersion:event.redhat-cne.org/v1alpha1 blockOwnerDeletion:true controller:true kind:HardwareEvent name:openshift-bare-metal-events uid:eb309e4b-a741-40fa-97a6-a53a0b4955af]] resourceVersion:3441976 uid:d7bd6442-d164-443d-824a-3ec18fe2d980] spec:map[replicas:1 selector:map[matchLabels:map[app:hw-event-proxy]] template:map[metadata:map[labels:map[app:hw-event-proxy]] spec:map[containers:[map[args:[--api-port=9085] env:[map[name:NODE_NAME valueFrom:map[fieldRef:map[fieldPath:spec.nodeName]]] map[name:HW_EVENT_PROXY_SERVICE_SERVICE_PORT value:9087] map[name:MSG_PARSER_PORT value:9097] map[name:MSG_PARSER_TIMEOUT value:10] map[name:REDFISH_USERNAME valueFrom:map[secretKeyRef:map[key:username name:redfish-basic-auth]]] map[name:REDFISH_PASSWORD valueFrom:map[secretKeyRef:map[key:password name:redfish-basic-auth]]] map[name:REDFISH_HOSTADDR valueFrom:map[secretKeyRef:map[key:hostaddr name:redfish-basic-auth]]] map[name:LOG_LEVEL value:trace]] image:quay.io/openshift/origin-baremetal-hardware-event-proxy:latest name:hw-event-proxy ports:[map[containerPort:9087 name:hw-event-port]] resources:map[limits:map[cpu:20m] requests:map[cpu:10m]]] map[args:[--metrics-addr=127.0.0.1:9091 --store-path=/store --transport-host=amqp://amq-router.amq-router.svc.cluster.local --api-port=9085] env:[map[name:NODE_NAME valueFrom:map[fieldRef:map[fieldPath:spec.nodeName]]]] image:quay.io/openshift/origin-cloud-event-proxy:4.12 imagePullPolicy:IfNotPresent 
name:cloud-event-proxy ports:[map[containerPort:9091 name:metrics-port] map[containerPort:9085 name:api-port]] volumeMounts:[map[mountPath:/store name:pubsubstore]]] map[args:[--logtostderr --secure-listen-address=:8443 --tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_RSA_WITH_AES_128_CBC_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256 --upstream=http://127.0.0.1:9091/ --tls-private-key-file=/etc/metrics/tls.key --tls-cert-file=/etc/metrics/tls.crt] image:registry.redhat.io/openshift4/ose-kube-rbac-proxy@sha256:b420e87225e0bcb49906f6070b69564a04335ca05eb8f6bb7855666e6f4061e3 imagePullPolicy:IfNotPresent name:kube-rbac-proxy ports:[map[containerPort:8443 name:https]] resources:map[requests:map[cpu:10m memory:20Mi]] terminationMessagePolicy:FallbackToLogsOnError volumeMounts:[map[mountPath:/etc/metrics name:hw-event-proxy-certs readOnly:true]]]] serviceAccountName:hw-event-proxy-sa volumes:[map[emptyDir:map[] name:pubsubstore] map[name:hw-event-proxy-certs secret:map[secretName:hw-event-proxy-secret]]]]]]]} with err: could not update object (apps/v1, Kind=Deployment) openshift-bare-metal-events/hw-event-proxy: Operation cannot be fulfilled on deployments.apps \"hw-event-proxy\": the object has been modified; please apply your changes to the latest version and try again"}
      sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
          /remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:311
      sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
          /remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266
      sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
          /remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227
      2022-11-10T14:01:05.529Z    INFO    controllers.HardwareEvent    Reconciling Hardware event proxy    {"Request.Namespace": "openshift-bare-metal-events", "Request.Name": "openshift-bare-metal-events"}
      2022/11/10 14:01:05 reconciling (apps/v1, Kind=Deployment) openshift-bare-metal-events/hw-event-proxy
      2022/11/10 14:01:05 update was successful
      2022/11/10 14:01:05 reconciling (/v1, Kind=ServiceAccount) openshift-bare-metal-events/hw-event-proxy-sa
      2022-11-10T14:01:05.547Z    ERROR    controllers.HardwareEvent    failed to sync hardware event proxy deployment     {"Request.Namespace": "openshift-bare-metal-events", "Request.Name": "openshift-bare-metal-events", "error": "failed to apply hw-event-proxy-sa object &{map[apiVersion:v1 imagePullSecrets:[map[name:hw-event-proxy-sa-dockercfg-4xk2t]] kind:ServiceAccount metadata:map[annotations:map[kubectl.kubernetes.io/last-applied-configuration:{\"apiVersion\":\"v1\",\"kind\":\"ServiceAccount\",\"metadata\":{\"annotations\":{},\"name\":\"hw-event-proxy-sa\",\"namespace\":\"openshift-bare-metal-events\"}}\n] creationTimestamp:2022-11-04T16:29:28Z labels:map[] name:hw-event-proxy-sa namespace:openshift-bare-metal-events ownerReferences:[map[apiVersion:event.redhat-cne.org/v1alpha1 blockOwnerDeletion:true controller:true kind:HardwareEvent name:openshift-bare-metal-events uid:eb309e4b-a741-40fa-97a6-a53a0b4955af]] resourceVersion:429400 uid:767f6936-78be-4bde-ab9c-0ccca913c53c] secrets:[map[name:hw-event-proxy-sa-dockercfg-4xk2t]]]} with err: could not update object (/v1, Kind=ServiceAccount) openshift-bare-metal-events/hw-event-proxy-sa: serviceaccounts \"hw-event-proxy-sa\" is forbidden: cannot set an ownerRef on a resource you can't delete: , <nil>"}
      sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
          /remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:311
      sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
          /remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266
      sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
          /remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227
      2022-11-10T14:01:05.547Z    INFO    controllers.HardwareEvent    Reconciling Hardware event proxy    {"Request.Namespace": "openshift-bare-metal-events", "Request.Name": "openshift-bare-metal-events"}
      2022/11/10 14:01:05 reconciling (apps/v1, Kind=Deployment) openshift-bare-metal-events/hw-event-proxy
      2022/11/10 14:01:05 update was successful
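
      The log above contains two different update failures: a ServiceAccount update rejected with "cannot set an ownerRef on a resource you can't delete", and a Deployment update conflict that succeeds on a later reconcile. The sketch below lists the distinct objects whose update failed, then asks the API server whether a given service account may delete ServiceAccounts, which is the permission the first message points at. The helper names are hypothetical. The operator's service account name is not stated in this report, so it is passed as an argument and is assumed to live in the same namespace.

```shell
# Read operator log lines on stdin and print each distinct object whose
# update failed, taken from "could not update object (<gvk>) <ns>/<name>:".
failed_objects() {
  sed -n 's/.*could not update object (\([^)]*\)) \([^:]*\):.*/\1 \2/p' | sort -u
}

# Print "yes" or "no": may service account $1 delete ServiceAccounts in the
# openshift-bare-metal-events namespace? $1 is an assumption supplied by the
# reader; running this needs impersonation rights on the cluster.
can_delete_serviceaccounts() {
  oc auth can-i delete serviceaccounts \
    -n openshift-bare-metal-events \
    --as="system:serviceaccount:openshift-bare-metal-events:$1"
}

# Usage (not run here), with operator.log holding the saved operator log:
# failed_objects < operator.log
# can_delete_serviceaccounts <operator-service-account>
```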
      
       

       

              jacding@redhat.com Jack Ding
              jacding@redhat.com Jack Ding
              Niv Gal Waizer Niv Gal Waizer (Inactive)