  Red Hat Internal Developer Platform
  RHIDP-2597

[janus-idp/operator] Existing Backstage operand not upgraded (stuck on mounting a ConfigMap) after upgrading operator from 1.1.x to 1.2.x


      This update fixes an issue preventing operator-backed Red Hat Developer Hub (RHDH) instances from being upgraded seamlessly when the RHDH Operator is itself upgraded.
      The Operator has been refactored in this update, and when reconciling already-existing RHDH Custom Resources it might be denied patching certain fields that Kubernetes/OpenShift treats as restricted or read-only. To overcome this and reach the desired state, the Operator now attempts to forcibly replace such objects when it cannot patch them.
      As a known issue, if you had set any custom labels or annotations on any of the underlying resources managed by the RHDH Operator, you might need to add them again after the upgrade.
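      For example, a custom label or annotation that was previously set on an operator-managed object could be re-applied with kubectl after the upgrade; the key and value below are purely illustrative, and the object names match the example instance used elsewhere in this issue:
      $ kubectl label service backstage-psql-bs1 example.com/team=platform
      $ kubectl annotate deployment backstage-bs1 example.com/owner=platform-team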
    • Known Issue
    • Proposed
    • RHDH Core Team 3258

      [2340552701] Upstream Reporter: Armel Soro
      Upstream issue status: Closed
      Upstream description:

      /kind bug

      What did you do exactly?

      • From the operator repo, switch to the 1.1.x branch and deploy the operator from RHDH 1.1, setting the IMG arg because the default image for this branch, quay.io/janus-idp/operator:0.1.3, has expired and no longer exists on quay.io:
      git switch 1.1.x
      make deploy IMG=quay.io/rhdh/rhdh-rhel9-operator:1.1
      kubectl apply -f examples/bs1.yaml
      • Wait a few seconds until all the resources are created
      • Check the Backstage Custom Resource status. Reason should be DeployOK (a jsonpath shortcut for this check is shown right after the output below):
      $ kubectl describe backstage bs1
      Name:         bs1
      Namespace:    my-ns
      Labels:       <none>
      Annotations:  <none>
      API Version:  rhdh.redhat.com/v1alpha1
      Kind:         Backstage
      Metadata:
        Creation Timestamp:  2024-06-07T12:21:58Z
        Generation:          1
        Resource Version:    48634
        UID:                 e4e9766f-6c32-4b44-85cd-1eac93b56f16
      Status:
        Conditions:
          Last Transition Time:  2024-06-07T12:21:58Z
          Message:
          Reason:                DeployOK
          Status:                True
          Type:                  Deployed
      Events:                    <none>
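      The Deployed condition reason can also be read directly with a jsonpath query (a convenience based on the status layout above, not an extra reproduction step):
      $ kubectl get backstage bs1 -o jsonpath='{.status.conditions[?(@.type=="Deployed")].reason}'
      DeployOK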
      • Switch to the 1.2.x branch and deploy the upcoming 1.2 operator
      git switch 1.2.x
      make deploy
      • Wait a few seconds until the new version of the operator pod is running and the existing CR is reconciled again. Then check the CR status again. Reason will be DeployFailed with an error:
      $ kubectl describe backstage bs1
      Name:         bs1
      Namespace:    my-ns
      Labels:       <none>
      Annotations:  <none>
      API Version:  rhdh.redhat.com/v1alpha1
      Kind:         Backstage
      Metadata:
        Creation Timestamp:  2024-06-07T13:44:15Z
        Generation:          1
        Resource Version:    2846
        UID:                 5cdad2eb-0840-4c8e-8c2b-08be46e3856a
      Status:
        Conditions:
          Last Transition Time:  2024-06-07T13:49:06Z
          Message:               failed to apply backstage objects failed to patch object &Service{ObjectMeta:{backstage-psql-bs1  my-ns   2100 0 0001-01-01 00:00:00 +0000 UTC <nil> <nil> map[app.kubernetes.io/instance:bs1 app.kubernetes.io/name:backstage rhdh.redhat.com/app:backstage-psql-bs1] map[] [{rhdh.redhat.com/v1alpha1 Backstage bs1 5cdad2eb-0840-4c8e-8c2b-08be46e3856a 0xc000590f55 0xc000590f54}] [] []},Spec:ServiceSpec{Ports:[]ServicePort{ServicePort{Name:,Protocol:,Port:5432,TargetPort:{0 0 },NodePort:0,AppProtocol:nil,},},Selector:map[string]string{rhdh.redhat.com/app: backstage-psql-bs1,},ClusterIP:None,Type:,ExternalIPs:[],SessionAffinity:,LoadBalancerIP:,LoadBalancerSourceRanges:[],ExternalName:,ExternalTrafficPolicy:,HealthCheckNodePort:0,PublishNotReadyAddresses:false,SessionAffinityConfig:nil,IPFamilyPolicy:nil,ClusterIPs:[],IPFamilies:[],AllocateLoadBalancerNodePorts:nil,LoadBalancerClass:nil,InternalTrafficPolicy:nil,},Status:ServiceStatus{LoadBalancer:LoadBalancerStatus{Ingress:[]LoadBalancerIngress{},},Conditions:[]Condition{},},}: failed to patch object *v1.Service: Service "backstage-psql-bs1" is invalid: spec.clusterIPs[0]: Invalid value: []string{"None"}: may not change once set
          Reason:                DeployFailed
          Status:                False
          Type:                  Deployed
      Events:                    <none>

      Actual behavior

      It seems the existing CR could not be reconciled successfully with the new version of the operator because the operator was unable to patch the existing database Service object.
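      The database Service is headless, so its spec.clusterIP (and spec.clusterIPs) is already pinned to None, a field Kubernetes forbids changing once set; this can be confirmed with, for example:

      $ kubectl get service backstage-psql-bs1 -o jsonpath='{.spec.clusterIP}'
      None

      Any patch that tries to send a different (or empty) clusterIPs value is therefore rejected with the "may not change once set" error above.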

      If we take a look at the resources, a new Backstage pod is being created, but it is stuck trying to mount a ConfigMap (which could not be created because of the failure to patch the DB Service):

      $ kubectl get pod
      
      NAME                             READY   STATUS     RESTARTS   AGE
      backstage-psql-bs1-0             1/1     Running    0          9m22s
      backstage-bs1-655f659ddc-n7grw   1/1     Running    0          9m22s
      backstage-bs1-6469fdd48f-ldq5h   0/1     Init:0/1   0          4m31s
      
      $ kubectl describe pod backstage-bs1-6469fdd48f-ldq5h
      
      [...]
      Events:
        Type     Reason            Age                   From               Message
        ----     ------            ----                  ----               -------
        Warning  FailedScheduling  5m18s                 default-scheduler  0/1 nodes are available: waiting for ephemeral volume controller to create the persistentvolumeclaim "backstage-bs1-6469fdd48f-ldq5h-dynamic-plugins-root". preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod..
        Normal   Scheduled         5m11s                 default-scheduler  Successfully assigned my-ns/backstage-bs1-6469fdd48f-ldq5h to k3d-k3s-default-server-0
        Warning  FailedMount       3m9s                  kubelet            Unable to attach or mount volumes: unmounted volumes=[backstage-appconfig-bs1], unattached volumes=[backstage-appconfig-bs1 dynamic-plugins-root dynamic-plugins-npmrc backstage-dynamic-plugins-bs1]: timed out waiting for the condition
        Warning  FailedMount       62s (x10 over 5m12s)  kubelet            MountVolume.SetUp failed for volume "backstage-appconfig-bs1" : configmap "backstage-appconfig-bs1" not found
        Warning  FailedMount       51s                   kubelet            Unable to attach or mount volumes: unmounted volumes=[backstage-appconfig-bs1], unattached volumes=[dynamic-plugins-root dynamic-plugins-npmrc backstage-dynamic-plugins-bs1 backstage-appconfig-bs1]: timed out waiting for the condition

      Note that the same issue happens when upgrading from the operator channels on OpenShift (https://github.com/janus-idp/operator/blob/main/.rhdh/docs/installing-ci-builds.adoc).
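      Until an operator version with the forcible-replace behavior described above is in place, a possible manual workaround (untested here, offered only as a sketch) is to delete the database Service so that the next reconciliation can recreate it:

      $ kubectl delete service backstage-psql-bs1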

      Expected behavior

      CR reconciliation should be successful, and the application should be upgraded.


      Upstream URL: https://github.com/janus-idp/operator/issues/382

              rh-ee-asoro Armel Soro
              RHIDP - Install