OpenShift Bugs / OCPBUGS-13563

[vmware csi driver] vsphere-syncer does not retry populating the CSINodeTopology with topology information when registration fails


Details

    • Bug
    • Resolution: Done
    • Critical
    • None
    • 4.13
    • Storage / Operators
    • None
    • Critical
    • No
    • Rejected
    • False

    Description

      This is a clone of issue OCPBUGS-13314. The following is the description of the original issue:

      Description of problem:

      [vmware csi driver] vsphere-syncer does not retry populating the CSINodeTopology with topology information when registration fails

      When the syncer starts, it watches for node events, but it does not retry if node registration fails. In the meantime, CSINodeTopology requests for that node might not be served, because the VM is not found.
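
      The missing behavior described above can be sketched as a bounded retry with exponential backoff around the registration call. This is only an illustration of the technique, not the driver's actual code or the fix that was applied: registerWithRetry, errVMNotFound, and the register callback are hypothetical names introduced here.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errVMNotFound is a hypothetical error standing in for the
// "VM is not found" failure mentioned in the description.
var errVMNotFound = errors.New("vm not found")

// registerWithRetry calls register until it succeeds or maxAttempts
// is reached, doubling the delay between attempts. It returns the
// number of attempts made and the last error, if any.
func registerWithRetry(register func() error, maxAttempts int, initialDelay time.Duration) (int, error) {
	if maxAttempts < 1 {
		maxAttempts = 1 // always try at least once
	}
	delay := initialDelay
	var lastErr error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if lastErr = register(); lastErr == nil {
			return attempt, nil
		}
		if attempt < maxAttempts {
			time.Sleep(delay)
			delay *= 2
		}
	}
	return maxAttempts, fmt.Errorf("registration failed after %d attempts: %w", maxAttempts, lastErr)
}

func main() {
	calls := 0
	// Hypothetical registration that fails twice before the VM becomes visible.
	register := func() error {
		calls++
		if calls < 3 {
			return errVMNotFound
		}
		return nil
	}
	attempts, err := registerWithRetry(register, 5, 10*time.Millisecond)
	fmt.Println(attempts, err)
}
```

      With a retry like this, a node whose VM is not yet discoverable at startup would be registered on a later attempt instead of being left without topology labels until another node event arrives.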

      Version-Release number of selected component (if applicable):

      4.13.0-0.nightly-2023-05-04-090524

      How reproducible:

      Randomly

      Steps to Reproduce:

      1. Install an OCP cluster using UPI with encryption enabled.
      2. Check that the cluster storage operator is not degraded.
      

      Actual results:

      The cluster storage operator is degraded and reports: VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to deploy node pods
      
      ...
      2023-05-09T06:06:22.146861934Z I0509 06:06:22.146850       1 main.go:183] ServeMux listening at "0.0.0.0:10300"
      2023-05-09T06:07:00.283007138Z E0509 06:07:00.282912       1 main.go:64] failed to establish connection to CSI driver: context canceled
      2023-05-09T06:07:07.283109412Z W0509 06:07:07.283061       1 connection.go:173] Still connecting to unix:///csi/csi.sock
      ...
      
      # The CSI driver logs many errors of the form: timed out while waiting for topology labels to be updated in "compute-2" CSINodeTopology instance.
      
      ...
      2023-05-09T06:19:16.499856730Z {"level":"error","time":"2023-05-09T06:19:16.499687071Z","caller":"k8sorchestrator/topology.go:837","msg":"timed out while waiting for topology labels to be updated in \"compute-2\" CSINodeTopology instance.","TraceId":"b8d9305e-9681-4eba-a8ac-330383227a23","stacktrace":"sigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service/common/commonco/k8sorchestrator.(*nodeVolumeTopology).GetNodeTopologyLabels\n\t/go/src/github.com/kubernetes-sigs/vsphere-csi-driver/pkg/csi/service/common/commonco/k8sorchestrator/topology.go:837\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service.(*vsphereCSIDriver).NodeGetInfo\n\t/go/src/github.com/kubernetes-sigs/vsphere-csi-driver/pkg/csi/service/node.go:429\ngithub.com/container-storage-interface/spec/lib/go/csi._Node_NodeGetInfo_Handler\n\t/go/src/github.com/kubernetes-sigs/vsphere-csi-driver/vendor/github.com/container-storage-interface/spec/lib/go/csi/csi.pb.go:6231\ngoogle.golang.org/grpc.(*Server).processUnaryRPC\n\t/go/src/github.com/kubernetes-sigs/vsphere-csi-driver/vendor/google.golang.org/grpc/server.go:1283\ngoogle.golang.org/grpc.(*Server).handleStream\n\t/go/src/github.com/kubernetes-sigs/vsphere-csi-driver/vendor/google.golang.org/grpc/server.go:1620\ngoogle.golang.org/grpc.(*Server).serveStreams.func1.2\n\t/go/src/github.com/kubernetes-sigs/vsphere-csi-driver/vendor/google.golang.org/grpc/server.go:922"}
      ...

      Expected results:

      The vSphere OCP cluster installs successfully and the cluster storage operator is healthy.

      Additional info:

       


            People

              hekumar@redhat.com Hemant Kumar
              openshift-crt-jira-prow OpenShift Prow Bot
              Penghao Wang Penghao Wang