OpenStack MachinePool segfault with autoscaling minReplicas=0

    • Type: Bug
    • Resolution: Done

      When using an autoscaling MachinePool with OpenStack, setting minReplicas=0 results in a nil pointer panic.

      See HIVE-2415 for context.

            [HIVE-2476] OpenStack MachinePool segfault with autoscaling minReplicas=0

            Eric Fried added a comment -

            mihuang@redhat.com Backports are all done, I think we're ready to close this.


            Eric Fried added a comment - edited

            Gah, don't close mihuang@redhat.com – I see we're still working through the original issue for backports.

            Thanks!


            Eric Fried added a comment - edited

            Since we've fixed the segfault, I'm splitting the autoscaling problem out into a separate card: HIVE-2590

            We can close this one mihuang@redhat.com, thanks!


            Eric Fried added a comment -

            mihuang@redhat.com bump

            Looks like we're seeing this in the field.

            I've rebuilt the sandbox image at quay.io/2uasimojo/hive:osp-npe-2.5 – can you please retest and try to reproduce the autoscaler error? If it manifests again, please capture the hive-controllers logs for me. Thanks!


            Eric Fried added a comment -

            mihuang@redhat.com I'm going to need to see the hive-controllers logs, which aren't captured in the hub must-gather.

            (And I'll continue trying to set up my own OpenStack env as I'm able to get around to it.)

            Thanks!


            Eric Fried added a comment -

            mihuang@redhat.com Sorry to do this to you, but

            • I'm still blocked deploying on OpenStack
            • The fix on MCE branches is different (can't use the upstream fix, have to do it locally, can't even use the unit tests)

            Would you mind rerunning the tests using quay.io/2uasimojo/hive:osp-npe-2.5 (built from https://github.com/openshift/hive/pull/2274)?

            Thanks in advance!


            Eric Fried added a comment -

            mihuang@redhat.com Is it possible that the hive-controllers pods hadn't rolled over to the new image yet when you encountered the failure? If you have the logs from that attempt, L3 should indicate which commit the image was running.

            Anyway, I'm glad it's behaving now!

            I'll go ahead and merge the PR, and then we can close this card. I'll continue to pursue using the OpenStack env separately. Thanks!


            Eric Fried added a comment -

            "I'm going to continue trying to set up an OpenStack env"

            Got stuck on creds


            Eric Fried added a comment - edited

            Commit 8d2b041 predates the fix. The crash happens because pool.Replicas is nil when it is dereferenced. In the fixed version I revendored the upstream installer code, to which I had added a chunk that ensures pool.Replicas is never nil.

            "I clone your repo code to my local, then built the image quay.io/mihuang/hive:8d2b041"

            Can you please explain the steps you took to do this? Your image is tagged with that old commit hash, and L3 in your log dump confirms that you built at that commit level.


            Would you please clarify the error you saw when running with quay.io/2uasimojo/hive:openstack-npe? Was it the same nil pointer exception in the logs?


            I'm going to continue trying to set up an OpenStack env so I can test this myself; but in the meantime, would you please try again with quay.io/2uasimojo/hive:openstack-npe?
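The failure mode described in the comment above (a nil pool.Replicas being dereferenced) and the general shape of the guard can be sketched in Go as follows. This is a minimal illustration only, not the actual Hive or installer code: the `MachinePool` type, `ensureReplicas`, and `totalReplicas` are hypothetical names, and defaulting to zero is an assumption made for the example; the upstream change may default differently.

```go
package main

import "fmt"

// MachinePool is a hypothetical, stripped-down stand-in for the installer's
// machine pool type; only the field relevant to this bug is modeled.
type MachinePool struct {
	Name     string
	Replicas *int64 // nil when the pool leaves replicas unset
}

// ensureReplicas is a hypothetical helper that guarantees Replicas is
// non-nil so later code can dereference it safely. Defaulting to zero is
// an assumption made for this sketch.
func ensureReplicas(pool *MachinePool) {
	if pool.Replicas == nil {
		zero := int64(0)
		pool.Replicas = &zero
	}
}

// totalReplicas dereferences Replicas. Without a prior call to
// ensureReplicas, a nil Replicas would cause a nil pointer panic here.
func totalReplicas(pool *MachinePool) int64 {
	return *pool.Replicas
}

func main() {
	pool := &MachinePool{Name: "worker"} // Replicas deliberately left nil
	ensureReplicas(pool)
	fmt.Println(totalReplicas(pool))
}
```

Applying the default once, before any consumer reads the field, avoids having to add a nil check at every place the pointer is dereferenced.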


            Eric Fried added a comment -

            Hi mihuang@redhat.com. This should be resolved now. Thanks!


              Eric Fried (efried.openshift)
              Mingxia Huang