Bug
Resolution: Unresolved
Minor
4.16
Quality / Stability / Reliability
Moderate
Description of problem:
[AWS] CAPI machine stuck in Pending and CAPA log shows a panic when publicIP: true is set
Version-Release number of selected component (if applicable):
4.16.0-0.nightly-2024-05-07-025557
How reproducible:
Always
Steps to Reproduce:
1. Install an AWS 4.16 Tech Preview cluster. We use the automated template ipi-on-aws/versioned-installer-ovn-ci with feature_set: "TechPreviewNoUpgrade".

    liuhuali@Lius-MacBook-Pro huali-test % oc get clusterversion
    NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
    version   4.16.0-0.nightly-2024-05-07-025557   True        False         20m     Cluster version is 4.16.0-0.nightly-2024-05-07-025557

2. Create a CAPI machine with publicIP: true. The machine stays in Pending:

    liuhuali@Lius-MacBook-Pro huali-test % oc get machine.cluster.x-k8s.io
    NAME                          CLUSTER               NODENAME   PROVIDERID   PHASE     AGE   VERSION
    capi-machineset-51071-cwdb5   huliu-aws520b-bp6bq                           Pending   10m
    liuhuali@Lius-MacBook-Pro huali-test % oc get awsmachinetemplate
    NAME                  AGE
    aws-machinetemplate   11m
    liuhuali@Lius-MacBook-Pro huali-test % oc get awsmachinetemplate aws-machinetemplate -oyaml
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
    kind: AWSMachineTemplate
    metadata:
      annotations:
        kubectl.kubernetes.io/last-applied-configuration: |
          {"apiVersion":"infrastructure.cluster.x-k8s.io/v1beta2","kind":"AWSMachineTemplate","metadata":{"annotations":{},"name":"aws-machinetemplate","namespace":"openshift-cluster-api"},"spec":{"template":{"spec":{"additionalSecurityGroups":[{"filters":[{"name":"tag:Name","values":["huliu-aws520b-bp6bq-worker-sg"]}]}],"ami":{"id":"ami-0ae9b509738034a2c"},"failureDomain":"us-east-2c","iamInstanceProfile":"huliu-aws520b-bp6bq-worker-profile","ignition":{"storageType":"UnencryptedUserData","version":"3.2"},"instanceType":"m6i.xlarge","publicIP":true,"subnet":{"filters":[{"name":"tag:Name","values":["huliu-aws520b-bp6bq-private-us-east-2c"]}]},"uncompressedUserData":true}}}}
      creationTimestamp: "2024-05-20T09:53:44Z"
      generation: 1
      name: aws-machinetemplate
      namespace: openshift-cluster-api
      ownerReferences:
      - apiVersion: cluster.x-k8s.io/v1beta1
        kind: Cluster
        name: huliu-aws520b-bp6bq
        uid: 7a718e08-b245-4cfb-85f8-5db0ef99e2fe
      resourceVersion: "246389"
      uid: c0d90468-392b-449d-b9fb-0c58eac73d7a
    spec:
      template:
        spec:
          additionalSecurityGroups:
          - filters:
            - name: tag:Name
              values:
              - huliu-aws520b-bp6bq-worker-sg
          ami:
            id: ami-0ae9b509738034a2c
          iamInstanceProfile: huliu-aws520b-bp6bq-worker-profile
          ignition:
            storageType: UnencryptedUserData
            version: "3.2"
          instanceType: m6i.xlarge
          publicIP: true
          subnet:
            filters:
            - name: tag:Name
              values:
              - huliu-aws520b-bp6bq-private-us-east-2c
          uncompressedUserData: true

The capa-controller-manager log shows a panic in findSubnet while creating the EC2 instance:

    liuhuali@Lius-MacBook-Pro huali-test % oc get pod
    NAME                                       READY   STATUS    RESTARTS        AGE
    capa-controller-manager-6d695d96c9-tnppx   1/1     Running   1 (7h21m ago)   7h26m
    capi-controller-manager-68456c4d64-c7dsq   1/1     Running   1 (7h21m ago)   7h26m
    cluster-capi-operator-5d6b7d67b4-2lvl6     1/1     Running   1 (7h24m ago)   7h27m
    liuhuali@Lius-MacBook-Pro huali-test % oc logs capa-controller-manager-6d695d96c9-tnppx
    …
    I0520 10:01:47.104698       1 awsmachine_controller.go:680] "Creating EC2 instance"
    E0520 10:01:47.204987       1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
    goroutine 439 [running]:
    k8s.io/apimachinery/pkg/util/runtime.logPanic({0x2f1c1c0?, 0x5a601b0})
        /build/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:75 +0x85
    sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1()
        /build/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:108 +0xb2
    panic({0x2f1c1c0?, 0x5a601b0?})
        /usr/lib/golang/src/runtime/panic.go:914 +0x21f
    sigs.k8s.io/cluster-api-provider-aws/v2/pkg/cloud/services/ec2.(*Service).findSubnet(0xc0031f2930, 0xc0044142a0)
        /build/pkg/cloud/services/ec2/instances.go:353 +0x1e2d
    sigs.k8s.io/cluster-api-provider-aws/v2/pkg/cloud/services/ec2.(*Service).CreateInstance(0xc0031f2930, 0xc0044142a0, {0xc002a1dc00, 0x6cf, 0x6cf}, {0xc003f4b2e0, 0x8})
        /build/pkg/cloud/services/ec2/instances.go:179 +0xa25
    sigs.k8s.io/cluster-api-provider-aws/v2/controllers.(*AWSMachineReconciler).createInstance(0xc0044142a0?, {0x3cc1d08, 0xc0031f2930}, 0xc0044142a0, {0x3cc4c28, 0xc002e02d80}, {0x3ca3aa0, 0xc0004eec60})
        /build/controllers/awsmachine_controller.go:687 +0xaf
    sigs.k8s.io/cluster-api-provider-aws/v2/controllers.(*AWSMachineReconciler).reconcileNormal(0xc001376cf0, {0x3cd1148?, 0xc002e02d80?}, 0xc0044142a0, {0x3cc4c28, 0xc002e02d80}, {0x3cd1148, 0xc002e02d80}, {0x3cd0b48, 0xc002e02d80}, ...)
        /build/controllers/awsmachine_controller.go:518 +0x345
    sigs.k8s.io/cluster-api-provider-aws/v2/controllers.(*AWSMachineReconciler).Reconcile(0xc001376cf0, {0x3ca3778, 0xc003080b10}, {{{0xc003d17620?, 0x0?}, {0xc002ad99e0?, 0xc001122d08?}}})
        /build/controllers/awsmachine_controller.go:236 +0xac9
    sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0x3ca7d60?, {0x3ca3778?, 0xc003080b10?}, {{{0xc003d17620?, 0xb?}, {0xc002ad99e0?, 0x0?}}})
        /build/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:119 +0xb7
    sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc0006841e0, {0x3ca37b0, 0xc001458690}, {0x30cf3e0?, 0xc003d6f3a0?})
        /build/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:316 +0x3cc
    sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc0006841e0, {0x3ca37b0, 0xc001458690})
        /build/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266 +0x1af
    sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
        /build/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227 +0x79
    created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2 in goroutine 262
        /build/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:223 +0x565
    E0520 10:01:47.205036       1 controller.go:329] "Reconciler error" err="panic: runtime error: invalid memory address or nil pointer dereference [recovered]" controller="awsmachine" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="AWSMachine" AWSMachine="openshift-cluster-api/aws-machinetemplate-ptr2z" namespace="openshift-cluster-api" name="aws-machinetemplate-ptr2z" reconcileID="332c196f-96e6-4225-bafb-9f132a285ef1"
    liuhuali@Lius-MacBook-Pro huali-test %
Actual results:
Machine stuck in Pending
Expected results:
The machine reaches the Running phase.
Additional info:
We have this test case for MAPI: https://github.com/openshift/openshift-tests-private/blob/master/test/extended/clusterinfrastructure/machines.go#L636
I tested the same scenario with CAPI today and it failed.
Must-gather: https://drive.google.com/file/d/1sun15h4Sem7I936qJMHc0oY1AjrAluH_/view?usp=sharing