Red Hat OpenStack Services on OpenShift / OSPRH-12354

BZ#2321300 [openstack-16.2] Get panic error when trying to create an OpenStackControlPlane object

    • Low

      +++ This bug was initially created as a clone of Bug #2254605 +++

      Description of problem:
      I'm trying to create an OpenStackControlPlane object with the following YAML file:

      ~~~
      apiVersion: osp-director.openstack.org/v1beta2
      kind: OpenStackControlPlane
      metadata:
        name: overcloud
        namespace: openstack
      spec:
        domainName: overcloud.tlvlab.local
        openStackClientImageURL: 'registry.redhat.io/rhosp-rhel9/openstack-tripleoclient:17.1'
        openStackClientNetworks:
          - ctlplane
          - internal_api
          - external
        openStackClientStorageClass: host-nfs-storageclass
        openStackRelease: '17.1'
        passwordSecret: openstack-root-password
        virtualMachineRoles:
          controller:
            roleName: Controller
            roleCount: 3
            isTripleoRole: true
            ctlplaneInterface: enp2s0
            cores: 6
            memory: 20
            networks:
              - ctlplane
              - internal_api
              - external
              - tenant
              - storage
              - storage_mgmt
            rootDisk:
              name: root
              diskSize: 50
              baseImageVolumeName: openstack-base-img
              storageClass: host-nfs-storageclass
              storageAccessMode: ReadWriteMany
              storageVolumeMode: Filesystem
      ~~~

      I get the following output:

      ~~~
      $ oc create -f openstack-controller.yaml -n openstack
      Error from server (InternalError): error when creating "openstack-controller.yaml": Internal error occurred: failed calling webhook "vopenstackcontrolplane.kb.io": failed to call webhook: Post "https://osp-director-operator-controller-manager-service.openstack.svc:4343/validate-osp-director-openstack-org-v1beta2-openstackcontrolplane?timeout=10s": EOF
      ~~~

      In the log of the osp-director-operator-controller-manager-f66c67dbb-jgmmx pod (which runs the webhook service), I see this panic: `http: panic serving 10.128.0.2:33178: runtime error: invalid memory address or nil pointer dereference`

      Full output here for reference:

      ~~~
      2023-12-14T18:50:18.337Z INFO controlplane-resource adding network labels: map[ooo-subnetname/ctlplane:true]
      2023-12-14T18:50:18.337Z INFO controlplane-resource OpenStackControlPlane overcloud labels set to map[ooo-subnetname/ctlplane:true osnetconfig-ref:openstacknetconfig]
      2023-12-14T18:50:18.338Z DEBUG controller-runtime.webhook.webhooks wrote response {"webhook": "/mutate-osp-director-openstack-org-v1beta2-openstackcontrolplane", "code": 200, "reason": "", "UID": "47484661-da72-4a8e-817a-e9c05cd40a87", "allowed": true}
      2023-12-14T18:50:18.349Z DEBUG controller-runtime.webhook.webhooks received request {"webhook": "/validate-osp-director-openstack-org-v1beta2-openstackcontrolplane", "UID": "2a4c0ee3-fb2d-49d9-9b0a-05142d804f13", "kind": "osp-director.openstack.org/v1beta2, Kind=OpenStackControlPlane", "resource": {"group":"osp-director.openstack.org","version":"v1beta2","resource":"openstackcontrolplanes"}}
      2023-12-14T18:50:18.349Z INFO controlplane-resource validate create {"name": "overcloud"}
      2023/12/14 18:50:18 http: panic serving 10.128.0.2:33178: runtime error: invalid memory address or nil pointer dereference
      goroutine 231021 [running]:
      net/http.(*conn).serve.func1()
          /usr/lib/golang/src/net/http/server.go:1850 +0xbf
      panic({0x1beea60, 0x319e440})
          /usr/lib/golang/src/runtime/panic.go:890 +0x262
      github.com/openstack-k8s-operators/osp-director-operator/api/v1beta1.ValidateNetworks({0xc000717100, 0x9}, {0xc001f59860?, 0x6, 0xc0011ba048?})
          /remote-source/app/api/v1beta1/common_openstacknet.go:237 +0x1f5
      github.com/openstack-k8s-operators/osp-director-operator/api/v1beta2.(*OpenStackControlPlane).ValidateCreate(0xc0006358c0)
          /remote-source/app/api/v1beta2/openstackcontrolplane_webhook.go:248 +0x485
      sigs.k8s.io/controller-runtime/pkg/webhook/admission.(*validatingHandler).Handle(, {, _}, {{{0xc000f292f0, 0x24}, 0xc001da7e60, 0x1a}, {0xc000716af0, 0x7}, {0xc000e91f38, ..., ...}})
          /remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime@v0.13.1/pkg/webhook/admission/validator.go:71 +0x239
      sigs.k8s.io/controller-runtime/pkg/webhook/admission.(*Webhook).Handle(, {, _}, {{{0xc000f292f0, 0x24}, {{0xc001da7e60, 0x1a}, {0xc000716af0, 0x7}, {0xc000e91f38, ...}}, ...}})
          /remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime@v0.13.1/pkg/webhook/admission/webhook.go:169 +0xfd
      sigs.k8s.io/controller-runtime/pkg/webhook/admission.(*Webhook).ServeHTTP(0xc000446680, {0x7fd3b88f9cf8?, 0xc001896b90}, 0xc001fc6900)
          /remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime@v0.13.1/pkg/webhook/admission/http.go:98 +0xed2
      github.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerInFlight.func1({0x7fd3b88f9cf8, 0xc001896b90}, 0x21db300?)
          /remote-source/deps/gomod/pkg/mod/github.com/prometheus/client_golang@v1.12.2/prometheus/promhttp/instrument_server.go:40 +0xd4
      net/http.HandlerFunc.ServeHTTP(0x21db378?, {0x7fd3b88f9cf8?, 0xc001896b90?}, 0xc0008d5a68?)
          /usr/lib/golang/src/net/http/server.go:2109 +0x2f
      github.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerCounter.func1({0x21db378?, 0xc000826700?}, 0xc001fc6900)
          /remote-source/deps/gomod/pkg/mod/github.com/prometheus/client_golang@v1.12.2/prometheus/promhttp/instrument_server.go:117 +0xaa
      net/http.HandlerFunc.ServeHTTP(0xc0008d59e0?, {0x21db378?, 0xc000826700?}, 0xc0017cb000?)
          /usr/lib/golang/src/net/http/server.go:2109 +0x2f
      github.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerDuration.func2({0x21db378, 0xc000826700}, 0xc001fc6900)
          /remote-source/deps/gomod/pkg/mod/github.com/prometheus/client_golang@v1.12.2/prometheus/promhttp/instrument_server.go:84 +0xbf
      net/http.HandlerFunc.ServeHTTP(0xc000826700?, {0x21db378?, 0xc000826700?}, 0x1ec5998?)
          /usr/lib/golang/src/net/http/server.go:2109 +0x2f
      net/http.(*ServeMux).ServeHTTP(0xc0020ea048?, {0x21db378, 0xc000826700}, 0xc001fc6900)
          /usr/lib/golang/src/net/http/server.go:2487 +0x149
      net/http.serverHandler.ServeHTTP({0x21cdb80?}, {0x21db378, 0xc000826700}, 0xc001fc6900)
          /usr/lib/golang/src/net/http/server.go:2947 +0x30c
      net/http.(*conn).serve(0xc00066bf40, {0x21dc420, 0xc000968780})
          /usr/lib/golang/src/net/http/server.go:1991 +0x607
      created by net/http.(*Server).Serve
          /usr/lib/golang/src/net/http/server.go:3102 +0x4db
      ~~~

      — Additional comment from Juan Pablo Marti on 2023-12-14 19:29:43 UTC —

      My OpenStackNetConfig was created using this YAML:

      ~~~
      apiVersion: osp-director.openstack.org/v1beta1
      kind: OpenStackNetConfig
      metadata:
        name: openstacknetconfig
      spec:
        attachConfigurations:
          br-osp:
            nodeNetworkConfigurationPolicy:
              nodeSelector:
                node-role.kubernetes.io/worker: ""
              desiredState:
                interfaces:
                - bridge:
                    options:
                      stp:
                        enabled: false
                    port:
                    - name: enp2s0
                  description: Linux bridge with enp2s0 as a port
                  name: br-osp
                  state: up
                  type: linux-bridge
                  mtu: 1500
          br-vlans:
            nodeNetworkConfigurationPolicy:
              nodeSelector:
                node-role.kubernetes.io/worker: ""
              desiredState:
                interfaces:
                - bridge:
                    options:
                      stp:
                        enabled: false
                    port:
                    - name: enp3s0
                  description: Linux bridge with enp3s0 as a port
                  name: br-vlans
                  state: up
                  type: linux-bridge
                  mtu: 1500
        # optional DnsServers list
        dnsServers:
        - 10.47.242.10
        - 10.38.5.26
        # DomainName of the OSP environment
        domainName: overcloud.tlvlab.local
        networks:
        - name: Control
          nameLower: ctlplane
          subnets:
          - name: ctlplane
            ipv4:
              allocationEnd: 192.168.24.250
              allocationStart: 192.168.24.100
              cidr: 192.168.24.0/24
              gateway: 192.168.24.254
            attachConfiguration: br-osp
        - name: Tenant
          nameLower: tenant
          mtu: 1350
          subnets:
          - name: tenant_subnet
            attachConfiguration: br-vlans
            vlan: 101
            ipv4:
              allocationEnd: 172.17.101.250
              allocationStart: 172.17.101.4
              cidr: 172.17.101.0/24
              gateway: 172.17.101.1
        - name: Storage
          nameLower: storage
          mtu: 1350
          subnets:
          - name: storage_subnet
            attachConfiguration: br-vlans
            vlan: 102
            ipv4:
              allocationEnd: 172.17.102.250
              allocationStart: 172.17.102.4
              cidr: 172.17.102.0/24
              gateway: 172.17.102.1
        - name: InternalApi
          nameLower: internal_api
          mtu: 1350
          subnets:
          - name: internal_api_subnet
            attachConfiguration: br-vlans
            vlan: 103
            ipv4:
              allocationEnd: 172.17.103.250
              allocationStart: 172.17.103.4
              cidr: 172.17.103.0/24
              gateway: 172.17.103.1
        - name: StorageMgmt
          nameLower: storage_mgmt
          mtu: 1350
          subnets:
          - name: storage_mgmt_subnet
            attachConfiguration: br-vlans
            vlan: 104
            ipv4:
              allocationEnd: 172.17.104.250
              allocationStart: 172.17.104.4
              cidr: 172.17.104.0/24
              gateway: 172.17.104.1
        - name: External
          nameLower: external
          mtu: 1350
          subnets:
          - name: external_subnet
            attachConfiguration: br-vlans
            vlan: 105
            ipv4:
              allocationEnd: 172.17.200.250
              allocationStart: 172.17.200.4
              cidr: 172.17.200.0/24
              gateway: 172.17.200.1
        reservations:
          controlplane:
            ipReservations:
              ctlplane: 192.168.24.254
              external: 172.17.200.254
              internal_api: 172.17.103.254
              storage: 172.17.102.254
              storage_mgmt: 172.17.104.254
            macReservations: {}
          openstackclient-0:
            ipReservations:
              ctlplane: 192.168.24.253
              external: 172.17.200.253
              internal_api: 172.17.103.253
            macReservations: {}
      ~~~

      After replacing the subnet names with the default values (without the _subnet suffix), the problem was solved. (Thanks to wladek for helping me find that out!)

      Although the problem was fixed, the panic message gives no indication of what is actually wrong, so it should be addressed.

      Thanks in advance!
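
The name mismatch described in this comment can be illustrated with a small Go sketch. It assumes that each network listed on the control plane must match a subnet name from the OpenStackNetConfig, which is what the workaround suggests but is not confirmed against the operator's code. The helper `missingSubnets` and the sample data layout are hypothetical and not part of osp-director-operator.

```go
package main

import "fmt"

// missingSubnets returns the requested network names that have no subnet
// with exactly the same name. Hypothetical helper for illustration only.
func missingSubnets(requested, subnetNames []string) []string {
	known := make(map[string]bool, len(subnetNames))
	for _, name := range subnetNames {
		known[name] = true
	}
	var missing []string
	for _, name := range requested {
		if !known[name] {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	// Networks listed under the controller role in the OpenStackControlPlane.
	requested := []string{"ctlplane", "internal_api", "external", "tenant", "storage", "storage_mgmt"}

	// Subnet names from the failing OpenStackNetConfig.
	failing := []string{"ctlplane", "tenant_subnet", "storage_subnet",
		"internal_api_subnet", "storage_mgmt_subnet", "external_subnet"}

	// Subnet names after dropping the _subnet suffix.
	fixed := []string{"ctlplane", "tenant", "storage", "internal_api", "storage_mgmt", "external"}

	fmt.Println("unmatched with _subnet names:", missingSubnets(requested, failing))
	fmt.Println("unmatched with default names:", missingSubnets(requested, fixed))
}
```

Under that assumption, only `ctlplane` matches in the failing configuration, which is consistent with the single `ooo-subnetname/ctlplane` label in the webhook log.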

      — Additional comment from Brendan Shephard on 2023-12-15 00:01:24 UTC —

      The panic is probably because it fails to find a network using the subnet name as a label. So when we try to format the error here:
      https://github.com/bshephar/osp-director-operator/blob/master/api/v1beta1/common_openstacknet.go#L237

      osnet is nil at that point causing the panic.

      As to why the _subnet makes a difference, I'm not sure. Maybe someone from the osp-director-operator team will have some more insights for that. My initial suspicion would be that the OpenStackNet object is created using the nameLower as the label, so it can't find any networks labeled with {{ nameLower }}_subnet.
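
The failure mode suspected above can be reproduced in a few lines of Go. This is a minimal sketch, not the operator's code: `osNet`, `lookup`, `validate` and `panics` are hypothetical stand-ins, and the only claim is that using a nil lookup result to build an error message triggers a nil pointer dereference.

```go
package main

import (
	"errors"
	"fmt"
)

// osNet is a simplified, hypothetical stand-in for the OpenStackNet object.
type osNet struct{ kind string }

// Kind reads a field through the receiver, so it panics on a nil *osNet.
func (n *osNet) Kind() string { return n.kind }

var errNotFound = errors.New("not found")

// lookup returns (nil, errNotFound) when no network carries the given label.
func lookup(nets map[string]*osNet, label string) (*osNet, error) {
	if n, ok := nets[label]; ok {
		return n, nil
	}
	return nil, errNotFound
}

// validate mirrors the suspected bug: the error branch formats its message
// from the lookup result, which is nil exactly when the lookup failed.
func validate(nets map[string]*osNet, label string) error {
	n, err := lookup(nets, label)
	if err != nil {
		return fmt.Errorf("%s %q not found", n.Kind(), label) // n is nil here
	}
	return nil
}

// panics reports whether calling f panics.
func panics(f func()) (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	f()
	return false
}

func main() {
	nets := map[string]*osNet{"tenant": {kind: "OpenStackNet"}}
	fmt.Println("known label panics:  ", panics(func() { _ = validate(nets, "tenant") }))
	fmt.Println("unknown label panics:", panics(func() { _ = validate(nets, "tenant_subnet") }))
}
```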

      — Additional comment from RHEL Program Management on 2024-09-24 13:13:11 UTC —

      This bugzilla has been removed from the release since it does not have an acked release flag. For details, see https://mojo.redhat.com/docs/DOC-1144661#jive_content_id_OSP_Release_Planning.

      — Additional comment from Andrew Bays on 2024-10-03 10:43:40 UTC —

      Regardless of whether the network config is correct, I think we can at least fix the panic. We are getting a "not found" error here...

      https://github.com/openstack-k8s-operators/osp-director-operator/blob/562777c57f39[…]fd8a7ad1568874f136d7d0efb482/api/v1beta1/common_openstacknet.go

      ...which we've returned from here...

      https://github.com/openstack-k8s-operators/osp-director-operator/blob/562777c57f39[…]fd8a7ad1568874f136d7d0efb482/api/v1beta1/common_openstacknet.go

      ...so we just need to fix this line (as Brendan noted):

      https://github.com/openstack-k8s-operators/osp-director-operator/blob/562777c57f39[…]fd8a7ad1568874f136d7d0efb482/api/v1beta1/common_openstacknet.go

      We could probably just remove osnet.GetObjectKind().GroupVersionKind().Kind and hardcode OpenStackNet as the Kind, since it is always that anyhow.

      — Additional comment from RHEL Program Management on 2024-10-03 12:46:17 UTC —

      This bugzilla has been removed from the release since it does not have an acked release flag. For details, see https://mojo.redhat.com/docs/DOC-1144661#jive_content_id_OSP_Release_Planning.

      — Additional comment from RHEL Program Management on 2024-10-03 12:55:29 UTC —

      This bugzilla has been removed from the release since it does not have an acked release flag. For details, see https://mojo.redhat.com/docs/DOC-1144661#jive_content_id_OSP_Release_Planning.

      — Additional comment from RHEL Program Management on 2024-10-03 12:55:29 UTC —

      This item has been properly Triaged and planned for the release, and Target Release is now set to match the release flag.

              abays@redhat.com Andrew Bays
              jira-bugzilla-migration RH Bugzilla Integration
              rhos-conplat-core-operators