  1. OpenShift Virtualization
  2. CNV-65121

continue: arm ci: coordinate with arm people to work on new ci cluster


    • Quality / Stability / Reliability
    • 2
    • UpstreamCI Platform Sprint 274
    • None

      Task: ARM Cluster Integration with Prow Federated Setup

      Contact points:

      • Dean Arnold (Slack) is the key contact from the ARM team.
      • Aankhi Talukdar (Slack) is the engineer providing the configured cluster.

      History

      Current Status and Key Issues:

      Initially, there were connectivity issues to the ARM cluster (`eu-west-1.rancher-dev.arm.com`) from the Red Hat control plane. This was resolved by the ARM team on July 16th.

      After connectivity was restored, the Red Hat team (Daniel Hiller) attempted to integrate the cluster and encountered RBAC authorization errors. The cause was an incorrect assumption that a dedicated service account and kubeconfig needed to be created. The ARM team (Aankhi Talukdar) clarified that the provided kubeconfig (generated for a Rancher user with cluster owner access) should be used directly, without additional setup.

      Upon using the correct kubeconfig, Prow jobs were successfully initiated on the ARM cluster, but they failed early with errors related to `nftables`.

      Troubleshooting of the `nftables` issue has been ongoing:

      • Initial checks: The ARM team confirmed that `nft` was not installed on the node where the failing job ran, and subsequently installed and enabled it on all nodes.
      • Further investigation: It was determined that `nftables` might not be present or correctly configured within the container image used by the Prow jobs.
      • Recent findings: The ARM team identified that `netavark` (a container networking tool) might be expecting certain `nftables` tables to be present, and their absence was causing the "No such file or directory" error. They added the `inet netavark` table to the nodes.
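
      The table check described above could be scripted roughly as follows. This is a minimal sketch, not something taken from the ticket: it assumes the output of `nft list tables` (one `table <family> <name>` entry per line) has been captured as text, and the helper name `has_nft_table` is hypothetical.

```python
def has_nft_table(list_tables_output: str, family: str, name: str) -> bool:
    """Return True if captured `nft list tables` output lists the given table."""
    wanted = ["table", family, name]
    for line in list_tables_output.splitlines():
        # Each entry is expected to look like: "table inet netavark"
        if line.split() == wanted:
            return True
    return False


# Illustrative captured output (made up for this sketch, not from a real node)
sample_output = "table ip filter\ntable inet netavark\n"
netavark_present = has_nft_table(sample_output, "inet", "netavark")
```

      Running the same check both on the node and inside the job container would help show where the table is actually missing.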

      Next Steps:

      The current understanding is that the `nftables` issue originates inside the container used by the Prow jobs. The proposed solution is to modify the ProwJob YAML so that the necessary `nftables` table is created and configured inside the main bootstrap container before the test script executes.
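
      One way the proposed change could look is sketched below. It is not the actual change: it assumes the job's container spec is available as a plain dict with `command` and `args` lists, that the `nft` binary exists in the image, and that the container has sufficient privileges to modify `nftables`. The helper name `prepend_nft_setup`, the image name, and the example command are hypothetical.

```python
import copy
import shlex

# Table named in the ticket; `nft add table` does not fail if it already exists.
NFT_SETUP = "nft add table inet netavark"


def prepend_nft_setup(container: dict) -> dict:
    """Return a copy of a container spec that creates the table, then runs the original command."""
    original = list(container.get("command", [])) + list(container.get("args", []))
    if not original:
        raise ValueError("container has no command or args to wrap")
    patched = copy.deepcopy(container)
    # Run the setup first; `exec` then replaces the shell with the original entrypoint.
    patched["command"] = ["/bin/sh", "-c"]
    patched["args"] = [f"{NFT_SETUP} && exec {shlex.join(original)}"]
    return patched


# Hypothetical container spec, for illustration only
example = {
    "image": "example.invalid/bootstrap:latest",
    "command": ["/usr/local/bin/runner.sh"],
    "args": ["/bin/sh", "-c", "make test"],
}
patched_example = prepend_nft_setup(example)
```

      Wrapping the original command with `exec` keeps the job's exit code unchanged, so a failing test is still reported as a failure.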

      Daniel Hiller has created a pull request ( https://github.com/kubevirt/project-infra/pull/4281 ) to enable rehearsing the new ARM cluster, allowing for easier testing of changes.

      The ARM team has requested access to trigger CI jobs for troubleshooting, but for security reasons this is not possible for anyone who is not a member of the GitHub org.

              dhiller72 Daniel Hiller