OCPBUGS-20120

[4.12] api-int i/o timeout during ARO cluster installation


      Description of problem:

      The ARO SRE team is facing an issue with Azure Red Hat OpenShift 4.12.25 installation, for which the installer wrapper ARO-Installer (https://github.com/openshift/ARO-Installer/tree/release-4.12) is used.
      
      The cluster eventually reaches an up-and-running state and works fine. However, the ARO installer then performs additional configuration (for example, installing the ARO Operator in the cluster).
      
      While these ARO installer steps are running, the installation fails with the following error:
      
      ~~~~
      Post "/namespaces/openshift-config/secrets": dial tcp api-int-xxxx-xxx:6443: i/o timeout
      ~~~~
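
      An `i/o timeout` on dial means the TCP connection to port 6443 never completed, which is different from a refused connection or an HTTP-level error. A minimal sketch for telling these cases apart from the client side, with retries and exponential backoff, could look like the following. The function names, the default timeouts, and the hostname in the usage note are hypothetical and not taken from ARO-RP or ARO-Installer.

```python
import socket
import time


def probe_tcp(host, port, timeout=5.0):
    """Attempt one TCP connection and classify the outcome.

    Returns (outcome, elapsed_seconds), where outcome is one of
    "connected", "timeout", "refused", or "error: <details>".
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            outcome = "connected"
    except socket.timeout:
        # No answer within the timeout: the "i/o timeout" case.
        outcome = "timeout"
    except ConnectionRefusedError:
        # The host answered, but nothing is listening on the port.
        outcome = "refused"
    except OSError as exc:
        # DNS failures, unreachable networks, and similar errors.
        outcome = f"error: {exc}"
    return outcome, time.monotonic() - start


def probe_with_backoff(host, port, attempts=5, timeout=5.0,
                       base_delay=1.0, sleep=time.sleep):
    """Probe repeatedly, doubling the delay after each failed attempt.

    Stops at the first successful connection. Returns a list of
    (attempt_number, outcome, elapsed_seconds) tuples.
    """
    results = []
    for attempt in range(attempts):
        outcome, elapsed = probe_tcp(host, port, timeout)
        results.append((attempt + 1, outcome, round(elapsed, 3)))
        if outcome == "connected":
            break
        if attempt < attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return results
```

      For example, `probe_with_backoff("api-int.example.invalid", 6443)` (placeholder hostname) returns one tuple per attempt, so a run of `timeout` outcomes followed by `connected` would suggest that the endpoint was only temporarily unreachable.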
      
      Tracker JIRA for this issue in ARO project: https://issues.redhat.com/browse/ARO-4306
      
      
      
      
      
      

      Version-Release number of selected component (if applicable):

      4.12.25

      How reproducible:

      Intermittent; the failure does not occur on every installation

      Steps to Reproduce:

      1.
      2.
      3.
      

      Actual results:

      Post "/namespaces/openshift-config/secrets": dial tcp api-int-xxxx-xxx:6443: i/o timeout

      Expected results:

      api-int should accept the connection and respond without timing out

      Additional info:

      - The service that is trying to access api-int is not an in-cluster pod.
      
      - It is our service, the ARO Resource Provider (https://github.com/Azure/ARO-RP), i.e. the ARO cluster provider managed by ARO SREs, which talks to api-int via a `Private Link Service` configured in the cluster resource group in Azure.
      
      - This service creates and provisions resources in the Azure resource group and installs the ARO OpenShift cluster in Azure. The cluster gets installed properly in the backend, and we can work around the failure to obtain a kubeconfig and log in. However, during some Day-2 tasks this service fails to reach api-int because of the timeout described above.
      
      - In the kube-apiserver logs we saw some API readyz check failures around the bootstrap VM removal stage of the cluster installation.
      
      - These check failures also appear in some successful installations; in the failed ones, however, they persist for a few more minutes.
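
      To compare how long the readyz failures last in failed versus successful installations, the readyz results could be reduced to a single number per installation. The following is a rough sketch under the assumption that the log events have already been parsed into time-sorted `(timestamp, ok)` pairs; that sample format and the function name are hypothetical, since the actual log lines are not included in this report.

```python
from datetime import datetime, timedelta


def longest_failure_window(samples):
    """Return the longest stretch of consecutive readyz failures.

    samples: iterable of (datetime, bool) pairs sorted by time, where the
    bool is True when the readyz check passed. A failure window runs from
    the first failed sample to the next passing sample, or to the last
    sample seen if the check never recovers.
    """
    longest = timedelta(0)
    run_start = None
    last_seen = None
    for ts, ok in samples:
        if not ok and run_start is None:
            run_start = ts  # a new failure window begins
        elif ok and run_start is not None:
            longest = max(longest, ts - run_start)  # window closed
            run_start = None
        last_seen = ts
    if run_start is not None and last_seen is not None:
        # The samples ended while the check was still failing.
        longest = max(longest, last_seen - run_start)
    return longest
```

      Running this over the samples from a failed and a successful installation would give two durations that can be compared directly, which may help confirm whether the extra minutes of readyz failures line up with the api-int timeouts.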

       

            Unassigned
            rhn-support-sople Shivkumar Ople
            Ke Wang