- Bug
- Resolution: Not a Bug
- Undefined
- None
- 4.12, 4.11
- None
- 1
- Sprint 228
- 1
- Rejected
- False
- undefined
| Upgrade_CI | ORIGINAL_BUILD | TARGET_BUILD | Matrix |
| 21457 | 4.11.0-0.nightly-arm64-2022-09-05-130837 | 4.11.0-0.nightly-arm64-2022-09-05-162049 | 06_aarch64_UPI on AWS & Private cluster |
Console logs: https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/ocp-tools/job/collect-logs/5447/console
From the console logs:
09-06 05:23:37.688 ClusterID: 8bbe3aec-fcdf-4e74-a97a-688f33f69607
09-06 05:23:37.688 ClusterVersion: Stable at "4.11.0-0.nightly-arm64-2022-09-05-130837"
09-06 05:23:37.688 ClusterOperators:
09-06 05:23:37.688 clusteroperator/authentication is not available (OAuthServerRouteEndpointAccessibleControllerAvailable: Get "https://oauth-openshift.apps.newugd-21457.qe.devcluster.openshift.com/healthz": EOF
09-06 05:23:37.688 ReadyIngressNodesAvailable: Authentication requires functional ingress which requires at least one schedulable and ready node. Got 2 worker nodes, 3 master nodes, 0 custom target nodes (none are schedulable or ready for ingress pods).) because OAuthServerRouteEndpointAccessibleControllerDegraded: Get "https://oauth-openshift.apps.newugd-21457.qe.devcluster.openshift.com/healthz": EOF
09-06 05:23:37.688 clusteroperator/console is not available (RouteHealthAvailable: failed to GET route (https://console-openshift-console.apps.newugd-21457.qe.devcluster.openshift.com): Get "https://console-openshift-console.apps.newugd-21457.qe.devcluster.openshift.com": EOF) because RouteHealthDegraded: failed to GET route (https://console-openshift-console.apps.newugd-21457.qe.devcluster.openshift.com): Get "https://console-openshift-console.apps.newugd-21457.qe.devcluster.openshift.com": EOF
09-06 05:23:37.688 clusteroperator/dns is progressing: DNS "default" reports Progressing=True: "Have 3 available node-resolver pods, want 5."
09-06 05:23:37.688 clusteroperator/image-registry is not available (Available: The deployment does not have available replicas
09-06 05:23:37.688 NodeCADaemonAvailable: The daemon set node-ca has available replicas
09-06 05:23:37.688 ImagePrunerAvailable: Pruner CronJob has been created) because Degraded: The deployment does not have available replicas
09-06 05:23:37.688 clusteroperator/ingress is not available (The "default" ingress controller reports Available=False: IngressControllerUnavailable: One or more status conditions indicate unavailable: DeploymentAvailable=False (DeploymentUnavailable: The deployment has Available status condition set to False (reason: MinimumReplicasUnavailable) with message: Deployment does not have minimum availability.)) because The "default" ingress controller reports Degraded=True: DegradedConditions: One or more other status conditions indicate a degraded state: PodsScheduled=False (PodsNotScheduled: Some pods are not scheduled: Pod "router-default-845b776dfb-2b25s" cannot be scheduled: 0/5 nodes are available: 2 node(s) didn't match pod anti-affinity rules, 2 node(s) had untolerated taint {node.kubernetes.io/unreachable: }, 3 node(s) didn't match Pod's node affinity/selector, 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }. preemption: 0/5 nodes are available: 5 Preemption is not helpful for scheduling. Pod "router-default-845b776dfb-8hhsz" cannot be scheduled: 0/5 nodes are available: 2 node(s) didn't match pod anti-affinity rules, 2 node(s) had untolerated taint {node.kubernetes.io/unreachable: }, 3 node(s) didn't match Pod's node affinity/selector, 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }. preemption: 0/5 nodes are available: 5 Preemption is not helpful for scheduling. Make sure you have sufficient worker nodes.), DeploymentAvailable=False (DeploymentUnavailable: The deployment has Available status condition set to False (reason: MinimumReplicasUnavailable) with message: Deployment does not have minimum availability.), DeploymentReplicasMinAvailable=False (DeploymentMinimumReplicasNotMet: 0/2 of replicas are available, max unavailable is 1), DeploymentReplicasAllAvailable=False (DeploymentReplicasNotAvailable: 0/2 of replicas are available), CanaryChecksSucceeding=False (CanaryChecksRepetitiveFailures: Canary route checks for the default ingress controller are failing)
09-06 05:23:37.688 clusteroperator/kube-controller-manager is degraded because GarbageCollectorDegraded: error querying alerts: Post "https://thanos-querier.openshift-monitoring.svc:9091/api/v1/query": dial tcp 172.30.36.174:9091: i/o timeout
09-06 05:23:37.688 clusteroperator/machine-config is not available (Cluster not available for [{operator 4.11.0-0.nightly-arm64-2022-09-05-130837}]) because Failed to resync 4.11.0-0.nightly-arm64-2022-09-05-130837 because: failed to apply machine config daemon manifests: error during waitForDaemonsetRollout: [timed out waiting for the condition, daemonset machine-config-daemon is not ready. status: (desired: 5, updated: 5, ready: 3, unavailable: 2)]
09-06 05:23:37.688 clusteroperator/monitoring is not available (Rollout of the monitoring stack failed and is degraded. Please investigate the degraded status error.) because Failed to rollout the stack. Error: updating prometheus operator: reconciling Prometheus Operator Admission Webhook Deployment failed: updating Deployment object failed: waiting for DeploymentRollout of openshift-monitoring/prometheus-operator-admission-webhook: got 2 unavailable replicas
09-06 05:23:37.688 clusteroperator/network is degraded because DaemonSet "/openshift-sdn/sdn" rollout is not making progress - last change 2022-09-06T06:33:57Z
09-06 05:23:37.688 DaemonSet "/openshift-multus/multus" rollout is not making progress - last change 2022-09-06T06:33:59Z
09-06 05:23:37.688 DaemonSet "/openshift-multus/multus-additional-cni-plugins" rollout is not making progress - last change 2022-09-06T06:33:59Z
09-06 05:23:37.688 clusteroperator/storage is progressing: AWSEBSCSIDriverOperatorCRProgressing: AWSEBSDriverNodeServiceControllerProgressing: Waiting for DaemonSet to deploy node pods
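The unavailable operators above share a single root cause visible in the ingress message: both worker nodes carry the node.kubernetes.io/unreachable taint (i.e. they are NotReady), so the router-default pods cannot be scheduled, which in turn takes down ingress, authentication, console, image-registry, and monitoring. A minimal sketch of how one might confirm this against the failed cluster, assuming kubeconfig access; <worker-node-name> is a placeholder:

# Workers should show NotReady and carry the unreachable taint
oc get nodes -o wide
oc describe node <worker-node-name> | grep -A2 'Taints:'

# The default router pods should be Pending with the scheduling errors quoted above
oc -n openshift-ingress get pods -o wide

# Operator conditions should match the log snapshot above
oc get clusteroperators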
- Duplicates: OCPBUGS-1681 "Worker nodes become NotReady when put load on the arm cluster." (Closed)