Details
Type: Bug
Resolution: Duplicate
Priority: Major
Affects Version: 4.12.0
Description
Cluster version: 4.12.0
After running a longevity test on a ZTP SNO cluster under moderate load for 30 days, we experienced random restarts of many containers on the cluster, as well as a temporarily unresponsive kube-apiserver.
See more details in https://issues.redhat.com/browse/OCPBUGS-10510
At some point the node was restarted in an attempt to recover the cluster.
After the reboot, the console ClusterOperator (CO) did not come up.
Running oc describe co console showed:
Name:         console
Namespace:
Labels:       <none>
Annotations:  capability.openshift.io/name: Console
              include.release.openshift.io/ibm-cloud-managed: true
              include.release.openshift.io/self-managed-high-availability: true
              include.release.openshift.io/single-node-developer: true
API Version:  config.openshift.io/v1
Kind:         ClusterOperator
Metadata:
  Creation Timestamp:  2023-02-14T23:44:31Z
  Generation:          1
  Managed Fields:
    API Version:  config.openshift.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:capability.openshift.io/name:
          f:include.release.openshift.io/ibm-cloud-managed:
          f:include.release.openshift.io/self-managed-high-availability:
          f:include.release.openshift.io/single-node-developer:
        f:ownerReferences:
          .:
          k:{"uid":"0297348c-5756-4997-bfa9-ea68024b6351"}:
      f:spec:
    Manager:      cluster-version-operator
    Operation:    Update
    Time:         2023-02-14T23:44:31Z
    API Version:  config.openshift.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        .:
        f:extension:
    Manager:      cluster-version-operator
    Operation:    Update
    Subresource:  status
    Time:         2023-02-14T23:44:31Z
    API Version:  config.openshift.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        f:conditions:
        f:relatedObjects:
        f:versions:
    Manager:      console
    Operation:    Update
    Subresource:  status
    Time:         2023-03-17T21:51:26Z
  Owner References:
    API Version:  config.openshift.io/v1
    Kind:         ClusterVersion
    Name:         version
    UID:          0297348c-5756-4997-bfa9-ea68024b6351
  Resource Version:  12940156
  UID:               91f86953-433d-4d98-a0c5-17fc7fe40522
Spec:
Status:
  Conditions:
    Last Transition Time:  2023-03-17T20:33:40Z
    Message:               ConsoleNotificationSyncDegraded: Delete "https://172.30.0.1:443/apis/console.openshift.io/v1/consolenotifications/cluster-upgrade": net/http: TLS handshake timeout
                           RouteHealthDegraded: console route is not admitted
    Reason:                ConsoleNotificationSync_FailedDelete::RouteHealth_RouteNotAdmitted
    Status:                True
    Type:                  Degraded
    Last Transition Time:  2023-03-17T21:26:05Z
    Message:               All is well
    Reason:                AsExpected
    Status:                False
    Type:                  Progressing
    Last Transition Time:  2023-03-17T21:51:26Z
    Message:               RouteHealthAvailable: console route is not admitted
    Reason:                RouteHealth_RouteNotAdmitted
    Status:                False
    Type:                  Available
    Last Transition Time:  2023-02-15T00:15:58Z
    Message:               All is well
    Reason:                AsExpected
    Status:                True
    Type:                  Upgradeable
  Extension:  <nil>
  Related Objects:
    Group:      operator.openshift.io
    Name:       cluster
    Resource:   consoles
    Group:      config.openshift.io
    Name:       cluster
    Resource:   consoles
    Group:      config.openshift.io
    Name:       cluster
    Resource:   infrastructures
    Group:      config.openshift.io
    Name:       cluster
    Resource:   proxies
    Group:      config.openshift.io
    Name:       cluster
    Resource:   oauths
    Group:      oauth.openshift.io
    Name:       console
    Resource:   oauthclients
    Group:
    Name:       openshift-console-operator
    Resource:   namespaces
    Group:
    Name:       openshift-console
    Resource:   namespaces
    Group:
    Name:       console-public
    Namespace:  openshift-config-managed
    Resource:   configmaps
  Versions:
    Name:     operator
    Version:  4.12.0
Events:  <none>
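For anyone triaging similar symptoms: both the Degraded and Available conditions above point at route admission, which can be checked directly. A rough sketch using standard oc commands (only resource names already present in this report are assumed):

oc get co console
oc get route console -n openshift-console -o jsonpath='{.status.ingress[*].conditions[?(@.type=="Admitted")].status}'
oc get pods -n openshift-ingress

If the Admitted condition is missing or False, the router pods in openshift-ingress are the next place to look.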
The console pod logs showed the following:
oc logs -n openshift-console console-67f8b7674f-hxh8r
W0317 21:41:21.763514 1 main.go:227] Flag inactivity-timeout is set to less then 300 seconds and will be ignored!
I0317 21:41:21.763558 1 main.go:346] cookies are secure!
E0317 21:41:22.317033 1 auth.go:232] error contacting auth provider (retrying in 10s): Get "https://kubernetes.default.svc/.well-known/oauth-authorization-server": dial tcp: lookup kubernetes.default.svc on 172.30.0.10:53: read udp 10.128.0.193:59214->172.30.0.10:53: read: connection refused
E0317 21:41:37.319330 1 auth.go:232] error contacting auth provider (retrying in 10s): Get "https://kubernetes.default.svc/.well-known/oauth-authorization-server": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
E0317 21:41:48.013176 1 auth.go:232] error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://oauth-openshift.apps.qe2.kni.lab.eng.bos.redhat.com/oauth/token failed: Head "https://oauth-openshift.apps.qe2.kni.lab.eng.bos.redhat.com": dial tcp 10.19.134.5:443: connect: connection refused
E0317 21:41:58.158333 1 auth.go:232] error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://oauth-openshift.apps.qe2.kni.lab.eng.bos.redhat.com/oauth/token failed: Head "https://oauth-openshift.apps.qe2.kni.lab.eng.bos.redhat.com": dial tcp 10.19.134.5:443: connect: connection refused
E0317 21:42:10.079348 1 auth.go:232] error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://oauth-openshift.apps.qe2.kni.lab.eng.bos.redhat.com/oauth/token failed: Head "https://oauth-openshift.apps.qe2.kni.lab.eng.bos.redhat.com": dial tcp 10.19.134.5:443: connect: connection refused
E0317 21:42:21.232435 1 auth.go:232] error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://oauth-openshift.apps.qe2.kni.lab.eng.bos.redhat.com/oauth/token failed: Head "https://oauth-openshift.apps.qe2.kni.lab.eng.bos.redhat.com": dial tcp 10.19.134.5:443: connect: connection refused
E0317 21:42:32.731721 1 auth.go:232] error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://oauth-openshift.apps.qe2.kni.lab.eng.bos.redhat.com/oauth/token failed: Head "https://oauth-openshift.apps.qe2.kni.lab.eng.bos.redhat.com": dial tcp 10.19.134.5:443: connect: connection refused
I0317 21:42:43.018052 1 main.go:796] Binding to [::]:8443...
I0317 21:42:43.018152 1 main.go:798] using TLS
2023/03/17 21:49:28 Failed to dial backend: 'dial tcp 172.30.0.1:443: connect: connection refused'
2023/03/17 21:49:28 http: proxy error: dial tcp 172.30.0.1:443: connect: connection refused
2023/03/17 21:49:28 Failed to dial backend: 'dial tcp 172.30.0.1:443: connect: connection refused'
2023/03/17 21:49:31 Failed to dial backend: 'dial tcp 172.30.0.1:443: connect: connection refused'
2023/03/17 21:49:31 Failed to dial backend: 'dial tcp 172.30.0.1:443: connect: connection refused'
2023/03/17 21:49:33 Failed to dial backend: 'dial tcp 172.30.0.1:443: connect: connection refused'
2023/03/17 21:49:33 Failed to dial backend: 'dial tcp 172.30.0.1:443: connect: connection refused'
2023/03/17 21:49:37 Failed to dial backend: 'dial tcp 172.30.0.1:443: connect: connection refused'
2023/03/17 21:49:38 Failed to dial backend: 'dial tcp 172.30.0.1:443: connect: connection refused'
2023/03/17 21:49:49 http: TLS handshake error from 10.128.0.2:36436: EOF
2023/03/17 21:49:49 http: TLS handshake error from 10.128.0.2:36442: read tcp 10.128.0.193:8443->10.128.0.2:36442: read: connection reset by peer
2023/03/17 21:50:11 http: proxy error: context canceled
2023/03/17 21:50:11 http: proxy error: context canceled
2023/03/17 21:50:11 http: proxy error: context canceled
2023/03/17 21:50:11 http: proxy error: context canceled
2023/03/17 21:50:32 http: proxy error: context canceled
2023/03/17 21:50:42 http: proxy error: context canceled
2023/03/17 21:50:42 http: proxy error: context canceled
2023/03/17 21:50:42 http: proxy error: context canceled
2023/03/17 21:51:19 http: TLS handshake error from 10.128.0.2:46060: EOF
2023/03/17 21:51:19 http: TLS handshake error from 10.128.0.2:46074: EOF
2023/03/17 21:51:59 http: TLS handshake error from 10.128.0.2:35182: EOF
2023/03/17 21:52:00 http: TLS handshake error from 10.128.0.2:35194: EOF
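The pattern above (DNS lookups against 172.30.0.10:53 refused, dials to the in-cluster API service 172.30.0.1:443 refused, then TLS handshake and proxy errors) suggests the node-local DNS and kube-apiserver endpoints were unhealthy after the reboot, not the console itself. A rough sanity check for that, assuming only the default OpenShift namespaces:

oc get pods -n openshift-dns -o wide
oc get pods -n openshift-kube-apiserver
oc get co dns ingress authentication console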