Uploaded image for project: 'OpenShift Bugs'
  1. OpenShift Bugs
  2. OCPBUGS-47685

Consumer is losing subscription to events after rebooting node

XMLWordPrintable

    • Important
    • None
    • CNF RAN Sprint 264, CNF RAN Sprint 265
    • 2
    • False
    • Hide

      None

      Show
      None

      Description of problem:

       After a node cold reboot the consumer is losing the subscription and not received events.

      Version-Release number of selected component (if applicable):

      ocp version 4.16.28   ptp operator version 4.16.0-202412291208

      How reproducible:

          75%

      Steps to Reproduce:

          1. Verify consumer subscription and event received in cloud-event pod
          2. Cold reboot of the node
          3. Check consumer subscription and events on cloud-event pod again
          

      Actual results:

      The subscription pod lost connection. No events received.

      Expected results:

       No subscription lost after the cold reboot of the node

      Additional info:

      cloud-event logs before the node cold reboot:
      time="2024-12-29T18:33:12Z" level=info msg="Latency for the event: 1 ms"
      time="2024-12-29T18:33:12Z" level=info msg="Event: Context Attributes,\n  specversion: 1.0\n  type: event.sync.sync-status.os-clock-sync-state-change\n  source: /sync/sync-status/os-clock-sync-state\n  id: b80fbfa0-7ed0-489f-9be6-aa674f1d534c\n  time: 2024-12-29T18:33:12.972767814Z\nData,\n  {\"version\":\"1.0\",\"values\":[{\"ResourceAddress\":\"/cluster/node/helix66.lab.eng.rdu2.redhat.com/CLOCK_REALTIME\",\"data_type\":\"notification\",\"value_type\":\"enumeration\",\"value\":\"FREERUN\"},{\"ResourceAddress\":\"/cluster/node/helix66.lab.eng.rdu2.redhat.com/CLOCK_REALTIME\",\"data_type\":\"metric\",\"value_type\":\"decimal64.3\",\"value\":\"54\"}]}\n"
      time="2024-12-29T18:33:12Z" level=info msg="received event {\"specversion\":\"1.0\",\"id\":\"e26e1e98-b581-4819-aec3-97e8da0299bf\",\"source\":\"/sync/sync-status/sync-state\",\"type\":\"event.sync.sync-status.synchronization-state-change\",\"time\":\"2024-12-29T18:33:12.972923191Z\",\"data\":{\"version\":\"1.0\",\"values\":[{\"ResourceAddress\":\"/cluster/node/helix66.lab.eng.rdu2.redhat.com/CLOCK_REALTIME\",\"data_type\":\"notification\",\"value_type\":\"enumeration\",\"value\":\"FREERUN\"}]}}"
      time="2024-12-29T18:33:12Z" level=info msg="Latency for the event: 2 ms"
      time="2024-12-29T18:33:12Z" level=info msg="Event: Context Attributes,\n  specversion: 1.0\n  type: event.sync.sync-status.synchronization-state-change\n  source: /sync/sync-status/sync-state\n  id: e26e1e98-b581-4819-aec3-97e8da0299bf\n  time: 2024-12-29T18:33:12.972923191Z\nData,\n  {\"version\":\"1.0\",\"values\":[{\"ResourceAddress\":\"/cluster/node/helix66.lab.eng.rdu2.redhat.com/CLOCK_REALTIME\",\"data_type\":\"notification\",\"value_type\":\"enumeration\",\"value\":\"FREERUN\"}]}\n"
      time="2024-12-29T18:33:12Z" level=info msg="received event {\"specversion\":\"1.0\",\"id\":\"fffb1d78-a218-4b4f-80ba-f0270f74dacf\",\"source\":\"/sync/ptp-status/lock-state\",\"type\":\"event.sync.ptp-status.ptp-state-change\",\"time\":\"2024-12-29T18:33:12.990556531Z\",\"data\":{\"version\":\"1.0\",\"values\":[{\"ResourceAddress\":\"/cluster/node/helix66.lab.eng.rdu2.redhat.com/eno12399npx/master\",\"data_type\":\"notification\",\"value_type\":\"enumeration\",\"value\":\"LOCKED\"},{\"ResourceAddress\":\"/cluster/node/helix66.lab.eng.rdu2.redhat.com/eno12399npx/master\",\"data_type\":\"metric\",\"value_type\":\"decimal64.3\",\"value\":\"38\"}]}}"
      time="2024-12-29T18:33:12Z" level=info msg="Latency for the event: 1 ms"
      time="2024-12-29T18:33:12Z" level=info msg="Event: Context Attributes,\n  specversion: 1.0\n  type: event.sync.ptp-status.ptp-state-change\n  source: /sync/ptp-status/lock-state\n  id: fffb1d78-a218-4b4f-80ba-f0270f74dacf\n  time: 2024-12-29T18:33:12.990556531Z\nData,\n  {\"version\":\"1.0\",\"values\":[{\"ResourceAddress\":\"/cluster/node/helix66.lab.eng.rdu2.redhat.com/eno12399npx/master\",\"data_type\":\"notification\",\"value_type\":\"enumeration\",\"value\":\"LOCKED\"},{\"ResourceAddress\":\"/cluster/node/helix66.lab.eng.rdu2.redhat.com/eno12399npx/master\",\"data_type\":\"metric\",\"value_type\":\"decimal64.3\",\"value\":\"38\"}]}\n"
      time="2024-12-29T18:33:12Z" level=info msg="received event {\"specversion\":\"1.0\",\"id\":\"f23cf9e7-08d4-4444-90a4-8403137f860a\",\"source\":\"/sync/sync-status/sync-state\",\"type\":\"event.sync.sync-status.synchronization-state-change\",\"time\":\"2024-12-29T18:33:12.990693456Z\",\"data\":{\"version\":\"1.0\",\"values\":[{\"ResourceAddress\":\"/cluster/node/helix66.lab.eng.rdu2.redhat.com/eno12399npx/master\",\"data_type\":\"notification\",\"value_type\":\"enumeration\",\"value\":\"LOCKED\"}]}}"
      time="2024-12-29T18:33:12Z" level=info msg="Latency for the event: 1 ms"
      time="2024-12-29T18:33:12Z" level=info msg="Event: Context Attributes,\n  specversion: 1.0\n  type: event.sync.sync-status.synchronization-state-change\n  source: /sync/sync-status/sync-state\n  id: f23cf9e7-08d4-4444-90a4-8403137f860a\n  time: 2024-12-29T18:33:12.990693456Z\nData,\n  {\"version\":\"1.0\",\"values\":[{\"ResourceAddress\":\"/cluster/node/helix66.lab.eng.rdu2.redhat.com/eno12399npx/master\",\"data_type\":\"notification\",\"value_type\":\"enumeration\",\"value\":\"LOCKED\"}]}\n"
      time="2024-12-29T18:33:13Z" level=info msg="received event {\"specversion\":\"1.0\",\"id\":\"1643449a-8f68-4b67-88d0-7d7e5a65575a\",\"source\":\"/sync/sync-status/os-clock-sync-state\",\"type\":\"event.sync.sync-status.os-clock-sync-state-change\",\"time\":\"2024-12-29T18:33:13.034719314Z\",\"data\":{\"version\":\"1.0\",\"values\":[{\"ResourceAddress\":\"/cluster/node/helix66.lab.eng.rdu2.redhat.com/CLOCK_REALTIME\",\"data_type\":\"notification\",\"value_type\":\"enumeration\",\"value\":\"LOCKED\"},{\"ResourceAddress\":\"/cluster/node/helix66.lab.eng.rdu2.redhat.com/CLOCK_REALTIME\",\"data_type\":\"metric\",\"value_type\":\"decimal64.3\",\"value\":\"61\"}]}}"
      time="2024-12-29T18:33:13Z" level=info msg="Latency for the event: 1 ms"
      time="2024-12-29T18:33:13Z" level=info msg="Event: Context Attributes,\n  specversion: 1.0\n  type: event.sync.sync-status.os-clock-sync-state-change\n  source: /sync/sync-status/os-clock-sync-state\n  id: 1643449a-8f68-4b67-88d0-7d7e5a65575a\n  time: 2024-12-29T18:33:13.034719314Z\nData,\n  {\"version\":\"1.0\",\"values\":[{\"ResourceAddress\":\"/cluster/node/helix66.lab.eng.rdu2.redhat.com/CLOCK_REALTIME\",\"data_type\":\"notification\",\"value_type\":\"enumeration\",\"value\":\"LOCKED\"},{\"ResourceAddress\":\"/cluster/node/helix66.lab.eng.rdu2.redhat.com/CLOCK_REALTIME\",\"data_type\":\"metric\",\"value_type\":\"decimal64.3\",\"value\":\"61\"}]}\n"
      time="2024-12-29T18:33:13Z" level=info msg="received event {\"specversion\":\"1.0\",\"id\":\"45276fed-c7d8-46f9-bf0a-ef3e53a9b1bd\",\"source\":\"/sync/ptp-status/lock-state\",\"type\":\"event.sync.ptp-status.ptp-state-change\",\"time\":\"2024-12-29T18:33:13.42830188Z\",\"data\":{\"version\":\"1.0\",\"values\":[{\"ResourceAddress\":\"/cluster/node/helix66.lab.eng.rdu2.redhat.com/eno12399npx/master\",\"data_type\":\"notification\",\"value_type\":\"enumeration\",\"value\":\"FREERUN\"},{\"ResourceAddress\":\"/cluster/node/helix66.lab.eng.rdu2.redhat.com/eno12399npx/master\",\"data_type\":\"metric\",\"value_type\":\"decimal64.3\",\"value\":\"-220\"}]}}"
      time="2024-12-29T18:33:13Z" level=info msg="Latency for the event: 0 ms"
      time="2024-12-29T18:33:13Z" level=info msg="Event: Context Attributes,\n  specversion: 1.0\n  type: event.sync.ptp-status.ptp-state-change\n  source: /sync/ptp-status/lock-state\n  id: 45276fed-c7d8-46f9-bf0a-ef3e53a9b1bd\n  time: 2024-12-29T18:33:13.42830188Z\nData,\n  {\"version\":\"1.0\",\"values\":[{\"ResourceAddress\":\"/cluster/node/helix66.lab.eng.rdu2.redhat.com/eno12399npx/master\",\"data_type\":\"notification\",\"value_type\":\"enumeration\",\"value\":\"FREERUN\"},{\"ResourceAddress\":\"/cluster/node/helix66.lab.eng.rdu2.redhat.com/eno12399npx/master\",\"data_type\":\"metric\",\"value_type\":\"decimal64.3\",\"value\":\"-220\"}]}\n"
      time="2024-12-29T18:33:13Z" level=info msg="received event {\"specversion\":\"1.0\",\"id\":\"5077dacd-6f86-48ec-9b26-7cf62eef8a71\",\"source\":\"/sync/sync-status/sync-state\",\"type\":\"event.sync.sync-status.synchronization-state-change\",\"time\":\"2024-12-29T18:33:13.428589336Z\",\"data\":{\"version\":\"1.0\",\"values\":[{\"ResourceAddress\":\"/cluster/node/helix66.lab.eng.rdu2.redhat.com/eno12399npx/master\",\"data_type\":\"notification\",\"value_type\":\"enumeration\",\"value\":\"FREERUN\"}]}}"
      time="2024-12-29T18:33:13Z" level=info msg="Latency for the event: 1 ms"
      time="2024-12-29T18:33:13Z" level=info msg="Event: Context Attributes,\n  specversion: 1.0\n  type: event.sync.sync-status.synchronization-state-change\n  source: /sync/sync-status/sync-state\n  id: 5077dacd-6f86-48ec-9b26-7cf62eef8a71\n  time: 2024-12-29T18:33:13.428589336Z\nData,\n  {\"version\":\"1.0\",\"values\":[{\"ResourceAddress\":\"/cluster/node/helix66.lab.eng.rdu2.redhat.com/eno12399npx/master\",\"data_type\":\"notification\",\"value_type\":\"enumeration\",\"value\":\"FREERUN\"}]}\n"
      time="2024-12-29T18:33:14Z" level=info msg="received event {\"specversion\":\"1.0\",\"id\":\"3aaef365-c229-4482-8714-bc1a9d05aa9b\",\"source\":\"/sync/ptp-status/lock-state\",\"type\":\"event.sync.ptp-status.ptp-state-change\",\"time\":\"2024-12-29T18:33:14.827514899Z\",\"data\":{\"version\":\"1.0\",\"values\":[{\"ResourceAddress\":\"/cluster/node/helix66.lab.eng.rdu2.redhat.com/eno12399npx/master\",\"data_type\":\"notification\",\"value_type\":\"enumeration\",\"value\":\"LOCKED\"},{\"ResourceAddress\":\"/cluster/node/helix66.lab.eng.rdu2.redhat.com/eno12399npx/master\",\"data_type\":\"metric\",\"value_type\":\"decimal64.3\",\"value\":\"-188\"}]}}"
      time="2024-12-29T18:33:14Z" level=info msg="Latency for the event: 1 ms"
      time="2024-12-29T18:33:14Z" level=info msg="Event: Context Attributes,\n  specversion: 1.0\n  type: event.sync.ptp-status.ptp-state-change\n  source: /sync/ptp-status/lock-state\n  id: 3aaef365-c229-4482-8714-bc1a9d05aa9b\n  time: 2024-12-29T18:33:14.827514899Z\nData,\n  {\"version\":\"1.0\",\"values\":[{\"ResourceAddress\":\"/cluster/node/helix66.lab.eng.rdu2.redhat.com/eno12399npx/master\",\"data_type\":\"notification\",\"value_type\":\"enumeration\",\"value\":\"LOCKED\"},{\"ResourceAddress\":\"/cluster/node/helix66.lab.eng.rdu2.redhat.com/eno12399npx/master\",\"data_type\":\"metric\",\"value_type\":\"decimal64.3\",\"value\":\"-188\"}]}\n"
      time="2024-12-29T18:33:14Z" level=info msg="received event {\"specversion\":\"1.0\",\"id\":\"c9f51ce4-d864-42e5-9091-4745a896c171\",\"source\":\"/sync/sync-status/sync-state\",\"type\":\"event.sync.sync-status.synchronization-state-change\",\"time\":\"2024-12-29T18:33:14.827726398Z\",\"data\":{\"version\":\"1.0\",\"values\":[{\"ResourceAddress\":\"/cluster/node/helix66.lab.eng.rdu2.redhat.com/eno12399npx/master\",\"data_type\":\"notification\",\"value_type\":\"enumeration\",\"value\":\"LOCKED\"}]}}"
      time="2024-12-29T18:33:14Z" level=info msg="Latency for the event: 1 ms"
      time="2024-12-29T18:33:14Z" level=info msg="Event: Context Attributes,\n  specversion: 1.0\n  type: event.sync.sync-status.synchronization-state-change\n  source: /sync/sync-status/sync-state\n  id: c9f51ce4-d864-42e5-9091-4745a896c171\n  time: 2024-12-29T18:33:14.827726398Z\nData,\n  {\"version\":\"1.0\",\"values\":[{\"ResourceAddress\":\"/cluster/node/helix66.lab.eng.rdu2.redhat.com/eno12399npx/master\",\"data_type\":\"notification\",\"value_type\":\"enumeration\",\"value\":\"LOCKED\"}]}\n"
      time="2024-12-29T18:33:21Z" level=info msg="checking for rest service health"
      time="2024-12-29T18:33:21Z" level=info msg="health check http://ptp-event-publisher-service-helix66.openshift-ptp.svc.cluster.local:9043/api/ocloudNotifications/v2/health"
      time="2024-12-29T18:33:21Z" level=info msg="rest service returned healthy status"
      
      cloud-event logs after node cold reboot:
      time="2024-12-29T18:41:41Z" level=info msg="transport host path is set to  ptp-event-publisher-service-helix66.openshift-ptp.svc.cluster.local:9043"
      time="2024-12-29T18:41:41Z" level=info msg="apiVersion=2.0, updated apiAddr=ptp-event-publisher-service-helix66.openshift-ptp.svc.cluster.local:9043, apiPath=/api/ocloudNotifications/v2/"
      time="2024-12-29T18:41:41Z" level=info msg="Starting local API listening to :9043"
      time="2024-12-29T18:41:46Z" level=info msg="transport host path is set to  ptp-event-publisher-service-helix66.openshift-ptp.svc.cluster.local:9043"
      time="2024-12-29T18:41:46Z" level=info msg="checking for rest service health"
      time="2024-12-29T18:41:46Z" level=info msg="health check http://ptp-event-publisher-service-helix66.openshift-ptp.svc.cluster.local:9043/api/ocloudNotifications/v2/health"
      time="2024-12-29T18:41:46Z" level=warning msg="try 0, return health check of the rest service for error Get \"http://ptp-event-publisher-service-helix66.openshift-ptp.svc.cluster.local:9043/api/ocloudNotifications/v2/health\": dial tcp: lookup ptp-event-publisher-service-helix66.openshift-ptp.svc.cluster.local: no such host"
      time="2024-12-29T18:41:48Z" level=info msg="health check http://ptp-event-publisher-service-helix66.openshift-ptp.svc.cluster.local:9043/api/ocloudNotifications/v2/health"
      time="2024-12-29T18:41:49Z" level=info msg="rest service returned healthy status"
      time="2024-12-29T18:41:49Z" level=info msg="healthy publisher; subscribing to events"
      time="2024-12-29T18:41:49Z" level=error msg="failed to create subscription: api at http://ptp-event-publisher-service-helix66.openshift-ptp.svc.cluster.local:9043/api/ocloudNotifications/v2/subscriptions returned status 400 for /cluster/node/helix66.lab.eng.rdu2.redhat.com/sync/ptp-status/clock-class"
      time="2024-12-29T18:41:49Z" level=error msg="failed to create subscription: api at http://ptp-event-publisher-service-helix66.openshift-ptp.svc.cluster.local:9043/api/ocloudNotifications/v2/subscriptions returned status 400 for /cluster/node/helix66.lab.eng.rdu2.redhat.com/sync/ptp-status/lock-state"
      time="2024-12-29T18:41:49Z" level=error msg="failed to create subscription: api at http://ptp-event-publisher-service-helix66.openshift-ptp.svc.cluster.local:9043/api/ocloudNotifications/v2/subscriptions returned status 400 for /cluster/node/helix66.lab.eng.rdu2.redhat.com/sync/gnss-status/gnss-sync-status"
      time="2024-12-29T18:41:49Z" level=error msg="failed to create subscription: api at http://ptp-event-publisher-service-helix66.openshift-ptp.svc.cluster.local:9043/api/ocloudNotifications/v2/subscriptions returned status 400 for /cluster/node/helix66.lab.eng.rdu2.redhat.com/sync/sync-status/sync-state"
      time="2024-12-29T18:41:49Z" level=error msg="failed to create subscription: api at http://ptp-event-publisher-service-helix66.openshift-ptp.svc.cluster.local:9043/api/ocloudNotifications/v2/subscriptions returned status 400 for /cluster/node/helix66.lab.eng.rdu2.redhat.com/sync/sync-status/os-clock-sync-state"
      time="2024-12-29T18:41:49Z" level=info msg="publisher endpoint updated from ptp-event-publisher-service-NODE_NAME.openshift-ptp.svc.cluster.local:9043 to ptp-event-publisher-service-helix66.openshift-ptp.svc.cluster.local:9043 healthStatusOk true"
      time="2024-12-29T18:41:49Z" level=info msg="waiting for events"
      time="2024-12-29T18:41:54Z" level=error msg="CurrentState:error 404 from url http://ptp-event-publisher-service-helix66.openshift-ptp.svc.cluster.local:9043/api/ocloudNotifications/v2/cluster/node/helix66.lab.eng.rdu2.redhat.com/sync/ptp-status/clock-class/CurrentState, "
      time="2024-12-29T18:41:54Z" level=error msg="CurrentState:error 404 from url http://ptp-event-publisher-service-helix66.openshift-ptp.svc.cluster.local:9043/api/ocloudNotifications/v2/cluster/node/helix66.lab.eng.rdu2.redhat.com/sync/ptp-status/lock-state/CurrentState, "
      time="2024-12-29T18:41:54Z" level=error msg="CurrentState:error 404 from url http://ptp-event-publisher-service-helix66.openshift-ptp.svc.cluster.local:9043/api/ocloudNotifications/v2/cluster/node/helix66.lab.eng.rdu2.redhat.com/sync/gnss-status/gnss-sync-status/CurrentState, "
      time="2024-12-29T18:41:54Z" level=error msg="CurrentState:error 404 from url http://ptp-event-publisher-service-helix66.openshift-ptp.svc.cluster.local:9043/api/ocloudNotifications/v2/cluster/node/helix66.lab.eng.rdu2.redhat.com/sync/sync-status/sync-state/CurrentState, "
      time="2024-12-29T18:41:54Z" level=error msg="CurrentState:error 404 from url http://ptp-event-publisher-service-helix66.openshift-ptp.svc.cluster.local:9043/api/ocloudNotifications/v2/cluster/node/helix66.lab.eng.rdu2.redhat.com/sync/sync-status/os-clock-sync-state/CurrentState, "

      Links to logs:

      https://drive.google.com/drive/folders/1bVQIDxwPcQH560w1Wqx7s_dFawNYqQln?usp=sharing 

              jacding@redhat.com Jack Ding
              henshay Hen Shay Hassid
              Hen Shay Hassid Hen Shay Hassid
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

                Created:
                Updated: