Community Linux Engineering / CLE-1935

Multiple pods crashlooping in staging OpenShift

    • 5
    • Fedora Infra sprint 04, Fedora Infra sprint 05
    • rhel-cle-pnp

      1. Describe what you would like us to do:

        Today, while log01 was running out of space, I noticed a very large log file for a staging OpenShift worker and found that 6 pods are crashlooping.
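One way to spot pods in this state without reading the worker log is to filter pod status for the `CrashLoopBackOff` waiting reason. The following is a minimal sketch, not the method used for this ticket: it assumes the input has the shape produced by `oc get pods --all-namespaces -o json`, and the namespace and pod names in `SAMPLE` are hypothetical.

```python
from typing import List, Tuple


def crashlooping_pods(pod_list: dict) -> List[Tuple[str, str, int]]:
    """Return (namespace, pod name, restart count) for containers in CrashLoopBackOff."""
    hits = []
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        for status in pod.get("status", {}).get("containerStatuses", []):
            # A crashlooping container sits in the "waiting" state between restarts.
            waiting = status.get("state", {}).get("waiting") or {}
            if waiting.get("reason") == "CrashLoopBackOff":
                hits.append(
                    (
                        meta.get("namespace", ""),
                        meta.get("name", ""),
                        status.get("restartCount", 0),
                    )
                )
    return hits


# Hypothetical, trimmed sample of the pod list JSON (names are placeholders).
SAMPLE = {
    "items": [
        {
            "metadata": {"namespace": "example-ns", "name": "example-pod-1"},
            "status": {
                "containerStatuses": [
                    {
                        "restartCount": 42,
                        "state": {"waiting": {"reason": "CrashLoopBackOff"}},
                    }
                ]
            },
        },
        {
            "metadata": {"namespace": "example-ns", "name": "healthy-pod-1"},
            "status": {
                "containerStatuses": [
                    {"restartCount": 0, "state": {"running": {}}}
                ]
            },
        },
    ]
}
```

Sorting the result by restart count would point at the pods most likely to be filling the log.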

      Here is the list of the pods:

      • compose-tracker
        ```
        2025-11-11 14:04:16,169 INFO fedora_messaging.cli - Starting consumer with compose_tracker:Consumer callback
        2025-11-11 14:04:16,169 INFO compose_tracker - Using detected token to talk to pagure.
        2025-11-11 14:04:16,169 INFO compose_tracker - Targeting repo releng/failed-composes on https://stg.pagure.io/
        2025-11-11 14:04:18,171 ERROR ogr.services.pagure.service - {'error': 'Invalid or expired token. Please visit https://stg.pagure.io/settings#nav-api-tab to get or renew your API token.', 'error_code': 'EINVALIDTOK', 'errors': 'Expired token'}

        Traceback (most recent call last):
        File "/usr/bin/fedora-messaging", line 33, in <module>
        sys.exit(load_entry_point('fedora-messaging==3.4.1', 'console_scripts', 'fedora-messaging')())
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/usr/lib/python3.11/site-packages/click/core.py", line 1130, in __call__
        return self.main(*args, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/usr/lib/python3.11/site-packages/click/core.py", line 1055, in main
        rv = self.invoke(ctx)
        ^^^^^^^^^^^^^^^^
        File "/usr/lib/python3.11/site-packages/click/core.py", line 1657, in invoke
        return _process_result(sub_ctx.command.invoke(sub_ctx))
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/usr/lib/python3.11/site-packages/click/core.py", line 1404, in invoke
        return ctx.invoke(self.callback, **ctx.params)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/usr/lib/python3.11/site-packages/click/core.py", line 760, in invoke
        return __callback(*args, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/usr/lib/python3.11/site-packages/fedora_messaging/cli.py", line 133, in consume
        _consume(exchange, queue_name, routing_key, callback, app_name)
        File "/usr/lib/python3.11/site-packages/fedora_messaging/cli.py", line 169, in _consume
        deferred_consumers = api.twisted_consume(
        ^^^^^^^^^^^^^^^^^^^^
        File "/usr/lib/python3.11/site-packages/fedora_messaging/api.py", line 151, in twisted_consume
        callback = _check_callback(callback)
        ^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/usr/lib/python3.11/site-packages/fedora_messaging/api.py", line 75, in _check_callback
        callback_object = callback()
        ^^^^^^^^^^
        File "/usr/lib/python3.11/site-packages/compose_tracker.py", line 138, in __init__
        self.gitproject = gitservice.get_project(
        ^^^^^^^^^^^^^^^^^^^^^^^
        File "/usr/lib/python3.11/site-packages/ogr/abstract.py", line 116, in wrapper
        __check_for_internal_failure(ex)
        File "/usr/lib/python3.11/site-packages/ogr/abstract.py", line 58, in __check_for_internal_failure
        raise ex
        File "/usr/lib/python3.11/site-packages/ogr/abstract.py", line 106, in wrapper
        return function(*args, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/usr/lib/python3.11/site-packages/ogr/services/pagure/service.py", line 100, in get_project
        username=self.user.get_username(),
        ^^^^^^^^^^^^^^^^^^^^^^^^
        File "/usr/lib/python3.11/site-packages/ogr/abstract.py", line 116, in wrapper
        __check_for_internal_failure(ex)
        File "/usr/lib/python3.11/site-packages/ogr/abstract.py", line 58, in __check_for_internal_failure
        raise ex
        File "/usr/lib/python3.11/site-packages/ogr/abstract.py", line 106, in wrapper
        return function(*args, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/usr/lib/python3.11/site-packages/ogr/services/pagure/user.py", line 23, in get_username
        return_value = self.service.call_api(url=request_url, method="POST", data={})
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/usr/lib/python3.11/site-packages/ogr/abstract.py", line 116, in wrapper
        __check_for_internal_failure(ex)
        File "/usr/lib/python3.11/site-packages/ogr/abstract.py", line 58, in __check_for_internal_failure
        raise ex
        File "/usr/lib/python3.11/site-packages/ogr/abstract.py", line 106, in wrapper
        return function(*args, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/usr/lib/python3.11/site-packages/ogr/services/pagure/service.py", line 174, in call_api
        raise PagureAPIException(
        ogr.exceptions.PagureAPIException: Pagure API returned an error when calling 'https://stg.pagure.io//api/0/-/whoami': Invalid or expired token. Please visit https://stg.pagure.io/settings#nav-api-tab to get or renew your API token. - Expired token
        ```
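The traceback shows the consumer failing at startup on a `POST` to the Pagure `whoami` endpoint, so the token can be checked in isolation before the pod is scaled back up. This is a rough sketch under assumptions: the helper names are hypothetical, the `Authorization: token ...` header follows the Pagure API convention, and the error payload is assumed to be JSON of the shape logged above.

```python
import json
import urllib.error
import urllib.request


def build_whoami_request(base_url: str, token: str) -> urllib.request.Request:
    """Build the same kind of call the traceback shows: POST <base>/api/0/-/whoami."""
    url = base_url.rstrip("/") + "/api/0/-/whoami"
    return urllib.request.Request(
        url,
        data=b"",
        method="POST",
        headers={"Authorization": f"token {token}"},
    )


def token_is_rejected(payload: dict) -> bool:
    """True when the payload carries the error code seen in the log."""
    return payload.get("error_code") == "EINVALIDTOK"


def check_token(base_url: str, token: str, timeout: float = 10.0) -> bool:
    """Return True if Pagure accepts the token, False on EINVALIDTOK.

    Performs a network call; assumes error responses have a JSON body.
    """
    request = build_whoami_request(base_url, token)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            payload = json.load(response)
    except urllib.error.HTTPError as err:
        # HTTPError is file-like, so the error body can be parsed the same way.
        payload = json.load(err)
    return not token_is_rejected(payload)


# The error payload exactly as it appears in the compose-tracker log.
LOGGED_ERROR = {
    "error": "Invalid or expired token. Please visit "
    "https://stg.pagure.io/settings#nav-api-tab to get or renew your API token.",
    "error_code": "EINVALIDTOK",
    "errors": "Expired token",
}
```

If `check_token("https://stg.pagure.io/", token)` returns `False`, the token likely needs renewing at the settings URL named in the error message.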

      • flatpak-indexer-quay-io
        ```
        ---> Running application from script (app.sh) ...
        Traceback (most recent call last):
        File "/opt/app-root/lib64/python3.9/site-packages/flatpak_indexer/fedora_monitor.py", line 332, in _run
        self._wait_for_messages()
        File "/opt/app-root/lib64/python3.9/site-packages/flatpak_indexer/fedora_monitor.py", line 252, in _wait_for_messages
        queue_name_raw = self.thread_redis_client.get('fedora-messaging-queue')
        File "/opt/app-root/lib64/python3.9/site-packages/redis/commands/core.py", line 1834, in get
        return self.execute_command("GET", name, keys=[name])
        File "/opt/app-root/lib64/python3.9/site-packages/redis/client.py", line 657, in execute_command
        return self._execute_command(*args, **options)
        File "/opt/app-root/lib64/python3.9/site-packages/redis/client.py", line 663, in _execute_command
        conn = self.connection or pool.get_connection()
        File "/opt/app-root/lib64/python3.9/site-packages/redis/utils.py", line 196, in wrapper
        return func(*args, **kwargs)
        File "/opt/app-root/lib64/python3.9/site-packages/redis/connection.py", line 2601, in get_connection
        connection.connect()
        File "/opt/app-root/lib64/python3.9/site-packages/redis/connection.py", line 846, in connect
        self.connect_check_health(check_health=True)
        File "/opt/app-root/lib64/python3.9/site-packages/redis/connection.py", line 869, in connect_check_health
        self.on_connect_check_health(check_health=check_health)
        File "/opt/app-root/lib64/python3.9/site-packages/redis/connection.py", line 941, in on_connect_check_health
        auth_response = self.read_response()
        File "/opt/app-root/lib64/python3.9/site-packages/redis/connection.py", line 1133, in read_response
        response = self._parser.read_response(disable_decoding=disable_decoding)
        File "/opt/app-root/lib64/python3.9/site-packages/redis/_parsers/resp2.py", line 15, in read_response
        result = self._read_response(disable_decoding=disable_decoding)
        File "/opt/app-root/lib64/python3.9/site-packages/redis/_parsers/resp2.py", line 38, in _read_response
        raise error
        redis.exceptions.AuthenticationError: AUTH <password> called without any password configured for the default user. Are you sure your configuration is correct?

      The above exception was the direct cause of the following exception:

      Traceback (most recent call last):
      File "/opt/app-root/bin/flatpak-indexer", line 8, in <module>
      sys.exit(cli())
      File "/opt/app-root/lib64/python3.9/site-packages/click/core.py", line 1161, in __call__
      return self.main(*args, **kwargs)
      File "/opt/app-root/lib64/python3.9/site-packages/click/core.py", line 1082, in main
      rv = self.invoke(ctx)
      File "/opt/app-root/lib64/python3.9/site-packages/click/core.py", line 1697, in invoke
      return _process_result(sub_ctx.command.invoke(sub_ctx))
      File "/opt/app-root/lib64/python3.9/site-packages/click/core.py", line 1443, in invoke
      return ctx.invoke(self.callback, **ctx.params)
      File "/opt/app-root/lib64/python3.9/site-packages/click/core.py", line 788, in invoke
      return __callback(*args, **kwargs)
      File "/opt/app-root/lib64/python3.9/site-packages/click/decorators.py", line 33, in new_func
      return f(get_current_context(), *args, **kwargs)
      File "/opt/app-root/lib64/python3.9/site-packages/flatpak_indexer/cli.py", line 47, in daemon
      updater.start()
      File "/opt/app-root/lib64/python3.9/site-packages/flatpak_indexer/datasource/fedora/updater.py", line 70, in start
      self.change_monitor.start()
      File "/opt/app-root/lib64/python3.9/site-packages/flatpak_indexer/fedora_monitor.py", line 78, in start
      self._maybe_reraise_failure("Failed to start connection to fedora-messaging")
      File "/opt/app-root/lib64/python3.9/site-packages/flatpak_indexer/fedora_monitor.py", line 180, in _maybe_reraise_failure
      raise RuntimeError(msg) from self.failure
      RuntimeError: Failed to start connection to fedora-messaging
      ```

      • flatpak-indexer-registry-fedora
        ```
        ---> Running application from script (app.sh) ...
        Traceback (most recent call last):
        File "/opt/app-root/lib64/python3.9/site-packages/flatpak_indexer/fedora_monitor.py", line 332, in _run
        self._wait_for_messages()
        File "/opt/app-root/lib64/python3.9/site-packages/flatpak_indexer/fedora_monitor.py", line 252, in _wait_for_messages
        queue_name_raw = self.thread_redis_client.get('fedora-messaging-queue')
        File "/opt/app-root/lib64/python3.9/site-packages/redis/commands/core.py", line 1834, in get
        return self.execute_command("GET", name, keys=[name])
        File "/opt/app-root/lib64/python3.9/site-packages/redis/client.py", line 657, in execute_command
        return self._execute_command(*args, **options)
        File "/opt/app-root/lib64/python3.9/site-packages/redis/client.py", line 663, in _execute_command
        conn = self.connection or pool.get_connection()
        File "/opt/app-root/lib64/python3.9/site-packages/redis/utils.py", line 196, in wrapper
        return func(*args, **kwargs)
        File "/opt/app-root/lib64/python3.9/site-packages/redis/connection.py", line 2601, in get_connection
        connection.connect()
        File "/opt/app-root/lib64/python3.9/site-packages/redis/connection.py", line 846, in connect
        self.connect_check_health(check_health=True)
        File "/opt/app-root/lib64/python3.9/site-packages/redis/connection.py", line 869, in connect_check_health
        self.on_connect_check_health(check_health=check_health)
        File "/opt/app-root/lib64/python3.9/site-packages/redis/connection.py", line 941, in on_connect_check_health
        auth_response = self.read_response()
        File "/opt/app-root/lib64/python3.9/site-packages/redis/connection.py", line 1133, in read_response
        response = self._parser.read_response(disable_decoding=disable_decoding)
        File "/opt/app-root/lib64/python3.9/site-packages/redis/_parsers/resp2.py", line 15, in read_response
        result = self._read_response(disable_decoding=disable_decoding)
        File "/opt/app-root/lib64/python3.9/site-packages/redis/_parsers/resp2.py", line 38, in _read_response
        raise error
        redis.exceptions.AuthenticationError: AUTH <password> called without any password configured for the default user. Are you sure your configuration is correct?

      The above exception was the direct cause of the following exception:

      Traceback (most recent call last):
      File "/opt/app-root/bin/flatpak-indexer", line 8, in <module>
      sys.exit(cli())
      File "/opt/app-root/lib64/python3.9/site-packages/click/core.py", line 1161, in __call__
      return self.main(*args, **kwargs)
      File "/opt/app-root/lib64/python3.9/site-packages/click/core.py", line 1082, in main
      rv = self.invoke(ctx)
      File "/opt/app-root/lib64/python3.9/site-packages/click/core.py", line 1697, in invoke
      return _process_result(sub_ctx.command.invoke(sub_ctx))
      File "/opt/app-root/lib64/python3.9/site-packages/click/core.py", line 1443, in invoke
      return ctx.invoke(self.callback, **ctx.params)
      File "/opt/app-root/lib64/python3.9/site-packages/click/core.py", line 788, in invoke
      return __callback(*args, **kwargs)
      File "/opt/app-root/lib64/python3.9/site-packages/click/decorators.py", line 33, in new_func
      return f(get_current_context(), *args, **kwargs)
      File "/opt/app-root/lib64/python3.9/site-packages/flatpak_indexer/cli.py", line 47, in daemon
      updater.start()
      File "/opt/app-root/lib64/python3.9/site-packages/flatpak_indexer/datasource/fedora/updater.py", line 70, in start
      self.change_monitor.start()
      File "/opt/app-root/lib64/python3.9/site-packages/flatpak_indexer/fedora_monitor.py", line 78, in start
      self._maybe_reraise_failure("Failed to start connection to fedora-messaging")
      File "/opt/app-root/lib64/python3.9/site-packages/flatpak_indexer/fedora_monitor.py", line 180, in _maybe_reraise_failure
      raise RuntimeError(msg) from self.failure
      RuntimeError: Failed to start connection to fedora-messaging
      ```
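Both flatpak-indexer pods fail with the same Redis error, which a Redis server returns when a client sends `AUTH` although no password is set for the default user. The client and server configurations therefore disagree about the password. The sketch below assumes the mismatch is resolved on the client side and that the connection is described by a URL; neither is confirmed by this ticket, the fix may instead be to restore the password on the staging Redis, and `redis_url` is a hypothetical helper rather than part of flatpak-indexer.

```python
from typing import Optional
from urllib.parse import quote


def redis_url(host: str, port: int = 6379,
              password: Optional[str] = None, db: int = 0) -> str:
    """Build a redis:// URL, leaving out credentials when no password is set.

    Omitting the password means the client does not send AUTH at all, which
    avoids the error above against a server that has no password configured.
    """
    auth = f":{quote(password, safe='')}@" if password else ""
    return f"redis://{auth}{host}:{port}/{db}"
```

A URL of this form can be passed to `redis.Redis.from_url` in redis-py.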

      • resultsdb-ci-listener
        ```
        [fedora_messaging.cli INFO] Starting consumer with resultsdb_listener.consumer:Consumer callback
        [fedora_messaging.twisted.service INFO] Authenticating with server using x509 (certfile: /etc/pki/rabbitmq/crt/resultsdb-ci-listener.crt, keyfile: /etc/pki/rabbitmq/key/resultsdb-ci-listener.key)
        [fedora_messaging.cli ERROR] The TCP connection appears to have started, but the TLS or AMQP handshake with the broker failed; check your connection and authentication parameters and ensure your user has permission to access the vhost
        ```

      • koschei - watcher
        ```
        /usr/share/koschei/koschei/models.py:1311: SAWarning: relationship 'CoprRebuild.request' will copy column copr_rebuild_request.id to column copr_rebuild.request_id, which conflicts with relationship(s): 'CoprRebuildRequest.rebuilds' (copies copr_rebuild_request.id to copr_rebuild.request_id). If this is not the intention, consider if these relationships should be linked with back_populates, or if viewonly=True should be applied to one or more if they are read-only. For the less common case that foreign key constraints are partially overlapping, the orm.foreign() annotation can be used to isolate the columns that should be written towards. To silence this warning, add the parameter 'overlaps="rebuilds"' to the 'CoprRebuild.request' relationship. (Background on this error at: https://sqlalche.me/e/14/qzyx)
        CoprRebuild.request = relationship(CoprRebuildRequest)
        watcher started
        Log opened.
        Loading configuration from /etc/koschei/fedora-messaging.toml
        Authenticating with server using x509 (certfile: /etc/koschei/rabbitmq-client.crt, keyfile: /etc/koschei/rabbitmq-client.key)
        Starting factory FedoraMessagingFactoryV2(parameters=<URLParameters host=rabbitmq.stg.fedoraproject.org port=5671 virtual_host=/pubsub ssl=True>, confirms=True)
        AMQP stack terminated, failed to connect, or aborted: opened=False, error-arg=None; pending-error=ConnectionDone()
        Connection setup terminated due to ConnectionDone()
        <twisted.internet.tcp.Connector instance at 0x7f7949a30290 disconnected IPv4Address(type='TCP', host='rabbitmq.stg.fedoraproject.org', port=5671)> will retry in 2 seconds
        Stopping factory FedoraMessagingFactoryV2(parameters=<URLParameters host=rabbitmq.stg.fedoraproject.org port=5671 virtual_host=/pubsub ssl=True>, confirms=True)
        Consuming raised an unexpected error, please report a bug:
        Traceback (most recent call last):
        --- <exception caught here> ---
        File "/usr/lib/python3.12/site-packages/fedora_messaging/api.py", line 166, in _twisted_consume_wrapper
        consumers = yield twisted_consume(callback, bindings=bindings, queues=queues)
        File "/usr/lib/python3.12/site-packages/fedora_messaging/twisted/factory.py", line 291, in consume
        protocol = yield self.when_connected()
        File "/usr/lib/python3.12/site-packages/fedora_messaging/twisted/factory.py", line 203, in when_connected
        yield self._client_deferred
        fedora_messaging.exceptions.ConnectionException:

      Service watcher crashed.
      Traceback (most recent call last):
      File "/usr/share/koschei/koschei/backend/main.py", line 58, in main
      svc(backend.KoscheiBackendSession()).run_service()
      File "/usr/share/koschei/koschei/backend/service.py", line 100, in run_service
      self.main()
      File "/usr/share/koschei/koschei/plugins/fedmsg_plugin/backend/services/watcher.py", line 72, in main
      fedmsg.consume(callback)
      File "/usr/lib/python3.12/site-packages/fedora_messaging/api.py", line 235, in consume
      eventual_result.wait(timeout=2**31)
      File "/usr/lib/python3.12/site-packages/crochet/_eventloop.py", line 196, in wait
      result.raiseException()
      File "/usr/lib/python3.12/site-packages/twisted/python/failure.py", line 505, in raiseException
      raise self.value.with_traceback(self.tb)
      fedora_messaging.exceptions.ConnectionException
      Traceback (most recent call last):
      File "<frozen runpy>", line 198, in _run_module_as_main
      File "<frozen runpy>", line 88, in _run_code
      File "/usr/share/koschei/koschei/backend/main.py", line 67, in <module>
      main()
      File "/usr/share/koschei/koschei/backend/main.py", line 58, in main
      svc(backend.KoscheiBackendSession()).run_service()
      File "/usr/share/koschei/koschei/backend/service.py", line 100, in run_service
      self.main()
      File "/usr/share/koschei/koschei/plugins/fedmsg_plugin/backend/services/watcher.py", line 72, in main
      fedmsg.consume(callback)
      File "/usr/lib/python3.12/site-packages/fedora_messaging/api.py", line 235, in consume
      eventual_result.wait(timeout=2**31)
      File "/usr/lib/python3.12/site-packages/crochet/_eventloop.py", line 196, in wait
      result.raiseException()
      File "/usr/lib/python3.12/site-packages/twisted/python/failure.py", line 505, in raiseException
      raise self.value.with_traceback(self.tb)
      fedora_messaging.exceptions.ConnectionException
      ```
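The resultsdb-ci-listener and koschei watcher logs both stop while connecting to the message broker, and the first one cannot say whether the TLS or the AMQP handshake failed. A small probe can at least separate a TCP failure from a TLS failure. This is a sketch under assumptions: `probe_broker` is a hypothetical helper, the host and port would be the ones in the koschei log (`rabbitmq.stg.fedoraproject.org`, 5671), the certificate and key paths would come from the logs above, and the CA bundle path is not given in this ticket.

```python
import socket
import ssl
from typing import Optional


def probe_broker(host: str, port: int,
                 certfile: Optional[str] = None,
                 keyfile: Optional[str] = None,
                 cafile: Optional[str] = None,
                 timeout: float = 10.0) -> str:
    """Return "tcp-failed", "tls-failed", or "tls-ok" for one connection attempt."""
    # With cafile=None the system trust store is used, which may not
    # include the CA that signed the broker certificate.
    context = ssl.create_default_context(cafile=cafile)
    if certfile:
        # Raises immediately if the certificate or key is unreadable or mismatched.
        context.load_cert_chain(certfile=certfile, keyfile=keyfile)

    try:
        raw = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Covers DNS errors, refused connections, and timeouts.
        return "tcp-failed"

    try:
        with context.wrap_socket(raw, server_hostname=host):
            return "tls-ok"
    except (ssl.SSLError, OSError):
        # Verification errors, handshake alerts, resets, or handshake timeouts.
        return "tls-failed"
    finally:
        raw.close()
```

A `tls-ok` result does not cover the AMQP handshake or the vhost permissions mentioned in the resultsdb-ci-listener error, and with TLS 1.3 a rejected client certificate may only surface after the handshake completes.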

      I scaled them all to 0, but if anybody has spare time to look at them, please do.
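For whoever picks these up, scaling a workload to 0 and back can be scripted. This is only a sketch: the namespace and resource names are hypothetical, the affected workloads may be DeploymentConfigs rather than Deployments, and replica counts may also be managed elsewhere (for example by playbooks), in which case a manual change could be overwritten.

```python
import shlex
from typing import List


def scale_command(namespace: str, resource: str, replicas: int = 0) -> List[str]:
    """Build an `oc scale` invocation; resource looks like "deployment/name"."""
    return ["oc", "scale", f"--replicas={replicas}", resource, "-n", namespace]


def render(commands: List[List[str]]) -> str:
    """Render commands as shell lines for review before running them."""
    return "\n".join(shlex.join(cmd) for cmd in commands)
```

Passing each list to `subprocess.run(cmd, check=True)` would execute it; printing `render(...)` first makes the change easy to review.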

      2. When do you need this to be done by? (YYYY/MM/DD)

        Not urgent; the pods are disabled for now.

              Unassigned Unassigned
              cle_bot CLE bot