  1. Openshift sandboxed containers
  2. KATA-4308

Trustee trustee-deployment pod fails to start if the host has 256 CPUs


    • Bug
    • Resolution: Done
    • Medium
    • OSC 1.11
    • None
    • trustee-operator
    • None
    • Quality / Stability / Reliability
    • False
    • False
    • Bugs and Vulnerability Issues
    • Important
    • 8

      Description

      When trustee-operator.v1.0.0 is set up on an SNP server that has 256 CPUs, the trustee-deployment pod fails to start. After Hyper-Threading is disabled on the server, which reduces the CPU count from 256 to 128, the trustee-deployment pod starts successfully.

      Steps to reproduce

      1. Set up trustee-operator.v1.0.0 on a server that has 256 CPUs.

      2. Check the pods in the trustee-operator-system namespace and inspect the logs of the trustee-deployment pod.

      $ oc get pods -n trustee-operator-system
      NAME                                                   READY   STATUS             RESTARTS        AGE
      trustee-deployment-55ccd8d44d-wk4gh                    0/1     CrashLoopBackOff   45 (3m3s ago)   3h28m
      trustee-operator-controller-manager-558cfb7d95-7gxpv   1/1     Running            0               3h28m
      
      $ oc logs trustee-deployment-55ccd8d44d-wk4gh -n trustee-operator-system
      [2025-11-06T09:36:41Z INFO  kbs] Using config file /etc/kbs-config/kbs-config.toml
      [2025-11-06T09:36:41Z WARN  kbs::admin] insecure admin APIs are enabled
      [2025-11-06T09:36:41Z INFO  attestation_service::rvps] launch a built-in RVPS.
      [2025-11-06T09:36:41Z WARN  reference_value_provider_service::extractors] No configuration for SWID extractor provided. Default will be used.
      [2025-11-06T09:36:41Z WARN  attestation_service::policy_engine::opa] Policy default_cpu already exists, so the default policy will not be written.
      [2025-11-06T09:36:41Z WARN  attestation_service::policy_engine::opa] Policy default_gpu already exists, so the default policy will not be written.
      [2025-11-06T09:36:41Z INFO  attestation_service::token::ear_broker] No Token Signer key in config file, create an ephemeral key and without CA pubkey cert
      [2025-11-06T09:36:41Z INFO  kbs::api_server] Starting HTTP server at [0.0.0.0:8080]
      [2025-11-06T09:36:41Z INFO  actix_server::builder] starting 256 workers
      [2025-11-06T09:36:41Z INFO  actix_server::server] Actix runtime found; starting in Actix runtime
      [2025-11-06T09:36:41Z INFO  actix_server::server] starting service: "actix-web-service-0.0.0.0:8080", workers: 256, listening on: 0.0.0.0:8080
      thread 'actix-rt|system:0|arbiter:253' panicked at /cachi2/output/deps/cargo/actix-server-2.6.0/src/worker.rs:428:30:
      called `Result::unwrap()` on an `Err` value: Os { code: 24, kind: Uncategorized, message: "Too many open files" }
      note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
      thread 'main' panicked at /cachi2/output/deps/cargo/actix-rt-2.11.0/src/arbiter.rs:153:25:
      called `Result::unwrap()` on an `Err` value: RecvError
      Error: HTTP initialization failed
      
      Caused by:
          task 4 panicked with message "called `Result::unwrap()` on an `Err` value: RecvError"
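
The two log lines that matter for triage are the worker count reported by actix_server and the OS error in the panic (code 24, which is EMFILE on Linux). A minimal Python sketch for pulling both out of a saved log is shown below. The function name `summarize_kbs_log` is hypothetical, and the regular expressions are written only against the log format shown in this report, so they may need adjusting for other versions.

```python
import re

# Patterns are written against the log lines shown in this report only.
WORKERS_RE = re.compile(r"starting (\d+) workers")
OS_ERROR_RE = re.compile(r'Os \{ code: (\d+), kind: \w+, message: "([^"]*)" \}')


def summarize_kbs_log(log_text):
    """Return the worker count and the first OS error found in a KBS log."""
    workers = WORKERS_RE.search(log_text)
    os_error = OS_ERROR_RE.search(log_text)
    return {
        "workers": int(workers.group(1)) if workers else None,
        "os_error_code": int(os_error.group(1)) if os_error else None,
        "os_error_message": os_error.group(2) if os_error else None,
    }


if __name__ == "__main__":
    # Two lines copied from the log output in this report.
    sample = (
        "[2025-11-06T09:36:41Z INFO  actix_server::builder] starting 256 workers\n"
        "called `Result::unwrap()` on an `Err` value: "
        'Os { code: 24, kind: Uncategorized, message: "Too many open files" }\n'
    )
    print(summarize_kbs_log(sample))
```

The input can be the text saved from `oc logs trustee-deployment-55ccd8d44d-wk4gh -n trustee-operator-system`. A worker count equal to the host CPU count together with error code 24 matches the failure described here.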
      

      Expected result

      The trustee-deployment pod should be running no matter how many CPUs the server has.

      $ oc get pods -n trustee-operator-system
      NAME                                                   READY   STATUS    RESTARTS       AGE
      trustee-deployment-665684688-jxft9                     1/1     Running   0              17s
      trustee-operator-controller-manager-558cfb7d95-7gxpv   1/1     Running   2 (100s ago)   5h8m
      

      Impact

      If the customer's server has 256 or more CPUs, the trustee pod will not start and attestation will not work.
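
The log suggests that the number of workers follows the CPU count and that startup fails once the process runs out of file descriptors. One possible way to reason about a safe worker count is sketched below in Python. This is not the fix applied to trustee: the function name `workers_within_fd_limit` is hypothetical, and `fds_per_worker` and `reserved_fds` are assumed inputs that this report does not provide, so they would have to be measured for the real service.

```python
import os
import resource  # Unix-only standard library module


def workers_within_fd_limit(cpu_count, soft_limit, fds_per_worker, reserved_fds):
    """Largest worker count, at most cpu_count, that fits the descriptor budget.

    fds_per_worker and reserved_fds are hypothetical inputs; measure them for
    the real service instead of trusting the placeholder values used below.
    """
    if fds_per_worker <= 0:
        raise ValueError("fds_per_worker must be positive")
    if soft_limit == resource.RLIM_INFINITY:
        # No descriptor limit, so the CPU count is the only bound.
        return max(1, cpu_count)
    budget = max(soft_limit - reserved_fds, 0)
    # Always allow at least one worker, never more than one per CPU.
    return max(1, min(cpu_count, budget // fds_per_worker))


if __name__ == "__main__":
    soft_limit, _hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)
    cpus = os.cpu_count() or 1
    # 4 descriptors per worker and 64 reserved are placeholder values only.
    print(workers_within_fd_limit(cpus, soft_limit, fds_per_worker=4, reserved_fds=64))
```

With these placeholder values and a soft limit of 1024, a 256-CPU host would be capped at 240 workers while a 128-CPU host would keep all 128. The actual limit inside the trustee container is not stated in this report.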

      Env

      $ oc version
      Client Version: 4.12.0-0.nightly-2023-10-09-102237
      Kustomize Version: v4.5.7
      Server Version: 4.20.0
      Kubernetes Version: v1.33.5
      
      trustee-operator.v1.0.0 
      

      Additional helpful info

      • When the server has 128 CPUs or 56 cores, the trustee pods start successfully.
      • With the previous trustee version (CoCo BM 0.2.0), this issue does not occur.

              rh-ee-lmilleri Leonardo Milleri
              rhn-support-pezhang Pei Zhang