Bug
Resolution: Unresolved
rhos-18.0.15
rhos-ops-platform-services-security
DFG Security: Sprint 22, DFG Security: Sprint 23
Important
To Reproduce
Steps to reproduce the behavior:
- Deploy Barbican with the PKCS#11 crypto plugin configured against a Trustway Proteccio HSM (Eviden/Atos).
- Run Barbican with multiple worker processes (the default worker count typically matches the number of CPU cores, i.e. 4 or more).
- Run any workload that triggers secret store operations (for example, creating secrets or verifying image signatures via Nova/Glance).
- Observe CKR_DEVICE_MEMORY (0x00000031) errors in the Barbican worker logs.
The error manifests as:
pkcs11.exceptions.PKCS11Error: CKR_DEVICE_MEMORY (0x00000031)
followed by:
oslo_config.cfg.MissingArgumentError: Missing value auth-url required for auth plugin password
The secondary error occurs because the PKCS#11 session initialization failure leaves the Barbican worker in a partially initialized state.
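When triaging logs, keep one detail in mind. The PKCS#11 specification defines a dedicated return value, CKR_SESSION_COUNT (0x000000B1), for C_OpenSession failing because too many sessions are already open. In this report, the Proteccio appears to signal session exhaustion as CKR_DEVICE_MEMORY (0x00000031) instead. The minimal sketch below treats both codes as exhaustion signals. The helper names `parse_ckr` and `is_session_exhaustion` are hypothetical and are not part of Barbican.

```python
import re

# Standard PKCS#11 return values (values from the PKCS#11 specification).
CKR_DEVICE_ERROR = 0x00000030
CKR_DEVICE_MEMORY = 0x00000031
CKR_DEVICE_REMOVED = 0x00000032
CKR_SESSION_COUNT = 0x000000B1

# Codes this sketch treats as "the token ran out of session/memory resources".
EXHAUSTION_CODES = {CKR_DEVICE_MEMORY, CKR_SESSION_COUNT}

# Matches text such as "CKR_DEVICE_MEMORY (0x00000031)".
_CKR_PATTERN = re.compile(r"\b(CKR_[A-Z_]+) \((0x[0-9A-Fa-f]+)\)")


def parse_ckr(log_line):
    """Return (name, code) for the first 'CKR_NAME (0x...)' in a log line, or None."""
    match = _CKR_PATTERN.search(log_line)
    if match is None:
        return None
    return match.group(1), int(match.group(2), 16)


def is_session_exhaustion(code):
    """True if the return value suggests HSM session or memory exhaustion."""
    return code in EXHAUSTION_CODES
```

A retry or alerting path that only checks for CKR_SESSION_COUNT would miss the failure described in this report.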
Expected behavior
Barbican workers should reuse PKCS#11 sessions or release them properly, so that the HSM's session limit is never exceeded under normal operating conditions. When multiple Barbican worker processes run concurrently, each should hold only the sessions it needs and release them when idle.
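The reuse-and-release behavior expected above can be sketched as a small per-process pool. This is an illustration under stated assumptions, not Barbican's implementation. `open_session` and `close_session` are hypothetical callables standing in for the plugin's C_OpenSession/C_Login and C_CloseSession calls. `IdleReleasingSessionPool` and `idle_timeout` are illustrative names.

```python
import threading
import time


class IdleReleasingSessionPool:
    """Reuse PKCS#11 sessions and close those idle longer than idle_timeout.

    open_session / close_session are hypothetical callables standing in for
    the plugin's C_OpenSession+C_Login and C_CloseSession calls.
    """

    def __init__(self, open_session, close_session, idle_timeout=60.0,
                 clock=time.monotonic):
        self._open = open_session
        self._close = close_session
        self._idle_timeout = idle_timeout
        self._clock = clock
        self._lock = threading.Lock()
        self._idle = []  # list of (session, time it was returned to the pool)

    def acquire(self):
        """Return a pooled idle session if there is one, otherwise open a new one."""
        with self._lock:
            if self._idle:
                session, _ = self._idle.pop()
                return session
        return self._open()

    def release(self, session):
        """Return a session to the pool for reuse instead of closing it."""
        with self._lock:
            self._idle.append((session, self._clock()))

    def release_idle(self):
        """Close every pooled session idle longer than idle_timeout; return the count."""
        now = self._clock()
        with self._lock:
            stale = [s for s, t in self._idle if now - t > self._idle_timeout]
            self._idle = [(s, t) for s, t in self._idle
                          if now - t <= self._idle_timeout]
        # Close outside the lock so slow HSM calls do not block other threads.
        for session in stale:
            self._close(session)
        return len(stale)
```

A worker would call `release_idle()` periodically, for example from a timer, so that a quiet worker gradually returns its sessions to the HSM.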
Device Info
- Hardware Specs: Not applicable (server-side issue on any hardware running Barbican with an HSM)
- OS Version: RHEL 9.x
- Affected Product: RHOSO 18 / OSP 17.1 (both use the same Barbican PKCS#11 plugin code)
- HSM: Trustway Proteccio (Eviden/Atos), firmware 3.06.05
Bug impact
When this issue occurs, all Barbican secret operations fail, making the key-manager service completely unavailable. This has cascading effects on any OpenStack service that depends on Barbican for cryptographic operations, including:
- Nova: Image signature verification fails with "Unable to retrieve certificate with ID", preventing instances from booting from signed images.
- Cinder: Encrypted volume creation fails.
- Any service using Castellan: Unable to retrieve or store secrets.
The issue is particularly impactful in production environments where Barbican is configured as the global default secret store with PKCS#11, as the failure renders the entire key management layer inoperable until the Barbican service is restarted.
Known workaround
Limit the number of Barbican worker processes to 1 to reduce the total number of concurrent PKCS#11 sessions opened against the HSM.
- OSP 17.1 (TripleO): Set `BarbicanWorkers: 1` under `parameter_defaults` in the Heat environment file.
- RHOSO 18 (podified): Set the worker count in the Barbican CR or equivalent configuration.
This workaround reduces concurrency and throughput but prevents session exhaustion. A proper fix should implement session pooling, reuse, or on-demand session management within the barbican-api / barbican-worker PKCS#11 plugin, rather than relying on each worker opening and holding its own set of sessions indefinitely.
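On-demand session management, one of the fix directions named above, could look roughly like the sketch below. It is hypothetical and rests on several assumptions. Each worker holds a semaphore sized to a per-worker budget, a hypothetical `max_sessions` such as the HSM's session limit divided by the worker count. The worker opens a session only for the duration of one operation and closes it afterwards. When the budget is used up, it waits for a free slot instead of asking the HSM for another session. `SessionBudget`, `open_session` and `close_session` are illustrative names, not part of the Barbican plugin.

```python
import contextlib
import threading


class SessionBudget:
    """Cap the number of PKCS#11 sessions one worker process holds at once.

    max_sessions is a hypothetical per-worker budget; open_session and
    close_session are hypothetical callables wrapping the PKCS#11 calls.
    """

    def __init__(self, max_sessions, open_session, close_session):
        self._slots = threading.BoundedSemaphore(max_sessions)
        self._open = open_session
        self._close = close_session

    @contextlib.contextmanager
    def session(self, timeout=None):
        # Wait for a free slot instead of asking the HSM for another session.
        if not self._slots.acquire(timeout=timeout):
            raise TimeoutError("no PKCS#11 session slot available")
        try:
            handle = self._open()
        except BaseException:
            # Opening failed: give the slot back before propagating the error.
            self._slots.release()
            raise
        try:
            yield handle
        finally:
            # Close the session when the operation ends, so an idle worker holds none.
            try:
                self._close(handle)
            finally:
                self._slots.release()
```

A caller would wrap each crypto operation in `with budget.session(timeout=5) as handle: ...`. Compared with pooling, this trades an extra C_OpenSession/C_Login per operation for a hard upper bound on sessions per worker.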
Additional context
The Proteccio HSM has a finite limit on the number of concurrent PKCS#11 sessions it can support. Each Barbican worker process opens multiple sessions during initialization (for encryption, HMAC, and key wrapping operations) and holds them open for the lifetime of the process. With the default worker count (typically matching the number of CPU cores), the total number of sessions quickly exceeds the HSM's capacity.
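The session arithmetic in the previous paragraph can be made explicit. This report does not state the Proteccio's actual session limit or the exact number of sessions each worker opens. The sketch therefore takes both as parameters. The example values below (3 sessions per worker, one each for encryption, HMAC and key wrapping, and a limit of 32) are purely illustrative.

```python
def total_sessions(workers, sessions_per_worker):
    """Sessions held open when every worker keeps its own sessions for its lifetime."""
    return workers * sessions_per_worker


def max_safe_workers(hsm_session_limit, sessions_per_worker, reserved=0):
    """Largest worker count whose sessions fit under the HSM session limit.

    reserved: sessions kept free for other HSM clients (an assumption of this sketch).
    """
    available = hsm_session_limit - reserved
    return max(0, available // sessions_per_worker)


# Illustrative values only; they are not measured Proteccio figures.
print(total_sessions(16, 3))    # 16 workers x 3 sessions = 48 sessions
print(max_safe_workers(32, 3))  # at most 10 workers fit under a 32-session limit
```

Because the total grows linearly with the worker count, hosts with more CPU cores (and therefore more default workers) hit the limit sooner. This is consistent with limiting workers to 1 as a workaround.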
The root cause is in the Barbican PKCS#11 plugin (`barbican/plugin/crypto/p11_crypto_plugin.py`), specifically in the `P11CryptoPlugin` class. Sessions are opened in `_create_pkcs11_session()` and stored per-process, but there is no mechanism for session pooling, limiting the total session count, or gracefully handling `CKR_DEVICE_MEMORY` by retrying after closing stale sessions.
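The missing "retry after closing stale sessions" handling could be sketched as follows. This is a hypothetical illustration, not the `P11CryptoPlugin` code. `HSMError` stands in for whatever exception the plugin raises with a PKCS#11 return value. `open_session` and `close_stale_sessions` are assumed callables. The second one closes idle sessions the current process still holds and returns how many it closed.

```python
CKR_DEVICE_MEMORY = 0x00000031  # PKCS#11 return value seen in this report


class HSMError(Exception):
    """Hypothetical exception carrying a PKCS#11 return value."""

    def __init__(self, code):
        super().__init__("HSM returned CKR code 0x%08X" % code)
        self.code = code


def open_with_retry(open_session, close_stale_sessions, attempts=3):
    """Open a session; on CKR_DEVICE_MEMORY, free stale sessions and retry.

    open_session: hypothetical callable wrapping C_OpenSession/C_Login.
    close_stale_sessions: hypothetical callable returning how many idle
    sessions of this process it closed.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return open_session()
        except HSMError as exc:
            last_attempt = attempt == attempts - 1
            # Only memory/session exhaustion is worth retrying, and only while attempts remain.
            if exc.code != CKR_DEVICE_MEMORY or last_attempt:
                raise
            if close_stale_sessions() == 0:
                # Nothing left to free in this process; another attempt would fail the same way.
                raise
```

This only reclaims sessions held by the failing process. It reduces the chance of exhaustion but cannot guarantee success when other workers hold the remaining sessions, so it complements a cap on sessions rather than replacing one.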
This issue has been reproduced in both OSP 17.1 standalone (Podman-based) and RHOSO 18 (OpenShift-based) deployments using the same Proteccio HSM infrastructure in the RDU2 lab environment.
Is triggered by: RHOSSTRAT-946 Feature - Proteccio HSM Adoption (In Progress)