
Impact RTE pods fail to start due to selinux issues

    • Type: Spike
    • Resolution: Done
    • Priority: Critical

      Impact statement for the OCPBUGS-45639 series:

      Which 4.y.z to 4.y'.z' updates increase vulnerability?

      • Updates into 4.16.25, 4.16.26, 4.17.7, and 4.17.8

      Which types of clusters?

      • Clusters with the NUMA Resources Operator installed, which you can detect in PromQL with csv_succeeded{name=~"numaresources-operator[.].*"}

      What is the impact? Is it serious enough to warrant removing update recommendations?

      • Critical pods in the NUMA Resources Operator get stuck in CrashLoopBackOff due to SELinux changes

      How involved is remediation?

      • If the update was a z-stream update, it could be rolled back after consulting with support
      • No other known remediation short of updating to a fixed version

      Is this a regression?

      • Yes, it is
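The answers above can be read as a simple exposure check. The following is a minimal sketch in Python, not the logic that the update service actually runs: the function name `update_is_exposed` and the `has_numa_operator` flag are hypothetical, and treating a cluster that already runs an exposed release as "not newly exposed" is an assumption rather than something the impact statement spells out.

```python
# Target releases named in the impact statement above.
EXPOSED_TARGETS = {"4.16.25", "4.16.26", "4.17.7", "4.17.8"}


def update_is_exposed(current: str, target: str, has_numa_operator: bool) -> bool:
    """Return True if updating from `current` to `target` increases exposure.

    `has_numa_operator` stands in for the PromQL check on csv_succeeded;
    how that value is obtained is outside this sketch.
    """
    if not has_numa_operator:
        # Only clusters with the NUMA Resources Operator are affected.
        return False
    if target not in EXPOSED_TARGETS:
        return False
    # Assumption: a cluster already on an exposed release gains no new
    # exposure by moving to another exposed release.
    return current not in EXPOSED_TARGETS


if __name__ == "__main__":
    print(update_is_exposed("4.16.24", "4.16.25", has_numa_operator=True))
    print(update_is_exposed("4.16.24", "4.16.25", has_numa_operator=False))
```

The sketch deliberately avoids guessing which later releases carry the fix; that information lives in OCPBUGS-45639 and OCPBUGS-45983.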

       

            [COS-3078] Impact RTE pods fail to start due to selinux issues

            W. Trevor King added a comment

            graph-data#6444 merged, declaring this issue as NUMAResourcesOperatorCrashLoopBackOff. I'm closing this impact-statement ticket now that the risk is declared, but please comment here if you think the impact statement or risk declaration need further refinement.

            For the fixes themselves, watch OCPBUGS-45639 (4.17) or OCPBUGS-45983 (4.16).

            W. Trevor King added a comment (edited)

            I picked COS for this, because the issue is with an RPM that feeds RHEL which feeds RHCOS which feeds OCP, and the RHCOS / COS step is as far as I'm able to map Jira projects towards the appropriate team for Component/s: Containers. Please feel free to move this ticket, if you're aware of a more-specific team.

            My impression, mostly based on the OCPBUGS-45983 4.16 backport's Release Note Text, is that the impact statement answers are something like:

            Which 4.y.z to 4.y'.z' updates increase vulnerability?

            My current guess: Updates from releases older than 4.16.25 and 4.17.7 into 4.16.25, 4.16.26, 4.17.7, or 4.17.8.

            Which types of clusters?

            My current guess: Clusters with the NUMA Resources Operator installed. I don't think we have a way to detect "I don't have that operator installed now, but I plan on installing it after updating to an exposed release". But we do have PromQL to check current OLM-installed operators. Poking at Telemetry, I'm guessing we want to look at csv_succeeded{name=~"numaresources-operator[.].*"}

            What is the impact? Is it serious enough to warrant removing update recommendations?

            The bugs say "The RTE pods gets stuck on CrashLoopBackOff due an selinux issue.", but I don't know what that means. Maybe all cluster-admins who have the NUMA Resources Operator installed will know? Maybe someone who knows should unpack it for people like me who don't know yet?

            How involved is remediation?

            Also not covered in the bugs on my brief skim.

            Is this a regression?

            My current guess: Yup, in 4.16.25 and 4.17.7.
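For readers who want to try the csv_succeeded check from the comment above against their own cluster, here is a rough sketch using the standard Prometheus HTTP API instant-query endpoint (`/api/v1/query`). The base URL, the bearer token, and the helper names `build_query_url`, `operator_installed`, and `check_cluster` are all hypothetical placeholders, and the sketch treats "any matching series" as "installed", which mirrors the selector in the comment but may be looser than what a real risk declaration evaluates.

```python
import json
import urllib.parse
import urllib.request

# PromQL selector quoted in the comment above.
QUERY = 'csv_succeeded{name=~"numaresources-operator[.].*"}'


def build_query_url(base_url: str, query: str = QUERY) -> str:
    """Build an instant-query URL for a Prometheus-compatible API."""
    params = urllib.parse.urlencode({"query": query})
    return base_url.rstrip("/") + "/api/v1/query?" + params


def operator_installed(response: dict) -> bool:
    """Return True if the instant-query response contains a matching series."""
    if response.get("status") != "success":
        raise ValueError(f"query failed: {response!r}")
    return len(response["data"]["result"]) > 0


def check_cluster(base_url: str, token: str) -> bool:
    """Run the query against a live endpoint (hypothetical URL and token)."""
    request = urllib.request.Request(
        build_query_url(base_url),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request) as reply:
        return operator_installed(json.load(reply))
```

As the comment notes, a check like this only sees operators that are installed now; it cannot detect a plan to install the operator after updating to an exposed release.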


              rhn-support-sdodson Scott Dodson
              trking W. Trevor King
              Jindrich Novy