CI: fail update suite if any ClusterOperator go Available=False

    • Type: Story
    • Resolution: Done
    • Priority: Major
    • Sprint: OTA 243, OTA 244, OTA 245

      These are alarming conditions which may frighten customers, and we don't want to see them in our own, controlled, repeatable update CI. This example job had logs like:

      Feb 18 21:11:25.799 E clusteroperator/openshift-apiserver changed Degraded to True: APIServerDeployment_UnavailablePod: APIServerDeploymentDegraded: 1 of 3 requested instances are unavailable for apiserver.openshift-apiserver ()
      

      And the job failed, but none of the failures were "something made openshift-apiserver mad enough to go Degraded".
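
      The guard this card asks for can be sketched as a scan over monitor lines shaped like the example above. This is only a minimal illustration, not the code that landed in origin: the function name `unavailable_operators`, the `allow_list` parameter, and the regular expression are assumptions made for this sketch, and it assumes Available transitions are logged in the same "changed <Type> to <Status>" shape as the Degraded line.

```python
import re
from typing import Iterable, List, Set

# Matches the operator name, condition type, and new status in lines such as:
# "... E clusteroperator/openshift-apiserver changed Degraded to True: Reason: message"
CONDITION_CHANGE = re.compile(
    r"clusteroperator/(?P<operator>\S+) changed (?P<type>\w+) to (?P<status>\w+)"
)


def unavailable_operators(lines: Iterable[str], allow_list: Set[str]) -> List[str]:
    """Return operators that went Available=False and are not in allow_list.

    Hypothetical helper: a non-empty result is what the suite would treat
    as a failure. Other condition changes (e.g. Degraded=True) are ignored.
    """
    offenders = []
    for line in lines:
        match = CONDITION_CHANGE.search(line)
        if match is None:
            continue  # not a ClusterOperator condition change
        if match["type"] != "Available" or match["status"] != "False":
            continue  # only Available=False is in scope here
        if match["operator"] in allow_list:
            continue  # known exception, tolerated for now
        offenders.append(match["operator"])
    return offenders
```

      Keeping the exceptions in an explicit allow-list means each tolerated operator is visible and can be removed once its bug is fixed, while any operator outside the list fails the suite immediately.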

            [OTA-362] CI: fail update suite if any ClusterOperator go Available=False

            OpenShift Jira Automation Bot made changes -
            Work Type New: BU Product Work [ 40155 ]
            W. Trevor King made changes -
            Link New: This issue is related to OCPBUGS-36462 [ OCPBUGS-36462 ]
            W. Trevor King made changes -
            Link New: This issue is related to OCPBUGS-35892 [ OCPBUGS-35892 ]
            W. Trevor King made changes -
            Link New: This issue is related to OCPBUGS-32089 [ OCPBUGS-32089 ]

            Scott Dodson added a comment - I've set all related bugs to priority Major and left comments indicating we'd like to have these addressed by 4.16.
            W. Trevor King made changes -
            Link New: This issue is related to OCPBUGS-9108 [ OCPBUGS-9108 ]
            W. Trevor King made changes -
            Link New: This issue is depended on by OTA-1167 [ OTA-1167 ]
            W. Trevor King made changes -
            Resolution New: Done [ 1 ]
            Status Original: In Progress [ 10018 ] New: Closed [ 6 ]

            W. Trevor King added a comment - origin#27231 landed
            W. Trevor King made changes -
            Link New: This issue is related to OCPBUGS-24228 [ OCPBUGS-24228 ]
            W. Trevor King made changes -
            Link New: This issue is related to OCPBUGS-24041 [ OCPBUGS-24041 ]
            W. Trevor King made changes -
            Link New: This issue is related to OCPBUGS-22364 [ OCPBUGS-22364 ]
            W. Trevor King made changes -
            Link New: This issue is related to OCPBUGS-23746 [ OCPBUGS-23746 ]
            W. Trevor King made changes -
            Link New: This issue is related to OCPBUGS-23745 [ OCPBUGS-23745 ]
            W. Trevor King made changes -
            Link New: This issue is related to OCPBUGS-23744 [ OCPBUGS-23744 ]

            Lalatendu Mohanty added a comment - - edited rhn-engineering-dgoodwin This is high priority for us. We want to make progress on this before the 4.15 release. This is technical debt we should have addressed long ago. I am happy to hand it to you if you want to take it over, and we will help any way we can. Please sync with trking about how we can pass it to you.
            Lalatendu Mohanty made changes -
            Sprint Original: OTA 243, OTA 244 [ 54436, 55117 ] New: OTA 243, OTA 244, OTA 245 [ 54436, 55117, 55578 ]

            Devan Goodwin added a comment - lmohanty@redhat.com trking just wondering how this effort sits in priority, specifically https://github.com/openshift/origin/pull/27231. I'm working on correlating the alerts SD struggles with against the alerts that fire in CI, and this one is a top offender; the change looked like a really big win. If you'd like TRT to take over that PR and wrap it up, we'd be happy to, just say the word. If it's something you all would like to complete, that's fine; I'm just looking to see where things are at and whether we can help.
            Lalatendu Mohanty made changes -
            Sprint Original: OTA 243 [ 54436 ] New: OTA 243, OTA 244 [ 54436, 55117 ]
            Petr Muller made changes -
            Link New: This issue is related to TRT-1235 [ TRT-1235 ]
            W. Trevor King made changes -
            Link New: This issue is related to OCPBUGS-20062 [ OCPBUGS-20062 ]
            W. Trevor King made changes -
            Link New: This issue is related to OCPBUGS-20061 [ OCPBUGS-20061 ]
            W. Trevor King made changes -
            Link New: This issue is related to OCPBUGS-20056 [ OCPBUGS-20056 ]
            DPTP Bot made changes -
            Remote Link Original: This issue links to "openshift/origin#27231: OTA-362: pkg/synthetictests/operators: Fatal unless Available=False in allow-list (Web Link)" [ 896493 ] New: This issue links to "openshift/origin#27231: OTA-362: pkg/monitortests/clusterversionoperator: Fatal unless Available=False in allow-list (Web Link)" [ 896493 ]
            DPTP Bot made changes -
            Remote Link Original: This issue links to "origin#27231 (fatal Available=False with exception carve-outs) (Web Link)" [ 896493 ] New: This issue links to "openshift/origin#27231: OTA-362: pkg/synthetictests/operators: Fatal unless Available=False in allow-list (Web Link)" [ 896493 ]
            Lalatendu Mohanty made changes -
            Sprint New: OTA 243 [ 54436 ]
            W. Trevor King made changes -
            Link New: This issue blocks OTA-980 [ OTA-980 ]
            W. Trevor King made changes -
            Sprint Original: OTA 232 [ 49664 ]
            Lalatendu Mohanty made changes -
            Sprint Original: OTA 231 [ 49256 ] New: OTA 232 [ 49664 ]
            Lalatendu Mohanty made changes -
            Sprint Original: OTA 230 [ 48737 ] New: OTA 231 [ 49256 ]
            Lalatendu Mohanty made changes -
            Sprint Original: OTA 229 [ 47367 ] New: OTA 230 [ 48737 ]
            Lalatendu Mohanty made changes -
            Sprint Original: OTA 228 [ 47341 ] New: OTA 229 [ 47367 ]
            W. Trevor King made changes -
            Link New: This issue is related to OSD-13696 [ OSD-13696 ]
            Lalatendu Mohanty made changes -
            Sprint Original: OTA 227 [ 46697 ] New: OTA 228 [ 47341 ]
            Lalatendu Mohanty made changes -
            Sprint Original: OTA 226 [ 45894 ] New: OTA 227 [ 46697 ]
            Lalatendu Mohanty made changes -
            Sprint Original: OTA 225 [ 45405 ] New: OTA 226 [ 45894 ]
            Amol Dongare made changes -
            Link New: This issue is related to OTA-699 [ OTA-699 ]
            Amol Dongare made changes -
            Link Original: This issue is followed up by OTA-699 [ OTA-699 ]
            W. Trevor King made changes -
            Link New: This issue is related to OCPBUGS-825 [ OCPBUGS-825 ]
            Lalatendu Mohanty made changes -
            Rank New: Ranked higher
            Lalatendu Mohanty made changes -
            Sprint Original: OTA 224 [ 44895 ] New: OTA 225 [ 45405 ]
            Lalatendu Mohanty made changes -
            Rank New: Ranked lower
            Lalatendu Mohanty made changes -
            Sprint Original: OTA 223 [ 44379 ] New: OTA 224 [ 44895 ]
            Lalatendu Mohanty made changes -
            Sprint Original: OTA 222 [ 44377 ] New: OTA 223 [ 44379 ]
            Lalatendu Mohanty made changes -
            Status Original: To Do [ 10020 ] New: In Progress [ 10018 ]
            W. Trevor King made changes -
            Link New: This issue is blocked by OTA-701 [ OTA-701 ]
            W. Trevor King made changes -
            Epic Link New: OTA-700 [ 13421280 ]
            W. Trevor King made changes -
            Link Original: This issue is related to OTA-700 [ OTA-700 ]
            Lalatendu Mohanty made changes -
            Rank New: Ranked higher
            Lalatendu Mohanty made changes -
            Sprint Original: OTA 221 [ 43880 ] New: OTA 222 [ 44377 ]
            Lalatendu Mohanty made changes -
            Story Points Original: 3 New: 2
            Lalatendu Mohanty made changes -
            Story Points Original: 5 New: 3
            Lalatendu Mohanty made changes -
            Summary Original: CI: fail update suite if any ClusterOperator go Degraded=True or Available=False New: CI: fail update suite if any ClusterOperator go Available=False
            W. Trevor King made changes -
            Link New: This issue is followed up by OTA-699 [ OTA-699 ]

            Lalatendu Mohanty added a comment - - edited Created a card for operators going Degraded during upgrade: https://issues.redhat.com/browse/OTA-699. Changed the title of this Jira card to cover only the Available condition. We also need a new card to cover the work of communicating the Available and Degraded conditions to teams.
            W. Trevor King made changes -
            Link New: This issue is related to TRT-41 [ TRT-41 ]
            Lalatendu Mohanty made changes -
            Assignee New: W. Trevor King [ trking ]
            W. Trevor King made changes -
            Remote Link New: This issue links to "origin#27231 (fatal Available=False with exception carve-outs) (Web Link)" [ 896493 ]

            Lalatendu Mohanty added a comment - Slack thread to understand what others (staff engineers, TRT) think about this card https://coreos.slack.com/archives/CEGKQ43CP/p1654625537501139
            Lalatendu Mohanty made changes -
            Rank New: Ranked lower
            Lalatendu Mohanty made changes -
            Sprint Original: OTA 220 [ 43371 ] New: OTA 221 [ 43880 ]

            Lalatendu Mohanty added a comment - We should do this at the beginning of the 4.12 release cycle, so that teams can accommodate the extra work coming out of it.
            Lalatendu Mohanty made changes -
            Rank New: Ranked lower
            Lalatendu Mohanty made changes -
            Sprint Original: OTA 219 [ 42821 ] New: OTA 220 [ 43371 ]

            Lalatendu Mohanty added a comment - Let's take this to the pillar arch call and see what other teams have to say about it.
            Lalatendu Mohanty made changes -
            Rank New: Ranked lower
            Lalatendu Mohanty made changes -
            Sprint Original: OTA 217 [ 41841 ] New: OTA 219 [ 42821 ]
            Tyler Mcconnico made changes -
            Workflow Original: Software Simplified Workflow for Project TEMPLATE Revised [ 14454798 ] New: OJA-WF-Y [ 17392392 ]
            Aileen Cunningham made changes -
            Priority Original: High [ 10003 ] New: Major [ 3 ]
            Lalatendu Mohanty made changes -
            Rank New: Ranked lower
            Lalatendu Mohanty made changes -
            Sprint New: OTA 217 [ 41841 ]
            Clark Everson made changes -
            Workflow Original: Software Simplified Workflow for Project TEMPLATE [ 13574402 ] New: Software Simplified Workflow for Project TEMPLATE Revised [ 14454798 ]
            W. Trevor King made changes -
            Remote Link New: This issue links to "origin#25920 (Web Link)" [ 276759 ]

            W. Trevor King added a comment - - edited Clayton floated origin#25920 for updates.

            W. Trevor King added a comment - Related, but not for update jobs: origin#25918.
            Jack Ottofaro (Inactive) made changes -
            Story Points New: 5
            Labels New: groomed
            Priority Original: Unprioritized [ 10004 ] New: High [ 10003 ]

            W. Trevor King added a comment - Clayton approving of this guard in Slack.
            W. Trevor King created issue -

              Assignee: W. Trevor King (trking)
              Reporter: W. Trevor King (trking)
              Votes: 0
              Watchers: 9