
[AWS SC2S] ec2:DescribeSecurityGroupRules is not supported in SC2S region.

    • Critical
    • No
    • Sprint 242
    • 1
    • Approved
    • False
    • Previously, when installing an AWS cluster to the Secret Commercial Cloud Services (SC2S) region and specifying existing AWS security groups, the installation failed with an error that stated that the functionality was not available in the region. With this fix, the installation succeeds. (link:https://issues.redhat.com/browse/OCPBUGS-18830[*OCPBUGS-18830*])
    • Bug Fix
    • Done

      Description of problem:

      Cluster installation in the SC2S region fails with:
      
      level=error msg=Error: reading Security Group (sg-0b0cd054dd599602f) Rules: UnsupportedOperation: The functionality you requested is not available in this region. 

      Version-Release number of selected component (if applicable):

      4.14.0-0.nightly-2023-09-11-201102
       

      How reproducible:

      Always
       

      Steps to Reproduce:

      1. Create an OCP cluster on SC2S
      

      Actual results:

      Installation fails:
      level=error msg=Error: reading Security Group (sg-0b0cd054dd599602f) Rules: UnsupportedOperation: The functionality you requested is not available in this region.

      Expected results:

      Installation succeeds.
       

      Additional info:

      * The C2S region is not affected.

            OpenShift Jira Automation Bot added a comment - Per the announcement sent regarding the removal of "Blocker" as an option in the Priority field, this issue (which was already closed at the time of the bulk update) had Priority = "Blocker." It is being updated to Priority = Critical. No additional fields were changed.

            Errata Tool added a comment -

            Since the problem described in this issue should be resolved in a recent advisory, it has been closed.

            For information on the advisory (Critical: OpenShift Container Platform 4.15.0 bug fix and security update), and where to find the updated files, follow the link below.

            If the solution does not work for you, open a new bug report.
            https://access.redhat.com/errata/RHSA-2023:7198

            Yunfei Jiang added a comment - Verified on both the pre-merge test and registry.ci.openshift.org/ocp/release:4.15.0-0.nightly-2023-09-25-101013. PASS.

            Yunfei Jiang added a comment - rh-ee-bbarbach Yes, the account used by the installer does have the DescribeSecurityGroupRules permission.

            Brent Barbachem added a comment - yunjiang-1 I actually just noticed the installconfig. I was looking for the credentials mode at the top, but it was at the bottom. Since you were in manual mode, can we verify that the service account(s) used during installation have the DescribeSecurityGroupRules permission? Your install would not fail early even if that permission were missing, because the check is skipped in manual mode.

            I don't think that is what is going on here, but let's explore that route.

            Thank you.
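
            The behavior described in this comment (no early failure in manual mode because the permission check is skipped) can be sketched as follows. This is only an illustration: the function name, the `REQUIRED_PERMISSIONS` set, and the way granted permissions are passed in are all assumptions made for the example, not the installer's actual code.

```python
from typing import Iterable, Set

# Hypothetical list of permissions a pre-flight check might look for.
REQUIRED_PERMISSIONS = {"ec2:DescribeSecurityGroupRules"}


def preflight_missing_permissions(
    credentials_mode: str,
    granted: Iterable[str],
    required: Iterable[str] = REQUIRED_PERMISSIONS,
) -> Set[str]:
    """Return the permissions a pre-flight check would report as missing.

    Assumption for this sketch: in manual credentials mode the check is
    skipped entirely, so nothing is ever reported and a missing permission
    would only surface later, when the API call itself is made.
    """
    if credentials_mode.lower() == "manual":
        return set()
    return set(required) - set(granted)
```

            With an empty `granted` list, the sketch reports the missing permission in any non-manual mode but returns an empty set for `"Manual"`, which is why the permission had to be confirmed by hand here.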

            Brent Barbachem added a comment - yunjiang-1 I pulled the logs but I didn't see what I was looking for. Do you happen to know if the run had a credentialsMode set or was the default used?

            Yunfei Jiang added a comment - padillon Yes, you are correct, the issue is caused by 7274.

            A pre-merge test against 7387 on 4.13 hit the same issue.

            See the 4.13 logs: https://drive.google.com/file/d/1D90IjaMPouLtStuEIT3bXqeNr5Lg2Eey/view?usp=drive_link

            Patrick Dillon added a comment - One possible cause of this bug is our bump of the AWS Terraform provider (which landed in #7274).

            There is a 4.13 backport open at https://github.com/openshift/installer/pull/7387. I have put a hold on that until we can rule out the bump as the cause.

            yunjiang-1 would it be possible to run a pre-submit test on #7387 in the govcloud region to help determine whether the provider bump is the cause of the issue?

            Patrick Dillon added a comment - Logs indicate the failure is happening during bootstrap destroy:

            time="2023-09-07T06:37:20Z" level=error msg="Error: reading Security Group (sg-056d8f7fdf50a5648) Rules: UnsupportedOperation: The functionality you requested is not available in this region."
            time="2023-09-07T06:37:20Z" level=error msg="\tstatus code: 400, request id: "
            time="2023-09-07T06:37:20Z" level=error
            time="2023-09-07T06:37:20Z" level=error msg="  with aws_security_group_rule.ssh,"
            time="2023-09-07T06:37:20Z" level=error msg="  on main.tf line 223, in resource \"aws_security_group_rule\" \"ssh\":"
            time="2023-09-07T06:37:20Z" level=error msg=" 223: resource \"aws_security_group_rule\" \"ssh\" {"
            time="2023-09-07T06:37:20Z" level=error
            time="2023-09-07T06:37:20Z" level=error
            time="2023-09-07T06:37:20Z" level=error msg="Error: reading Security Group (sg-056d8f7fdf50a5648) Rules: UnsupportedOperation: The functionality you requested is not available in this region."
            time="2023-09-07T06:37:20Z" level=error msg="\tstatus code: 400, request id: "
            time="2023-09-07T06:37:20Z" level=error
            time="2023-09-07T06:37:20Z" level=error msg="  with aws_security_group_rule.bootstrap_journald_gateway,"
            time="2023-09-07T06:37:20Z" level=error msg="  on main.tf line 234, in resource \"aws_security_group_rule\" \"bootstrap_journald_gateway\":"
            time="2023-09-07T06:37:20Z" level=error msg=" 234: resource \"aws_security_group_rule\" \"bootstrap_journald_gateway\" {"
            time="2023-09-07T06:37:20Z" level=error
            time="2023-09-07T06:37:20Z" level=fatal msg="terraform destroy: failed doing terraform destroy: exit status 1\n\nError: reading Security Group (sg-056d8f7fdf50a5648) Rules: UnsupportedOperation: The functionality you requested is not available in this region.\n\tstatus code: 400, request id: \n\n  with aws_security_group_rule.ssh,\n  on main.tf line 223, in resource \"aws_security_group_rule\" \"ssh\":\n 223: resource \"aws_security_group_rule\" \"ssh\" {\n\n\nError: reading Security Group (sg-056d8f7fdf50a5648) Rules: UnsupportedOperation: The functionality you requested is not available in this region.\n\tstatus code: 400, request id: \n\n  with aws_security_group_rule.bootstrap_journald_gateway,\n  on main.tf line 234, in resource \"aws_security_group_rule\" \"bootstrap_journald_gateway\":\n 234: resource \"aws_security_group_rule\" \"bootstrap_journald_gateway\" {\n\n"
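
            When triaging installer logs like the ones in this comment, it can help to pair each Terraform error with the resource it was raised for. The following is a minimal sketch that assumes the log lines follow the `time="..." level=... msg="..."` layout seen in this thread; the function name and regular expressions are illustrative and are not part of the installer.

```python
import re
from typing import List, Tuple

# Matches lines such as: time="..." level=error msg="..."
# The msg part is optional because some lines carry only time and level.
LINE_RE = re.compile(r'^time="([^"]+)" level=(\w+)(?: msg="(.*)")?\s*$')
# Terraform reports the failing resource as: "  with aws_security_group_rule.ssh,"
RESOURCE_RE = re.compile(r'with ([\w.]+),')


def failed_resources(log_text: str) -> List[Tuple[str, str]]:
    """Return (resource address, error text) pairs found in error-level lines."""
    results = []
    current_error = None
    for raw in log_text.splitlines():
        match = LINE_RE.match(raw.strip())
        if not match:
            continue
        _, level, msg = match.groups()
        if level != "error" or msg is None:
            continue
        if msg.startswith("Error: "):
            # Remember the error until the matching "with <resource>," line.
            current_error = msg[len("Error: "):]
            continue
        resource = RESOURCE_RE.search(msg)
        if resource and current_error is not None:
            results.append((resource.group(1), current_error))
            current_error = None
    return results
```

            Applied to the excerpt in this comment, the sketch would report `aws_security_group_rule.ssh` and `aws_security_group_rule.bootstrap_journald_gateway`, both with the same UnsupportedOperation error. The `level=fatal` summary line is skipped because it only repeats those errors.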

            Patrick Dillon added a comment - Thanks, yunjiang-1. Can you provide more complete logs so we can determine which code is invoking this unsupported operation? Also, can you provide an install config or clarify whether you are specifying additionalSecurityGroups in the install config?

              rh-ee-bbarbach Brent Barbachem
              yunjiang-1 Yunfei Jiang
              Yunfei Jiang Yunfei Jiang
              Mike Pytlak Mike Pytlak (Inactive)