OCPBUGS-18581

[Azure-File-CSI-Driver] Storage account created by the driver sometimes only allows worker subnets, which leads to mounts being denied from master nodes

Details

    • Bug
    • Resolution: Won't Do
    • Undefined
    • None
    • 4.14
    • Storage / Operators
    • None
    • Important
    • No
    • Rejected
    • False
    • None
    • * Creating pods with Azure File NFS volumes that are scheduled to the control plane node causes the mount to be denied. (link:https://issues.redhat.com/browse/OCPBUGS-18581[*OCPBUGS-18581*])
      +
      To work around this issue: if your control plane nodes are schedulable and the pods can run on worker nodes, use `nodeSelector` or affinity to schedule the pods on worker nodes (see the sketch below).
    • Known Issue
    • Done
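
      A minimal sketch of the workaround from the release note above, assuming the worker nodes can run the pod; the pod name, image, and PVC name are illustrative, not from this report:

        apiVersion: v1
        kind: Pod
        metadata:
          name: azurefile-nfs-app            # illustrative
        spec:
          # Pin the pod to worker nodes so the NFS mount originates from the
          # worker subnet that the storage account's network rules allow.
          nodeSelector:
            node-role.kubernetes.io/worker: ""
          containers:
          - name: app
            image: registry.access.redhat.com/ubi9/ubi-minimal
            command: ["sleep", "infinity"]
            volumeMounts:
            - name: data
              mountPath: /mnt/data
          volumes:
          - name: data
            persistentVolumeClaim:
              claimName: azurefile-nfs-pvc   # illustrative PVC name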

    Description

      Description of problem:

      In an Azure compact cluster (only 3 master nodes, but all of them also have the worker role), I created a StorageClass with skuname: Premium_LRS (I found this easier to reproduce with than other SKU types) and a PVC/pod. The CSI driver creates a storage account when provisioning the volume, and sometimes that storage account allows "all public network" access, as below:

            "networkAcls": {
                  "bypass": "AzureServices",
                  "virtualNetworkRules": [],
                  "ipRules": [],
                  "defaultAction": "Allow"
              },
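
      (For reference, a minimal sketch of the StorageClass used here; the object name is illustrative, while skuname: Premium_LRS and protocol: nfs come from this report, since the failing mounts shown below are NFS.)

        apiVersion: storage.k8s.io/v1
        kind: StorageClass
        metadata:
          name: azurefile-csi-premium-nfs   # illustrative name
        provisioner: file.csi.azure.com
        parameters:
          skuname: Premium_LRS   # from the report; easiest SKU to reproduce with
          protocol: nfs          # the denied mounts below use NFS
        reclaimPolicy: Delete
        volumeBindingMode: Immediate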
      

       

      But in some cases it only allows "selected virtual networks and IP addresses", with "*.worker-subnet" as the only allowed subnet, as below:

       

             "networkAcls": {
                  "bypass": "AzureServices",
                  "virtualNetworkRules": [
                      {
                          "id": "/subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/wduan-0906a-az-p95c4-rg/providers/Microsoft.Network/virtualNetworks/wduan-0906a-az-p95c4-vnet/subnets/wduan-0906a-az-p95c4-worker-subnet",
                          "action": "Allow",
                          "state": "Succeeded"
                      }
                  ],
                  "ipRules": [],
                  "defaultAction": "Deny"
              },
      

      But the scheduled node is actually a master node, which only sits on "*.master-subnet", so the azure-file mount fails with access denied from the master node, as below:

       

      Mounting arguments: -t nfs -o vers=4,minorversion=1,sec=sys f79137987692a4afea86fb6.file.core.windows.net:/f79137987692a4afea86fb6/pvcn-5dcfcd81-4b29-4876-b2eb-1a778657a35c /var/lib/kubelet/plugins/kubernetes.io/csi/file.csi.azure.com/091066f6c53b5709246f64097bd117917b9daedba792ff9a507b72e6f2cbb4b9/globalmount
        Output: mount.nfs: access denied by server while mounting f79137987692a4afea86fb6.file.core.windows.net:/f79137987692a4afea86fb6/pvcn-5dcfcd81-4b29-4876-b2eb-1a778657a35c
      

      I checked with the installer team: it makes sense for "*.worker-subnet" to exist even when there are no worker nodes yet, since it might be used for compute provisioning as a day-2 action. Still, this might impact several scenarios:

      1. compact/SNO clusters, as mentioned above
      2. regular clusters, when trying to schedule a pod with an Azure File PVC on a master node

       

      So I think we need to check how the Azure File CSI driver generates the network access rules when creating the storage account. I think "allow all" might be better, or at least both the ".master-subnet" and ".worker-subnet" subnets should be allowed.

      I'm not sure if this is the right code: https://github.com/openshift/azure-file-csi-driver/blob/master/vendor/sigs.k8s.io/cloud-provider-azure/pkg/provider/azure_storageaccount.go#L314
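
      As a possible mitigation to experiment with (an assumption based on the upstream azurefile-csi-driver parameter documentation, not verified against this OpenShift build): the NFS path appears to accept vnetResourceGroup/vnetName/subnetName StorageClass parameters that control which subnet lands in the virtual network rules, so pinning the master subnet explicitly might sidestep the nondeterminism:

        # Sketch only: assumes the upstream vnetResourceGroup/vnetName/subnetName
        # parameters (documented upstream for protocol: nfs) exist in this build.
        # The vnet and resource-group names are copied from the output above;
        # the master subnet name is inferred and may differ.
        apiVersion: storage.k8s.io/v1
        kind: StorageClass
        metadata:
          name: azurefile-csi-premium-nfs-pinned
        provisioner: file.csi.azure.com
        parameters:
          skuname: Premium_LRS
          protocol: nfs
          vnetResourceGroup: wduan-0906a-az-p95c4-rg
          vnetName: wduan-0906a-az-p95c4-vnet
          subnetName: wduan-0906a-az-p95c4-master-subnet   # inferred name
        reclaimPolicy: Delete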

       

      Again, it doesn't always happen, so in a regular cluster I think we might try the following (a sketch of the PVC/pod follows the list):

      1. create a PVC (with the sc skuname: Premium_LRS) and a pod (make it schedule to a master node only)

      2. check whether the pod is running, and check the storage account it uses in the portal

      3. remove the storage account and try again if the issue does not reproduce
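
      A sketch of step 1, with hypothetical object names; the nodeSelector/toleration force the pod onto a master node, and 100Gi is the minimum provisioned size for a premium file share:

        apiVersion: v1
        kind: PersistentVolumeClaim
        metadata:
          name: azurefile-nfs-pvc            # hypothetical
        spec:
          accessModes: ["ReadWriteMany"]
          storageClassName: azurefile-csi-premium-nfs   # sketch SC from above
          resources:
            requests:
              storage: 100Gi                 # premium file share minimum
        ---
        apiVersion: v1
        kind: Pod
        metadata:
          name: azurefile-nfs-tester         # hypothetical
        spec:
          # Force scheduling onto a master node to hit the denied subnet.
          nodeSelector:
            node-role.kubernetes.io/master: ""
          tolerations:
          - key: node-role.kubernetes.io/master
            operator: Exists
            effect: NoSchedule
          containers:
          - name: tester
            image: registry.access.redhat.com/ubi9/ubi-minimal
            command: ["sleep", "infinity"]
            volumeMounts:
            - name: data
              mountPath: /mnt/data
          volumes:
          - name: data
            persistentVolumeClaim:
              claimName: azurefile-nfs-pvc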

      See  https://issues.redhat.com/browse/OCPBUGS-18581?focusedId=22953323&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-22953323

       

      Version-Release number of selected component (if applicable):

      4.14.0-0.nightly-arm64-2023-09-05-140644 (found and checked on an arm64 build, but I assume it is the same on x86)

      Also reproduced in 4.14.0-0.nightly-2023-09-02-132842.

       

      How reproducible:

      Sometimes

       

      Steps to Reproduce:

      See Description

       

      Actual results:

      The mount fails and the pod is not running.

       

      Expected results:

      The mount succeeds and the pod is running.

      People

        fbertina@redhat.com Fabio Bertinatto
        wduan@redhat.com Wei Duan
        Wei Duan
        Lisa Pettyjohn