Resolution: Won't Do
Known Issue
Description of problem:
In an Azure compact cluster (only 3 master nodes, all of which also have the worker role), I created a StorageClass with skuname: Premium_LRS (I found this easier to reproduce with than other types) plus a PVC and pod. The CSI driver creates a storage account when provisioning the volume. Sometimes the storage account allows access from all public networks, as below:
"networkAcls": { "bypass": "AzureServices", "virtualNetworkRules": [], "ipRules": [], "defaultAction": "Allow" },
But in some cases it allows only "selected virtual networks and IP addresses", and "*.worker-subnet" is the only allowed subnet, as below:
"networkAcls": { "bypass": "AzureServices", "virtualNetworkRules": [ { "id": "/subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/wduan-0906a-az-p95c4-rg/providers/Microsoft.Network/virtualNetworks/wduan-0906a-az-p95c4-vnet/subnets/wduan-0906a-az-p95c4-worker-subnet", "action": "Allow", "state": "Succeeded" } ], "ipRules": [], "defaultAction": "Deny" },
However, the pod is actually scheduled to a master node, which is attached only to "*.master-subnet", so the azure-file mount fails with "access denied" from the master node, as below:
Mounting arguments: -t nfs -o vers=4,minorversion=1,sec=sys f79137987692a4afea86fb6.file.core.windows.net:/f79137987692a4afea86fb6/pvcn-5dcfcd81-4b29-4876-b2eb-1a778657a35c /var/lib/kubelet/plugins/kubernetes.io/csi/file.csi.azure.com/091066f6c53b5709246f64097bd117917b9daedba792ff9a507b72e6f2cbb4b9/globalmount Output: mount.nfs: access denied by server while mounting f79137987692a4afea86fb6.file.core.windows.net:/f79137987692a4afea86fb6/pvcn-5dcfcd81-4b29-4876-b2eb-1a778657a35c
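To make the failure condition concrete, here is a minimal sketch in Python of how the two "networkAcls" payloads above treat a node's subnet. It is a simplified model, not the Azure implementation: it only looks at "defaultAction" and "virtualNetworkRules" and ignores "ipRules", "bypass" and private endpoints. The function name subnet_allowed is made up for illustration, and the full master subnet ID is an assumption derived from the "*.master-subnet" pattern.

```python
def subnet_allowed(network_acls: dict, subnet_id: str) -> bool:
    """Simplified check: would traffic from subnet_id pass these network ACLs?

    Assumption: only defaultAction and virtualNetworkRules are modelled;
    ipRules, bypass and private endpoints are ignored.
    """
    if network_acls.get("defaultAction", "Allow").lower() == "allow":
        return True
    wanted = subnet_id.lower()  # Azure resource IDs are case-insensitive
    return any(
        rule.get("action", "Allow").lower() == "allow"
        and rule.get("id", "").lower() == wanted
        for rule in network_acls.get("virtualNetworkRules", [])
    )


VNET = (
    "/subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a"
    "/resourceGroups/wduan-0906a-az-p95c4-rg/providers/Microsoft.Network"
    "/virtualNetworks/wduan-0906a-az-p95c4-vnet"
)
WORKER_SUBNET = VNET + "/subnets/wduan-0906a-az-p95c4-worker-subnet"
# Assumed name: the report only gives the "*.master-subnet" pattern.
MASTER_SUBNET = VNET + "/subnets/wduan-0906a-az-p95c4-master-subnet"

# The "all public networks" case from the report.
open_acls = {
    "bypass": "AzureServices",
    "virtualNetworkRules": [],
    "ipRules": [],
    "defaultAction": "Allow",
}

# The "selected virtual networks" case from the report.
restricted_acls = {
    "bypass": "AzureServices",
    "virtualNetworkRules": [
        {"id": WORKER_SUBNET, "action": "Allow", "state": "Succeeded"}
    ],
    "ipRules": [],
    "defaultAction": "Deny",
}

print(subnet_allowed(open_acls, MASTER_SUBNET))        # True
print(subnet_allowed(restricted_acls, WORKER_SUBNET))  # True
print(subnet_allowed(restricted_acls, MASTER_SUBNET))  # False: mount is denied
```

The last case corresponds to the failing mount: the node sits in a subnet that has no matching rule while the default action is "Deny".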
I checked with the installer team: it makes sense to have "*.worker-subnet" even when there are no worker nodes yet, since it might be used for compute provisioning as a day-2 action. This issue might also impact several scenarios:
- compact/SNO clusters, as mentioned above
- regular clusters, when a pod with an Azure-file PVC is scheduled on a master node
So I think we need to check how the Azure-File CSI Driver generates the network access rules when creating the storage account. I think "allow all" might be better, or at least all "*.master-subnet"/"*.worker-subnet" subnets should be allowed.
I'm not sure if this is the right code: https://github.com/openshift/azure-file-csi-driver/blob/master/vendor/sigs.k8s.io/cloud-provider-azure/pkg/provider/azure_storageaccount.go#L314
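As a rough illustration of the two options suggested above ("allow all", or allow every cluster subnet), the following Python sketch builds a "networkAcls" payload in the same shape as the ones shown earlier. This is not the driver's code (the driver is written in Go, and I have not confirmed how it builds these rules); build_network_acls and its parameters are hypothetical names.

```python
def build_network_acls(subnet_ids, allow_all=False):
    """Build a networkAcls dict shaped like the payloads in this report.

    Hypothetical helper: with allow_all=True the account is open to all
    networks; otherwise every given subnet gets an explicit Allow rule.
    """
    if allow_all:
        rules, default_action = [], "Allow"
    else:
        # dict.fromkeys drops duplicate IDs while keeping their order.
        rules = [{"id": sid, "action": "Allow"} for sid in dict.fromkeys(subnet_ids)]
        default_action = "Deny"
    return {
        "bypass": "AzureServices",
        "virtualNetworkRules": rules,
        "ipRules": [],
        "defaultAction": default_action,
    }


# Placeholder subnet IDs; real ones are full Azure resource IDs.
acls = build_network_acls(["<master-subnet-id>", "<worker-subnet-id>"])
print([rule["id"] for rule in acls["virtualNetworkRules"]])
print(acls["defaultAction"])
```

With both subnets listed, a pod scheduled to either a master or a worker node would match a rule, which is the behaviour this report asks for.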
Again, it does not always happen, so in a regular cluster I think we can try the following:
1. Create a PVC (with sc skuname: Premium_LRS) and a pod (make it schedule to master nodes only).
2. Check whether the pod is running and check the storage account used in the portal.
3. If it does not reproduce, remove the storage account and try again.
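The objects for step 1 could look roughly like the sketch below, written in Python so it can emit JSON for "oc apply". Several details are assumptions rather than the exact objects used in this report: the object name, the 100Gi size, the container image, and "protocol: nfs" (inferred from the NFS mount arguments in the log above). The master label and taint key are the usual "node-role.kubernetes.io/master".

```python
import json


def repro_manifests(name="azurefile-repro"):
    """Return a v1 List with a StorageClass, PVC and master-only pod.

    All names and sizes here are illustrative assumptions.
    """
    storage_class = {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": "file.csi.azure.com",
        # protocol is assumed from the NFS mount seen in the failure log.
        "parameters": {"skuName": "Premium_LRS", "protocol": "nfs"},
        "reclaimPolicy": "Delete",
        "volumeBindingMode": "Immediate",
    }
    pvc = {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadWriteMany"],
            "storageClassName": name,
            "resources": {"requests": {"storage": "100Gi"}},
        },
    }
    pod = {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # Pin the pod to master nodes and tolerate their taint.
            "nodeSelector": {"node-role.kubernetes.io/master": ""},
            "tolerations": [
                {
                    "key": "node-role.kubernetes.io/master",
                    "operator": "Exists",
                    "effect": "NoSchedule",
                }
            ],
            "containers": [
                {
                    "name": "app",
                    "image": "registry.access.redhat.com/ubi9/ubi-minimal",
                    "command": ["sleep", "infinity"],
                    "volumeMounts": [{"name": "data", "mountPath": "/mnt/data"}],
                }
            ],
            "volumes": [
                {"name": "data", "persistentVolumeClaim": {"claimName": name}}
            ],
        },
    }
    return {"apiVersion": "v1", "kind": "List", "items": [storage_class, pvc, pod]}


if __name__ == "__main__":
    print(json.dumps(repro_manifests(), indent=2))
```

Saved as, say, repro.py, the output can be piped to "oc apply -f -" in a test namespace. If the pod reaches Running, delete the objects and the storage account and retry, as in step 3.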
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-arm64-2023-09-05-140644 (I found and checked this in an arm64 build, but I guess it is the same on the x86 platform). Also reproduced in 4.14.0-0.nightly-2023-09-02-132842.
How reproducible:
Steps to Reproduce:
See Description
Actual results:
The mount fails and the pod is not running.
Expected results:
The mount succeeds and the pod is running.