OpenShift Bugs / OCPBUGS-65959

all 4.21 machine-config-daemon targets are down


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Normal
    • 4.21.0
    • Moderate
    • Yes

      Description of problem:

      On a 4.21.0-0.nightly-2025-11-22-193140 cluster, all machine-config-daemon targets are down.

      $ token=`oc create token prometheus-k8s -n openshift-monitoring`
      $ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://thanos-querier.openshift-monitoring.svc:9091/api/v1/query?' --data-urlencode 'query=ALERTS{alertname="TargetDown",namespace="openshift-machine-config-operator"}' | jq
      {
        "status": "success",
        "data": {
          "resultType": "vector",
          "result": [
            {
              "metric": {
                "__name__": "ALERTS",
                "alertname": "TargetDown",
                "alertstate": "firing",
                "job": "machine-config-daemon",
                "namespace": "openshift-machine-config-operator",
                "prometheus": "openshift-monitoring/k8s",
                "service": "machine-config-daemon",
                "severity": "warning"
              },
              "value": [
                1764061506.224,
                "1"
              ]
            }
          ],
          "analysis": {}
        }
      }
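The TargetDown alert above only reports the job as a whole. A minimal sketch for drilling down to individual targets, wrapping the same token + curl pattern used in the command above in a reusable function (the name thanos_query is hypothetical, and it assumes `oc` is logged in with rights to create tokens and exec into prometheus-k8s-0):

```shell
#!/usr/bin/env bash
# Hypothetical helper: run an instant PromQL query against thanos-querier
# from inside prometheus-k8s-0, reusing the token/curl pattern from the report.
# Assumes a logged-in `oc` with access to openshift-monitoring.
thanos_query() {
  local query="$1" token
  # Short-lived token for the prometheus-k8s service account.
  token="$(oc create token prometheus-k8s -n openshift-monitoring)" || return 1
  # ${token} and ${query} expand locally; curl itself runs inside the pod.
  oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- \
    curl -sk -H "Authorization: Bearer ${token}" \
      --data-urlencode "query=${query}" \
      'https://thanos-querier.openshift-monitoring.svc:9091/api/v1/query'
}

# Example (requires a live cluster): list each MCD target whose last scrape
# failed (up == 0), rather than only the aggregated TargetDown alert.
#   thanos_query 'up{namespace="openshift-machine-config-operator",job="machine-config-daemon"} == 0' | jq .
```

Passing the query through `--data-urlencode` keeps PromQL braces and quotes intact without manual escaping; curl then sends it as a form-encoded POST, which the Prometheus query API accepts.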
      

      As an admin user, in the web console go to "Observe -> Targets" and search for openshift-machine-config-operator targets: all machine-config-daemon targets are down. Clicking one of the down targets shows this error:

      Get "https://10.0.50.248:9001/metrics": tls: failed to verify certificate: x509: certificate is valid for kube-rbac-proxy.openshift-machine-config-operator.svc, kube-rbac-proxy.openshift-machine-config-operator.svc.cluster.local, not machine-config-daemon.openshift-machine-config-operator.svc

       
      See screenshot: https://drive.google.com/file/d/1UVE9a-oX3pnQKkoMnFoLCKZSdWcfRZGJ/view?usp=drive_link

      Checking the ServiceMonitor shows that tlsConfig.serverName is machine-config-daemon.openshift-machine-config-operator.svc:

      $ oc -n openshift-machine-config-operator get servicemonitor machine-config-daemon -oyaml
      ...
      spec:
        endpoints:
        - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
          interval: 30s
          path: /metrics
          port: metrics
          relabelings:
          - action: replace
            regex: ;(.*)
            replacement: $1
            separator: ;
            sourceLabels:
            - node
            - __meta_kubernetes_pod_node_name
            targetLabel: node
          scheme: https
          tlsConfig:
            caFile: /etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt
            serverName: machine-config-daemon.openshift-machine-config-operator.svc
      ...
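The same TLS check that Prometheus performs can be repeated by hand. A minimal sketch, assuming curl is available in the prometheus container (as the earlier query relies on): it trusts the ServiceMonitor's caFile, presents its serverName, and uses `--resolve` to pin that name to the target IP. The function name check_scrape_tls is hypothetical. No bearer token is sent, since the certificate check happens before any HTTP authentication.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reproduce Prometheus' TLS verification for one MCD
# target using the serverName and caFile from the ServiceMonitor above.
check_scrape_tls() {
  local target_ip="$1" port="${2:-9001}"
  local server_name='machine-config-daemon.openshift-machine-config-operator.svc'
  local ca='/etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt'
  # --resolve makes curl connect to target_ip while verifying the cert
  # against server_name, like tlsConfig.serverName does for Prometheus.
  oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- \
    curl -sS -o /dev/null \
      --cacert "${ca}" \
      --resolve "${server_name}:${port}:${target_ip}" \
      "https://${server_name}:${port}/metrics"
}

# Example (requires a live cluster), using the target IP from the error above:
#   check_scrape_tls 10.0.50.248
# With the certificate in this report, curl should fail certificate
# verification (exit code 60). If TLS passes, curl exits 0 because -f is not
# used, even though the unauthenticated request is likely rejected (401/403).
```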

      But the certificate served on 10.0.50.248:9001 is only valid for DNS:kube-rbac-proxy.openshift-machine-config-operator.svc and DNS:kube-rbac-proxy.openshift-machine-config-operator.svc.cluster.local, not for machine-config-daemon.openshift-machine-config-operator.svc, which matches the error on the Targets page:

      $ oc -n openshift-machine-config-operator get pod -o wide | grep machine-config-daemon | grep 10.0.50.248
      machine-config-daemon-ktlft                                      2/2     Running   0            8h    10.0.50.248   ip-10-0-50-248.us-east-2.compute.internal   <none>           <none>
      
      
      $ oc -n openshift-machine-config-operator exec -c machine-config-daemon machine-config-daemon-ktlft -- openssl s_client -connect 10.0.50.248:9001 -servername 10.0.50.248  | openssl x509 -noout -text
      depth=1 CN = openshift-service-serving-signer@1764031164
      verify error:num=19:self-signed certificate in certificate chain
      verify return:1
      depth=1 CN = openshift-service-serving-signer@1764031164
      verify return:1
      depth=0 CN = kube-rbac-proxy.openshift-machine-config-operator.svc
      verify return:1
      DONE
      Certificate:
          Data:
              Version: 3 (0x2)
              Serial Number: 2439036565063936562 (0x21d932d556d32a32)
              Signature Algorithm: sha256WithRSAEncryption
              Issuer: CN=openshift-service-serving-signer@1764031164
              Validity
                  Not Before: Nov 25 00:39:42 2025 GMT
                  Not After : Nov 25 00:39:43 2027 GMT
              Subject: CN=kube-rbac-proxy.openshift-machine-config-operator.svc
              Subject Public Key Info:
                  Public Key Algorithm: rsaEncryption
                      Public-Key: (2048 bit)
              ...
                      Exponent: 65537 (0x10001)
              X509v3 extensions:
                  X509v3 Key Usage: critical
                      Digital Signature, Key Encipherment
                  X509v3 Extended Key Usage: 
                      TLS Web Server Authentication
                  X509v3 Basic Constraints: critical
                      CA:FALSE
                  X509v3 Subject Key Identifier: 
                      C1:B5:E1:AF:F3:2F:43:6F:75:4F:3C:48:B0:44:55:FF:9C:64:B0:0C
                  X509v3 Authority Key Identifier: 
                      3B:D0:50:C1:5C:76:C1:FE:F5:1C:F5:53:E3:14:2F:65:68:B7:44:B2
                  X509v3 Subject Alternative Name: 
                      DNS:kube-rbac-proxy.openshift-machine-config-operator.svc, DNS:kube-rbac-proxy.openshift-machine-config-operator.svc.cluster.local
                  1.3.6.1.4.1.2312.17.100.2.1: 
                      .$597ff7b2-a9cb-4af2-ad34-2b7da12df615
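The SAN comparison above can be scripted. A minimal sketch, assuming the `openssl x509 -noout -text` output has been saved to a file (the file name mcd-cert.txt and the helper names extract_dns_sans and san_covers are hypothetical). It checks exact, case-insensitive membership only; wildcard SANs are not handled.

```shell
#!/usr/bin/env bash
# Print one DNS SAN per line from an `openssl x509 -noout -text` dump on stdin.
extract_dns_sans() {
  # The SAN values sit on the line after the "Subject Alternative Name:" header.
  awk '/X509v3 Subject Alternative Name:/ { getline; print; exit }' \
    | tr ',' '\n' \
    | sed -n 's/^[[:space:]]*DNS:\(.*[^[:space:]]\)[[:space:]]*$/\1/p'
}

# Return 0 if hostname $1 exactly matches (ignoring case) one of the DNS SANs
# in the dump on stdin, 1 otherwise. Wildcard SANs are not expanded.
san_covers() {
  local want="$1"
  extract_dns_sans | grep -qixF -- "$want"
}

# Example (requires a live cluster), mirroring the command above:
#   oc -n openshift-machine-config-operator exec -c machine-config-daemon machine-config-daemon-ktlft -- \
#     openssl s_client -connect 10.0.50.248:9001 -servername 10.0.50.248 2>/dev/null \
#     | openssl x509 -noout -text > mcd-cert.txt
#   extract_dns_sans < mcd-cert.txt
#   san_covers machine-config-daemon.openshift-machine-config-operator.svc < mcd-cert.txt \
#     || echo "serverName not covered by the certificate SANs"
```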

      This is a regression in 4.21; 4.20 is not affected.

      Version-Release number of selected component (if applicable):

      4.21

      How reproducible:

      Always

      Steps to Reproduce:

      1. Check the status of the machine-config-daemon targets (Observe -> Targets in the web console).

      Actual results:

      All machine-config-daemon targets are down on 4.21.

      Expected results:

      All machine-config-daemon targets should be up.

      Additional info:

          

              team-mco Team MCO
              juzhao@redhat.com Junqi Zhao
              Sergio Regidor de la Rosa