  1. OpenShift Bugs
  2. OCPBUGS-57456

TNF Podman-etcd should leave podman container for history logs


    • Type: Bug
    • Resolution: Done
    • Priority: Major
    • None
    • 4.19, 4.20
    • Two Node Fencing
    • Quality / Stability / Reliability
    • False
    • Rejected

      Description of problem:

      When intentionally crashing my nodes to simulate an etcd failure, the data directory ended up corrupted mid-write. In a user scenario, the right way to handle this would be to restore etcd from a backup. But when I went to podman to gather data about why etcd wasn't running, I discovered that the podman container was missing.
      

      I pulled this from pacemaker:

      Jun 13 18:10:45  podman-etcd(etcd)[151667]:    ERROR: Newly created podman container exited after start
      Jun 13 18:10:45  podman-etcd(etcd)[151667]:    INFO:
        {"level":"info","ts":"2025-06-13T18:10:43.488492Z","caller":"flags/flag.go:113","msg":"recognized and used environment variable","variable-name":"ETCD_INITIAL_CLUSTER_STATE","variable-value":"new"}
        {"level":"info","ts":"2025-06-13T18:10:43.488503Z","caller":"flags/flag.go:113","msg":"recognized and used environment variable","variable-name":"ETCD_NAME","variable-value":"master-0"}
        {"level":"info","ts":"2025-06-13T18:10:43.488515Z","caller":"flags/flag.go:113","msg":"recognized and used environment variable","variable-name":"ETCD_QUOTA_BACKEND_BYTES","variable-value":"8589934592"}
        {"level":"info","ts":"2025-06-13T18:10:43.488521Z","caller":"flags/flag.go:113","msg":"recognized and used environment variable","variable-name":"ETCD_SOCKET_REUSE_ADDRESS","variable-value":"true"}
        {"level":"warn","ts":"2025-06-13T18:10:43.488575Z","caller":"embed/config.go:694","msg":"Running http and grpc server on single port. This is not recommended for production."}
        {"level":"info","ts":"2025-06-13T18:10:43.488597Z","caller":"etcdmain/etcd.go:73","msg":"Running: ","args":["/usr/bin/etcd","--logger=zap","--log-level=info","--experimental-initial-corrupt-check=true","--snapshot-count=10000","--initial-advertise-peer-urls=https://192.168.111.20:2380","--cert-file=/etc/kubernetes/static-pod-certs/secrets/etcd-all-certs/etcd-serving-master-0.crt","--key-file=/etc/kubernetes/static-pod-certs/secrets/etcd-all-certs/etcd-serving-master-0.key","--trusted-ca-file=/etc/kubernetes/static-pod-certs/configmaps/etcd-all-bundles/server-ca-bundle.crt","--client-cert-auth=true","--peer-cert-file=/etc/kubernetes/static-pod-certs/secrets/etcd-all-certs/etcd-peer-master-0.crt","--peer-key-file=/etc/kubernetes/static-pod-certs/secrets/etcd-all-certs/etcd-peer-master-0.key","--peer-trusted-ca-file=/etc/kubernetes/static-pod-certs/configmaps/etcd-all-bundles/server-ca-bundle.crt","--peer-client-cert-auth=true","--advertise-client-urls=https://192.168.111.20:2379","--listen-client-urls=https://0.0.0.0:2379,unixs://192.168.111.20:0","--listen-peer-urls=https://0.0.0.0:2380","--metrics=extensive","--listen-metrics-urls=https://0.0.0.0:9978","--force-new-cluster"]}
        {"level":"warn","ts":"2025-06-13T18:10:43.488669Z","caller":"etcdmain/etcd.go:446","msg":"found invalid file under data directory","filename":"revision.json","data-dir":"/var/lib/etcd"}
        {"level":"info","ts":"2025-06-13T18:10:43.488683Z","caller":"etcdmain/etcd.go:116","msg":"server has been already initialized","data-dir":"/var/lib/etcd","dir-type":"member"}
        {"level":"warn","ts":"2025-06-13T18:10:43.488698Z","caller":"embed/config.go:694","msg":"Running http and grpc server on single port. This is not recommended for production."}
        {"level":"info","ts":"2025-06-13T18:10:43.488704Z","caller":"embed/etcd.go:134","msg":"configuring socket options","reuse-address":true,"reuse-port":false}
        {"level":"info","ts":"2025-06-13T18:10:43.488719Z","caller":"embed/etcd.go:140","msg":"configuring peer listeners","listen-peer-urls":["https://0.0.0.0:2380"]}
        {"level":"info","ts":"2025-06-13T18:10:43.488748Z","caller":"embed/etcd.go:531","msg":"starting with peer TLS","tls-info":"cert = /etc/kubernetes/static-pod-certs/secrets/etcd-all-certs/etcd-peer-master-0.crt, key = /etc/kubernetes/static-pod-certs/secrets/etcd-all-certs/etcd-peer-master-0.key, client-cert=, client-key=, trusted-ca = /etc/kubernetes/static-pod-certs/configmaps/etcd-all-bundles/server-ca-bundle.crt, client-cert-auth = true, crl-file = ","cipher-suites":["TLS_AES_128_GCM_SHA256","TLS_AES_256_GCM_SHA384","TLS_CHACHA20_POLY1305_SHA256","TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256","TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256","TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384","TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384","TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256","TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256"]}
        {"level":"info","ts":"2025-06-13T18:10:43.489422Z","caller":"embed/etcd.go:148","msg":"configuring client listeners","listen-client-urls":["https://0.0.0.0:2379","unixs://192.168.111.20:0"]}
        {"level":"info","ts":"2025-06-13T18:10:43.489470Z","caller":"embed/etcd.go:657","msg":"pprof is enabled","path":"/debug/pprof"}
        {"level":"info","ts":"2025-06-13T18:10:43.489693Z","caller":"embed/etcd.go:325","msg":"starting an etcd server","etcd-version":"3.5.21","git-sha":"c4ca73a4","go-version":"go1.24.3 (Red Hat 1.24.3-3.el9) X:strictfipsruntime","go-os":"linux","go-arch":"amd64","max-cpu-set":8,"max-cpu-available":8,"member-initialized":true,"name":"master-0","data-dir":"/var/lib/etcd","wal-dir":"","wal-dir-dedicated":"","member-dir":"/var/lib/etcd/member","force-new-cluster":true,"heartbeat-interval":"100ms","election-timeout":"1s","initial-election-tick-advance":true,"snapshot-count":10000,"max-wals":5,"max-snapshots":5,"snapshot-catchup-entries":5000,"initial-advertise-peer-urls":["https://192.168.111.20:2380"],"listen-peer-urls":["https://0.0.0.0:2380"],"advertise-client-urls":["https://192.168.111.20:2379"],"listen-client-urls":["https://0.0.0.0:2379","unixs://192.168.111.20:0"],"listen-metrics-urls":["https://0.0.0.0:9978"],"cors":["*"],"host-whitelist":["*"],"initial-cluster":"","initial-cluster-state":"new","initial-cluster-token":"","quota-backend-bytes":8589934592,"max-request-bytes":1572864,"max-concurrent-streams":4294967295,"pre-vote":true,"initial-corrupt-check":true,"corrupt-check-time-interval":"0s","compact-check-time-enabled":false,"compact-check-time-interval":"1m0s","auto-compaction-mode":"periodic","auto-compaction-retention":"0s","auto-compaction-interval":"0s","discovery-url":"","discovery-proxy":"","downgrade-check-interval":"5s","max-learners":2}
        {"level":"warn","ts":"2025-06-13T18:10:43.489753Z","caller":"fileutil/fileutil.go:53","msg":"check file permission","error":"directory \"/var/lib/etcd\" exist, but the permission is \"drwxr-xr-x\". The recommended permission is \"-rwx------\" to prevent possible unprivileged access to the data"}
        {"level":"info","ts":"2025-06-13T18:10:43.510853Z","caller":"etcdserver/backend.go:81","msg":"opened backend db","path":"/var/lib/etcd/member/snap/db","took":"20.753345ms"}
        {"level":"info","ts":"2025-06-13T18:10:43.816037Z","caller":"etcdserver/server.go:516","msg":"recovered v2 store from snapshot","snapshot-index":30096,"snapshot-size":"11 kB"}
        {"level":"info","ts":"2025-06-13T18:10:43.816108Z","caller":"etcdserver/server.go:529","msg":"recovered v3 backend from snapshot","backend-size-bytes":51957760,"backend-size":"52 MB","backend-size-in-use-bytes":51929088,"backend-size-in-use":"52 MB"}
        {"level":"fatal","ts":"2025-06-13T18:10:44.020098Z","caller":"etcdserver/storage.go:105","msg":"failed to read WAL, cannot be repaired","error":"wal: conflicting metadata found","stacktrace":"go.etcd.io/etcd/server/v3/etcdserver.readWAL\n\tgo.etcd.io/etcd/server/v3/etcdserver/storage.go:105\ngo.etcd.io/etcd/server/v3/etcdserver.restartAsStandaloneNode\n\tgo.etcd.io/etcd/server/v3/etcdserver/raft.go:582\ngo.etcd.io/etcd/server/v3/etcdserver.NewServer\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:543\ngo.etcd.io/etcd/server/v3/embed.StartEtcd\n\tgo.etcd.io/etcd/server/v3/embed/etcd.go:262\ngo.etcd.io/etcd/server/v3/etcdmain.startEtcd\n\tgo.etcd.io/etcd/server/v3/etcdmain/etcd.go:228\ngo.etcd.io/etcd/server/v3/etcdmain.startEtcdOrProxyV2\n\tgo.etcd.io/etcd/server/v3/etcdmain/etcd.go:123\ngo.etcd.io/etcd/server/v3/etcdmain.Main\n\tgo.etcd.io/etcd/server/v3/etcdmain/main.go:40\nmain.main\n\tgo.etcd.io/etcd/server/v3/main.go:31\nruntime.main\n\truntime/proc.go:283"}
      Jun 13 18:10:45.404 master-0 pacemaker-execd     [19426] (log_op_output) 	info: etcd_start_0[151667] error output [ ocf-exit-reason:Newly created podman container exited after start ]
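
Because the pacemaker log is the only place the container output survives, it may help to pull the warn and fatal entries out of the concatenated JSON. Below is a minimal sketch, not part of the resource agent: the helper names `split_json_objects` and `entries_at_level` are hypothetical, and `sample` is an abbreviated copy of two entries from the log above (the `caller` and other fields are omitted for brevity).

```python
import json


def split_json_objects(text):
    """Yield each JSON value that starts at a '{' in a string of concatenated objects."""
    # strict=False tolerates raw line breaks inside strings, which can
    # appear when a log line has been wrapped by a pager or a web page.
    decoder = json.JSONDecoder(strict=False)
    pos = 0
    while True:
        start = text.find("{", pos)
        if start == -1:
            return
        try:
            obj, end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            # Not a complete object at this brace; try the next one.
            pos = start + 1
            continue
        yield obj
        pos = end


def entries_at_level(text, levels=("warn", "error", "fatal")):
    """Return the decoded log entries whose "level" is one of `levels`."""
    return [
        entry
        for entry in split_json_objects(text)
        if isinstance(entry, dict) and entry.get("level") in levels
    ]


# Abbreviated sample taken from the pacemaker output above.
sample = (
    'INFO: {"level":"info","ts":"2025-06-13T18:10:43.510853Z",'
    '"msg":"opened backend db"} '
    '{"level":"fatal","ts":"2025-06-13T18:10:44.020098Z",'
    '"msg":"failed to read WAL, cannot be repaired",'
    '"error":"wal: conflicting metadata found"}'
)

for entry in entries_at_level(sample, levels=("fatal",)):
    print(entry["ts"], entry["msg"], entry.get("error", ""))
```

Run against the full log line, this would be expected to surface the `wal: conflicting metadata found` entry without reading through the startup noise by hand.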
      
      Version-Release number of selected component (if applicable):
      
          

      How reproducible:

      Intentionally corrupt the data directory and use pcs to refresh the etcd resource.
          

      Steps to Reproduce:

          1.
          2.
          3.
          

      Actual results:

      `$ podman ps -a` returns no containers.
          

      Expected results:

      Podman keeps the failed container, so `podman ps -a` lists it and its logs can be inspected.
          

      Additional info:

      
          

              rh-ee-clobrano Carlo Lobrano
              jpoulin Jeremy Poulin
              None
              None
              Douglas Hensel Douglas Hensel