Bug
Resolution: Unresolved
rhel-7.9.z
Low
rhel-se-cs-infra-services
x86_64
What were you trying to do that didn't work?
The command "host <some hostname>" frequently hangs and does not return. The configuration of /etc/resolv.conf is as follows:
options timeout:1 attempts:1 rotate
nameserver DNSSERVER1
nameserver DNSSERVER2
nameserver DNSSERVER3
nameserver DNSSERVER4
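To put the hang in perspective, the following minimal Python sketch parses a resolv.conf-style configuration and computes a naive upper bound on how long a lookup should take. The bound assumes every nameserver is tried once per attempt and each try waits at most `timeout` seconds; that is an illustrative assumption, not necessarily how host or glibc schedules retries. The function names are hypothetical, and the fallback defaults (5 seconds, 2 attempts) are the ones documented in resolv.conf(5).

```python
def parse_resolv_conf(text):
    """Return (options, nameservers) parsed from resolv.conf-style text."""
    options, nameservers = {}, []
    for raw in text.splitlines():
        # Drop comments introduced by '#' or ';' and surrounding whitespace.
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        keyword, args = fields[0], fields[1:]
        if keyword == "nameserver" and args:
            nameservers.append(args[0])
        elif keyword == "options":
            for opt in args:
                # "timeout:1" becomes {"timeout": 1}; "rotate" becomes {"rotate": True}.
                name, _, value = opt.partition(":")
                options[name] = int(value) if value.isdigit() else True
    return options, nameservers


def naive_upper_bound_seconds(options, nameservers):
    """Assumed bound: timeout * attempts * number of servers (illustrative only)."""
    timeout = options.get("timeout", 5)    # default per resolv.conf(5)
    attempts = options.get("attempts", 2)  # default per resolv.conf(5)
    return timeout * attempts * len(nameservers)


# Configuration from this report, with the placeholder server names kept as-is.
CONF = """\
options timeout:1 attempts:1 rotate
nameserver DNSSERVER1
nameserver DNSSERVER2
nameserver DNSSERVER3
nameserver DNSSERVER4
"""

options, servers = parse_resolv_conf(CONF)
bound = naive_upper_bound_seconds(options, servers)
print(options, servers, bound)
```

Under these assumptions the bound for this configuration is a few seconds, so a lookup that is still pending minutes later is not explained by the configured timeouts.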
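A coredump collected while the command is hung shows that one thread of the host command (thread 4, the socket watcher) is blocked in epoll_wait(), while the other threads are waiting in pthread_cond_wait() or do_sigsuspend().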
Core was generated by `host'.
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
185 62: movl (%rsp), %edi
(gdb) bt
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x00007f8cacb2635b in dispatch (manager=0x7f8cad9bc010, qid=qid@entry=isc_taskqueue_normal) at ../../../lib/isc/task.c:1095
#2 0x00007f8cacb2710b in run_normal (uap=<optimized out>) at ../../../lib/isc/task.c:1325
#3 0x00007f8cab6cbea5 in start_thread (arg=0x7f8ca975d700) at pthread_create.c:307
#4 0x00007f8caa73eb0d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
(gdb) info threads
  Id Target Id Frame
  5 Thread 0x7f8cad9f8840 (LWP 8957) 0x00007f8caa676702 in do_sigsuspend (set=0x7ffc698aaf20) at ../sysdeps/unix/sysv/linux/sigsuspend.c:31
  4 Thread 0x7f8ca7f5a700 (LWP 8963) 0x00007f8caa73f0e3 in epoll_wait () at ../sysdeps/unix/syscall-template.S:81
  3 Thread 0x7f8ca875b700 (LWP 8962) pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  2 Thread 0x7f8ca8f5c700 (LWP 8961) pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
* 1 Thread 0x7f8ca975d700 (LWP 8960) pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
(gdb) t 4
[Switching to thread 4 (Thread 0x7f8ca7f5a700 (LWP 8963))]
#0 0x00007f8caa73f0e3 in epoll_wait () at ../sysdeps/unix/syscall-template.S:81
81 T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
(gdb) bt
#0 0x00007f8caa73f0e3 in epoll_wait () at ../sysdeps/unix/syscall-template.S:81
#1 0x00007f8cacb3b616 in watcher (uap=0x7f8cad9c0010) at ../../../../lib/isc/unix/socket.c:4258
#2 0x00007f8cab6cbea5 in start_thread (arg=0x7f8ca7f5a700) at pthread_create.c:307
#3 0x00007f8caa73eb0d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
We can also collect strace output, which shows the following symptoms. The query is made to 127.0.0.1 and DNSSERVER1 in a row, then 1 second later to DNSSERVER2 and DNSSERVER3:
5427 09:16:17.589619 epoll_wait(5<anon_inode:[eventpoll]>, <unfinished ...>
5424 09:16:17.598288 sendmsg(6<UDP:[127.0.0.1:35502]>, {msg_name={sa_family=AF_INET, sin_port=htons(35502), sin_addr=inet_addr("127.0.0.1")}, msg_namelen=16, msg_iov=[{iov_base="\0", iov_len=1}], msg_iovlen=1, msg_control=[{cmsg_len=17, cmsg_level=SOL_IP, cmsg_type=IP_TOS, cmsg_data=[0xb8]}], msg_controllen=24, msg_flags=0}, 0) = -1 EINVAL (Invalid argument) <0.000015>
5424 09:16:17.605705 recvmsg(20<UDP:[0.0.0.0:42685]>, {msg_namelen=128}, 0) = -1 EAGAIN (Resource temporarily unavailable) <0.000014>
5424 09:16:17.606074 sendmsg(20<UDP:[0.0.0.0:42685]>, {msg_name={sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("<IP1>")}, msg_namelen=16, msg_iov=[{iov_base="\20{\1\0\0\1\0\0\0\0\0\0\<hostname>\0\0\1\0\1", iov_len=28}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0 <unfinished ...>
5427 09:16:17.606342 epoll_wait(5<anon_inode:[eventpoll]>, [{EPOLLIN, {u32=20, u64=20}}], 2048, -1) = 1 <0.000790>
5427 09:16:17.607431 epoll_wait(5<anon_inode:[eventpoll]>, <unfinished ...>
5424 09:16:17.607458 recvmsg(20<UDP:[0.0.0.0:42685]>, {msg_name={sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("<IP1>")}, msg_namelen=128->16, msg_iov=[{iov_base="\20{\201\202\0\1\0\0\0\0\0\0\<hostname>\0\0\1\0\1", iov_len=65535}], msg_iovlen=1, msg_control=[{cmsg_len=32, cmsg_level=SOL_SOCKET, cmsg_type=SCM_TIMESTAMP, cmsg_data={tv_sec=1750986977, tv_usec=607082}}, {cmsg_len=17, cmsg_level=SOL_IP, cmsg_type=IP_TOS, cmsg_data=[0]}], msg_controllen=56, msg_flags=0}, 0) = 28 <0.000024>
5424 09:16:17.610671 recvmsg(21<UDP:[0.0.0.0:57561]>, {msg_namelen=128}, 0) = -1 EAGAIN (Resource temporarily unavailable) <0.000014>
5424 09:16:17.611011 sendmsg(21<UDP:[0.0.0.0:57561]>, {msg_name={sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("<IP2>")}, msg_namelen=16, msg_iov=[{iov_base="\20{\1\0\0\1\0\0\0\0\0\0\<hostname>\0\0\1\0\1", iov_len=28}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0 <unfinished ...>
5427 09:16:17.611396 epoll_wait(5<anon_inode:[eventpoll]>, <unfinished ...>
5427 09:16:17.611961 epoll_wait(5<anon_inode:[eventpoll]>, <unfinished ...>
5424 09:16:18.611900 recvmsg(20<UDP:[0.0.0.0:56378]>, {msg_namelen=128}, 0) = -1 EAGAIN (Resource temporarily unavailable) <0.000019>
5424 09:16:18.612273 sendmsg(20<UDP:[0.0.0.0:56378]>, {msg_name={sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("<IP3>")}, msg_namelen=16, msg_iov=[{iov_base="\20{\1\0\0\1\0\0\0\0\0\0\<hostname>\0\0\1\0\1", iov_len=28}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0 <unfinished ...>
5427 09:16:18.612808 epoll_wait(5<anon_inode:[eventpoll]>, [{EPOLLIN, {u32=20, u64=20}}], 2048, -1) = 1 <0.000572>
5427 09:16:18.613758 epoll_wait(5<anon_inode:[eventpoll]>, <unfinished ...>
5424 09:16:18.613792 recvmsg(20<UDP:[0.0.0.0:56378]>, {msg_name={sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("<IP3>")}, msg_namelen=128->16, msg_iov=[{iov_base="\20{\201\202\0\1\0\0\0\0\0\0\<hostname>\0\0\1\0\1", iov_len=65535}], msg_iovlen=1, msg_control=[{cmsg_len=32, cmsg_level=SOL_SOCKET, cmsg_type=SCM_TIMESTAMP, cmsg_data={tv_sec=1750986978, tv_usec=613339}}, {cmsg_len=17, cmsg_level=SOL_IP, cmsg_type=IP_TOS, cmsg_data=[0]}], msg_controllen=56, msg_flags=0}, 0) = 28 <0.000057>
5427 09:16:18.614878 epoll_wait(5<anon_inode:[eventpoll]>, <unfinished ...> <<<-------------- HERE
5427 09:17:55.910254 epoll_wait(5<anon_inode:[eventpoll]>, <unfinished ...>
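The stall marked HERE can also be located mechanically. The sketch below is a small Python helper, assuming strace was run with per-line PIDs and microsecond wall-clock timestamps as in the trace above; it reports the largest gap between consecutive records. The function names are hypothetical, and the sample lines are shortened copies of the last three records of the trace.

```python
import re
from datetime import datetime

# Matches "<pid> <HH:MM:SS.ffffff> <syscall>(" at the start of a strace record.
LINE_RE = re.compile(r"^(\d+)\s+(\d{2}:\d{2}:\d{2}\.\d{6})\s+(\w+)\(")


def parse_events(lines):
    """Return a list of (pid, timestamp, syscall) for lines that look like strace records."""
    events = []
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            pid, stamp, syscall = match.groups()
            events.append((int(pid), datetime.strptime(stamp, "%H:%M:%S.%f"), syscall))
    return events


def largest_gap(events):
    """Return (gap_seconds, earlier_event, later_event), or None with fewer than two events.

    Timestamps carry no date, so a trace that crosses midnight is not handled.
    """
    best = None
    for prev, cur in zip(events, events[1:]):
        gap = (cur[1] - prev[1]).total_seconds()
        if best is None or gap > best[0]:
            best = (gap, prev, cur)
    return best


# Shortened copies of the final records of the trace in this report.
SAMPLE = [
    "5424 09:16:18.613792 recvmsg(20<UDP:[0.0.0.0:56378]>, ...) = 28 <0.000057>",
    "5427 09:16:18.614878 epoll_wait(5<anon_inode:[eventpoll]>, <unfinished ...>",
    "5427 09:17:55.910254 epoll_wait(5<anon_inode:[eventpoll]>, <unfinished ...>",
]

gap, before, after = largest_gap(parse_events(SAMPLE))
print(f"largest gap: {gap:.3f}s, between {before[2]} (pid {before[0]}) and {after[2]} (pid {after[0]})")
```

On the sample, the largest gap is the roughly 97 seconds between the two final epoll_wait() calls, which matches the point marked in the trace.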
However, the host command never returns.
It seems that the cancellation code does not always work.
What is the impact of this issue to you?
Unable to use the host command.
Please provide the package NVR for which the bug is seen:
bind-utils-9.11.4-26.P2.el7_9.18.x86_64
How reproducible is this bug?:
Very easy.
Steps to reproduce
- RHEL7.9 ELS
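Since the hang is intermittent, a small harness that runs the command repeatedly under a deadline can help quantify how often it occurs. The following is a minimal sketch in Python 3 (which is assumed to be installed; it is not the default interpreter on RHEL 7). The function name `count_hangs` and the deadline value are hypothetical choices, not part of the original report.

```python
import subprocess


def count_hangs(cmd, runs, deadline):
    """Run cmd `runs` times and count how many runs exceed `deadline` seconds.

    subprocess.run() kills the child process when the timeout expires,
    so a hung process does not accumulate between iterations.
    """
    hangs = 0
    for _ in range(runs):
        try:
            subprocess.run(
                cmd,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=deadline,
            )
        except subprocess.TimeoutExpired:
            hangs += 1
    return hangs
```

For example, `count_hangs(["host", "<some hostname>"], runs=100, deadline=10.0)` would report how many of 100 lookups were still running after 10 seconds, a deadline well above what the timeout:1 attempts:1 configuration suggests.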
Expected results
host command should return.
Actual results
host command is stuck.
Is related to: RHEL-10722 "host" command doesn't always return even though some DNS server provided an answer (Closed)