Bug
Resolution: Done
Description of problem:
vLLM does not do its own hostname resolution for this input (it comes from the CLI), and the hostname itself is a Kubernetes service. Because ZMQ does not perform a DNS lookup here, a setup that is expected to work upstream breaks.
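One way to sidestep the missing lookup is to resolve the hostname in Python before libzmq ever sees it. The sketch below is a hypothetical helper (`resolve_tcp_endpoint` is not a vLLM or pyzmq API) that rewrites `tcp://<host>:<port>` so the host is a numeric IPv4 address obtained from `socket.gethostbyname`. It assumes the name resolves to an address libzmq can actually use; for a bind, that means an address assigned to a local interface.

```python
import ipaddress
import socket


def resolve_tcp_endpoint(endpoint: str) -> str:
    """Return `endpoint` with a hostname replaced by a numeric IPv4 address.

    Hypothetical workaround helper; not part of vLLM or pyzmq.
    """
    prefix = "tcp://"
    if not endpoint.startswith(prefix):
        return endpoint  # ipc://, inproc://, ... have no hostname to resolve
    host, sep, port = endpoint[len(prefix):].rpartition(":")
    if not sep or not host or host == "*" or host.startswith("["):
        return endpoint  # no port, wildcard, or bracketed IPv6 literal
    try:
        ipaddress.ip_address(host)
        return endpoint  # already a numeric address
    except ValueError:
        pass
    # System resolver lookup (IPv4 only). Inside a pod this goes through the
    # cluster DNS configured in /etc/resolv.conf, so service names resolve.
    ip = socket.gethostbyname(host)
    return f"{prefix}{ip}:{port}"


# Usage with pyzmq (hypothetical service name):
# sock.bind(resolve_tcp_endpoint("tcp://my-svc.my-ns.svc.cluster.local:5557"))
```

Resolving once pins the endpoint to whatever address DNS returned at that moment, so this is a stopgap rather than a substitute for running a libzmq that handles the hostname itself.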
Version numbers (base image, wheels, builder, etc.):
quay.io/aipcc/rhaiis/cuda-ubi9:3.2.2-175769903
pyzmq
The issue is observed when libzmq is 4.3.4; it is not observed in upstream images, where libzmq is 4.3.5.
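To tell affected and unaffected images apart without running the full container test, the libzmq version that pyzmq links against can be checked at runtime. This is a minimal sketch: the helper names are hypothetical, and treating every libzmq older than 4.3.5 as affected is an assumption, since the report only confirms 4.3.4 failing and 4.3.5 working.

```python
from typing import Tuple

# Assumption: every libzmq below 4.3.5 is treated as affected. Only 4.3.4
# (fails) and 4.3.5 (works) are confirmed by the report.
FIRST_GOOD_LIBZMQ = (4, 3, 5)


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted version such as '4.3.4' into (4, 3, 4)."""
    return tuple(int(part) for part in version.split("."))


def libzmq_is_affected(version: str) -> bool:
    """True if `version` sorts before the first known-good libzmq."""
    return parse_version(version) < FIRST_GOOD_LIBZMQ


def check_installed_libzmq() -> None:
    """Print whether the libzmq linked by pyzmq predates 4.3.5."""
    import zmq  # imported lazily so the helpers above work without pyzmq

    version = zmq.zmq_version()
    if libzmq_is_affected(version):
        print(f"libzmq {version} < 4.3.5: hostname binds may fail")
    else:
        print(f"libzmq {version}: OK")
```

Calling `check_installed_libzmq()` inside an image gives a quick verdict. Comparing integer tuples rather than strings keeps a version such as 4.3.10 from sorting before 4.3.5.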
Steps to Reproduce:
import subprocess


def test_zmq_in_container(name, image, should_fail):
    """Run the ZMQ bind checks in test_script inside `image` using podman."""
    print(f"=== {name} ===")
    print("Image:", image)
    print("Expected localhost bind:", "FAIL" if should_fail else "SUCCEED")
    # Script executed with `python3 -c` inside the container.
    test_script = '''
import zmq
import socket as sock
import platform

print("=== Environment Info ===")
print("ZMQ version:", zmq.zmq_version())
print("PyZMQ version:", zmq.pyzmq_version())
print("Platform:", platform.platform())
print("Python version:", platform.python_version())

# Check hostname resolution
try:
    localhost_ip = sock.gethostbyname("localhost")
    print("localhost resolves to:", localhost_ip)
except Exception as e:
    print("localhost resolution failed:", str(e))

print("\\n=== ZMQ Binding Tests ===")
ctx = zmq.Context()

# Test localhost binding
socket = ctx.socket(zmq.REP)
try:
    socket.bind("tcp://localhost:0")
    port = socket.getsockopt(zmq.LAST_ENDPOINT).decode()
    print("✅ localhost bind succeeded on", port)
    localhost_success = True
except Exception as e:
    print("❌ localhost bind failed -", str(e))
    localhost_success = False
socket.close()

# Test IP binding
socket2 = ctx.socket(zmq.REP)
try:
    socket2.bind("tcp://127.0.0.1:0")
    port2 = socket2.getsockopt(zmq.LAST_ENDPOINT).decode()
    print("✅ IP bind succeeded on", port2)
except Exception as e:
    print("❌ IP bind failed -", str(e))
socket2.close()

# Test wildcard binding
socket3 = ctx.socket(zmq.REP)
try:
    socket3.bind("tcp://*:0")
    port3 = socket3.getsockopt(zmq.LAST_ENDPOINT).decode()
    print("✅ Wildcard bind succeeded on", port3)
except Exception as e:
    print("❌ Wildcard bind failed -", str(e))
socket3.close()

ctx.term()
'''
    cmd = ['podman', 'run', '--rm', '--entrypoint=', image, 'python3', '-c', test_script]
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=60)
        print(result.stdout)
        if result.stderr:
            print("STDERR:", result.stderr)
    except subprocess.TimeoutExpired:
        print("❌ Container test timed out")
    except Exception as e:
        print(f"❌ Error running container: {e}")
    print()


def main():
    print("Testing ZMQ hostname binding across images...")
    print()
    images = [
        ("AIPCC Image (Expected to FAIL on localhost)", "quay.io/aipcc/rhaiis/cuda-ubi9:3.2.2-1757699034", True),
        ("WSEATON Image (Expected to SUCCEED on localhost)", "quay.io/wseaton/vllm:llmdnixlfix-01", False),
        ("LLM-D Image (Expected to SUCCEED on localhost)", "ghcr.io/llm-d/llm-d-dev:sha-b3f0b0d", False),
    ]
    for name, image, should_fail in images:
        test_zmq_in_container(name, image, should_fail)


if __name__ == "__main__":
    main()
Actual results:
The localhost bind fails on the AIPCC image.
Expected results:
The localhost bind succeeds.
Additional info:
- Blocks: AIPCC-3181 Support for llm-d (Closed)
- Links to: RHBA-2025:154563 Update ZeroMQ to 4.3.5