  1. AI Platform Core Components
  2. AIPCC-5208

[llm-d] libzmq shipped in RHAIIS doesn't do DNS lookup on bind

    • Type: Bug
    • Resolution: Done

      Description of problem:

          vLLM does not resolve this hostname itself; the value comes straight from the CLI, and the hostname is a Kubernetes service name. Because libzmq does not do a DNS lookup on bind, a configuration that is expected to work upstream breaks.
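
      One possible workaround, sketched here for illustration only, is to resolve the hostname in Python before handing the endpoint to zmq. The helper name resolve_bind_endpoint is hypothetical (it is not part of vLLM or pyzmq), and the sketch assumes an IPv4 tcp:// endpoint of the form tcp://host:port:

```python
import ipaddress
import socket


def resolve_bind_endpoint(endpoint: str) -> str:
    """Return a tcp:// endpoint with its hostname replaced by an IPv4 address.

    Hypothetical helper: wildcard, IP-literal, and non-tcp endpoints are
    returned unchanged. Interface names (for example eth0) are not handled
    and would raise socket.gaierror.
    """
    scheme, sep, rest = endpoint.partition("://")
    if scheme != "tcp" or not sep:
        return endpoint
    host, colon, port = rest.rpartition(":")
    if not colon or host in ("", "*") or host.startswith("["):
        return endpoint
    try:
        ipaddress.ip_address(host)
        return endpoint  # already an IP literal, nothing to resolve
    except ValueError:
        pass
    # Resolve through the system resolver (DNS, /etc/hosts), IPv4 only.
    infos = socket.getaddrinfo(host, None, family=socket.AF_INET,
                               type=socket.SOCK_STREAM)
    ip = infos[0][4][0]
    return f"tcp://{ip}:{port}"
```

      A caller would then bind with something like socket.bind(resolve_bind_endpoint("tcp://my-service:5555")), where my-service stands in for the Kubernetes service name. This only sidesteps the problem in the caller; it does not change libzmq's behavior.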

      Version numbers (base image, wheels, builder, etc):

         quay.io/aipcc/rhaiis/cuda-ubi9:3.2.2-175769903 

      pyzmq

       

      The issue is observed when libzmq is 4.3.4; it is not observed in upstream images, where libzmq is 4.3.5.
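
      To check quickly whether an image falls on the failing side, the version string printed by zmq.zmq_version() can be compared against 4.3.5. This is a rough sketch: the function name is hypothetical, and treating 4.3.5 as the cutoff is an assumption, since this report only has two data points (4.3.4 fails, 4.3.5 works):

```python
def libzmq_may_be_affected(version: str) -> bool:
    """Return True if a libzmq version string sorts below 4.3.5.

    Assumption: 4.3.5 is used as the threshold only because the report
    saw the failure on 4.3.4 and not on 4.3.5.
    """
    parts = tuple(int(p) for p in version.split(".")[:3])
    return parts < (4, 3, 5)
```

      Inside a container this could be called as libzmq_may_be_affected(zmq.zmq_version()).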

       

      Steps to Reproduce:

      import subprocess


      def test_zmq_in_container(name, image, should_fail):
          # Run the probe script below inside the given image with podman.
          print(f"--- {name} ---")
          test_script = '''
      import zmq
      import socket as sock
      import platform

      print("=== Environment Info ===")
      print("ZMQ version:", zmq.zmq_version())
      print("PyZMQ version:", zmq.pyzmq_version())
      print("Platform:", platform.platform())
      print("Python version:", platform.python_version())

      # Check hostname resolution
      try:
          localhost_ip = sock.gethostbyname("localhost")
          print("localhost resolves to:", localhost_ip)
      except Exception as e:
          print("localhost resolution failed:", str(e))

      print("\\n=== ZMQ Binding Tests ===")
      ctx = zmq.Context()

      # Test localhost binding
      socket = ctx.socket(zmq.REP)
      try:
          socket.bind("tcp://localhost:0")
          port = socket.getsockopt(zmq.LAST_ENDPOINT).decode()
          print("✅ localhost bind succeeded on", port)
          localhost_success = True
      except Exception as e:
          print("❌ localhost bind failed -", str(e))
          localhost_success = False
      socket.close()

      # Test IP binding
      socket2 = ctx.socket(zmq.REP)
      try:
          socket2.bind("tcp://127.0.0.1:0")
          port2 = socket2.getsockopt(zmq.LAST_ENDPOINT).decode()
          print("✅ IP bind succeeded on", port2)
      except Exception as e:
          print("❌ IP bind failed -", str(e))
      socket2.close()

      # Test wildcard binding
      socket3 = ctx.socket(zmq.REP)
      try:
          socket3.bind("tcp://*:0")
          port3 = socket3.getsockopt(zmq.LAST_ENDPOINT).decode()
          print("✅ Wildcard bind succeeded on", port3)
      except Exception as e:
          print("❌ Wildcard bind failed -", str(e))
      socket3.close()

      ctx.term()
      '''
          cmd = ['podman', 'run', '--rm', '--entrypoint=', image, 'python3', '-c', test_script]

          try:
              result = subprocess.run(cmd, capture_output=True, text=True, timeout=60)
              print(result.stdout)
              if result.stderr:
                  print("STDERR:", result.stderr)
          except subprocess.TimeoutExpired:
              print("❌ Container test timed out")
          except Exception as e:
              print(f"❌ Error running container: {e}")

          print()


      def main():
          print("Testing ZMQ hostname binding across images...")
          print()

          images = [
              ("AIPCC Image (Expected to FAIL on localhost)", "quay.io/aipcc/rhaiis/cuda-ubi9:3.2.2-1757699034", True),
              ("WSEATON Image (Expected to SUCCEED on localhost)", "quay.io/wseaton/vllm:llmdnixlfix-01", False),
              ("LLM-D Image (Expected to SUCCEED on localhost)", "ghcr.io/llm-d/llm-d-dev:sha-b3f0b0d", False),
          ]

          for name, image, should_fail in images:
              test_zmq_in_container(name, image, should_fail)


      if __name__ == "__main__":
          main()

      Actual results:

      The bind to tcp://localhost:0 fails on the AIPCC image.

      Expected results:

      The localhost bind succeeds, as it does on the upstream images.

      Additional info:

              cheimes@redhat.com Christian Heimes
              rhn-support-weaton Will Eaton