Uploaded image for project: 'RHEL'
  1. RHEL
  2. RHEL-3411

JDK hangs in process.forkAndExec at native readFully

Linking RHIVOS CVEs to...Migration: Automation ...SWIFT: POC ConversionSync from "Extern...XMLWordPrintable

    • None
    • Important
    • rhel-sst-java
    • None
    • False
    • False
    • Hide

      None

      Show
      None
    • None
    • None
    • None
    • None
    • If docs needed, set a value
    • None
    • 57,005

      Description of problem:

      For some time now we observe a JDK hang, when running our Eclipse-based product. Mostly we see the hang during tests, though we have observed it during normal product execution:

      "Worker-23: Preparing SmarTest Workspace" #86 prio=5 os_prio=0 tid=0x00007fff88426800 nid=0x2311 runnable [0x00007fff812e8000]
      java.lang.Thread.State: RUNNABLE
      at java.lang.UNIXProcess.forkAndExec(Native Method)
      at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
      at java.lang.ProcessImpl.start(ProcessImpl.java:134)
      at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
      at java.lang.Runtime.exec(Runtime.java:620)
      at java.lang.Runtime.exec(Runtime.java:528)
      at org.eclipse.debug.core.DebugPlugin.exec(DebugPlugin.java:943)
      at org.eclipse.debug.core.DebugPlugin.exec(DebugPlugin.java:893)
      at org.eclipse.jdt.internal.launching.StandardVMType.generateLibraryInfo(StandardVMType.java:712)

      Thread 23 (Thread 0x7fff812ea700 (LWP 8977)):
      #0 0x00007ffff7bcd70d in read () at ../sysdeps/unix/syscall-template.S:81
      #1 0x00007ffff4e9689e in readFully (__nbytes=4, __buf=0x7fff812e8ca0, __fd=173) at /usr/include/bits/unistd.h:44
      #2 0x00007ffff4e9689e in readFully (fd=173, buf=0x7fff812e8ca0, nbyte=4) at /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/jdk/src/solaris/native/java/lang/childproc.c:145
      #3 0x00007ffff4e93ffa in Java_java_lang_UNIXProcess_forkAndExec (env=0x7fff88426a58, process=<optimized out>, mode=<optimized out>, helperpath=0x7fff812e8e40, prog=0x7fff812e8e38, argBlock=0x7fff812e8e30, argc=4, envBlock=0x0, envc=0, dir=0x0, std_fds=0x7fff812e8e08, redirectErrorStream=0 '\000') at /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/jdk/src/solaris/native/java/lang/UNIXProcess_md.c:665
      #4 0x00007fffe0018407 in ()
      #5 0x00007fff00000004 in ()
      #6 0x0000000000000000 in ()

      Version-Release number of selected component (if applicable):

      We are on RHEL 7.4, and have observed the problem with JDK 11.0.5, JDK 11.0.7 and JDK 1.8.0_181.

      How reproducible:

      We cannot reproduce the problem reliably, or outside of our product.

      Actual results:

      The JDK hangs while executing different commands with Runtime.exec().

      Expected results:

      The JDK should not hang.

      Additional info:

      We observe that the problem is seen far less frequently (nearly never) if we disable Eclipse tips dialog on start-up. Additionally, we had a status line contribution that seemed to increase the frequency of the hang; the contribution was WebKit2 browser based, same as the Eclipse tips dialog.

      The code at which the JDK was hanging was added here:

      https://bugs.openjdk.java.net/browse/JDK-6671051?focusedCommentId=12150614&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12150614

      We have tried the following patch:

      — ./BUILD/java-11-openjdk-11.0.7.10-5.el7.x86_64/openjdk/src/java.base/unix/native/libjava/childproc.c 2020-03-31 22:46:52.000000000 +0200
      +++ ./BUILD/java-11-openjdk-11.0.7.10-5.el7.x86_64/openjdk/src/java.base/unix/native/libjava/childproc.c 2020-07-15 12:26:40.950489000 +0200
      @@ -143,6 +143,7 @@
      ssize_t
      readFully(int fd, void *buf, size_t nbyte)
      {
      + int counter = 0;
      ssize_t remaining = nbyte;
      for (;

      { ssize_t n = read(fd, buf, remaining); @@ -156,6 +157,10 @@ * Unlikely, but possible. */ buf = (void *) (((char *)buf) + n); }

      else if (errno == EINTR) {
      + ++counter;
      + if (counter > 10000)

      { + return -1; + }

      /* Strange signals like SIGJVM1 are possible at any time.

      With this change, we observe a different problem:

      java.lang.IllegalStateException: Process not finished.
      Command line arguments: /usr/lib/jvm/java-1.8.0/bin/java -Xmx16m -classpath .../eclipseForTest/configuration/org.eclipse.osgi/557/0/.cp/lib/launchingsupport.jar org.eclipse.jdt.internal.launching.support.LibraryDetector
      Output: Standard output:
      1.8.0_181|/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/jre/lib/resources.jar:/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/jre/lib/rt.jar:/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/jre/lib/sunrsasign.jar:/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/jre/lib/jsse.jar:/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/jre/lib/jce.jar:/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/jre/lib/charsets.jar:/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/jre/lib/jfr.jar:/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/jre/classes|/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/jre/lib/ext:/usr/java/packages/lib/ext|/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/jre/lib/endorsed
      Standard error:

      at org.eclipse.jdt.internal.launching.StandardVMType.checkProcessResult(StandardVMType.java:908)
      at org.eclipse.jdt.internal.launching.StandardVMType.generateLibraryInfo(StandardVMType.java:728)
      at org.eclipse.jdt.internal.launching.StandardVMType.getLibraryInfo(StandardVMType.java:240)
      at org.eclipse.jdt.internal.launching.StandardVMType.getDefaultLibraryLocations(StandardVMType.java:457)
      at org.eclipse.jdt.internal.launching.StandardVMType.canDetectDefaultSystemLibraries(StandardVMType.java:267)
      at org.eclipse.jdt.internal.launching.StandardVMType.validateInstallLocation(StandardVMType.java:661)
      at org.eclipse.jdt.internal.launching.VMDefinitionsContainer.addVM(VMDefinitionsContainer.java:139)
      at com.verigy.itee.ui.internal.JreUpdater$1.run(JreUpdater.java:283)
      at org.eclipse.core.internal.jobs.Worker.run(Worker.java:63)

      We are not sure whether there is a blocking read() call, or whether there is an endless read(), interrupt, read(), interrupt... loop.

      See: https://github.com/openjdk/jdk11u-dev/blame/a09254f3d9536b9062aa9b797847667092479f18/src/java.base/unix/native/libjava/childproc.c#L144

      When looping our best-effort reproducer, we observe the issue in anywhere between 2 hours and 2 days.

              rhn-engineering-aph Andrew Haley
              jira-bugzilla-migration RH Bugzilla Integration
              Andrew Haley Andrew Haley
              David Kutalek David Kutalek
              Votes:
              0 Vote for this issue
              Watchers:
              8 Start watching this issue

                Created:
                Updated: