Skip site navigation (1)Skip section navigation (2)
Date:      Mon, 16 Jun 2014 16:45:04 +0200
From:      "Ronald Klop" <ronald-lists@klop.ws>
To:        freebsd-java@freebsd.org
Subject:   Re: JVM BUG(s) - Hadoop's threads hanging
Message-ID:  <op.xhjxxemtkndu52@ronaldradial.radialsg.local>
In-Reply-To: <E14F86A5-C7FE-49B5-8A11-F5237C557AE2@gmail.com>
References:  <E14F86A5-C7FE-49B5-8A11-F5237C557AE2@gmail.com>

next in thread | previous in thread | raw e-mail | index | archive | help
On Mon, 16 Jun 2014 14:44:10 +0200, Dmitry Sivachenko  
<trtrmitya@gmail.com> wrote:

> Hello!
>
> I set up hadoop-1.2.1 (from ports) on FreeBSD-10/stable with openjdk  
> version 1.7.0_60.
>
> On the first glance it is doing well except one annoying thing:  after  
> executing some tasks, tasktracker process starts to eat CPU when idle.
> Sometimes it is 10-20% (numbers from top(1) output), sometimes it is  
> 100-150%.
>
> In tasktrackers's log I see numerious records like this:
>
> 2014-06-09 13:08:29,858 INFO org.mortbay.log:  
> org.mortbay.io.nio.SelectorManager$SelectSet@abdcc1c JVM BUG(s) -  
> injecting delay59 times
> 2014-06-09 13:08:29,859 INFO org.mortbay.log:  
> org.mortbay.io.nio.SelectorManager$SelectSet@abdcc1c JVM BUG(s) -  
> recreating selector 59 times, canceled keys 944 times
> 2014-06-09 13:09:29,862 INFO org.mortbay.log:  
> org.mortbay.io.nio.SelectorManager$SelectSet@abdcc1c JVM BUG(s) -  
> injecting delay58 times
> 2014-06-09 13:09:29,862 INFO org.mortbay.log:  
> org.mortbay.io.nio.SelectorManager$SelectSet@abdcc1c JVM BUG(s) -  
> recreating selector 58 times, canceled keys 928 times
> 2014-06-09 13:10:29,901 INFO org.mortbay.log:  
> org.mortbay.io.nio.SelectorManager$SelectSet@abdcc1c JVM BUG(s) -  
> injecting delay58 times
> 2014-06-09 13:10:29,901 INFO org.mortbay.log:  
> org.mortbay.io.nio.SelectorManager$SelectSet@abdcc1c JVM BUG(s) -  
> recreating selector 58 times, canceled keys 928 times
> <...>
>
>
> The more jobs I run, more java threads start to consume CPU after all  
> tasks finished.  After several job execution, top(1) output looks like  
> this (splitted by thread, the same PID):
>
>  PID USERNAME     PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
> 79045 hadoop        47    0  1948M   867M uwait   2  20:49  37.50%  
> java{java}
> 79045 hadoop        31    0  1948M   867M uwait  31   1:45  19.29%  
> java{java}
> 79045 hadoop        33    0  1948M   867M uwait  21   2:51  19.19%  
> java{java}
> 79045 hadoop        30    0  1948M   867M uwait  17   2:51  18.65%  
> java{java}
> 79045 hadoop        30    0  1948M   867M uwait  11   1:52  18.36%  
> java{java}
> 79045 hadoop        30    0  1948M   867M uwait  22   1:45  18.36%  
> java{java}
> 79045 hadoop        31    0  1948M   867M uwait  29   2:50  18.26%  
> java{java}
> 79045 hadoop        31    0  1948M   867M uwait   6   1:57  18.16%  
> java{java}
> 79045 hadoop        31    0  1948M   867M uwait  13   4:55  17.97%  
> java{java}
> 79045 hadoop        31    0  1948M   867M uwait  26   3:39  17.77%  
> java{java}
> 79045 hadoop        33    0  1948M   867M uwait   8   1:21  17.48%  
> java{java}
> 79045 hadoop        30    0  1948M   867M uwait   1   3:32  16.70%  
> java{java}
> 79045 hadoop        32    0  1948M   867M uwait  24   3:12  16.70%  
> java{java}
> 79045 hadoop        26    0  1948M   867M uwait   4   1:27  10.35%  
> java{java}
> 72417 root          20    0 19828K  3252K CPU21  21   0:00   0.29% top
>  836 root          20    0 36104K  1952K select 14   6:51   0.00% snmpd
> 79045 hadoop        20    0  1948M   867M uwait  20   6:51   0.00%  
> java{java}
> 79045 hadoop        20    0  1948M   867M uwait  27   3:45   0.00%  
> java{java}
> 79045 hadoop        20    0  1948M   867M uwait  30   2:37   0.00%  
> java{java}
> 79045 hadoop        20    0  1948M   867M uwait  15   0:54   0.00%  
> java{java}
> 79045 hadoop        20    0  1948M   867M uwait   2   0:48   0.00%  
> java{java}
> 79045 hadoop        20    0  1948M   867M uwait  14   0:48   0.00%  
> java{java}
> 79045 hadoop        20    0  1948M   867M uwait   2   0:48   0.00%  
> java{java}
> <....>
>
> This is on absolutely idle cluster, no single task is running.
> I am attaching truss(1) output for that java process.
>
> I asked that on user@hadoop.apache.org without any luck.
>
> Any clues?  (I can provide any additional information if needed)

Hi,

 From your information it is hard to say something about it. The bug can be  
in FreeBSD, OpenJDK (the Oracle part or in the BSD port part), in Hadoop  
or in your own code running on top of Hadoop.

My first idea would be to eliminate some of the possibilities.
- Run a Linux machine with the same versions of the software.
- Try FreeBSD 9-stable.
- Try an older version of OpenJDK on FreeBSD.
- Try a very simple 'Hello world' style application on Hadoop which mimics  
the thread usage.

Did you ever run your Hadoop application on FreeBSD before without this  
symptom? If so, what are the differences between then and now?

Another possibly valuable piece of information is what those threads are  
doing. Using jstack you can very easily get a dump of all stacks of all  
threads in Java.

Regards,
Ronald.



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?op.xhjxxemtkndu52>