Date:      Wed, 11 Sep 2013 17:07:10 +0200 (CEST)
From:      Jimmy Olgeni <olgeni@olgeni.com>
To:        freebsd-stable@freebsd.org
Subject:   Possible kqueue related issue on STABLE/RC.
Message-ID:  <alpine.BSF.2.00.1309111705460.89324@olgeni.olgeni>

Hello,

I may have found something weird while running FreeBSD 9.2-RC3 #0 r255393
(ZFS-only setup).

Quick history of the problem:

- Lately, with a very recent -STABLE, the host would hang randomly while
   building ports with poudriere (-J2) and using X11, without producing a
   core dump (apparently a solid deadlock). It works perfectly when using
   the console only, and it can run a large build overnight without hanging.
   Since I was in X11, I could not see what was happening on the console,
   and this desktop PC has no proper serial port, so there is not much I
   can see. In any case, it does not reboot automatically.

- To rule out recent -STABLE changes I moved to 9.2-RC3 using SVN, but the
   system kept hanging under the same conditions.

- I also enabled DDB to get a minidump, but I still got only solid
   lockups.

- I downgraded the nvidia-driver port in case it had something to do with
   the crashes, but the crashes continued.

- I downgraded to a known-good -STABLE from July, then one from June, but
   the host would still crash. The very weird thing is that I have always
   built stuff while using X11, and it never hung. After downgrading both
   the OS and nvidia-driver I had effectively restored a configuration that
   did not hang at the time, but the issue persisted.

- However, this time I managed to get a minidump from the old -STABLE. I
   saved it here:

     http://olgeni.olgeni.com/~olgeni/core.txt.0

- After seeing the reference to kqueue in the dump, I remembered another
   thing that changed when the crashes started: gio-fam-backend went away,
   and glib20 now uses kqueue (r324037).

- I tried the same workload while using X11 with openbox only, and it
   worked fine.

- Then I went back to Gnome, but had a script periodically kill anything
   related to gvfsd, and the system returned to normal (i.e. flawless
   builds).

- I remember that the gamin implementation used to open and poll a lot of
   files, even files that were not used by the X11 environment or by
   Nautilus specifically, and the gamin daemon could take a good 5% of CPU
   just polling; restarting it brought that back down to 0%.

- I'm not sure whether it is related in any way, but running a standard
   "buildworld" does not crash the host. The only difference I can think of
   is that poudriere uses jails.
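The gvfsd workaround mentioned above (a script that periodically kills
anything related to gvfsd) might look roughly like the following Python
sketch. It is an illustration, not the script actually used: the `^gvfsd`
name pattern, the 30-second interval and the helper names `kill_matching`
and `watchdog` are all hypothetical, and it assumes `pkill` is available
(it is part of the FreeBSD base system).

```python
import subprocess
import time
from typing import Optional

# Hypothetical pattern: process names starting with "gvfsd" (gvfsd,
# gvfsd-trash, gvfsd-metadata, ...). Adjust for your own desktop.
GVFSD_PATTERN = "^gvfsd"


def kill_matching(pattern: str) -> bool:
    """Send SIGTERM (pkill's default) to every process whose name matches.

    pkill exits with 0 if at least one process matched, 1 if none did.
    Matching on the process name (no -f) means this script, whose
    process name is the Python interpreter, never matches itself.
    """
    result = subprocess.run(["pkill", pattern], check=False)
    return result.returncode == 0


def watchdog(pattern: str = GVFSD_PATTERN,
             interval: float = 30.0,
             rounds: Optional[int] = None) -> int:
    """Kill matching processes every `interval` seconds.

    Loops forever when rounds is None; otherwise stops after `rounds`
    iterations. Returns how many rounds actually killed something.
    """
    hits = 0
    n = 0
    while rounds is None or n < rounds:
        if kill_matching(pattern):
            hits += 1
        n += 1
        # Skip the sleep after the final round of a bounded run.
        if rounds is None or n < rounds:
            time.sleep(interval)
    return hits
```

Calling `watchdog()` with no arguments runs until interrupted; passing
`rounds` makes it terminate, which is handy for a quick check. Matching on
the process name rather than the full command line (`pkill -f`) is
deliberate, so a script file that happens to contain "gvfsd" in its name
cannot kill itself.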

Unfortunately I am not able to get a minidump from the latest RC, but at
this point I suspect that something is going on with glib20 and kqueue on
both -STABLE and -RC.

If anybody has an idea, I can test it easily, as it usually takes only a
few minutes to hang everything.

--
jimmy



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?alpine.BSF.2.00.1309111705460.89324>