Skip site navigation (1)Skip section navigation (2)
Date:      Mon, 11 Mar 2013 00:50:37 +0100
From:      Jilles Tjoelker <jilles@stack.nl>
To:        Jan Beich <jbeich@tormail.org>
Cc:        Davide Italiano <davide@FreeBSD.org>, Dimitry Andric <dim@FreeBSD.org>, FreeBSD Current <freebsd-current@freebsd.org>, "Hartmann, O." <ohartman@zedat.fu-berlin.de>
Subject:   Re: r247839: broken pipe - for top, sudo and ports
Message-ID:  <20130310235037.GE94667@stack.nl>
In-Reply-To: <1UEoh3-000Ly7-Mf@internal.tormail.org>
References:  <5135B7E1.3050002@zedat.fu-berlin.de> <5135BBD9.9090009@FreeBSD.org> <51364914.2010104@zedat.fu-berlin.de> <51364E8D.5020608@zedat.fu-berlin.de> <20130306212408.GA15814@stack.nl> <1UDUQ3-0009HU-83@internal.tormail.org> <20130310212256.GD94667@stack.nl> <1UEoh3-000Ly7-Mf@internal.tormail.org>

next in thread | previous in thread | raw e-mail | index | archive | help
On Sun, Mar 10, 2013 at 08:26:03PM -0200, Jan Beich wrote:
> Jilles Tjoelker <jilles@stack.nl> writes:
> > On Thu, Mar 07, 2013 at 04:54:01AM -0100, Jan Beich wrote:
> >> Jilles Tjoelker <jilles@stack.nl> writes:
> >> > On Tue, Mar 05, 2013 at 08:59:09PM +0100, Hartmann, O. wrote:
> >> >> A "truss top" reveals this, is this of help?

> >> >> [...]
> >> >> stat("/etc/nsswitch.conf",{ mode=-rw-r--r--
> >> >> ,inode=162310,size=1007,blksize=32768 }) = 0 (0x0)
> >> >> stat("/etc/nsswitch.conf",{ mode=-rw-r--r--
> >> >> ,inode=162310,size=1007,blksize=32768 }) = 0 (0x0)
> >> >> stat("/etc/nsswitch.conf",{ mode=-rw-r--r--
> >> >> ,inode=162310,size=1007,blksize=32768 }) = 0 (0x0)
> >> >> stat("/etc/nsswitch.conf",{ mode=-rw-r--r--
> >> >> ,inode=162310,size=1007,blksize=32768 }) = 0 (0x0)
> >> >> stat("/etc/nsswitch.conf",{ mode=-rw-r--r--
> >> >> ,inode=162310,size=1007,blksize=32768 }) = 0 (0x0)
> >> >> socket(PF_LOCAL,SOCK_STREAM,0)                   = 4 (0x4)
> >> >> connect(4,{ AF_UNIX "/var/run/nscd" },15)        = 0 (0x0)
> >> >> fcntl(4,F_SETFL,O_NONBLOCK)                      = 0 (0x0)
> >> >> kqueue(0x80183b000,0x80122fc58,0x10,0x80062b308,0x80183b010,0x2)
> >> >> = 5 (0x5)
> >> >> kevent(5,{0x4,EVFILT_WRITE,EV_ADD,0,0x0,0x0},1,0x0,0,0x0) = 0 (0x0)
> >> >> kqueue(0x5,0x7fffffffd2e0,0x1,0x0,0x0,0x0)       = 6 (0x6)
> >> >> kevent(6,{0x4,EVFILT_READ,EV_ADD,0,0x0,0x0},1,0x0,0,0x0) = 0 (0x0)
> >> >> kevent(5,{0x4,EVFILT_WRITE,EV_ADD,1,0x4,0x0},1,0x0,0,0x0) = 0 (0x0)
> >> >> kevent(5,0x0,0,{0x4,EVFILT_WRITE,EV_EOF,0,0x2000,0x0},1,0x0) = 1 (0x1)
> >> >> sendmsg(0x4,0x7fffffffd290,0x0,0x1,0x1,0x0)      ERR#32 'Broken pipe'
> >> >> SIGNAL 13 (SIGPIPE)
> >> >> process exit, rval = 0

> >> > Apparently there is a bug that causes nscd to close the connection
> >> > immediately but even then it is wrong that this terminates the calling
> >> > program with SIGPIPE.

> >> > The below patch prevents the SIGPIPE but cannot revive the connection to
> >> > nscd. This may cause numeric UIDs in top or increase the load on the
> >> > directory server. It is compile tested only.
> >> [...]

> >> The patch seems to fix the issue in a world after r247804. I don't see
> >> numeric UIDs in top but without the patch top crashes with SIGPIPE a lot
> >> less frequently than sudo or make install (in base/ports) for me.

> >> In my case shutting down nscd helped, too. Compared to stock
> >> nsswitch.conf I only have "cache" added.

> > Can you find what causes nscd to close the connection quickly, such as
> > using ktrace?

> # single user mode
> $ ktrace -p $(pgrep nscd); top -b; ktrace -c; kdump
>     71 nscd     GIO   fd 5 wrote 0 bytes
>        ""
>     71 nscd     GIO   fd 5 read 32 bytes
>        0x0000 0400 0000 0000 0000 ffff 1000 0000 0000 0100  |..................|
>        0x0012 0000 0000 0000 0000 0000 0000 0000            |..............|
> 
>     71 nscd     RET   kevent 1
>     71 nscd     CALL  accept(0x4,0,0)
>     71 nscd     RET   accept 6

We are in usr.sbin/nscd/nscd.c accept_connection() here.

>     71 nscd     CALL  getsockopt(0x6,0,0x1,0x7fffff9fce28,0x7fffff9fce24)
>     71 nscd     RET   getsockopt 0

Probably getpeereid().

On another note, nscd leaks the file descriptor if this, the below
init_query_state() or the below kevent() fails.

>     71 nscd     CALL  kevent(0x5,0x7fffff9fcf00,0x2,0,0,0x7fffff9fcf50)
>     71 nscd     GIO   fd 5 wrote 64 bytes
>        0x0000 0600 0000 0000 0000 f9ff 1100 0000 0000 401f  |................@.|
>        0x0012 0000 401f 0000 40e6 4002 0800 0000 0600 0000  |..@...@.@.........|
>        0x0024 0000 0000 ffff 1100 0100 0000 0400 0000 0000  |..................|
>        0x0036 0000 40e6 4002 0800 0000                      |..@.@.....|
> 

Adding an EVFILT_TIMER and an EVFILT_READ.

The data field for the EVFILT_TIMER is a bit strange. I would expect
0x0000000000001f40 (8000 decimal) but it puts instead
0x00001f4000001f40. This does not happen when I run
tools/regression/kqueue/kqtest on a stable/9 amd64 machine or on
ref10-amd64 which currently runs r247722.

On a head (r248047) i386 machine, the data field looks right.

>     71 nscd     GIO   fd 5 read 0 bytes
>        ""
>     71 nscd     RET   kevent 0
>     71 nscd     CALL  kevent(0x5,0x7fffff9fcec0,0x1,0,0,0x7fffff9fcee0)
>     71 nscd     GIO   fd 5 wrote 32 bytes
>        0x0000 0400 0000 0000 0000 ffff 1100 0000 0000 0000  |..................|
>        0x0012 0000 0000 0000 0000 0000 0000 0000            |..............|
> 

Probably registering interest for the next connection.

>     71 nscd     GIO   fd 5 read 0 bytes
>        ""
>     71 nscd     RET   kevent 0
>     71 nscd     CALL  kevent(0x5,0,0,0x7fffff9fcec0,0x1,0)
>     71 nscd     GIO   fd 5 wrote 0 bytes
>        ""
>     71 nscd     GIO   fd 5 read 32 bytes
>        0x0000 0600 0000 0000 0000 f9ff 3000 0000 0000 0100  |..........0.......|
>        0x0012 0000 0000 0000 40e6 4002 0800 0000            |......@.@.....|
> 

The timer has already expired. This cannot be right. (It cannot be that
EVFILT_READ is broken and eight seconds actually passed because the send
calls would have worked in that case.)

tools/regression/kqueue/kqtest works correctly on the aforementioned
r248047 head i386 machine.

>     71 nscd     RET   kevent 1
>     71 nscd     CALL  close(0x6)
>     71 nscd     RET   close 0
>     71 nscd     CALL  kevent(0x5,0,0,0x7fffff9fcec0,0x1,0)

-- 
Jilles Tjoelker



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20130310235037.GE94667>