From owner-freebsd-hackers@FreeBSD.ORG  Sun Dec  9 01:05:35 2012
Subject: Re: Possible obscure socket leak when system under load and
 listener is slow to accept
From: Richard Sharpe <rsharpe@richardsharpe.com>
To: Andre Oppermann <andre@freebsd.org>
In-Reply-To: <50C3D22D.3060008@freebsd.org>
References: <50C3D22D.3060008@freebsd.org>
Content-Type: text/plain; charset="UTF-8"
Date: Sat, 08 Dec 2012 17:05:31 -0800
Message-ID: <1355015131.6752.12.camel@localhost.localdomain>
Mime-Version: 1.0
X-Mailer: Evolution 2.32.3 (2.32.3-1.fc14) 
Content-Transfer-Encoding: 7bit
Cc: freebsd-hackers@freebsd.org

On Sun, 2012-12-09 at 00:50 +0100, Andre Oppermann wrote:
> > Hi folks,
> >
> > Our QA group (at xxx) using Samba and smbtorture has been seeing a
> > lot of cases where accept returns ECONNABORTED because the system load
> > is high and Samba has a large listen backlog.
> >
> > Every now and then we get a crash in smbd or in winbindd and winbindd
> > complains of too many open files in the system.
> >
> > In looking at kern_accept, it seems to me that FreeBSD can leak a socket
> > when kern_accept calls soaccept on it but gets ECONNABORTED. This error
> > is the only error returned from tcp_usr_accept.
> >
> > It seems like the socket taken off so_comp is never freed in this case
> > and that there has been a call on soref on it as well, so that something
> > like the following is needed in the error path:
> >
> > ==== //some-path/freebsd/sys/kern/uipc_syscalls.c#1
> > - /home/rsharpe/dev-src/packages/freebsd/sys/kern/uipc_syscalls.c ====
> > @@ -433,6 +433,14 @@
> >                  */
> >                 if (name)
> >                         *namelen = 0;
> > +               /*
> > +                * We need to close the socket we unlinked
> > +                * so we do not leak it.
> > +                */
> > +               ACCEPT_LOCK();
> > +               SOCK_LOCK(so);
> > +               soclose(so);
> >                 goto noconnection;
> >         }
> >         if (sa == NULL) {
> >
> > I think an soclose is needed at this point because soisconnected has
> > been called on the socket.
> >
> > Do you think this analysis is reasonable?
> >
> > We are using FreeBSD 8.0 but it seems the same is true for 9.0. However,
> > maybe I am wrong since I am not sure if the fdclose call would free the
> > socket, but a quick look suggested that it doesn't.
> 
> The fdclose should properly tear down the file descriptor.  The call
> graph is: fdclose() -> fdrop() -> _fdrop() -> fo_close()/soo_close() ->
> soclose() -> sorele() -> sofree() -> sodealloc().
> 
> A socket leak would not count against "kern.maxfiles" unless the file
> descriptor leaks as well.  So it is unlikely that this is the problem.

OK, thanks for the feedback. I will keep looking.
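
For reference, on the userland side an ECONNABORTED from accept only
means the peer gave up while the connection was still sitting in the
listen queue, so the usual handling is to retry. A minimal sketch,
assuming a hypothetical wrapper accept_retry() (this is not Samba's
actual code, just an illustration of the retry pattern):

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>

/*
 * Hypothetical wrapper: accept one connection on lfd, retrying when
 * the peer aborted the connection before we got to it (ECONNABORTED)
 * or when a signal interrupted the call (EINTR).
 * Returns the new descriptor, or -1 with errno set on any other error.
 */
static int
accept_retry(int lfd)
{
	for (;;) {
		int fd = accept(lfd, NULL, NULL);

		if (fd >= 0)
			return (fd);
		if (errno == ECONNABORTED || errno == EINTR)
			continue;	/* transient, try the next connection */
		return (-1);		/* real error, let the caller decide */
	}
}
```

No descriptor is handed to the process on the ECONNABORTED path, so
there is nothing for the caller to close there.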

> Samba may open a large number of files (real files and sockets) and
> you may run into the maxfiles limit.  You can check the limit with
> "sysctl kern.maxfiles" and increase it at boot time in boot/loader.conf
> with "kern.maxfiles=100000" for example.

Well, some of the smbds are dying, but it is possible that there is a
file leak in Samba or our VFS that we are tripping over as well.
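
One rough way to check for a descriptor leak from inside a process is
to count its open descriptors before and after a batch of work and see
whether the number keeps growing. A minimal sketch, assuming a
hypothetical helper count_open_fds() (not something that exists in
Samba) and an arbitrary cap on how far it scans:

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: count the descriptors currently open in this
 * process by probing every slot with fcntl(F_GETFD). A slot that
 * fails with EBADF is not open; anything else is counted.
 */
static int
count_open_fds(void)
{
	long max = sysconf(_SC_OPEN_MAX);
	int count = 0;

	/* Arbitrary cap (an assumption) so the scan stays cheap. */
	if (max < 0 || max > 65536)
		max = 65536;
	for (int fd = 0; fd < (int)max; fd++) {
		if (fcntl(fd, F_GETFD) != -1 || errno != EBADF)
			count++;
	}
	return (count);
}
```

Logging this value periodically from a long-running daemon would show
whether the count climbs steadily under load. Descriptors numbered
above the cap are not seen by this sketch.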