Date:      Tue, 8 May 2007 15:14:29 +0200 (CEST)
From:      Oliver Fromme <olli@lurza.secnetix.de>
To:        freebsd-stable@FreeBSD.ORG, scrappy@FreeBSD.ORG
Subject:   Re: Socket leak (Was: Re: What triggers "No Buffer Space Available"?)
Message-ID:  <200705081314.l48DETdC084404@lurza.secnetix.de>
In-Reply-To: <FEFED2059AA65F621ECB6111@ganymede.hub.org>

Marc G. Fournier wrote:
 > Oliver Fromme wrote:
 > > If I remember correctly, you wrote that 11k sockets are
 > > in use with 90 jails.  That's about 120 sockets per jail,
 > > which isn't out of the ordinary.  Of course it depends on
 > > what is running in those jails, but my guess is that you
 > > just need to increase the limit on the number of sockets
 > > (i.e. kern.ipc.maxsockets).
 > 
 > The problem is that if I compare it to another server, running 2/3 as
 > many jails, I'm finding it's using 1/4 as many sockets, after over 60
 > days of uptime:
 > 
 > kern.ipc.numopensockets: 3929
 > kern.ipc.maxsockets: 12328

What kind of jails are those?  What applications are
running inside them?  It's quite possible that the
processes on one machine use 120 sockets per jail,
while on a different machine they use only half that
many per jail, on average.  Of course, I can't tell
for sure without knowing what is running in those
jails.
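
By the way, to get a rough per-jail breakdown, you could run
sockstat inside each jail, e.g. with a small /bin/sh loop like
the sketch below (the awk just skips the jls header line, and
each count includes sockstat's own header line, so subtract
one).  I'm not 100% sure how well sockstat confines itself to
the jail it runs in on 6.x, so treat the numbers as an
approximation:

    for J in $(jls | awk 'NR > 1 { print $1 }'); do
            printf 'jail %s: ' "$J"
            jexec "$J" sockstat | wc -l
    done

A jail that sits far above the ~120 average would be the first
place to look.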

 > But, let's try what I think Matt suggested ...

Yes, that was a good suggestion.

 > right now, I'm at just over 11k sockets on that machine, so I'm going
 > to shut down everything except a bare-minimum server (all jails shut
 > off) and see where sockets drop to after that ...
 > 
 > I'm down to ~7400 sockets:
 > 
 > kern.ipc.numopensockets: 7400
 > kern.ipc.maxsockets: 12328
 > 
 > ps looks like:
 > 
 > mars# ps aux
 > USER     PID %CPU %MEM   VSZ   RSS  TT  STAT STARTED      TIME COMMAND
 > [kernel threads omitted]
 > root       1  0.0  0.0   768   232  ??  ILs  Sat12PM   3:22.01 /sbin/init --
 > root     480  0.0  0.0   528   244  ??  Is   Sat12PM   0:04.32 /sbin/devd
 > root     539  0.0  0.0  1388   848  ??  Ss   Sat12PM   0:07.21 /usr/sbin/syslogd -l /var/run/log -l /var/named/var/run/log -s -s
 > daemon   708  0.0  0.0  1316   748  ??  Ss   Sat12PM   0:02.49 /usr/sbin/rwhod
 > root     749  0.0  0.0  3532  1824  ??  Is   Sat12PM   0:07.60 /usr/sbin/sshd
 > root     768  0.0  0.0  1412   920  ??  Is   Sat12PM   0:02.23 /usr/sbin/cron -s
 > root    2087  0.0  0.0  2132  1360  ??  Ss   Sat01PM   0:04.73 screen -R
 > root   88103  0.0  0.1  6276  2600  ??  Ss   11:41PM   0:00.62 sshd: root@ttyp0 (sshd)
 > root   91218  0.0  0.1  6276  2664  ??  Ss   11:49PM   0:00.24 sshd: root@ttyp4 (sshd)
 > root     813  0.0  0.0  1352   748  v0  Is+  Sat12PM   0:00.00 /usr/libexec/getty Pc ttyv0
 > root   88106  0.0  0.1  5160  2516  p0  Ss   11:41PM   0:00.20 -tcsh (tcsh)
 > root   97563  0.0  0.0  1468   804  p0  R+   12:17AM   0:00.00 ps aux
 > root    2088  0.0  0.1  5352  2368  p2  Is+  Sat01PM   0:00.03 /bin/tcsh
 > root    2112  0.0  0.1  5220  2360  p3  Ss+  Sat01PM   0:00.04 /bin/tcsh
 > root   91221  0.0  0.1  5140  2440  p4  Ss+  11:49PM   0:00.12 -tcsh (tcsh)

I don't think those processes should consume 7400 sockets.
Indeed, this really looks like a leak in the kernel.
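
You can double-check by tallying the sockets that are actually
attached to processes.  Here is a sketch in awk, assuming that
fstat's fifth column reads "local" or "internet" for socket
descriptors, as it does on the systems I checked -- verify that
against your own output:

    fstat | awk '$5 == "local" || $5 == "internet" { n[$2]++ } END { for (c in n) print n[c], c }' | sort -rn

That prints a per-command socket count; for the process list
above I would expect the total to be a few dozen at most,
nowhere near 7400.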

 > And netstat -n -funix shows 7355 lines similar to:
 > 
 > d05f1000 stream      0      0        0 d05f1090        0        0
 > d05f1090 stream      0      0        0 d05f1000        0        0
 > cf1be000 stream      0      0        0 cf1bdea0        0        0
 > cf1bdea0 stream      0      0        0 cf1be000        0        0
 > cec42bd0 stream      0      0        0 cf2ac480        0        0
 > cf2ac480 stream      0      0        0 cec42bd0        0        0
 > 
 > with the final few associated with running processes:

How do you determine that?  You _cannot_ tell from netstat
which sockets are associated with running processes.

 > I'm willing to shut everything down like this again the next time it
 > happens (in 2-3 days) if someone has some other command whose output
 > they'd like me to provide?

Maybe "sockstat -u" and/or "fstat | grep -w local" (both
of those commands should basically list the same kind of
information).  My guess is that the output will be rather
short, i.e. much shorter than 7355 lines.  If that's true,
it is another indication that the problem is caused by
a kernel leak.
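
For a quick comparison (the header lines skew each count by a
line or two, so the numbers are approximate):

    netstat -n -f unix | wc -l
    sockstat -u | wc -l
    fstat | grep -w local | wc -l

If the netstat count stays in the thousands while the sockstat
and fstat counts are small, the extra sockets exist only inside
the kernel, with no process holding a descriptor for them.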

 > And, I have the following outputs as of the above, where everything is
 > shut down and it's running on minimal processes:
 > 
 > # ls -lt
 > total 532
 > -rw-r--r--  1 root  wheel   11142 May  8 00:20 fstat.out
 > -rw-r--r--  1 root  wheel     742 May  8 00:20 netstat_m.out
 > -rw-r--r--  1 root  wheel  486047 May  8 00:20 netstat_na.out
 > -rw-r--r--  1 root  wheel     735 May  8 00:20 sockstat.out
                                  ^^^
Aha.  :-)  Your sockstat output is a mere 735 bytes, i.e. it
lists only a handful of sockets, while netstat sees more than
7000.  So the orphaned sockets are not attached to any process,
which again points at a leak in the kernel.

Best regards
   Oliver

-- 
Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M.
Handelsregister: Registergericht Muenchen, HRA 74606,  Geschäftsfuehrung:
secnetix Verwaltungsgesellsch. mbH, Handelsregister: Registergericht Mün-
chen, HRB 125758,  Geschäftsführer: Maik Bachmann, Olaf Erb, Ralf Gebhart

FreeBSD-Dienstleistungen, -Produkte und mehr:  http://www.secnetix.de/bsd

"C++ is the only current language making COBOL look good."
        -- Bertrand Meyer
