Date:      Thu, 3 Aug 2006 10:43:21 +0100 (BST)
From:      Robert Watson <rwatson@FreeBSD.org>
To:        Helge Oldach <freebsd-cvs-src@oldach.net>
Cc:        scottl@samsco.org, src-committers@FreeBSD.org, Hajimu UMEMOTO <ume@FreeBSD.org>, cvs-src@FreeBSD.org, cvs-all@FreeBSD.org, bz@FreeBSD.org, kensmith@cse.Buffalo.EDU
Subject:   Re: cvs commit: src/sys/sys param.h src/include Makefile netdb.h res_update.h resolv.h src/include/arpa inet.h nameser.h nameser
Message-ID:  <20060803104026.A45647@fledge.watson.org>
In-Reply-To: <200608030536.k735aIT3081092@sep.oldach.net>
References:  <200608030536.k735aIT3081092@sep.oldach.net>

On Thu, 3 Aug 2006, Helge Oldach wrote:

> Well... I've spotted a regression not with the ports tree but with 6-STABLE. 
> On several boxes with this change applied I see lots of sendmails stacking 
> up over time, for example:
>
>  713  ??  Ss     0:01.05 sendmail: accepting connections (sendmail)
>  717  ??  Is     0:00.02 sendmail: Queue runner@00:30:00 for /var/spool/client
> 31747  ??  I      0:00.00 sendmail: startup with 71.119.31.81 (sendmail)
> 32834  ??  I      0:00.00 sendmail: startup with 83.36.190.38 (sendmail)
> 33569  ??  I      0:00.00 sendmail: startup with 221.206.76.60 (sendmail)
> 34023  ??  I      0:00.00 sendmail: startup with 49.195.192.61.tokyo.flets.alph
> 34459  ??  I      0:00.00 sendmail: startup with 221.165.35.46 (sendmail)
> 36517  ??  I      0:00.00 sendmail: startup with 61.192.180.137 (sendmail)
> 38722  ??  I      0:00.00 sendmail: startup with 203.177.238.78 (sendmail)
> 39126  ??  I      0:00.00 sendmail: startup with 222.90.251.185 (sendmail)
> 39203  ??  I      0:00.00 sendmail: startup with 221.9.214.183 (sendmail)
> 39859  ??  I      0:00.00 sendmail: startup with 59.20.101.111 (sendmail)
> 41090  ??  I      0:00.00 sendmail: startup with 61.192.166.235 (sendmail)
> 41766  ??  I      0:00.00 sendmail: startup with 68.118.52.132 (sendmail)
> 42482  ??  I      0:00.00 sendmail: startup with 219.249.201.36 (sendmail)
> 42483  ??  I      0:00.00 sendmail: startup with 219.249.201.36 (sendmail)
> 43467  ??  I      0:00.00 sendmail: startup with 210.213.191.70 (sendmail)
> 43757  ??  I      0:00.00 sendmail: startup with 220.189.144.7 (sendmail)
> 44176  ??  I      0:00.00 sendmail: startup with 71.205.226.98 (sendmail)
> 44850  ??  I      0:00.00 sendmail: startup with 72.89.135.133 (sendmail)
> 44943  ??  I      0:00.00 sendmail: startup with 220.167.134.212 (sendmail)
> 48031  ??  I      0:00.00 sendmail: startup with 60.22.198.23 (sendmail)
>
> On one busy sendmail box I've seen literally thousands of such processes. 
> Note that these processes don't disappear, so it is not related to 
> sendmail.cf's timeouts.
>
> Browsing through the recent STABLE commits, I first thought it was related 
> to the recent socket code changes, but no, it's not. It is definitely this 
> introduction of BIND9's resolver. If I back out this change, all is fine 
> again.
>
> As said, this is a very recent 6-STABLE. I'm tracking CTM, not cvs.
>
> I would seriously suggest testing this more thoroughly. I'm not asking to 
> back it out right now, but this is definitely a breakage in 6-STABLE that 
> should be fixed before 6.2.

I've had a similar report from Bjoern Zeeb; at first we thought his TCP 
connections were stacking up because of a bug I introduced in 7.x, but it turns 
out it's because his sshd is wedging in name resolution and not closing the 
TCP sockets (which are now visible in netstat in a way they weren't before). 
We only concluded that it was not a kernel socket bug a day or so ago, so I'm 
not sure he's had a chance to generate a resolver bug report.  He reported 
that the application appeared to have two connected UDP sockets for name 
resolution, and one bad name server entry, but that the resolver appeared to 
be blocked in a read on the UDP socket that didn't have data queued, rather 
than the one that did.  This was all from looking at netstat, and as far as I 
know, he's not dug into the resolver yet to see what might be happening.  I've 
CC'd Bjoern in case he has further insight or can offer some more suggestions 
on what might be going on.
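
To make the reported symptom concrete, here is a minimal sketch of the situation 
as described from netstat: two connected UDP sockets, one talking to a name 
server that never answers and one with a reply already queued. This is not the 
resolver's code and says nothing about where the actual bug is; the helper 
names (make_connected_udp_pair, read_first_reply) and the loopback setup are 
assumptions made purely for illustration. It shows that a read on the silent 
socket never returns data, while waiting on both sockets with select() finds 
the one that has it.

```python
import select
import socket


def make_connected_udp_pair():
    """Return (client, server) UDP sockets on loopback, connected to each other."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client.bind(("127.0.0.1", 0))
    client.connect(server.getsockname())
    server.connect(client.getsockname())
    return client, server


def read_first_reply(socks, timeout):
    """Wait on all sockets at once; return (index, data), or None on timeout."""
    readable, _, _ = select.select(socks, [], [], timeout)
    for index, sock in enumerate(socks):
        if sock in readable:
            return index, sock.recv(512)
    return None


# Hypothetical setup: the first "name server" never answers (the bad entry),
# the second one has already sent its reply.
bad_client, bad_server = make_connected_udp_pair()
good_client, good_server = make_connected_udp_pair()
good_server.send(b"reply")

# Reading only from the silent socket gets nothing. A short timeout stands in
# for the indefinite block so that this example terminates.
bad_client.settimeout(0.2)
try:
    bad_client.recv(512)
    blocked = False
except socket.timeout:
    blocked = True

# Waiting on both sockets picks up the reply that is actually queued.
result = read_first_reply([bad_client, good_client], timeout=2.0)

for sock in (bad_client, bad_server, good_client, good_server):
    sock.close()
```

Here blocked ends up True and result identifies the second socket, which 
matches the picture of a process stuck reading the socket with no data queued 
while the answer sits on the other one.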

Robert N M Watson
Computer Laboratory
University of Cambridge
