From: John Baldwin
To: Ian FREISLICH
Cc: Glen Barber, freebsd-current@freebsd.org, "Robert N. M. Watson", Peter Wemm
Subject: Re: panic: in_pcblookup_local (?)
Date: Thu, 2 May 2013 14:32:34 -0400

On Thursday, May 02, 2013 1:53:47 pm Ian FREISLICH wrote:
> John Baldwin wrote:
> > On Thursday, May 02, 2013 7:25:08 am Robert N. M. Watson wrote:
> > >
> > > On 2 May 2013, at 11:42, Glen Barber wrote:
> > >
> > > > Hmm. Perhaps it would be worthwhile for me to rebuild the current
> > > > kernel with DDB support. It looks like the machine has panicked a few
> > > > times over the last two weeks or so, but based on the timestamps of the
> > > > crash dumps and nagios complaints, happened during the middle of the
> > > > night when I would not have really noticed, or otherwise would have just
> > > > blamed my ISP.
> > > >
> > > > Two of the panics are ath(4) related. One looks similar to the one
> > > > referenced in this thread, similarly triggered by a CFEngine process.
> > > >
> > > > In that case, the backtrace looks like:
> > > >
> > > > #4 0xffffffff808cdbb3 at calltrap+0x8
> > > > #5 0xffffffff807371d8 at in_pcb_lport+0x128
> > > > #6 0xffffffff8073745a at in_pcbbind_setup+0x16a
> > > > #7 0xffffffff80737d8e at in_pcbconnect_setup+0x71e
> > > > #8 0xffffffff80737df9 at in_pcbconnect_mbuf+0x59
> > > > #9 0xffffffff807bf29f at udp_connect+0x11f
> > > > #10 0xffffffff80680615 at kern_connectat+0x275
> > > >
> > > > Regarding DDB though, it would be rather difficult to access the machine
> > > > if it drops to a DDB debugger session, since the machine acts as my
> > > > firewall.
> > >
> > > Thanks -- will take a look at the attached.
> > >
> > > FWIW, though, I'm worried by the number of panics you are seeing, especially
> > > given that they involve multiple subsystems, and in particular, John's
> > > observation about a potentially corrupted pointer. This makes me wonder
> > > whether (a) you are experiencing hardware faults -- it would be worth running
> > > some memory/cpu/etc tests and (b) if we might be seeing a software memory
> > > corruption bug of some sort.
> >
> > Other users have reported this (Ian Lepore), and Peter Wemm can now reproduce
> > these at will as well, so I think this is a software bug. What might be
> > easiest if we can't figure this out from the crashdump is just to bisect the
> > offending revision.
>
> I've started a binary search. I'll let you know what that turns up.

Thanks, and sorry for getting my Ians mixed up. :-/

-- 
John Baldwin