From owner-freebsd-stable@FreeBSD.ORG Thu Sep 18 19:13:59 2008
Date: Thu, 18 Sep 2008 20:13:58 +0100 (BST)
From: Robert Watson
To: "Oleg V. Nauman"
Cc: freebsd-stable@freebsd.org
In-Reply-To: <20080918180543.pt7s2zmaio48ww8g@webmail.opentransfer.com>
References: <20080918180543.pt7s2zmaio48ww8g@webmail.opentransfer.com>
Subject: Re: RELENG_7: something is very wrong with UDP?

On Thu, 18 Sep 2008, Oleg V. Nauman wrote:

> It seems to be something is very wrong with UDP on latest RELENG_7
>
> Well some symptoms I have seen today when I was trying to boot newly
> compiled RELENG_7 on my laptop:
>
> a) rc scripts indefinitely waiting on logger to be completed during the boot
> ( devd and ifconfig are good examples)

If you hit "ctrl-t" while these are waiting, what is the output?

> b) Sporadic DNS request failures

I don't know what your comfort level with debugging tools is, but if you're happy using tcpdump, etc., I'd recommend diagnosing this directly that way. I'd probably do something like this:

(1) Start by deleting all but one nameserver entry in /etc/resolv.conf. Confirm that you can still reproduce the problem.

(2) Use dig(1) and tcpdump(1) to watch wire-level DNS behavior -- do you see queries go out? Do you see replies come back? Is dig "waking up" and seeing the replies when they arrive, or is there a delay or hang in dig? If dig hangs, what sleep state (wmesg) does ctrl-t show? Could you also use procstat -k on the dig process to generate a kernel stack trace for it?

> c) traceroute prints 0.00 like response time for every host
>
> d) was unable to reboot my laptop performing shutdown -r ( due to
> logger/syslog related issues I think)

Could you try killing syslogd by hand and see if it dies? If not, can you use procstat -kk to generate a stack trace for it?

> e ) I was unable to start X session ( it seems to be freezes laptop because
> I was unable to switch to another virtual console even)
>
> csup "backout" to date=2008.09.15.12.00.00 and recompiling the kernel fixes
> this issue for me.

This is approximately the date of my last UDP MFC. Could you try backing out just src/sys/netinet6/udp6_usrreq.c revision 1.81.2.7 and see if that helps? (Specifically, restore the use of sosend_generic instead of sosend_dgram.)

Could you confirm that either you're not using any kernel modules from ports, or that if you are, you have recompiled them since your most recent update?

Could you try compiling your kernel with WITNESS to see if we get any extended debugging information?

> Is anybody experiencing the same issues with fresh RELENG_7? Unsure it is my
> local issues though

I'm not experiencing them, but these sorts of things can be quite subtle and workload-dependent.
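
As a rough illustration of the check in step (2) above, here is a minimal sketch of a standalone UDP DNS probe that could be run alongside tcpdump. It is not part of the original advice and does not replace dig(1); the function names build_query and probe, the fixed query ID, and the 3-second timeout are all assumptions made for the example. It sends one A-record query and reports whether a reply arrived and how long the process waited for it, which may help separate "no reply on the wire" from "reply arrived but the process was not woken".

```python
import socket
import struct
import time


def build_query(name, query_id=0x1234):
    """Build a minimal DNS query for an A record (RFC 1035 wire format).

    The fixed query_id is an assumption for this sketch; real resolvers
    use a random ID per query.
    """
    # Header: ID, flags (only RD set), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A), QCLASS=1 (IN).
    return header + qname + struct.pack("!HH", 1, 1)


def probe(server, name, timeout=3.0, port=53):
    """Send one query over UDP and wait for a single datagram in return.

    Returns (elapsed_seconds, reply_length, id_matches), or None if nothing
    arrived before the (hypothetical, 3-second default) timeout.
    """
    query = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.monotonic()
        sock.sendto(query, (server, port))
        try:
            reply, _ = sock.recvfrom(4096)
        except socket.timeout:
            return None
        elapsed = time.monotonic() - start
    # A genuine reply echoes the 16-bit query ID in its first two bytes.
    return elapsed, len(reply), reply[:2] == query[:2]
```

Calling probe() with the single nameserver left in /etc/resolv.conf and any resolvable name, while tcpdump watches UDP port 53, gives two views of the same exchange: if tcpdump shows the reply on the wire but probe() returns None or a long elapsed time, that points at delivery to the socket rather than at the network.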

Robert N M Watson
Computer Laboratory
University of Cambridge