Date: Sat, 26 Feb 2005 10:39:38 +0000 (GMT)
From: Robert Watson
To: Bob Johnson
cc: freebsd-stable@freebsd.org
Subject: Re: HEADS UP: netipx mega-MFC (1/2)

On Sat, 26 Feb 2005, Bob Johnson wrote:

> I wasn't planning to be near that system again until Monday, although
> if you need details about my configuration I can dig them out remotely
> and send them to you. I'm not brave enough to risk another panic when
> I'm not physically there to reset the system, but if you are willing to
> work on it this weekend I'm willing to drive over there and test it. I
> would REALLY like to see the Netware stuff working in 5.4R.
>
> A few weeks ago I had the same panic when I tried to set up Netware
> support on 5.3-Release. It went away when I updated to -stable then (I
> believe it was around Jan 24), but the packets going out on the wire
> were not quite right, so I couldn't actually use it for networking.
So to confirm what I think you're saying:

- 5.3-R panicked configuring ipx/ncp/nwfs against a Netware server.
- 5.3-S up until my recent changes didn't panic, but there appeared to be
  on-the-wire corruption.
- 5.3-S (5.4-P) as of yesterday now panics again.

So it sounds like we're still dealing with at least two problems: some sort
of panic, and an on-the-wire problem. I think the first order of business is
to get the panic fixed. If you're seeing a fault, chances are it's a pointer
botch of some sort.

Here are some things you could do to help me debug these problems:

(1) Compile your kernel with DDB/KDB, and configure a dump partition. Make
sure you have a kernel with debugging symbols on hand. If you've not done
this before, instructions can be found in the handbook. Many bugs can be
debugged using just DDB/KDB, but a dump for post-mortem analysis can be
quite helpful for more complex bugs. Even if we don't manage to get kernel
dumps, we'll need the kernel with debugging symbols to convert addresses
into lines of source code.

(2) When reporting a panic, please report the exact steps that led to it.
I can imagine a number of bugs that could trigger at different points, and
I'm not yet clear on which one this is. For example, the IPX code might
panic when ifconfig runs to configure an address, or the panic might happen
at file system mount time when you call mount_nwfs. Or perhaps the panic
happens later, on first file access, or only after some period of activity.
Knowing which of these it is would be very helpful in narrowing down the
source of the problem.

(3) When reporting a panic, it's helpful to have as much of the trap or
panic output as possible. I don't know if you're currently using a serial
console, but I find that a serial console is very helpful in gathering
debugging information, as it makes it easy to copy and paste output.
If you get into DDB following the panic, the commands "show pcpu", "ps",
and "trace" are almost always good starting points for debugging. With a
serial console, sending that output by e-mail will be dramatically easier
:-). When not running with a serial console, many people use digital
cameras to take pictures of debugger output, because that's still more
convenient than trying to write it down or type it in (lots of hex digits
:-).

I'll be pretty available this weekend to help with debugging this.

Not sure if you've done it yet, but it might be useful to boot the previous
kernel and make sure that the panic happens only with the new kernel, and
that it wasn't triggered by some other change in your environment. That
seems fairly unlikely, but it's good to check assumptions, because doing so
can save a lot of time and confusion :-).

Robert N M Watson