From owner-freebsd-current Mon Jan 26 21:58:38 1998
Return-Path:
Received: (from majordom@localhost) by hub.freebsd.org (8.8.8/8.8.8)
          id VAA14811 for current-outgoing; Mon, 26 Jan 1998 21:58:38 -0800 (PST)
          (envelope-from owner-freebsd-current@FreeBSD.ORG)
Received: from skynet.ctr.columbia.edu (skynet.ctr.columbia.edu [128.59.64.70])
          by hub.freebsd.org (8.8.8/8.8.8) with SMTP id VAA14802;
          Mon, 26 Jan 1998 21:58:30 -0800 (PST)
          (envelope-from wpaul@skynet.ctr.columbia.edu)
Received: (from wpaul@localhost) by skynet.ctr.columbia.edu (8.6.12/8.6.9)
          id AAA07933; Tue, 27 Jan 1998 00:59:39 -0500
From: Bill Paul
Message-Id: <199801270559.AAA07933@skynet.ctr.columbia.edu>
Subject: Re: Another observation on -current and NFS
To: grog@lemis.com (Greg Lehey)
Date: Tue, 27 Jan 1998 00:59:37 -0500 (EST)
Cc: current@FreeBSD.ORG, dyson@FreeBSD.ORG
In-Reply-To: <19980127155557.45414@lemis.com> from "Greg Lehey" at Jan 27, 98 03:55:57 pm
X-Mailer: ELM [version 2.4 PL24]
Content-Type: text
Sender: owner-freebsd-current@FreeBSD.ORG
Precedence: bulk

Of all the gin joints in all the towns in all the world, Greg Lehey had
to walk into mine and say:

> OK, I've built a new -current as well on my test machine. I haven't
> had long enough to decide if there are any differences from a kernel
> built a month ago, but I've seen one similarity:
>
> I build my kernels on an NFS-mounted file system and leave my /usr
> read-only (this speeds up reboots after panics :-). I find I
> *consistently* can't strip a debug kernel across the net. The
> connection hangs, the local keyboard hangs, and about the only thing
> that works is a ping from another system.

This is an indication that some kernel routine is caught in an infinite
loop of some kind. In fact, this sounds suspiciously like the problem I
had with mmap() and NFS.

> I have dumps of this situation, and I'm sure I can create more. The
> relevant part of the stack trace looks like:

[...]

> #12 0xf0134384 in vfs_vmio_release (bp=0xf24e556c) at ../../kern/vfs_bio.c:756
> #13 0xf0134080 in brelse (bp=0xf24e556c) at ../../kern/vfs_bio.c:647
> #14 0xf0166316 in nfs_bioread (vp=0xf0508900, uio=0xf3a41e34, ioflag=8, cred=0xf0547b80, getpages=1)
>     at ../../nfs/nfs_bio.c:346
> #15 0xf0165b64 in nfs_getpages (ap=0xf3a41e70) at ../../nfs/nfs_bio.c:116
> #16 0xf01d0468 in vnode_pager_getpages (object=0xf3cced80, m=0xf3a41f38, count=1, reqpage=0) at vnode_if.h:1011
> #17 0xf01cf0b3 in vm_pager_get_pages (object=0xf3cced80, m=0xf3a41f38, count=1, reqpage=0) at ../../vm/vm_pager.c:184
> #18 0xf01c4750 in vm_fault (map=0xf0520200, vaddr=538910720, fault_type=3 '\003', fault_flags=8)
>     at ../../vm/vm_fault.c:419

[...]

Oh boy, does this look familiar. This is almost exactly what I saw with
the mmap() problem. The problem is that nfs_bioread() tries to evict a
buffer it doesn't like from the buffer cache, but the buffer never
actually gets flushed. In my case, the buffer in question was supposed
to cover an 8K region, but only the first 4K was actually mapped. When
my test program tried to touch the second 4K, it triggered a page fault,
causing the system to try to load the new 4K page into memory. But when
nfs_bioread() is called, it finds the cached buffer, which claims to
cover the desired region but really only covers the first 4K. So
nfs_bioread() calls brelse(), thinking this will evict the buffer from
the cache so that the next time through the loop it will perform an
actual read from the server to fetch the desired block.
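To make the shape of the loop concrete, here is a small self-contained
toy model of the pattern just described. This is not the actual
nfs_bio.c code: getcacheblk(), release(), and the buffer fields here
are simplified stand-ins for nfs_getcacheblk(), brelse(), and struct
buf, and the retry counter exists only so the demo terminates instead
of spinning the way the real kernel does:

/*
 * Toy model of the eviction loop: the cache keeps handing back the
 * same buffer, which is only valid for its first 4K, and releasing it
 * with B_NOCACHE set never actually throws it out of the cache.
 */
#include <stdio.h>

#define B_NOCACHE 0x1

struct buf {
	int b_flags;
	int b_validoff;		/* first valid byte in the buffer */
	int b_validend;		/* last valid byte + 1 */
};

/* One cached 8K buffer of which only the first 4K is valid. */
static struct buf cached = { 0, 0, 4096 };

/* Stand-in for nfs_getcacheblk(): always finds the same cached buffer. */
static struct buf *
getcacheblk(void)
{
	return (&cached);
}

/* Stand-in for brelse(): in the buggy case, B_NOCACHE is ignored. */
static void
release(struct buf *bp)
{
	(void)bp;		/* buffer stays in the cache */
}

int
main(void)
{
	int on = 4096, n = 4096;	/* caller wants the second 4K */
	int tries = 0;
	struct buf *bp;

again:
	bp = getcacheblk();

	if (bp->b_validoff > on || bp->b_validend < on + n) {
		/* Requested range isn't valid: try to toss the buffer. */
		bp->b_flags |= B_NOCACHE;
		release(bp);
		if (++tries < 5)	/* cap it so the demo terminates */
			goto again;
		printf("gave up after %d tries; the real loop never does\n",
		    tries);
		return (1);
	}
	printf("buffer covers the range; copy out the data\n");
	return (0);
}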
In fact, brelse() doesn't actually free the buffer, so nfs_bioread()
comes up with the same partially valid buffer (via nfs_getcacheblk())
all over again, tries to evict it again, gets it back again, and so on,
and so on, and so on... nfs_bioread() seems to think that setting the
B_NOCACHE flag on the buffer is enough to make brelse() throw the
buffer out of the cache, but that no longer seems to be the case. Since
the loop passes through quite a few different functions, you're likely
to get several stack traces that look quite different, except for
nfs_bioread(). I know: I banged my head against this problem for a
while myself.

John Dyson made a couple of commits this weekend which I _thought_
would fix this thing; I gave him a sample program that triggers the bug
to help him track it down. Assuming that he got my sample program to
work, I'm guessing there's more than one way to trigger the condition.

> The command in question is the strip.

It appears that strip(1) uses mmap(), so it looks like the same bug.
Either it isn't really fixed, or there's more than one way to trip it.
I would have loved to fix it myself, but I don't understand enough
about the VM system to do it: while I can understand the nature of the
bug, I don't actually know the right way to flush a buffer from the
cache, so I have no idea how to correct it. I tried looking at the UFS
code to see if I could spot any clues, but nothing jumped out at me.

> If anybody wants more information, or access to the machine to look
> at it more carefully, please contact me. As I said before, this does
> not appear to have anything to do with changes made in the last 6
> weeks.

Hmmm... well, the bug I encountered has been there for a couple of
months at least. If you've only started to encounter it now, then maybe
John's attempted fix made it worse somehow. *sigh*

-Bill

PS: Based on some other traffic I've seen come through -current
recently, I'd bet a quarter that the cd9660 filesystem has mmap()
problems too.

-- 
=============================================================================
-Bill Paul            (212) 854-6020 | System Manager, Master of Unix-Fu
Work:         wpaul@ctr.columbia.edu | Center for Telecommunications Research
Home: wpaul@skynet.ctr.columbia.edu  | Columbia University, New York City
=============================================================================
 "It is not I who am crazy; it is I who am mad!" - Ren Hoek, "Space Madness"
=============================================================================