From owner-freebsd-fs Mon Apr 17 13:27:56 2000 Delivered-To: freebsd-fs@freebsd.org Received: from ewok.creative.net.au (fuzzy.aussie.com.au [203.30.44.82]) by hub.freebsd.org (Postfix) with SMTP id 9573A37B9A1 for ; Mon, 17 Apr 2000 13:27:48 -0700 (PDT) (envelope-from freebsd@ewok.creative.net.au) Received: (qmail 67900 invoked by uid 1008); 17 Apr 2000 20:27:35 -0000 Date: Tue, 18 Apr 2000 04:27:35 +0800 From: Adrian Chadd To: Matthew Dillon Cc: Adrian Chadd , freebsd-fs@FreeBSD.ORG, dchapes@borderware.com, freebsd-hackers@FreeBSD.ORG Subject: Re: vnode_free_list corruption [patch] Message-ID: <20000418042733.I59015@ewok.creative.net.au> References: <00Apr14.141908edt.117140@gateway.borderware.com> <20000415023148.F34852@ewok.creative.net.au> <200004141835.LAA71253@apollo.backplane.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 0.95.4i In-Reply-To: <200004141835.LAA71253@apollo.backplane.com>; from Matthew Dillon on Fri, Apr 14, 2000 at 11:35:21AM -0700 Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Fri, Apr 14, 2000, Matthew Dillon wrote: > > :On Fri, Apr 14, 2000, Dave Chapeskie wrote: > :> Greetings. > :> > :> I've been seeing a rash of "free vnode isn't" panics lately. Some > :> machines were panicing several times a day. Along with this we saw > :> occasional "object inconsistent state: RPC: %d, RC: %d" messages. > : > : > :Throw it into a PR, and I'll assign it to myself and take a squizz.. > : > : > :Adrian > > I'll take a look at it too. Either way we'll get something committed. > Beware, though, even though there is obviously a bug (Dave obviously > found the bug!), the vgone/vdone/VDEAD interaction is extremely complex > so we have to be careful not to break other things while fixing this > one. 
Ok, my take on the code is this: * with the trace given, the vnode shouldn't even be marked VDOOMED, as its meant to be in use, * a vnode shouldn't ever reach vbusy when marked VDOOMED, as it should be ref/held and so shouldn't ever be considered to be cleaned, * I think a KASSERT should be added in vbusy() On my machine (400Mhz Celeron, 64mb RAM, single 4.2gig IDE disk) running current from a day ago, I can't reproduce the bug. Are you running with multiple spindles/softupdates ? I'll look at the code some more over the next couple of days. Any opinions ? Adrian To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Mon Apr 17 15:54:37 2000 Delivered-To: freebsd-fs@freebsd.org Received: from borderware.com (gateway.borderware.com [207.236.65.226]) by hub.freebsd.org (Postfix) with ESMTP id DF60137BBA2; Mon, 17 Apr 2000 15:54:31 -0700 (PDT) (envelope-from dchapes@borderware.com) Received: by gateway.borderware.com id <117127>; Mon, 17 Apr 2000 18:51:17 -0400 From: Dave Chapeskie Message-Id: <00Apr17.185117edt.117127@gateway.borderware.com> Date: Mon, 17 Apr 2000 18:54:20 -0400 To: Adrian Chadd , Matthew Dillon Cc: freebsd-fs@freebsd.org Subject: Re: vnode_free_list corruption [patch] References: <00Apr14.141908edt.117140@gateway.borderware.com> <20000415023148.F34852@ewok.creative.net.au> <200004141835.LAA71253@apollo.backplane.com> <20000418042733.I59015@ewok.creative.net.au> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 0.93.2i In-Reply-To: <20000418042733.I59015@ewok.creative.net.au>; from Adrian Chadd on Tue, Apr 18, 2000 at 04:27:35AM +0800 X-no-archive: yes Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org By the way, thanks for looking into this! 
On Tue, Apr 18, 2000 at 04:27:35AM +0800, Adrian Chadd wrote:
> Ok, my take on the code is this:
>
> * with the trace given, the vnode shouldn't even be marked VDOOMED, as its
>   meant to be in use,
> * a vnode shouldn't ever reach vbusy when marked VDOOMED, as it should be
>   ref/held and so shouldn't ever be considered to be cleaned,
> * I think a KASSERT should be added in vbusy()

Since the situation is known to happen, at least I know it does :-),
I think it should be a real call to panic as in my patch instead of a
KASSERT that is only enabled if options INVARIANTS is used.  If the
system is fixed to prevent this situation then it can always be changed
to a KASSERT (if the quick check of the flag is too slow for people).

> On my machine (400Mhz Celeron, 64mb RAM, single 4.2gig IDE disk) running
> current from a day ago, I can't reproduce the bug. Are you running with
> multiple spindles/softupdates ?

Since I was able to reproduce it on machines with very different CPUs,
memory, and disks I didn't bother to include machine specifications.
The customers that were seeing the problem most often are running "high
end" (whatever that means) machines with SCSI disks.  The machine I used
for testing was a 200 MHz Pentium machine with IDE disks.  Softupdates
was never enabled on any of these systems.

Here is the dmesg output for the machine I did most of my testing on,
only partitions on wd2 were mounted during the tests.  On my workstation
with 64MB of RAM it took much longer to happen and I had some other
processes consuming memory, so it might be easier for you to reproduce
it if you lower your available system memory to 32 MB or less (via
MAXMEM or the boot loader of course).  Also, when I did my test, if it
didn't panic within 10-15 minutes it seemed to not panic at all.  I
imagine that once the number of vnodes grows to a certain size it's
just less likely to happen or something.  Most often it would panic,
and not only that, it panicked within a few minutes.
Timecounter "i8254" frequency 1193182 Hz Timecounter "TSC" frequency 200455163 Hz CPU: Pentium/P55C (200.46-MHz 586-class CPU) Origin = "GenuineIntel" Id = 0x543 Stepping = 3 Features=0x8001bf real memory = 33554432 (32768K bytes) avail memory = 23916544 (23356K bytes) Preloaded elf kernel "kernel" at 0xc0376000. Preloaded userconfig_script "/boot/kernel.conf" at 0xc037609c. Probing for devices on PCI bus 0: chip0: rev 0x01 on pci0.0.0 chip1: rev 0x01 on pci0.1.0 ide_pci0: rev 0x01 on pci0.1.1 chip2: rev 0x01 on pci0.1.3 xl0: <3Com 3c905B-TX Fast Etherlink XL> rev 0x24 int a irq 9 on pci0.10.0 xl0: Ethernet address: 00:10:4b:9e:8c:b2 xl0: autoneg complete, link status good (full-duplex, 100Mbps) xl1: <3Com 3c905B-TX Fast Etherlink XL> rev 0x24 int a irq 10 on pci0.11.0 xl1: Ethernet address: 00:10:4b:79:0a:10 xl1: autoneg complete, link status good (half-duplex, 10Mbps) vr0: rev 0x06 int a irq 11 on pci0.12.0 vr0: Ethernet address: 00:80:c8:ec:73:51 vr0: autoneg complete, link status good (half-duplex, 100Mbps) Probing for PnP devices: Probing for devices on the ISA bus: sc0 on isa sc0: VGA color <16 virtual consoles, flags=0x0> atkbdc0 at 0x60-0x6f on motherboard atkbd0 irq 1 on isa psm0 irq 12 on isa psm0: model Generic PS/2 mouse, device ID 0 sio0 at 0x3f8-0x3ff irq 4 flags 0x10 on isa sio0: type 16550A sio1 at 0x2f8-0x2ff irq 3 on isa sio1: type 16550A fdc0 at 0x3f0-0x3f7 irq 6 drq 2 on isa fdc0: FIFO enabled, 8 bytes threshold fd0: 1.44MB 3.5in wdc0 at 0x1f0-0x1f7 irq 14 on isa wdc0: unit 0 (wd0): wd0: 3098MB (6346368 sectors), 6296 cyls, 16 heads, 63 S/T, 512 B/S [^^^ not used/mounted at all ^^^] wdc1 at 0x170-0x177 irq 15 on isa wdc1: unit 0 (wd2): wd2: 3093MB (6335280 sectors), 6704 cyls, 15 heads, 63 S/T, 512 B/S ida: port address (0xffffffff) out of range Vendor Specific Word = ffff Vendor Specific Word = ffff Vendor Specific Word = ffff Vendor Specific Word = ffff Vendor Specific Word = ffff vga0 at 0x3b0-0x3df maddr 0xa0000 msize 131072 on isa npx0 on 
motherboard
npx0: INT 16 interface
Intel Pentium detected, installing workaround for F00F bug
IP packet filtering initialized, divert enabled, rule-based forwarding disabled, unlimited logging
changing root device to wd2s1a
Start pid=2
Start pid=3
Start pid=4
xl0: autoneg complete, link status good (full-duplex, 100Mbps)
xl1: autoneg complete, link status good (half-duplex, 10Mbps)
vr0: autoneg complete, link status good (half-duplex, 100Mbps)

> I'll look at the code some more over the next couple of days. Any opinions ?

I haven't had the time to look at the code since I came up with the
patch (which works for our setups so we're reasonably happy and I'm busy
doing other things) but after reading Kirk's opinions on the matter I'd
tend to agree and think that vbusy/vhold shouldn't be mucking with the
free list the way they do.  I'd guess that either they need to be able
to check for and return an error or else v_holdcnt should disappear in
favour of just using v_usecount.  I didn't see any semantic differences
between the two (but I didn't look too hard either).
-- 
Dave Chapeskie
Senior Software Engineer
Borderware Technologies Inc.
Mississauga, Ontario, Canada To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Tue Apr 18 2:49:26 2000 Delivered-To: freebsd-fs@freebsd.org Received: from ewok.creative.net.au (fuzzy.aussie.com.au [203.30.44.82]) by hub.freebsd.org (Postfix) with SMTP id 4565737B6C3 for ; Tue, 18 Apr 2000 02:49:21 -0700 (PDT) (envelope-from freebsd@ewok.creative.net.au) Received: (qmail 72134 invoked by uid 1008); 18 Apr 2000 09:46:11 -0000 Date: Tue, 18 Apr 2000 17:46:11 +0800 From: Adrian Chadd To: Dave Chapeskie Cc: Adrian Chadd , Matthew Dillon , freebsd-fs@freebsd.org Subject: Re: vnode_free_list corruption [patch] Message-ID: <20000418174608.C71428@ewok.creative.net.au> References: <00Apr14.141908edt.117140@gateway.borderware.com> <20000415023148.F34852@ewok.creative.net.au> <200004141835.LAA71253@apollo.backplane.com> <20000418042733.I59015@ewok.creative.net.au> <00Apr17.185117edt.117127@gateway.borderware.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 0.95.4i In-Reply-To: <00Apr17.185117edt.117127@gateway.borderware.com>; from Dave Chapeskie on Mon, Apr 17, 2000 at 06:54:20PM -0400 Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Mon, Apr 17, 2000, Dave Chapeskie wrote: > By the way, thanks for looking into this! > > On Tue, Apr 18, 2000 at 04:27:35AM +0800, Adrian Chadd wrote: > > Ok, my take on the code is this: > > > > * with the trace given, the vnode shouldn't even be marked VDOOMED, as its > > meant to be in use, > > * a vnode shouldn't ever reach vbusy when marked VDOOMED, as it should be > > ref/held and so shouldn't ever be considered to be cleaned, > > * I think a KASSERT should be added in vbusy() > > Since the situation is known to happen, at least I know it does :-), > I think it should be a real call to panic as in my patch instead of a > KASSERT that is only enabled if options INVARIANTS is used. 
If the
> system is fixed to prevent this situation then it can always be changed
> to a KASSERT (if the quick check of the flag is too slow for people).

Yes, but from my take of the code, if a vnode reaches VDOOMED, it's been
earmarked for recycling and is in the process of being flushed. If the
vnode is being used in the FS code somewhere (or anywhere for that
matter :) it shouldn't ever be considered for recycling as it should be
vref()'ed or at the least vhold()'ed.

> > On my machine (400Mhz Celeron, 64mb RAM, single 4.2gig IDE disk) running
> > current from a day ago, I can't reproduce the bug. Are you running with
> > multiple spindles/softupdates ?
>
> Since I was able to reproduce it on machines with very different CPUs,
> memory, and disks I didn't bother to include machine specifications.
> The customers that were seeing the problem most often are running "high
> end" (whatever that means) machines with SCSI disks. The machine I used
> for testing was a 200 MHz Pentium machine with IDE disks. Softupdates
> was never enabled on any of these systems.
>
> Here is the dmesg output for the machine I did most of my testing on,
> only partitions on wd2 were mounted during the tests. On my workstation
> with 64MB of RAM it took much longer to happen and I had some other
> processes consuming memory, so it might be easier for you to reproduce
> it if you lower your available system memory to 32 MB or less (via
> MAXMEM or the boot loader of course).

Right, I'll drop MAXMEM down and try to starve the system further, and
see what happens.
Adrian

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-fs" in the body of the message

From owner-freebsd-fs Tue Apr 18 4:37:51 2000
Delivered-To: freebsd-fs@freebsd.org
Received: from ewok.creative.net.au (fuzzy.aussie.com.au [203.30.44.82])
	by hub.freebsd.org (Postfix) with SMTP id 3BBAB37B523
	for ; Tue, 18 Apr 2000 04:37:47 -0700 (PDT)
	(envelope-from freebsd@ewok.creative.net.au)
Received: (qmail 72570 invoked by uid 1008); 18 Apr 2000 11:37:42 -0000
Date: Tue, 18 Apr 2000 19:37:42 +0800
From: Adrian Chadd
To: freebsd-fs@freebsd.org
Subject: FFS and ints
Message-ID: <20000418193741.E71428@ewok.creative.net.au>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 0.95.4i
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

Hi,

I just waded through a friend of a friend's FFS panic, and it turns out
that the cg_rotor, cg_irotor and cg_frotor values were all outrageously
wrong. Now, how they got to be wrong is another matter entirely, but
the thing that kept tripping him was that fsck's cg checks were doing
stuff like:

	if (cg->cg_frotor < newcg->cg_ndblk)
		newcg->cg_frotor = cg->cg_frotor;
	else
		newcg->cg_frotor = 0;

Now, this makes sense EXCEPT that cg_rotor/frotor/irotor are defined
as int32_t, which means any weirdness that corrupts these values into
negative values will not be picked up by fsck.

So, my question is this: should the cg definition change so that things
that should be unsigned are unsigned, or should fsck change and we leave
the kernel alone?
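
The signedness problem described above can be shown in isolation. What follows is only a minimal sketch, not fsck's actual code: `struct cg_sketch`, `rotor_check_signed` and `rotor_check_ranged` are hypothetical names, and only the `cg_ndblk` and `cg_frotor` fields are modelled.

```c
#include <stdint.h>

/* Hypothetical stand-in for the two cylinder-group fields discussed above. */
struct cg_sketch {
	int32_t cg_ndblk;	/* number of data blocks in this group */
	int32_t cg_frotor;	/* position of last used frag */
};

/* Mirrors the quoted fsck test: a negative rotor passes the `<` check. */
static int32_t
rotor_check_signed(const struct cg_sketch *cg, const struct cg_sketch *newcg)
{
	if (cg->cg_frotor < newcg->cg_ndblk)
		return (cg->cg_frotor);
	return (0);
}

/* One possible fsck-side fix (an assumption): also reject negative values. */
static int32_t
rotor_check_ranged(const struct cg_sketch *cg, const struct cg_sketch *newcg)
{
	if (cg->cg_frotor >= 0 && cg->cg_frotor < newcg->cg_ndblk)
		return (cg->cg_frotor);
	return (0);
}
```

With a corrupted rotor of -5 and 1000 data blocks, the first function hands the -5 back unchanged while the second resets it to 0. Making the fields unsigned would catch the same case, because the corrupted value would then compare as a very large number; which side should change is the open question here.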
Adrian To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Tue Apr 18 6:32: 4 2000 Delivered-To: freebsd-fs@freebsd.org Received: from wcn4.wcnet.net (mail.wcnet.net [216.88.248.234]) by hub.freebsd.org (Postfix) with ESMTP id B239B37B61C for ; Tue, 18 Apr 2000 06:31:58 -0700 (PDT) (envelope-from jestess@wcnet.net) Received: from wcnet.net [216.88.249.119] by wcn4.wcnet.net with ESMTP (SMTPD32-6.00) id A3CB3DD301EC; Tue, 18 Apr 2000 08:31:55 -0500 Message-ID: <38FC6474.7E16D9EF@wcnet.net> Date: Tue, 18 Apr 2000 08:34:44 -0500 From: John Estess X-Mailer: Mozilla 4.7 [en] (X11; I; Linux 2.0.36 i386) X-Accept-Language: en MIME-Version: 1.0 To: freebsd-fs@freebsd.org Subject: Re: FFS and ints Content-Type: multipart/mixed; boundary="------------4AAF98DA7ADECDBBB93BEEF3" Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org This is a multi-part message in MIME format. --------------4AAF98DA7ADECDBBB93BEEF3 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Adrian Chadd wrote: > Now, this makes sense EXCEPT that cg_rotor/frotor/irotor are defined > as int32_ts which mean any weirdnesses that corrupt these values to > negative values will not be picked up in fsck. > > So, my question is this: should the cg definition change to change > things that should be unsigned to unsigned, or should fsck change > and we leave the kernel alone? > > Adrian Back to memory lane... About two months ago I was going through warnings generated by fsck during the compile and discovered three unsigned to signed warnings. 
/usr/src/sbin/fsck/dir.c: In function `dirscan':
/usr/src/sbin/fsck/dir.c:127: warning: comparison between signed and unsigned
/usr/src/sbin/fsck/dir.c: In function `expanddir':
/usr/src/sbin/fsck/dir.c:620: warning: comparison between signed and unsigned
/usr/src/sbin/fsck/dir.c:634: warning: comparison between signed and unsigned

One of these is easily gotten rid of in fsck. The others can be tracked
to (the input to )/ffs/fs.h to some uncapitalized macros (below - those
bastards :-)). If you can cast input to be unsigned for everything, you
might get rid of the unsigned/signed compares. A macro, or god forbid, a
function - if it wasn't too slow, could take the place of this in
fsck.h, if they aren't used elsewhere (I'll grep world later). Since I'm
working nights (6pm to 6am - 5 days a week) and I'm moving by the end of
the month, I have no time for this. Also, I'm still cutting my teeth on
fs stuff, so everything I've written could be wrong...

#define dblksize(fs, dip, lbn) \
	(((lbn) >= NDADDR || (dip)->di_size >= smalllblktosize(fs, (lbn) + 1)) \
	    ? (fs)->fs_bsize \
	    : (fragroundup(fs, blkoff(fs, (dip)->di_size))))

#define smalllblktosize(fs, blk)    /* calculates (blk * fs->fs_bsize) */ \
	((blk) << (fs)->fs_bshift)

--------------4AAF98DA7ADECDBBB93BEEF3--

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-fs" in the body of the message

From owner-freebsd-fs Tue Apr 18 11:18:40 2000
Delivered-To: freebsd-fs@freebsd.org
Received: from borderware.com (gateway.borderware.com [207.236.65.226])
	by hub.freebsd.org (Postfix) with ESMTP id 0CBD837BA4C;
	Tue, 18 Apr 2000 11:18:32 -0700 (PDT)
	(envelope-from dchapes@borderware.com)
Received: by gateway.borderware.com id <117123>; Tue, 18 Apr 2000 14:15:42 -0400
From: Dave Chapeskie
Message-Id: <00Apr18.141542edt.117123@gateway.borderware.com>
Date: Tue, 18 Apr 2000 14:18:24 -0400
To: Adrian Chadd
Cc: Matthew Dillon , freebsd-fs@FreeBSD.ORG
Subject: Re: vnode_free_list corruption
References: <00Apr14.141908edt.117140@gateway.borderware.com> <20000415023148.F34852@ewok.creative.net.au> <200004141835.LAA71253@apollo.backplane.com> <20000418042733.I59015@ewok.creative.net.au> <00Apr17.185117edt.117127@gateway.borderware.com> <20000418174608.C71428@ewok.creative.net.au>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary=BOKacYhQ+x31HxR3
X-Mailer: Mutt 0.93.2i
In-Reply-To: <20000418174608.C71428@ewok.creative.net.au>; from Adrian Chadd on Tue, Apr 18, 2000 at 05:46:11PM +0800
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

--BOKacYhQ+x31HxR3
Content-Type: text/plain; charset=us-ascii

On Tue, Apr 18, 2000 at 05:46:11PM +0800, Adrian Chadd wrote:
> Yes, but from my take of the code, if a vnode reaches VDOOMED, its been
> earmarked for recycling and is in the process of being flushed. If
> the vnode is being used in the FS code somewhere (or anywhere for that
> matter :) it shouldn't ever be considered for recycling as it should be
> vref()'ed or at the least vhold()'ed.

I see it the other way around, getnewvnode is perfectly within its
rights to call vgonel() on a VOP_LOCKED vnode from what I can see (see
code comment below).  I think the problem is that vhold and vbusy need
to be checking for VXLOCK and returning an error just the way vget does.
If that's the case it seems silly to have both vhold and vget.

The vclean() call (from vgonel called by getnewvnode) must be blocking.
It's the only place between getnewvnode()'s setting VDOOMED and its
later clearing of the flags (assuming VXLOCK isn't already set) where it
can block.  There is a comment in vclean() which says:

	/*
	 * Even if the count is zero, the VOP_INACTIVE routine may still
	 * have the object locked while it cleans it out. The VOP_LOCK
	 * ensures that the VOP_INACTIVE routine is done with its work.
	 * For active vnodes, it ensures that no other activity can
	 * occur while the underlying object is being cleaned out.
	 */
	VOP_LOCK(vp, LK_DRAIN | LK_INTERLOCK, p);

Alternatively it may sometimes be blocking in the vinvalbuf() call.

I just reproduced the problem again with some extra kernel debugging.
One of the 'head' processes calls getnewvnode and it blocks with a wait
message of "inode", I think that means it's blocked in the VOP_LOCK call
waiting for the VOP_LOCK.  vprint() from the getnewvnode call looks like
this:

getnewvnode: 0xcd2e7200: type VREG, usecount 0, writecount 0, refcount 0, flags (VFREE)
	tag VT_UFS, ino 27201, on dev 0x20015 (0, 131093) lock type inode: EXCL (count 1) by pid 611
getnewvnode: pid 1455 recycling VOP_ISLOCKED vnode!

The vprint() from vbusy looks like this:

vbusy: 0xcd2e7200: type VREG, usecount 0, writecount 0, refcount 1, flags (VXLOCK|VDOOMED|VFREE)
	tag VT_UFS, ino 27201, on dev 0x20015 (0, 131093) lock type inode: EXCL (count 1) by pid 611
panic: vbusy on VDOOMED vnode

pid 611 is 'rm', pid 1455 is 'head' with a wait message of "inode".
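
The suggestion above, that vbusy and vhold test VXLOCK and fail the way vget does, might look roughly like the following. This is a sketch under stated assumptions and not the kernel's code: `struct vnode_sketch`, `vbusy_checked` and the flag values are hypothetical, and ENOENT is used only as an assumed error return.

```c
#include <errno.h>

/* Illustrative flag bits only; the real values live in sys/vnode.h. */
#define	VXLOCK	0x0100UL	/* vnode is locked to change underlying type */
#define	VDOOMED	0x0200UL	/* vnode is being recycled by getnewvnode */
#define	VFREE	0x0400UL	/* vnode is on the free list */

/* Hypothetical cut-down vnode carrying just the flag word. */
struct vnode_sketch {
	unsigned long v_flag;
};

/*
 * Refuse to take a vnode off the free list while it is being cleaned.
 * The mask is parenthesized on purpose: `&` binds tighter than `|`,
 * so `v_flag & VDOOMED|VXLOCK` would be nonzero for every vnode.
 */
static int
vbusy_checked(struct vnode_sketch *vp)
{
	if (vp->v_flag & (VDOOMED | VXLOCK))
		return (ENOENT);	/* assumed error, caller must back off */
	vp->v_flag &= ~VFREE;		/* stand-in for the free-list removal */
	return (0);
}
```

Applied to the flags in the vbusy trace above (VXLOCK|VDOOMED|VFREE), this returns an error and leaves the vnode untouched, where the existing code goes on to manipulate the free list. Whether callers can cope with an error return is a separate question that the sketch does not address.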

So it looks like my panic call in vbusy should be checking for VXLOCK
instead of VDOOMED (since the latter is only set/checked from within
getnewvnode it's better not to make other parts of the system know about
it).

> Right, I'll drop MAXMEM down and try to starve the system further, and
> see what happens.

Also make sure softupdates is off since it could easily be changing the
timing (all my tests were done with it off).

Also try the attached patch in order to make the problem easier to
replicate.  It makes getnewvnode _try_ to find a VOP_LOCKED vnode to
recycle.  It still only picks vnodes that might otherwise have been
selected (if they were closer to the front of the inactive list).  It of
course slows things down since it often walks the complete free list but
that shouldn't matter for the purposes of this test.  With similar
changes in my kernel it can still take a couple of minutes for the vbusy
panic to occur (although I can do it with fewer instances of my test
scripts running).  You'll probably notice that getnewvnode does
successfully recycle several VOP_LOCKED vnodes before vbusy() gets
called on one.
-- 
Dave Chapeskie
Senior Software Engineer
Borderware Technologies Inc.
Mississauga, Ontario, Canada

--BOKacYhQ+x31HxR3
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="test.diff"

diff -u -r1.253 vfs_subr.c
--- kern/vfs_subr.c	2000/03/20 11:28:45	1.253
+++ kern/vfs_subr.c	2000/04/18 18:12:58
@@ -112,6 +112,14 @@
 SYSCTL_INT(_vfs, OID_AUTO, reassignbufsortbad, CTLFLAG_RW, &reassignbufsortbad, 0, "");
 static int reassignbufmethod = 1;
 SYSCTL_INT(_vfs, OID_AUTO, reassignbufmethod, CTLFLAG_RW, &reassignbufmethod, 0, "");
+#ifdef DDB
+static int skip_vop_locked = 0;
+SYSCTL_INT(_debug, OID_AUTO, skip_vop_locked, CTLFLAG_RW, &skip_vop_locked, 0,
+	"move VOP_LOCKED vnodes to the back of the free list");
+static int find_vop_locked = 1;
+SYSCTL_INT(_debug, OID_AUTO, find_vop_locked, CTLFLAG_RW, &find_vop_locked, 0,
+	"try and cause problems by looking for VOP_LOCKED vnodes to recycle");
+#endif
 
 #ifdef ENABLE_VFS_IOOPT
 int vfs_ioopt = 0;
@@ -453,6 +461,9 @@
 	struct vnode *vp, *tvp, *nvp;
 	vm_object_t object;
 	TAILQ_HEAD(freelst, vnode) vnode_tmp_list;
+#ifdef DDB
+	struct vnode *non_locked = NULL;
+#endif
 
 	/*
 	 * We take the least recently used vnode from the freelist
@@ -507,7 +518,35 @@
 				/* Don't recycle if active in the namecache */
 				simple_unlock(&vp->v_interlock);
 				continue;
+			} else if (VOP_ISLOCKED(vp)) {
+#ifdef DDB
+				vprint("getnewvnode", vp);
+				if (!skip_vop_locked) {
+					printf ("getnewvnode: pid %ld recycling"
+						" VOP_ISLOCKED vnode!\n",
+						curproc ? curproc->p_pid : 0);
+					break;
+				}
+				printf ("getnewvnode: pushing VOP_ISLOCKED"
+					" vnode to end of list\n");
+#endif
+				TAILQ_REMOVE(&vnode_free_list, vp, v_freelist);
+				TAILQ_INSERT_TAIL(&vnode_tmp_list, vp, v_freelist);
+				continue;
 			} else {
+#ifdef DDB
+				if (!skip_vop_locked && find_vop_locked) {
+					/*
+					 * To illustrate a problem look for
+					 * VOP_LOCKED vnodes to recycle,
+					 * but remember the first non-locked
+					 * vnode
+					 */
+					if (non_locked == NULL)
+						non_locked = vp;
+					continue;
+				} else
+#endif
 				break;
 			}
 		}
@@ -520,6 +559,11 @@
 		simple_unlock(&tvp->v_interlock);
 	}
 
+#ifdef DDB
+	/* If there are no locked vnodes, use the first non-locked one */
+	if (vp == NULL && non_locked != NULL)
+		vp = non_locked;
+#endif
 	if (vp) {
 		vp->v_flag |= VDOOMED;
 		TAILQ_REMOVE(&vnode_free_list, vp, v_freelist);
@@ -2613,6 +2657,13 @@
 	int s;
 
 	s = splbio();
+	if (vp->v_flag & VDOOMED|VXLOCK) {
+#ifdef DIAGNOSTIC
+		vprint ("vbusy", vp);
+		printf ("vbusy by pid %ld\n", curproc ? curproc->p_pid : 0);
+#endif
+		panic ("vbusy on VDOOMED or VXLOCKed vnode");
+	}
 	simple_lock(&vnode_free_list_slock);
 	if (vp->v_flag & VTBFREE) {
 		TAILQ_REMOVE(&vnode_tobefree_list, vp, v_freelist);

--BOKacYhQ+x31HxR3--

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-fs" in the body of the message

From owner-freebsd-fs Tue Apr 18 11:28:37 2000
Delivered-To: freebsd-fs@freebsd.org
Received: from borderware.com (gateway.borderware.com [207.236.65.226])
	by hub.freebsd.org (Postfix) with ESMTP id B337737BAF9;
	Tue, 18 Apr 2000 11:28:32 -0700 (PDT)
	(envelope-from dchapes@borderware.com)
Received: by gateway.borderware.com id <117125>; Tue, 18 Apr 2000 14:25:38 -0400
From: Dave Chapeskie
Message-Id: <00Apr18.142538edt.117125@gateway.borderware.com>
Date: Tue, 18 Apr 2000 14:28:22 -0400
To: Adrian Chadd
Cc: Matthew Dillon , freebsd-fs@FreeBSD.ORG
Subject: Re: vnode_free_list corruption
References: <00Apr14.141908edt.117140@gateway.borderware.com> <20000415023148.F34852@ewok.creative.net.au>
<200004141835.LAA71253@apollo.backplane.com> <20000418042733.I59015@ewok.creative.net.au> <00Apr17.185117edt.117127@gateway.borderware.com> <20000418174608.C71428@ewok.creative.net.au> <20000418141824.B25185@borderware.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 0.93.2i In-Reply-To: <20000418141824.B25185@borderware.com>; from dchapes on Tue, Apr 18, 2000 at 02:18:24PM -0400 X-no-archive: yes Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Tue, Apr 18, 2000 at 02:18:24PM -0400, dchapes wrote: > + if (vp->v_flag & VDOOMED|VXLOCK) { Of course this should be: > + if (vp->v_flag & (VDOOMED|VXLOCK)) { -- Dave Chapeskie Senior Software Engineer Borderware Technologies Inc. Mississauga, Ontario, Canada To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Tue Apr 18 12:49:16 2000 Delivered-To: freebsd-fs@freebsd.org Received: from ewok.creative.net.au (fuzzy.aussie.com.au [203.30.44.82]) by hub.freebsd.org (Postfix) with SMTP id 3836337BB69 for ; Tue, 18 Apr 2000 12:49:12 -0700 (PDT) (envelope-from freebsd@ewok.creative.net.au) Received: (qmail 76128 invoked by uid 1008); 18 Apr 2000 19:49:08 -0000 Date: Wed, 19 Apr 2000 03:49:08 +0800 From: Adrian Chadd To: Dave Chapeskie Cc: Adrian Chadd , Matthew Dillon , freebsd-fs@FreeBSD.ORG Subject: Re: vnode_free_list corruption Message-ID: <20000419034906.I71428@ewok.creative.net.au> References: <00Apr14.141908edt.117140@gateway.borderware.com> <20000415023148.F34852@ewok.creative.net.au> <200004141835.LAA71253@apollo.backplane.com> <20000418042733.I59015@ewok.creative.net.au> <00Apr17.185117edt.117127@gateway.borderware.com> <20000418174608.C71428@ewok.creative.net.au> <00Apr18.141542edt.117123@gateway.borderware.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 0.95.4i In-Reply-To: <00Apr18.141542edt.117123@gateway.borderware.com>; from Dave Chapeskie on 
	Tue, Apr 18, 2000 at 02:18:24PM -0400
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

On Tue, Apr 18, 2000, Dave Chapeskie wrote:
> On Tue, Apr 18, 2000 at 05:46:11PM +0800, Adrian Chadd wrote:
> > Yes, but from my take of the code, if a vnode reaches VDOOMED, its been
> > earmarked for recycling and is in the process of being flushed. If
> > the vnode is being used in the FS code somewhere (or anywhere for that
> > matter :) it shouldn't ever be considered for recycling as it should be
> > vref()'ed or at the least vhold()'ed.
>
> I see it the other way around, getnewvnode is perfectly within it's
> rights to call vgonel() on a VOP_LOCKED vnode from what I can see (see
> code comment below). I think the problem is that vhold and vbusy need
> to be checking for VXLOCK and returning an error just the way vget does.
> If that's the case it seems silly to have both vhold and vget.

Hrm. When will you have a vnode which is VOP_LOCKED but not ref'ed or
held?

> The vclean() call (from vgonel called by getnewvnode) must be blocking.
> It's the only place between getnewvnode()'s setting VDOOMED and it's
> later clearing of the flags (assuming VXLOCK isn't already set) where it
> can block. There is a comment in vclean() which says:
>
> 	/*
> 	 * Even if the count is zero, the VOP_INACTIVE routine may still
> 	 * have the object locked while it cleans it out. The VOP_LOCK
> 	 * ensures that the VOP_INACTIVE routine is done with its work.
> 	 * For active vnodes, it ensures that no other activity can
> 	 * occur while the underlying object is being cleaned out.
> 	 */
> 	VOP_LOCK(vp, LK_DRAIN | LK_INTERLOCK, p);
>
> Alternatively it may sometimes be blocking in the vinvalbuf() call.

I've re-read the code *again* and I can see the code taking the vnode
through being made inactive to possibly going to UFS_TRUNCATE and
eventually ending up at a vhold. But the panics you've been throwing
here on the list indicate a vnode being used in one process in some fs
operation but having no refcount or holdcnt, and it's then targeted for
VDOOMED. Then the first process ends up at vbusy(), and things go
strange from there.

THIS is what sounds wrong, don't you agree?

What you are saying is that getnewvnode is perfectly right to call
vgonel() on a locked vnode if it's not used or held. What I'm saying is
that it's not right for getnewvnode() to recycle a vnode that is in the
middle of some file op, which is what you've indicated as happening (I
still can't reproduce the bug, if someone else out there can PLEASE tell
me how :)

People with vnode clue, please comment. I'm going to spend tomorrow
looking at if and why the first is happening. I'll look at the patches
you gave tomorrow.

Adrian

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-fs" in the body of the message

From owner-freebsd-fs Thu Apr 20 3:57:42 2000
Delivered-To: freebsd-fs@freebsd.org
Received: from florence.pavilion.net (florence.pavilion.net [212.74.0.25])
	by hub.freebsd.org (Postfix) with ESMTP id B197537B883
	for ; Thu, 20 Apr 2000 03:57:39 -0700 (PDT)
	(envelope-from joe@pavilion.net)
Received: from genius.systems.pavilion.net (postfix@genius.systems.pavilion.net [212.74.1.100])
	by florence.pavilion.net (8.9.3/8.8.8) with ESMTP id LAA92773
	for ; Thu, 20 Apr 2000 11:56:56 +0100 (BST)
	(envelope-from joe@pavilion.net)
Received: by genius.systems.pavilion.net (Postfix, from userid 100)
	id 8AC20338; Thu, 20 Apr 2000 11:57:33 +0100 (BST)
Date: Thu, 20 Apr 2000 11:57:33 +0100
From: Joe Karthauser
To: freebsd-fs@freebsd.org
Subject: subscribe
Message-ID: <20000420115733.B44137@pavilion.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 1.0.1i
X-NCC-RegID: uk.pavilion Organisation: Pavilion Internet plc, Lees House, 21-23 Dyke Road, Brighton, England Phone: +44-845-333-5000 Fax: +44-845-333-5001 Mobile: +44-403-596893 Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org subscribe To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message