From owner-freebsd-fs  Mon Apr 17 13:27:56 2000
Delivered-To: freebsd-fs@freebsd.org
Received: from ewok.creative.net.au (fuzzy.aussie.com.au [203.30.44.82])
	by hub.freebsd.org (Postfix) with SMTP id 9573A37B9A1
	for <freebsd-fs@FreeBSD.ORG>; Mon, 17 Apr 2000 13:27:48 -0700 (PDT)
	(envelope-from freebsd@ewok.creative.net.au)
Received: (qmail 67900 invoked by uid 1008); 17 Apr 2000 20:27:35 -0000
Date: Tue, 18 Apr 2000 04:27:35 +0800
From: Adrian Chadd <adrian@freebsd.org>
To: Matthew Dillon <dillon@apollo.backplane.com>
Cc: Adrian Chadd <adrian@FreeBSD.ORG>, freebsd-fs@FreeBSD.ORG,
	dchapes@borderware.com, freebsd-hackers@FreeBSD.ORG
Subject: Re: vnode_free_list corruption [patch]
Message-ID: <20000418042733.I59015@ewok.creative.net.au>
References: <00Apr14.141908edt.117140@gateway.borderware.com> <20000415023148.F34852@ewok.creative.net.au> <200004141835.LAA71253@apollo.backplane.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 0.95.4i
In-Reply-To: <200004141835.LAA71253@apollo.backplane.com>; from Matthew Dillon on Fri, Apr 14, 2000 at 11:35:21AM -0700
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

On Fri, Apr 14, 2000, Matthew Dillon wrote:
> 
> :On Fri, Apr 14, 2000, Dave Chapeskie wrote:
> :> Greetings.
> :> 
> :> I've been seeing a rash of "free vnode isn't" panics lately.  Some
> :> machines were panicking several times a day.  Along with this we saw
> :> occasional "object inconsistent state: RPC: %d, RC: %d" messages.
> :
> :
> :Throw it into a PR, and I'll assign it to myself and take a squizz..
> :
> :
> :Adrian
> 
>     I'll take a look at it too.  Either way we'll get something committed.
>     Beware, though, even though there is obviously a bug (Dave obviously 
>     found the bug!), the vgone/vdone/VDEAD interaction is extremely complex
>     so we have to be careful not to break other things while fixing this
>     one.

Ok, my take on the code is this:

* with the trace given, the vnode shouldn't even be marked VDOOMED, as it's
  meant to be in use;
* a vnode shouldn't ever reach vbusy() when marked VDOOMED, as it should be
  ref'ed or held and so should never be considered for cleaning;
* I think a KASSERT should be added in vbusy().
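For illustration, the KASSERT proposed above might look like the sketch below. It is a userland model, not kernel code: KASSERT is assumed to have the kernel's shape (a condition plus a parenthesized message, compiled in only when INVARIANTS is defined), panic() is a hypothetical stub that records its message instead of halting, and the flag values are made up rather than taken from <sys/vnode.h>.

```c
#include <stdio.h>

/* Illustrative flag values only; the real ones live in <sys/vnode.h>. */
#define VFREE   0x0001
#define VDOOMED 0x0002

struct vnode {
	int v_flag;
};

/* Hypothetical panic() stand-in: remember the message instead of halting. */
static char panic_msg[128];

static void
panic(const char *msg)
{
	snprintf(panic_msg, sizeof(panic_msg), "%s", msg);
}

/* Model a kernel built with "options INVARIANTS". */
#define INVARIANTS
#ifdef INVARIANTS
#define KASSERT(exp, msg) do { if (!(exp)) panic msg; } while (0)
#else
#define KASSERT(exp, msg) do { } while (0)
#endif

/* Sketch of vbusy(): only the proposed assertion and the flag update. */
static void
vbusy(struct vnode *vp)
{
	KASSERT((vp->v_flag & VDOOMED) == 0, ("vbusy: VDOOMED vnode"));
	/* A real panic() never returns; the stub does, so this still runs. */
	vp->v_flag &= ~VFREE;	/* no longer considered free */
}
```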

On my machine (400 MHz Celeron, 64 MB RAM, single 4.2 GB IDE disk) running
-CURRENT from a day ago, I can't reproduce the bug. Are you running with
multiple spindles or softupdates?

I'll look at the code some more over the next couple of days. Any opinions?


Adrian


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-fs" in the body of the message


From owner-freebsd-fs  Mon Apr 17 15:54:37 2000
Delivered-To: freebsd-fs@freebsd.org
Received: from borderware.com (gateway.borderware.com [207.236.65.226])
	by hub.freebsd.org (Postfix) with ESMTP
	id DF60137BBA2; Mon, 17 Apr 2000 15:54:31 -0700 (PDT)
	(envelope-from dchapes@borderware.com)
Received: by gateway.borderware.com id <117127>; Mon, 17 Apr 2000 18:51:17 -0400
From: Dave Chapeskie <dchapes@borderware.com>
Message-Id: <00Apr17.185117edt.117127@gateway.borderware.com>
Date:  Mon, 17 Apr 2000 18:54:20 -0400
To: Adrian Chadd <adrian@freebsd.org>,
	Matthew Dillon <dillon@apollo.backplane.com>
Cc: freebsd-fs@freebsd.org
Subject: Re: vnode_free_list corruption [patch]
References: <00Apr14.141908edt.117140@gateway.borderware.com> <20000415023148.F34852@ewok.creative.net.au> <200004141835.LAA71253@apollo.backplane.com> <20000418042733.I59015@ewok.creative.net.au>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 0.93.2i
In-Reply-To: <20000418042733.I59015@ewok.creative.net.au>; from Adrian Chadd on Tue, Apr 18, 2000 at 04:27:35AM +0800
X-no-archive: yes
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

By the way, thanks for looking into this!

On Tue, Apr 18, 2000 at 04:27:35AM +0800, Adrian Chadd wrote:
> Ok, my take on the code is this:
> 
> * with the trace given, the vnode shouldn't even be marked VDOOMED, as it's
>   meant to be in use;
> * a vnode shouldn't ever reach vbusy() when marked VDOOMED, as it should be
>   ref'ed or held and so should never be considered for cleaning;
> * I think a KASSERT should be added in vbusy().

Since the situation is known to happen, at least I know it does :-),
I think it should be a real call to panic as in my patch instead of a
KASSERT that is only enabled if options INVARIANTS is used.  If the
system is fixed to prevent this situation then it can always be changed
to a KASSERT (if the quick check of the flag is too slow for people).
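The trade-off described above shows up when the same precondition is compiled both ways in a kernel built without INVARIANTS. In this userland sketch (panic() is a hypothetical stub that only counts calls, and the flag value is illustrative), the KASSERT form compiles to nothing while the explicit if/panic() form from the patch still fires, at the cost of one flag test per call.

```c
/* Illustrative flag value; not the kernel's definition. */
#define VDOOMED 0x0002

struct vnode {
	int v_flag;
};

static int panics;	/* hypothetical panic() stub just counts calls */

static void
panic(const char *msg)
{
	(void)msg;
	panics++;
}

/* INVARIANTS is intentionally left undefined, as in a default build. */
#ifdef INVARIANTS
#define KASSERT(exp, msg) do { if (!(exp)) panic msg; } while (0)
#else
#define KASSERT(exp, msg) do { } while (0)
#endif

/* Assertion-only check: disappears unless INVARIANTS is defined. */
static void
vbusy_kassert(struct vnode *vp)
{
	KASSERT((vp->v_flag & VDOOMED) == 0, ("vbusy on VDOOMED vnode"));
}

/* Always-compiled check, in the style of the patch. */
static void
vbusy_panic(struct vnode *vp)
{
	if (vp->v_flag & VDOOMED)
		panic("vbusy on VDOOMED vnode");
}
```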


> On my machine (400 MHz Celeron, 64 MB RAM, single 4.2 GB IDE disk) running
> -CURRENT from a day ago, I can't reproduce the bug. Are you running with
> multiple spindles or softupdates?

Since I was able to reproduce it on machines with very different CPUs,
memory, and disks, I didn't bother to include machine specifications.
The customers that were seeing the problem most often are running "high
end" (whatever that means) machines with SCSI disks.  The machine I used
for testing was a 200 MHz Pentium machine with IDE disks.  Softupdates
was never enabled on any of these systems.


Here is the dmesg output for the machine I did most of my testing on;
only partitions on wd2 were mounted during the tests.  On my workstation
with 64MB of RAM it took much longer to happen and I had some other
processes consuming memory, so it might be easier for you to reproduce
it if you lower your available system memory to 32 MB or less (via
MAXMEM or the boot loader, of course).

Also, in my tests, if it didn't panic within 10-15 minutes it seemed
not to panic at all.  I imagine that once the number of vnodes grows
to a certain size it's just less likely to happen, or something.
Most often it would panic, and within a few minutes at that.


Timecounter "i8254"  frequency 1193182 Hz
Timecounter "TSC"  frequency 200455163 Hz
CPU: Pentium/P55C (200.46-MHz 586-class CPU)
  Origin = "GenuineIntel"  Id = 0x543  Stepping = 3
  Features=0x8001bf<FPU,VME,DE,PSE,TSC,MSR,MCE,CX8,MMX>
real memory  = 33554432 (32768K bytes)
avail memory = 23916544 (23356K bytes)
Preloaded elf kernel "kernel" at 0xc0376000.
Preloaded userconfig_script "/boot/kernel.conf" at 0xc037609c.
Probing for devices on PCI bus 0:
chip0: <Intel 82439TX System Controller (MTXC)> rev 0x01 on pci0.0.0
chip1: <Intel 82371AB PCI to ISA bridge> rev 0x01 on pci0.1.0
ide_pci0: <Intel PIIX4 Bus-master IDE controller> rev 0x01 on pci0.1.1
chip2: <Intel 82371AB Power management controller> rev 0x01 on pci0.1.3
xl0: <3Com 3c905B-TX Fast Etherlink XL> rev 0x24 int a irq 9 on pci0.10.0
xl0: Ethernet address: 00:10:4b:9e:8c:b2
xl0: autoneg complete, link status good (full-duplex, 100Mbps)
xl1: <3Com 3c905B-TX Fast Etherlink XL> rev 0x24 int a irq 10 on pci0.11.0
xl1: Ethernet address: 00:10:4b:79:0a:10
xl1: autoneg complete, link status good (half-duplex, 10Mbps)
vr0: <VIA VT3043 Rhine I 10/100BaseTX> rev 0x06 int a irq 11 on pci0.12.0
vr0: Ethernet address: 00:80:c8:ec:73:51
vr0: autoneg complete, link status good (half-duplex, 100Mbps)
Probing for PnP devices:
Probing for devices on the ISA bus:
sc0 on isa
sc0: VGA color <16 virtual consoles, flags=0x0>
atkbdc0 at 0x60-0x6f on motherboard
atkbd0 irq 1 on isa
psm0 irq 12 on isa
psm0: model Generic PS/2 mouse, device ID 0
sio0 at 0x3f8-0x3ff irq 4 flags 0x10 on isa
sio0: type 16550A
sio1 at 0x2f8-0x2ff irq 3 on isa
sio1: type 16550A
fdc0 at 0x3f0-0x3f7 irq 6 drq 2 on isa
fdc0: FIFO enabled, 8 bytes threshold
fd0: 1.44MB 3.5in
wdc0 at 0x1f0-0x1f7 irq 14 on isa
wdc0: unit 0 (wd0): <WDC AC23200L>
wd0: 3098MB (6346368 sectors), 6296 cyls, 16 heads, 63 S/T, 512 B/S
[^^^ not used/mounted at all ^^^]
wdc1 at 0x170-0x177 irq 15 on isa
wdc1: unit 0 (wd2): <FUJITSU MPC3032AT>
wd2: 3093MB (6335280 sectors), 6704 cyls, 15 heads, 63 S/T, 512 B/S
ida: port address (0xffffffff) out of range
Vendor Specific Word = ffff
Vendor Specific Word = ffff
Vendor Specific Word = ffff
Vendor Specific Word = ffff
Vendor Specific Word = ffff
vga0 at 0x3b0-0x3df maddr 0xa0000 msize 131072 on isa
npx0 on motherboard
npx0: INT 16 interface
Intel Pentium detected, installing workaround for F00F bug
IP packet filtering initialized, divert enabled, rule-based forwarding disabled, unlimited logging
changing root device to wd2s1a
Start pid=2 <pagedaemon>
Start pid=3 <vmdaemon>
Start pid=4 <syncer>
xl0: autoneg complete, link status good (full-duplex, 100Mbps)
xl1: autoneg complete, link status good (half-duplex, 10Mbps)
vr0: autoneg complete, link status good (half-duplex, 100Mbps)


> I'll look at the code some more over the next couple of days. Any opinions?

I haven't had the time to look at the code since I came up with the
patch (which works for our setups, so we're reasonably happy, and I'm busy
doing other things), but after reading Kirk's opinions on the matter I'd
tend to agree and think that vbusy/vhold shouldn't be mucking with the
free list the way they do.

I'd guess that either they need to be able to check for and return
an error, or else v_holdcnt should disappear in favour of just using
v_usecount.  I didn't see any semantic differences between the two
(but I didn't look too hard either).
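The first option above, letting the hold path fail the way vget() does when a vnode is being cleaned out, could look roughly like this. vhold_checked() is a hypothetical name, the ENOENT return assumes vget()'s behaviour for a VXLOCKed vnode, and the interlock, the sleep/wakeup handling and the real vbusy() call are all left out; every caller would also have to cope with the new failure return.

```c
#include <errno.h>

/* Illustrative flag values; the kernel's differ. */
#define VFREE   0x0001
#define VXLOCK  0x0004

struct vnode {
	int v_flag;
	int v_holdcnt;
};

/*
 * Hypothetical vhold variant: refuse to hold a vnode that is being
 * cleaned out (VXLOCK set) instead of pulling it off the free list.
 * A non-zero return must be handled like a failed vget().
 */
static int
vhold_checked(struct vnode *vp)
{
	if (vp->v_flag & VXLOCK)
		return (ENOENT);
	vp->v_holdcnt++;
	if (vp->v_flag & VFREE)
		vp->v_flag &= ~VFREE;	/* stands in for vbusy() */
	return (0);
}
```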

-- 
Dave Chapeskie
Senior Software Engineer
Borderware Technologies Inc.
Mississauga, Ontario, Canada




From owner-freebsd-fs  Tue Apr 18  2:49:26 2000
Delivered-To: freebsd-fs@freebsd.org
Received: from ewok.creative.net.au (fuzzy.aussie.com.au [203.30.44.82])
	by hub.freebsd.org (Postfix) with SMTP id 4565737B6C3
	for <freebsd-fs@freebsd.org>; Tue, 18 Apr 2000 02:49:21 -0700 (PDT)
	(envelope-from freebsd@ewok.creative.net.au)
Received: (qmail 72134 invoked by uid 1008); 18 Apr 2000 09:46:11 -0000
Date: Tue, 18 Apr 2000 17:46:11 +0800
From: Adrian Chadd <adrian@freebsd.org>
To: Dave Chapeskie <dchapes@borderware.com>
Cc: Adrian Chadd <adrian@freebsd.org>,
	Matthew Dillon <dillon@apollo.backplane.com>, freebsd-fs@freebsd.org
Subject: Re: vnode_free_list corruption [patch]
Message-ID: <20000418174608.C71428@ewok.creative.net.au>
References: <00Apr14.141908edt.117140@gateway.borderware.com> <20000415023148.F34852@ewok.creative.net.au> <200004141835.LAA71253@apollo.backplane.com> <20000418042733.I59015@ewok.creative.net.au> <00Apr17.185117edt.117127@gateway.borderware.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 0.95.4i
In-Reply-To: <00Apr17.185117edt.117127@gateway.borderware.com>; from Dave Chapeskie on Mon, Apr 17, 2000 at 06:54:20PM -0400
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

On Mon, Apr 17, 2000, Dave Chapeskie wrote:
> By the way, thanks for looking into this!
> 
> On Tue, Apr 18, 2000 at 04:27:35AM +0800, Adrian Chadd wrote:
> > Ok, my take on the code is this:
> > 
> > * with the trace given, the vnode shouldn't even be marked VDOOMED, as it's
> >   meant to be in use;
> > * a vnode shouldn't ever reach vbusy() when marked VDOOMED, as it should be
> >   ref'ed or held and so should never be considered for cleaning;
> > * I think a KASSERT should be added in vbusy().
> 
> Since the situation is known to happen, at least I know it does :-),
> I think it should be a real call to panic as in my patch instead of a
> KASSERT that is only enabled if options INVARIANTS is used.  If the
> system is fixed to prevent this situation then it can always be changed
> to a KASSERT (if the quick check of the flag is too slow for people).

Yes, but from my take of the code, if a vnode reaches VDOOMED, it's been
earmarked for recycling and is in the process of being flushed. If
the vnode is being used in the FS code somewhere (or anywhere for that
matter :) it shouldn't ever be considered for recycling as it should be
vref()'ed or at the least vhold()'ed.

> > On my machine (400 MHz Celeron, 64 MB RAM, single 4.2 GB IDE disk) running
> > -CURRENT from a day ago, I can't reproduce the bug. Are you running with
> > multiple spindles or softupdates?
> 
> Since I was able to reproduce it on machines with very different CPUs,
> memory, and disks I didn't bother to include machine specifications.
> The customers that were seeing the problem most often are running "high
> end" (whatever that means) machines with SCSI disks.  The machine I used
> for testing was a 200 MHz Pentium machine with IDE disks.  Softupdates
> was never enabled on any of these systems.
> 
> Here is the dmesg output for the machine I did most of my testing on,
> only partitions on wd2 were mounted during the tests.  On my workstation
> with 64MB of RAM it took much longer to happen and I had some other
> processes consuming memory, so it might be easier for you to reproduce
> it if you lower your available system memory to 32 MB or less (via
> MAXMEM or the boot loader of course).

Right, I'll drop MAXMEM down and try to starve the system further, and
see what happens.


Adrian




From owner-freebsd-fs  Tue Apr 18  4:37:51 2000
Delivered-To: freebsd-fs@freebsd.org
Received: from ewok.creative.net.au (fuzzy.aussie.com.au [203.30.44.82])
	by hub.freebsd.org (Postfix) with SMTP id 3BBAB37B523
	for <freebsd-fs@freebsd.org>; Tue, 18 Apr 2000 04:37:47 -0700 (PDT)
	(envelope-from freebsd@ewok.creative.net.au)
Received: (qmail 72570 invoked by uid 1008); 18 Apr 2000 11:37:42 -0000
Date: Tue, 18 Apr 2000 19:37:42 +0800
From: Adrian Chadd <adrian@freebsd.org>
To: freebsd-fs@freebsd.org
Subject: FFS and ints
Message-ID: <20000418193741.E71428@ewok.creative.net.au>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 0.95.4i
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org


Hi,

I just waded through a friend of a friend's FFS panic, and it turns out
that the cg_rotor, cg_irotor and cg_frotor values were all outrageously
wrong. Now, how they got to be wrong is another matter entirely, but
the thing that kept tripping him up was that fsck's cg checks were doing
stuff like:

if (cg->cg_frotor < newcg->cg_ndblk)
    newcg->cg_frotor = cg->cg_frotor;
else
    newcg->cg_frotor = 0;

Now, this makes sense EXCEPT that cg_rotor/frotor/irotor are defined
as int32_t, which means any corruption that makes these values
negative will not be caught by fsck.

So, my question is this: should the cg definition be changed so that
fields which should be unsigned are unsigned, or should fsck change
and we leave the kernel alone?
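A small demonstration of the problem and of both directions. struct cg_lite is a hypothetical cut-down stand-in for struct cg holding only the two fields involved; frotor_checked() adds the missing lower bound on the fsck side, while frotor_unsigned() shows the effect of treating the field as unsigned (assuming cg_ndblk itself is sane).

```c
#include <stdint.h>

/* Hypothetical cut-down cylinder group: only the fields used here. */
struct cg_lite {
	int32_t cg_frotor;
	int32_t cg_ndblk;
};

/* The check as quoted above: upper bound only, so -5 passes. */
static int32_t
frotor_orig(const struct cg_lite *cg, const struct cg_lite *newcg)
{
	return (cg->cg_frotor < newcg->cg_ndblk) ? cg->cg_frotor : 0;
}

/* fsck-only fix: also reject negative values. */
static int32_t
frotor_checked(const struct cg_lite *cg, const struct cg_lite *newcg)
{
	if (cg->cg_frotor >= 0 && cg->cg_frotor < newcg->cg_ndblk)
		return cg->cg_frotor;
	return 0;
}

/* Unsigned comparison: a negative value turns into a huge one and fails. */
static int32_t
frotor_unsigned(const struct cg_lite *cg, const struct cg_lite *newcg)
{
	return ((uint32_t)cg->cg_frotor < (uint32_t)newcg->cg_ndblk)
	    ? cg->cg_frotor : 0;
}
```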


Adrian




From owner-freebsd-fs  Tue Apr 18  6:32: 4 2000
Delivered-To: freebsd-fs@freebsd.org
Received: from wcn4.wcnet.net (mail.wcnet.net [216.88.248.234])
	by hub.freebsd.org (Postfix) with ESMTP id B239B37B61C
	for <freebsd-fs@freebsd.org>; Tue, 18 Apr 2000 06:31:58 -0700 (PDT)
	(envelope-from jestess@wcnet.net)
Received: from wcnet.net [216.88.249.119] by wcn4.wcnet.net with ESMTP
  (SMTPD32-6.00) id A3CB3DD301EC; Tue, 18 Apr 2000 08:31:55 -0500
Message-ID: <38FC6474.7E16D9EF@wcnet.net>
Date: Tue, 18 Apr 2000 08:34:44 -0500
From: John Estess <jestess@wcnet.net>
X-Mailer: Mozilla 4.7 [en] (X11; I; Linux 2.0.36 i386)
X-Accept-Language: en
MIME-Version: 1.0
To: freebsd-fs@freebsd.org
Subject: Re: FFS and ints
Content-Type: multipart/mixed;
 boundary="------------4AAF98DA7ADECDBBB93BEEF3"
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

This is a multi-part message in MIME format.
--------------4AAF98DA7ADECDBBB93BEEF3
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit


Adrian Chadd wrote:
 
> Now, this makes sense EXCEPT that cg_rotor/frotor/irotor are defined
> as int32_t, which means any corruption that makes these values
> negative will not be caught by fsck.
> 
> So, my question is this: should the cg definition be changed so that
> fields which should be unsigned are unsigned, or should fsck change
> and we leave the kernel alone?
> 
> Adrian

Back to memory lane...

About two months ago I was going through the warnings generated while
compiling fsck and found three signed/unsigned comparison warnings.

/usr/src/sbin/fsck/dir.c: In function `dirscan':
/usr/src/sbin/fsck/dir.c:127: warning: comparison between signed and unsigned
/usr/src/sbin/fsck/dir.c: In function `expanddir':
/usr/src/sbin/fsck/dir.c:620: warning: comparison between signed and unsigned
/usr/src/sbin/fsck/dir.c:634: warning: comparison between signed and unsigned

One of these is easily gotten rid of in fsck. The others can be traced
to the inputs of some uncapitalized macros in /ffs/fs.h (below; those
bastards :-)). If you can cast the input to be unsigned everywhere, you
might get rid of the signed/unsigned comparisons. A macro, or, god
forbid, a function (if it wasn't too slow) could take the place of this
in fsck.h, if they aren't used elsewhere (I'll grep world later). Since
I'm working nights (6pm to 6am, 5 days a week) and I'm moving by the end
of the month, I have no time for this. Also, I'm still cutting my teeth
on fs stuff, so everything I've written could be wrong...


#define dblksize(fs, dip, lbn) \
        (((lbn) >= NDADDR || (dip)->di_size >= smalllblktosize(fs, (lbn) + 1)) \
            ? (fs)->fs_bsize \
            : (fragroundup(fs, blkoff(fs, (dip)->di_size))))
#define smalllblktosize(fs, blk)    /* calculates (blk * fs->fs_bsize) */ \
        ((blk) << (fs)->fs_bshift)
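Why these warnings deserve attention: under C's usual arithmetic conversions, when a signed int is compared with a wider unsigned type, the signed operand is converted to unsigned first, so a negative value becomes huge. The sketch below uses hypothetical helper names and models the di_size comparison inside dblksize(), assuming di_size is a 64-bit unsigned field and the shifted block number a 32-bit signed value.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Implicit conversion: limit is converted to uint64_t before comparing,
 * so -1 becomes 0xffffffffffffffff and the test is false.  This is the
 * pattern gcc reports as a signed/unsigned comparison.
 */
static bool
size_reaches_naive(uint64_t size, int32_t limit)
{
	return size >= limit;
}

/* Explicit version: deal with negative limits before converting. */
static bool
size_reaches_safe(uint64_t size, int32_t limit)
{
	return limit < 0 || size >= (uint64_t)limit;
}
```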
--------------4AAF98DA7ADECDBBB93BEEF3--




From owner-freebsd-fs  Tue Apr 18 11:18:40 2000
Delivered-To: freebsd-fs@freebsd.org
Received: from borderware.com (gateway.borderware.com [207.236.65.226])
	by hub.freebsd.org (Postfix) with ESMTP
	id 0CBD837BA4C; Tue, 18 Apr 2000 11:18:32 -0700 (PDT)
	(envelope-from dchapes@borderware.com)
Received: by gateway.borderware.com id <117123>; Tue, 18 Apr 2000 14:15:42 -0400
From: Dave Chapeskie <dchapes@borderware.com>
Message-Id: <00Apr18.141542edt.117123@gateway.borderware.com>
Date:  Tue, 18 Apr 2000 14:18:24 -0400
To: Adrian Chadd <adrian@FreeBSD.ORG>
Cc: Matthew Dillon <dillon@apollo.backplane.com>,
	freebsd-fs@FreeBSD.ORG
Subject: Re: vnode_free_list corruption
References: <00Apr14.141908edt.117140@gateway.borderware.com> <20000415023148.F34852@ewok.creative.net.au> <200004141835.LAA71253@apollo.backplane.com> <20000418042733.I59015@ewok.creative.net.au> <00Apr17.185117edt.117127@gateway.borderware.com> <20000418174608.C71428@ewok.creative.net.au>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary=BOKacYhQ+x31HxR3
X-Mailer: Mutt 0.93.2i
In-Reply-To: <20000418174608.C71428@ewok.creative.net.au>; from Adrian Chadd on Tue, Apr 18, 2000 at 05:46:11PM +0800
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org


--BOKacYhQ+x31HxR3
Content-Type: text/plain; charset=us-ascii

On Tue, Apr 18, 2000 at 05:46:11PM +0800, Adrian Chadd wrote:
> Yes, but from my take of the code, if a vnode reaches VDOOMED, it's been
> earmarked for recycling and is in the process of being flushed. If
> the vnode is being used in the FS code somewhere (or anywhere for that
> matter :) it shouldn't ever be considered for recycling as it should be
> vref()'ed or at the least vhold()'ed.

I see it the other way around: getnewvnode is perfectly within its
rights to call vgonel() on a VOP_LOCKED vnode from what I can see (see
code comment below).  I think the problem is that vhold and vbusy need
to be checking for VXLOCK and returning an error just the way vget does.
If that's the case it seems silly to have both vhold and vget.


The vclean() call (from vgonel called by getnewvnode) must be blocking.
It's the only place between getnewvnode()'s setting VDOOMED and its
later clearing of the flags (assuming VXLOCK isn't already set) where it
can block.  There is a comment in vclean() which says:

	/*
	 * Even if the count is zero, the VOP_INACTIVE routine may still
	 * have the object locked while it cleans it out. The VOP_LOCK
	 * ensures that the VOP_INACTIVE routine is done with its work.
	 * For active vnodes, it ensures that no other activity can
	 * occur while the underlying object is being cleaned out.
	 */
	VOP_LOCK(vp, LK_DRAIN | LK_INTERLOCK, p);

Alternatively it may sometimes be blocking in the vinvalbuf() call.

I just reproduced the problem again with some extra kernel debugging.  One
of the 'head' processes calls getnewvnode and it blocks with a wait
message of "inode"; I think that means it's blocked in the VOP_LOCK call
waiting for the vnode lock.


vprint() from the getnewvnode call looks like this:

getnewvnode: 0xcd2e7200: type VREG, usecount 0, writecount 0,
		refcount 0, flags (VFREE)
	tag VT_UFS, ino 27201, on dev 0x20015 (0, 131093)
		lock type inode: EXCL (count 1) by pid 611
getnewvnode: pid 1455 recycling VOP_ISLOCKED vnode!


The vprint() from vbusy looks like this:

vbusy: 0xcd2e7200: type VREG, usecount 0, writecount 0,
		refcount 1, flags (VXLOCK|VDOOMED|VFREE)
	tag VT_UFS, ino 27201, on dev 0x20015 (0, 131093)
		lock type inode: EXCL (count 1) by pid 611
panic: vbusy on VDOOMED vnode


pid 611 is 'rm', pid 1455 is 'head' with a wait message of "inode".

So it looks like my panic call in vbusy should be checking for VXLOCK
instead of VDOOMED (since the latter is only set/checked from within
getnewvnode, it's better not to make other parts of the system know about
it).
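The sequence in the two vprint() dumps can be replayed single-threaded to show why a VXLOCK check in vbusy() catches it. This is a hypothetical model, not kernel code: the lock is reduced to an owner pid, the two step functions stand for pid 1455 (head) entering getnewvnode()/vgonel() and the lock holder, assumed here to be pid 611 (rm), later taking a hold, and the flag values are illustrative.

```c
/* Illustrative flag values, not those in <sys/vnode.h>. */
#define VFREE   0x0001
#define VDOOMED 0x0002
#define VXLOCK  0x0004

struct vnode {
	int v_flag;
	int v_usecount;
	int v_holdcnt;
	int lock_owner;		/* pid holding the vnode lock, 0 if none */
};

/*
 * Step 1 (recycler): getnewvnode() marks the vnode VDOOMED, vclean()
 * (via vgonel()) marks it VXLOCK and then asks for the vnode lock.
 * Returns 1 if it would have to sleep because another pid owns the lock.
 */
static int
recycle_start(struct vnode *vp, int pid)
{
	vp->v_flag |= VDOOMED | VXLOCK;
	return vp->lock_owner != 0 && vp->lock_owner != pid;
}

/*
 * Step 2 (lock holder, while the recycler sleeps): take a hold, which
 * reaches vbusy().  Returns 1 where a VXLOCK check would panic.
 */
static int
hold_reaches_vbusy(struct vnode *vp)
{
	vp->v_holdcnt++;
	return (vp->v_flag & VXLOCK) != 0;
}
```

Starting from the first dump (flags VFREE, no references, locked by pid 611), the model ends in the state of the second dump: refcount 1 and flags VXLOCK|VDOOMED|VFREE.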


> Right, I'll drop MAXMEM down and try to starve the system further, and
> see what happens.

Also make sure softupdates is off, since it could easily be changing the
timing (all my tests were done with it off).

Also try the attached patch in order to make the problem easier to
replicate.  It makes getnewvnode _try_ to find a VOP_LOCKED vnode to
recycle.  It still only picks vnodes that might otherwise have been
selected (if they were closer to the front of the inactive list).  It of
course slows things down since it often walks the complete free list but
that shouldn't matter for the purposes of this test.
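Stripped of the list handling, the selection rule the test patch adds (with skip_vop_locked off and find_vop_locked on) is: recycle the first eligible vnode that is locked, and only fall back to the first eligible unlocked one if there is none. A hypothetical array-based sketch, where the eligible field collapses the namecache and other checks getnewvnode() already performs:

```c
#include <stddef.h>

struct cand {
	int eligible;	/* passed getnewvnode()'s other checks */
	int locked;	/* VOP_ISLOCKED() would return non-zero */
};

/*
 * Return the index of the vnode to recycle, or -1 if none qualifies:
 * prefer a locked candidate (to provoke the bug), else use the first
 * unlocked one that was remembered on the way.
 */
static int
pick_victim(const struct cand *c, size_t n)
{
	int first_unlocked = -1;

	for (size_t i = 0; i < n; i++) {
		if (!c[i].eligible)
			continue;
		if (c[i].locked)
			return (int)i;
		if (first_unlocked == -1)
			first_unlocked = (int)i;
	}
	return first_unlocked;
}
```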

With similar changes in my kernel it can still take a couple of minutes
for the vbusy panic to occur (although I can do it with fewer instances
of my test scripts running).  You'll probably notice that getnewvnode
does successfully recycle several VOP_LOCKED vnodes before vbusy() gets
called on one.

-- 
Dave Chapeskie
Senior Software Engineer
Borderware Technologies Inc.
Mississauga, Ontario, Canada

--BOKacYhQ+x31HxR3
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="test.diff"

diff -u -r1.253 vfs_subr.c
--- kern/vfs_subr.c	2000/03/20 11:28:45	1.253
+++ kern/vfs_subr.c	2000/04/18 18:12:58
@@ -112,6 +112,14 @@
 SYSCTL_INT(_vfs, OID_AUTO, reassignbufsortbad, CTLFLAG_RW, &reassignbufsortbad, 0, "");
 static int reassignbufmethod = 1;
 SYSCTL_INT(_vfs, OID_AUTO, reassignbufmethod, CTLFLAG_RW, &reassignbufmethod, 0, "");
+#ifdef DDB
+static int skip_vop_locked = 0;
+SYSCTL_INT(_debug, OID_AUTO, skip_vop_locked, CTLFLAG_RW, &skip_vop_locked, 0,
+		"move VOP_LOCKED vnodes to the back of the free list");
+static int find_vop_locked = 1;
+SYSCTL_INT(_debug, OID_AUTO, find_vop_locked, CTLFLAG_RW, &find_vop_locked, 0,
+		"try and cause problems by looking for VOP_LOCKED vnodes to recycle");
+#endif
 
 #ifdef ENABLE_VFS_IOOPT
 int vfs_ioopt = 0;
@@ -453,6 +461,9 @@
 	struct vnode *vp, *tvp, *nvp;
 	vm_object_t object;
 	TAILQ_HEAD(freelst, vnode) vnode_tmp_list;
+#ifdef DDB
+	struct vnode *non_locked = NULL;
+#endif
 
 	/*
 	 * We take the least recently used vnode from the freelist
@@ -507,7 +518,35 @@
 				/* Don't recycle if active in the namecache */
 				simple_unlock(&vp->v_interlock);
 				continue;
+			} else if (VOP_ISLOCKED(vp)) {
+#ifdef DDB
+				vprint("getnewvnode", vp);
+				if (!skip_vop_locked) {
+					printf ("getnewvnode: pid %ld recycling"
+					    " VOP_ISLOCKED vnode!\n",
+					    curproc ? curproc->p_pid : 0);
+					break;
+				}
+				printf ("getnewvnode: pushing VOP_ISLOCKED"
+				    " vnode to end of list\n");
+#endif
+				TAILQ_REMOVE(&vnode_free_list, vp, v_freelist);
+				TAILQ_INSERT_TAIL(&vnode_tmp_list, vp, v_freelist);
+				continue;
 			} else {
+#ifdef DDB
+				if (!skip_vop_locked && find_vop_locked) {
+					/*
+					 * To illustrate a problem look for
+					 * VOP_LOCKED vnodes to recycle,
+					 * but remember the first non-locked
+					 * vnode
+					 */
+					if (non_locked == NULL)
+						non_locked = vp;
+					continue;
+				} else
+#endif
 				break;
 			}
 		}
@@ -520,6 +559,11 @@
 		simple_unlock(&tvp->v_interlock);
 	}
 
+#ifdef DDB
+	/* If there are no locked vnodes, use the first non-locked one */
+	if (vp == NULL && non_locked != NULL)
+		vp = non_locked;
+#endif
 	if (vp) {
 		vp->v_flag |= VDOOMED;
 		TAILQ_REMOVE(&vnode_free_list, vp, v_freelist);
@@ -2613,6 +2657,13 @@
 	int s;
 
 	s = splbio();
+	if (vp->v_flag & VDOOMED|VXLOCK) {
+#ifdef DIAGNOSTIC
+		vprint ("vbusy", vp);
+		printf ("vbusy by pid %ld\n", curproc ? curproc->p_pid : 0);
+#endif
+		panic ("vbusy on VDOOMED or VXLOCKed vnode");
+	}
 	simple_lock(&vnode_free_list_slock);
 	if (vp->v_flag & VTBFREE) {
 		TAILQ_REMOVE(&vnode_tobefree_list, vp, v_freelist);

--BOKacYhQ+x31HxR3--




From owner-freebsd-fs  Tue Apr 18 11:28:37 2000
Delivered-To: freebsd-fs@freebsd.org
Received: from borderware.com (gateway.borderware.com [207.236.65.226])
	by hub.freebsd.org (Postfix) with ESMTP
	id B337737BAF9; Tue, 18 Apr 2000 11:28:32 -0700 (PDT)
	(envelope-from dchapes@borderware.com)
Received: by gateway.borderware.com id <117125>; Tue, 18 Apr 2000 14:25:38 -0400
From: Dave Chapeskie <dchapes@borderware.com>
Message-Id: <00Apr18.142538edt.117125@gateway.borderware.com>
Date:  Tue, 18 Apr 2000 14:28:22 -0400
To: Adrian Chadd <adrian@FreeBSD.ORG>
Cc: Matthew Dillon <dillon@apollo.backplane.com>,
	freebsd-fs@FreeBSD.ORG
Subject: Re: vnode_free_list corruption
References: <00Apr14.141908edt.117140@gateway.borderware.com> <20000415023148.F34852@ewok.creative.net.au> <200004141835.LAA71253@apollo.backplane.com> <20000418042733.I59015@ewok.creative.net.au> <00Apr17.185117edt.117127@gateway.borderware.com> <20000418174608.C71428@ewok.creative.net.au> <20000418141824.B25185@borderware.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 0.93.2i
In-Reply-To: <20000418141824.B25185@borderware.com>; from dchapes on Tue, Apr 18, 2000 at 02:18:24PM -0400
X-no-archive: yes
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

On Tue, Apr 18, 2000 at 02:18:24PM -0400, dchapes wrote:
> +	if (vp->v_flag & VDOOMED|VXLOCK) {

Of course this should be:
> +	if (vp->v_flag & (VDOOMED|VXLOCK)) {
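The first version parses as (vp->v_flag & VDOOMED) | VXLOCK, because & binds more tightly than |, so the result is never zero and every call to vbusy() would panic. A quick check with illustrative flag values and hypothetical function names:

```c
/* Illustrative flag values; any non-zero VXLOCK shows the same effect. */
#define VDOOMED 0x0002
#define VXLOCK  0x0004

/* As first posted: parsed as (flags & VDOOMED) | VXLOCK, never zero. */
static int
doomed_or_xlocked_buggy(int flags)
{
	return flags & VDOOMED|VXLOCK;
}

/* As corrected: non-zero only if either bit is set. */
static int
doomed_or_xlocked_fixed(int flags)
{
	return (flags & (VDOOMED|VXLOCK)) != 0;
}
```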

-- 
Dave Chapeskie
Senior Software Engineer
Borderware Technologies Inc.
Mississauga, Ontario, Canada




From owner-freebsd-fs  Tue Apr 18 12:49:16 2000
Delivered-To: freebsd-fs@freebsd.org
Received: from ewok.creative.net.au (fuzzy.aussie.com.au [203.30.44.82])
	by hub.freebsd.org (Postfix) with SMTP id 3836337BB69
	for <freebsd-fs@FreeBSD.ORG>; Tue, 18 Apr 2000 12:49:12 -0700 (PDT)
	(envelope-from freebsd@ewok.creative.net.au)
Received: (qmail 76128 invoked by uid 1008); 18 Apr 2000 19:49:08 -0000
Date: Wed, 19 Apr 2000 03:49:08 +0800
From: Adrian Chadd <adrian@freebsd.org>
To: Dave Chapeskie <dchapes@borderware.com>
Cc: Adrian Chadd <adrian@FreeBSD.ORG>,
	Matthew Dillon <dillon@apollo.backplane.com>, freebsd-fs@FreeBSD.ORG
Subject: Re: vnode_free_list corruption
Message-ID: <20000419034906.I71428@ewok.creative.net.au>
References: <00Apr14.141908edt.117140@gateway.borderware.com> <20000415023148.F34852@ewok.creative.net.au> <200004141835.LAA71253@apollo.backplane.com> <20000418042733.I59015@ewok.creative.net.au> <00Apr17.185117edt.117127@gateway.borderware.com> <20000418174608.C71428@ewok.creative.net.au> <00Apr18.141542edt.117123@gateway.borderware.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 0.95.4i
In-Reply-To: <00Apr18.141542edt.117123@gateway.borderware.com>; from Dave Chapeskie on Tue, Apr 18, 2000 at 02:18:24PM -0400
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

On Tue, Apr 18, 2000, Dave Chapeskie wrote:
> On Tue, Apr 18, 2000 at 05:46:11PM +0800, Adrian Chadd wrote:
> > Yes, but from my take of the code, if a vnode reaches VDOOMED, it's been
> > earmarked for recycling and is in the process of being flushed. If
> > the vnode is being used in the FS code somewhere (or anywhere for that
> > matter :) it shouldn't ever be considered for recycling as it should be
> > vref()'ed or at the least vhold()'ed.
> 
> I see it the other way around: getnewvnode is perfectly within its
> rights to call vgonel() on a VOP_LOCKED vnode from what I can see (see
> code comment below).  I think the problem is that vhold and vbusy need
> to be checking for VXLOCK and returning an error just the way vget does.
> If that's the case it seems silly to have both vhold and vget.

Hrm. When will you have a vnode which is VOP_LOCKED but not ref'ed or
held?


> The vclean() call (from vgonel called by getnewvnode) must be blocking.
> It's the only place between getnewvnode()'s setting VDOOMED and its
> later clearing of the flags (assuming VXLOCK isn't already set) where it
> can block.  There is a comment in vclean() which says:
> 
> 	/*
> 	 * Even if the count is zero, the VOP_INACTIVE routine may still
> 	 * have the object locked while it cleans it out. The VOP_LOCK
> 	 * ensures that the VOP_INACTIVE routine is done with its work.
> 	 * For active vnodes, it ensures that no other activity can
> 	 * occur while the underlying object is being cleaned out.
> 	 */
> 	VOP_LOCK(vp, LK_DRAIN | LK_INTERLOCK, p);
> 
> Alternatively it may sometimes be blocking in the vinvalbuf() call.

I've re-read the code *again* and I can see the code taking the vnode
through being made inactive to possibly going to UFS_TRUNCATE and
eventually ending up at a vhold. But the panics you've been throwing
here on the list indicate a vnode being used in one process in some
fs operation but having no refcount or holdcnt, and it is then targeted
for VDOOMED. Then the first process ends up at vbusy(), and things
go strange from there. THIS is what sounds wrong, don't you agree?

What you are saying is that getnewvnode is perfectly right to call vgonel()
on a locked vnode if it's not used or held. What I'm saying is that
it's not right for getnewvnode() to recycle a vnode that is in the middle
of some file op, which is what you've indicated as happening (I still can't
reproduce the bug; if someone else out there can, PLEASE tell me how :)
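This position amounts to a stricter recycling rule: besides having no references or holds, the vnode must not be locked by anyone. A sketch with hypothetical names, using the state from the first vprint() dump (usecount 0, refcount 0, lock held by pid 611) as the test case; the patch's skip_vop_locked setting gets a similar effect by moving such vnodes to the back of the free list.

```c
#include <stdbool.h>

struct vnode {
	int v_usecount;
	int v_holdcnt;
	int lock_owner;		/* pid holding the vnode lock, 0 if unlocked */
};

/* Counts-only rule: what let getnewvnode() pick the vnode in the trace. */
static bool
recyclable_counts_only(const struct vnode *vp)
{
	return vp->v_usecount == 0 && vp->v_holdcnt == 0;
}

/* Stricter rule: also skip vnodes in the middle of a locked operation. */
static bool
recyclable_strict(const struct vnode *vp)
{
	return recyclable_counts_only(vp) && vp->lock_owner == 0;
}
```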

People with vnode clue, please comment. I'm going to spend tomorrow
looking at if and why the first is happening. 

I'll look at the patches you gave tomorrow.


Adrian




From owner-freebsd-fs  Thu Apr 20  3:57:42 2000
Delivered-To: freebsd-fs@freebsd.org
Received: from florence.pavilion.net (florence.pavilion.net [212.74.0.25])
	by hub.freebsd.org (Postfix) with ESMTP id B197537B883
	for <freebsd-fs@freebsd.org>; Thu, 20 Apr 2000 03:57:39 -0700 (PDT)
	(envelope-from joe@pavilion.net)
Received: from genius.systems.pavilion.net (postfix@genius.systems.pavilion.net [212.74.1.100])
	by florence.pavilion.net (8.9.3/8.8.8) with ESMTP id LAA92773
	for <freebsd-fs@freebsd.org>; Thu, 20 Apr 2000 11:56:56 +0100 (BST)
	(envelope-from joe@pavilion.net)
Received: by genius.systems.pavilion.net (Postfix, from userid 100)
	id 8AC20338; Thu, 20 Apr 2000 11:57:33 +0100 (BST)
Date: Thu, 20 Apr 2000 11:57:33 +0100
From: Joe Karthauser <joe@pavilion.net>
To: freebsd-fs@freebsd.org
Subject: subscribe
Message-ID: <20000420115733.B44137@pavilion.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 1.0.1i
X-NCC-RegID: uk.pavilion
Organisation: Pavilion Internet plc, Lees House, 21-23 Dyke Road, Brighton, England
Phone: +44-845-333-5000
Fax: +44-845-333-5001
Mobile: +44-403-596893
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

subscribe

