Date: Mon, 19 Aug 2002 00:00:11 -0700 (PDT)
From: Vallo Kallaste <vallo@estcard.ee>
To: freebsd-bugs@FreeBSD.org
Subject: Re: kern/41740: vinum issues: page fault while rebuilding; inability to hot-rebuild striped plexes
Message-ID: <200208190700.g7J70BJV010477@freefall.freebsd.org>
The following reply was made to PR kern/41740; it has been noted by GNATS.

From: Vallo Kallaste <vallo@estcard.ee>
To: Doug Swarin <doug@texas.net>
Cc: freebsd-gnats-submit@FreeBSD.ORG, grog@lemis.com
Subject: Re: kern/41740: vinum issues: page fault while rebuilding; inability to hot-rebuild striped plexes
Date: Mon, 19 Aug 2002 09:51:14 +0300

On Fri, Aug 16, 2002 at 06:06:51PM -0700, Doug Swarin <doug@texas.net> wrote:

> >Number:         41740
> >Category:       kern
> >Synopsis:       vinum issues: page fault while rebuilding; inability to hot-rebuild striped plexes
> >Confidential:   no
> >Severity:       serious
> >Priority:       medium
> >Responsible:    freebsd-bugs
> >State:          open
> >Quarter:
> >Keywords:
> >Date-Required:
> >Class:          sw-bug
> >Submitter-Id:   current-users
> >Arrival-Date:   Fri Aug 16 18:10:03 PDT 2002
> >Closed-Date:
> >Last-Modified:
> >Originator:     Doug Swarin
> >Release:        4-STABLE
> >Organization:
> >Environment:
> FreeBSD vmware.localdomain 4.6-STABLE #12: Fri Aug 16 16:29:37 CDT 2002
> root@vmware.localdomain:/usr/obj/usr/src/sys/VMWARE i386
> >Description:
> 1. The launch_requests() function in vinumrequest.c needs splbio()
>    protection around the lower loop. Without splbio(), complete_rqe()
>    may be called at splx() in BUF_STRATEGY(). If there are inactive
>    rqgs in rq (for example, with XFR_BAD_SUBDISK), rq may be
>    deallocated before the loop completes walking the rqg queue in rq,
>    causing either a page fault or an infinite loop.
>
> 2. A striped plex cannot be safely hot-rebuilt, and there is no
>    warning as such in the documentation. Because all requests to the
>    rebuilding plex return REQUEST_DOWN, the two plexes will be
>    inconsistent after the rebuild finishes since writes to the
>    already-rebuilt region of the rebuilding plex will only be written
>    to the good plex.
> >How-To-Repeat:
> 1. Create a pair of striped plexes as a single volume. 'vinum stop'
>    one plex, then 'vinum start' it to start it rebuilding. Run
>    postmark or perform other heavy activity against the mounted
>    filesystem while the rebuild takes place.
>
> 2. After the above hot-rebuild, demount it, fsck, and watch the
>    errors fly. The splbio() fix will probably need to be applied
>    before the hot-rebuild will succeed.
> >Fix:
> 1. Add 'int s;' to the top of launch_requests() and 's = splbio();'
>    at line 395 and 'splx(s);' at line 439. I apologize for not
>    providing an actual diff, because I am using the web form to
>    submit this.
>
> 2. Add a mention to the documentation not to hot-rebuild a striped
>    plex. The long-term fix would be to do the missing code in
>    checksdstate() in vinumstate.c to return the proper result for a
>    striped plex.
> >Release-Note:
> >Audit-Trail:
> >Unformatted:

This behaviour (a corrupt filesystem after a hot-rebuild that runs
concurrently with user I/O) is the same as what I discovered for a
RAID-5 volume long ago. I don't have the necessary hardware at the
moment, but could it be that this will fix the RAID-5 hot-rebuild
problem as well?
--
Vallo Kallaste
vallo@estcard.ee

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-bugs" in the body of the message
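
For readers following item 1 of the quoted report, here is a minimal
userland sketch in C of the race and of the proposed splbio()/splx()
bracket. It is not the vinum source: the structures, strategy(),
complete_one(), launch() and run() are hypothetical, and splbio() and
splx() here are simple stand-ins that only imitate the kernel
primitives by deferring completions. To keep the demo well defined, the
request is flagged as freed instead of actually being deallocated.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real splbio()/splx() are kernel primitives. */
static int bio_masked;   /* nonzero while block-I/O completions are held off */
static int deferred;     /* completions that arrived while masked */

struct rqgroup {
    struct rqgroup *next;
    int active;          /* 0 models a group with nothing to issue */
};

struct request {
    struct rqgroup *rqg; /* queue of request groups */
    int outstanding;     /* active groups not yet completed */
    int freed;           /* set instead of calling free() */
};

static struct request *current_rq;

/* Models the completion handler: the last completion releases rq. */
static void complete_one(void)
{
    if (--current_rq->outstanding == 0)
        current_rq->freed = 1;
}

static int splbio(void)
{
    int old = bio_masked;

    bio_masked = 1;
    return old;
}

static void splx(int s)
{
    bio_masked = s;
    /* Deliver whatever was held off while completions were masked. */
    while (!bio_masked && deferred > 0) {
        deferred--;
        complete_one();
    }
}

/* Models issuing I/O: the completion fires at once unless masked. */
static void strategy(void)
{
    if (bio_masked)
        deferred++;
    else
        complete_one();
}

/* Walks the group queue; returns how often rq was touched after release. */
static int launch(struct request *rq, int protect)
{
    int s = 0;
    int stale = 0;
    struct rqgroup *g;

    assert(rq != NULL);
    current_rq = rq;
    if (protect)
        s = splbio();            /* the "s = splbio();" from the Fix section */
    for (g = rq->rqg; g != NULL; g = g->next) {
        if (rq->freed)
            stale++;             /* in the kernel: use after deallocation */
        if (g->active)
            strategy();
    }
    if (protect)
        splx(s);                 /* the matching "splx(s);" */
    return stale;
}

/* One active group followed by one inactive group, as in the report. */
static int run(int protect)
{
    struct rqgroup inactive = { NULL, 0 };
    struct rqgroup active = { &inactive, 1 };
    struct request rq = { &active, 1, 0 };

    return launch(&rq, protect);
}
```

In this model run(0) walks onto the inactive group after the request
has already been released, while run(1) holds the completion back until
the loop has finished, which is the effect the report expects from
bracketing the loop with splbio() and splx().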
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?200208190700.g7J70BJV010477>