Date:      Mon, 19 Aug 2002 00:00:11 -0700 (PDT)
From:      Vallo Kallaste <vallo@estcard.ee>
To:        freebsd-bugs@FreeBSD.org
Subject:   Re: kern/41740: vinum issues: page fault while rebuilding; inability to hot-rebuild striped plexes
Message-ID:  <200208190700.g7J70BJV010477@freefall.freebsd.org>

The following reply was made to PR kern/41740; it has been noted by GNATS.

From: Vallo Kallaste <vallo@estcard.ee>
To: Doug Swarin <doug@texas.net>
Cc: freebsd-gnats-submit@FreeBSD.ORG, grog@lemis.com
Subject: Re: kern/41740: vinum issues: page fault while rebuilding; inability to hot-rebuild striped plexes
Date: Mon, 19 Aug 2002 09:51:14 +0300

 On Fri, Aug 16, 2002 at 06:06:51PM -0700, Doug Swarin <doug@texas.net> wrote:
 
 > >Number:         41740
 > >Category:       kern
 > >Synopsis:       vinum issues: page fault while rebuilding; inability to hot-rebuild striped plexes
 > >Confidential:   no
 > >Severity:       serious
 > >Priority:       medium
 > >Responsible:    freebsd-bugs
 > >State:          open
 > >Quarter:        
 > >Keywords:       
 > >Date-Required:
 > >Class:          sw-bug
 > >Submitter-Id:   current-users
 > >Arrival-Date:   Fri Aug 16 18:10:03 PDT 2002
 > >Closed-Date:
 > >Last-Modified:
 > >Originator:     Doug Swarin
 > >Release:        4-STABLE
 > >Organization:
 > >Environment:
 > FreeBSD vmware.localdomain 4.6-STABLE #12: Fri Aug 16 16:29:37 CDT 2002 root@vmware.localdomain:/usr/obj/usr/src/sys/VMWARE i386
 > >Description:
 >       1. The launch_requests() function in vinumrequest.c needs splbio() protection around the lower loop. Without splbio(), complete_rqe() may be called at splx() in BUF_STRATEGY(). If there are inactive rqgs in rq (for example, with XFR_BAD_SUBDISK), rq may be deallocated before the loop completes walking the rqg queue in rq, causing either a page fault or an infinite loop.
 > 
 >       2. A striped plex cannot be safely hot-rebuilt, and there is no warning as such in the documentation. Because all requests to the rebuilding plex return REQUEST_DOWN, the two plexes will be inconsistent after the rebuild finishes since writes to the already-rebuilt region of the rebuilding plex will only be written to the good plex.
 > >How-To-Repeat:
 >       1. Create a pair of striped plexes as a single volume. 'vinum stop' one plex, then 'vinum start' it to start it rebuilding. Run postmark or perform other heavy activity against the mounted filesystem while the rebuild takes place.
 > 
 >       2. After the above hot-rebuild, demount it, fsck, and watch the errors fly. The splbio() fix will probably need to be applied before the hot-rebuild will succeed.
 > >Fix:
 >       1. Add 'int s;' to the top of launch_requests() and 's = splbio();' at line 395 and 'splx(s);' at line 439. I apologize for not providing an actual diff, because I am using the web form to submit this.
 > 
 >       2. Add a note to the documentation warning against hot-rebuilding a striped plex. The long-term fix would be to implement the missing code in checksdstate() in vinumstate.c so that it returns the proper result for a striped plex.
 > >Release-Note:
 > >Audit-Trail:
 > >Unformatted:
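
To illustrate fix 1 in the quoted PR, here is a minimal user-space sketch of the race. It is not the actual vinumrequest.c code: every name in it (struct request, fake_splbio(), fake_splx(), fake_strategy(), launch()) is hypothetical, and it assumes that an I/O completes as soon as it is issued unless completions are blocked. It only shows how a request can be released while the launch loop is still walking its groups when a later group is inactive.

```c
/* Hypothetical user-space model; none of these names are real vinum code. */
#define MAXGROUPS 4

struct request {
    int active;                /* I/Os launched but not yet completed */
    int released;              /* set when the completion path "frees" rq */
    int ngroups;               /* number of request groups to walk */
    int group_live[MAXGROUPS]; /* 0 models an inactive group (bad subdisk) */
};

static int blocked;            /* models the raised interrupt priority */
static int pending;            /* completions deferred while blocked */
static struct request *pending_rq;

/* Models the completion path: the last completion releases the request. */
static void complete_io(struct request *rq)
{
    if (--rq->active == 0)
        rq->released = 1;      /* the kernel would deallocate rq here */
}

/* Models splbio(): block completions and return the previous level. */
static int fake_splbio(void)
{
    int old = blocked;
    blocked = 1;
    return old;
}

/* Models splx(): restore the level and deliver deferred completions. */
static void fake_splx(int old)
{
    blocked = old;
    while (!blocked && pending > 0) {
        pending--;
        complete_io(pending_rq);
    }
}

/* Models a strategy call whose I/O finishes immediately (assumption). */
static void fake_strategy(struct request *rq)
{
    if (blocked) {
        pending++;
        pending_rq = rq;
    } else {
        complete_io(rq);
    }
}

/* Walks the groups; returns how often rq was touched after its release. */
static int launch(struct request *rq, int protect)
{
    int stale = 0, s = 0, i;

    if (protect)
        s = fake_splbio();
    for (i = 0; i < rq->ngroups; i++) {
        if (rq->released)
            stale++;           /* would be a use-after-free in the kernel */
        if (rq->group_live[i])
            fake_strategy(rq);
    }
    if (protect)
        fake_splx(s);
    return stale;
}
```

With one live group followed by one inactive group and active set to 1, launch(rq, 0) returns 1 because the loop touches the request after its only I/O has completed, while launch(rq, 1) returns 0 and the request is released only after the loop has finished.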
 
 This behaviour (a corrupt filesystem after a hot-rebuild with user I/O
 running at the same time) is the same as what I found with a RAID-5
 volume long ago. I don't have the necessary hardware at the moment,
 but could this also fix the RAID-5 hot-rebuild problem?
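
The inconsistency described in item 2 of the quoted PR can be shown with a toy model. This is only a sketch under stated assumptions: plexes are modelled as plain integer arrays, the rebuild copies one block at a time, and a single user write arrives partway through. The function rebuild_mismatches() and its parameters are hypothetical and have nothing to do with the real vinum code paths.

```c
#define NBLOCKS 8

/*
 * Hypothetical model of a hot rebuild. The rebuild copies good[] into
 * rebuilt[] block by block. Just before block `write_at_step` is copied,
 * a user write lands on block `target`. If `mirror_writes` is 0, the
 * write reaches only the good plex, as the PR describes; if it is 1,
 * the write also goes to the already-rebuilt region.
 * Returns the number of blocks that differ once the rebuild is done.
 */
static int rebuild_mismatches(int target, int write_at_step, int mirror_writes)
{
    int good[NBLOCKS], rebuilt[NBLOCKS];
    int i, diff = 0;

    for (i = 0; i < NBLOCKS; i++) {
        good[i] = i;           /* original contents */
        rebuilt[i] = -1;       /* not yet rebuilt */
    }

    for (i = 0; i < NBLOCKS; i++) {
        if (i == write_at_step) {
            good[target] = 1000;               /* user write */
            if (mirror_writes && target < i)
                rebuilt[target] = 1000;        /* keep rebuilt region in sync */
        }
        rebuilt[i] = good[i];                  /* rebuild copies this block */
    }

    for (i = 0; i < NBLOCKS; i++)
        diff += (good[i] != rebuilt[i]);
    return diff;
}
```

In this model a write behind the copy position, as in rebuild_mismatches(2, 5, 0), leaves one stale block, whereas a write ahead of it, as in rebuild_mismatches(6, 5, 0), is picked up by the later copy and leaves none. It says nothing about whether RAID-5 rebuilds fail for the same reason.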
 -- 
 
 Vallo Kallaste
 vallo@estcard.ee




