Date: Thu, 25 Apr 2013 08:58:19 -0700
From: Jeremy Chadwick
To: Guy Helmer
Cc: FreeBSD Stable
Subject: Re: FreeBSD 9: fdisk -It crashes kernel
Message-ID: <20130425155818.GA8454@icarus.home.lan>
In-Reply-To: <80F41679-9C3A-4E61-8AAD-403410344C32@gmail.com>

On Thu, Apr 25, 2013 at 09:06:49AM -0500, Guy Helmer wrote:
> Encountered a surprise when my disk resizing rc.d script caused FreeBSD 9.1-STABLE to crash. I used "fdisk -It ada0" to determine what the available size of the disk (which happened to be the root disk), and on FreeBSD 9.1 the kernel comes crashing down:
>
> + fdisk -It ada0
> + /rescue/sed -En 's,.*start ([0-9]+).*size ([0-9]+).*,\1 + \2,p'
> vnode_pager_getpages: I/O read error
> vm_fault: pager read error, pid 65 (fdisk)
> pid 65 (fdisk), uid 0: exited on signal 11
> eval: arithmetic expression: expecting primary: ""
> Entropy harvesting: point_to_pointeval: date: Device not configured
> eval: df: Device not configured
> eval: dmesg: Device not configured
> cat: /bin/ls: Device not configured
> kickstart.
> eval: cannot open /etc/fstab: Device not configured
> eval: cannot open /etc/fstab: Device not configured
> eval: swapon: Device not configured
> Warning! No /etc/fstab: skipping disk checks
> fstab: /etc/fstab:0: Device not configured
>
> Fatal trap 12: page fault while in kernel mode
> cpuid = 1; apic id = 01
> fault virtual address = 0x0
> fault code = supervisor read, page not present
> instruction pointer = 0x20:0xc0825fc4
> stack pointer = 0x28:0xc5a088c8
> frame pointer = 0x28:0xc5a08914
> code segment = base 0x0, limit 0xfffff, type 0x1b
> = DLP 0, pres 1, def32 1, gran 1
> processor eflags = interrupt enabled, resume, IOPL = 0
> current process = 91 (mount)
> [ thread pid 91 tid 100056 ]
> Stopped at g_access+0x24: mlvl 0(%ebx),%eax
> db> where
> Tracing pid 91 tid 100056 td 0xc84c42f0
> g_access(c8481d34,0,1,1,0,…) at g_access+0x24/frame 0xc5a08914
> ffs_mount(c8481d34,c0d78380,2,c5a08c00,c829ae6c,…) af ffs_mount+0xf74/frame 0xc5a08a34
> vfs_donmount(c84c42f0,10000,0,c84cf200,c84cf200,…) at vfs_donmount+0x1423/frame 0xc5a08c24
> sys_nmount(c84c42f0,c5a08ccc,c5a08cc4,1010006,c5a08d08,…) at sys_nmount+0x7f/frame 0xc5a08c48
> syscall(c5a08d08) at syscall+0x443/frame 0xc508cfc
> Xint0x80_syscall() at Xint0x80_syscall+0x21/frame 0xc5a08cfc
> --- syscall (378, FreeBSD ELF32, sys_nmount), eip = 0x480d5feb, esp = 0xbfbfce1c, ebp = 0xbfbfd378 ---
>
> I'll fix my script to not do this, but it seems odd that fdisk -It can make the disk "go away".

Please provide a full, unmodified copy of your script.

What's confusing to me is that after your sed call (which I don't even understand, because it doesn't appear to be operating on anything except stdin/stdout, and we don't know what that is -- again, show the script), the kernel starts outputting indications that the root disk/filesystem or its related metadata disappeared:

> vnode_pager_getpages: I/O read error
> vm_fault: pager read error, pid 65 (fdisk)
> pid 65 (fdisk), uid 0: exited on signal 11

Except the kernel stack trace indicates something called sys_nmount(), which called vfs_donmount(), which called ffs_mount(), which calls g_access().
All of those scream to me "someone tried to mount something". fdisk does not do mounting.

fdisk also shouldn't be writing to LBA 0 (the MBR) if you used -I -t. I've been staring at fdisk.c for about 20 minutes now and I can't work out a situation where -I -t would cause the MBR to be rewritten actively. The only GEOM calls I see in fdisk.c that would get called are g_device_path(), g_open(), and g_close(). Actual device I/O uses read() and write() (only in write_s0() which shouldn't be called).

Furthermore, GEOM has foot-shooting-prevention mechanisms in place (I'm talking about kern.geom.debugflags) to keep LBA 0 from being modified. Is your script setting that sysctl to 16/0x10 blindly? Ahem.

It would also help if you could state exactly what 9.1-STABLE source you're using; if using svn provide revision (rXXXXXX), else provide uname -a output.

Finally: I would suggest using gpart(8) instead going forward. This is a separate recommendation though; if somehow I'm overlooking something in fdisk.c where writes to LBA 0 really do happen, then that needs to get fixed. But gpart(8) is what you should use in general these days anyway.

-- 
| Jeremy Chadwick                                   jdc@koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Mountain View, CA, US                                            |
| Making life hard for others since 1977.             PGP 4BD6C0CB |
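
For reference, the two "+" trace lines in the quoted output look like sh -x tracing of a single pipeline, in which case the sed call would be reading fdisk's output on stdin. This is only a guess until the actual script is posted. Below is a minimal sketch of what that sed expression does under that assumption. The sample line is hypothetical (shaped like a slice summary, with invented numbers), extract_extent is a made-up helper name, and plain sed stands in for /rescue/sed.

```shell
#!/bin/sh
# Hypothetical sample line shaped like a slice summary; the numbers are invented.
sample='    start 63, size 976773105 (476939 Meg), flag 80 (active)'

# Same expression as in the quoted trace: keep the start and size fields,
# join them with " + ", and print only lines where the substitution matched.
# extract_extent is a made-up name for illustration.
extract_extent() {
    sed -En 's,.*start ([0-9]+).*size ([0-9]+).*,\1 + \2,p'
}

expr_text=$(printf '%s\n' "$sample" | extract_extent)
echo "$expr_text"            # 63 + 976773105
echo "$(( $expr_text ))"     # 976773168

# With no input at all (for example, if fdisk dies before printing anything)
# nothing matches and the result is an empty string.
empty=$(printf '' | extract_extent)
echo "empty result: '$empty'"
```

If the script feeds that string into shell arithmetic, an empty result may be what the later 'arithmetic expression: expecting primary: ""' message reflects, but that is speculation without the script itself.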
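
The kern.geom.debugflags question comes down to whether bit 0x10 (decimal 16) is set in the sysctl value. Below is a minimal sketch of that check. The function name footshoot_enabled is made up for illustration, and a literal value is used so the snippet runs anywhere. On a live FreeBSD system the value would come from `sysctl -n kern.geom.debugflags`.

```shell
#!/bin/sh
# footshoot_enabled VALUE: succeeds if bit 0x10 (decimal 16) is set in VALUE.
# The function name is hypothetical.
footshoot_enabled() {
    [ $(( $1 & 16 )) -ne 0 ]
}

# On FreeBSD the real value would be read with:
#   flags=$(sysctl -n kern.geom.debugflags)
# A literal is used here as an assumption so the sketch is self-contained.
flags=16

if footshoot_enabled "$flags"; then
    result="protection disabled"
else
    result="protection active"
fi
echo "kern.geom.debugflags=$flags: $result"
```

A script that only needs to read partition sizes should not need this bit set at all. `gpart show ada0` lists the partition table without modifying it.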