From: John Baldwin
To: Brandon Gooch
Cc: freebsd-current@freebsd.org
Date: Wed, 24 Feb 2010 09:55:27 -0500
Subject: Re: ZFS boot problems with memory > 1MB
Message-Id: <201002240955.27357.jhb@freebsd.org>
In-Reply-To: <179b97fb1002231659h742fd72enca5cfa1d09b822f6@mail.gmail.com>
List-Id: Discussions about the use of FreeBSD-current

On Tuesday 23 February 2010 7:59:58 pm Brandon Gooch wrote:
> On Tue, Feb 23, 2010 at 10:40 PM, John Baldwin wrote:
> > On Tuesday 23 February 2010 5:04:03 pm Brandon Gooch wrote:
> >> On Tue, Feb 23, 2010 at 3:03 PM, John Baldwin wrote:
> >> > On Tuesday 23 February 2010 3:36:19 pm Brandon Gooch wrote:
> >> >> On Tue, Feb 23, 2010 at 1:01 PM, John Baldwin wrote:
> >> >> > On Tuesday 23 February 2010 12:36:31 pm Brandon Gooch wrote:
> >> >> >> On Tue, Feb 23, 2010 at 10:24 AM, John Baldwin wrote:
> >> >> >> > On Tuesday 23 February 2010 10:28:49 am Brandon Gooch wrote:
> >> >> >> >> On Tue, Feb 23, 2010 at 7:29 AM, Andriy Gapon wrote:
> >> >> >> >> > on 23/02/2010 13:18 Renato Botelho said the following:
> >> >> >> >> >> On Mon, Feb 22, 2010 at 7:35 PM, Chris Hedley wrote:
> >> >> >> >> > [snip]
> >> >> >> >> >>> Do you have USB legacy support enabled in your BIOS? I'm not sure if
> >> >> >> >> >>> there's an option for the loader to use USB devices natively, but the BIOS's
> >> >> >> >> >>> legacy option where it provides AT/PS2 emulation is probably the easiest way
> >> >> >> >> >>> to get the keyboard working.
> >> >> >> >> >>
> >> >> >> >> >> Yes, I do, but it seems to be a regression on FreeBSD itself, I had this problem
> >> >> >> >> >> in the past and I checked the same things i need to check in the past again and
> >> >> >> >> >> everything is fine.
> >> >> >> >> >
> >> >> >> >> > A more precise way to state that would be "a regression in FreeBSD boot/loader".
> >> >> >> >> > I think that you are referring to the issue that was fixed by r189017.
> >> >> >> >> > It might be worthwhile investigating what was done in that revision and what
> >> >> >> >> > happened in sys/boot code since then.
> >> >> >> >> >
> >> >> >> >> > One possibility is that your BIOS uses memory above 1MB for USB emulation, but
> >> >> >> >> > doesn't mark that memory as used in system memory map. In that case that memory
> >> >> >> >> > could be overwritten by the loader. If that's true then the blame is on the BIOS.
> >> >> >> >> > Alternatively, our code might be parsing the system memory map incorrectly.
> >> >> >> >> > But I am just making wild guesses here.
> >> >> >> >> >
> >> >> >> >>
> >> >> >> >> I don't know if it is at all related, but this commit has caused
> >> >> >> >> problems for me booting at least one of my machines:
> >> >> >> >>
> >> >> >> >> http://svn.freebsd.org/viewvc/base/head/sys/boot/i386/zfsboot/zfsboot.c?r1=199714&r2=200309
> >> >> >> >>
> >> >> >> >> Commit message:
> >> >> >> >>
> >> >> >> >> Revision 200309 - (view) (annotate) - [select for diffs]
> >> >> >> >> Modified Wed Dec 9 20:36:56 2009 UTC (2 months, 2 weeks ago) by jhb
> >> >> >> >> File length: 24893 byte(s)
> >> >> >> >> Diff to previous 199714
> >> >> >> >> - Port bios_getmem() from libi386 to {gpt,}zfsboot() and use it to
> >> >> >> >>   safely allocate a heap region above 1MB. This enables {gpt,}zfsboot()
> >> >> >> >>   to allocate much larger buffers than before.
> >> >> >> >> - Use a larger buffer (1MB instead of 128K) for temporary ZFS buffers. This
> >> >> >> >>   allows more reliable reading of compressed files in a raidz/raidz2 pool.
> >> >> >> >>
> >> >> >> >> Submitted by: Matt Reimer mattjreimer of gmail
> >> >> >> >> MFC after: 1 week
> >> >> >> >
> >> >> >> > Starting a new thread, which problems are you seeing with this change? ZFS is
> >> >> >> > a good bit more memory hungry than UFS, so it really needs to use high memory
> >> >> >> > for its heap. Also, I wonder if you still have problems if you use the older
> >> >> >> > zfsboot with the newer zfsloader?
> >> >> >> > Finally, you need to use disklabel -B or
> >> >> >> > some such to update the zfsboot bits for this change to take effect.
> >> >> >> >
> >> >> >> > --
> >> >> >> > John Baldwin
> >> >> >> >
> >> >> >>
> >> >> >> I filed a PR so it wouldn't fall through the cracks:
> >> >> >>
> >> >> >> http://www.freebsd.org/cgi/query-pr.cgi?pr=144234
> >> >> >>
> >> >> >> I guess I tried a combination of various revisions of bootstrap code
> >> >> >> and loaders when I first encountered the issue. It was when I wrote a
> >> >> >> recent gptzfsboot to the geom that I saw the symptoms:
> >> >> >>
> >> >> >> error 1 lba 48
> >> >> >> error 1 lba 1
> >> >> >> No ZFS pools located, can't boot
> >> >> >>
> >> >> >> I just wound up using sys/boot/i386/zfsboot/zfsboot.c revision 199714
> >> >> >> to build a working gptzfsboot on another system and wrote that to the
> >> >> >> disk to get the machine operational.
> >> >> >
> >> >> > Try this:
> >> >> >
> >> >> > Index: zfsboot.c
> >> >> > ===================================================================
> >> >> > --- zfsboot.c (revision 204207)
> >> >> > +++ zfsboot.c (working copy)
> >> >> > @@ -467,6 +467,7 @@
> >> >> >  static inline void
> >> >> >  putc(int c)
> >> >> >  {
> >> >> > +	v86.ctl = 0;
> >> >> >  	v86.addr = 0x10;
> >> >> >  	v86.eax = 0xe00 | (c & 0xff);
> >> >> >  	v86.ebx = 0x7;
> >> >> > @@ -617,6 +618,8 @@
> >> >> >  	off_t off;
> >> >> >  	struct dsk *dsk;
> >> >> >
> >> >> > +	dmadat = (void *)(roundup2(__base + (int32_t)&_end, 0x10000) - __base);
> >> >> > +
> >> >> >  	bios_getmem();
> >> >> >
> >> >> >  	if (high_heap_size > 0) {
> >> >> > @@ -627,9 +630,6 @@
> >> >> >  		heap_end = (char *) PTOV(bios_basemem);
> >> >> >  	}
> >> >> >
> >> >> > -	dmadat = (void *)(roundup2(__base + (int32_t)&_end, 0x10000) - __base);
> >> >> > -	v86.ctl = V86_FLAGS;
> >> >> > -
> >> >> >  	dsk = malloc(sizeof(struct dsk));
> >> >> >  	dsk->drive = *(uint8_t *)PTOV(ARGS);
> >> >> >  	dsk->type = dsk->drive & DRV_HARD ? TYPE_AD : TYPE_FD;
> >> >> > @@ -1157,6 +1157,7 @@
> >> >> >  	 * when no such key is pressed in reality. As far as I can tell,
> >> >> >  	 * this only happens shortly after a reboot.
> >> >> >  	 */
> >> >> > +	v86.ctl = V86_FLAGS;
> >> >> >  	v86.addr = 0x16;
> >> >> >  	v86.eax = fn << 8;
> >> >> >  	v86int();
> >> >> >
> >> >> > --
> >> >> > John Baldwin
> >> >> >
> >> >>
> >> >> It still breaks:
> >> >>
> >> >> error 1 lba 48
> >> >> error 1 lba 1
> >> >> No ZFS pools located, can't boot
> >> >
> >> > Ok. Can you add a printf to zfsboot.c to print out dsk->start in the case
> >> > that you get an error? error 1 means that the BIOS thinks it got a bad
> >> > parameter, presumably in the disk packet. If you wanted to be ambitious, just
> >> > print out all of the fields in the packet when it fails.
> >> >
> >> > --
> >> > John Baldwin
> >> >
> >>
> >> Adding printf statements to drvread():
> >>
> >> printf("dsk->xxx: %u\n", dsk->xxx):
> >>
> >> Output:
> >>
> >> error 1 lba 48
> >> dsk->drive: 0
> >> dsk->type: 0
> >> dsk->unit: 0
> >> dsk->slice: 0
> >> dsk->part: 0
> >> dsk->init: 0
> >> dsk->start: 978673664
> >
> > This value looks a bit high, do you have a partition that starts at an offset
> > of about 466GB into the disk?
> >
> >> error 1 lba 1
> >> dsk->drive: 0
> >> dsk->type: 0
> >> dsk->unit: 0
> >> dsk->slice: 0
> >> dsk->part: 0
> >> dsk->init: 0
> >> dsk->start: 0
> >> No ZFS pools located, can't boot
> >
> > Sorry, I meant members of the 'packet' variable, though dsk->start is useful
> > to have as well.
> >
> > --
> > John Baldwin
> >
>
> Here it is (with some crazy dsk stuff included):
>
> error 1 lba 48
> packet.len: 16
> packet.seg: 8192
> packet.count: 16
> packet.lba: 47
> packet.off: 0
> dsk->drive: 4294967295
> dsk->slice: 4294967295
> dsk->type: 4294967295
> dsk->part: 4294967295
> dsk->unit: 4294967295
> dsk->init: 4294967295
> dsk->start: 4294967295

These are all -1 now which looks wrong.
The raw LBA being 47 instead of 48 would seem to indicate that that is the case
though.

> error 1 lba 1
> packet.len: 16
> packet.seg: 8704
> packet.count: 1
> packet.lba: 1
> packet.off: 0

Odd that the lba here isn't 0. Can you add some more printfs, maybe to
probe_drive(), to try to narrow down how many times that is being invoked and
for which drive numbers?

--
John Baldwin
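
For reference, the numbers quoted in this thread can be sanity-checked with a few lines of C. This is only a sketch and not part of zfsboot.c: the helper names are made up for illustration, and the 512-byte sector size is an assumption, since the thread does not state it. It shows why a dsk->start of 978673664 corresponds to roughly 466GB, why a field holding -1 prints as 4294967295 with %u, and what the packet.seg values look like as real-mode addresses.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: 512-byte sectors; the thread does not state the sector size. */
#define SECTOR_SIZE 512ULL

/* Byte offset of a starting sector such as dsk->start. */
static uint64_t
sectors_to_bytes(uint32_t sectors)
{
	return ((uint64_t)sectors * SECTOR_SIZE);
}

/* Linear address of a real-mode segment:offset pair such as packet.seg/off. */
static uint32_t
segoff_to_linear(uint16_t seg, uint16_t off)
{
	return (((uint32_t)seg << 4) + off);
}

/* Hypothetical helper: decode the numbers quoted in the thread. */
static void
decode_thread_values(void)
{
	uint64_t bytes = sectors_to_bytes(978673664u);

	/* 978673664 sectors is about 466 GiB into the disk. */
	assert((bytes >> 30) == 466);
	printf("dsk->start 978673664 -> %llu bytes (~%llu GiB)\n",
	    (unsigned long long)bytes, (unsigned long long)(bytes >> 30));

	/* A 32-bit field holding -1 prints as 4294967295 with %u. */
	assert((uint32_t)-1 == 4294967295u);
	printf("(uint32_t)-1 -> %u\n", (unsigned int)(uint32_t)-1);

	/* packet.seg 8192 and 8704 are segments 0x2000 and 0x2200. */
	printf("8192:0 -> 0x%x, 8704:0 -> 0x%x\n",
	    (unsigned int)segoff_to_linear(8192, 0),
	    (unsigned int)segoff_to_linear(8704, 0));
}
```

Calling decode_thread_values() from a small main() on any hosted system prints the decoded values; nothing here depends on the boot environment.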