From: Matt Reimer
To: John Baldwin
Cc: freebsd-fs@freebsd.org
Date: Mon, 23 Nov 2009 14:04:30 -0800
Subject: Re: Current gptzfsboot limitations
In-Reply-To: <200911231018.40815.jhb@freebsd.org>

On Mon, Nov 23, 2009 at 7:18 AM, John Baldwin wrote:
> On Friday 20 November 2009 7:46:54 pm Matt Reimer wrote:
>> I've been analyzing gptzfsboot to see what its limitations are. I
>> think it should now work fine for a healthy pool with any number of
>> disks, with any type of vdev, whether single disk, stripe, mirror,
>> raidz or raidz2.
>>
>> But there are currently several limitations (likely in loader.zfs
>> too), mostly due to the limited amount of memory available (< 640KB)
>> and the simple memory allocators used (a simple malloc() and
>> zfs_alloc_temp()). ...
>>
>> I think I've also hit a stack overflow a couple of times while debugging.
>>
>> I don't know enough about the gptzfsboot/loader.zfs environment to
>> know whether the heap size could be easily enlarged, or whether there
>> is room for a real malloc() with free(). loader(8) seems to use the
>> malloc() in libstand. Can anyone shed some light on the memory
>> limitations and possible solutions?
>>
>> I won't be able to spend much more time on this, but I wanted to pass
>> on what I've learned in case someone else has the time and boot fu to
>> take it the next step.
>
> One issue is that disk transfers need to happen in the lower 1MB due to BIOS
> limitations.  The loader uses a bounce buffer (in biosdisk.c in libi386) to
> make this work ok.  The loader uses memory > 1MB for malloc().  You could
> probably change zfsboot to do that as well if not already.  Just note that
> drvread() has to bounce buffer requests in that case.  The text + data + bss
> + stack is all in the lower 640k and there's not much you can do about that.
> The stack grows down from 640k, and the boot program text + data starts at
> 64k with the bss following.
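
To make the layout described above concrete, here is a rough sketch in C. The two addresses come straight from the quoted text (64k and 640k); the macro and function names are hypothetical and do not appear in the gptzfsboot sources. It only models the arithmetic of "does this write reach into the stack", which is the kind of overlap that could clobber a return address.

```c
#include <assert.h>
#include <stdint.h>

/* Addresses from the description above; all names here are hypothetical. */
#define BOOT_TEXT_BASE 0x10000u /* 64k: text + data start here, bss follows */
#define STACK_TOP      0xA0000u /* 640k: the stack grows down from here */

/* Bytes shared by text, data, bss and stack between 64k and 640k. */
static uint32_t
low_mem_span(void)
{
    return STACK_TOP - BOOT_TEXT_BASE;
}

/*
 * Returns 1 if writing len bytes at dst would touch the region the
 * stack currently occupies, i.e. [sp, STACK_TOP); returns 0 otherwise.
 */
static int
write_hits_stack(uint32_t dst, uint32_t len, uint32_t sp)
{
    uint64_t end;

    assert(sp <= STACK_TOP);
    if (len == 0)
        return 0;
    end = (uint64_t)dst + len; /* one past the last byte written */
    return (end > sp && dst < STACK_TOP);
}
```

A copy into a temporary buffer that sits just below the current stack pointer and runs a little too long would make write_hits_stack() return 1 in this model.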

Ah, the stack growing down from 640k explains a problem I was seeing
where a memcpy() to a temp buf would restart gptzfsboot--it must have
been overwriting the stack.

> Hmm, drvread() might already be bounce buffering
> since boot2 has to do so since it copies the loader up to memory > 1MB as
> well.

Looks like it's already bounce buffering. All the I/O that drvread()
does goes to statically allocated char arrays, and the data is copied
when necessary, e.g. in vdev_read():

    if (drvread(dsk, dmadat->rdbuf, lba, nb))
        return -1;
    memcpy(p, dmadat->rdbuf, nb * DEV_BSIZE);

> You might need to use memory > 2MB for zfsboot's malloc() so that the
> loader can be copied up to 1MB.  It looks like you could patch malloc() in
> zfsboot.c to use 4*1024*1024 as heap_next and maybe 64*1024*1024 as heap_end
> (this assumes all machines that boot ZFS have at least 64MB of RAM, which is
> probably safe).

So are the page tables etc. already configured such that RAM above 1MB
is ready to use in gptzfsboot? (I'm not familiar with the details of
how virtual memory is handled on i386.)

Thanks for your help, John.

Matt
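
For reference, the heap change suggested in the quoted text could look roughly like the sketch below. This is not the malloc() from zfsboot.c: bump_alloc(), HEAP_START and HEAP_LIMIT are hypothetical names, and only the two bounds (4*1024*1024 and 64*1024*1024) and the heap_next/heap_end variable names come from the thread. Addresses are handled as plain integers so the arithmetic can be checked anywhere; whether memory above 1MB is actually usable from gptzfsboot is the open question above and is not settled by this code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bounds taken from the suggestion in the thread; macro names are made up. */
#define HEAP_START (4u * 1024 * 1024)  /* leaves room to copy the loader to 1MB */
#define HEAP_LIMIT (64u * 1024 * 1024) /* assumes at least 64MB of RAM */

static uintptr_t heap_next = HEAP_START;
static const uintptr_t heap_end = HEAP_LIMIT;

/*
 * Simple bump allocator with no free(): hand out the next n bytes, or
 * return 0 if the request does not fit below heap_end.
 */
static uintptr_t
bump_alloc(size_t n)
{
    uintptr_t p;

    assert(heap_next <= heap_end);
    if (n > heap_end - heap_next)
        return 0; /* out of heap; heap_next is left unchanged */
    p = heap_next;
    heap_next += n;
    return p;
}
```

Comparing n against the remaining space, rather than testing heap_next + n against heap_end, avoids an overflow on very large requests. Because nothing is ever freed, the heap size still bounds the total amount of data read during boot.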