Date:      Mon, 23 Nov 2009 14:04:30 -0800
From:      Matt Reimer <mattjreimer@gmail.com>
To:        John Baldwin <jhb@freebsd.org>
Cc:        freebsd-fs@freebsd.org
Subject:   Re: Current gptzfsboot limitations
Message-ID:  <f383264b0911231404i737a5cf2q384368827ab48a4d@mail.gmail.com>
In-Reply-To: <200911231018.40815.jhb@freebsd.org>
References:  <f383264b0911201646s702c8aa4u5e50a71f93a9e4eb@mail.gmail.com> <200911231018.40815.jhb@freebsd.org>

On Mon, Nov 23, 2009 at 7:18 AM, John Baldwin <jhb@freebsd.org> wrote:
> On Friday 20 November 2009 7:46:54 pm Matt Reimer wrote:
>> I've been analyzing gptzfsboot to see what its limitations are. I
>> think it should now work fine for a healthy pool with any number of
>> disks, with any type of vdev, whether single disk, stripe, mirror,
>> raidz or raidz2.
>>
>> But there are currently several limitations (likely in loader.zfs
>> too), mostly due to the limited amount of memory available (< 640KB)
>> and the rudimentary memory allocators used (a malloc() with no
>> free(), and zfs_alloc_temp()).
...
>>
>> I think I've also hit a stack overflow a couple of times while debugging.
>>
>> I don't know enough about the gptzfsboot/loader.zfs environment to
>> know whether the heap size could be easily enlarged, or whether there
>> is room for a real malloc() with free(). loader(8) seems to use the
>> malloc() in libstand. Can anyone shed some light on the memory
>> limitations and possible solutions?
>>
>> I won't be able to spend much more time on this, but I wanted to pass
>> on what I've learned in case someone else has the time and boot fu to
>> take it the next step.
>
> One issue is that disk transfers need to happen in the lower 1MB due to BIOS
> limitations.  The loader uses a bounce buffer (in biosdisk.c in libi386) to
> make this work ok.  The loader uses memory > 1MB for malloc().  You could
> probably change zfsboot to do that as well if not already.  Just note that
> drvread() has to bounce buffer requests in that case.  The text + data + bss
> + stack is all in the lower 640k and there's not much you can do about that.
> The stack grows down from 640k, and the boot program text + data starts at
> 64k with the bss following.

Ah, the stack growing down from 640k explains a problem I was seeing
where a memcpy() to a temp buf would restart gptzfsboot--it must have
been overwriting the stack.
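
For illustration, here's a hypothetical picture of what I think was
happening (not actual zfsboot code; the buffer name and size are made up):

        /* text + data load at 64k with the bss following, and the
         * stack grows down from 640k, so a big bss buffer can end
         * just below the stack. */
        static char tempbuf[512 * 1024];

        /* If tempbuf + len reaches up into the region the stack
         * currently occupies, this memcpy() clobbers saved return
         * addresses, and the next ret restarts the boot program. */
        memcpy(tempbuf, src, len);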

> Hmm, drvread() might already be bounce buffering, since boot2 has to do
> so anyway because it copies the loader up to memory > 1MB.

Looks like it's already bounce buffering. All of drvread()'s I/O goes to
statically allocated char arrays, and the data is copied out when
necessary, e.g. in vdev_read():

                if (drvread(dsk, dmadat->rdbuf, lba, nb))
                        return -1;
                memcpy(p, dmadat->rdbuf, nb * DEV_BSIZE);
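
For context, the surrounding loop in vdev_read() is roughly this (a
simplified sketch from memory, so details may differ; READ_BUF_SIZE and
the dmadat layout are zfsboot.c's):

        static int
        vdev_read(vdev_t *vdev, void *priv, off_t off, void *buf,
            size_t bytes)
        {
                char *p = buf;
                struct dsk *dsk = priv;
                daddr_t lba = off / DEV_BSIZE;
                unsigned int nb;

                while (bytes > 0) {
                        /* Clamp each transfer to the size of the
                         * low-memory bounce buffer. */
                        nb = bytes / DEV_BSIZE;
                        if (nb > READ_BUF_SIZE / DEV_BSIZE)
                                nb = READ_BUF_SIZE / DEV_BSIZE;
                        /* The BIOS reads into dmadat->rdbuf, which
                         * lives below 1MB... */
                        if (drvread(dsk, dmadat->rdbuf, lba, nb))
                                return -1;
                        /* ...and the data is then copied out to the
                         * caller's buffer, which can be anywhere. */
                        memcpy(p, dmadat->rdbuf, nb * DEV_BSIZE);
                        p += nb * DEV_BSIZE;
                        lba += nb;
                        bytes -= nb * DEV_BSIZE;
                }
                return 0;
        }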


> You might need to use memory > 2MB for zfsboot's malloc() so that the
> loader can be copied up to 1MB.  It looks like you could patch malloc() in
> zfsboot.c to use 4*1024*1024 as heap_next and maybe 64*1024*1024 as heap_end
> (this assumes all machines that boot ZFS have at least 64MB of RAM, which is
> probably safe).
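
If I'm reading that right, the patch would amount to something like this
(untested sketch; heap_next/heap_end are the variables you mention, but
I'm guessing at the allocator's exact shape):

        /* Untested: point zfsboot's bump allocator at memory above
         * 2MB, leaving room for the loader to be copied to 1MB. */
        static char *heap_next = (char *)(4 * 1024 * 1024);   /* 4MB */
        static char *heap_end  = (char *)(64 * 1024 * 1024);  /* 64MB */

        static void *
        malloc(size_t n)
        {
                char *p = heap_next;

                if (p + n > heap_end)
                        return NULL;    /* no free(), so this is fatal */
                heap_next += n;
                return p;
        }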

So are the page tables etc. already configured such that RAM above 1MB
is ready to use in gptzfsboot? (I'm not familiar with the details of
how virtual memory is handled on i386.)

Thanks for your help, John.

Matt


