Date:      Tue, 22 Jul 2008 01:03:27 -0400
From:      John Nielsen <lists@jnielsen.net>
To:        freebsd-questions@freebsd.org
Cc:        Steven Schlansker <stevenschlansker@berkeley.edu>
Subject:   Re: Using ccd with zfs
Message-ID:  <200807220103.27950.lists@jnielsen.net>
In-Reply-To: <DD98FA1F-04E2-40C3-BF97-7F80ACBB1006@berkeley.edu>
References:  <DD98FA1F-04E2-40C3-BF97-7F80ACBB1006@berkeley.edu>

On Tuesday 22 July 2008 12:18:31 am Steven Schlansker wrote:
> Hello -questions,
> I have a FreeBSD ZFS storage system working wonderfully with 7.0.
> It's set up as three 3-disk RAIDZs: triplets of 500, 400, and 300GB
> drives.
>
> I recently purchased three 750GB drives and would like to convert to
> using a RAIDZ2.  As ZFS has no restriping capabilities yet, I will
> have to nuke the zpool from orbit and make a new one.  I would like to
> verify my methodology against your experience to see if what I wish to
> do is reasonable:
>
> I plan to first take 2 of the 750GB drives and make an unreplicated
> 1.5TB zpool as a temporary storage.  Since ZFS doesn't seem to have
> the ability to create zpools in degraded mode (with missing drives) I
> plan to use iSCSI to create two additional drives (backed by /dev/
> zero) to fake having two extra drives, relying on ZFS's RAIDZ2
> protection to keep everything running despite the fact that two of the
> drives are horribly broken ;)
>
> To make these 500, 400, and 300GB drives useful, I would like to
> stitch them together using ccd.  I would use it as 500+300 = 800GB and
> 400+400=800GB
>
> That way, in the end I would have
> 750 x 3
> 500 + 300 x 3
> 400 + 400 x 1
> 400 + 200 + 200 x 1
> as the members in my RAIDZ2 group.  I understand that this is slightly
> less reliable than having "real" drives for all the members, but I am
> not interested in purchasing 5 more 750GB drives.  I'll replace the
> drives as they fail.
>
> I am wondering if there are any logistical problems.  The three parts
> I am worried about are:
>
> 1) Are there any problems with using an iSCSI /dev/zero drive to fake
> drives for creation of a new zpool, with the intent to replace them
> later with proper drives?

I don't know about the iSCSI approach, but I have successfully created a 
degraded zpool using md and a sparse file in place of the missing disk. 
It worked like a charm: I was able to transfer everything to the zpool 
before nuking the real device (which I had been using for temporary 
storage) and replacing the md-backed sparse file with it.

You can create a sparse file using dd, where SECTORS is the size of the 
fake device in 512-byte sectors:
	dd if=/dev/zero of=sparsefile bs=512 count=0 seek=SECTORS

Turn it into a device node using mdconfig:
	mdconfig -a -t vnode -f sparsefile

Then create your zpool using the /dev/md0 device (unless the mdconfig 
operation returns a different node number).
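
As a rough sketch (the pool name and disk names below are made up, and 
you'd list however many real disks you actually have), the create would 
look something like:
	# md0 stands in for the disk that isn't available yet
	zpool create tank raidz2 da0 da1 da2 da3 md0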

The size of the sparse file should be no bigger than the size of the real 
device you plan to replace it with. If using GEOM (which I think you 
should, see below), remember to subtract 512 bytes for each level of each 
provider: GEOM modules store their metadata in the last sector of each 
provider, so that space is unavailable for use. To be on the safe side 
you can whack a few KB off.
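
For instance, if the placeholder is standing in for a nominal 750GB 
(750,000,000,000 byte) drive, one conservative sizing (the numbers are 
only an illustration) would be:
	# 750,000,000,000 / 512 = 1,464,843,750 sectors; drop 32 sectors (16KB)
	dd if=/dev/zero of=sparsefile bs=512 count=0 seek=1464843718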

You can't remove the fake device from a running zpool, but the first time 
you reboot it will be absent and the zpool will come up degraded.
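
When the last real drive does become available, you'd swap it in with 
zpool replace (names hypothetical again) and let it resilver:
	# replace the placeholder with the real disk; ZFS resilvers onto it
	zpool replace tank md0 da4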

> 2) Are there any problems with using CCD under zpool?  Should I stripe
> or concatenate?  Will the startup scripts (either by design or less
> likely intelligently) decide to start CCD before zfs?  The zpool
> should start without me interfering, correct?

I would suggest using gconcat rather than CCD. Since it's a GEOM module (and 
you will have remembered to load it via /boot/loader.conf) it will 
initialize its devices before ZFS starts. It's also much easier to set up 
than CCD. If you are concatenating two devices of the same size you could 
consider using gstripe instead, but think about the topology of your drives 
and controllers and the likely usage patterns your final setup will create 
to decide if that's a good idea.
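
A minimal gconcat setup might look like this (labels and device names are 
made up; gstripe works the same way with geom_stripe_load):
	# glue a 500GB and a 300GB disk into one ~800GB provider, /dev/concat/pair0
	gconcat label -v pair0 da5 da6
	# make sure the module is loaded before ZFS starts at boot
	echo 'geom_concat_load="YES"' >> /boot/loader.conf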

> 3) I hear a lot about how you should use whole disks so ZFS can enable
> write caching for improved performance.  Do I need to do anything
> special to let the system know that it's OK to enable the write
> cache?  And persist across reboots?

Not that I know of. As I understand it, ZFS _assumes_ it's working with 
whole disks; since it uses its own I/O scheduler, performance can be 
degraded for anything sharing a physical device with a ZFS slice.

> Any other potential pitfalls?  Also, I'd like to confirm that there's
> no way to do this pure ZFS-like - I read the documentation but it
> doesn't seem to have support for nesting vdevs (which would let me do
> this without ccd)

You're right, you can't do this with ZFS alone. Good thing FreeBSD is so 
versatile. :)
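
To sketch what the final create might look like (all names hypothetical, 
with md0 and md1 standing in for the two 750s still holding the temporary 
pool): the concat providers and md placeholders are just ordinary GEOM 
providers, so they go into a flat RAIDZ2 member list alongside the real 
disk, no nesting required:
	zpool create tank raidz2 ad4 md0 md1 concat/pair0 concat/pair1 \
		concat/pair2 concat/pair3 concat/pair4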

JN

> Thanks for any information that you might be able to provide,
> Steven Schlansker
> _______________________________________________
> freebsd-questions@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-questions
> To unsubscribe, send any mail to
> "freebsd-questions-unsubscribe@freebsd.org"


