Date:      Tue, 27 Jan 2009 12:41:20 -0600
From:      "Paul Tice" <ptice@aldridge.com>
To:        "Terry Kennedy" <terry@tmk.com>, <freebsd-current@freebsd.org>
Subject:   RE: Help me select hardware....Some real world data that might help
Message-ID:  <E8FEAE26C87DED4EB49EFF99D1C7A51DFF692D@ald-mail02.corporate.aldridge.com>
References:  <01N4NEOEB7LY00EQWX@tmk.com>

Excuse my rambling, perhaps something in this mess will be useful.

I'm currently using 8 cores (2x Xeon E5405), 16GB FB-DIMM, and 8 x 750GB
drives on a backup system (I plan to add the other drives in the chassis
one by one, testing the speed along the way).
8-CURRENT amd64, ZFS, a Marvell 88SX6081 PCI-X card (8-port SATA) plus an
LSI 1068E (8-port SAS/SATA) for the main array, and the Intel onboard
SATA for the boot drive(s).
Data is sucked down through 3 gigabit ports, with another available but
not yet activated.
Array drives all live on the LSI right now. Drives are <ATA ST3750640AS K>.

ZFS is stable _IF_ you disable the prefetch and ZIL; otherwise the
classic ZFS wedge rears its ugly head. I haven't had a chance to test
disabling just one of them yet, but I'd guess the prefetch is the quick
killer. Even with prefetching and the ZIL disabled, my current
bottleneck is the GigE. I'm waiting to get new switches in that support
jumbo frames; quick and dirty testing shows almost a 2x increase in
throughput and a ~40% drop in interrupt rates from the NICs compared to
the current standard (1500 MTU) frames.
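
For anyone wanting to try the same thing, these are the loader tunables
I mean. I believe these are the right names on 8-CURRENT, but check
'sysctl vfs.zfs' on your own box first:

    # /boot/loader.conf -- disable ZFS prefetch and the ZIL
    # (tunable names are my understanding; verify with 'sysctl vfs.zfs')
    vfs.zfs.prefetch_disable=1
    vfs.zfs.zil_disable=1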

Pool was created with 'zpool create backup raidz da0 da1 da2 da3 da4 da5 da6 da7'

I've seen references to 8-CURRENT having a kernel memory limit of 8G
(compared to 2G pre-8, from what I understand so far), and ZFS ARC
(caching) is done in kernel memory space. (Please feel free to correct
me if I'm wrong on any of this!)
With default ZFS settings (nothing disabled), a 1536M kernel memory
limit, and a 512M ARC limit, I saw 2085 ARC memory throttles before the
box wedged.
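
For reference, that limit combination came from /boot/loader.conf
settings along these lines (values are what I was running; adjust to
taste):

    # /boot/loader.conf -- cap kernel memory and the ZFS ARC
    # (tunable names from memory; check 'sysctl -a | grep -e kmem -e arc_max'
    #  before relying on this)
    vm.kmem_size="1536M"
    vm.kmem_size_max="1536M"
    vfs.zfs.arc_max="512M"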

Using rsync over several machines with this setup, I'm getting a little
over 1GB/min to the disks.
'zpool iostat 60' is a wonderful tool.
I would mention something I've noticed that doesn't seem to be
documented: the first reading from 'zpool iostat' (whether a single run
or with an interval) is a running average, although I haven't figured
out the time period being averaged yet (from pool mount time, maybe?).

The jumbo frame interrupt reduction may be important. I run
'netstat -i -w60' right beside 'zpool iostat 60', and the two
throughputs are closely inversely related: I can predict a disk write
(writes seem to be bursty in ZFS) by the throughput dropping on the NIC
side. The drop is up to 75%, averaging around 50%. Using a 5-second
interval instead of 60, I see disk write throughput spike up to 90MB/s,
although a pattern like 55, 0, 0, 0, 55 is more common.
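
In other words, the whole monitoring setup is just these two running
side by side in separate terminals:

    # per-interface packet/byte counts, 60-second intervals
    netstat -i -w60
    # pool throughput, 60-second intervals (use 5 to see the write bursts)
    zpool iostat 60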

Possibly, binding interrupts to particular CPUs might help a bit too. I
haven't found, and don't feel competent to write, userspace tools to do
this.
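
(If a recent 8-CURRENT cpuset(1) can operate on interrupts -- I believe
it has a -x flag for that, but I haven't tried it myself -- something
like the following sketch might do the job; the IRQ and CPU numbers
below are made up.)

    # find the IRQ numbers the NICs are using
    vmstat -i
    # bind (hypothetical) IRQ 256 to CPU 2 -- assumes cpuset(1) supports -x
    cpuset -l 2 -x 256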

CPU usage during all this is surprisingly low. rsync is running with -z,
the files themselves are compressed as they go onto the drives with
pbzip2, and the whole thing runs on (ducking) BackupPC, which is all
Perl script.
With all that, 16 machines backing up, and 1+GB/min going to the
platters, CPU is still averaging 40% idle according to top. I'm
considering remaking the array as raidz2; I seem to have enough CPU to
handle it.
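
Since there's no way to convert a raidz to raidz2 in place, remaking it
means dumping the data elsewhere and rebuilding the pool, roughly like
this (sketch only -- obviously everything on the pool goes away at the
destroy step):

    # after copying everything off somewhere safe:
    zpool destroy backup
    zpool create backup raidz2 da0 da1 da2 da3 da4 da5 da6 da7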

Random ZFS thoughts:
You cannot shrink/grow a raidz or raidz2. You can grow a stripe array;
I don't know if you can shrink one successfully.
You cannot promote a stripe array to raidz/z2, nor demote in the other
direction.
You can have hot spares; I haven't seen a provision for warm/cold spares.
/etc/defaults/periodic.conf already has cron-driven ZFS status/scrub
checks, but they are not enabled by default.
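
Enabling them is just a matter of overriding the knobs in
/etc/periodic.conf. The status check is daily_status_zfs_enable; grep
for zfs in the defaults file to see what else is there, since I'm going
from memory on the names:

    # /etc/periodic.conf -- turn on the periodic ZFS status check
    # (knob name from memory; 'grep -i zfs /etc/defaults/periodic.conf' to confirm)
    daily_status_zfs_enable="YES"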

Anyway, enough rambling, just thought I'd use something not too
incredibly far from your suggested system to toss some data out.

Thanks
Paul





-----Original Message-----
From: owner-freebsd-current@freebsd.org on behalf of Terry Kennedy
Sent: Fri 1/23/2009 8:30 PM
To: freebsd-current@freebsd.org
Subject: Help me select hardware and software options for very large server

  [I posted the following message to freebsd-questions, as I thought it
would be the most appropriate list. As it has received no replies in two
weeks, I'm trying freebsd-current.]

--------

  [I decided to ask this question here as it overlaps -hardware, -current,
and a couple other lists. I'd be glad to redirect the conversation to a
list that's a better fit, if anyone would care to suggest one.]

  I'm in the process of planning the hardware and software for the second
generation of my RAIDzilla file servers (see http://www.tmk.com/raidzilla
for the current generation, in production for 4+ years).

  I expect that what I'm planning is probably "off the scale" in terms of
processing and storage capacity, and I'd like to find out and address any
issues before spending lots of money. Here's what I'm thinking of:

o Chassis - CI Design SR316 (same model as current chassis, except i2c link
  between RAID controller and front panel)
o Motherboard - Intel S5000PSLSATAR
o CPU - 2x Intel Xeon E5450 BX80574E5450P
o Remote management - Intel Remote Management Module 2 - AXXRM2
o Memory - 16GB - 8x Kingston KVR667D2D4F5/2GI
o RAID controller - 3Ware 9650SE-16ML w/ BBU-MODULE-04
o Drives - 16x 2TB drives [not mentioning manufacturer yet]
o Cables - 4x multi-lane SATA cables
o DVD-ROM drive
o Auxiliary slot fan next to BBU card
o Adaptec AHA-39160 (for Quantum Superloader 3 tape drive)

  So much for the hardware. On the software front:

o FreeBSD 8.x?
o amd64 architecture
o MBR+UFS2 for operating system partitions (hard partition in controller)
o GPT+ZFS for data partitions
o Multiple 8TB data partitions (separate 8TB controller partitions or one
  big partition divided with GPT?)

  I looked at "Large data storage in FreeBSD", but that seems to be a stale
page from 2005 or so: http://www.freebsd.org/projects/bigdisk/index.html

  I'm pretty sure I need ZFS, since even with the 2TB partitions I have now,
taking snapshots for dump or doing a fsck takes approximately forever 8-)
I'll be using the hardware RAID 6 on the 3Ware controller, so I'd only be
using ZFS to get filesystems larger than 2TB.
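
(Roughly speaking, I'd expect the layout to look something like this,
with the 3Ware exporting a couple of big RAID 6 units and ZFS pooling
them; the device and dataset names below are placeholders, not the real
configuration:)

    # hypothetical: two hardware RAID 6 units from the 3Ware appear as da1/da2;
    # pool them together so ZFS filesystems can span both units
    zpool create data da1 da2
    zfs create data/projects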

  I've been following the ZFS discussions on -current and -stable, and I
think that while it isn't quite ready yet, it probably will be ready in
a few months, being available around the same time I get this hardware
assembled. I recall reading that there will be an import of newer ZFS
code in the near future.

  Similarly, the ports collection seems to be moving along nicely with
amd64 support.

  I think this system may have the most storage ever configured on a
FreeBSD system, and it is probably up near the top in terms of CPU and
memory. Once I have it assembled I'd be glad to let any FreeBSD devel-
opers test and stress it if that would help improve FreeBSD on that
type of configuration.

In the meantime, any suggestions regarding the hardware or software con-
figuration would be welcomed.

        Terry Kennedy             http://www.tmk.com
        terry@tmk.com             New York, NY USA



