Date:      Wed, 17 Feb 1999 12:07:01 -0500 (EST)
From:      Simon Shapiro <shimon@simon-shapiro.org>
To:        Greg Lehey <grog@lemis.com>
Cc:        "Bryn Wm. Moslow" <bryn@spacemonster.org>, freebsd-isp@FreeBSD.ORG
Subject:   Re: DPT 3334UW RAID-5 Slowness / Weird FS problems
Message-ID:  <XFMail.990217120701.shimon@simon-shapiro.org>
In-Reply-To: <19990216105959.P2207@lemis.com>


Greg Lehey, On 16-Feb-99 you wrote:
>  On Monday, 15 February 1999 at 13:08:22 -0800, Bryn Wm. Moslow wrote:
> > I recently installed a DPT 3334UW with 64MB cache in a mail server
> > running RAID-5 with a 32K stripe on an external case on which the user
> > mail spool is mounted. The array is comprised of 6 Seagate Ultra Wide
> > 4.5GB SCA drives. The system is a P2 300 with 384MB and uses an
> > additional Seagate UW drive for boot, /usr, /var, swap, and staff home
> > directories. It doesn't go into swap often but if it does it only hits
> > about 5 to 10 MB. The system is running FreeBSD 2.2.8.
>  
>  I don't know the DPT controllers, but 32 kB stripes are far too small.
>  For better performance, you should increase them to between 256 kB and
>  512 kB.  Small stripe sizes create many more I/O requests at the drive
>  level.

It depends highly on the amount of cache memory on the card and on the
nature of the access.  If the I/O is sequential in nature, then Greg is
correct.  If it is highly fragmented and random (as in most DBMS
workloads), then Greg is correct only if the card has a lot of cache
memory and access is to large blocks.
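
To see why the stripe size matters, here is a minimal sketch (my own
illustration, not DPT firmware) of how a striped array maps one logical
request onto member disks; plain RAID-0 arithmetic, ignoring RAID-5
parity rotation:

#include <stdio.h>

#define NDISKS      6
#define STRIPE_SIZE (32 * 1024)         /* 32k, as configured above */

/* Show the per-drive requests one logical transfer breaks into. */
static void
map_request(long offset, long length)
{
    long end = offset + length;
    int nreq = 0;

    while (offset < end) {
        long room = STRIPE_SIZE - (offset % STRIPE_SIZE);
        long n = (end - offset < room) ? end - offset : room;
        int disk = (int)((offset / STRIPE_SIZE) % NDISKS);

        printf("disk %d: %ld bytes\n", disk, n);
        offset += n;
        nreq++;
    }
    printf("=> %d drive requests\n", nreq);
}

int
main(void)
{
    map_request(0, 256 * 1024);     /* 256k sequential: 8 drive requests */
    map_request(5 * 32768, 4096);   /* random 4k read: 1 drive request */
    return (0);
}

With a 256k stripe the same 256k sequential read is a single request to a
single drive; with a 32k stripe it is eight.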

If you are running RAID-5, then 5-10MB/Sec is not so horrible;  every
small WRITE operation is really a read-modify-write: the old data and the
old parity are read, the new XOR parity is computed, and both the new
data and the new parity are written back (roughly two READs and two
WRITEs).
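
The parity arithmetic itself is plain XOR.  A minimal sketch of that
small-write path (mine, not the DPT firmware):

#include <stddef.h>
#include <string.h>

/*
 * RAID-5 small write, read-modify-write style:
 * new parity = old parity XOR old data XOR new data.
 * "data" and "parity" stand in for blocks already read from disk;
 * both are then written back, hence the extra I/O per WRITE.
 */
static void
raid5_small_write(unsigned char *data, unsigned char *parity,
    const unsigned char *newdata, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++)
        parity[i] ^= data[i] ^ newdata[i];
    memcpy(data, newdata, len);
}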

The i2o cards I am working on promise to be much, much better at this
(hardware-based XOR, etc.).

> > My first problem was that I initially tried to do 16K per inode to
> > speed things up a little bit (also, I didn't need the millions of
> > inodes that came with a default newfs on 43GB =).)
>  
>  I might be missing something, but I don't see any performance
>  improvement by changing the number of inodes.

Perhaps an attempt to influence the caching on the kernel side...
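
(For scale, and assuming I remember the 2.2.8 defaults right: newfs
gives one inode per 4096 bytes of data space, so 43GB yields roughly
10.7 million inodes; -i 16384 would cut that to about 2.7 million.)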

> > However when trying to mount, fsck, find, umount, or otherwise
> > manipulate the filesystem about 1 times out of 5 the system would
> > hang to the point that I had to use the reset switch to get it back.

What size filesystems?  What kernel version?

>  What hung?  The FreeBSD system or the DPT subsystem?

If the DPT has its 10 LEDs scrolling back and forth, it is simply idle.
If it wedges, please tell me which LEDs glow or blink (LED #1 is the one
closest to the bracket).

> > If I format with the default settings to newfs, everything works
> > fine. The same hang also occurs if I try to do 16K block size. I
> > haven't tried anything bigger as I had an extremely tight window in
> > which to get the machine online.
>  
>  You should consider that once you have set the stripe size, you're
>  stuck with it.  Unless the DPTs have a good reason (like "not
>  supported"), take a 256 kB stripe size.

a.  Newfs parameters have nothing to do with the SCSI HBA (DPT or
    otherwise).  If a 16k block size hangs, it is a bug elsewhere.
    I studied such a complaint long, long ago (on FreeBSD 3.0) and the
    requests never even hit the DPT.
    But, as Greg said, this has almost no bearing on performance...

b.  The DPT firmware will take up to 1MB stripes.  Beware: there is no
    more than 64MB of cache on the card (if that much), and very large
    stripes may consume all of it quickly and inefficiently.
    Another thing: when you are in DPTMGR, make sure the firmware does
    not allocate 30% of the cache to READ-AHEAD.  I doubt you need that
    much memory for this workload (I set it to zero).
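    Quick arithmetic: with six drives and a 1MB stripe, one full stripe
    is 5MB of data plus 1MB of parity, so a 64MB cache holds only about
    ten full stripes before it starts recycling.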

> > Does FreeBSD have a problem with non-default fs settings? 
>  
>  Not that I know of.

There used to be a problem with filesystems much larger than 4GB (they
would panic the kernel), but I think it is gone.

> > Has anyone else tried this sort of thing on such a large filesystem?

My largest routine filesystem is 28GB (RAID-5).  I created a 56GB RAID-0
for some testing.  Normally I avoid anything much larger than 4GB (I am
paranoid :-)

>  I know of some people who have made file systems of this size with
>  vinum.  It took a while to create the file systems, but it worked.
>  
> > I ended up having to use the default settings for newfs to get the
> > system to work, wasting millions of inodes and bringing me to my
> > next problem: Under load the filesystem is horribly slow. I expected
> > some of this with the RAID-5 overhead but it's actually slower than
> > a CCD I just moved from that was using 5 regular 2GB fast SCSI-2
> > drives, much slower.

I am working on a filesystem especially tuned for such operations.  Early
tests show almost no difference between one file and 16 million files in
directory operations (I am getting about 47,000 random OPENs per second
with 16 million files).  Access to huge files is identical (hint: no
indirects).
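
(For reference: in FFS only the first 12 blocks of a file are reached
directly from the inode; past that, 96k with 8k blocks, lookups go
through single, double, and eventually triple indirect blocks, each a
potential extra disk read.  Avoiding those is where the win on huge
files comes from.)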

>  What read to write ratio do you have?  Writes are slow on RAID-5, but
>  reads should be the same as for a striped organization.

Yup. I am getting about 8-9MB/Sec WRITE and up to 28MB/Sec READ.
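
That ratio is about what the read-modify-write arithmetic predicts: if
every small WRITE costs roughly four disk operations (two READs, two
WRITEs), 28MB/Sec of raw throughput degrades to about 7MB/Sec of useful
writes.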

> > When running ktrace on the processes (qpopper and mail.local mainly)
> > and
> > watching top I can see that most of the processes are waiting for disk
> > access. I've tried enabling/disabling various DPT options in the kernel
> > but it's all about the same. I'd really like to stick with RAID-5 so
> > using 0 or 1 just isn't what I'm looking for.

Most of the DPT parameters are (today, in CAM) either meaningless or
debugging-related.  Another thing to consider is PCI bandwidth.  The best
motherboards I have seen are good for no more than 100MB/Sec, and about
5,000 Interrupts/Sec.  If your system consumes that much, you are maxed
out.  Memory bandwidth is second.  Most PC-based machines are severely
I/O limited.  Some are memory limited.  In a server context, being
CPU-limited is the rarity.
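
(The 100MB/Sec figure is easy to derive: 32-bit, 33MHz PCI moves 4 bytes
x 33M = 132MB/Sec in theory, and about 100MB/Sec is what survives bus
arbitration and transaction setup in practice.)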

> >
> > The user directories for delivery are broken out into 1st letter, 1st
> > two letters, username (i.e.: /home/u/us/username) to speed up dir
> > lookups already.
>  
>  I'd guess that these would end up in cache anyway, so you shouldn't
>  see much improvement with this technique.
>  
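
For reference, that layout is a two-level hash on the first characters
of the username; a hypothetical helper (mine, not qpopper's actual code):

#include <stdio.h>

/* Build /home/u/us/username; assumes username has at least 2 chars. */
static void
spool_path(char *buf, size_t buflen, const char *user)
{
    snprintf(buf, buflen, "/home/%c/%c%c/%s",
        user[0], user[0], user[1], user);
}
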
> > Any suggestions on how else to speed things up? This wasn't a
> > problem on my old CCD, however.
>  
> > Lastly, I tried to find another RAID controller besides DPT that was
> > compatible with FreeBSD 2.2.x with no luck. Upgrading to 3.1 is not
> > an option at the moment, at least until things are more stable.
>  
>  3.1-RELEASE has just come out (or is in the process of being
>  packaged).  But I don't think this is the problem.
>  
> > Is anyone using anything in a host-based adapter (PCI) that is
> > non-DPT?

If you have memory and CPU cycles to burn (say, on an Alpha), then
host-based solutions may be better; the PCI limit is much less severe in
that case.

>  There's a Compaq driver out there.  It seems to have some
>  strangenesses which suggest that it'll need a lot of work before it
>  can be incorporated into the source tree.
>  
> > The only reason I ask is that I've seen debate recently about
> > whether there is a problem with the DPT losing interrupts

I have not seen this debate, and in my experience the DPT does not lose
interrupts.  If it did, the CAM driver would cause I/O operations to
simply hang, or perhaps time out.

>  This wouldn't be your problem.
>  
> > or the FreeBSD serial code is "broken".
>  
>  I'm not sure what relation this has with the DPT controller.
>  
>  I'm copying Shimon Shapiro on this reply.  He's the author of the DPT
>  driver, and he may have more insight.

I am currently working on re-certifying the driver to my standards, porting
it to the Alpha, and building the DPTMGR utilities on FreeBSD.  Also in the
works are the i2o cards (including the FCAL).

>  
>  Greg
>  --
>  When replying to this message, please copy the original recipients.
>  For more information, see http://www.lemis.com/questions.html
>  See complete headers for address, home page and phone numbers
>  finger grog@lemis.com for PGP public key



Sincerely Yours,                 Shimon@Simon-Shapiro.ORG
                                             770.265.7340
Simon Shapiro

Unwritten code has no bugs and executes at twice the speed of mouth


