Date:      Sun, 11 Apr 1999 18:49:47 -0600 (MDT)
From:      "Kenneth D. Merry" <ken@plutotech.com>
To:        gallatin@cs.duke.edu (Andrew Gallatin)
Cc:        freebsd-scsi@FreeBSD.ORG
Subject:   Re: odd performance 'bug' & other questions
Message-ID:  <199904120049.SAA01682@panzer.plutotech.com>
In-Reply-To: <14097.8430.806061.277769@grasshopper.cs.duke.edu> from Andrew Gallatin at "Apr 11, 1999  7:14:25 pm"

Andrew Gallatin wrote...
> 
> We're setting up a few large servers, each with 6 9GB seagate medalist 
> pro drives spread across 2 scsi controllers (aic7890 & ncr53c875).
> 
> We've noticed that if we set up the disks using a simple ccd stripe,
> after trying various interleaves, the best read bandwidth we can get
> is only ~35-40MB/sec (using dd if=/dev/rccd0 of=/dev/null bs=64k),
> which is odd because we'd thought we should be getting at least
> 55-60MB/sec, as we get about 13.5MB/sec from each drive with the same
> test.
> 
> Upon closer examination, we discovered that on some of the drives the
> performance wanders all over the place -- if you do a dd if=/dev/rX
> of=/dev/null bs=64k on an individual disk on an otherwise idle system
> & watch with iostat or systat, you can see the bandwidth jump around
> quite a bit.  I'm thinking that my performance problems might be due
> to the fact that the reads aren't really sequential, rather the disk
> arm is moving all over the place to read remapped defective blocks.
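[For anyone trying to reproduce this, the test described above boils down to
something like the following sketch.  Device names, partition letters, and
the interleave are examples, not recommendations.]

```shell
# Build a simple ccd stripe across the six drives.  The interleave here is
# 128 sectors (64k); flags are 0 (plain stripe, no mirroring).
ccdconfig ccd0 128 0 /dev/da0s1e /dev/da1s1e /dev/da2s1e \
    /dev/da3s1e /dev/da4s1e /dev/da5s1e

# Sequential read from the raw striped device.
dd if=/dev/rccd0 of=/dev/null bs=64k

# Per-disk baseline on an otherwise idle system; watch iostat in another
# window to see whether the per-drive bandwidth jumps around.
dd if=/dev/rda0 of=/dev/null bs=64k &
iostat -w 1
```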

There are a couple of things going on here that may affect the performance
numbers you're seeing:

 - There was a performance problem in getnewbuf() that was supposedly fixed
   on April 6th.  I haven't tested -current since then, so I don't know for
   sure whether the problem is really fixed.

 - The Medalist Pro drives are known to be rather crappy.  That's why we've
   got the number of tags set to 2 in the quirk entry for those drives.

> Using camcontrol to look at the defects list on some of these drives,
> I see that it's HUGE.  I've seen one disk with over 1100 entries in the
> primary defects list.  Should I be alarmed at the size of the defects
> list?  Should I complain to my vendor, or is this typical?

Well, it varies.  I've got four disks on a heavily used server:

<SEAGATE ST19171N 0023>            at scbus0 target 0 lun 0 (pass0,da0)
<SEAGATE ST19171N 0023>            at scbus0 target 1 lun 0 (pass1,da1)
<IBM DGHS18Z 03E0>                 at scbus0 target 3 lun 0 (pass2,da2)
<SEAGATE ST19171N 0024>            at scbus1 target 0 lun 0 (pass3,da3)

Here are the defect numbers, in order:

Got 464 defects:
Got 144 defects:
Got 1145 defects:
Got 579 defects:

I've also got the following disk on my home machine:

<IBM DGVS09U 03B0>                 at scbus1 target 1 lun 0 (pass1,da1)

And it has 660 defects in the permanent list, and none in the grown defect
list.  It is a 9G drive, and still gives pretty good performance
(14-16MB/sec).  So your numbers are a bit high for a 9G drive, but I'm not
sure whether that would be considered excessive.  Of course the drives I've
got above are higher-end Seagate and IBM disks, not low-end models.  And
you'd expect the number of defects to be somewhat proportional to the
capacity of the drive.  Your numbers are closer to the 18G IBM disk above.

Your performance numbers sound a lot like the numbers I saw with the
performance-impaired version of Matt's getnewbuf() changes.
For instance, I've got a machine with 3 2G Seagate Hawks striped together.
Normally, I get 5MB/sec per drive, for a total of 15MB/sec.  After Matt's
changes, I got about 6MB/sec or so, and the CPU was pegged during
sequential I/O operations.  (this is on a Pentium 133)  The performance
numbers would almost return to normal when doing sequential I/O to/from the
raw device.  On my Ultrastar 9ZX, which is on a dual P6-200 system, I could
only get an average of 12MB/sec (through the filesystem), again with one
CPU pegged.  The numbers from iostat were all over the place, which was
quite unusual.

In other words, the first thing I would suspect is some sort of VM system
problem.
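A quick way to check whether the VM/buffer-cache path is the bottleneck is
to compare raw-device reads against reads through the filesystem.  [Device
name, mount point, and file name below are examples.]

```shell
# Raw device: bypasses the buffer cache/VM path entirely.
dd if=/dev/rda1 of=/dev/null bs=64k

# Through the filesystem: exercises getnewbuf() and friends.
dd if=/mnt/bigfile of=/dev/null bs=64k

# If the raw numbers look fine but the filesystem numbers are low and a
# CPU is pegged, suspect the VM/buffer-cache path rather than the disks.
```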

> Also, the ncr controller fails to give me a defects list, I assume
> this is a bug in the driver? (I'm running -current, dated this Thurs).
> camcontrol complains: error reading defect list: Input/output error,
> and I see this on console:
> 
> (pass3:ncr0:0:0:0): extraneous data discarded.
> (pass3:ncr0:0:0:0): COMMAND FAILED (9 0) @0xc39a3600.

It could be a driver bug, not sure.  What arguments were you using with
camcontrol?
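[For reference, something like the following should dump both defect lists;
the device name is an example.]

```shell
# Primary (factory) defect list, in physical sector format:
camcontrol defects da3 -f phys -P

# Grown defect list (defects remapped since the drive left the factory):
camcontrol defects da3 -f phys -G
```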

Ken
-- 
Kenneth Merry
ken@plutotech.com

