Date:      Tue, 09 Nov 1999 19:24:51 -0500
From:      Simon Shapiro <shimon@simon-shapiro.org>
To:        freebsd-arch@freebsd.org
Subject:   I/O Evaluation Questions (Long but interesting!)
Message-ID:  <3828BB53.DD482CD2@simon-shapiro.org>

Hi Y'all,

I am working on a paper describing some I/O findings.
The full text will appear on my web page some day soon,
but I encountered some questions I would like input on.
If this forum is the wrong one, tell me.

All my measurements were done (unless otherwise specified)
on a Dell PowerEdge 1300/600 running a UP RELENG_3 kernel
(SMP crashes on this box with some bizarre errors -
irrelevant here).

The disk subsystem is a DPT PM3755UW2 hooked up to 3 disk
shelves: 2 with 7,200 RPM ST39173LC (Ultra2 Barracuda) drives,
and 1 with 10,000 RPM ST39102LC (Ultra2 Cheetah) drives, with
which I have no end of trouble - irrelevant here.

Very Relevant:

The firmware on these controllers does NOT perform any
READ caching (there is a way to turn it on, but it beats
me what it is).  It will cache WRITE operations, but the
main use of the cache is to manage RAID-5 parity.

Keep this in mind as you look at the raw I/O results, since
these involve no caching in the kernel either.  Thus READ
operations are not cached at all, except on block devices.

Two main programs were used:

dd:   used for sequential I/O, single-user evaluation.
st.d: used for random I/O, multi-user and stress loading
      (a minimal sketch of such a worker follows below).
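
For readers who do not have st.d handy, here is a minimal,
illustrative sketch of what one such random-read worker does.
This is NOT the actual st.d source (that is on my servers);
DEV, IOSIZE, FILESIZE and PASSES merely stand in for the -f,
transfer-size, -s and -p parameters:

/*
 * Illustrative random-read worker, not the real st.d.
 */
#include <sys/types.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define DEV      "/dev/zero"            /* -f: input file or raw device  */
#define IOSIZE   4096                   /* bytes moved per operation     */
#define FILESIZE (1024*1024*1024UL)     /* -s 1024m: forced file size    */
#define PASSES   100000                 /* -p: number of operations      */

int
main(void)
{
        char    *buf;
        off_t    off;
        long     i, nblocks = FILESIZE / IOSIZE;
        int      fd;

        if ((fd = open(DEV, O_RDONLY)) < 0 ||
            (buf = malloc(IOSIZE)) == NULL) {
                perror(DEV);
                return (1);
        }
        for (i = 0; i < PASSES; i++) {
                /* Random seek, then a fixed-size read. */
                off = (off_t)(random() % nblocks) * IOSIZE;
                if (lseek(fd, off, SEEK_SET) == -1 ||
                    read(fd, buf, IOSIZE) != IOSIZE) {
                        perror("read");
                        break;
                }
        }
        printf("%ld reads of %d bytes completed\n", i, IOSIZE);
        close(fd);
        return (0);
}

Throughput in the results that follow is simply completed
operations divided by elapsed wall-clock time; bandwidth is
that figure multiplied by the transfer size.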

Base Line:

# dd if=/dev/zero of=/dev/null bs=128k count=2048
2048+0 records in
2048+0 records out
268435456 bytes transferred in 0.603104 secs (445,089,832 bytes/sec)

# st.d -f /dev/zero -s 1024m -i 100 &

Throughput   =  467893.33   I/O ops/Sec
Bandwidth    =   1827.7083 MB/Sec

[  The above runs st.d in (default) read mode, uses the
   file /dev/zero for input, uses (by default) random
   seeks, forces a file size of 1GB (st.d normally finds
   sizes by itself), and runs 100 concurrent instances ]

This establishes that, from a software point of view, the
given system can perform close to half a million I/O
operations per second (ops/Sec) and generate and move
almost 2GB of data per second on a single CPU.

Next, the kernel was compiled with the option
I2O_IS_DEV_NULL.  This leaves the i2o driver intact,
except that it completes every request successfully
without ever touching the hardware.  This excludes any DMA
overhead, but all the queue management, locking and other
mess still runs.
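
To make the shape of that short-circuit concrete, here is a
rough sketch of the idea.  The function and helper names below
(i2o_strategy(), i2o_build_and_queue(), i2o_post_message()) are
illustrative only, not the actual OSM source (which, again, is
on my servers):

#include <sys/param.h>
#include <sys/buf.h>

static void
i2o_strategy(struct buf *bp)
{
        /*
         * All of the usual work still happens here: queue
         * management, locking, building the I2O message frame...
         */
        i2o_build_and_queue(bp);        /* hypothetical helper */

#ifdef I2O_IS_DEV_NULL
        /*
         * Short-circuit: pretend the IOP finished the request
         * instantly and successfully.  No DMA, no hardware
         * access, but everything above this point has already run.
         */
        bp->b_resid = 0;
        biodone(bp);
#else
        /*
         * Real path: post the message to the IOP; the interrupt
         * handler calls biodone() when the hardware completes it.
         */
        i2o_post_message(bp);           /* hypothetical helper */
#endif
}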

st.d -f /dev/ri2o0 -s 1024m -p 1000000 -i 100 &
Throughput   =  4043250.39   I/O ops/Sec
Bandwidth    =  15793.9469 MB/Sec

[  The -p 1000000 forces 1 million passes; the run completes
   too quickly otherwise ]

From this we see that the driver consumes somewhat less
than 0.25 microseconds per call (1 sec / 4,043,250 ops is
about 0.247 microseconds).  The megabytes per second are
bogus, as no data gets copied (raw device).

Tests Run:

The full story will be in the above-mentioned paper, but
essentially, we ran random-seek and sequential tests on both
raw devices and block devices.  We used either a single disk
(2GB partition) or a RAID-0 array (15GB partition).

Results Highlights:

RAW Devices (mostly with 100 workers):

  Sequential READ:
    RAID-0   = 47,001,025 bytes/sec (358 128K ops/Sec, 11,456 4K ops/Sec?)
    RAID-5+0 = 43,308,308 bytes/sec (330 128K ops/Sec, 10,560 4K ops/Sec?)

  Sequential WRITE:
    RAID-0   = 34,234,343 bytes/sec (261 128K ops/Sec, 8,352 4K ops/Sec?)
    RAID-5+0 = 24,624,350 bytes/sec (187 128K ops/Sec, 5,984 4k ops/Sec?)

  Random READ:
    RAID-0:  = 1,660 ops/Sec
    RAID-5+0 = 1,943 ops/Sec

  Random WRITE:
    RAID-0   =  1,500 ops/Sec
    RAID-5+0 =  2,397 ops/Sec

Block Devices:

  Sequential READ:
    RAID-0   = 14,263,937 bytes/sec (108 128K ops/Sec, 3,456 4K ops/Sec?)
    RAID-5+0 = 14,110,286 bytes/sec (107 128K ops/Sec, 3,425 4K ops/Sec?)

  Sequential WRITE:
    RAID-0   = 31,407,413 bytes/sec (958 128K ops/Sec, 30,656 4K ops/Sec?)
    RAID-5+0 = 33,707,151 bytes/sec (257 128K ops/Sec, 8,224 4K ops/Sec?)

  Random READ:
    RAID-0   = 33,578 ops/Sec (300 workers)
    RAID-5+0 = 42,784 ops/Sec (400 workers)

  Random WRITE:
    RAID-0   = 112.68 ops/Sec (200 workers)
    RAID-5+0 = 112.46 ops/Sec (100 workers)

And this, ladies and gentlemen, is what I do not understand:

Why is random WRITE to a block device more than ten times
slower than to the raw device?
Actually, sequential READ from a block device is about 1/3 of
the raw device rate too.  Why?

I would _love_ it to be the driver, but the driver has no
clue who/what calls it.  Besides, the driver contributes less
than 0.02% of these numbers.
The IOP?  It definitely has no clue.

One interesting observation:  The disks are way too busy,
given the random block-device numbers.

Parting Notes:
 
*  This is not a CAM problem.  The OSM has nothing to do
   with CAM.

*  Source code for everything discussed here is on my
   ftp server or my web server.

*  I'd really like to understand this problem.
   Any help rendered will be greatly appreciated.

-- 


Sincerely Yours,                 Shimon@Simon-Shapiro.ORG
                                             404.664.6401
Simon Shapiro

Unwritten code has no bugs and executes at twice the speed of mouth



