From: Simon Shapiro
Organization: Simon's Garage
Date: Tue, 09 Nov 1999 19:24:51 -0500
To: freebsd-arch@freebsd.org
Subject: I/O Evaluation Questions (Long but interesting!)
Message-ID: <3828BB53.DD482CD2@simon-shapiro.org>

Hi y'all,

I am working on a paper describing some I/O findings. The full text will
appear on my web page some day soon, but I have run into some questions I
would like input on. If this forum is the wrong one, tell me.

All my measurements were done (unless otherwise specified) on a Dell
PowerEdge 1300/600 running uniprocessor (UP) RELENG_3. (SMP crashes on this
box with some bizarre errors, which is irrelevant here.)

The disk subsystem is a DPT PM3755UW2 hooked up to 3 disk shelves: 2 with
7,200 RPM ST39173LC (Ultra2 Barracuda) drives, and 1 with 10,000 RPM
ST39102LC (Ultra2 Cheetah) drives, with which I have no end of trouble
(irrelevant here).

Very Relevant: The firmware on these controllers does NOT perform any READ
caching (there is a way to turn it on, but beats me what it is). It will
cache WRITE operations, but the main use of the cache is to manage RAID-5
parity. Keep this in mind as you look at the raw I/O results, as these are
taken with no cache in the kernel. Thus READ operations are not cached at
all, except on block devices.

Two main programs were used:

    dd:    Used for sequential I/O, single-user evaluation
    st.d:  Used for random I/O, multi-user and stress loading

Base Line:

    # dd if=/dev/zero of=/dev/null bs=128k count=2048
    2048+0 records in
    2048+0 records out
    268435456 bytes transferred in 0.603104 secs (445,089,832 bytes/sec)

    # st.d -f /dev/zero -s 1024m -i 100 &
    Throughput = 467893.33 I/O ops/Sec
    Bandwidth  = 1827.7083 MB/Sec

[ The above means: run st.d in (default) read mode, use the file /dev/zero
  for input, use (by default) random seeks, force a file size of 1GB (st.d
  normally finds sizes by itself), and run 100 concurrent instances ]

This establishes that the given system can, from a software point of view,
perform close to half a million I/O operations per second (ops/Sec), and
generate and move almost 2GB of data per second, on a single CPU.

The next step was to compile the kernel with the option I2O_IS_DEV_NULL.
This leaves the i2o driver intact, except that it always completes
everything successfully, without ever calling the hardware. This excludes
any DMA overhead, but all the queue management, locking and other mess
still runs.

    # st.d -f /dev/ri2o0 -s 1024m -p 1000000 -i 100 &
    Throughput = 4043250.39 I/O ops/Sec
    Bandwidth  = 15793.9469 MB/Sec

[ The -p 1000000 forces 1 million passes. It is too fast otherwise ]

From this we see that the driver consumes somewhat less than 0.25
microsecond per call.
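
[ As a sanity check on the arithmetic above, here is a minimal Python
  sketch that re-derives the per-call time and the implied transfer size
  from the quoted st.d and dd output. It assumes that st.d's "MB" means
  2^20 bytes, which the output itself does not state; the function names
  are made up for illustration and are not part of st.d. ]

```python
# Figures copied verbatim from the dd and st.d output quoted above.
DD_BYTES = 268435456
DD_SECONDS = 0.603104
ZERO_OPS_PER_SEC = 467893.33          # st.d reading /dev/zero
ZERO_MB_PER_SEC = 1827.7083
NULL_DRV_OPS_PER_SEC = 4043250.39     # st.d on /dev/ri2o0, I2O_IS_DEV_NULL kernel
NULL_DRV_MB_PER_SEC = 15793.9469

MIB = 1024 * 1024  # ASSUMPTION: st.d reports "MB" as 2**20 bytes


def bytes_per_op(mb_per_sec, ops_per_sec):
    """Transfer size per operation implied by a bandwidth/throughput pair."""
    return mb_per_sec * MIB / ops_per_sec


def microseconds_per_op(ops_per_sec):
    """Average wall-clock time per operation, in microseconds."""
    return 1e6 / ops_per_sec


if __name__ == "__main__":
    print("dd rate       : %.0f bytes/sec" % (DD_BYTES / DD_SECONDS))
    print("/dev/zero     : %.1f bytes/op" %
          bytes_per_op(ZERO_MB_PER_SEC, ZERO_OPS_PER_SEC))
    print("null driver   : %.1f bytes/op" %
          bytes_per_op(NULL_DRV_MB_PER_SEC, NULL_DRV_OPS_PER_SEC))
    print("null driver   : %.4f usec/op" %
          microseconds_per_op(NULL_DRV_OPS_PER_SEC))
```

[ Under that assumption, both st.d runs work out to about 4096 bytes per
  operation, and the null-driver run to a little under 0.25 microsecond
  per operation, which matches the figure quoted above. ]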
The megabytes per second are bogus as no data gets copied (raw device).

Tests Run:

The full story will be in the above-mentioned paper, but essentially we
ran random-seek and sequential tests on both raw devices and block
devices. We used either single disks (2GB partition) or a RAID-0 array
(15GB partition).

Results Highlights:

RAW Devices (mostly with 100 workers):

    Sequential READ:
        RAID-0   = 47,001,025 bytes/sec (358 128K ops/Sec, 11,456 4K ops/Sec?)
        RAID-5+0 = 43,308,308 bytes/sec (330 128K ops/Sec, 10,560 4K ops/Sec?)

    Sequential WRITE:
        RAID-0   = 34,234,343 bytes/sec (261 128K ops/Sec, 8,352 4K ops/Sec?)
        RAID-5+0 = 24,624,350 bytes/sec (187 128K ops/Sec, 5,984 4K ops/Sec?)

    Random READ:
        RAID-0   = 1,660 ops/Sec
        RAID-5+0 = 1,943 ops/Sec

    Random WRITE:
        RAID-0   = 1,500 ops/Sec
        RAID-5+0 = 2,397 ops/Sec

Block Devices:

    Sequential READ:
        RAID-0   = 14,263,937 bytes/sec (108 128K ops/Sec, 3,456 4K ops/Sec?)
        RAID-5+0 = 14,110,286 bytes/sec (107 128K ops/Sec, 3,425 4K ops/Sec?)

    Sequential WRITE:
        RAID-0   = 31,407,413 bytes/sec (958 128K ops/Sec, 30,656 4K ops/Sec?)
        RAID-5+0 = 33,707,151 bytes/sec (257 128K ops/Sec, 8,224 4K ops/Sec?)

    Random READ:
        RAID-0   = 33,578 ops/Sec (300 workers)
        RAID-5+0 = 42,784 ops/Sec (400 workers)

    Random WRITE:
        RAID-0   = 112.68 ops/Sec (200 workers)
        RAID-5+0 = 112.46 ops/Sec (100 workers)

And this, ladies and gentlemen, is what I do not understand: Why is random
WRITE to a block device 13 to 21 times slower than to the raw device
(1,500 vs. 112.68 ops/Sec on RAID-0; 2,397 vs. 112.46 ops/Sec on
RAID-5+0)? Sequential READ on the block device is also only about 1/3 of
the raw device figure. Why?

I would _love_ it to be the driver, but the driver has no clue who/what
calls it. Besides, the driver contributes less than 0.02% of these
numbers. The IOP? It definitely has no clue.

One interesting observation: the disks are way too busy for the random
block device numbers.

Parting Notes:

* This is not a CAM problem. The OSM has nothing to do with CAM.

* Source code for everything discussed here is at my ftp server, or my
  web server.

* I'd really like to understand this problem. Any help rendered will be
  greatly appreciated.

--
Sincerely Yours,
Simon Shapiro
Shimon@Simon-Shapiro.ORG
404.664.6401

Unwritten code has no bugs and executes at twice the speed of mouth