Date:      Tue, 11 Feb 1997 13:38:03 -0800 (PST)
From:      Simon Shapiro <Shimon@i-Connect.Net>
To:        freebsd-hackers@freebsd.org
Subject:   Raw I/O Question
Message-ID:  <XFMail.970211141038.Shimon@i-Connect.Net>

Can someone take a moment and describe briefly the execution path of a
lseek/read/write system call to a raw (character) SCSI partition?

We are very interested in the shortest, most efficient path to I/O on
a large number of disks.

We performed some measurements and see some results we would like to
understand:

For example, we did READ and WRITE to random records on a block device.
The test was run several times, each using a different block size
(starting at 512 bytes and ending with 128KB).  All our measurements
are in I/O Transfers/Sec.

We see a depression in READ and WRITE performance, until block size
reaches 2K.  At this point performance picks up and levels off until
block size reaches 8KB.  At this point it starts a gradual, linear
decline.

What we see is a flat WRITE response until 2K.  Then it starts a linear
decline until it reaches 8K block size.  At this point it converges
with READ performance.  The initial WRITE performance, for small blocks,
is quite poor compared to READ.  We attribute it to the need to do
read-modify-write when blocks are smaller than a certain ``natural block
size'' (page?).  Another cause of the performance loss, we think, is the
lack of an O_SYNC option to the write(2) system call.  This forces the
application to do an fsync after EVERY WRITE.  We have to do that for
many good reasons.
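
The write-then-fsync pattern described above looks roughly like the
sketch below.  The helper name write_record_sync is hypothetical, and the
code only illustrates the pattern; it is not the application's actual
code.  On systems whose open(2) accepts O_SYNC, that flag could stand in
for the explicit fsync, but the sketch does not assume it is available.

```c
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write one record at the given byte offset and do
 * not return until fsync(2) reports success.  Returns 0 on success and
 * -1 on any failure; a short write is treated as a failure.
 */
int write_record_sync(int fd, off_t offset, const void *buf, size_t len)
{
    if (lseek(fd, offset, SEEK_SET) < 0)
        return -1;
    if (write(fd, buf, len) != (ssize_t)len)
        return -1;
    /* One fsync per write: this is the per-record cost discussed above. */
    if (fsync(fd) < 0)
        return -1;
    return 0;
}
```

Every call pays for a seek, a write, and a flush, which is one plausible
reason small-block WRITE rates would trail READ rates.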

The READ performance is even more peculiar.  It starts higher than
WRITE, declines rapidly until block size reaches 2K.  It peaks at 4K
blocks and starts a linear decline from that point on (as block size 
increases).

We intend to use the RAW (character) device with the mpool buffering
system and would like to understand its behavior without reading the
WHOLE kernel source :-)

We are very interested in the flow of control and flow of data.

How do synchronous WRITE operations pass through?  We need this to
guarantee transaction completion (commits).
There are several problems here we want to understand:

How does the system call logic transfer control to the SCSI layer?
All we see is the construction of a struct buf and a call to
scsi_scsi_cmd.  How is the SCSI FLUSH CACHE passed down?  We may need
to trap it in the HBA driver, so the HBA can flush its buffers too.

What block size I/O do we need so that we do not ever do
read-modify-write?
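
Whatever the right block size turns out to be, the application-side
arithmetic for keeping a request on whole-block boundaries can be sketched
as below.  The function names are hypothetical, and the 512-byte figure
used in the comments is only an assumed sector size, not an answer to the
question above.

```c
#include <stddef.h>
#include <sys/types.h>

/*
 * Hypothetical helper: return 1 if both the offset and the length are
 * whole multiples of block, so the request covers complete blocks only.
 */
int is_block_aligned(off_t offset, size_t len, size_t block)
{
    if (block == 0 || offset < 0)
        return 0;
    return (offset % (off_t)block == 0) && (len % block == 0);
}

/*
 * Hypothetical helper: widen the request [offset, offset + len) to the
 * smallest enclosing range of whole blocks.  For example, with an
 * assumed 512-byte block, offset 700 and length 100 widen to start 512
 * and span 512.  Returns 0 on success, -1 on invalid arguments.
 */
int align_io_range(off_t offset, size_t len, size_t block,
                   off_t *start, size_t *span)
{
    if (block == 0 || offset < 0 || start == NULL || span == NULL)
        return -1;

    off_t blk = (off_t)block;
    off_t first = (offset / blk) * blk;                 /* round down */
    off_t end = offset + (off_t)len;
    off_t last = ((end + blk - 1) / blk) * blk;         /* round up */

    *start = first;
    *span = (size_t)(last - first);
    return 0;
}
```

A request that is already aligned comes back unchanged; an unaligned one
shows how much extra data would have to be read before it can be written
back as whole blocks.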

These sorts of questions...  Easy stuff...

I hope this community (which has welcomed me very warmly and has been
so helpful) will find these questions useful.  Maybe when one of us is
older and has more time on their hands {s}he will write a ``FreeBSD
Internals'' book and all will be well in Zion...

Simon


