Date:      Sun, 1 May 2005 11:36:58 +0100 (BST)
From:      Robert Watson <rwatson@FreeBSD.org>
To:        Arne Wörner <arne_woerner@yahoo.com>
Cc:        Petri Helenius <pete@he.iki.fi>
Subject:   Re: Very low disk performance on 5.x
Message-ID:  <20050501112902.S66519@fledge.watson.org>
In-Reply-To: <20050430211230.20778.qmail@web41203.mail.yahoo.com>
References:  <20050430211230.20778.qmail@web41203.mail.yahoo.com>


On Sat, 30 Apr 2005, Arne Wörner wrote:

> 3. The man page geom(4) of R5.3 says "The GEOM framework
>   provides an infrastructure in which "classes" can perform
>   transformations on disk I/O requests on their path from the
>   upper kernel to the device drivers and back."
>
> Could it be, that geom slows something down (in some boxes the reading 
> ops are very slow; in my box the writing ops are very slow)?

There are three types of overhead associated with GEOM, some of which 
also existed in 4.x, just not under the name "GEOM".  Some can be easily 
characterized through benchmarking on 5.x alone; other parts cannot.

Here they are:

(1) Fixed overhead per-transaction of entering and leaving the GEOM
     framework.  Because this involves context switches and queueing, this
     overhead can be amortized under high transaction rates.

(2) Cost of entering each "GEOM module" as part of the framework, or
     costs associated with any GEOM module you might run, which typically
     involves allocating a bio, as well as queueing operations.

(3) Cost of specific GEOM modules, such as transforms, RAID, etc -- may
     include computation, scatter/gather of small I/Os into larger ones,
     etc.

However, it's worth noting that GEOM also introduces performance benefits, 
such as creating a clean hand-off separation between the file system code 
and the device code, so that MPSAFE devices can interact safely with 
non-MPSAFE file systems, and in 6.x, MPSAFE file systems can interact 
safely with non-MPSAFE storage devices.  It also permits parallelism -- 
various bits of storage processing and handling can be running on a 
separate CPU from a file system generating a set of synchronous I/Os.

One interesting set of micro-benchmarks to identify the incremental costs 
of (2) and (3) is to run identical I/O transactions against the same 
regions of physical disk using different layers in the partition stack, 
i.e., against a region of ad0s1a, against an offset region of ad0s1, and 
against a further offset region of ad0.  If they're against the same 
bits of disk, the main difference will be the additional processing of 
the layers in the stack.  A little bit of math is required to figure out 
the offsets, but dd should be usable to measure the incremental cost, as 
in the sketch below.
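For concreteness, here is a minimal sketch of how such a comparison might 
be run with dd.  The layout used below is an assumption for illustration 
only: a disk ad0 whose first slice ad0s1 begins at sector 63 (the 
traditional MBR offset) and whose 'a' partition begins at offset 0 within 
the slice.  The real offsets have to be read from the fdisk(8) and 
bsdlabel(8) output for the disk in question.

  # Assumed layout (check with "fdisk ad0" and "bsdlabel ad0s1"):
  #   ad0s1  starts at sector 63 of ad0
  #   ad0s1a starts at sector 0 within ad0s1
  # Keep bs and count identical in every run so the only variable is
  # the number of layers the requests pass through.

  # Read 128 MB through the partition, slice and disk layers:
  dd if=/dev/ad0s1a of=/dev/null bs=512 count=262144

  # Same physical sectors, entering at the slice layer:
  dd if=/dev/ad0s1 of=/dev/null bs=512 count=262144 skip=0

  # Same physical sectors again, straight from the disk device:
  dd if=/dev/ad0 of=/dev/null bs=512 count=262144 skip=63

Comparing the elapsed times dd reports across the three runs gives a 
rough estimate of the per-layer cost; repeating the runs with larger 
block sizes shows how the fixed per-transaction overhead from (1) 
amortizes as the transaction count drops.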

Robert N M Watson


