Date:      Thu, 11 Oct 2007 13:48:28 +0200
From:      Fabio Checconi <fabio@freebsd.org>
To:        Ulf Lilleengen <lulf@stud.ntnu.no>
Cc:        hackers@freebsd.org
Subject:   Re: Pluggable Disk Scheduler Project
Message-ID:  <20071011114828.GE18725@gandalf.sssup.it>
In-Reply-To: <20071011080734.GA20897@stud.ntnu.no>
References:  <20071011022001.GC13480@gandalf.sssup.it> <20071011080734.GA20897@stud.ntnu.no>

Hi,

> From: Ulf Lilleengen <lulf@stud.ntnu.no>
> Date: Thu, Oct 11, 2007 10:07:34AM +0200
>
> On Thu, Oct 11, 2007 at 04:20:01 +0200, Fabio Checconi wrote:
> >     o Is working on disk scheduling worth it at all?
> It is hard to say, but I'd like to run some benchmarks with this to see.
> Also, as noted in [2], newer hardware does more magic on its own, and
> solid state drives are coming along.
> 

This is why I wanted to start with some kind of prototype, hoping
that its simplicity does not limit the results we can obtain too much.


> >     o Where is the right place (in GEOM) for a disk scheduler?
> As discussed in [2], some suggested that disk scheduling should be done in a
> lower layer of the kernel, due to its knowledge of hardware capabilities.
> 
> As discussed in [1], ATA, for instance, does its own scheduling, so this might
> ruin performance (even the hardware might do some magic of its own). I
> think I tried disabling it, though, so it shouldn't be a big deal for testing.
> 

I don't know if disabling the lower-level queueing is needed: if you
have only one outstanding request (or just a few, for hardware that
supports that, and that number can be a parameter of the scheduler),
the lower-level queueing will not reorder the higher-level schedule.
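
A minimal userspace sketch of that idea (the names and the fixed cap are
illustrative, not taken from the actual GEOM code): the scheduler pushes a
request down only while fewer than max_inflight requests are outstanding,
so with a cap of one the lower layer never holds enough requests to
reorder anything.

```c
#include <assert.h>

/*
 * Hypothetical dispatch gate: cap the number of requests outstanding
 * at the lower layer.  With max_inflight == 1 the lower-level queue
 * can never reorder the schedule decided above it; a larger cap could
 * be a tunable for NCQ/TCQ-capable hardware.
 */
struct sched_gate {
	int inflight;		/* requests currently at the lower layer */
	int max_inflight;	/* scheduler parameter */
};

static int
can_dispatch(const struct sched_gate *g)
{
	return (g->inflight < g->max_inflight);
}

static void
gate_dispatch(struct sched_gate *g)
{
	assert(can_dispatch(g));
	g->inflight++;
}

static void
gate_complete(struct sched_gate *g)
{
	g->inflight--;
}
```

With max_inflight set to 1, dispatch and completion strictly alternate,
which is exactly the property the paragraph above relies on.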


> >     o How can anticipation be introduced into the GEOM framework?
> This is actually perhaps one of the most interesting points, since the
> anticipation principle in itself fits here, but some other scheduling
> features might not be useful.
> 

Ok.  Decoupling the anticipation from other scheduling details may
not be easy, but this thing is all about trying :)
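
As a toy sketch of the anticipation principle in isolation (all names and
thresholds here are hypothetical, not from any real scheduler): after a
synchronous request completes, the disk may idle briefly, betting that the
same process will soon issue another nearby request, instead of seeking
away immediately to serve someone else.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical decision function: should the disk idle (anticipate)
 * rather than dispatch the next pending request?  The distance
 * threshold is purely illustrative.
 */
#define	NEAR_DISTANCE	2048	/* sectors considered "close" */

static bool
should_anticipate(bool last_req_was_sync, long head_pos,
    long next_pending_pos, long expected_pos)
{
	/* Only synchronous (blocking) readers justify waiting. */
	if (!last_req_was_sync)
		return (false);
	/* If the pending request is already close, just serve it. */
	if (labs(next_pending_pos - head_pos) <= NEAR_DISTANCE)
		return (false);
	/* Idle only if the request we wait for is expected nearby. */
	return (labs(expected_pos - head_pos) <= NEAR_DISTANCE);
}
```

Note that nothing here depends on queue fairness or elevator order, which
is what makes decoupling at least plausible.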


> >     o What can be an interface for disk schedulers?
> I think the interface developed in [1] is actually a pretty good one, and
> the disksort routines looked like a good place to do this. Even there,
> though, it might not know enough about the hardware.
> 
> >     o How to deal with devices that handle multiple requests at a time?
> This is an example of the problems you get doing this in GEOM. You don't have
> very good knowledge of the hardware.
> 
> > So, as I've said, I'd like to know what you think about the subject,
> > if I'm missing something, if there is some kind of interest on this
> > and if/how can this work proceed.
> 
> Also, what would be interesting is implementing I/O priorities for processes,
> to be able to distribute I/O time among them more fairly (or at least
> according to a set preference). This was done in the Hybrid project, but it
> is something that definitely could be done in GEOM. (I see you have some
> fairness in the g_as_dispatch routine, though.)
> 

I totally agree.  My primary concern with this email was to know
what others have done/think about the problem, and to try to identify
some kind of interface and positioning for the scheduler.  The
actual scheduler has to be something _much_ more complex than this
little thing.  Hybrid ideas can be mapped to a CFQ-like scheduler
(one C-LOOK queue per process, fair sharing among queues, anticipation
on a per-queue basis), and I'm working on that with Paolo Valente (in CC),
but I think the infrastructure behind the scheduler is more important
now, as it defines what the scheduler can do.
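
To make the C-LOOK part concrete, here is a self-contained sketch of the
pick-next rule such a per-process queue would use (array-based for
brevity; a real implementation would keep a sorted structure like a bioq):
serve the smallest offset at or beyond the head position, and when none
remains, wrap back to the lowest offset.

```c
#include <assert.h>
#include <stddef.h>

/*
 * C-LOOK pick-next over a set of pending request offsets.
 * Returns the index of the request to serve from head position
 * 'head', or -1 if the queue is empty.  Illustrative sketch only.
 */
static int
clook_pick(const long *off, size_t n, long head)
{
	int best = -1, wrap = -1;
	size_t i;

	for (i = 0; i < n; i++) {
		/* Best candidate at or beyond the head, ascending. */
		if (off[i] >= head &&
		    (best < 0 || off[i] < off[best]))
			best = (int)i;
		/* Fallback: lowest offset overall, for the wrap. */
		if (wrap < 0 || off[i] < off[wrap])
			wrap = (int)i;
	}
	return (best >= 0 ? best : wrap);
}
```

A CFQ-like design would keep one such queue per process and time-share
the disk among the queues, anticipating within each.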


> However, I'll try testing the work you've got. I'll see if I can get some
> numbers with this when I get some disks up.
> 
> Btw, I did run some benchmarks when I tried changing bioq_disksort into a
> FIFO queue, which didn't seem to lower performance (on SCSI and UMASS, but I
> need to test again with ATA). It was a long time ago though, so it should be
> tried again.

I think this can depend on the access patterns used for testing (on
disks; of course, on flash devices disk sorting is not needed at
all).  If you have processes that issue only synchronous requests,
there is almost no difference between a .*LOOK elevator and FIFO
queueing: the queue will always hold only one request per process,
and you switch between processes every time you serve a new request.
Of course the actual order will change, but the number of seeks is
the factor that really limits the throughput in this situation.  At
least, this is my understanding of the problem :)

The test patterns we are using with Paolo try to pessimize disk
throughput by reading in parallel (simply with dd, which generates a
typical example of a greedy, synchronous, sequential read pattern)
from two or more files placed on partitions at opposite ends of the
disk (at least considering their logical addresses).  This kind of
access should generate something close to the worst-case behavior
for a work-conserving .*LOOK scheduler.  Of course, the behavior
for asynchronous requests also has to be tested.

Thank you very much for your feedback, I hope we can get some
numbers to substantiate this topic, remembering also that a
good interface is a requirement for a good scheduler.


> > 
> > [1]  http://wiki.freebsd.org/Hybrid
> > 
> > [2]  http://lists.freebsd.org/pipermail/freebsd-geom/2007-January/001854.html
> > 
> > [3]  The details of the anticipation are really not interesting, as it
> >     is extremely simplified on purpose.
> > 
> > [4]  http://feanor.sssup.it/~fabio/freebsd/g_sched/ also contains a userspace
> >     client to experiment with the GEOM class.


