From owner-freebsd-hackers@FreeBSD.ORG Thu Oct 11 11:48:33 2007
Date: Thu, 11 Oct 2007 13:48:28 +0200
From: Fabio Checconi <fabio@freebsd.org>
To: Ulf Lilleengen
Cc: hackers@freebsd.org
Subject: Re: Pluggable Disk Scheduler Project
Message-ID: <20071011114828.GE18725@gandalf.sssup.it>
In-Reply-To: <20071011080734.GA20897@stud.ntnu.no>
References: <20071011022001.GC13480@gandalf.sssup.it>
        <20071011080734.GA20897@stud.ntnu.no>

Hi,

> From: Ulf Lilleengen
> Date: Thu, Oct 11, 2007 10:07:34AM +0200
>
> On Thu, Oct 11, 2007 at 04:20:01 +0200, Fabio Checconi wrote:
> > o is working on disk scheduling worthwhile at all?
>
> It is hard to say, but I'd like to run some benchmarks with this to
> see.  Also, as noted in [2], newer hardware does more magic on its
> own, and solid state drives are coming along as well.

This is why I wanted to start with some kind of prototype, hoping that
its simplicity does not limit too much the results we can obtain.

> > o Where is the right place (in GEOM) for a disk scheduler?
>
> As discussed in [2], some suggested that disk scheduling should be
> done in a lower layer of the kernel, because of its knowledge of the
> hardware capabilities.
>
> As discussed in [1], ATA for instance does its own scheduling, so
> this might ruin performance (even the hardware might do some magic of
> its own).  I think I tried disabling it though, so it shouldn't be a
> big deal for testing.

I don't know if disabling the lower level queueing is needed, because
if you have only one outstanding request (or just a few, for hardware
that supports that, and that can be a parameter of the scheduler) the
lower level queueing will not reorder the higher level schedule.

> > o How can anticipation be introduced into the GEOM framework?
>
> This is actually perhaps one of the most interesting points, since
> the anticipation principle in itself fits here, but some other
> scheduling features might not be useful.

Ok.  Decoupling the anticipation from the other scheduling details may
not be easy, but this thing is all about trying :)
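To make the discussion a little more concrete, this is roughly the
shape I have in mind for the anticipation core of a GEOM class.
Please take it as an illustration only, not as the g_as code in [4],
even if I reuse a couple of its names: the softc layout, the ~5ms
window and the idea of tagging each bio with its issuer in bio_caller1
at enqueue time are all made up for this mail, and initialization,
locking and the tracking of in-flight requests are omitted:

  #include <sys/param.h>
  #include <sys/systm.h>
  #include <sys/kernel.h>
  #include <sys/bio.h>
  #include <sys/callout.h>
  #include <geom/geom.h>

  struct g_as_softc {
          struct g_geom *sc_geom;
          struct bio_queue_head sc_queue; /* sorted by bioq_disksort() */
          void *sc_ident;                 /* issuer we anticipate on */
          int sc_waiting;                 /* anticipation timer armed */
          struct callout sc_timer;
  };

  /* Timer expiry, or an explicit kick: hand one request down. */
  static void
  g_as_dispatch(void *arg)
  {
          struct g_as_softc *sc = arg;
          struct bio *bp;

          sc->sc_waiting = 0;
          bp = bioq_first(&sc->sc_queue);
          if (bp != NULL) {
                  bioq_remove(&sc->sc_queue, bp);
                  g_io_request(bp, LIST_FIRST(&sc->sc_geom->consumer));
          }
  }

  /* A request completed: bet that its issuer will send more. */
  static void
  g_as_done(struct g_as_softc *sc, struct bio *bp)
  {
          sc->sc_ident = bp->bio_caller1; /* tag set at enqueue time */
          sc->sc_waiting = 1;
          /* ~5ms anticipation window with hz=1000. */
          callout_reset(&sc->sc_timer, hz / 200, g_as_dispatch, sc);
  }

  /* A new request arrives from above. */
  static void
  g_as_enqueue(struct g_as_softc *sc, struct bio *bp)
  {
          bioq_disksort(&sc->sc_queue, bp);
          if (sc->sc_waiting && bp->bio_caller1 == sc->sc_ident) {
                  /* The issuer we were waiting for did come back. */
                  callout_stop(&sc->sc_timer);
                  sc->sc_waiting = 0;
          }
          if (!sc->sc_waiting)
                  g_as_dispatch(sc);
  }

The point I care about is that the anticipation itself reduces to one
timer and one comparison, which is what makes me hope it can be kept
separate from the rest of the scheduling logic.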
> > o What can be an interface for disk schedulers?
>
> I think the interface developed in [1] is a pretty good one actually.
> I think the disksort routines looked like a good place to do this.
> Even there it might not know enough about the hardware.
>
> > o How to deal with devices that handle multiple requests at a time?
>
> This is an example of the problems you get doing this in GEOM.  You
> don't have very good knowledge of the hardware.
>
> > So, as I've said, I'd like to know what you think about the
> > subject, if I'm missing something, if there is some kind of
> > interest in this, and if/how this work can proceed.
>
> Also, what would be interesting is implementing I/O priorities for
> processes, to be able to share I/O time among processes more fairly
> (or at least according to a configured preference).  This was done in
> the Hybrid project, but it is something that definitely could be done
> in GEOM.  (I see you have some fairness in the g_as_dispatch routine
> though.)

I totally agree.  My primary concern with this email was to learn what
others have done and what they think about the problem, and to try to
identify some kind of interface and positioning for the scheduler.
The actual scheduler has to be something _much_ more complex than this
little thing.  Hybrid ideas can be mapped to a CFQ-like scheduler (one
C-LOOK queue per process, fair sharing among queues, anticipation on a
per queue basis), and I'm working on that with Paolo Valente (in CC),
but I think the infrastructure behind the scheduler is more important
now, as it defines what the scheduler can do.

> However, I'll try testing the work you've got.  I'll see if I can
> get some numbers with this when I get some disks up.
>
> Btw, I ran some benchmarks when I tried changing bioq_disksort into
> a FIFO queue, which didn't seem to lower performance (on SCSI and
> UMASS, but I need to test again with ATA).  It was a long time ago,
> so it should be tried again though.

I think this can depend on the access patterns used for testing (on
disks, that is; on flash devices disk sorting is of course not needed
at all).  If you have processes that issue only synchronous requests
there is almost no difference between a .*LOOK elevator and FIFO
queueing, since the queue will hold at most one request per process,
and you switch between processes every time you serve a new request.
(Of course the actual order will change, but the number of seeks is
the factor that really limits the throughput in this situation.  At
least this is my understanding of the problem :) )

The test patterns Paolo and I are using try to pessimize the disk
throughput by reading in parallel (simply with dd, which generates a
typical example of a greedy, synchronous, sequential read pattern)
from two or more files placed on partitions at the opposite ends of
the disk (at least considering their logical addresses).  This kind of
access should generate something close to the worst case behavior for
a work-conserving .*LOOK scheduler.  Of course the behavior under
asynchronous requests has to be tested too.

Thank you very much for your feedback.  I hope we can get some numbers
to substantiate this topic, remembering also that a good interface is
a requirement for a good scheduler.

> > [1] http://wiki.freebsd.org/Hybrid
> >
> > [2] http://lists.freebsd.org/pipermail/freebsd-geom/2007-January/001854.html
> >
> > [3] The details of the anticipation are really not interesting, as
> >     it is extremely simplified on purpose.
> >
> > [4] http://feanor.sssup.it/~fabio/freebsd/g_sched/ also contains a
> >     userspace client to experiment with the GEOM class.
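P.S.: in case it helps when you get your disks up, below is more or
less the shape of the two-ends test as a standalone program, if you
prefer it to a pair of dd's.  The /dev/ad0s1 and /dev/ad0s4 paths and
the amount of data read are placeholders: pick two partitions (or two
files) that live at the opposite ends of one of your disks.

  #include <sys/types.h>
  #include <sys/wait.h>

  #include <err.h>
  #include <fcntl.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <time.h>
  #include <unistd.h>

  #define BLKSZ   (128 * 1024)    /* same access size as dd bs=128k */
  #define NBLKS   4096            /* 512MB per reader */

  /* Greedy, synchronous, sequential reader, like dd. */
  static void
  reader(const char *path)
  {
          char *buf;
          time_t t0;
          int fd, i;

          if ((buf = malloc(BLKSZ)) == NULL)
                  err(1, "malloc");
          if ((fd = open(path, O_RDONLY)) < 0)
                  err(1, "%s", path);
          t0 = time(NULL);
          for (i = 0; i < NBLKS; i++)
                  if (read(fd, buf, BLKSZ) <= 0)
                          break;
          printf("%s: %d blocks in %ld seconds\n",
              path, i, (long)(time(NULL) - t0));
          exit(0);
  }

  int
  main(void)
  {
          const char *paths[] = { "/dev/ad0s1", "/dev/ad0s4" };
          int i;

          for (i = 0; i < 2; i++)
                  if (fork() == 0)
                          reader(paths[i]);
          for (i = 0; i < 2; i++)
                  wait(NULL);
          return (0);
  }

With a work-conserving .*LOOK scheduler the two readers should make
the head ping-pong between the two ends of the disk, which is exactly
the behavior the anticipation is supposed to avoid.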