From owner-freebsd-current  Mon Mar 20 14:04:58 2000
Delivered-To: freebsd-current@freebsd.org
Received: from apollo.backplane.com (apollo.backplane.com [216.240.41.2])
	by hub.freebsd.org (Postfix) with ESMTP id 8372D37BA14
	for ; Mon, 20 Mar 2000 14:04:54 -0800 (PST)
	(envelope-from dillon@apollo.backplane.com)
Received: (from dillon@localhost)
	by apollo.backplane.com (8.9.3/8.9.1) id OAA72087;
	Mon, 20 Mar 2000 14:04:48 -0800 (PST)
	(envelope-from dillon)
Date: Mon, 20 Mar 2000 14:04:48 -0800 (PST)
From: Matthew Dillon
Message-Id: <200003202204.OAA72087@apollo.backplane.com>
To: Poul-Henning Kamp
Cc: current@FreeBSD.ORG
Subject: Re: patches for test / review
References: <20074.953579833@critter.freebsd.dk>
Sender: owner-freebsd-current@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

:
:In message <200003201846.KAA70820@apollo.backplane.com>, Matthew Dillon writes:
:
:> Well, let me tell you what the fuzzy goal is first and then maybe we
:> can work backwards.
:>
:> Eventually all physical I/O needs a physical address. The quickest
:> way to get to a physical address is to be given an array of vm_page_t's
:> (which can be trivially translated to physical addresses).
:
:Not all:
:PIO access to ATA needs virtual access.
:RAID5 needs virtual access to calculate parity.

    ... which means that the initial implementation for PIO and RAID5
    utilizes the mapped-buffer bioops interface rather than the
    b_pages[] bioops interface.

    But here's the point: We need to require that all entries *INTO*
    the bio system start with at least b_pages[] and then generate
    b_data only when necessary. If a particular device needs a b_data
    mapping, it can get one, but I think it would be a huge mistake to
    allow entry into the device subsystem to utilize *either* a b_data
    mapping *or* a b_pages[] mapping. Big mistake. There has to be a
    lowest common denominator that the entire system can count on, and
    it pretty much has to be an array of vm_page_t's.

    If a particular subsystem needs b_data, then that subsystem is
    obviously willing to take the virtual mapping / unmapping hit. If
    you look at Greg's current code this is, in fact, what is
    occurring... the critical path through the buffer cache in a
    heavily loaded system tends to require a KVA mapping *AND* a KVA
    unmapping on every buffer access (just that the unmappings tend to
    be for unrelated buffers).

    The reason this occurs is that even with the larger amount of KVA
    we made available to the buffer cache in 4.x, there still isn't
    enough to leave mappings intact for long periods of time. A
    'systat -vm 1' will show you precisely what I mean (also
    sysctl -a | fgrep bufspace).

    So we will at least not be any worse off than we are now, and
    probably better off since many of the buffers in the new system
    will not have to be mapped. For example, when vinum's RAID5 breaks
    up a request and issues a driveio() it passes a buffer whose
    b_data must be translated (through page table lookups) to physical
    addresses anyway, so the fact that vinum does not populate
    b_pages[] does *NOT* help it in the least. It actually makes the
    job harder.

						-Matt
						Matthew Dillon

:--
:Poul-Henning Kamp             FreeBSD coreteam member
:phk@FreeBSD.ORG               "Real hackers run -current on their laptop."
:FreeBSD -- It will take a long time before progress goes too far!
:


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-current" in the body of the message
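
A minimal sketch of the "trivial translation" described above: with
b_pages[] populated, getting a physical address per page is a simple
field lookup via VM_PAGE_TO_PHYS(), with no page-table walk. It
assumes the 4.x-era struct buf fields (b_pages[], b_npages); the
function name is illustrative only, not a real KPI.

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/buf.h>
    #include <vm/vm.h>
    #include <vm/vm_page.h>

    /*
     * Illustrative only: report the physical address backing each
     * page of a buffer.  VM_PAGE_TO_PHYS() just reads a field out of
     * the vm_page, which is why b_pages[] is the cheap starting point
     * for physical I/O.
     */
    static void
    print_buf_phys(struct buf *bp)
    {
            int i;

            for (i = 0; i < bp->b_npages; i++)
                    printf("page %d -> pa 0x%lx\n", i,
                        (u_long)VM_PAGE_TO_PHYS(bp->b_pages[i]));
    }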
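
And a sketch of generating b_data only when a consumer (ATA PIO, RAID5
parity) actually needs a virtual view, by wiring the pages into KVA
with pmap_qenter()/pmap_qremove(); this is roughly what the buffer
cache itself does when it maps a buffer. The helper names and the
assumption that the caller has reserved a page-aligned KVA window of
at least b_npages pages are hypothetical.

    #include <sys/param.h>
    #include <sys/buf.h>
    #include <vm/vm.h>
    #include <vm/pmap.h>

    /*
     * Hypothetical helpers: a subsystem that needs virtual access
     * maps the pages on demand and pays the mapping/unmapping hit
     * itself, instead of forcing every entry into the bio system to
     * carry a b_data mapping.
     */
    static caddr_t
    bio_map_pages(struct buf *bp, vm_offset_t kva)
    {
            pmap_qenter(kva, bp->b_pages, bp->b_npages);
            return ((caddr_t)(kva + ((vm_offset_t)bp->b_offset & PAGE_MASK)));
    }

    static void
    bio_unmap_pages(struct buf *bp, vm_offset_t kva)
    {
            pmap_qremove(kva, bp->b_npages);
    }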
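
Finally, a sketch of the page-table lookups a driver handed only
b_data is forced to do anyway, as in the vinum driveio() example:
resolving each page of the virtual range to a physical segment with
vtophys(). This is exactly the information a populated b_pages[]
would have provided for free. The function name is illustrative and
the actual DMA hand-off is elided.

    #include <sys/param.h>
    #include <sys/buf.h>
    #include <vm/vm.h>
    #include <vm/pmap.h>

    /*
     * Illustrative only: split a b_data/b_bcount range into physical
     * segments, one vtophys() page-table lookup per page -- the cost
     * that not populating b_pages[] fails to avoid.
     */
    static void
    buf_phys_segments(struct buf *bp)
    {
            vm_offset_t va = (vm_offset_t)bp->b_data;
            long resid = bp->b_bcount;

            while (resid > 0) {
                    vm_offset_t pa = vtophys(va);
                    long len = PAGE_SIZE - (va & PAGE_MASK);

                    if (len > resid)
                            len = resid;
                    /* ... hand the (pa, len) segment to the device ... */
                    va += len;
                    resid -= len;
            }
    }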