From: "Poul-Henning Kamp"
To: Warner Losh
cc: Andriy Gapon, "freebsd-arch@freebsd.org", freebsd-geom@freebsd.org
Subject: Re: geom->access problem and workaround
Date: Mon, 12 Mar 2018 18:07:02 +0000
Message-ID: <56619.1520878022@critter.freebsd.dk>
References: <809d9254-ee56-59d8-69a4-08838e985cea@FreeBSD.org>
List-Id: Discussion related to FreeBSD architecture

--------
Warner Losh writes:

>The storage layer generally doesn't expect higher-level locks around calls
>to it, and feels that it's
>free to sleep in the open routine for resources
>to become available. This is true across most 'open' routines (eg, tty will
>wait for the right signals, etc). In a world of removable media, I'm not
>sure that one can avoid this.

The original intent was that we would.

Things would probably have been clearer if I had called it g_reserve()
instead of g_access().

Removable media state was supposed to be a job for the driver (via
background polling), and the geom event queue was supposed to do
what needed to be done as a result of the g_access() calls.

The primary reason is that messing around with the geom topology
is a global operation, in order to keep things simple[1], and we
really don't want global locks held for any amount of time and
certainly not for mechanical-movement / failure-retry kinds of time.

The secondary reason was to be able to present a consistent and
precise view of the system *without* opening devices, so that disk
maintenance tools would not spin up all disks, rattle all drawers
and bang all doors before telling you what you wanted to know.

>But I'm not sure that calling open on the underlying device is at all
>compatible with the design goal of access being cheap. I think you can't
>have both: either you open the device, and cope with the fact that open may
>sleep, or it looks like you'll have broken code. Once we've updated the
>access counts, we can drop the topology lock to call open.

So this is where it gets slightly tricky:

When you open /dev/foobar, do you open the media or only a
drive mechanism that *may* hold a medium?

For any normal "hard-disk", there is no difference.

But for floppies, CDROMs, ZIP drives, WORM drives, Robots-with-ATA-disks
and other interesting hardware, which were relevant when GEOM was
designed, and to some extent still are, you only open the drive,
and will have to find out next if it has a medium in it or not.

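To make the "reserve, then do the slow part elsewhere" idea concrete, here is a toy userland model in Python. It is only a sketch and not GEOM code: Provider, reserve() and probe_media() are hypothetical names, the counters merely mirror the read/write/exclusive counts that g_access() adjusts, and the only conflict rule modelled is that a count may not go negative.

```python
import threading

# Stands in for the single global topology lock (an assumption of this sketch).
topology_lock = threading.Lock()


class Provider:
    """Toy stand-in for a GEOM provider; not the real kernel structure."""

    def __init__(self, name):
        self.name = name
        self.acr = 0  # read access count
        self.acw = 0  # write access count
        self.ace = 0  # exclusive access count


def reserve(provider, dr, dw, de):
    """Adjust access counts only. No hardware is touched, so nothing sleeps."""
    with topology_lock:
        nr = provider.acr + dr
        nw = provider.acw + dw
        ne = provider.ace + de
        if min(nr, nw, ne) < 0:
            raise ValueError("access counts would go negative")
        provider.acr, provider.acw, provider.ace = nr, nw, ne


def probe_media(drive_has_media):
    """Hypothetical slow step (spin-up, tray check), run with the lock released."""
    # In this single-threaded sketch nobody else can hold the lock here.
    assert not topology_lock.locked()
    return drive_has_media


cd0 = Provider("cd0")
reserve(cd0, 1, 0, 0)            # cheap: bookkeeping under the global lock
media_present = probe_media(False)  # slow: done after the lock is dropped
reserve(cd0, -1, 0, 0)           # release the reservation again
```

The point of the split is that reserve() holds the global lock only for a few integer updates, while anything that might wait on mechanics happens outside it.
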
In particular CDROMs forced this design decision, because the ioctls
to open & close the tray on CDROM drives operated on the media access
device node, and too many ports knew about that.

Compare that with a tape-changer, which has one device node for the
robotic parts and another for (each of) the tape drive(s).

If we want to have an architecturally sound way to do slow operations
before any "user-I/O" is initiated, the right way to do so is to
define new BIO_OPEN and BIO_CLOSE operations, and insist via asserts
that all BIO_{READ|WRITE|DELETE} are wrapped in these.

BIO_GETATTR should probably not require a BIO_OPEN/BIO_CLOSE.

Poul-Henning

[1] The alternative would be to have different sub-trees, each of
which can be locked individually, but that requires a LOT of
housekeeping and class-complexity in order to find out what those
sub-trees actually are.

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
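
Illustration of the BIO_OPEN/BIO_CLOSE wrapping rule proposed above, as a toy Python model. This is a sketch under assumptions, not kernel code: the Bio enum and the Consumer class are hypothetical, and rejecting a second BIO_OPEN without an intervening BIO_CLOSE is a choice made here for simplicity, not something the message specifies.

```python
from enum import Enum, auto


class Bio(Enum):
    OPEN = auto()     # proposed operation: slow work may happen here
    CLOSE = auto()    # proposed operation
    READ = auto()
    WRITE = auto()
    DELETE = auto()
    GETATTR = auto()


class Consumer:
    """Toy consumer that asserts the proposed wrapping rule."""

    def __init__(self):
        self.opened = False
        self.log = []  # operations accepted so far

    def submit(self, op):
        if op is Bio.OPEN:
            assert not self.opened, "BIO_OPEN while already open"
            self.opened = True   # spin-up or media check would go here
        elif op is Bio.CLOSE:
            assert self.opened, "BIO_CLOSE without BIO_OPEN"
            self.opened = False
        elif op in (Bio.READ, Bio.WRITE, Bio.DELETE):
            assert self.opened, op.name + " outside BIO_OPEN/BIO_CLOSE"
        # Bio.GETATTR is deliberately accepted in either state.
        self.log.append(op)


c = Consumer()
c.submit(Bio.GETATTR)  # allowed without an open
c.submit(Bio.OPEN)
c.submit(Bio.READ)
c.submit(Bio.CLOSE)
```

Submitting Bio.WRITE at this point would trip the assertion, which is the kind of check the wrapping rule asks for.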