From: Adrian Chadd
To: "Pokala, Ravi"
Cc: John-Mark Gurney, "freebsd-geom@freebsd.org"
Date: Sat, 20 Dec 2014 12:56:49 -0800
Subject: Re: Converting LBAs to byte offsets through the GEOM stack
References: <20141219015210.GY25139@funkthat.com>

On 20 December 2014 at 11:54, Pokala, Ravi wrote:
> Hi Adrian,
>
>>So when doing stuff like this, I ended up piggybacking commands through
>>the translation layers, so stuff was done (a) in line with the rest of IO
>>processing, and (b) wouldn't suffer from stale data.
>
> Could you expand on that a little?

So say you had a geom layer that was doing bad block remapping. It's a
black box with a queue (and now it'd be a black box with locks
protecting the state, since there's direct dispatch GEOM, but ..) where
you push in IO requests to some particular offsets, and the black box
figures out which real disk / real offsets those requests are for.

So to start with, you issue a request for block 0 from your geom black
box, and it maps it to block 0 on disk 0. At some point it decides that
it should map it to block 100 on disk 0 (or block 0 on disk 1, etc.)

The only thing that knows about the current state of the mapping is
that black box, and it's up to that black box to make sure that
incoming IO requests get mapped to the right places. If you have
multiple dispatch threads sending the black box requests, it's up to
the black box to enforce some ordering/consistency in where things get
mapped.

So, imagine then you want to do a reverse lookup. You ask through the
layer which disk/block backs "block 0." It tells you, "block 0, disk 0."
Now, that's valid only as long as the remapping layer doesn't change it
underneath you. If it does, you won't know - so when you send your
direct-to-disk request as you said, it may have been right at the time
you did the reverse lookup, but it's not guaranteed to be right "now."
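
A toy sketch of that staleness problem, in Python rather than kernel C.
Everything here is hypothetical and only for illustration: RemapLayer,
reverse_lookup(), remap() and submit() are made-up names, not GEOM APIs,
and a single lock stands in for whatever ordering the real layer
enforces. It contrasts a reverse lookup taken ahead of time with a
command that is passed through the layer and resolved at dispatch time.

```python
import threading


class RemapLayer:
    """Toy stand-in for a remapping layer: logical block -> (disk, block)."""

    def __init__(self):
        self._lock = threading.Lock()
        # Logical block 0 starts out on disk 0, block 0.
        self._map = {0: (0, 0)}

    def remap(self, logical, disk, block):
        # The layer may move a block whenever it decides to.
        with self._lock:
            self._map[logical] = (disk, block)

    def reverse_lookup(self, logical):
        # Snapshot only: valid at the instant of the call, nothing more.
        with self._lock:
            return self._map[logical]

    def submit(self, logical, command):
        # Piggybacked command: the mapping is resolved under the same lock
        # that remap() takes, so it cannot go stale before command runs.
        with self._lock:
            disk, block = self._map[logical]
            return command(disk, block)


layer = RemapLayer()
snapshot = layer.reverse_lookup(0)      # answer as of "then"
layer.remap(0, disk=0, block=100)       # the layer moves the block
via_layer = layer.submit(0, lambda disk, block: (disk, block))
print(snapshot, via_layer)
```

The snapshot still points at the old location after the remap, while the
command sent through the layer sees the current one. That is the whole
argument for funnelling commands through the layer instead of going
direct to disk.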

When I was doing this stuff, it was a kind of bad block remapping and
disk mirroring thing for caching disk blocks. So when you issued a
request for "block 0 from this provider", (a) it would map to some
arbitrary disk and arbitrary offset, (b) that mapping could change at
any point and your information would be stale, and (c) it may have
mapped to multiple backend disks, so what you really needed to do was
send that command to "all" the disks that backed that particular block.

So I had a thing that I attached commands to that would funnel down to
the geom layer that did this mirroring/caching/remapping thing, and it
would handle scheduling the commands to whatever block(s) on whatever
disk(s) actually represented that particular logical offset. I actually
had something that'd let me issue commands that would map to a single
command to a single disk, or could be replicated to multiple commands
to multiple disks (and then I'd just get the completions from them all
in the reply message, as the bio didn't have enough space to write
multiple block reads into, and mostly I was issuing status check
commands like you are). :)

Is that making more sense? I can whiteboard it up next time we're in
the same place.

-adrian