From owner-freebsd-geom@FreeBSD.ORG  Sat Dec 20 20:56:51 2014
Return-Path: <owner-freebsd-geom@FreeBSD.ORG>
Delivered-To: freebsd-geom@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 3A3FF8A0
 for <freebsd-geom@freebsd.org>; Sat, 20 Dec 2014 20:56:51 +0000 (UTC)
Received: from mail-wi0-x22b.google.com (mail-wi0-x22b.google.com
 [IPv6:2a00:1450:400c:c05::22b])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id BC7B81BBB
 for <freebsd-geom@freebsd.org>; Sat, 20 Dec 2014 20:56:50 +0000 (UTC)
Received: by mail-wi0-f171.google.com with SMTP id bs8so4953903wib.4
 for <freebsd-geom@freebsd.org>; Sat, 20 Dec 2014 12:56:49 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:sender:in-reply-to:references:date:message-id:subject
 :from:to:cc:content-type;
 bh=5WSHFoIjP3U1Y2bqm3hBc6VW3Fvu9Nn9slkdYs1Wfpg=;
 b=kxsoZfHH48Cq75y9F5wHnwIjy/3N9V5UljbIpeYcKBheyV9hnq/FY1mHbbc44uctAB
 FebwxE1sG2UpqHPmlcDhVluwj3ag7Y+d9JOayO3OePkBciRK5WaUMTVhksdtFGoaqP0V
 7xjmAw0LdIg07fY4qHfBI/aoJik8RuWRUVVX6l67hgCiDXWy6NnCOE0o63ybH/RqdKTW
 qiuK29DQP63zlysH4Yi7U8ojSDxdGljpYggp+afcp7sfPVvgKjpe0+tM6UKTeXooAIPP
 s0J+vDGad1DnlsIwZloE3M3sxbL6G6r2cB76ngVZWYdP+/soR/7ObSqefoI/qvcE0RVA
 4IrA==
MIME-Version: 1.0
X-Received: by 10.180.20.6 with SMTP id j6mr16915353wie.59.1419109009217; Sat,
 20 Dec 2014 12:56:49 -0800 (PST)
Sender: adrian.chadd@gmail.com
Received: by 10.216.106.195 with HTTP; Sat, 20 Dec 2014 12:56:49 -0800 (PST)
In-Reply-To: <D0BB136C.1280A4%rpokala@panasas.com>
References: <D0B89F30.127DAE%rpokala@panasas.com>
 <20141219015210.GY25139@funkthat.com>
 <D0B8C76C.127E55%rpokala@panasas.com>
 <CAJ-VmokV3-ZRQmVZWcHUSxccwaRxySDExoSiF8+sgHtkHN5_yg@mail.gmail.com>
 <D0BB136C.1280A4%rpokala@panasas.com>
Date: Sat, 20 Dec 2014 12:56:49 -0800
X-Google-Sender-Auth: ff2GUKpmMNxb7GjRMD4KrE6xKdY
Message-ID: <CAJ-Vmomm2yst=NN6hYopY7DR_Nw=HDa2v-Y9xtqji8xZn5b92A@mail.gmail.com>
Subject: Re: Converting LBAs to byte offsets through the GEOM stack
From: Adrian Chadd <adrian@freebsd.org>
To: "Pokala, Ravi" <rpokala@panasas.com>
Content-Type: text/plain; charset=UTF-8
Cc: John-Mark Gurney <jmg@funkthat.com>,
 "freebsd-geom@freebsd.org" <freebsd-geom@freebsd.org>
X-BeenThere: freebsd-geom@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: GEOM-specific discussions and implementations
 <freebsd-geom.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-geom>,
 <mailto:freebsd-geom-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-geom/>
List-Post: <mailto:freebsd-geom@freebsd.org>
List-Help: <mailto:freebsd-geom-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-geom>,
 <mailto:freebsd-geom-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 20 Dec 2014 20:56:51 -0000

On 20 December 2014 at 11:54, Pokala, Ravi <rpokala@panasas.com> wrote:
> Hi Adrian,
>
>>So when doing stuff like this, I ended up piggybacking commands through
>>the translation layers, so stuff was done (a) in line with the rest of IO
>>processing, and (b) wouldn't suffer from stale data.
>
> Could you expand on that a little?

So say you had a geom layer that was doing bad block remapping.

It's a black box with a queue (these days it would also be a black box
with locks protecting the state, since GEOM has direct dispatch, but ..)
where you push in IO requests for some particular offsets, and the
black box figures out which real disk / real offsets those requests
are for.

So to start with, you issue a request for block 0 from your geom black
box, and it maps it to block 0 on disk 0.

At some point it decides that it should map it to block 100 on disk 0
(or block 0 on disk 1, etc.)

The only thing that knows about the current state of the mapping is
that black box. And it's up to that black box to make sure that the IO
requests that are coming in get mapped to the right places. If you
have multiple dispatch threads sending the black box requests, it's
up to the black box to keep the ordering and the mapping consistent
across them.
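
A rough userland sketch of that black box, in Python rather than
kernel C. RemapLayer and its methods are made-up names for
illustration, not GEOM APIs, and the single lock is just the simplest
way to keep the mapping consistent across threads:

```python
import threading


class RemapLayer:
    """Toy model of a remapping layer (hypothetical, not GEOM code)."""

    def __init__(self, disks):
        self._lock = threading.Lock()   # protects the mapping state
        self._disks = disks             # disk id -> {physical block: data}
        self._map = {}                  # logical block -> (disk, block)

    def _resolve(self, lblock):
        # Unmapped blocks default to the same block number on disk 0.
        return self._map.get(lblock, (0, lblock))

    def write(self, lblock, data):
        with self._lock:
            disk, pblock = self._resolve(lblock)
            self._disks[disk][pblock] = data

    def read(self, lblock):
        with self._lock:
            disk, pblock = self._resolve(lblock)
            return self._disks[disk].get(pblock)

    def remap(self, lblock, new_disk, new_pblock):
        # Move the data and update the mapping in one locked step, so
        # no reader ever sees a half-finished move.
        with self._lock:
            old_disk, old_pblock = self._resolve(lblock)
            data = self._disks[old_disk].pop(old_pblock, None)
            self._disks[new_disk][new_pblock] = data
            self._map[lblock] = (new_disk, new_pblock)


disks = {0: {}, 1: {}}
layer = RemapLayer(disks)
layer.write(0, b"hello")    # lands on block 0 of disk 0
layer.remap(0, 0, 100)      # the box decides block 100 is better
result = layer.read(0)      # callers still just ask for "block 0"
```

The caller never learns that the data moved; only the layer knows.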

So, imagine then you want to do a reverse lookup. You ask through the
layer for what disk/block backs "block 0." It tells you, "block 0,
disk 0." Now, that's valid as long as the remapping layer doesn't
change that underneath you. If it decides to, you don't know - so when
you send your direct-to-disk request as you said, it may have been
right at the time you did the reverse lookup, but it's certainly not
right "now."
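
To make the staleness concrete, here is a small Python model (again
a sketch with invented names, nothing from the real GEOM code) where a
reverse lookup is cached, the layer remaps, and the direct-to-disk
access then goes to the old location:

```python
class RemapLayer:
    """Toy remapping layer (hypothetical names, single-threaded)."""

    def __init__(self):
        self.disks = {0: {}, 1: {}}   # disk id -> {physical block: data}
        self.mapping = {}             # logical block -> (disk, block)

    def lookup(self, lblock):
        # The "reverse lookup": which disk/block backs this logical block?
        return self.mapping.get(lblock, (0, lblock))

    def write(self, lblock, data):
        disk, pblock = self.lookup(lblock)
        self.disks[disk][pblock] = data

    def read(self, lblock):
        disk, pblock = self.lookup(lblock)
        return self.disks[disk].get(pblock)

    def remap(self, lblock, new_disk, new_pblock):
        old_disk, old_pblock = self.lookup(lblock)
        data = self.disks[old_disk].pop(old_pblock, None)
        self.disks[new_disk][new_pblock] = data
        self.mapping[lblock] = (new_disk, new_pblock)


layer = RemapLayer()
layer.write(0, b"v1")
cached = layer.lookup(0)      # answer at lookup time: disk 0, block 0

layer.remap(0, 1, 0)          # the layer moves it; the caller isn't told
layer.write(0, b"v2")

disk, pblock = cached
stale = layer.disks[disk].get(pblock)   # direct-to-disk via old answer
fresh = layer.read(0)                   # same request through the layer
```

Going through the layer returns the current data; going around it
with the cached answer hits a location that no longer backs block 0.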

When I was doing this stuff, it was a kind of bad block remapping and
disk mirroring thing for caching disk blocks. So when you issued a
request for "block 0 from this provider", (a) it would map to some
arbitrary disk and arbitrary offset, (b) that mapping could change at
any point and your information would be stale, and (c) it may have
mapped to multiple backend disks, so what you really needed to do was
send that command to "all" the disks that backed that particular block.

So I had a thing that I attached commands to that would funnel down to
the geom layer that did this mirroring/caching/remapping thing, and it
would handle scheduling the commands to whatever block(s) on whatever
disk(s) actually represented that particular logical offset. I
actually had something that'd let me issue commands that would map to
a single command to a single disk, or could be replicated to multiple
commands to multiple disks. Then I'd just get the completions from
them all in the reply message, as the bio didn't have enough space to
write multiple block reads into, and mostly I was issuing status check
commands like you are. :)
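
The shape of that piggybacking, sketched in Python. Command,
MirrorLayer, dispatch() and the "status" op are all placeholder names
for illustration (the real thing rode on bios inside the kernel), and
_issue() just fakes a disk reply:

```python
import threading
from dataclasses import dataclass, field


@dataclass
class Command:
    lblock: int                  # logical block the command refers to
    op: str                      # e.g. "status" (hypothetical op name)
    replies: list = field(default_factory=list)   # one per disk hit


class MirrorLayer:
    """Toy mirroring/remapping layer (hypothetical, not GEOM code)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._backing = {}       # logical block -> [(disk, block), ...]

    def set_backing(self, lblock, locations):
        with self._lock:
            self._backing[lblock] = list(locations)

    def dispatch(self, cmd, all_copies=True):
        # Resolve and issue under the same lock, so the command is
        # translated with the mapping that is current right now.
        with self._lock:
            targets = self._backing.get(cmd.lblock, [(0, cmd.lblock)])
            if not all_copies:
                targets = targets[:1]
            for disk, pblock in targets:
                cmd.replies.append(self._issue(disk, pblock, cmd.op))
        return cmd

    def _issue(self, disk, pblock, op):
        # Stand-in for actually sending the command to a disk.
        return {"disk": disk, "block": pblock, "op": op, "status": "ok"}


layer = MirrorLayer()
layer.set_backing(0, [(0, 100), (1, 0)])    # block 0 lives on two disks

cmd = layer.dispatch(Command(lblock=0, op="status"))
single = layer.dispatch(Command(lblock=0, op="status"), all_copies=False)
```

The caller only ever names the logical block; the layer decides how
many disks the command fans out to and gathers every completion into
the one reply.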

Is that making more sense? I can whiteboard it up next time we're in
the same place.


-adrian