From owner-freebsd-stable@FreeBSD.ORG  Fri Feb 21 17:14:55 2014
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 5D4101E7;
 Fri, 21 Feb 2014 17:14:55 +0000 (UTC)
Received: from secure.freebsdsolutions.net (secure.freebsdsolutions.net
 [69.55.234.48])
 (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by mx1.freebsd.org (Postfix) with ESMTPS id 350A4165A;
 Fri, 21 Feb 2014 17:14:54 +0000 (UTC)
Received: from [10.10.1.198] (office.betterlinux.com [199.58.199.60])
 (authenticated bits=0)
 by secure.freebsdsolutions.net (8.14.4/8.14.4) with ESMTP id s1LHEk1E079094
 (version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=NOT);
 Fri, 21 Feb 2014 12:14:47 -0500 (EST)
 (envelope-from lists@jnielsen.net)
Content-Type: text/plain; charset=iso-8859-2
Mime-Version: 1.0 (Mac OS X Mail 7.1 \(1827\))
Subject: Re: recovering from or increasing timeouts on virtio block device
From: John Nielsen <lists@jnielsen.net>
In-Reply-To: <CAGaYwLf+EhtUjLGfz6GynCGe3SwFijETLaqDxNjYA5rpN-HOHQ@mail.gmail.com>
Date: Fri, 21 Feb 2014 10:15:15 -0700
Content-Transfer-Encoding: quoted-printable
Message-Id: <FB4CC1CC-FF06-4354-87D4-72DB79CB7D3C@jnielsen.net>
References: <920CC320-1A95-46E2-BB18-B6987805885E@jnielsen.net>
 <18D133C0-E71B-4E66-A13F-6DC3B1BF620C@FreeBSD.org>
 <6F4E2014-5489-4055-962C-4DFC6184A18E@jnielsen.net>
 <CAGaYwLf+EhtUjLGfz6GynCGe3SwFijETLaqDxNjYA5rpN-HOHQ@mail.gmail.com>
To: Bryan Venteicher <bryanv@freebsd.org>
X-Mailer: Apple Mail (2.1827)
X-DCC-Etherboy-Metrics: ns1.jnielsen.net 1002; Body=2 Fuz1=2 Fuz2=2
X-Virus-Scanned: clamav-milter 0.97.8 at ns1.jnielsen.net
X-Virus-Status: Clean
Cc: "freebsd-stable@freebsd.org Stable" <freebsd-stable@freebsd.org>
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.17
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-stable>,
 <mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable/>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
 <mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 21 Feb 2014 17:14:55 -0000

On Feb 18, 2014, at 10:14 AM, Bryan Venteicher <bryanv@freebsd.org> wrote:

> On Tue, Feb 18, 2014 at 10:57 AM, John Nielsen <lists@jnielsen.net> wrote:
>> On Feb 18, 2014, at 3:32 AM, Edward Tomasz Napierała <trasz@freebsd.org> wrote:
>>
>> > Message written by John Nielsen on 17 Feb 2014, at 21:21:
>> >> I run several FreeBSD virtual machines in a Linux KVM environment with a SAN. The VMs use virtio block storage, and the KVM hosts map the virtual volumes to targets on the SAN. Occasionally, failover or other maintenance events on the SAN cause it to be unavailable for 30+ seconds. When this happens, the FreeBSD VMs have hard failures on the vtbd* devices, and thereafter any attempted reads or writes return immediately with an error (even after the SAN is responsive again). The only way to recover a VM once that happens is to hard boot it.
>> >>
>> >> Is there any way to adjust the timeouts or enable some kind of retry for the virtio block devices? It would be nice to be able to recover gracefully after a SAN event without needing to reboot the VMs.
>> >
>> > Use gmountver(8) perhaps?
>>
>> Thanks for the tip (and for writing it :), I haven't encountered that one before. I will experiment with it but I'm not sure it's a fit for this particular scenario (at least not by itself). When a SAN event happens the virtual machine's vtbd0 device doesn't disappear, the underlying hardware just fails to respond for a long-ish time. I suspect that the driver gives up after either a certain length of time or number of errors, but my C driver-fu isn't up to figuring it out exactly. Once it gives up, any I/O requests to the (still "present") device fail immediately, and I can't see a way to get the driver to actually try any (new or old) I/O again.
>
> The vtbd driver has no internal retry mechanism, and pays no attention to errors other than to report them, and never gives up :)
>
> It is not clear to me whether IO is getting turned around in FreeBSD before it reaches the driver, or within the host. Do you continue to see "hard error ..." messages on the console?

Thanks for chiming in. The last time the issue appeared, I was in too much of a hurry to get the VM running again to capture any useful log messages, and of course none of them were committed to disk, so nothing was available following a reboot.

I will see what I can get next time it happens and follow up on this thread again.

JN