Date:      Tue, 18 Feb 2014 09:57:56 -0700
From:      John Nielsen <lists@jnielsen.net>
To:        Edward Tomasz Napierała <trasz@freebsd.org>
Cc:        Bryan Venteicher <bryanv@freebsd.org>, freebsd-stable@freebsd.org
Subject:   Re: recovering from or increasing timeouts on virtio block device
Message-ID:  <6F4E2014-5489-4055-962C-4DFC6184A18E@jnielsen.net>
In-Reply-To: <18D133C0-E71B-4E66-A13F-6DC3B1BF620C@FreeBSD.org>
References:  <920CC320-1A95-46E2-BB18-B6987805885E@jnielsen.net> <18D133C0-E71B-4E66-A13F-6DC3B1BF620C@FreeBSD.org>

On Feb 18, 2014, at 3:32 AM, Edward Tomasz Napierała <trasz@freebsd.org> wrote:

> Message written by John Nielsen on 17 Feb 2014, at 21:21:
>> I run several FreeBSD virtual machines in a Linux KVM environment
>> with a SAN. The VMs use virtio block storage, and the KVM hosts map
>> the virtual volumes to targets on the SAN. Occasionally, failover or
>> other maintenance events on the SAN cause it to be unavailable for
>> 30+ seconds. When this happens, the FreeBSD VMs have hard failures on
>> the vtbd* devices, and thereafter any attempted reads or writes
>> return immediately with an error (even after the SAN is responsive
>> again). The only way to recover a VM once that happens is to hard
>> boot it.
>>
>> Is there any way to adjust the timeouts or enable some kind of retry
>> for the virtio block devices? It would be nice to be able to recover
>> gracefully after a SAN event without needing to reboot the VMs.
>
> Use gmountver(8) perhaps?

Thanks for the tip (and for writing it :). I hadn't come across that one
before. I will experiment with it, but I'm not sure it fits this
particular scenario (at least not by itself). When a SAN event happens,
the virtual machine's vtbd0 device doesn't disappear; the underlying
hardware just fails to respond for a long-ish time. I suspect that the
driver gives up after either a certain length of time or a certain
number of errors, but my C driver-fu isn't up to figuring out exactly
which. Once it gives up, any I/O requests to the (still "present")
device fail immediately, and I can't see a way to get the driver to
actually retry any I/O, new or old.
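
To make the symptom concrete, here is a toy Python model of the behavior
suspected above. It is only a sketch of the hypothesis, not the actual
virtio block driver logic: the class name LatchingDisk, the helper
try_read, and the 30-second give_up_after threshold are all made up for
illustration.

```python
class LatchingDisk:
    """Toy model of the suspected behavior (an assumption, not driver code):
    once one request stalls past the threshold, the device latches into a
    failed state and rejects every later request immediately."""

    def __init__(self, give_up_after=30.0):
        self.give_up_after = give_up_after  # hypothetical threshold, seconds
        self.failed = False

    def read(self, backend_delay):
        """backend_delay: seconds the simulated SAN takes to answer."""
        if self.failed:
            # The device is still "present", but no I/O is attempted.
            raise OSError("device failed earlier; request rejected immediately")
        if backend_delay > self.give_up_after:
            self.failed = True
            raise OSError("request timed out; device marked failed")
        return b"data"


def try_read(disk, backend_delay):
    """Return 'ok' or a short error string instead of raising."""
    try:
        disk.read(backend_delay)
        return "ok"
    except OSError as exc:
        return f"error: {exc}"


disk = LatchingDisk(give_up_after=30.0)
results = [
    try_read(disk, 0.01),  # normal operation
    try_read(disk, 45.0),  # SAN failover: backend stalls past the threshold
    try_read(disk, 0.01),  # SAN is responsive again, device stays failed
]
for line in results:
    print(line)
```

The third read is the interesting one: the backend is healthy again, yet
the request never reaches it. A retry or timeout knob would have to act
before the failed flag is set, which is why a layer that only reacts to
a device disappearing may not be enough by itself.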

JN



