From: John Nielsen
Date: Tue, 18 Feb 2014 09:57:56 -0700
Subject: Re: recovering from or increasing timeouts on virtio block device
To: Edward Tomasz Napierała
Cc: Bryan Venteicher, freebsd-stable@freebsd.org
Message-Id: <6F4E2014-5489-4055-962C-4DFC6184A18E@jnielsen.net>
In-Reply-To: <18D133C0-E71B-4E66-A13F-6DC3B1BF620C@FreeBSD.org>
References: <920CC320-1A95-46E2-BB18-B6987805885E@jnielsen.net> <18D133C0-E71B-4E66-A13F-6DC3B1BF620C@FreeBSD.org>

On Feb 18, 2014, at 3:32 AM, Edward Tomasz Napierała wrote:

> Message written by John Nielsen on 17 Feb 2014, at 21:21:
>> I run several FreeBSD virtual machines in a Linux KVM environment
>> with a SAN. The VMs use virtio block storage, and the KVM hosts map
>> the virtual volumes to targets on the SAN. Occasionally, failover or
>> other maintenance events on the SAN cause it to be unavailable for
>> 30+ seconds. When this happens, the FreeBSD VMs have hard failures on
>> the vtbd* devices, and thereafter any attempted reads or writes
>> return immediately with an error (even after the SAN is responsive
>> again). The only way to recover a VM once that happens is to hard
>> boot it.
>>
>> Is there any way to adjust the timeouts or enable some kind of retry
>> for the virtio block devices? It would be nice to be able to recover
>> gracefully after a SAN event without needing to reboot the VMs.
>
> Use gmountver(8) perhaps?

Thanks for the tip (and for writing it :). I haven't encountered that
one before. I will experiment with it, but I'm not sure it's a fit for
this particular scenario (at least not by itself). When a SAN event
happens, the virtual machine's vtbd0 device doesn't disappear; the
underlying hardware just fails to respond for a long-ish time. I
suspect that the driver gives up after either a certain length of time
or a certain number of errors, but my C driver-fu isn't up to figuring
out exactly which. Once it gives up, any I/O requests to the (still
"present") device fail immediately, and I can't see a way to get the
driver to actually try any (new or old) I/O again.

JN
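
To make the suspected failure mode concrete, here is a toy Python model of a device that latches into a failed state. It is only a sketch under a loud assumption: the message above merely suspects that the driver gives up after some number of errors, and nothing here is taken from the actual virtio block driver. The names `LatchingDevice`, `FlakySan`, `max_errors`, and `outage_calls` are all hypothetical.

```python
class FlakySan:
    """Hypothetical backend: times out for the first `outage_calls` requests."""

    def __init__(self, outage_calls):
        self.remaining = outage_calls
        self.calls = 0  # how many times the backend was actually asked

    def __call__(self, block):
        self.calls += 1
        if self.remaining > 0:
            self.remaining -= 1
            raise TimeoutError("SAN not responding")
        return f"block {block}"


class LatchingDevice:
    """Toy model (an assumption, not driver code): after `max_errors`
    consecutive backend timeouts, every later request fails immediately."""

    def __init__(self, backend, max_errors=3):
        self.backend = backend
        self.max_errors = max_errors  # assumed threshold, purely illustrative
        self.consecutive_errors = 0
        self.dead = False

    def read(self, block):
        if self.dead:
            # Latched: the backend is never consulted again, even if it
            # has recovered in the meantime.
            raise OSError("device marked failed")
        try:
            data = self.backend(block)
        except TimeoutError:
            self.consecutive_errors += 1
            if self.consecutive_errors >= self.max_errors:
                self.dead = True
            raise OSError("I/O timed out")
        # A successful request resets the error count.
        self.consecutive_errors = 0
        return data
```

In this model, once `dead` is set, a recovered backend makes no difference because `read` never reaches it. That matches the symptom described (errors returned immediately after the SAN is back), and it is why a retry or reset path in the driver itself, rather than only something layered on top, may be needed.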