Subject: Re: Storage 'failover' largely kills FreeBSD 10.x under XenServer?
To: Karl Pielorz, Roger Pau Monné
Cc: freebsd-xen@freebsd.org
References: <62BC29D8E1F6EA5C09759861@[10.12.30.106]> <20170920114418.pq6fhnexol2mvkxv@dhcp-3-128.uk.xensource.com>
From: Miroslav Lachman <000.fbsd@quip.cz>
Message-ID: <59C287E2.1030500@quip.cz>
Date: Wed, 20 Sep 2017 17:23:14 +0200

Karl Pielorz wrote on 2017/09/20 16:54:
>
>
> --On 20 September 2017 at 12:44:18 +0100 Roger Pau Monné
> wrote:
>
>>> Is there some 'tuneable' we can set to make the 10.3 boxes more tolerant
>>> of the I/O delays that occur during a storage fail over?
>>
>> Do you know whether the VMs saw the disks disconnecting and then
>> connecting again?
>
> I can't see any evidence the drives actually get 'disconnected' from the
> VM's point of view. Plenty of I/O errors - but no "device destroyed"
> type stuff.
>
> I have seen that kind of error logged on our test kit - when
> deliberately failed non-HA storage, but I don't see it this time.
>
>> Hm, I have the feeling that part of the problem is that in-flight
>> requests are basically lost when a disconnect/reconnect happens.
>
> So if a disconnect doesn't happen (as it appears it isn't) - is there
> any tunable to set the I/O timeout?
>
> 'sysctl -a | grep timeout' finds things like:
>
> kern.cam.ada.default_timeout=30

Yes, you can try setting kern.cam.ada.default_timeout to 60 or more,
but it can have downsides too.

Miroslav Lachman
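
For reference, the suggestion above could be tried roughly as follows. This is only a sketch: it assumes the guest's disks are attached through ada(4), which the thread does not confirm, and the helper name persist_line is made up for illustration. The value 60 is just the example figure mentioned above.

```shell
#!/bin/sh
# Sketch only: inspect and prepare a larger CAM ada(4) default I/O timeout.
# Assumption: the disks in question are ada(4) devices, so this OID applies.
oid=kern.cam.ada.default_timeout

# Hypothetical helper: print the /etc/sysctl.conf line for a timeout in seconds.
persist_line() {
    printf '%s=%s\n' "$oid" "$1"
}

# Show the current value, but only on systems where this OID exists.
if sysctl -n "$oid" >/dev/null 2>&1; then
    echo "current: $(sysctl -n "$oid") seconds"
fi

# To apply at runtime, as root:  sysctl kern.cam.ada.default_timeout=60
# To persist across reboots:     persist_line 60 >> /etc/sysctl.conf
persist_line 60
```

The runtime change and the sysctl.conf append are left as comments so that running the script changes nothing on its own; a longer timeout also means that I/O which has genuinely failed takes longer to be reported.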