From: Scott Long
Date: Thu, 14 Aug 2008 11:07:56 -0600
To: Carole Macheret
Cc: freebsd-scsi@freebsd.org, Roland Rothen
Subject: Re: g_vfs_done

Carole Macheret wrote:
> Hello,
>
> We are using FreeBSD 7.0-RELEASE #1 running Squid and Zabbix on vmware ESX 3.0.2 and our vmware ESX servers access our SAN through IpStor cluster (Storage virtualization and mirroring).
>
> We have 2 storages (EVA 6100) and the IpStor solution allows us to mirror disks on both EVAs.
>
> We have a problem with both the Zabbix and Squid FreeBSD virtual machines: when the virtual machine is losing its disks (EVA controller reboot or ipstor cluster failover), we have several "g_vfs_done() : da1s1d[WRITE(offset=2312431234, length=12453)] error= 5" errors then the host is definitively frozen. The disk loss lasts 1-5 seconds. Windows virtual machines do freeze during the loss then continue working. On Windows we had to specify a longer timeout for local disk in registry.
>
> Does anybody have an idea what could be tuned to avoid this problem?
>
> Attached you can find the dmesg and a screenshot of the g_vfs_done error...
>
> Thanks in advance for your help
>

So the virtual disks that the FreeBSD images are using in VMware are on an IpStor, and those periodically go away, yes? What's probably happening is that the VMware host is triggering an event in the FreeBSD client VM that essentially makes the virtual disks go away. Inside the FreeBSD VM, the SCSI layer tries to talk to the disk and gets a selection timeout since the disk is no longer there. It doesn't know that this is a temporary state, and it declares the I/O as failed. At that point, the BSD VM gets upset and everything goes bad.

There is a sysctl called kern.cam.da.default_timeout. It's set to 60 seconds, but I don't think that it will help you in this case, since it's likely that the I/O is failing because of a selection timeout, not because the virtual disk is slow in completing the I/O. The kern.cam.da.retry_count sysctl is set to 5, and raising it might help, since it might force enough retries to give the virtual disk time to come back. Try the following command on a running system:

    sysctl kern.cam.da.retry_count=100

This will allow for about 25 seconds' worth of retries (a selection attempt takes 250ms, so you'll get about 4 retries per second).
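As a rough check on that arithmetic, here is a minimal shell sketch. It assumes, as stated above, that each retry costs about one 250ms selection attempt; real retry spacing may differ. The helper names retry_window_ms and retries_for_outage_ms are made up for illustration and are not part of FreeBSD.

```shell
#!/bin/sh
# Assumption taken from the message: one selection attempt takes about 250 ms.
SELECTION_TIMEOUT_MS=250

# Approximate time (in ms) covered by a given kern.cam.da.retry_count value.
# Hypothetical helper, for illustration only.
retry_window_ms() {
    echo $(( $1 * SELECTION_TIMEOUT_MS ))
}

# Smallest retry count whose window covers an outage of the given length
# (in ms), rounded up. Hypothetical helper, for illustration only.
retries_for_outage_ms() {
    echo $(( ($1 + SELECTION_TIMEOUT_MS - 1) / SELECTION_TIMEOUT_MS ))
}

retry_window_ms 100          # prints 25000 (about 25 seconds)
retries_for_outage_ms 5000   # prints 20 (the reported 5-second worst case)
```

Under that assumption, a 5-second outage needs only about 20 retries, so 100 leaves a wide margin. If the setting helps, the same assignment (kern.cam.da.retry_count=100) can be placed in /etc/sysctl.conf so that it is applied at boot.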

If this doesn't work, try configuring VMware to give you a serial console that you can capture on the host, then set bootverbose during boot and send me the log once the problem happens.

Scott