From: Panagiotis Christias
To: Oleg Sharoiko
Cc: freebsd-scsi@freebsd.org, freebsd-stable@freebsd.org
Date: Mon, 17 Nov 2008 03:13:17 +0200
Message-ID: <20081117011317.GB52109@noc.ntua.gr>
In-Reply-To: <20081015175453.GA3260@noc.ntua.gr>
References: <20081014222343.GA8706@noc.ntua.gr> <1224049455.1277.44.camel@brain.cc.rsu.ru> <20081015175453.GA3260@noc.ntua.gr>
Subject: Re: FreeBSD 7-STABLE, isp(4), QLE2462: panic & deadlocks

On Wed, Oct 15, 2008 at 08:54:53PM +0300, Panagiotis Christias wrote:
> On Wed, Oct 15, 2008 at 09:44:15AM +0400, Oleg Sharoiko wrote:
> > Hi!
> >
> > On Wed, 2008-10-15 at 01:23 +0300, Panagiotis Christias wrote:
> >
> > > However, when we connect them to the CX3-40, create and mount a new
> > > partition and then do something as simple as "tar -C /san -xf ports.tgz"
> > > the system panics and deadlocks. We have tried several FreeBSD versions
> > > (6.3 i386/amd64, 7.0 i386/amd64, 7.1 i386/amd64 and lastly 7-STABLE i386
> > > - we also tried the latest 8-CURRENT snapshot but it panicked too soon).
> > > The result is always the same; panic and deadlock.
> >
> > Try reducing the number of "tagged openings" with 'camcontrol tags' down
> > to 46. If it doesn't work try reducing it further to 2. Also be advised
> > that I've seen panics with geom_multipath in FreeBSD-7, unfortunately I
> > had no time to test it in -current.
>
>
> Hm.. that would probably explain the fact that I was unable to panic the
> system when I had set the hint.isp.0.debug="0x1F" in /boot/device.hints.
>
> Currently I am stress testing the server with the tagged openings set to
> 44 (first value tested). Until now there is no panic or deadlock. I am
> trying concurrent tar extractions and rsync copies. The filesystem looks
> ok till now according to fsck. I will let it write/copy/delete overnight
> and tomorrow I will try different tagged opening values.
>
> Thank you for the hint! I am wondering what is the performance penalty
> with decreased tagged openings. Also, is there anything else I could try
> in order to get more useful debug output? I have at least three servers
> that I could use for any kind of tests and I am willing to spend as much
> time as I can get to help solving the problem.
>
> Finally, the only output in the logs is:
>
> Expensive timeout(9) function: 0xc06f4210(0xc67e1200) 0.059422635 s
> Expensive timeout(9) function: 0xc08d4fd0(0) 0.060676147 s
>
> I suppose that is related to the CAMDEBUG kernel config options.

For the record, I have done many tests using several stress tools in
parallel, different FreeBSD versions (up to 7.1-BETA2), various
filesystem configurations (plain UFS2 with soft updates, UFS2 with
gjournal, ZFS) and various tag opening values (down to 2).

Regardless of the configuration, the system deadlocks, panics, or the
filesystem gets badly corrupted within seconds, minutes or a few hours.

The only configuration that seems to work without problems(?), but with
an unacceptable, *severe* performance penalty, is with tag openings set
to the minimum value of 2 (which is more or less the same as disabling
tagged command queueing altogether).

All tests were run on a 500 GB RAID5 LUN on an EMC Clariion CX3-40:

da0 at isp0 bus 0 target 0 lun 0
da0: Fixed Direct Access SCSI-4 device
da0: Serial Number CK200083100148
da0: 400.000MB/s transfers
da0: Command Queueing Enabled
da0: 512000MB (1048576000 512 byte sectors: 255H 63S/T 65270C)

Previously, a Sun StorEdge T3 was tested and worked flawlessly. However,
it had a 1 Gbps Fibre Channel interface instead of the 4 Gbps one on the
Clariion, was recognized as a SCSI-3 device, and had 2 tag openings by
default (so no surprise there):

da1 at isp1 bus 0 target 0 lun 0
da1: Fixed Direct Access SCSI-3 device
da1: 100.000MB/s transfers
da1: 241724MB (495050752 512 byte sectors: 255H 63S/T 30815C)

As I mentioned before, I am willing to spend time and/or provide access
to the system for testing and debugging.

Regards,
Panagiotis

-- 
Panagiotis J. Christias    Network Management Center
P.Christias@noc.ntua.gr    National Technical Univ. of Athens, GREECE
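
For anyone repeating the tag-opening tests described in this thread, the sweep can be scripted. What follows is only a minimal sketch, not the procedure actually used here: the helper names (tag_commands, mb_from_sectors) are hypothetical, the device name da0 and the values 46, 44 and 2 are taken from the messages above, and the script only prints the 'camcontrol tags' command lines rather than running them. The second helper simply re-checks the capacity figures in the probe output.

```python
def tag_commands(device, values):
    """Build one 'camcontrol tags <device> -N <n>' argv list per value.

    Nothing is executed here; the caller decides whether to run them.
    """
    commands = []
    for n in values:
        if n < 1:
            raise ValueError("tag openings must be a positive integer")
        commands.append(["camcontrol", "tags", device, "-N", str(n)])
    return commands


def mb_from_sectors(sectors, sector_size=512):
    """Convert a sector count to whole megabytes (1 MB = 1024 * 1024 bytes)."""
    return sectors * sector_size // (1024 * 1024)


if __name__ == "__main__":
    # Values mentioned in the thread: 46 (suggested), 44 (first tested), 2 (minimum).
    for argv in tag_commands("da0", [46, 44, 2]):
        print(" ".join(argv))

    # Cross-check the sizes reported in the probe output for da0 and da1.
    print(mb_from_sectors(1048576000), "MB on da0")
    print(mb_from_sectors(495050752), "MB on da1")
```

Running each printed command as root on the test machine, then repeating the same tar/rsync load after each change, would give a comparable data point per tag-opening value.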