From owner-freebsd-scsi@FreeBSD.ORG Sun Oct 23 11:55:42 2011 Return-Path: Delivered-To: freebsd-scsi@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 859D2106564A for ; Sun, 23 Oct 2011 11:55:42 +0000 (UTC) (envelope-from prvs=1277a15f81=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id 1067B8FC12 for ; Sun, 23 Oct 2011 11:55:41 +0000 (UTC) X-MDAV-Processed: mail1.multiplay.co.uk, Sun, 23 Oct 2011 12:45:01 +0100 X-Spam-Processed: mail1.multiplay.co.uk, Sun, 23 Oct 2011 12:45:01 +0100 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on mail1.multiplay.co.uk X-Spam-Level: X-Spam-Status: No, score=-5.0 required=6.0 tests=USER_IN_WHITELIST shortcircuit=ham autolearn=disabled version=3.2.5 Received: from r2d2 ([188.220.16.49]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50015761800.msg; Sun, 23 Oct 2011 12:45:00 +0100 X-MDRemoteIP: 188.220.16.49 X-Return-Path: prvs=1277a15f81=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk Message-ID: <02B04E968B8648CC83F274B32090B937@multiplay.co.uk> From: "Steven Hartland" To: Date: Sun, 23 Oct 2011 12:44:52 +0100 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6109 Cc: Eygene Ryabinkin Subject: Looking for a committer for cam fixes / enhancements X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 23 Oct 2011 11:55:42 -0000 Hi guys I'm looking for someone who is willing to commit some changes to cam. 
First is a fix so long timeouts work correctly:-
http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/161901

It's been discussed on freebsd-stable and reviewed by Eygene Ryabinkin, cc'ed in case you're willing to commit it, Eygene?

The latest version included in the PR includes an addition to Eygene's original patch to add support in the new mps driver, but apart from that it is all the same.

We've been running this for quite some time now off 8.2-RELEASE and had zero issues.

Next up is an enhancement to camcontrol to add support for ATA secure erase functionality:-
http://www.freebsd.org/cgi/query-pr.cgi?pr=bin/159833

Again, it has been discussed on freebsd-stable with no issues raised.

We've been using this quite a bit recently to perform resets on SSDs in order to restore their performance, again with no issues detected.

It would be great if someone could commit these, as they are really useful and very low risk. It might be nice if we could see them in 9.0 :)

Regards
Steve

================================================
This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk.
From owner-freebsd-scsi@FreeBSD.ORG Mon Oct 24 11:07:11 2011 Return-Path: Delivered-To: freebsd-scsi@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 6C791106566B for ; Mon, 24 Oct 2011 11:07:11 +0000 (UTC) (envelope-from owner-bugmaster@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 517A98FC12 for ; Mon, 24 Oct 2011 11:07:11 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p9OB7BM1025412 for ; Mon, 24 Oct 2011 11:07:11 GMT (envelope-from owner-bugmaster@FreeBSD.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p9OB7A1C025410 for freebsd-scsi@FreeBSD.org; Mon, 24 Oct 2011 11:07:10 GMT (envelope-from owner-bugmaster@FreeBSD.org) Date: Mon, 24 Oct 2011 11:07:10 GMT Message-Id: <201110241107.p9OB7A1C025410@freefall.freebsd.org> X-Authentication-Warning: freefall.freebsd.org: gnats set sender to owner-bugmaster@FreeBSD.org using -f From: FreeBSD bugmaster To: freebsd-scsi@FreeBSD.org Cc: Subject: Current problem reports assigned to freebsd-scsi@FreeBSD.org X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 24 Oct 2011 11:07:11 -0000

Note: to view an individual PR, use:
  http://www.freebsd.org/cgi/query-pr.cgi?pr=(number).

The following is a listing of current problems submitted by FreeBSD users.
These represent problem reports covering all versions including
experimental development code and obsolete releases.

S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o kern/161809  scsi       [cam] [patch] set kern.cam.boot_delay via build option
o kern/159412  scsi       [ciss] 7.3 RELEASE: ciss0 ADAPTER HEARTBEAT FAILED err
o kern/157770  scsi       [iscsi] [panic] iscsi_initiator panic
o kern/154432  scsi       [xpt] run_interrupt_driven_hooks: still waiting after
o kern/153514  scsi       [cam] [panic] CAM related panic
o kern/153361  scsi       [ciss] Smart Array 5300 boot/detect drive problem
o kern/152250  scsi       [ciss] [patch] Kernel panic when hw.ciss.expose_hidden
o kern/151564  scsi       [ciss] ciss(4) should increase CISS_MAX_LOGICAL to 10
o docs/151336  scsi       Missing documentation of scsi_ and ata_ functions in c
s kern/149927  scsi       [cam] hard drive not stopped before removing power dur
o kern/148083  scsi       [aac] Strange device reporting
o kern/147704  scsi       [mpt] sys/dev/mpt: new chip revision, partially unsupp
o kern/146287  scsi       [ciss] ciss(4) cannot see more than one SmartArray con
o kern/145768  scsi       [mpt] can't perform I/O on SAS based SAN disk in freeb
o kern/144648  scsi       [aac] Strange values of speed and bus width in dmesg
o kern/144301  scsi       [ciss] [hang] HP proliant server locks when using ciss
o kern/142351  scsi       [mpt] LSILogic driver performance problems
o kern/141934  scsi       [cam] [patch] add support for SEAGATE DAT Scopion 130
o kern/134488  scsi       [mpt] MPT SCSI driver probes max. 8 LUNs per device
o kern/132250  scsi       [ciss] ciss driver does not support more then 15 drive
o kern/132206  scsi       [mpt] system panics on boot when mirroring and 2nd dri
o kern/130621  scsi       [mpt] tranfer rate is inscrutable slow when use lsi213
o kern/129602  scsi       [ahd] ahd(4) gets confused and wedges SCSI bus
o kern/128452  scsi       [sa] [panic] Accessing SCSI tape drive randomly crashe
o kern/128245  scsi       [scsi] "inquiry data fails comparison at DV1 step" [re
o kern/127927  scsi       [isp] isp(4) target driver crashes kernel when set up
o kern/127717  scsi       [ata] [patch] [request] - support write cache toggling
o kern/124667  scsi       [amd] [panic] FreeBSD-7 kernel page faults at amd-scsi
o kern/123674  scsi       [ahc] ahc driver dumping
o kern/123520  scsi       [ahd] unable to boot from net while using ahd
o sparc/121676 scsi       [iscsi] iscontrol do not connect iscsi-target on sparc
o kern/120487  scsi       [sg] scsi_sg incompatible with scanners
o kern/120247  scsi       [mpt] FreeBSD 6.3 and LSI Logic 1030 = only 3.300MB/s
o kern/114597  scsi       [sym] System hangs at SCSI bus reset with dual HBAs
o kern/110847  scsi       [ahd] Tyan U320 onboard problem with more than 3 disks
o kern/99954   scsi       [ahc] reading from DVD failes on 6.x [regression]
o kern/92798   scsi       [ahc] SCSI problem with timeouts
o kern/90282   scsi       [sym] SCSI bus resets cause loss of ch device
o kern/76178   scsi       [ahd] Problem with ahd and large SCSI Raid system
o kern/74627   scsi       [ahc] [hang] Adaptec 2940U2W Can't boot 5.3
s kern/61165   scsi       [panic] kernel page fault after calling cam_send_ccb
o kern/60641   scsi       [sym] Sporadic SCSI bus resets with 53C810 under load
o kern/60598   scsi       wire down of scsi devices conflicts with config
s kern/57398   scsi       [mly] Current fails to install on mly(4) based RAID di
o bin/57088    scsi       [cam] [patch] for a possible fd leak in libcam.c
o kern/52638   scsi       [panic] SCSI U320 on SMP server won't run faster than
o kern/44587   scsi       dev/dpt/dpt.h is missing defines required for DPT_HAND
o kern/39388   scsi       ncr/sym drivers fail with 53c810 and more than 256MB m
o kern/35234   scsi       World access to /dev/pass? (for scanner) requires acce

49 problems total.

From owner-freebsd-scsi@FreeBSD.ORG Mon Oct 24 17:38:09 2011 Return-Path: Delivered-To: freebsd-scsi@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D8831106564A for ; Mon, 24 Oct 2011 17:38:09 +0000 (UTC) (envelope-from ken@kdm.org) Received: from nargothrond.kdm.org (nargothrond.kdm.org [70.56.43.81]) by mx1.freebsd.org (Postfix) with ESMTP id 8A8EE8FC08 for ; Mon, 24 Oct 2011 17:38:09 +0000 (UTC) Received: from nargothrond.kdm.org (localhost [127.0.0.1]) by nargothrond.kdm.org (8.14.2/8.14.2) with ESMTP id p9OHAe5p044221; Mon, 24 Oct 2011 11:10:40 -0600 (MDT) (envelope-from ken@nargothrond.kdm.org) Received: (from ken@localhost) by nargothrond.kdm.org (8.14.2/8.14.2/Submit) id p9OHAdeb044205; Mon, 24 Oct 2011 11:10:39 -0600 (MDT) (envelope-from ken) Date: Mon, 24 Oct 2011 11:10:39 -0600 From: "Kenneth D. Merry" To: Steven Hartland Message-ID: <20111024171039.GA39194@nargothrond.kdm.org> References: <02B04E968B8648CC83F274B32090B937@multiplay.co.uk> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <02B04E968B8648CC83F274B32090B937@multiplay.co.uk> User-Agent: Mutt/1.4.2i Cc: freebsd-scsi@freebsd.org, Eygene Ryabinkin Subject: Re: Looking for a committer for cam fixes / enhancements X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 24 Oct 2011 17:38:09 -0000

On Sun, Oct 23, 2011 at 12:44:52 +0100, Steven Hartland wrote:
> Hi guys I'm looking for someone who is willing to commit
> some changes to cam.
>
> First is a fix so long timeouts work correctly:-
> http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/161901
>
> Its been discussed on the freebsd-stable and reviewed
> by Eygene Ryabinkin, cc'ed in case your willing to
> commit it Eygene?
>
> The latest version included in the pr includes an addition
> to Eygene's original patch to add support in the new mps
> driver but apart from that all the same.
>
> We've been running this for quite some time now off
> 8.2-RELEASE and had zero issues.

This looks good.

> Next up is an enhancement to camcontrol to enable to
> support ata secure erase functionality:-
> http://www.freebsd.org/cgi/query-pr.cgi?pr=bin/159833
>
> Again been discussed on freebsd-stable no issues raised.
>
> We've been using this quite a bit recently to perform
> resets on ssds in order to restore their performance,
> again no issues detected.
>
> Would be great if someone could commit these as they
> are really useful and with a very low risk might
> be nice if we could see them in 9.0 :)

Thanks for doing these! They seem like they would be very useful.
Hopefully, once we get trim support plumbed all the way down, it won't be necessary to issue the erase manually.

I do have a few comments:

 - The patches should be generated against head, since they would be
   committed there first and merged back. (They don't apply cleanly to
   head.)

 - There are a number of style issues in the patches:
    - Lines longer than 80 characters
    - Spaces/formatting problems (e.g. at the beginning of
      atasecurity_erase()).
    - The prevailing style of the file isn't followed for line
      continuations. (It isn't always KNF, either.)

 - I'm not really a fan of getopt_long. There are lots of password
   arguments, how about turning those into '-p foopasswd=bar' instead?

Anyway, if you could, please address those things and send me the diffs.

Thanks!

Ken -- Kenneth Merry ken@FreeBSD.org From owner-freebsd-scsi@FreeBSD.ORG Mon Oct 24 23:55:24 2011 Return-Path: Delivered-To: freebsd-scsi@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id DB21F106564A; Mon, 24 Oct 2011 23:55:24 +0000 (UTC) (envelope-from prvs=1278012240=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id 36C998FC13; Mon, 24 Oct 2011 23:55:23 +0000 (UTC) X-MDAV-Processed: mail1.multiplay.co.uk, Tue, 25 Oct 2011 00:45:11 +0100 X-Spam-Processed: mail1.multiplay.co.uk, Tue, 25 Oct 2011 00:45:11 +0100 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on mail1.multiplay.co.uk X-Spam-Level: X-Spam-Status: No, score=-5.0 required=6.0 tests=USER_IN_WHITELIST shortcircuit=ham autolearn=disabled version=3.2.5 Received: from r2d2 ([188.220.16.49]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50015801973.msg; Tue, 25 Oct 2011 00:45:10 +0100 X-MDRemoteIP: 188.220.16.49 X-Return-Path: prvs=1278012240=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk Message-ID: <7E5239236AA04319B4C090E5F199DC64@multiplay.co.uk> From: "Steven Hartland" To: "Kenneth D. 
Merry" References: <02B04E968B8648CC83F274B32090B937@multiplay.co.uk> <20111024171039.GA39194@nargothrond.kdm.org> Date: Tue, 25 Oct 2011 00:45:05 +0100 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6109 Cc: freebsd-scsi@freebsd.org, Eygene Ryabinkin Subject: Re: Looking for a committer for cam fixes / enhancements X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 24 Oct 2011 23:55:24 -0000 ----- Original Message ----- From: "Kenneth D. Merry" > Thanks for doing these! They seem like they would be very useful. > Hopefully, once we get trim support plumbed all the way down, it > won't be necessary to issue the erase manually. Indeed, would so love to see trim added to zfs :) > I do have a few comments: > > - The patches should be generated against head, since they would be > committed there first and merged back. (They don't apply cleanly to > head.) > > - There are a number of style issues in the patches: > - Lines longer than 80 characters > - Spaces/formatting problems (e.g. at the beginning of > atasecurity_erase()). > - The prevailing style of the file isn't followed for line > continuations. (It isn't always KNF, either.) > > - I'm not really a fan of getopt_long. There are lots of password > arguments, how about turning those into '-p foopasswd=bar' instead? > > Anyway, if you could, please address those things and send me the diffs. Thanks for the feedback. I will get them updated as per comments as soon as I can. 
Could you give me some pointers on the things you spotted that weren't "KNF"? I've tried to stick with what I saw as the current formatting but clearly missed some things ;-)

Not really a fan myself of mixing in getopt_long either, but it beats the current use of totally random / meaningless letters for options if we stuck with short opts. Add that to how dangerous the options are and I think forcing long opts is the right move. What do others think?

Regards
Steve

From owner-freebsd-scsi@FreeBSD.ORG Tue Oct 25 04:14:22 2011 Return-Path: Delivered-To: freebsd-scsi@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 08DED106567A for ; Tue, 25 Oct 2011 04:14:22 +0000 (UTC) (envelope-from rea@codelabs.ru) Received: from 0.mx.codelabs.ru (0.mx.codelabs.ru [144.206.177.45]) by mx1.freebsd.org (Postfix) with ESMTP id AD99A8FC0C for ; Tue, 25 Oct 2011 04:14:21 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=codelabs.ru; s=two; h=Sender:In-Reply-To:Content-Type:MIME-Version:References:Message-ID:Subject:Cc:To:From:Date; bh=zHU4cD8ZpNdA0v3ECZ6U3TJ81mJLlqKILJMyoKBNO7Q=; b=PVdXddSZ5iCY1euPwuXYNuhbBxYmZIPEFaftMztHXy2A07JaTZTDB9UjCQozSTnYvSPGbPgvB/hCYDsFRoelt/+MOiz3vSO8sOXUDI7hPBXqS65LsJ2WKXoBAQIi561GQ69FQXegbNgRK2JWSEtfb44ZLsle4T8HDehOayOvNnVnQLGtynr+CREZhe/WHTSh+9DeGqRoyGReAPBPTrYS854Xaq95WLkudku7aW4CPC0JrR8IXFZNi71+EuPO3LGMrGI7rhIEXG1mJu4JSog3/pYWCNIfxjeogqGlQ3G03ZGXsr/QB1WCzrnihmt4W2LARe/fJrubAytuqV5LX8OCqw==;
Received: from shadow.codelabs.ru (ppp91-77-182-181.pppoe.mtu-net.ru [91.77.182.181]) by 0.mx.codelabs.ru with esmtpsa (TLSv1:AES256-SHA:256) id 1RIY7v-000GxU-OO; Tue, 25 Oct 2011 07:56:36 +0400 Date: Tue, 25 Oct 2011 07:56:34 +0400 From: Eygene Ryabinkin To: Steven Hartland Message-ID: References: <02B04E968B8648CC83F274B32090B937@multiplay.co.uk> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="FFoLq8A0u+X9iRU8" Content-Disposition: inline In-Reply-To: <02B04E968B8648CC83F274B32090B937@multiplay.co.uk> Sender: rea@codelabs.ru Cc: freebsd-scsi@freebsd.org Subject: Re: Looking for a committer for cam fixes / enhancements X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 25 Oct 2011 04:14:22 -0000

--FFoLq8A0u+X9iRU8
Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable

Steven, good day.

Sun, Oct 23, 2011 at 12:44:52PM +0100, Steven Hartland wrote:
> Hi guys I'm looking for someone who is willing to commit
> some changes to cam.
>
> First is a fix so long timeouts work correctly:-
> http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/161901
>
> Its been discussed on the freebsd-stable and reviewed
> by Eygene Ryabinkin, cc'ed in case your willing to
> commit it Eygene?

If SCSI people will approve it, yes. But as per discussion with kib@ at that time of extending your patch, there should be some general-purpose routine to convert milliseconds to ticks, not the CAM-only stuff: some other drivers can benefit from having it in-place. Moreover, the routine should gracefully handle zero HZ as the result, probably, just adding one as tvtohz() does.

I'll try to come up with the revised patch, but not instantly: drowned with job at $WORK, so this will take some days.
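
The general-purpose conversion routine described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the name ms_to_ticks() and its signature are hypothetical and not an existing kernel interface, and hz is passed as a parameter so that the sketch builds in userland. It widens the multiplication to 64 bits so that long timeouts do not overflow, rounds up, and never returns fewer than one tick, which is one possible reading of "handle zero as the result"; tvtohz() itself is not used here.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/*
 * Hypothetical helper (name and signature are assumptions for this
 * sketch): convert a timeout in milliseconds to scheduler ticks.
 */
static int
ms_to_ticks(uint32_t ms, int hz)
{
	uint64_t ticks;

	assert(hz > 0);

	/* 64-bit intermediate: ms * hz cannot overflow here. */
	ticks = ((uint64_t)ms * (uint64_t)hz + 999) / 1000;

	/* Never hand back a zero-tick timeout. */
	if (ticks == 0)
		ticks = 1;

	/* Clamp to what an int tick count can hold. */
	if (ticks > (uint64_t)INT_MAX)
		ticks = INT_MAX;

	return ((int)ticks);
}
```

With hz = 1000, a 3000000 ms timeout comes out as 3000000 ticks, whereas a plain 32-bit signed ms * hz product would already have overflowed.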
-- 
Eygene Ryabinkin ,,,^..^,,,
[ Life's unfair - but root password helps! | codelabs.ru ]
[ 82FE 06BC D497 C0DE 49EC 4FF0 16AF 9EAE 8152 ECFB | freebsd.org ]

--FFoLq8A0u+X9iRU8
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.18 (FreeBSD)

iF4EAREIAAYFAk6mM3IACgkQFq+eroFS7Pv4/QEAiDHH+AsRL0sGBa5dMFtyFDIm
uZjy1Ie/z8XwxjAh/osA/3vbqFd1rCMSMtVKP0SjUEe/JVRPYG9gBjrhgHghzu2/
=ySoC
-----END PGP SIGNATURE-----

--FFoLq8A0u+X9iRU8--

From owner-freebsd-scsi@FreeBSD.ORG Tue Oct 25 06:46:29 2011 Return-Path: Delivered-To: freebsd-scsi@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8FF54106564A for ; Tue, 25 Oct 2011 06:46:29 +0000 (UTC) (envelope-from Karli.Sjoberg@slu.se) Received: from edge1-1.slu.se (edge1-1.slu.se [193.10.100.96]) by mx1.freebsd.org (Postfix) with ESMTP id 141B48FC16 for ; Tue, 25 Oct 2011 06:46:28 +0000 (UTC) Received: from Exchange1.ad.slu.se (193.10.100.94) by edge1-1.slu.se (193.10.100.96) with Microsoft SMTP Server (TLS) id 8.3.213.0; Tue, 25 Oct 2011 08:46:25 +0200 Received: from exmbx3.ad.slu.se ([193.10.100.93]) by Exchange1.ad.slu.se ([193.10.100.94]) with mapi; Tue, 25 Oct 2011 08:46:25 +0200 From: Karli Sjöberg To: Martin Nilsson Date: Tue, 25 Oct 2011 08:46:24 +0200 Thread-Topic: AOC-USAS2-L8i zfs panics and SCSI errors in messages Thread-Index: AcyS4dBiARP0DQQySN6oxW4sHKcYTw== Message-ID: <052E3A8A-898A-4D78-8327-45EB193F81C6@slu.se> References: <82B38DBF-DD3A-46CD-93F6-02CDB6506E05@slu.se> <4EA0526E.4050301@mullet.se> In-Reply-To: <4EA0526E.4050301@mullet.se> Accept-Language: sv-SE, en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: acceptlanguage: sv-SE, en-US MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Cc: "freebsd-scsi@freebsd.org" Subject: Re: AOC-USAS2-L8i zfs panics and
SCSI errors in messages X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 25 Oct 2011 06:46:29 -0000

I am using version 10.00.02.00 IT Firmware.

Update:
I have now experienced the exact same panic four times. It is the exact same panic every time, except that it happens on different CPUs. It is always at exactly 03:01, when the daily periodic run starts. I have tried shuffling over the same filesystem from the Oracle machine every time, and it has always had time to finish properly. Last time it finished and was idle for about 6 hours and was working fine. I checked in at about 22:00 and looked at zpool status; it was clean. Restarting the machine and running periodic daily manually works. If I don't send any filesystems over, the machine is stable over the nights, but once I've sent something over, at 03:01, it panics.

I am going to try shuffling over another filesystem to see if there's anything in that specific filesystem that causes the crash, or if it happens regardless of which filesystem has been received.

/Karli Sjöberg

On 20 Oct 2011, at 18:55, Martin Nilsson wrote:

On 2011-10-20 13:28, Karli Sjöberg wrote:
2x Supermicro AOC-USAS2-L8i

How old firmware do you have in these LSI2008 cards? Latest at LSI:s web is Phase11 and there are phase 9 & 10 on the Supermicro ftp site.

These boards should be the same as a LSI 9211-8i except that they have components and the brackets on the wrong side.
/Martin

-- 
Martin Nilsson, CEO, Mullet Scandinavia AB, Malmö, SWEDEN
E-mail: martin@mullet.se, Phone: +46-(0)708-59 99 91, Web: www.mullet.se
Our business is well engineered servers optimised for FreeBSD & Linux

Best regards
-------------------------------------------------------------------------------
Karli Sjöberg
Swedish University of Agricultural Sciences
Box 7079 (Visiting Address Kronåsvägen 8)
S-750 07 Uppsala, Sweden
Phone: +46-(0)18-67 15 66
karli.sjoberg@slu.se

From owner-freebsd-scsi@FreeBSD.ORG Tue Oct 25 17:31:49 2011 Return-Path: Delivered-To: freebsd-scsi@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 643091065670; Tue, 25 Oct 2011 17:31:49 +0000 (UTC) (envelope-from ken@kdm.org) Received: from nargothrond.kdm.org (nargothrond.kdm.org [70.56.43.81]) by mx1.freebsd.org (Postfix) with ESMTP id 24EA18FC0A; Tue, 25 Oct 2011 17:31:48 +0000 (UTC) Received: from nargothrond.kdm.org (localhost [127.0.0.1]) by nargothrond.kdm.org (8.14.2/8.14.2) with ESMTP id p9PHVmuA008070; Tue, 25 Oct 2011 11:31:48 -0600 (MDT) (envelope-from ken@nargothrond.kdm.org) Received: (from ken@localhost) by nargothrond.kdm.org (8.14.2/8.14.2/Submit) id p9PHVmKE008069; Tue, 25 Oct 2011 11:31:48 -0600 (MDT) (envelope-from ken) Date: Tue, 25 Oct 2011 11:31:48 -0600 From: "Kenneth D.
Merry" To: Steven Hartland Message-ID: <20111025173148.GA93047@nargothrond.kdm.org> References: <02B04E968B8648CC83F274B32090B937@multiplay.co.uk> <20111024171039.GA39194@nargothrond.kdm.org> <7E5239236AA04319B4C090E5F199DC64@multiplay.co.uk> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <7E5239236AA04319B4C090E5F199DC64@multiplay.co.uk> User-Agent: Mutt/1.4.2i Cc: freebsd-scsi@freebsd.org, Eygene Ryabinkin Subject: Re: Looking for a committer for cam fixes / enhancements X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 25 Oct 2011 17:31:49 -0000 On Tue, Oct 25, 2011 at 00:45:05 +0100, Steven Hartland wrote: > > ----- Original Message ----- > From: "Kenneth D. Merry" > > >Thanks for doing these! They seem like they would be very useful. > >Hopefully, once we get trim support plumbed all the way down, it > >won't be necessary to issue the erase manually. > > Indeed, would so love to see trim added to zfs :) > > >I do have a few comments: > > > >- The patches should be generated against head, since they would be > > committed there first and merged back. (They don't apply cleanly to > > head.) > > > >- There are a number of style issues in the patches: > >- Lines longer than 80 characters > >- Spaces/formatting problems (e.g. at the beginning of > > atasecurity_erase()). > >- The prevailing style of the file isn't followed for line > > continuations. (It isn't always KNF, either.) > > > >- I'm not really a fan of getopt_long. There are lots of password > > arguments, how about turning those into '-p foopasswd=bar' instead? > > > >Anyway, if you could, please address those things and send me the diffs. > > Thanks for the feedback. I will get them updated as per comments as > soon as I can. 
>
> Could you give me some points on the things you spotted weren't "KNR"
> I've tried to stick with what I saw as the current formatting but clearly
> missed something's ;-)

To match the rest of the file this:

static int
atasecurity_set_password(struct cam_device *device, union ccb *ccb, int retry_count,
    u_int32_t timeout, struct ata_security_password *pwd, int quiet)
{

Should look like this:

static int
atasecurity_set_password(struct cam_device *device, union ccb *ccb,
    int retry_count, u_int32_t timeout,
    struct ata_security_password *pwd, int quiet)
{

And this:

cam_fill_ataio(&ccb->ataio,
    retry_count,
    NULL,
    /*flags*/CAM_DIR_OUT,
    MSG_SIMPLE_Q_TAG,
    /*data_ptr*/(u_int8_t *)pwd,
    /*dxfer_len*/sizeof(struct ata_security_password),
    timeout);

Should look like this:

cam_fill_ataio(&ccb->ataio,
    retry_count,
    NULL,
    /*flags*/CAM_DIR_OUT,
    MSG_SIMPLE_Q_TAG,
    /*data_ptr*/(u_int8_t *)pwd,
    /*dxfer_len*/sizeof(struct ata_security_password),
    timeout);

Nothing should exceed 80 columns. Make sure the tab stop in your editor is set to 8 when editing code. That helps uncover things like this (in atasecurity()):

while ((c = getopt_long(argc, argv, combinedopt, combinedoptlong, NULL)) != -1) {
    switch(c){
    case 'f':
        action = ATA_SECURITY_ACTION_FREEZE;
        actions++;
        break;

    case 'r':

With a tabstop set to 4, that particular space/tab problem "hides".

> Not really a fan myself of mixing in getopt_long either but if beats the
> current use of totally random / meaningless letters for options if we
> stuck with short opts. Add that to how dangerous the options are and I
> think forcing long opts is the right move, what do others think?

I'm not suggesting short options, but a slightly different approach.
Instead of --option, or -r, use -o foopassword=blah.
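
The "-o foopassword=blah" approach could be handled by splitting each -o argument on '=' and looking the name up in a small table. A minimal sketch follows, assuming hypothetical option names (userpassword, masterpassword) and a hypothetical helper parse_keyval(); none of these names are taken from the patch in bin/159833.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical option names, for illustration only. */
enum sec_key { KEY_USER_PASSWORD, KEY_MASTER_PASSWORD, KEY_UNKNOWN };

static const struct {
	const char	*name;
	enum sec_key	 key;
} sec_keys[] = {
	{ "userpassword",	KEY_USER_PASSWORD },
	{ "masterpassword",	KEY_MASTER_PASSWORD },
};

/*
 * Split "name=value" in place.  On a recognised name, return its key
 * and point *valp at the value.  Return KEY_UNKNOWN and leave *valp
 * NULL if there is no '=', the name is empty, or it is not in the table.
 */
static enum sec_key
parse_keyval(char *arg, char **valp)
{
	char *eq;
	size_t i;

	assert(arg != NULL && valp != NULL);

	*valp = NULL;
	eq = strchr(arg, '=');
	if (eq == NULL || eq == arg)
		return (KEY_UNKNOWN);
	*eq = '\0';

	for (i = 0; i < sizeof(sec_keys) / sizeof(sec_keys[0]); i++) {
		if (strcmp(arg, sec_keys[i].name) == 0) {
			*valp = eq + 1;
			return (sec_keys[i].key);
		}
	}
	return (KEY_UNKNOWN);
}
```

In a getopt(3) loop this would be called from the case 'o': branch with optarg as the first argument, so no long options are needed.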

It might be interesting to do this:

cd src
find bin sbin usr.sbin usr.bin -name "*.c" -print |xargs grep getopt_long

The list of programs that use getopt_long() is very short, and almost all of them are either GNU programs, or maintaining compatibility with GNU programs (e.g. tar, cpio)

Ken
-- 
Kenneth Merry
ken@FreeBSD.ORG

From owner-freebsd-scsi@FreeBSD.ORG Tue Oct 25 18:43:05 2011 Return-Path: Delivered-To: freebsd-scsi@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C0CF9106564A for ; Tue, 25 Oct 2011 18:43:05 +0000 (UTC) (envelope-from dgilbert@interlog.com) Received: from smtp.infotech.no (smtp.infotech.no [82.134.31.41]) by mx1.freebsd.org (Postfix) with ESMTP id 6A2B08FC14 for ; Tue, 25 Oct 2011 18:43:05 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by smtp.infotech.no (Postfix) with ESMTP id 7956B2041B9 for ; Tue, 25 Oct 2011 20:26:50 +0200 (CEST) X-Virus-Scanned: by amavisd-new-2.6.6 (20110518) (Debian) at infotech.no Received: from smtp.infotech.no ([127.0.0.1]) by localhost (smtp.infotech.no [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id rkHvK3qohyAG for ; Tue, 25 Oct 2011 20:26:48 +0200 (CEST) Received: from [192.168.48.66] (ip-191.55.99.216.dsl-cust.ca.inter.net [216.99.55.191]) by smtp.infotech.no (Postfix) with ESMTPA id 5E1782041B6 for ; Tue, 25 Oct 2011 20:26:48 +0200 (CEST) Message-ID: <4EA6FF66.1000508@interlog.com> Date: Tue, 25 Oct 2011 14:26:46 -0400 From: Douglas Gilbert User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.23) Gecko/20110922 Thunderbird/3.1.15 MIME-Version: 1.0 To: freebsd-scsi@freebsd.org References: <02B04E968B8648CC83F274B32090B937@multiplay.co.uk> <20111024171039.GA39194@nargothrond.kdm.org> <7E5239236AA04319B4C090E5F199DC64@multiplay.co.uk> <20111025173148.GA93047@nargothrond.kdm.org> In-Reply-To: <20111025173148.GA93047@nargothrond.kdm.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit Subject: Re: Looking for a committer for cam fixes / enhancements X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: dgilbert@interlog.com List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 25 Oct 2011 18:43:05 -0000 On 11-10-25 01:31 PM, Kenneth D. Merry wrote: > On Tue, Oct 25, 2011 at 00:45:05 +0100, Steven Hartland wrote: >> >> ----- Original Message ----- >> From: "Kenneth D. Merry" >> >>> Thanks for doing these! They seem like they would be very useful. >>> Hopefully, once we get trim support plumbed all the way down, it >>> won't be necessary to issue the erase manually. >> >> Indeed, would so love to see trim added to zfs :) >> >>> I do have a few comments: >>> >>> - The patches should be generated against head, since they would be >>> committed there first and merged back. (They don't apply cleanly to >>> head.) >>> >>> - There are a number of style issues in the patches: >>> - Lines longer than 80 characters >>> - Spaces/formatting problems (e.g. at the beginning of >>> atasecurity_erase()). >>> - The prevailing style of the file isn't followed for line >>> continuations. (It isn't always KNF, either.) >>> >>> - I'm not really a fan of getopt_long. There are lots of password >>> arguments, how about turning those into '-p foopasswd=bar' instead? >>> >>> Anyway, if you could, please address those things and send me the diffs. >> >> Thanks for the feedback. I will get them updated as per comments as >> soon as I can. 
>> >> Could you give me some points on the things you spotted weren't "KNR" >> I've tried to stick with what I saw as the current formatting but clearly >> missed something's ;-) > > To match the rest of the file this: > > static int > atasecurity_set_password(struct cam_device *device, union ccb *ccb, int retry_count, > u_int32_t timeout, struct ata_security_password *pwd, int quiet) > { > > Should look like this: > static int > atasecurity_set_password(struct cam_device *device, union ccb *ccb, > int retry_count, u_int32_t timeout, > struct ata_security_password *pwd, int quiet) > { > > And this: > > cam_fill_ataio(&ccb->ataio, > retry_count, > NULL, > /*flags*/CAM_DIR_OUT, > MSG_SIMPLE_Q_TAG, > /*data_ptr*/(u_int8_t *)pwd, > /*dxfer_len*/sizeof(struct ata_security_password), > timeout); > > Shoud look like this: > > cam_fill_ataio(&ccb->ataio, > retry_count, > NULL, > /*flags*/CAM_DIR_OUT, > MSG_SIMPLE_Q_TAG, > /*data_ptr*/(u_int8_t *)pwd, > /*dxfer_len*/sizeof(struct ata_security_password), > timeout); > > > Everything that exceeds 80 columns should not exceed that. Make sure the > tab stop in your editor is set to 8 when editing code. That helps uncover > things like this (in atasecurity()): > > while ((c = getopt_long(argc, argv, combinedopt, combinedoptlong, NULL)) != -1) { > switch(c){ > case 'f': > action = ATA_SECURITY_ACTION_FREEZE; > actions++; > break; > > case 'r': > > With a tabstop set to 4, that particular space/tab problem "hides". > >> Not really a fan myself of mixing in getopt_long either but if beats the >> current use of totally random / meaningless letters for options if we >> stuck with short opts. Add that to how dangerous the options are and I >> think forcing long opts is the right move, what do others think? > > I'm not suggesting short options, but a slightly different approach. > Instead of --option, or -r, use -o foopassword=blah. 
>
> It might be interesting to do this:
>
> cd src
> find bin sbin usr.sbin usr.bin -name "*.c" -print |xargs grep getopt_long
>
> The list of programs that use getopt_long() is very short, and almost all
> of them are either GNU programs, or maintaining compatibility with GNU
> programs (e.g. tar, cpio)

I can think of about 60 utilities that I have written, all open source
and running on FreeBSD, that use getopt_long(). IMO the
"-o foopassword=blah" style options don't scale well (and remind me
of misplaced dd style "operands" (e.g. 'bs=512 of=/dev/null')).

About the only OS that gave me problems with getopt_long() was
Tru64 and it is becoming a distant memory.

Doug Gilbert

Here is an example:

$ sg_format --help
usage: sg_format [--cmplst=0|1] [--count=COUNT] [--dcrt] [--early]
                 [--fmtpinfo=FPI] [--format] [--help] [--long] [--pfu=PFU]
                 [--pie=PIE] [--pinfo] [--resize] [--rto_req] [--security]
                 [--six] [--size=SIZE] [--verbose] [--version] [--wait]
                 DEVICE
  where:
    --cmplst=0|1
      -C 0|1        sets CMPLST bit in format cdb (default: 1)
    --count=COUNT|-c COUNT    number of blocks to report after format or
                              resize. With format defaults to same as current
    --dcrt|-D       disable certification (doesn't verify media)
    --early|-e      exit once format started (user can monitor progress)
    --fmtpinfo=FPI|-f FPI    FMTPINFO field value (default: 0)
    --format|-F     format unit (default: report current count and size)
    --help|-h       prints out this usage message
    --long|-l       allow for 64 bit lbas (default: assume 32 bit lbas)
    --pfu=PFU|-P PFU    Protection Field Usage value (default: 0)
    --pie=PIE|-q PIE    Protection Information Exponent (default: 0)
    --pinfo|-p      set upper bit of FMTPINFO field
                    (deprecated use '--fmtpinfo=FPI' instead)
    --resize|-r     resize (rather than format) to COUNT value
    --rto_req|-R    set lower bit of FMTPINFO field
                    (deprecated use '--fmtpinfo=FPI' instead)
    --security|-S   set security initialization (SI) bit
    --six|-6        use 6 byte MODE SENSE/SELECT to probe device
                    (def: use 10 byte MODE SENSE/SELECT)
    --size=SIZE|-s SIZE    bytes per block, defaults to DEVICE's current
                           block size. Only needed to change current block
                           size
    --verbose|-v    increase verbosity
    --version|-V    print version details and exit
    --wait|-w       format command waits until format operation completes
                    (default: set IMMED=1 and poll with Test Unit Ready)

    Example: sg_format --format /dev/sdc

This utility formats or resizes SCSI disks.
WARNING: This utility will destroy all the data on DEVICE when
         '--format' is given. Check that you have the correct DEVICE.
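
For comparison with the getopt_long() interface in the help text above, the '-p foopasswd=bar' form suggested earlier in the thread comes down to splitting one operand at the first '='. The following is only a sketch in C under that assumption; split_keyval() is a hypothetical helper and is not part of camcontrol or of the patch in bin/159833:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Split one "key=value" operand, as in the proposed "-p foopasswd=bar".
 * Hypothetical helper for illustration only; not taken from camcontrol.
 * Returns 0 on success, -1 if '=' is missing, the key is empty, or
 * either part does not fit its buffer (including the terminating NUL).
 */
int
split_keyval(const char *arg, char *key, size_t keylen, char *val,
    size_t vallen)
{
	const char *eq;
	size_t klen, vlen;

	assert(arg != NULL && key != NULL && val != NULL);

	eq = strchr(arg, '=');
	if (eq == NULL || eq == arg)
		return (-1);

	klen = (size_t)(eq - arg);
	vlen = strlen(eq + 1);
	if (klen >= keylen || vlen >= vallen)
		return (-1);

	memcpy(key, arg, klen);
	key[klen] = '\0';
	memcpy(val, eq + 1, vlen + 1);	/* copies the terminating NUL too */
	return (0);
}
```

A caller would typically pass optarg from a plain getopt() loop and then compare the key against the password names it knows about, rejecting anything else.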
From owner-freebsd-scsi@FreeBSD.ORG Tue Oct 25 19:33:04 2011 Return-Path: Delivered-To: freebsd-scsi@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7454B106564A for ; Tue, 25 Oct 2011 19:33:04 +0000 (UTC) (envelope-from ken@kdm.org) Received: from nargothrond.kdm.org (nargothrond.kdm.org [70.56.43.81]) by mx1.freebsd.org (Postfix) with ESMTP id 3B0738FC08 for ; Tue, 25 Oct 2011 19:33:03 +0000 (UTC) Received: from nargothrond.kdm.org (localhost [127.0.0.1]) by nargothrond.kdm.org (8.14.2/8.14.2) with ESMTP id p9PJX3LE037869; Tue, 25 Oct 2011 13:33:03 -0600 (MDT) (envelope-from ken@nargothrond.kdm.org) Received: (from ken@localhost) by nargothrond.kdm.org (8.14.2/8.14.2/Submit) id p9PJX2s9037868; Tue, 25 Oct 2011 13:33:02 -0600 (MDT) (envelope-from ken) Date: Tue, 25 Oct 2011 13:33:02 -0600 From: "Kenneth D. Merry" To: Karli Sjöberg Message-ID: <20111025193302.GA30409@nargothrond.kdm.org> References: <82B38DBF-DD3A-46CD-93F6-02CDB6506E05@slu.se> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <82B38DBF-DD3A-46CD-93F6-02CDB6506E05@slu.se> User-Agent: Mutt/1.4.2i Cc: "freebsd-scsi@freebsd.org" , fs@freebsd.org Subject: Re: AOC-USAS2-L8i zfs panics and SCSI errors in messages X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 25 Oct 2011 19:33:04 -0000 On Thu, Oct 20, 2011 at 13:28:17 +0200, Karli Sjöberg wrote: > Hi, > > I'm in the process of vacating a Sun/Oracle system to another Supermicro/FreeBSD system, doing zfs send/recv between.
Two times now, the system has panicked while not doing anything at all, and it's throwing a lot of SCSI/CAM-related errors while doing IO-intensive operations, like send/recv, resilver, and zpool has sometimes reported read/write errors on the hard drives. Best part is that the errors in messages are about all hard drives at one time or another, and they are connected with separate cables, controllers and caddies. Specs:
>
> HW:
> 1x Supermicro X8SIL-F
> 2x Supermicro AOC-USAS2-L8i
> 2x Supermicro CSE-M35T-1B
> 1x Intel Core i5 650 3,2GHz
> 4x 2GB 1333MHZ DDR3 ECC UDIMM
> 10x SAMSUNG HD204UI (in a raidz2 zpool)
> 1x OCZ Vertex 3 240GB (L2ARC)
>
> SW:
> # uname -a
> FreeBSD server 8.2-STABLE FreeBSD 8.2-STABLE #0: Mon Oct 10 09:12:25 UTC 2011 root@server:/usr/obj/usr/src/sys/GENERIC amd64
> # zpool get version pool1
> NAME PROPERTY VALUE SOURCE
> pool1 version 28 default
>
> I got the panic from the IPMI KVM:
> http://i55.tinypic.com/synpzk.png

In looking at the panic, this is a ZFS panic. Nothing the disks do should be
able to cause ZFS to panic. ZFS is panicking in avl_add():

	/*
	 * This is unfortunate. We want to call panic() here, even for
	 * non-DEBUG kernels. In userland, however, we can't depend on anything
	 * in libc or else the rtld build process gets confused. So, all we can
	 * do in userland is resort to a normal ASSERT().
	 */
	if (avl_find(tree, new_node, &where) != NULL)
#ifdef _KERNEL
		panic("avl_find() succeeded inside avl_add()");
#else
		ASSERT(0);
#endif

There are certainly timeouts and two terminated IOCs in the log below. That
does suggest a hardware or driver problem, but it isn't very obvious what
it might be.

I have seen bad behavior with SATA drives behind 3Gb Maxim expanders
talking to 6Gb LSI controllers, but your particular configuration does not
involve any expanders, and therefore is not that particular STP issue.

My best guess, and it is a guess, is that either the drives are misbehaving (i.e.
firmware type problem) or you've got a cabling issue. If you have more hardware available, you might try swapping out the cables and/or drives to see if you can reproduce the drive errors with a different setup. If you swap the drives, I would use a different brand if you've got them available. I'm CCing the fs list, perhaps someone there can look at the stack trace above and figure out what ZFS might be doing. Again, ZFS should survive any errors from the drives, and the panic above looks like ZFS is flagging a logic bug somewhere. > > And an extract from /var/log/messages: > Oct 19 17:37:19 fs2-7 kernel: (da6:mps1:0:0:0): WRITE(10). CDB: 2a 0 6 13 66 f 0 0 f 0 > Oct 19 17:37:19 fs2-7 kernel: (da6:mps1:0:0:0): CAM status: SCSI Status Error > Oct 19 17:37:19 fs2-7 kernel: (da6:mps1:0:0:0): SCSI status: Check Condition > Oct 19 17:37:19 fs2-7 kernel: (da6:mps1:0:0:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred) > Oct 19 17:37:19 fs2-7 kernel: (da6:mps1:0:0:0): WRITE(6). 
CDB: a 0 1 b2 2 0 > Oct 19 17:37:19 fs2-7 kernel: (da6:mps1:0:0:0): CAM status: SCSI Status Error > Oct 19 17:37:19 fs2-7 kernel: (da6:mps1:0:0:0): SCSI status: Check Condition > Oct 19 17:37:19 fs2-7 kernel: (da6:mps1:0:0:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred) > Oct 19 17:40:38 fs2-7 kernel: (da9:mps1:0:4:0): SCSI command timeout on device handle 0x000c SMID 859 > Oct 19 17:40:38 fs2-7 kernel: (da9:mps1:0:4:0): SCSI command timeout on device handle 0x000c SMID 495 > Oct 19 17:40:38 fs2-7 kernel: (da9:mps1:0:4:0): SCSI command timeout on device handle 0x000c SMID 725 > Oct 19 17:40:38 fs2-7 kernel: (da9:mps1:0:4:0): SCSI command timeout on device handle 0x000c SMID 722 > Oct 19 17:40:38 fs2-7 kernel: (da9:mps1:0:4:0): SCSI command timeout on device handle 0x000c SMID 438 > Oct 19 17:40:38 fs2-7 kernel: mps1: (1:4:0) terminated ioc 804b scsi 0 state c xfer 0 > Oct 19 17:40:38 fs2-7 last message repeated 3 times > Oct 19 17:40:38 fs2-7 kernel: mps1: mpssas_abort_complete: abort request on handle 0x0c SMID 859 complete > Oct 19 17:40:38 fs2-7 kernel: mps1: mpssas_complete_tm_request: sending deferred task management request for handle 0x0c SMID 495 > Oct 19 17:40:38 fs2-7 kernel: mps1: mpssas_abort_complete: abort request on handle 0x0c SMID 495 complete > Oct 19 17:40:38 fs2-7 kernel: mps1: mpssas_complete_tm_request: sending deferred task management request for handle 0x0c SMID 725 > Oct 19 17:40:38 fs2-7 kernel: mps1: mpssas_abort_complete: abort request on handle 0x0c SMID 725 complete > Oct 19 17:40:38 fs2-7 kernel: mps1: mpssas_complete_tm_request: sending deferred task management request for handle 0x0c SMID 722 > Oct 19 17:40:38 fs2-7 kernel: mps1: mpssas_abort_complete: abort request on handle 0x0c SMID 722 complete > Oct 19 17:40:38 fs2-7 kernel: mps1: mpssas_complete_tm_request: sending deferred task management request for handle 0x0c SMID 438 > Oct 19 17:40:38 fs2-7 kernel: mps1: mpssas_abort_complete: abort 
request on handle 0x0c SMID 438 complete > Oct 19 17:40:38 fs2-7 kernel: (da9:mps1:0:4:0): WRITE(10). CDB: 2a 0 6 25 4f 75 0 0 b 0 > Oct 19 17:40:38 fs2-7 kernel: (da9:mps1:0:4:0): CAM status: SCSI Status Error > Oct 19 17:40:38 fs2-7 kernel: (da9:mps1:0:4:0): SCSI status: Check Condition > Oct 19 17:40:38 fs2-7 kernel: (da9:mps1:0:4:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred) > Oct 19 17:40:38 fs2-7 kernel: (da9:mps1:0:4:0): WRITE(10). CDB: 2a 0 2d a5 10 ca 0 0 80 0 > Oct 19 17:40:38 fs2-7 kernel: (da9:mps1:0:4:0): CAM status: SCSI Status Error > Oct 19 17:40:38 fs2-7 kernel: (da9:mps1:0:4:0): SCSI status: Check Condition > Oct 19 17:40:38 fs2-7 kernel: (da9:mps1:0:4:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred) > Oct 19 17:45:40 fs2-7 kernel: (da1:mps0:0:1:0): SCSI command timeout on device handle 0x000a SMID 976 > Oct 19 17:45:41 fs2-7 kernel: (da1:mps0:0:1:0): SCSI command timeout on device handle 0x000a SMID 636 > Oct 19 17:45:41 fs2-7 kernel: (da1:mps0:0:1:0): SCSI command timeout on device handle 0x000a SMID 888 > Oct 19 17:45:41 fs2-7 kernel: (da1:mps0:0:1:0): SCSI command timeout on device handle 0x000a SMID 983 > Oct 19 17:45:41 fs2-7 kernel: mps0: (0:1:0) terminated ioc 804b scsi 0 state c xfer 0 > Oct 19 17:45:41 fs2-7 last message repeated 2 times > Oct 19 17:45:41 fs2-7 kernel: mps0: mpssas_abort_complete: abort request on handle 0x0a SMID 976 complete > Oct 19 17:45:41 fs2-7 kernel: mps0: mpssas_complete_tm_request: sending deferred task management request for handle 0x0a SMID 636 > Oct 19 17:45:41 fs2-7 kernel: mps0: mpssas_abort_complete: abort request on handle 0x0a SMID 636 complete > Oct 19 17:45:41 fs2-7 kernel: mps0: mpssas_complete_tm_request: sending deferred task management request for handle 0x0a SMID 888 > Oct 19 17:45:41 fs2-7 kernel: mps0: mpssas_abort_complete: abort request on handle 0x0a SMID 888 complete > Oct 19 17:45:41 fs2-7 kernel: mps0: 
mpssas_complete_tm_request: sending deferred task management request for handle 0x0a SMID 983
> Oct 19 17:45:41 fs2-7 kernel: mps0: mpssas_abort_complete: abort request on handle 0x0a SMID 983 complete
> Oct 19 17:45:41 fs2-7 kernel: (da1:mps0:0:1:0): WRITE(10). CDB: 2a 0 6 40 a7 2 0 0 3 0
> Oct 19 17:45:41 fs2-7 kernel: (da1:mps0:0:1:0): CAM status: SCSI Status Error
> Oct 19 17:45:41 fs2-7 kernel: (da1:mps0:0:1:0): SCSI status: Check Condition
> Oct 19 17:45:41 fs2-7 kernel: (da1:mps0:0:1:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
> Oct 19 17:45:42 fs2-7 kernel: (da1:mps0:0:1:0): WRITE(10). CDB: 2a 0 6 40 b0 9 0 0 9 0
> Oct 19 17:45:42 fs2-7 kernel: (da1:mps0:0:1:0): CAM status: SCSI Status Error
> Oct 19 17:45:42 fs2-7 kernel: (da1:mps0:0:1:0): SCSI status: Check Condition
> Oct 19 17:45:42 fs2-7 kernel: (da1:mps0:0:1:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
>
> What's going on?
>
> Regards
> Karli Sjöberg

Ken
--
Kenneth Merry
ken@FreeBSD.ORG

From owner-freebsd-scsi@FreeBSD.ORG Wed Oct 26 09:37:20 2011 Return-Path: Delivered-To: freebsd-scsi@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 86DF5106566B for ; Wed, 26 Oct 2011 09:37:20 +0000 (UTC) (envelope-from Karli.Sjoberg@slu.se) Received: from edge1-1.slu.se (edge1-1.slu.se [193.10.100.96]) by mx1.freebsd.org (Postfix) with ESMTP id B2B768FC13 for ; Wed, 26 Oct 2011 09:37:19 +0000 (UTC) Received: from Exchange2.ad.slu.se (193.10.100.95) by edge1-1.slu.se (193.10.100.96) with Microsoft SMTP Server (TLS) id 8.3.213.0; Wed, 26 Oct 2011 11:36:46 +0200 Received: from exmbx3.ad.slu.se ([193.10.100.93]) by Exchange2.ad.slu.se
([193.10.100.95]) with mapi; Wed, 26 Oct 2011 11:36:46 +0200 From: Karli Sjöberg To: "Kenneth D. Merry" Date: Wed, 26 Oct 2011 11:36:44 +0200 Thread-Topic: AOC-USAS2-L8i zfs panics and SCSI errors in messages Thread-Index: AcyTwsbQRZ7Pq3e1RYWivOE8XyGqgQ== Message-ID: References: <82B38DBF-DD3A-46CD-93F6-02CDB6506E05@slu.se> <20111025193302.GA30409@nargothrond.kdm.org> In-Reply-To: <20111025193302.GA30409@nargothrond.kdm.org> Accept-Language: sv-SE, en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: acceptlanguage: sv-SE, en-US MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Cc: "freebsd-scsi@freebsd.org" , "fs@freebsd.org" Subject: Re: AOC-USAS2-L8i zfs panics and SCSI errors in messages X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 26 Oct 2011 09:37:20 -0000

Hi all,

I tracked down what causes the panics!

I got a tip from aragon and phoenix at the forum about
/etc/periodic/security/100.chksetuid

And to put:
daily_status_security_chksetuid_enable="NO"
into /etc/periodic.conf

I can now run periodic daily without any panics. I'm still wondering about the cause of this, the explanation from the forum was that that phase is too demanding for multi TB systems. But I have several multi TB servers with FreeBSD and ZFS, and none of them has ever behaved this way. Besides, the panic is instantaneous, not degenerative. I imagine that a run like that would start out OK and then just get worse and worse, getting gradually slower and slower until it just wouldn't cope any more and hang. This feels more like hitting a wall. As if it found something that it couldn't deal with and has no choice but to panic immediately.
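
The "hitting a wall" behaviour is at least consistent with the avl_add() check Ken quoted earlier in the thread: inserting a node whose key is already in the tree is treated as a logic error and, in the kernel, panics on the spot with no gradual slowdown first. A minimal sketch of that kind of check in C, assuming a small sorted array in place of an AVL tree; struct intset and intset_add() are hypothetical names, this is not the ZFS AVL code, and the sketch returns an error where the kernel calls panic():

```c
#include <assert.h>
#include <stddef.h>

#define SET_MAX 8

/* Hypothetical stand-in for a tree: a small array kept in sorted order. */
struct intset {
	int	keys[SET_MAX];
	size_t	count;
};

/*
 * Insert "key", keeping keys[] sorted.  Returns 0 on success, -1 if the
 * key is already present (the situation avl_add() treats as fatal), and
 * -2 if the set is full.
 */
int
intset_add(struct intset *set, int key)
{
	size_t i, pos;

	assert(set != NULL);

	/* Find the first slot whose key is not smaller than the new one. */
	for (pos = 0; pos < set->count && set->keys[pos] < key; pos++)
		;
	if (pos < set->count && set->keys[pos] == key)
		return (-1);	/* duplicate: a logic error in the caller */
	if (set->count == SET_MAX)
		return (-2);

	/* Shift the tail up by one slot and store the new key. */
	for (i = set->count; i > pos; i--)
		set->keys[i] = set->keys[i - 1];
	set->keys[pos] = key;
	set->count++;
	return (0);
}
```

The point of the sketch is only that the failure is triggered by one specific insert, not by load, which would fit a panic that appears as soon as a particular file or directory is visited.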
I'm hoping this can be resolved without having to know beforehand about putting stuff into periodic.conf that you couldn't have anticipated?

@Ken
The hard drives are connected with two breakout cables from each controller to the caddies with CABMS2FN05 from:
http://www.promise.com/single_page_session/page.aspx?region=en-US&m=575&rsn=114

From controller 1 -> channel 1 -> ports 1,2,3 -> ports 1,2,3 in caddie 1
From controller 1 -> channel 2 -> ports 1,2 -> ports 4,5 in caddie 1
From controller 2 -> channel 1 -> ports 1,2,3 -> ports 1,2,3 in caddie 2
From controller 2 -> channel 2 -> ports 1,2 -> ports 4,5 in caddie 2

Is there any problem with that type of cabling?

These timeouts happen with all hard drives at one time or another, would that mean that all cables are bad? Or of a worse quality perhaps? Regarding the firmware, they are all running version 1AQ10001. I'm going to search for known problems with that, and if you know something, you are welcome to share;)

Best Regards
Karli Sjöberg

On 25 Oct 2011, at 21:33, Kenneth D. Merry wrote:

On Thu, Oct 20, 2011 at 13:28:17 +0200, Karli Sjöberg wrote:
Hi,

I'm in the process of vacating a Sun/Oracle system to another Supermicro/FreeBSD system, doing zfs send/recv between. Two times now, the system has panicked while not doing anything at all, and it's throwing a lot of SCSI/CAM-related errors while doing IO-intensive operations, like send/recv, resilver, and zpool has sometimes reported read/write errors on the hard drives. Best part is that the errors in messages are about all hard drives at one time or another, and they are connected with separate cables, controllers and caddies.
Kind regards
-------------------------------------------------------------------------------
Karli Sjöberg
Swedish University of Agricultural Sciences
Box 7079 (Visiting Address Kronåsvägen 8)
S-750 07 Uppsala, Sweden
Phone: +46-(0)18-67 15 66
karli.sjoberg@slu.se

From owner-freebsd-scsi@FreeBSD.ORG Wed Oct 26 10:29:14 2011 Return-Path: Delivered-To: freebsd-scsi@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C97AE106566C for ; Wed, 26 Oct 2011 10:29:14 +0000 (UTC) (envelope-from jdc@koitsu.dyndns.org) Received: from qmta11.emeryville.ca.mail.comcast.net (qmta11.emeryville.ca.mail.comcast.net [76.96.27.211]) by mx1.freebsd.org (Postfix) with ESMTP id 595618FC16 for ; Wed, 26 Oct 2011 10:29:14 +0000 (UTC) Received: from omta13.emeryville.ca.mail.comcast.net ([76.96.30.52]) by qmta11.emeryville.ca.mail.comcast.net with comcast id pNFv1h00117UAYkABNFxii; Wed, 26 Oct 2011 10:15:57 +0000
Received: from koitsu.dyndns.org ([67.180.84.87]) by omta13.emeryville.ca.mail.comcast.net with comcast id pNEx1h0051t3BNj8ZNExRq; Wed, 26 Oct 2011 10:14:57 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id 94615102C1A; Wed, 26 Oct 2011 03:16:02 -0700 (PDT) Date: Wed, 26 Oct 2011 03:16:02 -0700 From: Jeremy Chadwick To: Karli Sjöberg Message-ID: <20111026101602.GA9768@icarus.home.lan> References: <82B38DBF-DD3A-46CD-93F6-02CDB6506E05@slu.se> <20111025193302.GA30409@nargothrond.kdm.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) Cc: "freebsd-scsi@freebsd.org" , "Kenneth D. Merry" , "fs@freebsd.org" Subject: Re: AOC-USAS2-L8i zfs panics and SCSI errors in messages X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 26 Oct 2011 10:29:14 -0000

On Wed, Oct 26, 2011 at 11:36:44AM +0200, Karli Sjöberg wrote:
> Hi all,
>
> I tracked down what causes the panics!
>
> I got a tip from aragon and phoenix at the forum about
> /etc/periodic/security/100.chksetuid
>
> And to put:
> daily_status_security_chksetuid_enable="NO"
> into /etc/periodic.conf

This is not truly the cause of the panic, it simply exacerbates it.

Many of the periodic scripts will do things like iterate over all files
on the filesystem looking for specific attributes, etc. This tends to
stress filesystems heavily. This isn't the only one. :-)

> I can now run periodic daily without any panics. I'm still wondering
> about the cause of this, the explanation from the forum was that that
> phase is too demanding for multi TB systems. But I have several multi
> TB servers with FreeBSD and ZFS, and none of them has ever behaved
> this way. Besides, the panic is instantaneous, not degenerative. I
> imagine that a run like that would start out OK and then just get
> worse and worse, getting gradually slower and slower until it just
> wouldn't cope any more and hang. This feels more like hitting a wall.
> As if it found something that it couldn't deal with and has no choice
> but to panic immediately.

It may be possible that you have some underlying filesystem corruption
that triggers this situation. Have you actually tried doing a "zpool
scrub" of your pools and seeing if any errors happen or if the panic
occurs there?

I'm inclined to think what you're experiencing is probably a bug or
"quirk" in the storage controller driver you're using. There are other
drivers that have had fixes applied to them "to make them work decently
with ZFS", meaning the kind of stressful I/O ZFS puts on them results in
the controller driver behaving oddly or freaking out, case in point. It
could also be a controller firmware bug/quirk/design issue. Seriously.

I believe the AOC-USAS2-L8i controller has been discussed on
freebsd-stable, re: mps(4) driver problems or equivalent, but I'm not
going to CC that list given that there would be 3 cross-posted lists
involved and that is liable to upset some folks. You should search the
mailing lists for discussion of Supermicro controllers that work
reliably with FreeBSD.

It would be worthwhile to discuss this condition on -stable, mainly with
something like "Anyone else using the AOC-USAS2-L8i reliably with ZFS?"
You get the idea.

--
| Jeremy Chadwick jdc at parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, US |
| Making life hard for others since 1977.
PGP 4BD6C0CB | From owner-freebsd-scsi@FreeBSD.ORG Wed Oct 26 14:05:40 2011 Return-Path: Delivered-To: freebsd-scsi@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 6B1021065673 for ; Wed, 26 Oct 2011 14:05:40 +0000 (UTC) (envelope-from dgilbert@interlog.com) Received: from smtp.infotech.no (smtp.infotech.no [82.134.31.41]) by mx1.freebsd.org (Postfix) with ESMTP id E40E58FC16 for ; Wed, 26 Oct 2011 14:05:39 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by smtp.infotech.no (Postfix) with ESMTP id 492802041BA; Wed, 26 Oct 2011 16:05:36 +0200 (CEST) X-Virus-Scanned: by amavisd-new-2.6.6 (20110518) (Debian) at infotech.no Received: from smtp.infotech.no ([127.0.0.1]) by localhost (smtp.infotech.no [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id k3u1jVLwRDTP; Wed, 26 Oct 2011 16:05:36 +0200 (CEST) Received: from [10.7.0.30] (unknown [10.7.0.30]) by smtp.infotech.no (Postfix) with ESMTPA id 06BB12041B9; Wed, 26 Oct 2011 16:05:34 +0200 (CEST) Message-ID: <4EA813AD.7000205@interlog.com> Date: Wed, 26 Oct 2011 10:05:33 -0400 From: Douglas Gilbert User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.23) Gecko/20110922 Thunderbird/3.1.15 MIME-Version: 1.0 To: Jeremy Chadwick References: <82B38DBF-DD3A-46CD-93F6-02CDB6506E05@slu.se> <20111025193302.GA30409@nargothrond.kdm.org> <20111026101602.GA9768@icarus.home.lan> In-Reply-To: <20111026101602.GA9768@icarus.home.lan> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: "freebsd-scsi@freebsd.org" , "Kenneth D. 
Merry" , "fs@freebsd.org" , Karli Sjöberg Subject: Re: AOC-USAS2-L8i zfs panics and SCSI errors in messages X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: dgilbert@interlog.com List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 26 Oct 2011 14:05:40 -0000

On 11-10-26 06:16 AM, Jeremy Chadwick wrote:
> On Wed, Oct 26, 2011 at 11:36:44AM +0200, Karli Sjöberg wrote:
>> Hi all,
>>
>> I tracked down what causes the panics!
>>
>> I got a tip from aragon and phoenix at the forum about
>> /etc/periodic/security/100.chksetuid
>>
>> And to put:
>> daily_status_security_chksetuid_enable="NO"
>> into /etc/periodic.conf
>
> This is not truly the cause of the panic, it simply exacerbates it.
>
> Many of the periodic scripts will do things like iterate over all files
> on the filesystem looking for specific attributes, etc. This tends to
> stress filesystems heavily. This isn't the only one. :-)
>
>> I can now run periodic daily without any panics. I'm still wondering
>> about the cause of this, the explanation from the forum was that that
>> phase is too demanding for multi TB systems. But I have several multi
>> TB servers with FreeBSD and ZFS, and none of them has ever behaved
>> this way. Besides, the panic is instantaneous, not degenerative. I
>> imagine that a run like that would start out OK and then just get
>> worse and worse, getting gradually slower and slower until it just
>> wouldn't cope any more and hang. This feels more like hitting a wall.
>> As if it found something that it couldn't deal with and has no choice
>> but to panic immediately.
>
> It may be possible that you have some underlying filesystem corruption
> that triggers this situation. Have you actually tried doing a "zpool
> scrub" of your pools and seeing if any errors happen or if the panic
> occurs there?
> > I'm inclined to think what you're experiencing is probably a bug or > "quirk" in the storage controller driver you're using. There are other > drivers that have had fixes applied to them "to make them work decently > with ZFS", meaning the kind of stressful I/O ZFS puts on them results in > the controller driver behaving oddly or freaking out, case in point. It > could also be a controller firmware bug/quirk/design issue. Seriously. > > I believe the AOC-USAS2-L8i controller has been discussed on > freebsd-stable, re: mps(4) driver problems or equivalent, but I'm not > going to CC that list given that there would be 3 cross-posted lists > involved and that is liable to upset some folks. You should search the > mailing lists for discussion of Supermicro controllers that work > reliably with FreeBSD. > > It would be worthwhile to discuss this condition on -stable, mainly with > something like "Anyone else using the AOC-USAS2-L8i reliably with ZFS?" > You get the idea. There is a steady stream of patches from LSI staff to both the mptsas and mpt2sas drivers on the Linux SCSI list (e.g. the most recent patch set to mpt2sas was on 20111019 and contained 7 separate "fixes"). I don't see these patches appearing on this list. Is there a mechanism to get driver corrections incorporated into the relevant FreeBSD drivers? LSI do keep FreeBSD drivers on their site. 
For example for the 9212-4i4e HBA, see this page: http://www.lsi.com/products/storagecomponents/Pages/LSISAS9212-4i4e.aspx That FreeBSD zip is dated 20110808 and has mps drivers for FreeBSD 7.2.0, 7.4.0, 8.2.0 Doug Gilbert From owner-freebsd-scsi@FreeBSD.ORG Wed Oct 26 14:56:40 2011 Return-Path: Delivered-To: freebsd-scsi@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id BEE88106566C for ; Wed, 26 Oct 2011 14:56:40 +0000 (UTC) (envelope-from ken@kdm.org) Received: from nargothrond.kdm.org (nargothrond.kdm.org [70.56.43.81]) by mx1.freebsd.org (Postfix) with ESMTP id 7C98E8FC1A for ; Wed, 26 Oct 2011 14:56:40 +0000 (UTC) Received: from nargothrond.kdm.org (localhost [127.0.0.1]) by nargothrond.kdm.org (8.14.2/8.14.2) with ESMTP id p9QEudw6026215; Wed, 26 Oct 2011 08:56:39 -0600 (MDT) (envelope-from ken@nargothrond.kdm.org) Received: (from ken@localhost) by nargothrond.kdm.org (8.14.2/8.14.2/Submit) id p9QEudwb026214; Wed, 26 Oct 2011 08:56:39 -0600 (MDT) (envelope-from ken) Date: Wed, 26 Oct 2011 08:56:39 -0600 From: "Kenneth D. 
Merry" To: Douglas Gilbert Message-ID: <20111026145639.GA25538@nargothrond.kdm.org> References: <82B38DBF-DD3A-46CD-93F6-02CDB6506E05@slu.se> <20111025193302.GA30409@nargothrond.kdm.org> <20111026101602.GA9768@icarus.home.lan> <4EA813AD.7000205@interlog.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4EA813AD.7000205@interlog.com> User-Agent: Mutt/1.4.2i Cc: freebsd-fs@freebsd.org, "freebsd-scsi@freebsd.org" , Jeremy Chadwick , Karli Sjöberg Subject: Re: AOC-USAS2-L8i zfs panics and SCSI errors in messages X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 26 Oct 2011 14:56:40 -0000 On Wed, Oct 26, 2011 at 10:05:33 -0400, Douglas Gilbert wrote: > On 11-10-26 06:16 AM, Jeremy Chadwick wrote: > >On Wed, Oct 26, 2011 at 11:36:44AM +0200, Karli Sjöberg wrote: > >>Hi all, > >> > >>I tracked down what causes the panics! > >> > >>I got a tip from aragon and phoenix at the forum about > >>/etc/periodic/security/100.chksetuid > >> > >>And to put: > >>daily_status_security_chksetuid_enable="NO" > >>into /etc/periodic.conf > > > >This is not truly the cause of the panic, it simply exacerbates it. > > > >Many of the periodic scripts will do things like iterate over all files > >on the filesystem looking for specific attributes, etc.. This tends to > >stress filesystems heavily. This isn't the only one. :-) > > > >>I can now run periodic daily without any panics. I'm still wondering > >>about the cause of this, the explanation from the forum was that that > >>phase is too demanding for multi TB systems. But I have several multi > >>TB servers with FreeBSD and ZFS, and none of them has ever behaved > >>this way. Besides, the panic is instantaneous, not degenerative. 
I > >>imagine that a run like that would start out OK and then just get > >>worse and worse, getting gradually slower and slower until it just > >>wouldn't cope any more and hang. This feels more like hitting a wall. > >>As if it found something that it couldn't deal with and has no choice > >>but to panic immediately. > > > >It may be possible that you have some underlying filesystem corruption > >that triggers this situation. Have you actually tried doing a "zpool > >scrub" of your pools and seeing if any errors happen or if the panic > >occurs there? > > > >I'm inclined to think what you're experiencing is probably a bug or > >"quirk" in the storage controller driver you're using. There are other > >drivers that have had fixes applied to them "to make them work decently > >with ZFS", meaning the kind of stressful I/O ZFS puts on them results in > >the controller driver behaving oddly or freaking out, case in point. It > >could also be a controller firmware bug/quirk/design issue. Seriously. > > > >I believe the AOC-USAS2-L8i controller has been discussed on > >freebsd-stable, re: mps(4) driver problems or equivalent, but I'm not > >going to CC that list given that there would be 3 cross-posted lists > >involved and that is liable to upset some folks. You should search the > >mailing lists for discussion of Supermicro controllers that work > >reliably with FreeBSD. > > > >It would be worthwhile to discuss this condition on -stable, mainly with > >something like "Anyone else using the AOC-USAS2-L8i reliably with ZFS?" > >You get the idea. > > There is a steady stream of patches from LSI staff to > both the mptsas and mpt2sas drivers on the Linux SCSI > list (e.g. the most recent patch set to mpt2sas was on > 20111019 and contained 7 separate "fixes"). > > I don't see these patches appearing on this list. Is there > a mechanism to get driver corrections incorporated into > the relevant FreeBSD drivers? > > LSI do keep FreeBSD drivers on their site. 
For example for > the 9212-4i4e HBA, see this page: > http://www.lsi.com/products/storagecomponents/Pages/LSISAS9212-4i4e.aspx > That FreeBSD zip is dated 20110808 and has mps drivers for > FreeBSD 7.2.0, 7.4.0, 8.2.0 They do have a developer working on their version of the mps driver. Release of the driver has been hung up by LSI's legal department since February, unfortunately. I'm not sure what the issue is, but that is why it isn't in FreeBSD. The plan, once LSI's legal department approves it, is to hopefully give their developer commit access so he can just check fixes in to the driver. For now, though, their binary-only drivers may fix things for some folks. e.g., those drivers should support their Integrated RAID features. (The driver in the tree doesn't support them.) The error recovery code in their driver is a bit better (the error recovery part was written by Isilon), but I'm not sure whether it would fix this particular problem. This really looks like a ZFS issue. Ken -- Kenneth Merry ken@FreeBSD.ORG From owner-freebsd-scsi@FreeBSD.ORG Thu Oct 27 13:33:24 2011 Return-Path: Delivered-To: freebsd-scsi@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id AB7FF106564A for ; Thu, 27 Oct 2011 13:33:24 +0000 (UTC) (envelope-from armand.delcros@gmail.com) Received: from mail-pz0-f44.google.com (mail-pz0-f44.google.com [209.85.210.44]) by mx1.freebsd.org (Postfix) with ESMTP id 87F708FC0A for ; Thu, 27 Oct 2011 13:33:24 +0000 (UTC) Received: by pzk4 with SMTP id 4so14719015pzk.3 for ; Thu, 27 Oct 2011 06:33:24 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type; bh=1LTyEpDetaXPum+iShrmSVDvw/1RWTKOkt8ECJu2Yjs=; b=EMMTZ2bTZQ7Yu3vImt5MDq/cP3B/gM6DPCCoG7USo5JRjuIYyqCZQu1H2PwZ0r3/vM 5+9PR0KYRVXfmdIM67A2WZATW7h5/koQMSHAGScB+ZJUCrIfHO/Gzj6Xjbs/kT98pZbT 
X9oq+qM8Dr5ZYyScNp/ip2aGN+z8QVWVTAxxI= MIME-Version: 1.0 Received: by 10.68.20.134 with SMTP id n6mr17707526pbe.16.1319720514995; Thu, 27 Oct 2011 06:01:54 -0700 (PDT) Received: by 10.68.42.2 with HTTP; Thu, 27 Oct 2011 06:01:54 -0700 (PDT) In-Reply-To: <8606490.140484.1301057374592.JavaMail.root@mail.corp> References: <4D8A1EDB.50206@soe.ucsc.edu> <8606490.140484.1301057374592.JavaMail.root@mail.corp> Date: Thu, 27 Oct 2011 15:01:54 +0200 Message-ID: From: Armand Delcros To: freebsd-scsi@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 Subject: Re: Serious Dell Sadness - H200, H700, and H800 X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 27 Oct 2011 13:33:24 -0000

Hello,

We have a strange behaviour between FreeBSD 9.0 beta 3 and FreeBSD 9.0 RC1. We are working on a Dell R610 with a PERC RAID H200 (SAS2008-IR). When we start the server from the live CD or live USB key, the mps driver is not loaded and we have to load it manually (only once did we not need to load the mps driver manually):

# camcontrol devlist
at scbus2 target 0 lun 0 (pass),cd0_
# kldstat
Id Refs Address Size Name
1 1 0xc0400000 ea3c9c kernel
# kldload mps
mps0: port 0xfc00-0xfcff mem 0xdf3b0000-0xdf3c0000-0xdf 3fffff irq 16 at device 0.0 on pci3
mps0: Firmware : 07.15.08.00
mps0: IOCCapabilities
..........
da0: Fixed Direct Access SCSI-5 device
da0: 600.000MB/s transfers
da0: Command Queueing enabled
da0: 1400014MB
.........
da1 at mps0 bus0 scbus6 target 0 lun 0
da1 : Fixed Direct Access SCSI-5 device
....
# camcontrol devlist
at scbus2 target 0 lun 0 (pass0,cd0)
at scbus6 target 0 lun 0 (da1,pass2)
at scbus6 target 1 lun 0 (da0,pass1)

It would be great if the mps driver were loaded at startup without manual intervention.
Best regards, Armand Delcros From owner-freebsd-scsi@FreeBSD.ORG Fri Oct 28 00:27:59 2011 Return-Path: Delivered-To: freebsd-scsi@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 92EAE106564A for ; Fri, 28 Oct 2011 00:27:59 +0000 (UTC) (envelope-from peterjeremy@acm.org) Received: from fallbackmx09.syd.optusnet.com.au (fallbackmx09.syd.optusnet.com.au [211.29.132.242]) by mx1.freebsd.org (Postfix) with ESMTP id 1BD438FC16 for ; Fri, 28 Oct 2011 00:27:58 +0000 (UTC) Received: from mail15.syd.optusnet.com.au (mail15.syd.optusnet.com.au [211.29.132.196]) by fallbackmx09.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id p9RMQAuw029867 for ; Fri, 28 Oct 2011 09:26:10 +1100 Received: from server.vk2pj.dyndns.org (c220-239-116-103.belrs4.nsw.optusnet.com.au [220.239.116.103]) by mail15.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id p9RMQ7ub029305 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO) for ; Fri, 28 Oct 2011 09:26:08 +1100 X-Bogosity: Ham, spamicity=0.000000 Received: from server.vk2pj.dyndns.org (localhost.vk2pj.dyndns.org [127.0.0.1]) by server.vk2pj.dyndns.org (8.14.5/8.14.4) with ESMTP id p9RMQ4c5040568 for ; Fri, 28 Oct 2011 09:26:04 +1100 (EST) (envelope-from peter@server.vk2pj.dyndns.org) Received: (from peter@localhost) by server.vk2pj.dyndns.org (8.14.5/8.14.4/Submit) id p9RMQ3mc040567 for freebsd-scsi@freebsd.org; Fri, 28 Oct 2011 09:26:03 +1100 (EST) (envelope-from peter) Date: Fri, 28 Oct 2011 09:26:03 +1100 From: Peter Jeremy To: freebsd-scsi@freebsd.org Message-ID: <20111027222603.GA39946@server.vk2pj.dyndns.org> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="7AUc2qLy4jB3hD7Z" Content-Disposition: inline X-PGP-Key: http://members.optusnet.com.au/peterjeremy/pubkey.asc User-Agent: Mutt/1.5.21 (2010-09-15) Subject: Watchdog timeouts on isp(4) X-BeenThere: freebsd-scsi@freebsd.org 
X-Mailman-Version: 2.1.5 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 28 Oct 2011 00:27:59 -0000 --7AUc2qLy4jB3hD7Z Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable

I have a 16-CPU SunFire V890 running FreeBSD-current/sparc64 r226167. Whilst doing some stress testing (6 parallel -j16 buildworlds with regular "sysctl vm.vmtotal"), I have started seeing isp watchdog timeouts. I didn't see this with similar testing on about r223035. Does anyone have any suggestions?

The messages look like:
(da4:isp0:0:4:0): first watchdog (handle 0xb4d820d9) timed out- deferring for grace period
(da4:isp0:0:4:0): first watchdog (handle 0xb4d920ba) timed out- deferring for grace period
(da11:isp1:0:5:0): first watchdog (handle 0x5f0620c9) timed out- deferring for grace period
(da11:isp1:0:5:0): first watchdog (handle 0x5f07203a) timed out- deferring for grace period
(da11:isp1:0:5:0): first watchdog (handle 0x5f08208c) timed out- deferring for grace period
...
(da11:isp1:0:5:0): first watchdog (handle 0x5f762088) timed out- deferring for grace period
isp1: isp_watchdog: timeout for handle 0x5f0620c9
(da11:isp1:0:5:0): FIN dl16384 resid 0 CDB=0x2a 0x00 0x0f 0x4e 0x66 0x20 0x00 0x00 0x20 0x00 STS 0x0 XS_ERR=0xb
isp1: isp_watchdog: timeout for handle 0x5f07203a
(da11:isp1:0:5:0): FIN dl16384 resid 0 CDB=0x2a 0x00 0x0f 0x4e 0x66 0x40 0x00 0x00 0x20 0x00 STS 0x0 XS_ERR=0xb
isp1: isp_watchdog: timeout for handle 0x5f08208c
(da11:isp1:0:5:0): FIN dl16384 resid 0 CDB=0x2a 0x00 0x0f 0x4e 0x66 0x60 0x00 0x00 0x20 0x00 STS 0x0 XS_ERR=0xb
isp1: isp_watchdog: timeout for handle 0x5f09201f
(da11:isp1:0:5:0): FIN dl16384 resid 0 CDB=0x2a 0x00 0x0f 0x4e 0x66 0x80 0x00 0x00 0x20 0x00 STS 0x0 XS_ERR=0xb
isp1: bad request handle 0x5f0620c9 (iocb type 0x3)
isp1: bad request handle 0x5f07203a (iocb type 0x3)
...
(da4:isp0:0:4:0): first watchdog (handle 0x5cf1206d) timed out- deferring for grace period
(da4:isp0:0:4:0): first watchdog (handle 0x5cf2203a) timed out- deferring for grace period
isp0: isp_watchdog: timeout for handle 0x5cad2046
(da4:isp0:0:4:0): FIN dl16384 resid 0 CDB=0x2a 0x00 0x0f 0xdd 0xe8 0xe0 0x00 0x00 0x20 0x00 STS 0x0 XS_ERR=0xb
isp0: bad request handle 0x5cad2046 (iocb type 0x3)
isp0: isp_watchdog: timeout for handle 0x5cdb20cb
(da4:isp0:0:4:0): FIN dl16384 resid 0 CDB=0x2a 0x00 0x0f 0xe3 0xa8 0x00 0x00 0x00 0x20 0x00 STS 0x0 XS_ERR=0xb
isp0: isp_watchdog: timeout for handle 0x5cdc2059
(da4:isp0:0:4:0): FIN dl16384 resid 0 CDB=0x2a 0x00 0x0f 0xe3 0xa8 0x20 0x00 0x00 0x20 0x00 STS 0x0 XS_ERR=0xb
isp0: isp_watchdog: timeout for handle 0x5cdd2020
(da4:isp0:0:4:0): FIN dl16384 resid 0 CDB=0x2a 0x00 0x0f 0xe3 0xa8 0x40 0x00 0x00 0x20 0x00 STS 0x0 XS_ERR=0xb
isp0: bad request handle 0x5cdb20cb (iocb type 0x3)
isp0: bad request handle 0x5cdc2059 (iocb type 0x3)
isp0: bad request handle 0x5cdd2020 (iocb type 0x3)

The da11 errors are unexpected because there should be no
activity on filesystems on that disk.

$ df -k
Filesystem  1024-blocks     Used    Avail Capacity  Mounted on
/dev/da4a       9850318  3495908  5566386    39%    /
devfs                 1        1        0   100%    /dev
/dev/da4e        981998    17936   885504     2%    /tmp
/dev/da4d     127993860 60068290 57686062    51%    /var
/dev/da11a      9850318  1404716  7657578    16%    /8
/dev/da11e       981998     7870   895570     1%    /8/tmp
/dev/da11d    127993860 22228452 95525900    19%    /8/var
/dev/md0       28784540  1482784 24998996     6%    /a
$ swapinfo
Device          1K-blocks     Used     Avail Capacity
/dev/da0b        67110720        0  67110720     0%
/dev/da6b        67110720        0  67110720     0%
Total           134221440        0 134221440     0%

The isp's and associated disks are:
isp0: port 0x300-0x3ff mem 0x400000-0x400fff at device 2.0 on pci1
isp0: invalid NVRAM header
isp0: invalid NVRAM header
isp0: bad frame length (0) from NVRAM- using 1024
isp0: bad execution throttle of 0- using 16
...
isp1: port 0x1000-0x10ff mem 0x100000-0x100fff at device 4.0 on pci4
isp2: port 0x1100-0x11ff mem 0x102000-0x102fff at device 5.0 on pci4
...
(probe6:isp0:0:6:0): TEST UNIT READY. CDB: 0 0 0 0 0 0
(probe6:isp0:0:6:0): CAM status: SCSI Status Error
(probe6:isp0:0:6:0): SCSI status: Check Condition
(probe6:isp0:0:6:0): SCSI sense: UNIT ATTENTION asc:29,1 (Power on occurred)
(probe518:isp1:0:6:0): TEST UNIT READY.
CDB: 0 0 0 0 0 0
(probe518:isp1:0:6:0): CAM status: SCSI Status Error
(probe518:isp1:0:6:0): SCSI status: Check Condition
(probe518:isp1:0:6:0): SCSI sense: UNIT ATTENTION asc:29,1 (Power on occurred)
uhub0: 4 ports with 4 removable, self powered
ses0 at isp0 bus 0 scbus1 target 6 lun 0
ses0: Fixed Enclosure Services SCSI-3 device
ses0: 100.000MB/s transfers WWNN 0x508002000025a7c0 WWPN 0x508002000025a7c1 PortID 0xdc
ses0: SCSI-3 SES Device
ses1 at isp1 bus 0 scbus2 target 6 lun 0
ses1: Fixed Enclosure Services SCSI-3 device
ses1: 100.000MB/s transfers WWNN 0x508002000065a568 WWPN 0x508002000065a569 PortID 0xdc
ses1: SCSI-3 SES Device
da0 at isp0 bus 0 scbus1 target 0 lun 0
da0: Fixed Direct Access SCSI-3 device
da0: 100.000MB/s transfers WWNN 0x20000000875017a5 WWPN 0x21000000875017a5 PortID 0xef
da0: Command Queueing enabled
da0: 140009MB (286739329 512 byte sectors: 255H 63S/T 17848C)
da1 at isp0 bus 0 scbus1 target 1 lun 0
da1: Fixed Direct Access SCSI-3 device
da1: 100.000MB/s transfers WWNN 0x20000000875014ba WWPN 0x21000000875014ba PortID 0xe8
da1: Command Queueing enabled
da1: 140009MB (286739329 512 byte sectors: 255H 63S/T 17848C)
da2 at isp0 bus 0 scbus1 target 2 lun 0
da2: Fixed Direct Access SCSI-3 device
da2: 100.000MB/s transfers WWNN 0x2000000087501828 WWPN 0x2100000087501828 PortID 0xe4
da2: Command Queueing enabled
da2: 140009MB (286739329 512 byte sectors: 255H 63S/T 17848C)
da3 at isp0 bus 0 scbus1 target 3 lun 0
da3: Fixed Direct Access SCSI-3 device
da3: 100.000MB/s transfers WWNN 0x200000008796f785 WWPN 0x210000008796f785 PortID 0xe2
da3: Command Queueing enabled
da3: 140009MB (286739329 512 byte sectors: 255H 63S/T 17848C)
da4 at isp0 bus 0 scbus1 target 4 lun 0
da4: Fixed Direct Access SCSI-3 device
da4: 100.000MB/s transfers WWNN 0x2000000087501755 WWPN 0x2100000087501755 PortID 0xe1
da4: Command Queueing enabled
da4: 140009MB (286739329 512 byte sectors: 255H 63S/T 17848C)
da5 at isp0 bus 0 scbus1
target 5 lun 0
da5: Fixed Direct Access SCSI-3 device
da5: 100.000MB/s transfers WWNN 0x2000000087505ebc WWPN 0x2100000087505ebc PortID 0xe0
da5: Command Queueing enabled
da5: 140009MB (286739329 512 byte sectors: 255H 63S/T 17848C)
da6 at isp1 bus 0 scbus2 target 0 lun 0
da6: Fixed Direct Access SCSI-4 device
da6: 100.000MB/s transfers WWNN 0x500000e01999c6b0 WWPN 0x500000e01999c6b1 PortID 0xef
da6: Command Queueing enabled
da6: 140009MB (286739329 512 byte sectors: 255H 63S/T 17848C)
da7 at isp1 bus 0 scbus2 target 1 lun 0
da7: Fixed Direct Access SCSI-4 device
da7: 100.000MB/s transfers WWNN 0x500000e01365b4e0 WWPN 0x500000e01365b4e1 PortID 0xe8
da7: Command Queueing enabled
da7: 140009MB (286739329 512 byte sectors: 255H 63S/T 17848C)
da8 at isp1 bus 0 scbus2 target 2 lun 0
da8: Fixed Direct Access SCSI-4 device
da8: 100.000MB/s transfers WWNN 0x500000e014422a50 WWPN 0x500000e014422a51 PortID 0xe4
da8: Command Queueing enabled
da8: 140009MB (286739329 512 byte sectors: 255H 63S/T 17848C)
da9 at isp1 bus 0 scbus2 target 3 lun 0
da9: Fixed Direct Access SCSI-4 device
da9: 100.000MB/s transfers WWNN 0x500000e01998e820 WWPN 0x500000e01998e821 PortID 0xe2
da9: Command Queueing enabled
da9: 140009MB (286739329 512 byte sectors: 255H 63S/T 17848C)
da10 at isp1 bus 0 scbus2 target 4 lun 0
da10: Fixed Direct Access SCSI-4 device
da10: 100.000MB/s transfers WWNN 0x500000e014397bb0 WWPN 0x500000e014397bb1 PortID 0xe1
da10: Command Queueing enabled
da10: 140009MB (286739329 512 byte sectors: 255H 63S/T 17848C)
da11 at isp1 bus 0 scbus2 target 5 lun 0
da11: Fixed Direct Access SCSI-4 device
da11: 100.000MB/s transfers WWNN 0x500000e0143f8ee0 WWPN 0x500000e0143f8ee1 PortID 0xe0
da11: Command Queueing enabled
da11: 140009MB (286739329 512 byte sectors: 255H 63S/T 17848C)

pciconf reports:
isp0@pci1:0:2:0: class=0x010000 card=0x00000000 chip=0x22001077 rev=0x05 hdr=0x00
    vendor = 'QLogic Corp.'
    device = 'QLA2200 64-bit Fibre Channel Adapter'
    class = mass storage
    subclass = SCSI
    bar [10] = type I/O Port, range 32, base 0x300, size 256, enabled
    bar [14] = type Memory, range 32, base 0x400000, size 4096, enabled
    cap 01[44] = powerspec 1 supports D0 D3 current D0
isp1@pci3:1:4:0: class=0x010000 card=0x40831077 chip=0x22001077 rev=0x05 hdr=0x00
    vendor = 'QLogic Corp.'
    device = 'QLA2200 64-bit Fibre Channel Adapter'
    class = mass storage
    subclass = SCSI
    bar [10] = type I/O Port, range 32, base 0x1000, size 256, enabled
    bar [14] = type Memory, range 32, base 0x100000, size 4096, enabled
    cap 01[44] = powerspec 1 supports D0 D3 current D0
isp2@pci3:1:5:0: class=0x010000 card=0x40831077 chip=0x22001077 rev=0x05 hdr=0x00
    vendor = 'QLogic Corp.'
    device = 'QLA2200 64-bit Fibre Channel Adapter'
    class = mass storage
    subclass = SCSI
    bar [10] = type I/O Port, range 32, base 0x1100, size 256, enabled
    bar [14] = type Memory, range 32, base 0x102000, size 4096, enabled
    cap 01[44] = powerspec 1 supports D0 D3 current D0

--
Peter Jeremy

--7AUc2qLy4jB3hD7Z Content-Type: application/pgp-signature
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.18 (FreeBSD)

iEYEARECAAYFAk6p2nsACgkQ/opHv/APuIfIEACfWA9wU0okJROhk5ij7SJ3wgXE
G00AoImKh2MUQ+6VEB2Nx7IvaDpKBRgY
=rlxj
-----END PGP SIGNATURE-----
--7AUc2qLy4jB3hD7Z--

From owner-freebsd-scsi@FreeBSD.ORG Fri Oct 28 02:29:08 2011 Return-Path: Delivered-To: freebsd-scsi@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 20609106564A for ; Fri, 28 Oct 2011 02:29:08 +0000 (UTC) (envelope-from mj@feral.com) Received: from ns1.feral.com (ns1.feral.com [192.67.166.1]) by mx1.freebsd.org (Postfix) with ESMTP id E05278FC17 for ; Fri, 28 Oct 2011 02:29:07 +0000 (UTC) Received: from [192.168.135.109] (c-24-7-47-62.hsd1.ca.comcast.net [24.7.47.62]) (authenticated bits=0) by ns1.feral.com (8.14.4/8.14.4) with
ESMTP id p9S2T5KU079939 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO) for ; Thu, 27 Oct 2011 19:29:06 -0700 (PDT) (envelope-from mj@feral.com) Message-ID: <4EAA136D.4050003@feral.com> Date: Thu, 27 Oct 2011 19:29:01 -0700 From: Matthew Jacob Organization: Feral Software User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:7.0.1) Gecko/20110929 Thunderbird/7.0.1 MIME-Version: 1.0 To: freebsd-scsi@freebsd.org References: <20111027222603.GA39946@server.vk2pj.dyndns.org> In-Reply-To: <20111027222603.GA39946@server.vk2pj.dyndns.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (ns1.feral.com [192.67.166.1]); Thu, 27 Oct 2011 19:29:07 -0700 (PDT) Subject: Re: Watchdog timeouts on isp(4) X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: mj@feral.com List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 28 Oct 2011 02:29:08 -0000 On 10/27/2011 3:26 PM, Peter Jeremy wrote: > I have a 16-CPU SunFire V890 running FreeBSD-current/sparc64 r226167. > Whilst doing some stress testing (6 parallel -j16 buildworlds with > regular "sysctl vm.vmtotal"), I have started seeing isp watchdog > timeouts. I didn't see this with similar testing on about r223035. > Does anyone have any suggestions? Yes, there was a change recently in this area. Can you send me the actual logs instead of excerpts?
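
[Archive note] When reading the isp watchdog excerpts earlier in this thread, it may help to decode the CDB bytes that the driver prints. The sketch below is not from any poster; it assumes the standard SCSI WRITE(10) layout (opcode 0x2a, 4-byte big-endian LBA, 2-byte transfer length) and the 512-byte sectors shown in the da probe lines. The function name decode_write10 is hypothetical.

```python
import struct


def decode_write10(cdb: bytes, block_size: int = 512) -> dict:
    """Decode a 10-byte SCSI WRITE(10) CDB into LBA and transfer size.

    block_size is an assumption taken from the "512 byte sectors"
    reported for the disks in the thread.
    """
    if len(cdb) != 10 or cdb[0] != 0x2A:
        raise ValueError("not a 10-byte WRITE(10) CDB")
    # Layout: opcode, flags, LBA (32-bit BE), group, length (16-bit BE), control
    _opcode, _flags, lba, _group, blocks, _control = struct.unpack(">BBIBHB", cdb)
    return {"lba": lba, "blocks": blocks, "bytes": blocks * block_size}


# First timed-out command logged for da11 in the thread.
cdb = bytes([0x2A, 0x00, 0x0F, 0x4E, 0x66, 0x20, 0x00, 0x00, 0x20, 0x00])
info = decode_write10(cdb)
print(info)
```

Under these assumptions the command is a 32-block write, i.e. 16384 bytes, which agrees with the "dl16384" field on the same log line; consecutive timed-out commands advance the LBA by 0x20 blocks each.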