From: Paul Heyman
To: "freebsd-hackers@freebsd.org"
Cc: Patrick Mahan
Date: Wed, 15 Sep 2010 11:53:16 -0700
Subject: Crash dump on HP Proliant G6 broken as of V8.0
Message-ID: <32AB5C9615CC494997D9ABB1DB12783C024C8C5A9F@SJ-EXCH-1.adaranet.com>
In-Reply-To: <32AB5C9615CC494997D9ABB1DB12783C024C8C5A9C@SJ-EXCH-1.adaranet.com>

ALL,

The crash dump worked fine in V7.3.

I am debugging a crash dump problem on an HP Proliant G6 which uses a SATA
drive connected to a CISS RAID controller. I have tried this on an x86 box
using a non-RAID ATA/SATA disk controller and it works well.

I noticed that in V8.0 there is a new SCSI operating method. In the v7.3
version there was only CISS_TRANSPORT_METHOD_SIMPLE, but in v8.0 the
CISS_TRANSPORT_METHOD_PERF method has been added. These methods have
different function calls in ciss_poll_request.

The dump command starts with a call to dadump. This function sets up a
struct ccb_scsiio structure by calling scsi_read_write. Then the meat of
the dump happens when it calls xpt_polled_action, which manages and
simulates interrupt functionality that is working fine. The disk
operations work fine except during a crash dump.

I have turned debug on for CISS and CAMDEBUG to debug this problem.

In xpt_polled_action (cam_xpt.c) we get past the first polling loop at
line 3013, as both devq->send_openings and dev->ccbq.dev_openings are > 0
(256 and 254). But we do get stuck in the second one at line 3025. We
eventually time out, setting start_ccb->ccb_h.status to CAM_CMD_TIMEOUT.
The timeout is set with DA_DEFAULT_TIMEOUT (scsi_da.c), which is set to 60,
and is used in the call to scsi_read_write.
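
To make the failing path concrete, here is a minimal sketch of the kind of
polled wait described above: poll the controller, stop once the request
leaves the in-progress state, and mark it timed out otherwise. This is not
the cam_xpt.c code. Every name in it (req_status, request, fake_poll,
polled_wait) is hypothetical, and it assumes the timeout is counted in
roughly 1 ms polling steps.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status codes standing in for the CAM status values. */
enum req_status { REQ_INPROG, REQ_DONE, REQ_TIMEOUT };

/* Hypothetical request tracked by the polling loop. */
struct request {
    enum req_status status;
    uint32_t polls_until_done; /* 0 means the fake controller never completes it */
    uint32_t polls_seen;       /* how many times the request was polled */
};

/* Stand-in for the driver's poll entry point: checks for a completion. */
static void fake_poll(struct request *req)
{
    req->polls_seen++;
    if (req->polls_until_done != 0 && req->polls_seen >= req->polls_until_done)
        req->status = REQ_DONE;
}

/*
 * Poll until the request leaves REQ_INPROG or the budget runs out.
 * Assumption: one loop iteration corresponds to about 1 ms; a real
 * kernel would delay between polls, which this sketch omits.
 */
static enum req_status polled_wait(struct request *req, uint32_t timeout_ms)
{
    assert(req != NULL);

    while (timeout_ms-- > 0) {
        fake_poll(req);
        if (req->status != REQ_INPROG)
            break;
    }

    /* Still in progress after the whole budget: report a timeout. */
    if (req->status == REQ_INPROG)
        req->status = REQ_TIMEOUT;

    return req->status;
}
```

In this model, a request whose completion is never recorded against it
stays in progress until the budget is exhausted, which matches the
behaviour of getting stuck in the second loop and then timing out.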

Here is the debug trace:

Dumping 1240 MB:
ciss_cam_action_io: XPT_SCSI_IO 0:0:0
ciss_get_request: called
ciss_start: post command 150 tag 600
ciss_map_request: called
ciss_request_map_helper: called
ciss_cam_poll: called
ciss_perf_done: completed command 150
ciss_perf_done: completed command 150
ciss_complete: called
ciss_unmap_request: called
ciss_cam_complete: called
_ciss_report_request: called
ciss_cam_complete: SCSI_STATUS_OK
ciss_release_request: called
ciss_complete: called
ciss_unmap_request: called
ciss0: WARNING: completing non-busy request
ciss_cam_complete: called
_ciss_report_request: called
ciss_cam_complete: SCSI_STATUS_OK
.
.
.
.
after about 60 seconds
ciss0: WARNING: completing non-busy request
ciss0: WARNING: completed command with no submitter
ciss_unmap_request: called
.
.
.
This goes on forever

Thanks
Paul

Paul Heyman
pheyman@adaranetworks.com