From: Scott Long
Date: Fri, 27 Apr 2007 13:19:43 -0600
To: Nikolay Pavlov, Thomas Quinot, "Ganbold.TS", freebsd-stable@FreeBSD.ORG, mjacob@FreeBSD.ORG, linimon@FreeBSD.ORG, bug-followup@FreeBSD.ORG
Subject: Re: kern/112119: system hangs when starts k3b on RELENG_6
Message-ID: <46324CCF.7040109@samsco.org>
In-Reply-To: <20070427174922.GA5655@zone3000.net>

Nikolay Pavlov wrote:
> On Friday, 27 April 2007 at 17:32:18 +0200, Thomas Quinot wrote:
>> * Ganbold.TS, 2007-04-27 :
>>
>>> I tried your patch at
>>> http://www.freebsd.org/cgi/query-pr.cgi?pr=103602&getpatch=12 and the
>>> problem is still the same. System freezes upon start of k3b.
>>>
>>> I also tried your attached patch, which reverts part of rev. 1.42.2.3
>>> and the problem is still the same, the system hangs when k3b starts.
>> Thanks, that's useful info. Please try the attached patch instead, which
>> reverts another part of 1.42.2.3 (I'm trying to figure out exactly
>> *which* part of this change is causing the problem).
>>
>> Also, were you able to capture system console output at the point where
>> the crash occurs? We might have some indications there.
>
> This patch works for me. I do not get a reboot and I am able to
> successfully burn a CD.
>
>> Thomas.
>>
>
>> Index: atapi-cam.c
>> ===================================================================
>> RCS file: /space/mirror/ncvs/src/sys/dev/ata/atapi-cam.c,v
>> retrieving revision 1.42.2.3
>> retrieving revision 1.42.2.2
>> diff -u -r1.42.2.3 -r1.42.2.2
>> --- atapi-cam.c 29 Mar 2007 20:08:32 -0000 1.42.2.3
>> +++ atapi-cam.c 6 Mar 2007 16:56:50 -0000 1.42.2.2
>> @@ -697,39 +680,32 @@
>>              csio->ccb_h.status |= CAM_AUTOSNS_VALID;
>>          }
>>      } else if (request->result != 0) {
>> -        if ((request->flags & ATA_R_TIMEOUT) != 0) {
>> -            rc = CAM_CMD_TIMEOUT;
>> -        } else {
>> -            rc = CAM_SCSI_STATUS_ERROR;
>> -            csio->scsi_status = SCSI_STATUS_CHECK_COND;
>> +        rc = CAM_SCSI_STATUS_ERROR;
>> +        csio->scsi_status = SCSI_STATUS_CHECK_COND;
>>
>> -            if ((csio->ccb_h.flags & CAM_DIS_AUTOSENSE) == 0) {
>> +        if ((csio->ccb_h.flags & CAM_DIS_AUTOSENSE) == 0) {
>>  #if 0
>> -                static const int8_t ccb[16] = { ATAPI_REQUEST_SENSE, 0, 0, 0,
>> -                    sizeof(struct atapi_sense), 0, 0, 0, 0, 0, 0,
>> -                    0, 0, 0, 0, 0 };
>> -
>> -                bcopy (ccb, request->u.atapi.ccb, sizeof ccb);
>> -                request->data = (caddr_t)&csio->sense_data;
>> -                request->bytecount = sizeof(struct atapi_sense);
>> -                request->transfersize = min(request->bytecount, 65534);
>> -                request->timeout = csio->ccb_h.timeout / 1000;
>> -                request->retries = 2;
>> -                request->flags = ATA_R_QUIET|ATA_R_ATAPI|ATA_R_IMMEDIATE;
>> -                hcb->flags |= AUTOSENSE;
>> +            static const int8_t ccb[16] = { ATAPI_REQUEST_SENSE, 0, 0, 0,
>> +                sizeof(struct atapi_sense), 0, 0, 0, 0, 0, 0,
>> +                0, 0, 0, 0, 0 };
>> +
>> +            bcopy (ccb, request->u.atapi.ccb, sizeof ccb);
>> +            request->data = (caddr_t)&csio->sense_data;
>> +            request->bytecount = sizeof(struct atapi_sense);
>> +            request->transfersize = min(request->bytecount, 65534);
>> +            request->timeout = csio->ccb_h.timeout / 1000;
>> +            request->retries = 2;
>> +            request->flags = ATA_R_QUIET|ATA_R_ATAPI|ATA_R_IMMEDIATE;
>> +            hcb->flags |= AUTOSENSE;
>>
>> -                ata_queue_request(request);
>> -                return;
>> +            ata_queue_request(request);
>> +            return;
>>  #else
>> -                /*
>> -                 * Use auto-sense data from the ATA layer, if it has
>> -                 * issued a REQUEST SENSE automatically and that operation
>> -                 * returned without error.
>> -                 */
>> -                if (request->u.atapi.saved_cmd != 0 && request->error == 0) {
>> -                    bcopy (&request->u.atapi.sense, &csio->sense_data, sizeof(struct atapi_sense));
>> -                    csio->ccb_h.status |= CAM_AUTOSNS_VALID;
>> -                }
>> +            /* The ATA driver has already requested sense for us. */
>> +            if (request->error == 0) {
>> +                /* The ATA autosense suceeded. */
>> +                bcopy (&request->u.atapi.sense, &csio->sense_data, sizeof(struct atapi_sense));
>> +                csio->ccb_h.status |= CAM_AUTOSNS_VALID;
>>              }
>>  #endif
>>          }
>

My best guess is that request->u.atapi.saved_cmd isn't getting preserved
when ata_completed() does an automatic REQUEST_SENSE. Not sure if this
is true or why it would happen. But if that's the case, then CAM is
going to manually request sense, which atapi-cam and ata will likely
treat as a normal DMA-capable command.
Note that the autosense code in the ATA driver disables DMA for the
REQUEST_SENSE command. This might be a key issue; the drive might be
getting very unhappy with a DMA-flagged REQUEST_SENSE command,
especially if it's already in a CHECK_CONDITION state. This unhappiness
might be leading to the interrupt storm and the observed deadlock on UP
systems. With the patch above, sense info is reported to CAM regardless
of the contents of saved_cmd, preventing CAM from generating the
troublesome REQUEST_SENSE on its own.

Oh hell, I know exactly what the problem is! The opcode for a
TEST_UNIT_READY is 0x00. This is probably the command that is generating
the CHECK_CONDITION. The test for saved_cmd is entirely bogus, since a
saved opcode of 0x00 looks the same as no saved command at all. What
really needs to happen is for ATA to have an "autosense valid" flag in
the request. But without that, the best that you can do is to just
ignore the contents of saved_cmd and also zero out
request->u.atapi.sense before issuing every command.

Scott
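
To make the opcode point above concrete, here is a minimal userland sketch, not driver code. It assumes, as the reasoning above implies, that saved_cmd ends up holding the opcode of the command that failed. struct fake_atapi_request, the autosense_valid field, and all three helper functions are hypothetical names invented for this illustration; only the saved_cmd/error test itself is taken from the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define OP_TEST_UNIT_READY 0x00 /* SCSI TEST UNIT READY opcode */
#define OP_READ_10         0x28 /* SCSI READ(10) opcode, for contrast */

/* Hypothetical stand-in for the ATAPI fields of struct ata_request. */
struct fake_atapi_request {
    uint8_t saved_cmd;       /* assumed: opcode of the command that failed */
    int     error;           /* 0 if the automatic REQUEST SENSE succeeded */
    bool    autosense_valid; /* hypothetical flag, not in the real driver */
};

/* The check from rev. 1.42.2.3: treats saved_cmd == 0 as "no autosense". */
static bool sense_usable_by_saved_cmd(const struct fake_atapi_request *r)
{
    return r->saved_cmd != 0 && r->error == 0;
}

/* The suggested alternative: an explicit flag, independent of the opcode. */
static bool sense_usable_by_flag(const struct fake_atapi_request *r)
{
    return r->autosense_valid && r->error == 0;
}

/* Models a request after the ATA layer ran a successful autosense for it. */
static struct fake_atapi_request after_autosense(uint8_t failed_opcode)
{
    struct fake_atapi_request r = {
        .saved_cmd = failed_opcode,
        .error = 0,
        .autosense_valid = true,
    };
    return r;
}
```

With this model, a failed READ(10) passes both checks, but a failed TEST_UNIT_READY passes only the flag-based one: the saved_cmd check throws away valid sense data for exactly the command most likely to return CHECK_CONDITION while the drive is not ready.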