From owner-svn-src-all@FreeBSD.ORG Tue Jul 1 06:23:50 2014
Delivered-To: svn-src-all@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1])
	(using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
	(No client certificate requested)
	by hub.freebsd.org (Postfix) with ESMTPS id 092BBE87;
	Tue, 1 Jul 2014 06:23:50 +0000 (UTC)
Received: from svn.freebsd.org (svn.freebsd.org [IPv6:2001:1900:2254:2068::e6a:0])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client did not present a certificate)
	by mx1.freebsd.org (Postfix) with ESMTPS id E04D72936;
	Tue, 1 Jul 2014 06:23:49 +0000 (UTC)
Received: from svn.freebsd.org ([127.0.1.70])
	by svn.freebsd.org (8.14.8/8.14.8) with ESMTP id s616NnHi081556;
	Tue, 1 Jul 2014 06:23:49 GMT
	(envelope-from scottl@svn.freebsd.org)
Received: (from scottl@localhost)
	by svn.freebsd.org (8.14.8/8.14.8/Submit) id s616NmAx081549;
	Tue, 1 Jul 2014 06:23:48 GMT
	(envelope-from scottl@svn.freebsd.org)
Message-Id: <201407010623.s616NmAx081549@svn.freebsd.org>
From: Scott Long
Date: Tue, 1 Jul 2014 06:23:48 +0000 (UTC)
To: src-committers@freebsd.org, svn-src-all@freebsd.org,
	svn-src-stable@freebsd.org, svn-src-stable-10@freebsd.org
Subject: svn commit: r268073 - in stable/10/sys/dev/isci: . scil
X-SVN-Group: stable-10
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
X-BeenThere: svn-src-all@freebsd.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: "SVN commit messages for the entire src tree \(except for " user" and " projects" \)"
X-List-Received-Date: Tue, 01 Jul 2014 06:23:50 -0000

Author: scottl
Date: Tue Jul 1 06:23:48 2014
New Revision: 268073
URL: http://svnweb.freebsd.org/changeset/base/268073

Log:
  Merge r268024, 268025:

  Fix a case in handling ATA_PASSTHROUGH commands that have an unaligned
  buffer. This impacts some home-rolled SMART tools.

  In rare cases, a SATA drive can stop responding to commands and trigger a
  reset device task request from the driver. If the drive fails to respond
  with a signature FIS, the driver would previously get into an endless retry
  loop, stalling all I/O to the drive and keeping user processes stranded.
  Instead, fail the I/O and invalidate the device if the task management
  command times out. This is controllable with the sysctl and tunable

  hw.isci.fail_on_task_timeout
  dev.isci.0.fail_on_task_timeout

  The default for these is 1.

  Obtained from: Netflix, Inc.

Modified:
  stable/10/sys/dev/isci/isci.h
  stable/10/sys/dev/isci/isci_controller.c
  stable/10/sys/dev/isci/isci_sysctl.c
  stable/10/sys/dev/isci/isci_task_request.c
  stable/10/sys/dev/isci/scil/scic_sds_stp_request.c

Modified: stable/10/sys/dev/isci/isci.h
==============================================================================
--- stable/10/sys/dev/isci/isci.h Tue Jul 1 04:44:18 2014 (r268072)
+++ stable/10/sys/dev/isci/isci.h Tue Jul 1 06:23:48 2014 (r268073)
@@ -164,6 +164,7 @@ struct ISCI_CONTROLLER
 	uint32_t initial_discovery_mask;
 	BOOL is_frozen;
 	BOOL release_queued_ccbs;
+	BOOL fail_on_task_timeout;
 	uint8_t *remote_device_memory;
 	struct ISCI_MEMORY cached_controller_memory;
 	struct ISCI_MEMORY uncached_controller_memory;

Modified: stable/10/sys/dev/isci/isci_controller.c
==============================================================================
--- stable/10/sys/dev/isci/isci_controller.c Tue Jul 1 04:44:18 2014 (r268072)
+++ stable/10/sys/dev/isci/isci_controller.c Tue Jul 1 06:23:48 2014 (r268073)
@@ -300,6 +300,8 @@ SCI_STATUS isci_controller_initialize(st
 	SCI_CONTROLLER_HANDLE_T scic_controller_handle;
 	char led_name[64];
 	unsigned long tunable;
+	uint32_t io_shortage;
+	uint32_t fail_on_timeout;
 	int i;
 
 	scic_controller_handle =
@@ -365,10 +367,12 @@ SCI_STATUS isci_controller_initialize(st
 	 * this io_shortage parameter, which will tell CAM that we have a
 	 * large queue depth than we really do.
 	 */
-	uint32_t io_shortage = 0;
+	io_shortage = 0;
 	TUNABLE_INT_FETCH("hw.isci.io_shortage", &io_shortage);
 	controller->sim_queue_depth += io_shortage;
 
+	fail_on_timeout = 1;
+	TUNABLE_INT_FETCH("hw.isci.fail_on_task_timeout", &fail_on_timeout);
 	/* Attach to CAM using xpt_bus_register now, then immediately freeze
 	 * the simq. It will get released later when initial domain discovery
 	 * is complete.

Modified: stable/10/sys/dev/isci/isci_sysctl.c
==============================================================================
--- stable/10/sys/dev/isci/isci_sysctl.c Tue Jul 1 04:44:18 2014 (r268072)
+++ stable/10/sys/dev/isci/isci_sysctl.c Tue Jul 1 06:23:48 2014 (r268073)
@@ -222,6 +222,24 @@ isci_sysctl_log_frozen_lun_masks(SYSCTL_
 	return (0);
 }
 
+static int
+isci_sysctl_fail_on_task_timeout(SYSCTL_HANDLER_ARGS)
+{
+	struct isci_softc *isci = (struct isci_softc *)arg1;
+	int32_t fail_on_timeout = 0;
+	int error, i;
+
+	error = sysctl_handle_int(oidp, &fail_on_timeout, 0, req);
+
+	if (error || fail_on_timeout == 0)
+		return (error);
+
+	for (i = 0; i < isci->controller_count; i++)
+		isci->controllers[i].fail_on_task_timeout = fail_on_timeout;
+
+	return (0);
+}
+
 void isci_sysctl_initialize(struct isci_softc *isci)
 {
 	struct sysctl_ctx_list *sysctl_ctx = device_get_sysctl_ctx(isci->device);
@@ -259,5 +277,10 @@ void isci_sysctl_initialize(struct isci_
 	    "log_frozen_lun_masks", CTLTYPE_UINT| CTLFLAG_RW, isci, 0,
 	    isci_sysctl_log_frozen_lun_masks, "IU",
 	    "Log frozen lun masks to kernel log");
+
+	SYSCTL_ADD_PROC(sysctl_ctx, SYSCTL_CHILDREN(sysctl_tree), OID_AUTO,
+	    "fail_on_task_timeout", CTLTYPE_UINT | CTLFLAG_RW, isci, 0,
+	    isci_sysctl_fail_on_task_timeout, "IU",
+	    "Fail a command that has encountered a task management timeout");
 }
 

Modified: stable/10/sys/dev/isci/isci_task_request.c
==============================================================================
--- stable/10/sys/dev/isci/isci_task_request.c Tue Jul 1 04:44:18 2014 (r268072)
+++ stable/10/sys/dev/isci/isci_task_request.c Tue Jul 1 06:23:48 2014 (r268073)
@@ -206,8 +206,17 @@ isci_task_request_complete(SCI_CONTROLLE
 		break;
 
 	case SCI_FAILURE_TIMEOUT:
-		retry_task = TRUE;
-		isci_log_message(0, "ISCI", "task timeout - retrying\n");
+		if (isci_controller->fail_on_task_timeout) {
+			retry_task = FALSE;
+			isci_log_message(0, "ISCI",
+			    "task timeout - not retrying\n");
+			scif_cb_domain_device_removed(isci_controller,
+			    isci_remote_device->domain, isci_remote_device);
+		} else {
+			retry_task = TRUE;
+			isci_log_message(0, "ISCI",
+			    "task timeout - retrying\n");
+		}
 		break;
 
 	case SCI_TASK_FAILURE:

Modified: stable/10/sys/dev/isci/scil/scic_sds_stp_request.c
==============================================================================
--- stable/10/sys/dev/isci/scil/scic_sds_stp_request.c Tue Jul 1 04:44:18 2014 (r268072)
+++ stable/10/sys/dev/isci/scil/scic_sds_stp_request.c Tue Jul 1 06:23:48 2014 (r268073)
@@ -1222,6 +1222,7 @@ SCI_STATUS scic_sds_stp_request_pio_data
          length -= copy_length;
          sgl_offset += copy_length;
          data_offset += copy_length;
+         source_address += copy_length;
 #endif
       }
    }