From: Andriy Gapon
Date: Thu, 14 Oct 2010 20:20:49 +0300
To: "Sam Fourman Jr."
Cc: FreeBSD Current, Martin Matuska
Subject: Re: Locked up nfsd after avg@ sendfile patch
Message-ID: <4CB73BF1.1070400@freebsd.org>
References: <4CB5D5E1.9080505@freebsd.org> <4CB5FDC0.6000701@freebsd.org>
In-Reply-To: <4CB5FDC0.6000701@freebsd.org>
List-Id: Discussions about the use of FreeBSD-current

on 13/10/2010 21:43 Andriy Gapon said the following:
> Further walking child zio hierarchy we reach the one that looks like this:
> $59 = {io_bookmark = {zb_objset = 400, zb_object = 0, zb_level = -1, zb_blkid =
> 22437}, io_prop = {zp_checksum = ZIO_CHECKSUM_INHERIT, zp_compress =
> ZIO_COMPRESS_INHERIT, zp_type = DMU_OT_NONE,
> zp_level = 0 '\0', zp_ndvas = 0 '\0'}, io_type = ZIO_TYPE_WRITE, io_child_type
> = ZIO_CHILD_VDEV, io_cmd = 0, io_priority = 0 '\0',
> io_reexecute = 0 '\0',
> io_state = "\001", io_txg = 0,
> io_spa = 0xffffff00056c6000, io_bp = 0xffffff01acdbaa30, io_bp_copy = {blk_dva =
> {{dva_word = {12884902144, 1678614837}}, {dva_word = {0, 0}}, {dva_word = {0,
> 0}}}, blk_prop = 9225910817809957119,
> blk_pad = {0, 0, 0}, blk_birth = 236695, blk_fill = 0, blk_cksum = {zc_word =
> {15569186404091016741, 3408946246337318984, 400, 22437}}}, io_parent_list =
> {list_size = 48, list_offset = 16,
> list_head = {list_next = 0xffffff000826b7c0, list_prev = 0xffffff000826b7c0}},
> io_child_list = {list_size = 48, list_offset = 32, list_head = {list_next =
> 0xffffff00080aca98,
> list_prev = 0xffffff00080aca98}}, io_walk_link = 0x0, io_logical =
> 0xffffff0008b8d660, io_transform_stack = 0x0, io_ready = 0, io_done =
> 0xffffffff80b99ab0 ,
> io_private = 0xffffff00b5f469a8, io_bp_orig = {blk_dva = {{dva_word =
> {12884902144, 1678614837}}, {dva_word = {0, 0}}, {dva_word = {0, 0}}}, blk_prop =
> 9225910817809957119, blk_pad = {0, 0, 0},
> blk_birth = 236695, blk_fill = 0, blk_cksum = {zc_word =
> {15569186404091016741, 3408946246337318984, 400, 22437}}}, io_data =
> 0xffffff80e6565000, io_size = 131072, io_vd = 0xffffff00084cd000,
> io_vsd = 0x0, io_vsd_free = 0, io_offset = 859454990848, io_deadline = 20883,
> io_offset_node = {avl_child = {0x0, 0x0}, avl_pcb = 18446742974333891893},
> io_deadline_node = {avl_child = {0x0, 0x0},
> avl_pcb = 1}, io_vdev_tree = 0xffffff00084cd578, io_flags = 179, io_stage =
> ZIO_STAGE_VDEV_IO_START, io_pipeline = 47104, io_orig_flags = 131, io_orig_stage =
> ZIO_STAGE_READY,
> io_orig_pipeline = 47104, io_error = 0, io_child_error = {0, 0, 0}, io_children
> = {{0, 0}, {0, 0}, {0, 0}}, io_stall = 0x0, io_gang_leader = 0x0, io_gang_tree = 0x0,
> io_executor = 0xffffff000875a8a0, io_waiter = 0x0, io_lock = {lock_object =
> {lo_name = 0xffffffff80c29a8b "zio->io_lock", lo_flags = 40960000, lo_data = 0,
> lo_witness = 0x0}, sx_lock = 1},
> io_cv = {cv_description = 0xffffffff80c29a9a "zio->io_cv)", cv_waiters = 0},
> io_ena = 0, io_task = {ost_task = {ta_running = 0x0, ta_link = {stqe_next = 0x0},
> ta_pending = 0, ta_priority = 0,
> ta_func = 0, ta_context = 0x0}, ost_func = 0, ost_arg = 0x0, ost_magic = 0}}

So, after some more investigation, it looks like this zio is genuinely stuck:
its bio is stuck in GEOM because its ccb/command is stuck in arcmsr.
Looks like the driver (controller/firmware) isn't processing any more requests.
Perhaps a hardware issue, but I reckon that the driver should have detected the
situation, timed out the commands and reset the hardware (if needed).

Anyway, it looks like this is not related to ZFS[*].
Maybe the firmware and BIOS should be updated, maybe the hardware replaced.

[*] Perhaps ZFS should have its own zio timeout mechanism. And/or GEOM. And/or
the peripheral or transport layer of CAM. But, IMO, the SIM drivers must have it.

-- 
Andriy Gapon
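
To illustrate the kind of command timeout a SIM driver could implement, here is
a minimal sketch in plain C. It is not arcmsr or CAM code: struct cmd,
watchdog_scan() and needs_reset() are hypothetical names, time is an abstract
tick counter, and the "reset when every outstanding command has timed out"
policy is only an assumption made for the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-flight command record; NOT the real arcmsr/CAM ccb. */
struct cmd {
	uint64_t issued_at;	/* tick when the command went to the hardware */
	uint64_t timeout;	/* ticks allowed before it counts as stuck */
	bool	 done;		/* set by the completion path */
	bool	 timed_out;	/* set by the watchdog below */
};

/*
 * Mark every outstanding command whose deadline has passed.
 * Assumes now >= issued_at for all commands.
 * Returns the number of commands newly marked as timed out.
 */
static size_t
watchdog_scan(struct cmd *cmds, size_t n, uint64_t now)
{
	size_t expired = 0;

	for (size_t i = 0; i < n; i++) {
		struct cmd *c = &cmds[i];

		if (c->done || c->timed_out)
			continue;
		if (now - c->issued_at >= c->timeout) {
			c->timed_out = true;
			expired++;
		}
	}
	return (expired);
}

/*
 * Assumed escalation policy: ask for a controller reset only when at
 * least one command is outstanding and all outstanding commands have
 * timed out, i.e. the controller is making no progress at all.
 */
static bool
needs_reset(const struct cmd *cmds, size_t n)
{
	size_t outstanding = 0, stuck = 0;

	for (size_t i = 0; i < n; i++) {
		if (cmds[i].done)
			continue;
		outstanding++;
		if (cmds[i].timed_out)
			stuck++;
	}
	return (outstanding > 0 && stuck == outstanding);
}
```

In a real driver the scan would run from a periodic timer, timed-out commands
would be completed with an error status so the upper layers (CAM, GEOM, ZFS)
get their requests back, and the reset would be the last resort.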