From owner-freebsd-bugs Wed Dec 13 19:30:10 2000
From owner-freebsd-bugs@FreeBSD.ORG Wed Dec 13 19:30:01 2000
Return-Path:
Delivered-To: freebsd-bugs@hub.freebsd.org
Received: from freefall.freebsd.org (freefall.FreeBSD.org [216.136.204.21]) by hub.freebsd.org (Postfix) with ESMTP id D947737B698 for ; Wed, 13 Dec 2000 19:30:01 -0800 (PST)
Received: (from gnats@localhost) by freefall.freebsd.org (8.11.1/8.11.1) id eBE3U1R02094; Wed, 13 Dec 2000 19:30:01 -0800 (PST) (envelope-from gnats)
Resent-Date: Wed, 13 Dec 2000 19:30:01 -0800 (PST)
Resent-Message-Id: <200012140330.eBE3U1R02094@freefall.freebsd.org>
Resent-From: gnats-admin@FreeBSD.org (GNATS Management)
Resent-To: freebsd-bugs@FreeBSD.org
Resent-Reply-To: gnats-admin@FreeBSD.org, tegge@cvsup.no.freebsd.org
Received: from midten.fast.no (midten.fast.no [213.188.8.11]) by hub.freebsd.org (Postfix) with ESMTP id F359E37B699 for ; Wed, 13 Dec 2000 19:27:21 -0800 (PST)
Received: from c2h5oh.idi.ntnu.no (c2h5oh.idi.ntnu.no [129.241.103.69]) by midten.fast.no (8.9.3/8.9.3) with ESMTP id EAA85717 for ; Thu, 14 Dec 2000 04:27:20 +0100 (CET)
Received: (from tegge@localhost) by c2h5oh.idi.ntnu.no (8.11.1/8.11.1) id eBE3RJ501463; Thu, 14 Dec 2000 03:27:19 GMT (envelope-from tegge)
Message-Id: <200012140327.eBE3RJ501463@c2h5oh.idi.ntnu.no>
Date: Thu, 14 Dec 2000 03:27:19 GMT
From: Tor Egge
Reply-To: tegge@cvsup.no.freebsd.org
To: FreeBSD-gnats-submit@freebsd.org
X-Send-Pr-Version: 3.2
Subject: kern/23538: ata device driver fails to abort queued commands when device disappears
Resent-Sender: gnats@FreeBSD.org
Sender: owner-freebsd-bugs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

>Number: 23538
>Category: kern
>Synopsis: ata device driver fails to abort queued commands when device disappears
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: freebsd-bugs
>State: open
>Quarter:
>Keywords:
>Date-Required:
>Class: sw-bug
>Submitter-Id: current-users
>Arrival-Date: Wed Dec 13 19:30:01 PST 2000
>Closed-Date:
>Last-Modified:
>Originator: Tor Egge
>Release: FreeBSD 4.2-RELEASE i386
>Organization: Fast Search & Transfer ASA
>Environment:

FreeBSD c2h5oh.idi.ntnu.no 4.2-RELEASE FreeBSD 4.2-RELEASE #0: Fri Nov 24 15:04:56 GMT 2000 root@c2h5oh.idi.ntnu.no:/usr/src/sys/compile/VINUM i386

atapci0: port 0xffa0-0xffaf at device 7.1 on pci0
ata0: at 0x1f0 irq 14 on atapci0
ata1: at 0x170 irq 15 on atapci0
ad0: 29314MB [59560/16/63] at ata0-master UDMA33
ad2: 29314MB [59560/16/63] at ata1-master UDMA33

>Description:

When a drive hangs completely, it apparently partially disappears from the ata configuration while the devices are being reset.

ad0: READ command timeout tag=0 serv=0 - resetting
ata0: resetting devices .. done
ad0: READ command timeout tag=0 serv=0 - resetting
ata0: resetting devices .. done
ad0: READ command timeout tag=0 serv=0 - resetting
ad0: READ command timeout tag=0 serv=0 - resetting
ad0: READ command timeout tag=0 serv=0 - resetting
ata0-master: timeout waiting for command=ef s=00 e=00
ad0: trying fallback to PIO mode
ad0: READ command timeout tag=0 serv=0 - resetting
ad0: READ command timeout tag=0 serv=0 - resetting
ad0: READ command timeout tag=0 serv=0 - resetting
vinum2.p0.s0: fatal read I/O error
vinum: vinum2.p0.s0 is crashed by force
vinum: vinum2.p0 is faulty

(kgdb) print *(struct ata_softc *) (((device_t) ata_devclass->devices[0])->softc)
$47 = {dev = 0xc3286780, channel = 0, r_io = 0xc3282700, r_altio = 0xc3282680, r_bmio = 0xc3286804, r_irq = 0xc3282600, ih = 0xc0e6e920, ioaddr = 496, altioaddr = 1014, bmaddr = 65440, chiptype = 1896972422, alignment = 1, dev_param = {0xc3284200, 0x0}, dev_softc = {0xc331b400, 0x0}, mode = {0, 0}, flags = 16, devices = 0, status = 0 '\000', error = 0 '\000', active = 0, ata_queue = {tqh_first = 0x0, tqh_last = 0xc32866d8}, atapi_queue = { tqh_first = 0x0, tqh_last = 0xc32866e0}, running = 0x0}
(kgdb) print *((struct ad_softc *) ((struct ata_softc *) (((device_t) ata_devclass->devices[0])->softc))->dev_softc[0])
$48 =
{controller = 0xc3286680, unit = 0, lun = 0, total_secs = 60036480, heads = 16 '\020', sectors = 63 '?', transfersize = 8192, num_tags = 0, flags = 2, tags = {0x0 }, outstanding = -41300580, queue = {queue = {tqh_first = 0xc4d5ae20, tqh_last = 0xc4d59638}, last_pblkno = 21131160, insert_point = 0x0, switch_point = 0xcb30a2d8}, stats = {dev_links = {stqe_next = 0xc331b0b8}, device_number = 1, device_name = "ad", '\000' , unit_number = 0, bytes_read = 176262452224, bytes_written = 14137288704, bytes_freed = 0, num_reads = 40563316, num_writes = 737265, num_frees = 0, num_other = 0, busy_count = 7, block_size = 512, tag_types = {0, 0, 0}, dev_creation_time = {tv_sec = 0, tv_usec = 30618}, busy_time = { tv_sec = 204575, tv_usec = 389409}, start_time = {tv_sec = 1625444, tv_usec = 220943}, last_comp_time = {tv_sec = 1626835, tv_usec = 368335}, flags = DEVSTAT_NO_ORDERED_TAGS, device_type = DEVSTAT_TYPE_IF_IDE, priority = DEVSTAT_PRIORITY_DISK}, disk = {d_flags = 0, d_dsflags = 0, d_devsw = 0xc02d8720, d_dev = 0xc3318380, d_slice = 0xc3349800, d_label = {d_magic = 0, d_type = 0, d_subtype = 0, d_typename = '\000' , d_un = {un_d_packname = '\000' , un_b = { un_d_boot0 = 0x0, un_d_boot1 = 0x0}}, d_secsize = 512, d_nsectors = 63, d_ntracks = 16, d_ncylinders = 59560, d_secpercyl = 1008, d_secperunit = 60036480, d_sparespertrack = 0, d_sparespercyl = 0, d_acylinders = 0, d_rpm = 0, d_interleave = 0, d_trackskew = 0, d_cylskew = 0, d_headswitch = 0, d_trkseek = 0, d_flags = 0, d_drivedata = {0, 0, 0, 0, 0}, d_spare = {0, 0, 0, 0, 0}, d_magic2 = 0, d_checksum = 0, d_npartitions = 0, d_bbsize = 0, d_sbsize = 0, d_partitions = {{p_size = 0, p_offset = 0, p_fsize = 0, p_fstype = 0 '\000', p_frag = 0 '\000', __partition_u1 = {cpg = 0, sgs = 0}}, {p_size = 0, p_offset = 0, p_fsize = 0, p_fstype = 0 '\000', p_frag = 0 '\000', __partition_u1 = {cpg = 0, sgs = 0}}, {p_size = 0, p_offset = 0, p_fsize = 0, p_fstype = 0 '\000', p_frag = 0 '\000', __partition_u1 = {cpg = 0, sgs = 0}}, 
{p_size = 0, p_offset = 0, p_fsize = 0, p_fstype = 0 '\000', p_frag = 0 '\000', __partition_u1 = {cpg = 0, sgs = 0}},
{p_size = 0, p_offset = 0, p_fsize = 0, p_fstype = 0 '\000', p_frag = 0 '\000', __partition_u1 = {cpg = 0, sgs = 0}},
{p_size = 0, p_offset = 0, p_fsize = 0, p_fstype = 0 '\000', p_frag = 0 '\000', __partition_u1 = {cpg = 0, sgs = 0}},
{p_size = 0, p_offset = 0, p_fsize = 0, p_fstype = 0 '\000', p_frag = 0 '\000', __partition_u1 = {cpg = 0, sgs = 0}},
{p_size = 0, p_offset = 0, p_fsize = 0, p_fstype = 0 '\000', p_frag = 0 '\000', __partition_u1 = {cpg = 0, sgs = 0}}}}}, dev1 = 0xc3318480, dev2 = 0xc3318380}

Note that devices is now 0 on ata0, so the queued commands for ad0 are never removed from the queue. This is bad, since access to other vinum drives on the same physical disk will now never fail; it just blocks indefinitely.

bash-2.04$ ps axl -N kernel.4 -M vmcore.4
  UID   PID  PPID CPU PRI  NI   VSZ  RSS WCHAN  STAT  TT       TIME COMMAND
    0   238     1   1   3   0   928    0 ttyin  Is+   #C5   0:00.01 (getty)
    0   245     1   1   3   0   928    0 ttyin  Is+   #C2   0:00.01 (getty)
    0   244     1   0   3   0   928    0 ttyin  Is+   #C2   0:00.01 (getty)
    0   243     1   0   3   0   928    0 ttyin  Is+   #C2   0:00.01 (getty)
    0   242     1   1   3   0   928    0 ttyin  Is+   #C2   0:00.02 (getty)
    0   241     1   1   3   0   928    0 ttyin  Is+   #C2   0:00.02 (getty)
    0   240     1   0   3   0   928    0 ttyin  Is+   #C2   0:00.02 (getty)
    0   239     1   0   3   0   928    0 ttyin  Is+   #C2   0:00.02 (getty)
    0     0     0   0 -18   0     0    0 sched  DLs   ??    0:02.45 (swapper)
    0     1     0   0  10   0   528    0 wait   ILs   ??    0:00.43 (init)
    0     2     0   0 -18   0     0    0 psleep DL    ??   13:22.04 (pagedaemon)
    0     3     0   0  18   0     0    0 psleep DL    ??    0:00.00 (vmdaemon)
    0     4     0   0 -18   0     0    0 psleep DL    ??    0:04.34 (bufdaemon)
    0     5     0   0  -2   0     0    0 getblk DL    ??  105:11.54 (syncer)
    0    63     1   0  -6   0   596    0 biowr  DLs   ??    0:00.01 (vinum)
    0   166     1   0  -2   0   924    0 ffsfsn Ds    ??    0:26.47 (syslogd)
    0   173     1   0  -6 -12  1260    0 biord  D

>How-To-Repeat:

Use two IDE disks (one slightly bad), and use disk partitioning to get more than one vinum logical drive on each physical disk.
>Fix:

Change ata_reinit to check for scp->devices being changed during the ata_reset call and flush the request queues for the 'gone' devices by setting b_error to ENXIO, setting the B_ERROR bit in b_flags and calling biodone.

Change adstrategy to check for the device being 'gone' and return ENXIO at once if so.

>Release-Note:
>Audit-Trail:
>Unformatted:
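
The two changes proposed under >Fix can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not a patch against the 4.2-RELEASE ata driver: struct model_disk, model_reinit, model_strategy, model_flush_gone, model_biodone and the MODEL_* constants are hypothetical names invented for the example, the flag values are illustrative, and struct buf here is a cut-down stand-in with only the fields the >Fix text mentions.

```c
/* Userspace model only; NOT the FreeBSD ata driver source. */
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MODEL_B_ERROR    0x0800  /* illustrative stand-in for B_ERROR */
#define MODEL_ATA_MASTER 0x01    /* hypothetical "master present" bit */

struct buf {                     /* cut-down stand-in for the kernel struct */
    int b_error;
    int b_flags;
    int done;                    /* set by model_biodone() */
    struct buf *next;
};

struct model_disk {              /* hypothetical per-disk state */
    int gone;                    /* device vanished during a reset */
    struct buf *head, *tail;     /* pending request queue */
};

/* Stand-in for biodone(): just records that the request completed. */
static void model_biodone(struct buf *bp)
{
    bp->done = 1;
}

/* Fail one request the way the >Fix text describes. */
static void model_fail_buf(struct buf *bp, int error)
{
    assert(bp != NULL);
    bp->b_error = error;
    bp->b_flags |= MODEL_B_ERROR;
    model_biodone(bp);
}

static void model_enqueue(struct model_disk *dp, struct buf *bp)
{
    bp->next = NULL;
    if (dp->tail != NULL)
        dp->tail->next = bp;
    else
        dp->head = bp;
    dp->tail = bp;
}

/* Mark the disk gone and complete every queued request with ENXIO. */
static int model_flush_gone(struct model_disk *dp)
{
    struct buf *bp;
    int flushed = 0;

    dp->gone = 1;
    while ((bp = dp->head) != NULL) {
        dp->head = bp->next;
        bp->next = NULL;
        model_fail_buf(bp, ENXIO);
        flushed++;
    }
    dp->tail = NULL;
    return flushed;
}

/*
 * Model of the check proposed for ata_reinit: compare the device mask
 * from before and after the reset; if this disk's bit vanished, flush.
 * Returns the number of requests that were failed.
 */
static int model_reinit(struct model_disk *dp, int before, int after, int bit)
{
    if ((before & bit) != 0 && (after & bit) == 0)
        return model_flush_gone(dp);
    return 0;
}

/*
 * Model of the check proposed for adstrategy: a request for a gone
 * device is failed at once instead of being queued.
 */
static int model_strategy(struct model_disk *dp, struct buf *bp)
{
    if (dp->gone) {
        model_fail_buf(bp, ENXIO);
        return ENXIO;
    }
    model_enqueue(dp, bp);
    return 0;
}
```

In this model, calling model_reinit with a mask that had MODEL_ATA_MASTER set before the reset and 0 afterwards (matching devices = 0 in the ata_softc dump above) completes every queued request with ENXIO, and any later model_strategy call fails immediately rather than blocking.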