Date:      Thu, 14 Dec 2000 03:27:19 GMT
From:      Tor Egge <tegge@cvsup.no.freebsd.org>
To:        FreeBSD-gnats-submit@freebsd.org
Subject:   kern/23538: ata device driver fails to abort queued commands when device disappears
Message-ID:  <200012140327.eBE3RJ501463@c2h5oh.idi.ntnu.no>
Resent-Message-ID: <200012140330.eBE3U1R02094@freefall.freebsd.org>

>Number:         23538
>Category:       kern
>Synopsis:       ata device driver fails to abort queued commands when device disappears
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Wed Dec 13 19:30:01 PST 2000
>Closed-Date:
>Last-Modified:
>Originator:     Tor Egge
>Release:        FreeBSD 4.2-RELEASE i386
>Organization:
Fast Search & Transfer ASA
>Environment:

FreeBSD c2h5oh.idi.ntnu.no 4.2-RELEASE FreeBSD 4.2-RELEASE #0: Fri Nov 24 15:04:56 GMT 2000     root@c2h5oh.idi.ntnu.no:/usr/src/sys/compile/VINUM  i386

atapci0: <Intel PIIX4 ATA33 controller> port 0xffa0-0xffaf at device 7.1 on pci0
ata0: at 0x1f0 irq 14 on atapci0
ata1: at 0x170 irq 15 on atapci0
ad0: 29314MB <IBM-DTLA-307030> [59560/16/63] at ata0-master UDMA33
ad2: 29314MB <IBM-DTLA-307030> [59560/16/63] at ata1-master UDMA33

>Description:

When a drive hangs completely, it apparently disappears in part from the
ata configuration while the devices are being reset.

	ad0: READ command timeout tag=0 serv=0 - resetting
	ata0: resetting devices .. done
	ad0: READ command timeout tag=0 serv=0 - resetting
	ata0: resetting devices .. done
	ad0: READ command timeout tag=0 serv=0 - resetting
	ad0: READ command timeout tag=0 serv=0 - resetting
	ad0: READ command timeout tag=0 serv=0 - resetting
	ata0-master: timeout waiting for command=ef s=00 e=00
	ad0: trying fallback to PIO mode
	ad0: READ command timeout tag=0 serv=0 - resetting
	ad0: READ command timeout tag=0 serv=0 - resetting
	ad0: READ command timeout tag=0 serv=0 - resetting
	vinum2.p0.s0: fatal read I/O error
	vinum: vinum2.p0.s0 is crashed by force
	vinum: vinum2.p0 is faulty

(kgdb) print *(struct ata_softc *) (((device_t) ata_devclass->devices[0])->softc)
$47 = {dev = 0xc3286780, channel = 0, r_io = 0xc3282700, r_altio = 0xc3282680, 
  r_bmio = 0xc3286804, r_irq = 0xc3282600, ih = 0xc0e6e920, ioaddr = 496, 
  altioaddr = 1014, bmaddr = 65440, chiptype = 1896972422, alignment = 1, 
  dev_param = {0xc3284200, 0x0}, dev_softc = {0xc331b400, 0x0}, mode = {0, 0}, 
  flags = 16, devices = 0, status = 0 '\000', error = 0 '\000', active = 0, 
  ata_queue = {tqh_first = 0x0, tqh_last = 0xc32866d8}, atapi_queue = {
    tqh_first = 0x0, tqh_last = 0xc32866e0}, running = 0x0}
(kgdb) print *((struct ad_softc *) ((struct ata_softc *) (((device_t) ata_devclass->devices[0])->softc))->dev_softc[0])
$48 = {controller = 0xc3286680, unit = 0, lun = 0, total_secs = 60036480, 
  heads = 16 '\020', sectors = 63 '?', transfersize = 8192, num_tags = 0, 
  flags = 2, tags = {0x0 <repeats 32 times>}, outstanding = -41300580, 
  queue = {queue = {tqh_first = 0xc4d5ae20, tqh_last = 0xc4d59638}, 
    last_pblkno = 21131160, insert_point = 0x0, switch_point = 0xcb30a2d8}, 
  stats = {dev_links = {stqe_next = 0xc331b0b8}, device_number = 1, 
    device_name = "ad", '\000' <repeats 13 times>, unit_number = 0, 
    bytes_read = 176262452224, bytes_written = 14137288704, bytes_freed = 0, 
    num_reads = 40563316, num_writes = 737265, num_frees = 0, num_other = 0, 
    busy_count = 7, block_size = 512, tag_types = {0, 0, 0}, 
    dev_creation_time = {tv_sec = 0, tv_usec = 30618}, busy_time = {
      tv_sec = 204575, tv_usec = 389409}, start_time = {tv_sec = 1625444, 
      tv_usec = 220943}, last_comp_time = {tv_sec = 1626835, 
      tv_usec = 368335}, flags = DEVSTAT_NO_ORDERED_TAGS, 
    device_type = DEVSTAT_TYPE_IF_IDE, priority = DEVSTAT_PRIORITY_DISK}, 
  disk = {d_flags = 0, d_dsflags = 0, d_devsw = 0xc02d8720, 
    d_dev = 0xc3318380, d_slice = 0xc3349800, d_label = {d_magic = 0, 
      d_type = 0, d_subtype = 0, d_typename = '\000' <repeats 15 times>, 
      d_un = {un_d_packname = '\000' <repeats 15 times>, un_b = {
          un_d_boot0 = 0x0, un_d_boot1 = 0x0}}, d_secsize = 512, 
      d_nsectors = 63, d_ntracks = 16, d_ncylinders = 59560, 
      d_secpercyl = 1008, d_secperunit = 60036480, d_sparespertrack = 0, 
      d_sparespercyl = 0, d_acylinders = 0, d_rpm = 0, d_interleave = 0, 
      d_trackskew = 0, d_cylskew = 0, d_headswitch = 0, d_trkseek = 0, 
      d_flags = 0, d_drivedata = {0, 0, 0, 0, 0}, d_spare = {0, 0, 0, 0, 0}, 
      d_magic2 = 0, d_checksum = 0, d_npartitions = 0, d_bbsize = 0, 
      d_sbsize = 0, d_partitions = {{p_size = 0, p_offset = 0, p_fsize = 0, 
          p_fstype = 0 '\000', p_frag = 0 '\000', __partition_u1 = {cpg = 0, 
            sgs = 0}}, {p_size = 0, p_offset = 0, p_fsize = 0, 
          p_fstype = 0 '\000', p_frag = 0 '\000', __partition_u1 = {cpg = 0, 
            sgs = 0}}, {p_size = 0, p_offset = 0, p_fsize = 0, 
          p_fstype = 0 '\000', p_frag = 0 '\000', __partition_u1 = {cpg = 0, 
            sgs = 0}}, {p_size = 0, p_offset = 0, p_fsize = 0, 
          p_fstype = 0 '\000', p_frag = 0 '\000', __partition_u1 = {cpg = 0, 
            sgs = 0}}, {p_size = 0, p_offset = 0, p_fsize = 0, 
          p_fstype = 0 '\000', p_frag = 0 '\000', __partition_u1 = {cpg = 0, 
            sgs = 0}}, {p_size = 0, p_offset = 0, p_fsize = 0, 
          p_fstype = 0 '\000', p_frag = 0 '\000', __partition_u1 = {cpg = 0, 
            sgs = 0}}, {p_size = 0, p_offset = 0, p_fsize = 0, 
          p_fstype = 0 '\000', p_frag = 0 '\000', __partition_u1 = {cpg = 0, 
            sgs = 0}}, {p_size = 0, p_offset = 0, p_fsize = 0, 
          p_fstype = 0 '\000', p_frag = 0 '\000', __partition_u1 = {cpg = 0, 
            sgs = 0}}}}}, dev1 = 0xc3318480, dev2 = 0xc3318380}


Note that devices is now 0 on ata0, so the commands queued for ad0 are
never removed from the queue.  This is bad, since accesses to other
vinum drives on the same physical disk will never fail; they just block
indefinitely.

bash-2.04$ ps axl  -N kernel.4 -M vmcore.4 
  UID   PID  PPID CPU PRI NI   VSZ  RSS WCHAN  STAT  TT       TIME COMMAND
    0   238     1   1   3  0   928    0 ttyin  Is+  #C5    0:00.01  (getty)
    0   245     1   1   3  0   928    0 ttyin  Is+  #C2    0:00.01  (getty)
    0   244     1   0   3  0   928    0 ttyin  Is+  #C2    0:00.01  (getty)
    0   243     1   0   3  0   928    0 ttyin  Is+  #C2    0:00.01  (getty)
    0   242     1   1   3  0   928    0 ttyin  Is+  #C2    0:00.02  (getty)
    0   241     1   1   3  0   928    0 ttyin  Is+  #C2    0:00.02  (getty)
    0   240     1   0   3  0   928    0 ttyin  Is+  #C2    0:00.02  (getty)
    0   239     1   0   3  0   928    0 ttyin  Is+  #C2    0:00.02  (getty)
    0     0     0   0 -18  0     0    0 sched  DLs   ??    0:02.45  (swapper)
    0     1     0   0  10  0   528    0 wait   ILs   ??    0:00.43  (init)
    0     2     0   0 -18  0     0    0 psleep DL    ??   13:22.04  (pagedaemon)
    0     3     0   0  18  0     0    0 psleep DL    ??    0:00.00  (vmdaemon)
    0     4     0   0 -18  0     0    0 psleep DL    ??    0:04.34  (bufdaemon)
    0     5     0   0  -2  0     0    0 getblk DL    ??  105:11.54  (syncer)
    0    63     1   0  -6  0   596    0 biowr  DLs   ??    0:00.01  (vinum)
    0   166     1   0  -2  0   924    0 ffsfsn Ds    ??    0:26.47  (syslogd)
    0   173     1   0  -6 -12  1260    0 biord  D<s   ??    3:38.62  (ntpd)
    0   194     1   0  -6  0   972    0 biord  Ds    ??    0:19.50  (cron)
    0   197     1   0 -18  0  2096    0 spread DLs   ??    0:19.77  (sshd)
 1001   232     1   0  -6  0  1636    0 biord  Ds    ??    0:35.23  (cvsupd)
 1001 53151   232  28  -6  0  2740    0 biord  D     ??    2:12.81  (cvsupd)
 1001 53155   232   1  -6  0  3352    0 biord  D     ??    0:34.44  (cvsupd)
 1001 53159   232   2 -14  0  3084    0 inode  D     ??    0:05.40  (cvsupd)

>How-To-Repeat:

Use two IDE disks (one slightly bad), and use disk partitioning to get
more than one vinum logical drive on each physical disk.

>Fix:

Change ata_reinit to check for scp->devices being changed during the
ata_reset call and flush the request queues for the 'gone' devices by
setting b_error to ENXIO, setting the B_ERROR bit in b_flags and calling
biodone.
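
The queue flush described above can be sketched in userland C.  This is
only an illustration, not the driver's code: struct buf, B_ERROR and
biodone here are simplified mocks of the kernel versions, and
devices_gone, flush_queue, struct buf_queue and the b_done field are
hypothetical names introduced for the sketch.  It assumes the devices
field is a bitmask with one bit per attached device.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/queue.h>

/* Mock of the kernel flag; the value is a stand-in for illustration. */
#define B_ERROR 0x0800

/* Simplified mock of struct buf; b_done is hypothetical. */
struct buf {
	int b_flags;
	int b_error;
	int b_done;			/* set by the mock biodone() */
	TAILQ_ENTRY(buf) b_act;		/* queue linkage */
};

TAILQ_HEAD(buf_queue, buf);

/* Mock of biodone(); the real one notifies whoever waits on the buf. */
static void
biodone(struct buf *bp)
{
	bp->b_done = 1;
}

/* Bits set before the reset but clear afterwards are 'gone' devices. */
static int
devices_gone(int before, int after)
{
	return (before & ~after);
}

/*
 * Fail every request still queued for a gone device with ENXIO.
 * Returns the number of requests flushed.
 */
static int
flush_queue(struct buf_queue *q)
{
	struct buf *bp;
	int n = 0;

	assert(q != NULL);
	while ((bp = TAILQ_FIRST(q)) != NULL) {
		TAILQ_REMOVE(q, bp, b_act);
		bp->b_error = ENXIO;
		bp->b_flags |= B_ERROR;
		biodone(bp);
		n++;
	}
	return (n);
}
```

In this sketch the caller would save the device mask before the reset,
compare it with the mask afterwards, and call flush_queue once for each
device whose bit shows up in devices_gone.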

Change adstrategy to check for the device being 'gone' and return
ENXIO at once if so.
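
The early return in the strategy routine might look roughly like the
following userland sketch.  Everything here is an assumption made for
illustration: AD_F_GONE, struct ad_softc_sketch, adstrategy_sketch and
the queued counter are hypothetical, and struct buf, B_ERROR and biodone
are simplified mocks rather than the kernel definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Mock of the kernel flag; the value is a stand-in for illustration. */
#define B_ERROR   0x0800
/* Hypothetical per-device flag marking a device that vanished. */
#define AD_F_GONE 0x0100

/* Simplified mock of struct buf; b_done is hypothetical. */
struct buf {
	int b_flags;
	int b_error;
	int b_done;		/* set by the mock biodone() */
};

/* Hypothetical stand-in for the per-disk softc. */
struct ad_softc_sketch {
	int flags;
	int queued;		/* number of requests accepted for I/O */
};

/* Mock of biodone(); the real one notifies whoever waits on the buf. */
static void
biodone(struct buf *bp)
{
	bp->b_done = 1;
}

/* Reject new requests for a gone device instead of queueing them. */
static void
adstrategy_sketch(struct ad_softc_sketch *adp, struct buf *bp)
{
	assert(adp != NULL && bp != NULL);
	if (adp->flags & AD_F_GONE) {
		bp->b_error = ENXIO;
		bp->b_flags |= B_ERROR;
		biodone(bp);
		return;
	}
	adp->queued++;		/* stands in for queueing and starting I/O */
}
```

With a check like this, a caller such as vinum would see the error
immediately rather than sleeping on a request that can never complete.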

>Release-Note:
>Audit-Trail:
>Unformatted:

