From owner-freebsd-bugs Wed Oct 18 8:10:11 2000 Delivered-To: freebsd-bugs@freebsd.org Received: from freefall.freebsd.org (freefall.FreeBSD.org [216.136.204.21]) by hub.freebsd.org (Postfix) with ESMTP id DA71A37B4F9 for ; Wed, 18 Oct 2000 08:10:01 -0700 (PDT) Received: (from gnats@localhost) by freefall.freebsd.org (8.9.3/8.9.2) id IAA54769; Wed, 18 Oct 2000 08:10:01 -0700 (PDT) (envelope-from gnats@FreeBSD.org) Received: by hub.freebsd.org (Postfix, from userid 32767) id 5946F37B4E5; Wed, 18 Oct 2000 08:05:56 -0700 (PDT) Message-Id: <20001018150556.5946F37B4E5@hub.freebsd.org> Date: Wed, 18 Oct 2000 08:05:56 -0700 (PDT) From: rasmith@aristotle.tamu.edu To: freebsd-gnats-submit@FreeBSD.org X-Send-Pr-Version: www-1.0 Subject: kern/22086: DMA errors during intensive disk activity on vinum volume Sender: owner-freebsd-bugs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org >Number: 22086 >Category: kern >Synopsis: DMA errors during intensive disk activity on vinum volume >Confidential: no >Severity: critical >Priority: high >Responsible: freebsd-bugs >State: open >Quarter: >Keywords: >Date-Required: >Class: sw-bug >Submitter-Id: current-users >Arrival-Date: Wed Oct 18 08:10:01 PDT 2000 >Closed-Date: >Last-Modified: >Originator: Robin Smith >Release: 4.1.1-RELEASE #1 >Organization: Dept. of Philosophy, Texas A&M University >Environment: FreeBSD test-drive.tamu.edu 4.1.1-RELEASE FreeBSD 4.1.1-RELEASE #1: Tue Oct 10 18:09:51 CDT 2000 root@test-drive.tamu.edu:/usr/src/sys/compile/TEST-DRIVE i386 >Description: System is a dual-PIII, 512MB, 4 AT disks (see below); a raid5 vinum volume is mounted as /usr (striped across all four disks). Heavy access to a vinum raid5 device causes DMA errors; if the system does not recover (i.e. a sync fails after multiple attempts), a reboot is triggered. The initial error message is of the following form: Oct 18 08:36:19 test-drive /kernel: ata0-slave: zero length DMA transfer attempted This message may mention either controller. If there is no recovery, then either the terminal hangs indefinitely or the system restarts. From a hang caused by 'du /usr' (which worked its way to /usr/src/games before giving up: Oct 18 08:36:19 test-drive /kernel: ata0-slave: zero length DMA transfer attempted Oct 18 08:36:29 test-drive /kernel: ad1: WRITE command timeout - resetting Oct 18 08:36:29 test-drive /kernel: ata0: resetting devices .. done Oct 18 08:36:29 test-drive /kernel: ata0-slave: zero length DMA transfer attempted The error message "zero length DMA transfer attempted" is found in / >How-To-Repeat: any intensive disk activity on the vinum raid5 volume triggers it. A good choice is 'du /usr'. Here is some tracking information on the error message "zero length DMA transfer" 512: grep -C10 "zero length DMA transfer" /usr/src/sys/dev/ata/ata-dma.c ata_dmasetup(struct ata_softc *scp, int32_t device, int8_t *data, int32_t count, int32_t flags) ### ^^^^ ARGUMENT 4 { struct ata_dmaentry *dmatab; u_int32_t dma_count, dma_base; int i = 0; if (((uintptr_t)data & 1) || (count & 1)) return -1; ##### FOR COUNT=0 if (!count) { ata_printf(scp, device, "zero length DMA transfer attempted\n"); return -1; } dmatab = scp->dmatab[ATA_DEV(device)]; dma_base = vtophys(data); dma_count = min(count, (PAGE_SIZE - ((uintptr_t)data & PAGE_MASK))); data += dma_count; count -= dma_count; while (count) { Called by: 516: grep dmasetup /usr/src/sys/dev/ata/* |less /usr/src/sys/dev/ata/ata-all.h:int32_t ata_dmasetup(struct ata_softc *, int32_t, int8_t *, int32_t, int32_t); /usr/src/sys/dev/ata/ata-disk.c: !ata_dmasetup(adp->controller, adp-> unit, /usr/src/sys/dev/ata/ata-dma.c:ata_dmasetup(struct ata_softc *scp, int32_t devic e, /usr/src/sys/dev/ata/ata-dma.c:ata_dmasetup(struct ata_softc *scp, int32_t devic e, /usr/src/sys/dev/ata/atapi-all.c: !ata_dmasetup(atp->controller, atp->unit , In ad_transfer(struct ad_request *request): 524: grep -C6 -n dmasetup /usr/src/sys/dev/ata/ata-disk.c 394- 395- devstat_start_transaction(&adp->stats); 396- 397- /* does this drive & transfer work with DMA ? */ 398- request->flags &= ~ADR_F_DMA_USED; 399- if ((adp->controller->mode[ATA_DEV(adp->unit)] >= ATA_DMA) && 400: !ata_dmasetup(adp->controller, adp->unit, 401- (void *)request->data, request->bytecount, ####### ^^^^^^^^^ = count #### 402- (request->flags & ADR_F_READ))) { 403- request->flags |= ADR_F_DMA_USED; 404- cmd = request->flags&ADR_F_READ ? ATA_C_READ_DMA : ATA_C_WRITE_DMA; 405- request->currentsize = request->bytecount; 406- } The only call to ata_dmasetup in ata-disk is from ad_transfer. There are two calls to ad_transfer: At line 570: else { request->bytecount -= request->currentsize; request->donecount += request->currentsize; if (request->bytecount > 0) { ad_transfer(request); return ATA_OP_CONTINUES; } At line 290: while (request.bytecount > 0) { ad_transfer(&request); Both are protected by comparisons; bytecount==0 should never happen. root@test-drive [/usr/home/rasmith] 525: grep -C6 -n dmasetup /usr/src/sys/dev/ata/atapi-all.c 254- if ((atp->controller->mode[ATA_DEV(atp->unit)] >= ATA_DMA) && 255- (request->ccb[0] == ATAPI_READ || 256- request->ccb[0] == ATAPI_READ_BIG || 257- ((request->ccb[0] == ATAPI_WRITE || 258- request->ccb[0] == ATAPI_WRITE_BIG) && 259- !(atp->controller->flags & ATA_ATAPI_DMA_RO))) && 260: !ata_dmasetup(atp->controller, atp->unit, 261- (void *)request->data, request->bytecount, ########### ^^^^^^^^^^^^^ = count #### 262- request->flags & ATPR_F_READ)) { 263- request->flags |= ATPR_F_DMA_USED; 264- } 265- 266- /* start ATAPI operation */ 518: cat /var/log/mount.today /dev/ad0s1a / ufs rw 1 1 /dev/ad0s1e /mnt/tmp ufs rw 2 2 /dev/ad0s1f /var ufs rw 2 2 /dev/vinum/raid5 /usr ufs rw 2 2 procfs /proc procfs rw 0 0 520: cat /var/log/dmesg.today Copyright (c) 1992-2000 The FreeBSD Project. Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994 The Regents of the University of California. All rights reserved. FreeBSD 4.1.1-RELEASE #1: Tue Oct 10 18:09:51 CDT 2000 root@test-drive.tamu.edu:/usr/src/sys/compile/TEST-DRIVE Timecounter "i8254" frequency 1193182 Hz CPU: Pentium III/Pentium III Xeon/Celeron (501.14-MHz 686-class CPU) Origin = "GenuineIntel" Id = 0x683 Stepping = 3 Features=0x383fbff real memory = 536870912 (524288K bytes) avail memory = 518524928 (506372K bytes) Programming 24 pins in IOAPIC #0 IOAPIC #0 intpin 2 -> irq 0 IOAPIC #0 intpin 16 -> irq 5 IOAPIC #0 intpin 17 -> irq 11 IOAPIC #0 intpin 18 -> irq 10 IOAPIC #0 intpin 19 -> irq 9 FreeBSD/SMP: Multiprocessor motherboard cpu0 (BSP): apic id: 0, version: 0x00040011, at 0xfee00000 cpu1 (AP): apic id: 1, version: 0x00040011, at 0xfee00000 io0 (APIC): apic id: 2, version: 0x00170011, at 0xfec00000 Preloaded elf kernel "kernel" at 0xc041d000. Pentium Pro MTRR support enabled md0: Malloc disk npx0: on motherboard npx0: INT 16 interface pcib0: on motherboard pci0: on pcib0 pcib2: at device 1.0 on pci0 pci1: on pcib2 pci1: at 0.0 irq 11 isab0: at device 7.0 on pci0 isa0: on isab0 atapci0: port 0xffa0-0xffaf at device 7.1 on pci0 ata0: at 0x1f0 irq 14 on atapci0 ata1: at 0x170 irq 15 on atapci0 uhci0: port 0xef80-0xef9f irq 9 at devic e 7.2 on pci0 usb0: on uhci0 usb0: USB revision 1.0 uhub0: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1 uhub0: 2 ports with 2 removable, self powered Timecounter "PIIX" frequency 3579545 Hz chip1: port 0x440-0x44f at device 7.3 on pci0 xl0: <3Com 3c905B-TX Fast Etherlink XL> port 0xec00-0xec7f mem 0xfebfef80-0xfebfef ff irq 5 at device 15.0 on pci0 xl0: Ethernet address: 00:01:02:be:6d:85 miibus0: on xl0 xlphy0: <3Com internal media interface> on miibus0 xlphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto xl1: <3Com 3c905B-TX Fast Etherlink XL> port 0xe480-0xe4ff mem 0xfebfef00-0xfebfef 7f irq 11 at device 16.0 on pci0 xl1: Ethernet address: 00:01:02:be:6d:41 miibus1: on xl1 xlphy1: <3Com internal media interface> on miibus1 xlphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto ahc0: port 0xe800-0xe8ff mem 0xfebff000-0xfe bfffff irq 10 at device 18.0 on pci0 aic7892: Wide Channel A, SCSI Id=7, 32/255 SCBs pcib1: on motherboard pci2: on pcib1 fdc0: at port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on isa0 fdc0: FIFO enabled, 8 bytes threshold atkbdc0: at port 0x60,0x64 on isa0 atkbd0: flags 0x1 irq 1 on atkbdc0 kbd0 at atkbd0 psm0: irq 12 on atkbdc0 psm0: model MouseMan+, device ID 0 vga0: at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0 sc0: at flags 0x100 on isa0 sc0: VGA <16 virtual consoles, flags=0x300> sio0 at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0 sio0: type 16550A sio1 at port 0x2f8-0x2ff irq 3 on isa0 sio1: type 16550A ppc0: at port 0x378-0x37f irq 7 on isa0 lip0: on ppbus0 lpt0: on ppbus0 lpt0: Interrupt-driven port ppi0: on ppbus0 APIC_IO: Testing 8254 interrupt delivery APIC_IO: routing 8254 via IOAPIC #0 intpin 2 SMP: AP CPU #1 Launched! ad0: 29314MB [59560/16/63] at ata0-master using UDMA33 ad1: 29314MB [59560/16/63] at ata0-slave using UDMA33 ad2: 29314MB [59560/16/63] at ata1-master using UDMA33 ad3: 29314MB [59560/16/63] at ata1-slave using UDMA33 Waiting 15 seconds for SCSI devices to settle Mounting root from ufs:/dev/ad0s1a WARNING: / was not properly dismounted vinum: loaded vinum: reading configuration from /dev/ad3s1e vinum: updating configuration from /dev/ad2s1e vinum: updating configuration from /dev/ad1s1e vinum: updating configuration from /dev/ad0s1h cd0 at ahc0 bus 0 target 4 lun 0 cd0: Removable CD-ROM SCSI-2 device cd0: 20.000MB/s transfers (20.000MHz, offset 16) cd0: cd present [328499 x 2048 byte records] >Fix: None known >Release-Note: >Audit-Trail: >Unformatted: To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-bugs" in the body of the message