Date:      Sun, 20 Apr 1997 02:04:50 -0400 (EDT)
From:      Brian Tao <taob@nbc.netcom.ca>
To:        freebsd-scsi@freebsd.org
Cc:        freebsd-current@freebsd.org
Subject:   "Data overrun" with 3.0-SNAP, 2940UW controllers
Message-ID:  <Pine.GSO.3.95.970419233210.11036M-100000@tor-adm1.nbc.netcom.ca>

    I'm stress testing a new NFS server running 3.0-970209-SNAP to see
how it copes with a pair of Adaptec 2940UW controllers.  About half an
hour into the tests, the machine appears to have crashed (it no longer
responds to pings), and I don't have physical access to it right
now.  :(

    The system is a P200 with two 2940UWs and a Buslogic BT-946C with
no devices attached to it (it will be dedicated to a DLT drive in the
future).  The first Adaptec has four 4GB Seagate ST34371W drives and
the second has three.  The first drive (sd0) is the boot drive, and
the remaining six (sd1-sd6) are striped together into ccd0,
alternating between the two controllers.
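
    The message does not include the ccd configuration itself.  As a
rough illustration only, alternating the six drives across the two
controllers could look like the ccdconfig(8) invocation below; the
interleave of 16 sectors and the "e" partitions are hypothetical, not
taken from the actual setup.

```sh
# Hypothetical ccd0 layout: interleave and partition letters are assumed.
# Components alternate between controllers: sd1-sd3 are on ahc0 and
# sd4-sd6 are on ahc1, so consecutive stripes hit different adapters.
#         ccd  ileave flags components...
ccdconfig ccd0 16     none  /dev/sd1e /dev/sd4e /dev/sd2e /dev/sd5e /dev/sd3e /dev/sd6e
```

    The interleave is given in 512-byte sectors, so with a value of 16
consecutive 8 KB chunks of ccd0 land on sd1, sd4, sd2, sd5 and so on,
switching controllers on every chunk.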

    The tests consist of four scripts:  replicate the entire FreeBSD
source tree into a new directory and then delete it, touch 10000
randomly named files in a directory and delete them, dd /dev/zero into
files with random block sizes, and run bonnie.  Five copies of each
script are run concurrently.  Only three of the drives are shown here,
but the others have identical stats:

% iostat -w5 sd1 sd2 sd3
      tty           sd1           sd2           sd3         cpu
 tin tout  sps tps msps  sps tps msps  sps tps msps us ni sy in id
   0   19  192   7  0.0  193   8  0.0  192   7  0.0  1  0  3  0 96
   0   25 4359 159  0.0 4332 157  0.0 4470 159  0.0  1  0 65  6 28
   0   25 4525 155  0.0 4601 159  0.0 4414 151  0.0  1  0 66  5 28
   0   25 4450 159  0.0 4378 157  0.0 4492 158  0.0  1  0 63  6 29
   0   25 4825 183  0.0 4681 181  0.0 4690 177  0.0  1  0 70  7 23
   0   25 4521 173  0.0 4529 174  0.0 4622 174  0.0  0  0 73  6 20
   0   43 4664 180  0.0 4890 191  0.0 4547 175  0.0  0  0 66  7 27
   0   25 4246 152  0.0 4207 154  0.0 4484 161  0.0  1  0 67  5 27
   0   25 4204 157  0.0 4279 159  0.0 3955 147  0.0  2  0 59  6 34
   0   25 4596 174  0.0 4584 179  0.0 4611 176  0.0  0  0 73  6 21
   0   25 4454 158  0.0 4490 160  0.0 4497 160  0.0  0  0 66  6 28
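
    The iostat figures can be turned into rough throughput numbers.
This is a minimal sketch, assuming the "sps" column counts 512-byte
sectors per second (the sector size dmesg reports for these disks).
It skips the first data row, which is the since-boot average, and
averages the remaining samples.  Under that assumption, sd1's ten
5-second samples above average about 4484 sps, or roughly 2.3 MB/s
per drive.  The function name is made up for illustration.

```sh
#!/bin/sh
# Hypothetical helper (not from the original message): average the "sps"
# column for each drive in saved "iostat -w5 sdA sdB sdC" output and
# convert it to MB/s.  Assumes exactly three drives and that sps counts
# 512-byte sectors.
iostat_throughput() {
    awk '
        # The "tty sd1 sd2 sd3 cpu" header line names the three drives.
        $1 == "tty" { for (d = 0; d < 3; d++) name[d] = $(2 + d) }
        # Data rows start with the numeric "tin" column.
        $1 ~ /^[0-9]+$/ {
            if (++rows == 1) next    # first row is the since-boot average
            n++
            # sps for the three drives sits in fields 3, 6 and 9.
            for (d = 0; d < 3; d++) sum[d] += $(3 + 3 * d)
        }
        END {
            if (n == 0) exit 1       # no interval samples to average
            for (d = 0; d < 3; d++) {
                avg = sum[d] / n
                printf "%s: %.0f sps = %.2f MB/s\n", name[d], avg, avg * 512 / 1000000
            }
        }'
}
```

    Run it on saved output, for example "iostat_throughput < iostat.log".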


    The following messages started showing up in syslog during the
tests.  I tried mounting /dev/ccd0c both sync and async, and the
messages appeared either way.  Apart from the progress messages the
scripts log deliberately, nothing else was written to syslog.  Is
there anything I can do to correct this problem (which I believe is
related to the crash)?  As most people have suggested, I'm going to
try a pre-2.2 release to see if that fixes the problems.  dmesg output
is included below as well.

% fgrep 'data overrun' /var/log/messages
Apr 19 23:22:51 nfs /kernel: sd5: data overrun of 484 bytes detected.  Forcing a retry.
Apr 19 23:22:53 nfs /kernel: sd4: data overrun of 4068 bytes detected.  Forcing a retry.
Apr 19 23:24:49 nfs /kernel: sd6: data overrun of 484 bytes detected.  Forcing a retry.
Apr 19 23:24:53 nfs /kernel: sd2: data overrun of 484 bytes detected.  Forcing a retry.
Apr 19 23:24:58 nfs /kernel: sd6: data overrun of 484 bytes detected.  Forcing a retry.
Apr 19 23:25:18 nfs /kernel: sd5: data overrun of 484 bytes detected.  Forcing a retry.
Apr 19 23:26:02 nfs /kernel: sd1: data overrun of 484 bytes detected.  Forcing a retry.
Apr 19 23:26:57 nfs /kernel: sd1: data overrun of 484 bytes detected.  Forcing a retry.
Apr 19 23:27:22 nfs /kernel: sd3: data overrun of 484 bytes detected.  Forcing a retry.
Apr 19 23:27:50 nfs /kernel: sd5: data overrun of 484 bytes detected.  Forcing a retry.
Apr 19 23:28:12 nfs /kernel: sd6: data overrun of 484 bytes detected.  Forcing a retry.
Apr 19 23:29:02 nfs /kernel: sd2: data overrun of 484 bytes detected.  Forcing a retry.
Apr 19 23:30:00 nfs /kernel: sd3: data overrun of 484 bytes detected.  Forcing a retry.
Apr 19 23:30:18 nfs /kernel: sd4: data overrun of 484 bytes detected.  Forcing a retry.
Apr 19 23:30:51 nfs /kernel: sd5: data overrun of 484 bytes detected.  Forcing a retry.
Apr 19 23:31:19 nfs /kernel: sd4: data overrun of 484 bytes detected.  Forcing a retry.
Apr 19 23:31:26 nfs /kernel: sd6: data overrun of 484 bytes detected.  Forcing a retry.
Apr 19 23:32:01 nfs /kernel: sd4: data overrun of 484 bytes detected.  Forcing a retry.
Apr 19 23:33:30 nfs /kernel: sd2: data overrun of 484 bytes detected.  Forcing a retry.
[...]
Apr 19 23:35:28 nfs /kernel: sd2: data overrun of 497 bytes detected.  Forcing a retry.
Apr 19 23:35:29 nfs /kernel: sd2: data overrun of 484 bytes detected.  Forcing a retry.
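
    The log is easier to read as a tally.  In the lines quoted above
(not counting the [...] elision), 19 of the 21 messages report exactly
484 bytes, and all six ccd0 members on both controllers appear, while
the boot drive sd0 does not.  A minimal sketch of such a tally with
awk; the function name is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: count "data overrun" kernel messages per drive and
# per reported byte count from syslog lines like the ones above.
overrun_tally() {
    awk '
        /data overrun of [0-9]+ bytes/ {
            # Locate "data overrun": the drive precedes it, the byte
            # count follows "of".
            for (i = 2; i + 3 <= NF; i++)
                if ($i == "data" && $(i + 1) == "overrun") {
                    drive = $(i - 1); bytes = $(i + 3); break
                }
            sub(/:$/, "", drive)     # "sd5:" -> "sd5"
            by_drive[drive]++
            by_size[bytes]++
        }
        END {
            for (d in by_drive) printf "drive %s %d\n", d, by_drive[d]
            for (b in by_size)  printf "bytes %s %d\n", b, by_size[b]
        }' | LC_ALL=C sort
}
```

    For example: fgrep 'data overrun' /var/log/messages | overrun_tally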


FreeBSD 3.0-970209-SNAP #0: Sat Apr 19 16:08:33 EDT 1997
    root@nfs:/usr/depot/src/sys/compile/NFS
Calibrating clock(s) relative to mc146818A clock ... i586 clock: 199426620 Hz, i8254 clock: 1193152 Hz
CPU: Pentium (199.43-MHz 586-class CPU)
  Origin = "GenuineIntel"  Id = 0x52c  Stepping=12
  Features=0x1bf<FPU,VME,DE,PSE,TSC,MSR,MCE,CX8>
real memory  = 134217728 (131072K bytes)
avail memory = 129490944 (126456K bytes)
Probing for devices on PCI bus 0:
chip0 <Intel 82439> rev 3 on pci0:0:0
chip1 <Intel 82371SB PCI-ISA bridge> rev 1 on pci0:7:0
chip2 <Intel 82371SB IDE interface> rev 0 on pci0:7:1
ahc0 <Adaptec 2940 Ultra SCSI host adapter> rev 0 int a irq 5 on pci0:13:0
ahc0: aic7880 Wide Channel, SCSI Id=7, 16 SCBs
ahc0: waiting for scsi devices to settle
scbus0 at ahc0 bus 0
sd0 at scbus0 target 0 lun 0
sd0: <SEAGATE ST34371W 0484> type 0 fixed SCSI 2
sd0: Direct-Access 4148MB (8496884 512 byte sectors)
sd1 at scbus0 target 1 lun 0
sd1: <SEAGATE ST34371W 0484> type 0 fixed SCSI 2
sd1: Direct-Access 4148MB (8496884 512 byte sectors)
sd2 at scbus0 target 2 lun 0
sd2: <SEAGATE ST34371W 0484> type 0 fixed SCSI 2
sd2: Direct-Access 4148MB (8496884 512 byte sectors)
sd3 at scbus0 target 3 lun 0
sd3: <SEAGATE ST34371W 0484> type 0 fixed SCSI 2
sd3: Direct-Access 4148MB (8496884 512 byte sectors)
ahc1 <Adaptec 2940 Ultra SCSI host adapter> rev 0 int a irq 9 on pci0:14:0
ahc1: aic7880 Wide Channel, SCSI Id=7, 16 SCBs
ahc1: waiting for scsi devices to settle
scbus1 at ahc1 bus 0
sd4 at scbus1 target 1 lun 0
sd4: <SEAGATE ST34371W 0484> type 0 fixed SCSI 2
sd4: Direct-Access 4148MB (8496884 512 byte sectors)
sd5 at scbus1 target 2 lun 0
sd5: <SEAGATE ST34371W 0484> type 0 fixed SCSI 2
sd5: Direct-Access 4148MB (8496884 512 byte sectors)
sd6 at scbus1 target 3 lun 0
sd6: <SEAGATE ST34371W 0338> type 0 fixed SCSI 2
sd6: Direct-Access 4148MB (8496960 512 byte sectors)
bt0 <Buslogic 946 SCSI host adapter> rev 0 int a irq 11 on pci0:15:0
bt0: Bt946C/ 0-(32bit) bus
bt0: reading board settings, busmastering, int=11
bt0: version 4.28D, async only, parity, 32 mbxs, 32 ccbs
bt0: Using Strict Round robin scheme
bt0: waiting for scsi devices to settle
scbus2 at bt0 bus 0
de0 <Digital 21140 Fast Ethernet> rev 18 int a irq 10 on pci0:16:0
de0: SMC 9332 21140 [10-100Mb/s] pass 1.2
de0: address 00:00:c0:99:00:e6
de0: enabling 100baseTX port
Probing for devices on the ISA bus:
sc0 at 0x60-0x6f irq 1 on motherboard
sc0: VGA color <16 virtual consoles, flags=0x0>
sio0 at 0x3f8-0x3ff irq 4 on isa
sio0: type 16550A
sio1 not found at 0x2f8
lpt0 not found at 0xffffffff
fdc0 at 0x3f0-0x3f7 irq 6 drq 2 on isa
fdc0: NEC 72065B
fd0: 1.44MB 3.5in
bt: unit number (1) too high
bt1 not found at 0x330
npx0 on motherboard
npx0: INT 16 interface
ccd0-3: Concatenated disk drivers

-- 
Brian Tao (BT300, taob@netcom.ca)
"Though this be madness, yet there is method in't"




