Date:      Sun, 29 Aug 1999 12:43:13 -0700
From:      Parag Patel <parag@cgt.com>
To:        freebsd-current@freebsd.org
Subject:   4.0-CURRENT SMP crash with vinum raid-5 and softupdates
Message-ID:  <76835.935955793@pinhead.parag.codegen.com>


Hello.  I'm not sure which list this should go to as I'm not sure what
caused this fault.  Unfortunately, I can't get a trace out of ddb - it
simply faults again, so advice on how to debug it or what to look at
would be much appreciated.

The machine is dead right now, and I'll leave it that way so I can run
any ddb commands you like, or give you a login on my machine here and
let you cu to the PPro console.


A friend of mine left a quad-PentiumPro system at my house to play with
'till he builds a new house with enough room to house it.  It's a big
heavy rack-mount machine with dual power-supplies and such - a recently
retired NT server of some sort.  Naturally I put FBSD SMP on it.  :)

/ and /usr are on an IDE drive, which is the only disk the firmware can
see at the moment, as my NCR cards don't have the appropriate BIOS code.

This leaves 7x4Gb UW SCSI disks to play around with vinum.  I set up two
RAID volumes on 6 disks, and a regular filesystem on the 7th for
comparison and testing.  Each disk has a single slice, 256Mb swap, and
the rest for either vinum or 4.2BSD filesystems.  Swap is enabled only
on drives 0, 3, and 6 right now, although none of it was actually
touched.  (The IDE drive is much slower for swap.)

The bootup messages from the kernel plus the vinum config file and the
crash output are appended below.


Anyway, I was copying the /usr/src tree (find|cpio) onto the raid5
volume when it died.  The same command worked fine earlier for the
raid10 volume (striped and mirrored only) and the single "noraid"
vanilla FFS+softupdates volume on da0 (also in the same array).  Both
RAID filesystems were also running with softupdates.

Earlier I had 3.2-STABLE (also as of Friday evening) installed, and the
raid5 volume also crashed the system.  As I hadn't built DDB into the
kernel, I don't know why it died there, but it's probably the same as
whatever nuked 4.0-CURRENT.  Assuming that 4.0 would have newer vinum
code, I installed that hoping things had improved.  (The 3.2-STABLE
raid5 volume had also crashed without softupdates.)

There are no apparent SCSI errors, and access to the raid10 volume and
to the single vanilla FFS filesystem on the extra SCSI drive is fine.


Thoughts?  Thanks!


	-- Parag Patel


-----ddb crash output-----

Fatal trap 12: page fault while in kernel mode
mp_lock = 03000003; cpuid = 3; lapic.id = 02000000
fault virtual address   = 0x0
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0x0
stack pointer           = 0x10:0xd5730b08
frame pointer           = 0x10:0xd5730b4c
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 376 (cpio)
interrupt mask          = bio  <- SMP: XXX
kernel: type 12 trap, code=0
Stopped at      0:

Fatal trap 12: page fault while in kernel mode
mp_lock = 03000004; cpuid = 3; lapic.id = 02000000
fault virtual address   = 0x0
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0xc022f168
stack pointer           = 0x10:0xd5730980
frame pointer           = 0x10:0xd5730984
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 376 (cpio)
interrupt mask          = bio  <- SMP: XXX
kernel: type 12 trap, code=0

db>
db> trace


Fatal trap 12: page fault while in kernel mode
mp_lock = 03000005; cpuid = 3; lapic.id = 02000000
fault virtual address   = 0x0
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0xc022f168
stack pointer           = 0x10:0xd57308b0
frame pointer           = 0x10:0xd57308b4
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 376 (cpio)
interrupt mask          = bio  <- SMP: XXX
kernel: type 12 trap, code=0

db>
db> ps
  pid   proc     addr    uid  ppid  pgrp  flag stat wmesg   wchan   cmd
  376 d2f8dec0 d572f000    0   374   374 804006  2                  cpio
  375 d2f8ef40 d5700000    0   374   374 004086  3  pipdwt d56e8d40 find
  374 d2f8e180 d5721000    0   246   374 004086  3    wait d2f8e180 sh
  246 d2f8e020 d572b000 1000   244   246 004086  3   pause d572b108 ksh
  244 d2f8e440 d571a000    0   218   218 000084  2                  sshd1
  241 d2f8f780 d56e0000    0     1   241 004086  3   ttyin c13dfa10 ksh
  218 d2f8e2e0 d571e000    0     1   218 000084  3  select c02d82ec sshd1
  177 d2f8e5a0 d5717000    0     1   177 000184  3  select c02d82ec sendmail
  173 d2f8ec80 d5706000    0     1   173 000084  3  nanslp c02c18a0 cron
  170 d2f8f620 d56e3000    0     1   170 000084  3  select c02d82ec inetd
  146 d2f8e700 d5713000    0     1   141 000084  3  nfsidl c02da68c nfsiod
  145 d2f8e860 d5710000    0     1   141 000084  3  nfsidl c02da688 nfsiod
  144 d2f8e9c0 d570c000    0     1   141 000084  3  nfsidl c02da684 nfsiod
  143 d2f8eb20 d5709000    0     1   141 000084  3  nfsidl c02da680 nfsiod
  130 d2f8ede0 d5703000    1     1   130 000184  3  select c02d82ec portmap
  125 d2f8f0a0 d56fc000    0     1   125 000084  3  select c02d82ec xntpd
  118 d2f8f4c0 d56ea000    0     1   118 000084  2                  syslogd
   33 d2f8f200 d56f1000    0     1    33 000084  3  mfsidl d2f87bc0 mount_mfs
   18 d2f8f360 d56ee000    0     1    18 000004  3   vinum c142eb74 vinum
    5 d2f8f8e0 d2f9c000    0     0     0 500284  3  vrlock   460001 syncer
    4 d2f8fa40 d2f9a000    0     0     0 500204  3  psleep c02c19bc bufdaemon
    3 d2f8fba0 d2f98000    0     0     0 400204  3  psleep c02cdabc vmdaemon
    2 d2f8fd00 d2f96000    0     0     0 500204  3  psleep c02b1bf8 pagedaemon
    1 d2f8fe60 d2f94000    0     0     1 004284  3    wait d2f8fe60 init
    0 c02d76c0 c0340000    0     0     0 000204  3   sched c02d76c0 swapper
db>
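
Since "trace" itself faults, one way to still get a hint is to map the
second instruction pointer (0xc022f168) to the nearest preceding kernel
symbol by hand.  The first fault at 0x8:0x0 looks like a jump through a
null function pointer, so the address itself says nothing there.  What
follows is only a minimal sketch: it assumes a symbol listing in the
usual "address type name" form, such as "nm -n" prints for the booted
kernel image, and the three symbols in the sample table are
hypothetical placeholders, not real kernel names.

```python
import bisect


def parse_nm(lines):
    """Parse 'address type name' lines into a sorted list of (addr, name)."""
    symbols = []
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip undefined symbols and malformed lines
        try:
            symbols.append((int(parts[0], 16), parts[2]))
        except ValueError:
            continue  # first field was not a hex address
    symbols.sort()
    return symbols


def nearest_symbol(symbols, addr):
    """Return (name, offset) of the last symbol at or below addr, or None."""
    addrs = [a for a, _ in symbols]
    i = bisect.bisect_right(addrs, addr) - 1
    if i < 0:
        return None
    base, name = symbols[i]
    return name, addr - base


# Hypothetical sample table; a real one would come from the kernel image.
sample = [
    "c022f000 T hypothetical_func_a",
    "c022f100 T hypothetical_func_b",
    "c022f200 T hypothetical_func_c",
]

table = parse_nm(sample)
name, offset = nearest_symbol(table, 0xC022F168)
print("%s+0x%x" % (name, offset))
```

With a real symbol listing in place of the sample, the result names
the function containing the faulting instruction and the offset into
it.  Because that address only shows up in the second and third traps,
it may well lie in the trap or debugger path rather than in the code
that caused the original fault.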



-----dmesg/bootup-----

System: 4xPPro/200Mhz, 512Mb RAM, NCR 875 SCSI, 7x4Gb Fujitsu array,
	serial console, no VGA, no keyboard
FreeBSD: 4.0-CURRENT from Friday Aug 27 evening, and also 3.2-STABLE
kernel config: available upon request, but system is down right now
    essentially GENERIC + SOFTUPDATES + INVARIANTS + DDB - unused devices
    vinum loaded as module

bootup messages (cut-paste from tty console window):

Copyright (c) 1992-1999 The FreeBSD Project.
Copyright (c) 1982, 1986, 1989, 1991, 1993
        The Regents of the University of California. All rights reserved.
FreeBSD 4.0-CURRENT #3: Sat Aug 28 23:29:38 PDT 1999
    parag@quadhead.parag.codegen.com:/usr/src/sys/compile/PPRO
Timecounter "i8254"  frequency 1193182 Hz
CPU: Pentium Pro (686-class CPU)
  Origin = "GenuineIntel"  Id = 0x619  Stepping = 9
  Features=0xfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV>
real memory  = 536870912 (524288K bytes)
avail memory = 517791744 (505656K bytes)
Programming 16 pins in IOAPIC #0
EISA INTCONTROL = 00001e00
FreeBSD/SMP: Multiprocessor motherboard
 cpu0 (BSP): apic id:  3, version: 0x00040011, at 0xfee00000
 cpu1 (AP):  apic id:  0, version: 0x00040011, at 0xfee00000
 cpu2 (AP):  apic id:  1, version: 0x00040011, at 0xfee00000
 cpu3 (AP):  apic id:  2, version: 0x00040011, at 0xfee00000
 io0 (APIC): apic id:  4, version: 0x000f0011, at 0xfec00000
Preloaded elf kernel "kernel" at 0xc032d000.
Pentium Pro MTRR support enabled
npx0: <math processor> on motherboard
npx0: INT 16 interface
pcib0: <Intel 82454KX/GX (Orion) host to PCI bridge> on motherboard
pci0: <PCI bus> on pcib0
isab0: <Intel 82375EB PCI-EISA bridge> at device 2.0 on pci0
eisa0: <EISA bus> on isab0
mainboard0: <UNIa541 (System Board)> on eisa0 slot 0
isa0: <ISA bus> on isab0
ide_pci0: <PCI IDE controller (busmaster capable)> irq 14 at device 3.0 on pci0
xl0: <3Com 3c905-TX Fast Etherlink XL> irq 9 at device 14.0 on pci0
xl0: Ethernet address: 00:60:97:a1:88:23
xl0: autoneg complete, link status good (half-duplex, 100Mbps)
ncr0: <ncr 53c875 fast20 wide scsi> irq 11 at device 15.0 on pci0
chip0: <Intel 82453KX/GX (Orion) PCI memory controller> at device 20.0 on pci0
pcib1: <Intel 82454KX/GX (Orion) host to PCI bridge> on motherboard
pci1: <PCI bus> on pcib1
ncr1: <ncr 53c810a fast10 scsi> irq 10 at device 12.0 on pci1
Probing for PnP devices:
fdc0: <NEC 72065B or clone> at port 0x3f0-0x3f7 irq 6 drq 2 on isa0
fdc0: FIFO enabled, 8 bytes threshold
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
wdc0 at port 0x1f0-0x1f7 irq 14 on isa0
wdc0: unit 0 (wd0): <IBM-DJAA-31700>
wd0: 1628MB (3334464 sectors), 3308 cyls, 16 heads, 63 S/T, 512 B/S
wdc1: not probed (disabled)
atkbdc0: <keyboard controller (i8042)> at port 0x60-0x6f on isa0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
cu: Got hangup signal

Disconnected.
$ cu-test    
Connected.
sa0 at ncr1 bus 0 target 5 lun 0
sa0: <SONY SDT-5000 330B> Removable Sequential Access SCSI-2 device 
sa0: 5.000MB/s transfers (5.000MHz, offset 8)
da6 at ncr0 bus 0 target 14 lun 0
da6: <UNISYS 003557M2954E-512 0641> Fixed Direct Access SCSI-2 device 
da6: 40.000MB/s transfers (20.000MHz, offset 15, 16bit), Tagged Queueing Enabled
da6: 4149MB (8498506 512 byte sectors: 255H 63S/T 529C)
da5 at ncr0 bus 0 target 13 lun 0
da5: <UNISYS 003557M2954E-512 0641> Fixed Direct Access SCSI-2 device 
da5: 40.000MB/s transfers (20.000MHz, offset 15, 16bit), Tagged Queueing Enabled
da5: 4149MB (8498506 512 byte sectors: 255H 63S/T 529C)
da4 at ncr0 bus 0 target 12 lun 0
da4: <UNISYS 003557M2954E-512 0641> Fixed Direct Access SCSI-2 device 
da4: 40.000MB/s transfers (20.000MHz, offset 15, 16bit), Tagged Queueing Enabled
da4: 4149MB (8498506 512 byte sectors: 255H 63S/T 529C)
da3 at ncr0 bus 0 target 11 lun 0
da3: <UNISYS 003557M2954E-512 0641> Fixed Direct Access SCSI-2 device 
da3: 40.000MB/s transfers (20.000MHz, offset 15, 16bit), Tagged Queueing Enabled
da3: 4149MB (8498506 512 byte sectors: 255H 63S/T 529C)
da2 at ncr0 bus 0 target 10 lun 0
da2: <UNISYS 003557M2954E-512 0641> Fixed Direct Access SCSI-2 device 
da2: 40.000MB/s transfers (20.000MHz, offset 15, 16bit), Tagged Queueing Enabled
da2: 4149MB (8498506 512 byte sectors: 255H 63S/T 529C)
da1 at ncr0 bus 0 target 9 lun 0
da1: <UNISYS 003557M2954E-512 0641> Fixed Direct Access SCSI-2 device 
da1: 40.000MB/s transfers (20.000MHz, offset 15, 16bit), Tagged Queueing Enabled
da1: 4149MB (8498506 512 byte sectors: 255H 63S/T 529C)
da0 at ncr0 bus 0 target 8 lun 0
da0: <UNISYS 003557M2954E-512 0641> Fixed Direct Access SCSI-2 device 
da0: 40.000MB/s transfers (20.000MHz, offset 15, 16bit), Tagged Queueing Enabled
da0: 4149MB (8498506 512 byte sectors: 255H 63S/T 529C)
changing root device to wd0s1a
cd0 at ncr1 bus 0 target 6 lun 0
cd0: <TOSHIBA CD-ROM XM-5701TA 0167> Removable CD-ROM SCSI-2 device 
cd0: 10.000MB/s transfers (10.000MHz, offset 8)
cd0: Attempt to query device size failed: NOT READY, Medium not present



-----vinum config-----

$ s vinum list
Configuration summary

Drives:         6 (8 configured)
Volumes:        2 (4 configured)
Plexes:         3 (8 configured)
Subdisks:       12 (16 configured)

D d1                    State: up       Device /dev/da1a        Avail: 0/3892 MB (0%)
D d2                    State: up       Device /dev/da2a        Avail: 0/3892 MB (0%)
D d3                    State: up       Device /dev/da3a        Avail: 0/3892 MB (0%)
D d4                    State: up       Device /dev/da4a        Avail: 0/3892 MB (0%)
D d5                    State: up       Device /dev/da5a        Avail: 0/3892 MB (0%)
D d6                    State: up       Device /dev/da6a        Avail: 0/3892 MB (0%)

V raid5                 State: up       Plexes:       1 Size:       9731 MB
V raid10                State: up       Plexes:       2 Size:       5838 MB

P raid5.p0           R5 State: up       Subdisks:     6 Size:       9731 MB
P raid10.p0           S State: up       Subdisks:     3 Size:       5838 MB
P raid10.p1           S State: up       Subdisks:     3 Size:       5838 MB

S raid5.p0.s0           State: up       PO:        0  B Size:       1946 MB
S raid5.p0.s1           State: up       PO:      256 kB Size:       1946 MB
S raid5.p0.s2           State: up       PO:      512 kB Size:       1946 MB
S raid5.p0.s3           State: up       PO:      768 kB Size:       1946 MB
S raid5.p0.s4           State: up       PO:     1024 kB Size:       1946 MB
S raid5.p0.s5           State: up       PO:     1280 kB Size:       1946 MB
S raid10.p0.s0          State: up       PO:        0  B Size:       1946 MB
S raid10.p0.s1          State: up       PO:      256 kB Size:       1946 MB
S raid10.p0.s2          State: up       PO:      512 kB Size:       1946 MB
S raid10.p1.s0          State: up       PO:        0  B Size:       1946 MB
S raid10.p1.s1          State: up       PO:      256 kB Size:       1946 MB
S raid10.p1.s2          State: up       PO:      512 kB Size:       1946 MB
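
As a sanity check on the sizes above, the capacities can be recomputed
from the 1946 MB subdisk size.  This is only a sketch of the usual
arithmetic, not vinum's own code: a striped plex is the sum of its
subdisks, a RAID-5 plex gives up one subdisk's worth of space to
parity, and a mirrored volume is as large as its smallest plex.  The
function names are made up for illustration.

```python
SUBDISK_MB = 1946  # per-subdisk size reported by "vinum list"


def striped_plex_mb(subdisk_mb, subdisks):
    """Usable size of a striped plex: every subdisk holds data."""
    return subdisk_mb * subdisks


def raid5_plex_mb(subdisk_mb, subdisks):
    """Usable size of a RAID-5 plex: one subdisk's worth goes to parity."""
    if subdisks < 3:
        raise ValueError("RAID-5 needs at least three subdisks")
    return subdisk_mb * (subdisks - 1)


def mirrored_volume_mb(plex_sizes_mb):
    """A mirrored volume is limited by its smallest plex."""
    return min(plex_sizes_mb)


raid5 = raid5_plex_mb(SUBDISK_MB, 6)
raid10 = mirrored_volume_mb([striped_plex_mb(SUBDISK_MB, 3)] * 2)
print(raid5, raid10)  # prints "9730 5838"
```

The raid10 figure matches the listing exactly.  The raid5 figure is
1 MB short of the listed 9731 MB, presumably because the listing
rounds the subdisk size to whole megabytes.  Two subdisks of 1946 MB
also account for the 3892 MB shown per drive, assuming each drive
carries one raid5 and one raid10 subdisk.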

df :
Filesystem        1K-blocks     Used    Avail Capacity  Mounted on
/dev/da0a           3922958        1  3609121     0%    /noraid
/dev/vinum/raid5    9806179        1  9021684     0%    /raid5
/dev/vinum/raid10   5883701        1  5413004     0%    /raid10

mount :
/dev/da0a on /noraid (local, soft-updates, writes: sync 2 async 781)
/dev/vinum/raid5 on /raid5 (local, soft-updates, writes: sync 2 async 0)
/dev/vinum/raid10 on /raid10 (local, soft-updates, writes: sync 2 async 0)







