Date:      Thu, 10 Apr 2003 13:50:48 +0200 (CEST)
From:      Oliver Fromme <olli@secnetix.de>
To:        freebsd-stable@FreeBSD.ORG, freebsd-hackers@FreeBSD.ORG
Subject:   panic: vinvalbuf: flush failed
Message-ID:  <200304101150.h3ABom0O034933@lurza.secnetix.de>

Hi,

We have a pretty serious problem with our news server crashing
during the expire cronjob.  This happened with 4.7-RELEASE, so
we upgraded to 4.8-RELEASE recently, hoping that the problem
might be fixed, but it isn't.  The machine is a Compaq DL360-G2.

I've searched the PR database as well as the mailing list
archives for the panic string, but didn't find anything.

What makes the problem even worse is the fact that the machine
freezes after the "syncing disks" output.  Normally it should
reboot, because we have DDB_UNATTENDED in the kernel, but it
never gets that far.

When the crash happens, it is always shortly after INN's expire
cronjob starts, just after 1:00am (at that time, the CPU load
and NFS traffic increase noticeably).  But it doesn't happen
every night, only every 3 to 4 days.  On the other days, the
expire job finishes without problems.

The machine carries fairly heavy network traffic (a constant
40 - 50 Mbit/s), half of which is NNTP, and the other half
is NFS.  That's during normal operation -- during the expire
job, the NFS traffic is even higher.  The news spool and INN's
overview database are on an NFS mount (a NetApp filer), as
well as binaries, logfiles and everything else.  The NFS
mounts are v3+UDP, as far as I can tell (that should be the
default).  The network interface is a Broadcom BCM5701 gigabit
one, connected to a Cisco switch with a VLAN trunk (there are
several virtual VLAN interfaces on this trunk).  If it
matters, we're using IPFilter for packet filtering, plus
IPFW+Dummynet for traffic shaping.

The load on the machine is moderate (usually below 1.0).  As
far as I can tell, there is no resource shortage.  There's
plenty of RAM, free file descriptors, mbufs / mbuf clusters.
Well, at least during normal operation.  Maybe it is a bit
different during the expire run at night.
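
One way to check whether resources run short during the expire run
is to log a few counters at short intervals around 1:00am.  The
following is only a minimal sketch: the choice of commands
(netstat -m, nfsstat -c, pstat -T), the function names and the log
path are assumptions for illustration, not something taken from
this machine.  The log should go to a local disk, since the NFS
mounts are the part under suspicion.

```python
import os
import subprocess
import time

# Commands picked for illustration; the selection is an assumption.
COMMANDS = [
    ["netstat", "-m"],  # mbuf and mbuf cluster usage
    ["nfsstat", "-c"],  # NFS client RPC counters
    ["pstat", "-T"],    # open file table and swap usage
]


def run_command(argv):
    """Run one command and return its output, or a note if it failed."""
    try:
        result = subprocess.run(
            argv,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
            timeout=30,
        )
        return result.stdout
    except (OSError, subprocess.SubprocessError) as exc:
        return "failed: {}\n".format(exc)


def snapshot(commands=COMMANDS, runner=run_command, now=time.time):
    """Return one timestamped text record covering every command."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now()))
    parts = ["=== {} ===".format(stamp)]
    for argv in commands:
        parts.append("--- {} ---".format(" ".join(argv)))
        parts.append(runner(argv).rstrip("\n"))
    return "\n".join(parts) + "\n"


def sample(path, interval, count, sleep=time.sleep):
    """Append `count` snapshots to `path`, `interval` seconds apart."""
    for i in range(count):
        with open(path, "a") as log:
            log.write(snapshot())
            log.flush()
            # Force the record to disk so it survives a crash.
            os.fsync(log.fileno())
        if i + 1 < count:
            sleep(interval)


# Hypothetical usage: one snapshot every 10 seconds for an hour,
# written to a local (non-NFS) path.
# sample("/var/tmp/expire-resources.log", interval=10, count=360)
```

The last snapshot written before a crash would then show whether
mbufs, file descriptors or NFS retries looked unusual.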

This is the console output:

panic: vinvalbuf: flush failed

syncing disks...

At this point, the machine freezes completely; it displays
neither the usual numbers nor "done".  It just sits there for
hours.  When I came in in the morning, I broke into DDB
(fortunately that still worked):

Stopped at      siointr1+0xf2:  movl    $0,brk_state2.757
db> 
db> trace
siointr1(c884b000,e9bcaaa8,c033bb86,c884b000,e9bc0010) at siointr1+0xf2
siointr(c884b000) at siointr+0xb
Xfastintr4(e94cba7c,110,c0393145,0) at Xfastintr4+0x16
nfs_asyncio(d51b97d0,0,0) at nfs_asyncio+0xf4
nfs_strategy(e9bcab14) at nfs_strategy+0x59
nfs_writebp(d51b97d0,1,e95bc1a0,e9bcabfc,c02a7040) at nfs_writebp+0xdc
nfs_bwrite(e9bcaba0) at nfs_bwrite+0x16
nfs_flush(e9bb0600,c27f5900,2,c0434be0,1) at nfs_flush+0x68c
nfs_fsync(e9bcac34) at nfs_fsync+0x19
nfs_sync(c8c1f400,2,c27f5900,c0434be0,c8c1f400) at nfs_sync+0x99
sync(c0434be0,0,c03861ec,c038aabc,100) at sync+0x63
boot(100,e94ef2a0,68c0c0,e9bcad00,c021ba99) at boot+0x8a
panic(c038aabc,e95002c0,7d0,1,68c0c0) at panic+0x79
vinvalbuf(e95262c0,1,c8c3e280,e95bc1a0,100,0) at vinvalbuf+0x395
nfs_vinvalbuf(e95262c0,1,c8c3e280,e95bc1a0,1) at nfs_vinvalbuf+0x108
nfs_open(e9bcadfc,0,c911b440,e9bcaf80,e95262c0) at nfs_open+0xf5
vn_open(e9bcaec8,1,0,e95bc1a0,3) at vn_open+0x3d7
open(e95bc1a0,e9bcaf80,38229dac,0,0) at open+0xb8
syscall2(81c002f,2f,bfbf002f,0,0) at syscall2+0x1f5
Xint0x80_syscall() at Xint0x80_syscall+0x25
db> ps
  pid   proc     addr    uid  ppid  pgrp  flag stat wmesg   wchan   cmd
37692 e95bc1a0 e9bc8000    8   727   727 4004004  2                  nnrpd
37657 e9bd2e00 e9bd3000    8 37656 37407 004005  2                  expireover
37656 e95bf0c0 e9abf000    8 37410 37407 000084  3    wait e95bf0c0 sh
37410 e95bd860 e9b40000    8 37407 37407 004084  3    wait e95bd860 sh
37407 e95bfdc0 e991f000    0 37403 37407 004084  3    wait e95bfdc0 sh
37403 e95bd520 e9b5e000    0   103   103 000084  3  piperd e9440540 cron
31158 e95bc680 e9ba8000    8   727   727 004085  2                  overchan
31157 e95bcea0 e9b68000    8   727   727 004084  3  sbwait e554e888 perl
31156 e95c0ac0 e95e2000    8   727   727 004484  2                  innfeed
30033 e95bc340 e9bc1000    8 30031 30031 004086  3   ttyin c882b430 zsh
30031 e95c02a0 e9907000    0 30021 30031 004086  3    wait e95c02a0 sh
30021 e33fe380 e95b6000    0 30019 30021 2004086  3   pause e95b6260 zsh
30019 e95bc820 e9b99000    0   992 30019 000584  2                  sshd
 6279 e33fe040 e95c4000    0  1778  6279 004086  2                  zsh
 1778 e33ffbe0 e94dc000    0     1  1778 004186  3    wait e33ffbe0 login
 1369 e33ff220 e950c000    0     1  1369 004086  3   ttyin c8c3c710 getty
 1368 e34012a0 e943c000    0     1  1368 004086  3   ttyin c8c3b110 getty
 1367 e33ffd80 e94c8000    0     1  1367 004086  3   ttyin c8c41f10 getty
 1366 e3400400 e94ad000    0     1  1366 004086  3   ttyin c8c41b10 getty
 1365 e33fff20 e94c4000    0     1  1365 004086  3   ttyin c8c44310 getty
 1364 e33ff3c0 e94fd000    0     1  1364 004086  3   ttyin c8c32410 getty
 1363 e33feee0 e9527000    0     1  1363 004086  3   ttyin c8a42b10 getty
 1362 e3401440 e9433000    0     1  1362 004086  3   ttyin c885cd10 getty
 1358 e33fe6c0 e9576000    0     1  1358 000084  2                  syslogd
  992 e33ff560 e9505000    0     1   992 000184  2                  sshd
  991 e33ff8a0 e94f5000    0     1   991 000084  2                  snmpd
  727 e33ffa40 e94e1000    8     1   727 000005  2                  innd
  109 e34000c0 e94b7000   25     1   109 2000184  2                  sendmail
  106 e3400260 e94b2000    0     1   106 000584  2                  sendmail
  103 e3400c20 e9499000    0     1   103 000484  2                  cron
   96 e34005a0 e94a9000    0     1    91 000084  2                  nfsiod
   95 e3400740 e94a5000    0     1    91 000084  2                  nfsiod
   94 e34008e0 e94a1000    0     1    91 000084  2                  nfsiod
   93 e3400a80 e949d000    0     1    91 000084  2                  nfsiod
   55 e3401100 e9444000    0     1    55 000484  2                  ipmon
   19 e3400f60 e9448000    0     1    19 000084  3  mfsidl e7ba6000 mount_mfs
    6 e34015e0 e7bb4000    0     0     0 000204  2                  syncer
    5 e3401780 e7bb1000    0     0     0 000604  2                  vnlru
    4 e3401920 e7bae000    0     0     0 000604  2                  bufdaemon
    3 e3401ac0 e7bab000    0     0     0 000204  3  psleep c042af20 vmdaemon
    2 e3401c60 e7ba8000    0     0     0 000604  2                  pagedaemon
    1 e3401e00 e3406000    0     0     1 004284  3    wait e3401e00 init
    0 c0434be0 c04de000    0     0     0 000204  3   sched c0434be0 swapper
db> panic
panic: from debugger
Uptime: 2d23h10m42s

dumping to dev #da/0x20001, offset 3670056
dump 1279 1278 1277 1276 1275 1274 1273 1272 1271 1270 1269 1268
[...]
12 11 10 9 8 7 6 5 4 3 2 1 0 succeeded
Automatic reboot in 15 seconds - press a key on the console to abort
BIOS drive A: is disk0
BIOS drive C: is disk1
BIOS 637kB/1309676kB available memory

FreeBSD/i386 bootstrap loader, Revision 0.8
(olli@monos.secnetix.net, Mon Mar 31 01:11:35 CEST 2003)
Loading /boot/defaults/loader.conf 
/kernel text=0x2bb060 data=0x46b88+0x3842c syms=[0x4+0x3caa0+0x4+0x44979]

Hit [Enter] to boot immediately, or any other key for command prompt.
Booting [kernel]...               
Copyright (c) 1992-2003 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD 4.8-RELEASE #0: Mon Mar 31 11:27:28 CEST 2003
    olli@monos.secnetix.net:/usr/src/sys/compile/FARM
Timecounter "i8254"  frequency 1193182 Hz
CPU: Intel(R) Pentium(R) III CPU family      1400MHz (1396.45-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0x6b1  Stepping = 1
  Features=0x383fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE>
real memory  = 1342156800 (1310700K bytes)
avail memory = 1300033536 (1269564K bytes)
Preloaded elf kernel "kernel" at 0xc04be000.
Pentium Pro MTRR support enabled
md0: Malloc disk
npx0: <math processor> on motherboard
npx0: INT 16 interface
pcib1: <ServerWorks host to PCI bridge> on motherboard
pci1: <PCI bus> on pcib1
ciss0: <Compaq Smart Array 5i> port 0x3000-0x30ff mem 0xf7ef0000-0xf7ef3fff,0xf7fc0000-0xf7ffffff irq 11 at device 4.0 on pci1
ciss0: using 256 of 1024 available commands
ciss0:   1 logical drive configured
ciss0:   firmware 1.80
ciss0:   2 SCSI channels
ciss0:   signature 'CISS'
ciss0:   valence 1
ciss0:   supported I/O methods 0xe<simple,performant,MEMQ>
ciss0:   active I/O method 0x3<simple>
ciss0:   4G page base 0x00000000
ciss0:   interrupt coalesce delay 1000us
ciss0:   interrupt coalesce count 16
ciss0:   max outstanding commands 1024
ciss0:   bus types 0x2<ultra3>
ciss0:   server name ''
ciss0:   heartbeat 0x30000033
ciss0: 1 logical drive
ciss0: logical drive 1: RAID 0, 16896MB online
bge0: <Broadcom BCM5701 Gigabit Ethernet, ASIC rev. 0x105> mem 0xf7fb0000-0xf7fbffff irq 5 at device 5.0 on pci1
bge0: Ethernet address: 00:08:02:a0:c5:06
miibus0: <MII bus> on bge0
brgphy0: <BCM5701 10/100/1000baseTX PHY> on miibus0
brgphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX, 1000baseTX-FDX, auto
bge1: <Broadcom BCM5701 Gigabit Ethernet, ASIC rev. 0x105> mem 0xf7fa0000-0xf7faffff irq 10 at device 6.0 on pci1
bge1: Ethernet address: 00:08:02:a0:c5:07
miibus1: <MII bus> on bge1
brgphy1: <BCM5701 10/100/1000baseTX PHY> on miibus1
brgphy1:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX, 1000baseTX-FDX, auto
pcib0: <ServerWorks host to PCI bridge> on motherboard
pci0: <PCI bus> on pcib0
pci0: <ATI Mach64-GR graphics accelerator> at 3.0 irq 7
pci0: <unknown card> (vendor=0x0e11, dev=0xb203) at 5.0 irq 3
pci0: <unknown card> (vendor=0x0e11, dev=0xb204) at 5.2 irq 15
isab0: <PCI to ISA bridge (vendor=1166 device=0201)> at device 15.0 on pci0
isa0: <ISA bus> on isab0
atapci0: <ServerWorks CSB5 ATA100 controller> port 0-0x3,0x2000-0x200f,0x374-0x377,0x170-0x177,0x3f4-0x3f7,0x1f0-0x1f7 at device 15.1 on pci0
ata0: at 0x1f0 irq 14 on atapci0
ata1: at 0x170 irq 15 on atapci0
ohci0: <OHCI (generic) USB controller> mem 0xf5ef0000-0xf5ef0fff irq 7 at device 15.2 on pci0
usb0: OHCI version 1.0, legacy support
usb0: SMM does not respond, resetting
usb0: <OHCI (generic) USB controller> on ohci0
usb0: USB revision 1.0
uhub0: (0x1166) OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 4 ports with 4 removable, self powered
pcib2: <ServerWorks host to PCI bridge> on motherboard
pci2: <PCI bus> on pcib2
pcib7: <ServerWorks host to PCI bridge> on motherboard
pci7: <PCI bus> on pcib7
pcib3: <Host to PCI bridge> on motherboard
pci3: <PCI bus> on pcib3
eisa0: <EISA bus> on motherboard
mainboard0: <CPQ0724 (System Board)> on eisa0 slot 0
orm0: <Option ROMs> at iomem 0xc0000-0xc7fff,0xc8000-0xcbfff,0xee000-0xeffff on isa0
fdc0: <NEC 72065B or clone> at port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on isa0
fdc0: FIFO enabled, 8 bytes threshold
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
atkbdc0: <Keyboard controller (i8042)> at port 0x60,0x64 on isa0
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x100>
sio0 at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0
sio0: type 16550A, console
sio1: configured irq 3 not in bitmap of probed irqs 0
ppc0: parallel port not found.
DUMMYNET initialized (011031)
ipfw2 initialized, divert disabled, rule-based forwarding enabled, default to accept, logging disabled
IP Filter: v3.4.31 initialized.  Default = pass all, Logging = enabled
acd0: CDROM <CRN-8245B> at ata0-master PIO4
Mounting root from ufs:/dev/da0s1a
da0 at ciss0 bus 0 target 0 lun 0
da0: <COMPAQ RAID 0  VOLUME OK> Fixed Direct Access SCSI-0 device 
da0: 135.168MB/s transfers
da0: 17359MB (35553120 512 byte sectors: 255H 32S/T 4357C)
WARNING: / was not properly dismounted
bge0: gigabit link up
bge0: gigabit link up
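
To make the DDB trace above easier to read: DDB prints the
innermost frame first, so the frames listed below panic() are the
call path that led to the panic, and the frames listed above it
ran afterwards, during the "syncing disks" phase.  The following
is a small sketch that splits a trace at the panic frame; the
function names and the regular expression are my own, and the
excerpt is copied from the trace above.

```python
import re

# Matches DDB frame lines such as "sync(c0434be0,0,...) at sync+0x63".
FRAME = re.compile(r"^(\w+)\(.*\) at \w+\+0x[0-9a-f]+$")

EXCERPT = """\
nfs_sync(c8c1f400,2,c27f5900,c0434be0,c8c1f400) at nfs_sync+0x99
sync(c0434be0,0,c03861ec,c038aabc,100) at sync+0x63
boot(100,e94ef2a0,68c0c0,e9bcad00,c021ba99) at boot+0x8a
panic(c038aabc,e95002c0,7d0,1,68c0c0) at panic+0x79
vinvalbuf(e95262c0,1,c8c3e280,e95bc1a0,100,0) at vinvalbuf+0x395
nfs_vinvalbuf(e95262c0,1,c8c3e280,e95bc1a0,1) at nfs_vinvalbuf+0x108
"""


def frame_names(trace_text):
    """Return the function name of every frame line, in printed order."""
    names = []
    for line in trace_text.splitlines():
        match = FRAME.match(line.strip())
        if match:
            names.append(match.group(1))
    return names


def split_at_panic(names):
    """Split frame names at the first (innermost) 'panic' frame.

    Returns (ran_after, led_to): the frames printed before panic,
    which executed after it, and the frames printed after panic,
    which called into it.
    """
    idx = names.index("panic")
    return names[:idx], names[idx + 1:]


ran_after, led_to = split_at_panic(frame_names(EXCERPT))
print("after the panic:", " <- ".join(ran_after))
print("before the panic:", " <- ".join(led_to))
```

Applied to the full trace, this shows that the panic came from
vinvalbuf() called via nfs_open(), and that the final sync went
back into the NFS code (nfs_sync, nfs_flush, nfs_asyncio), which
is where the machine was sitting when I broke into DDB.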

The kernel config is derived from GENERIC.  I've removed
devices which are not needed, and added the following:

options         MAXDSIZ="(768*1024*1024)"
options         MAXSSIZ="(256*1024*1024)"
options         DFLDSIZ="(512*1024*1024)"
options         NMBCLUSTERS=32768
options         INCLUDE_CONFIG_FILE     # Include this file in kernel
options         CPU_ENABLE_SSE          # Enable SSE/MMX2 instructions support.
options         USER_LDT                # Allow user-level control of i386 LDT.
options         DDB                     # Enable the kernel debugger.
options         DDB_UNATTENDED          # Don't drop into DDB for a panic.
options         KTRACE                  # Enable system-call tracing facility.
pseudo-device   vlan    1               # VLAN support
pseudo-device   stf                     # 6to4 IPv6 over IPv4 encapsulation
options         IPFIREWALL
options         IPFIREWALL_DEFAULT_TO_ACCEPT
options         IPFW2                   #   Use next-generation IPFW.
options         DUMMYNET
options         IPFILTER
options         IPFILTER_LOG
options         DEVICE_POLLING
options         HZ=1000
pseudo-device   vn                      # Vnode driver, see vnconfig(8)
options         MSGBUF_SIZE=81920
options         AUTO_EOI_1
options         MAXCONS=16              # number of virtual consoles
options         SC_HISTORY_SIZE=400     # number of history buffer lines
options         ALT_BREAK_TO_DEBUGGER   # <ENTER> ~ Ctrl-B
device          smbus
device          intpm
device          alpm
device          ichsmb
device          viapm
device          smb
device          iicbus
device          iicbb
device          ic
device          iic
device          iicsmb

I can send the whole config if required, but there's really
nothing else which is special.  Also, if there's any more
information that I should send, please let me know.  I still
have the crash dump from the manual "panic", in case I can do
anything with it.

I don't think it's a hardware problem, because we moved the
news service to a different machine (an identically equipped
DL360-G2) and the crashes continued.

Does anyone have an idea what might cause the problem, and
-- even more important -- how to fix it, or at least work
around it?  We can't really afford to have a dead news server
for several hours every few days.

Thanks a bunch!

Best regards
   Oliver

-- 
Oliver Fromme, secnetix GmbH & Co KG, Oettingenstr. 2, 80538 München
Any opinions expressed in this message may be personal to the author
and may not necessarily reflect the opinions of secnetix in any way.


